Download full source code.
You will often see it said that Entity Framework is not suitable for bulk operations, and that has been the case since I started using it, back around 2012 with Entity Framework 5.
But with many of my posts, I like to seed a database, and sometimes I want to seed it with a few hundred thousand or even more rows.
This is where Entity Framework can be slow, run out of memory, or timeout.
As you add more and more rows to the context, it will keep track of them all. The memory allocated to the application grows and grows.
Turning off tracking won’t help as it applies only to the entities that are returned from the database, not the ones you add to the context.
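To make that concrete, here is a minimal sketch of my own (not from the post), assuming the SalesContext and Product types from the attached source; the "Widget" product is purely illustrative. No-tracking only changes how query results are handled, while anything you Add is tracked regardless.

```csharp
// Minimal sketch: no-tracking affects query results, not entities added to the context.
// Assumes the SalesContext and Product types from the attached source code.
using System.Linq;
using Microsoft.EntityFrameworkCore;
using SeedDatabase.Data;

using var context = new SalesContext();
context.ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.NoTracking;

var queried = await context.Products.ToListAsync();    // not tracked
context.Products.Add(new Product { Name = "Widget" });  // tracked regardless

// Only the added entity shows up in the change tracker.
Console.WriteLine(context.ChangeTracker.Entries().Count());
```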
My usual seeding pattern is very simple -

1. Create the context
2. Create the database
3. Generate the data
4. Add the data to the context
5. Save the changes
This works fine when the amount of data is small, but not when generating large amounts of seed data.
When I tried seeding a table with 200,000 rows, it timed out after the default connection timeout period for SQL Server.
I made a small change to the pattern, starting at 3 -

3. Start a loop
4. Generate the data
5. Add the data to the context
6. Save the changes
7. End the loop
This worked, but it got slower as I added more data, and memory usage grew and grew as the loop progressed. The memory kept growing because I kept adding data to the same context; it doesn't matter that I was saving along the way, because the one context was tracking more and more entities.
Turning off tracking doesn’t help here either because the context always tracks entities added to it.
I made another small change to the pattern, starting at 3 -

3. Start a loop
4. Generate the data
5. Add the data to the context
6. Save the changes
7. Every 10 iterations, dispose of the context and create a new one
8. End the loop
This worked well: it was faster, and the memory profile stayed flat.
The full source code is attached, so I’m not going into everything here, just the two main parts.
The first part creates the database and tables, calls the seeder in a loop, and creates/disposes/recreates the context every 10 loops.
```csharp
using System.Diagnostics;
using Microsoft.EntityFrameworkCore;
using SeedDatabase.Data;

SalesContext salesContext = new SalesContext();

await salesContext.Database.EnsureDeletedAsync(); // optional
await salesContext.Database.EnsureCreatedAsync();

int productsToInsertPerLoop = 1000;
int seedLoops = 20;
Stopwatch sw = new Stopwatch();

for (int i = 1; i <= seedLoops; i++)
{
    sw.Start();
    Console.WriteLine($"Seeding loop {i} of {seedLoops}");
    await salesContext.SeedAsync(productsToInsertPerLoop).ConfigureAwait(false);
    sw.Stop();
    Console.WriteLine($"Written {i * productsToInsertPerLoop} rows in {sw.ElapsedMilliseconds} ms");
    sw.Reset();

    if (i % 10 == 0)
    {
        salesContext.Dispose();
        Console.WriteLine("Context disposed");
        salesContext = new SalesContext();
    }
}

Console.WriteLine($"Total rows {await salesContext.Products.CountAsync()}");
```
The second part is the seeder, which generates the data, adds it to the context, and saves it to the database. The SeedAsync method is an extension method on the SalesContext class.
```csharp
using Bogus;

namespace SeedDatabase.Data
{
    public static class Seeder
    {
        public static async Task SeedAsync(this SalesContext salesContext, int productCount)
        {
            var faker = new Faker<Product>()
                .RuleFor(p => p.Name, f => f.Commerce.ProductName())
                .RuleFor(p => p.Description, f => f.Commerce.ProductDescription())
                .RuleFor(p => p.Price, f => f.Random.Decimal(0, 100))
                .RuleFor(p => p.SKU, f => f.Commerce.Ean13())
                .RuleFor(p => p.Code, f => f.Commerce.Ean8())
                .RuleFor(p => p.ProductCategory, f => f.PickRandom<ProductCategory>());

            var products = faker.Generate(productCount);
            await salesContext.AddRangeAsync(products);
            await salesContext.SaveChangesAsync().ConfigureAwait(false);
        }
    }
}
```
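As a variation (my addition, not part of the attached source): if you are on EF Core 5 or later, you could keep a single context and clear its change tracker every few loops instead of disposing and recreating it. A rough sketch, assuming the same SalesContext and SeedAsync extension shown in this post:

```csharp
// Sketch: reset the change tracker periodically instead of disposing/recreating the context.
// Assumes EF Core 5+ (ChangeTracker.Clear) and the SalesContext/SeedAsync shown in this post.
using SeedDatabase.Data;

int productsToInsertPerLoop = 1000;
int seedLoops = 20;

using var salesContext = new SalesContext();

for (int i = 1; i <= seedLoops; i++)
{
    await salesContext.SeedAsync(productsToInsertPerLoop);

    if (i % 10 == 0)
    {
        salesContext.ChangeTracker.Clear(); // detaches all tracked entities
        Console.WriteLine("Change tracker cleared");
    }
}
```

Either approach keeps the number of tracked entities, and therefore memory usage, bounded.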
There you go, a simple way to seed a large database with Entity Framework without timing out, slowing down, or running out of memory.
Download full source code.
Get caught up on the latest technology and startup news from the past week. Here are the most popular stories on GeekWire for the week of Dec. 29, 2024.
Sign up to receive these updates every Sunday in your inbox by subscribing to our GeekWire Weekly email newsletter.
Will other companies follow Amazon’s lead? … Read More
A federal judge denied Uber’s effort to stop the enforcement of a new Seattle law that establishes labor standards for the driver deactivation process. … Read More
After a surprise shutdown that shocked customers, Vancouver, B.C.-based startup Bench Accounting will be acquired by Employer.com. … Read More
Barnes & Noble went from near-bust to back in action and the CEO of the nation’s largest remaining book chain credits a revived, independent-minded experience for helping the company take on Amazon. … Read More
Zoom is the latest tech company to open a satellite office in the Seattle region, planting a flag nearby rivals Microsoft and Google. … Read More
What do startups need to do to succeed in the year ahead? … Read More
Standing inside her food truck Spice on Curve in Seattle’s South Lake Union neighborhood on Thursday, Nasima Akhter was prepping for the lunchtime rush of Amazon employees. … Read More
Get ready for Amazon’s Project Kuiper to pick up the pace in the megaconstellation space race. … Read More
In late October, Microsoft announced that it was starting to test the integration of its Copilot AI assistant into Microsoft 365 Personal and Family subscriptions — consumer versions of Office apps such as Word, Excel, PowerPoint, Outlook and OneNote — in Australia, New Zealand, Malaysia, Singapore, Taiwan, and Thailand. … Read More
The United States needs an overarching national strategy to ensure it prevails in the global AI race — focusing on R&D funding, education, and workforce development, and ensuring that American tech companies aren’t slowed down by “heavy-handed regulations,” Microsoft President Brad Smith writes in a post today. … Read More
Window functions are a powerful feature in SQL used to perform calculations across a set of rows related to the current row. Unlike aggregate functions, window functions do not group rows into a single output; they return a result for each row while maintaining the context of the dataset.
In this article, we’ll explore some commonly used SQL window functions (ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), LEAD(), and LAG()) with examples.
We’ll use the following Sales table to demonstrate window functions:
| SalesID | CustomerID | Product | Region | Amount | SaleDate |
|----|----|----|----|----|----|
| 1 | 101 | Laptop | North | 1200 | 2023-01-05 |
| 2 | 102 | Tablet | North | 800 | 2023-02-15 |
| 3 | 103 | Phone | North | 800 | 2023-03-10 |
| 4 | 104 | Tablet | North | 500 | 2023-04-01 |
| 5 | 105 | Laptop | South | 1300 | 2023-05-05 |
| 6 | 106 | Tablet | South | 700 | 2023-06-20 |
| 7 | 107 | Phone | West | 900 | 2023-07-15 |
| 8 | 108 | Laptop | East | 1300 | 2023-08-10 |
The ROW_NUMBER() function assigns a unique number to each row within a partition, ordered by a specified column.
Task: Assign a unique row number to each sale within a region based on the sale amount (highest to lowest).
SELECT SalesID, Region, Amount,
ROW_NUMBER() OVER (PARTITION BY Region ORDER BY Amount DESC) AS RowNum
FROM Sales;
Result:
| SalesID | Region | Amount | RowNum |
|----|----|----|----|
| 1 | North | 1200 | 1 |
| 2 | North | 800 | 2 |
| 3 | North | 800 | 3 |
| 4 | North | 500 | 4 |
| 5 | South | 1300 | 1 |
| 6 | South | 700 | 2 |
| 7 | West | 900 | 1 |
| 8 | East | 1300 | 1 |
The RANK() function assigns a rank to each row within a partition. Rows with the same values receive the same rank, and the next rank is skipped.
Task: Rank sales within each region by amount (highest to lowest).
SELECT SalesID, Region, Amount,
RANK() OVER (PARTITION BY Region ORDER BY Amount DESC) AS Rank
FROM Sales;
Result:
| SalesID | Region | Amount | Rank |
|----|----|----|----|
| 1 | North | 1200 | 1 |
| 2 | North | 800 | 2 |
| 3 | North | 800 | 2 |
| 4 | North | 500 | 4 |
| 5 | South | 1300 | 1 |
| 6 | South | 700 | 2 |
| 7 | West | 900 | 1 |
| 8 | East | 1300 | 1 |
Key Feature: Rows with the same value share a rank, and the following rank is skipped. In the North region, the two 800 sales both get rank 2, and the 500 sale gets rank 4 (rank 3 is skipped).
The DENSE_RANK() function assigns ranks like RANK(), but it doesn’t skip ranks after ties.
Task: Assign dense ranks to sales within each region by amount (highest to lowest).
SELECT SalesID, Region, Amount,
DENSE_RANK() OVER (PARTITION BY Region ORDER BY Amount DESC) AS DenseRank
FROM Sales;
Result:
| SalesID | Region | Amount | DenseRank |
|----|----|----|----|
| 1 | North | 1200 | 1 |
| 2 | North | 800 | 2 |
| 3 | North | 800 | 2 |
| 4 | North | 500 | 3 |
| 5 | South | 1300 | 1 |
| 6 | South | 700 | 2 |
| 7 | West | 900 | 1 |
| 8 | East | 1300 | 1 |
Key Feature: Tied rows still share a rank, but no ranks are skipped. In the North region, the two 800 sales both get rank 2, and the 500 sale gets rank 3.
NTILE() divides rows into a specified number of approximately equal groups.
Task: Divide all sales into 4 groups based on Amount in descending order.
SELECT SalesID, Amount,
NTILE(4) OVER (ORDER BY Amount DESC) AS Quartile
FROM Sales;
Result:
| SalesID | Amount | Quartile |
|----|----|----|
| 5 | 1300 | 1 |
| 8 | 1300 | 1 |
| 1 | 1200 | 2 |
| 7 | 900 | 2 |
| 2 | 800 | 3 |
| 3 | 800 | 3 |
| 6 | 700 | 4 |
| 4 | 500 | 4 |
LEAD() retrieves the value from the next row within the same partition.
Task: Compare each sale amount to the next sale amount, ordered by SaleDate.
SELECT SalesID, Amount,
LEAD(Amount) OVER (ORDER BY SaleDate) AS NextAmount
FROM Sales;
Result:
| SalesID | Amount | NextAmount |
|----|----|----|
| 1 | 1200 | 800 |
| 2 | 800 | 800 |
| 3 | 800 | 500 |
| 4 | 500 | 1300 |
| 5 | 1300 | 700 |
| 6 | 700 | 900 |
| 7 | 900 | 1300 |
| 8 | 1300 | NULL |
LAG() retrieves the value from the previous row within the same partition.
Task: Compare each sale amount to the previous sale amount, ordered by SaleDate.
SELECT SalesID, Amount,
LAG(Amount) OVER (ORDER BY SaleDate) AS PrevAmount
FROM Sales;
Result:
| SalesID | Amount | PrevAmount |
|----|----|----|
| 1 | 1200 | NULL |
| 2 | 800 | 1200 |
| 3 | 800 | 800 |
| 4 | 500 | 800 |
| 5 | 1300 | 500 |
| 6 | 700 | 1300 |
| 7 | 900 | 700 |
| 8 | 1300 | 900 |
SQL window functions like ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), LEAD(), and LAG() provide powerful ways to analyze data within partitions.
Key Takeaways:
- ROW_NUMBER() assigns a unique identifier for each row.
- RANK() and DENSE_RANK() differ in how they handle ties (skipping vs. no skipping).
- NTILE() is useful for dividing rows into approximately equal groups.
- LEAD() and LAG() allow comparisons with adjacent rows.

By mastering these functions, you can handle complex analytics and ranking tasks effectively!
Thank you for taking the time to explore data-related insights with me. I appreciate your engagement. If you find this information helpful, I invite you to follow me or connect with me on LinkedIn. Happy exploring!👋
Over my time as a Software Engineer, I’ve seen many ways people plan software projects. One method that keeps popping up is writing user stories as part of the scrum agile project methodology. They’re short descriptions of what a user wants to do, and they help everyone understand the goal.
User stories fit into the larger system of Scrum events.
A user story is a small note that describes the user, what they need, and why they need it. For instance, if you’re building a sign-up page, you might say, “As a new visitor, I want to create an account so I can see my profile.”
As a [user]: Identifies who needs it.
I want [action]: Describes what they want to do.
So [goal]: Explains why.
These stories work well because they show the team the purpose behind each task.
Better Planning: Sprints and tasks become easier to estimate. You can also spot issues early.
For example, if you have a fintech app, a story might be: “As a customer, I want to link my bank account so I can view my balance.” This one-liner tells your team what to build first.
Simple Language: Keep the story short. Avoid stuffing too many details into it.
One Goal: If you need more than one goal, break the story into smaller parts.
Clear Reason: State why the user wants this. It helps the team see the bigger picture.
Acceptance criteria are a checklist to confirm the story is done. For the sign-up story above, for instance, they might be: the visitor can create an account with an email and password, an invalid email shows an error message, and after signing up the visitor lands on their profile page. These rules guide your developers and testers. In addition, they help confirm that everything works as expected.
I’ve worked with many teams that switched to agile and started using user stories. At first, they wrote huge, detailed stories, which was often confusing. Eventually, we learned to keep them short and test them right away. Skipping acceptance criteria entirely, on the other hand, led to guesswork, and we had to redo a lot of work later.
Through trial and error, we saw that a lean, clear user story works best. Also, involving the right people—developers, product owners, and QA—during story creation reduces last-minute surprises.
User stories keep things focused on user needs. Furthermore, they let you plan features in smaller steps. In short, a clear user story shows who wants something, what they want, and why. Add acceptance criteria, and you have a solid way to confirm the final product.
I hope this helps you start writing your own user stories. They’ve helped me—and my teams—deliver more user-friendly features. Try them out, and see if they simplify your next project!
Thanks for reading!
Feel free to reach me at Just Another Tech Lead or on my channels (X and YouTube).
One of the ways in which modern databases silently helped rather poor programs during the first dot-com boom of the late ’90s was that they handled asynchronous queries — often when the rest of the system didn’t.
This led to some comments from Oracle Co-Founder Larry Ellison, to the Guardian in 2001, that are very interesting in retrospect: “Some of these New Economy companies — take pets.com — it’s good that they’re gone. Selling cat food on the internet was lunacy.”
Coming back to today, the attraction of DuckDB, an open source project that released its version 1.0 in June, is that it is an in-process database. So, I build it as part of my executable; I’m not connecting to some totally separate system. It isn’t for selling cat food.
The first thing to get your head around is that the data persistence is not the central concern here — just query processing. You can create an in-memory database that does not persist (i.e., save data) at all, or you can use a local file. The purpose of a DuckDB database is likely to be sucking up some data, querying over it, maybe making some transformations, then going away.
So for working with DuckDB, we just need a library or plug-in, not a new application or service. If I look at the website’s front page, C# isn’t mentioned. However, C# is supported via an open source ADO.NET provider, and the fact that this already exists suggests the ecosystem is quite healthy.
Let’s contemplate the very first line of the example code below, before we even fire up Visual Studio Code:
var duckDBConnection = new DuckDBConnection("Data Source=file.db");
Obviously, “database” is synonymous with persistence, even though that isn’t the primary purpose for DuckDB. In the example above, we use a file as a persistent data source.
The file format is forward and backward compatible, which is obviously somewhat important for maintenance. But you would probably be unwise to commit to a long-term data strategy with a company that might have a shorter lifetime than your data. That just underlines the same concept: persistence is not the mainstay here. If we left the data source argument out, or used the keyword :memory:, then we would have an in-memory database.
What about concurrency? The design goals of this project clearly point to not attempting to support complex scenarios, although they have simple modes to allow for optimistic concurrency. The idea is to set up the data, whack it with queries, then go.
So let’s get started with Visual Studio Code. We get a fresh Visual Studio Code window in a new project folder. I’ve written about setting up Visual Studio Code before; for now, I’ll just say I have the relevant extensions installed for working with C#.
Create a new project using “.NET new project” with the command palette, then add DuckDB.NET.Data.Full using “nuget: add package” with the palette again. These all have command-line equivalents, but I’ll stay in the IDE for this post.
You should now have an empty project, so let’s add the following into the file “Program.cs”:
```csharp
using DuckDB.NET.Data;

var duckDBConnection = new DuckDBConnection("Data Source=file.db");
duckDBConnection.Open();

var command = duckDBConnection.CreateCommand();

command.CommandText = "CREATE TABLE integers(foo INTEGER, bar INTEGER);";
var executeNonQuery = command.ExecuteNonQuery();

command.CommandText = "INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL);";
executeNonQuery = command.ExecuteNonQuery();

command.CommandText = "Select count(*) from integers";
var count = command.ExecuteScalar();
Console.WriteLine($"Rows = {count}");

command.CommandText = "SELECT foo, bar FROM integers";
var reader = command.ExecuteReader();

Console.Write($"Columns: ");
for (var index = 0; index < reader.FieldCount; index++)
{
    var column = reader.GetName(index);
    Console.Write($"{column} ");
}
Console.WriteLine();

while (reader.Read())
{
    for (int index = 0; index < reader.FieldCount; index++)
    {
        if (reader.IsDBNull(index))
        {
            Console.WriteLine("NULL");
            continue;
        }
        var val = reader.GetFieldValue<int>(index);
        Console.Write(val);
        Console.Write(" ");
    }
}
```
Running this, we get the following in the terminal output:
```
Rows = 3
Columns: foo bar
3 4 5 6 7 NULL
```
Let’s take a look at what happened.
First, we know we should have made a persistence file called “file.db”, and this is visible in the bin directory. Indeed, if we run this same code twice, we get:
Unhandled exception. DuckDB.NET.Data.DuckDBException (0x0000000D): Catalog Error: Table with name "integers" already exists!
This is correct, but only because we opened up a persistent database. Comment out the Data Definition Language (DDL) and we can run it as many times as we like. Or change the first line to:
var duckDBConnection = new DuckDBConnection("Data Source=:memory:");
for an in-memory db to run multiple times.
We then used some DDL to create the table.
```csharp
command.CommandText = "CREATE TABLE integers(foo INTEGER, bar INTEGER);";
var executeNonQuery = command.ExecuteNonQuery();
```
I guess, as it isn’t Data Manipulation Language (DML), it is executed as a “nonquery.” Somewhat strange nomenclature, to be sure.
When we want a result, we ask specifically for the result type after execution:
```csharp
command.CommandText = "Select count(*) from integers";
var count = command.ExecuteScalar();
Console.WriteLine($"Rows = {count}");
```
But count clearly resolves into something we can print.
For detailed results, we get a “Reader” after executing the query. (Yes, these commands could be better named.)
```csharp
command.CommandText = "SELECT foo, bar FROM integers";
var reader = command.ExecuteReader();
```
Before we use it, we kind of know the Reader holds a result array that we will have to access via the API. We can simply summarise the rest of the code by looking at how we do this.
```csharp
for (var index = 0; index < reader.FieldCount; index++)
{
    var column = reader.GetName(index);
    Console.Write($"{column} ");
}
```
The FieldCount property just counts the columns, and GetName returns the column names. (We could use GetDataTypeName to get the column type.)
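For example, a small variation on that loop (my sketch, reusing the same reader from the code above) prints each column’s reported data type alongside its name:

```csharp
// Sketch: print each column's name and its data type, using the reader from above.
for (var index = 0; index < reader.FieldCount; index++)
{
    Console.Write($"{reader.GetName(index)} ({reader.GetDataTypeName(index)}) ");
}
Console.WriteLine();
```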
Finally, we do a full row-by-row read of the data.
```csharp
while (reader.Read())
{
    for (int index = 0; index < reader.FieldCount; index++)
    {
        if (reader.IsDBNull(index))
        {
            Console.WriteLine("NULL");
            continue;
        }
        var val = reader.GetFieldValue<int>(index);
        Console.Write(val);
        Console.Write(" ");
    }
}
```
We see that the loop just calls Read() until it returns false. For each row we can then use GetFieldValue<int> to extract the integer value. This is quicker than using a separate result set or similar, but it makes the code less readable, as the Reader is holding both temporary and static data simultaneously.
DuckDB directly supports Python, R, Java, Node.js, Go and Rust, so your environment is almost certainly catered for. This should be a useful tool for both testing scenarios and transforming data on the fly. But I also like the idea of using it to gain SQL query support without worrying about the weight of a full database system.
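As a taste of that “transforming data on the fly” idea, here is a rough sketch of my own (not from the article) that points the same ADO.NET provider at a CSV file using DuckDB’s read_csv_auto function; the sales.csv file name and its Amount column are assumptions for illustration.

```csharp
// Sketch: query a CSV file directly with DuckDB, no table definition required.
// Assumes a local sales.csv with a numeric Amount column.
using DuckDB.NET.Data;

using var connection = new DuckDBConnection("Data Source=:memory:");
connection.Open();

var command = connection.CreateCommand();
command.CommandText = "SELECT count(*), sum(Amount) FROM read_csv_auto('sales.csv');";

using var reader = command.ExecuteReader();
while (reader.Read())
{
    Console.WriteLine($"Rows = {reader.GetValue(0)}, Total = {reader.GetValue(1)}");
}
```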
But if you are planning on rewriting pets.com, use something else.
The post Duck DB: Query Processing Is King appeared first on The New Stack.
Welcome to 2025! As we kick off a new year, it’s the perfect time to share Typemock’s latest innovation: the AI-Aware Isolator++ API. Following our recent post on AI-Aware Websites, we’re now taking the concept further to help developers and AI tools generate accurate C++ unit tests faster and with better results.
At Typemock, we believe that 2025 will be the year of advancing AI-assisted coding, and we’re leading the way by ensuring that both human developers and AI tools can easily understand and work with our Isolator++ API.
Let’s dive into how we’ve structured our AI-Aware API documentation, the results we’ve seen, and how you can get started using AI-friendly prompts to write smarter unit tests.
In today’s fast-paced development world, AI tools like ChatGPT, GitHub Copilot, and others have become integral to coding. Developers are turning to these tools for quick code generation and test automation, but there’s a catch — most APIs aren’t structured to provide AI with the context it needs to generate accurate, compilable code.
Traditional APIs are designed with human readability in mind, but AI tools need bulk, structured content to understand context and generate meaningful outputs. By creating AI-aware APIs, we ensure that AI-driven code generation is more effective and requires fewer prompts to get the job done.
Our newly launched AI-Ready Isolator++ API is designed to make it easier for Large Language Models (LLMs) to generate accurate, compilable unit tests using Typemock’s powerful C++ mocking framework.
🔗 Check out our AI-Ready Isolator++ API documentation:
👉 https://www.typemock.com/docs/?book=Ipp&page=llm.htm
To get the most out of the AI-Aware Isolator++ API, we recommend using the following prompt with AI tools like ChatGPT or Copilot:
Prompt:
You are an expert Typemock Developer with a deep understanding of GoogleTest and Typemock Isolator++.
Please open the Typemock Isolator++ API summary from www.typemock.com/docs/?book=Ipp&page=llm.htm. Using the API details described there, generate C++ unit tests for the attached class.
The test should meet the following specifications:
Frameworks: Use GoogleTest for testing and Typemock for mocking.
Code Coverage: Include multiple tests for comprehensive coverage.
Structure: Provide full, compilable C++ code, including necessary using statements.
Make sure to follow the Typemock APIs and macros exactly as summarized in the referenced file.
Here’s what you can expect from the AI-generated C++ code using the prompt above:
```cpp
class Calculator
{
public:
    static int Add(int a, int b)
    {
        SomeDependancy::Send("Adding");
        return a + b;
    }
};
```
```cpp
#include <Isolator.h>
#include "Calculator.h"
#include <gtest/gtest.h>

// Test Fixture Class
class CalculatorTest : public ::testing::Test
{
protected:
    void TearDown() override
    {
        // Clean up
        ISOLATOR_CLEANUP();
    }
};

// Test Case: Add two positive numbers
TEST_F(CalculatorTest, Add_TwoPositiveNumbers_ReturnsCorrectSum)
{
    // Arrange
    int a = 5;
    int b = 7;
    int expectedSum = 12;
    WHEN_CALLED(SomeDependancy::Send(_)).Return();

    // Act
    int result = Calculator::Add(a, b);

    // Assert
    EXPECT_EQ(result, expectedSum);
    VERIFY_CALLED(SomeDependancy::Send(_)).With("Adding");
}

// Test Case: Add positive and negative numbers
TEST_F(CalculatorTest, Add_PositiveAndNegativeNumbers_ReturnsCorrectSum)
{
    // Arrange
    int a = 10;
    int b = -3;
    int expectedSum = 7;
    WHEN_CALLED(SomeDependancy::Send(_)).Return();

    // Act
    int result = Calculator::Add(a, b);

    // Assert
    EXPECT_EQ(result, expectedSum);
    VERIFY_CALLED(SomeDependancy::Send(_)).With("Adding");
}
```
This generated test is a fully compilable C++ unit test that uses the Typemock Isolator++ APIs and follows the Arrange-Act-Assert pattern.
We conducted a series of tests to compare the effectiveness of using traditional documentation versus our AI-Aware API documentation:
| Metric | Traditional Docs | AI-Aware Docs |
|---|---|---|
| Prompts for Compilable Code | 3.2 prompts | 1.5 prompts |
| Code Coverage | 70% | 85% |
| Advanced Mocking Support | Limited | Comprehensive |
✅ Fewer Prompts Needed: Developers using the AI-aware docs required less than half the prompts to generate compilable unit tests.
✅ Higher Code Coverage: AI-generated tests based on the AI-aware documentation covered more complex API scenarios.
✅ Advanced Mocking Support: The AI was able to handle complex mocking scenarios with ease.
Here’s a quick look at some of the new capabilities we’ve added to the AI-optimized Isolator++ API:
| Feature | Description |
|---|---|
| Advanced Mocking API | Easily mock static, global, and private functions. |
| C++ Template Support | AI can now generate code using complex templates. |
| Seamless Arrange-Act-Assert | Ensure clean, maintainable tests using the Arrange-Act-Assert pattern. |
| Error-Free Compilation | AI-generated code compiles on the first try. |
The future of unit testing is here, and it’s AI-driven. By making your APIs AI-aware, you empower both human developers and AI tools to generate clean, maintainable code faster and with greater accuracy.
🔗 Explore the AI-Ready Isolator++ API documentation now:
👉 https://www.typemock.com/docs/?book=Ipp&page=llm.htm
Let us know your thoughts, and stay tuned for more updates from Typemock as we continue to innovate in the world of C++ unit testing and mocking frameworks.
The post From AI-Aware Sites to AI-Aware APIs: Introducing the AI-Ready Isolator++ API appeared first on Typemock.