Did the query you sent to the database take longer than it should to arrive? The source of this problem may be a ‘Cartesian Explosion’.
When developing a project, we may need different data in different areas of the project. We can meet our needs by creating database queries for these needs. Sometimes we create raw SQL queries, and sometimes we use various ORM tools such as Entity Framework and Dapper. These tools create queries according to our needs and increase our management ability on the database.
So, are the SQL queries produced by the ORM tools we use always the most ideal queries?
In some cases, not.
Let’s start with an example if you want.
For example, I have more than one blog page, and each of these blog pages contains posts. Similarly, these blog pages contain contributors. If we need to obtain data from these tables that are related to each other, the code example we can use is as follows.
var allInformations = blogContext.Blogs
.Include( b => b.Posts)
.Include( b => b.Contributors)
.ToList();
With this code, we can have all the information we need, right? If you want, let’s look at the SQL equivalent of this code:
SELECT
b.Id AS BlogId,
b.Name AS BlogName,
p.Id AS PostId,
p.BlogId AS PostBlogId,
p.Title AS PostTitle,
c.Id AS ContributorId,
c.Name AS ContributorName
FROM Blogs b
LEFT JOIN Posts p ON b.Id = p.BlogId
LEFT JOIN BlogContributors bc ON b.Id = bc.BlogId
LEFT JOIN Contributors c ON bc.ContributorId = c.Id;
When we examine the SQL query, the tables related to the Blog are connected to each other via the ‘Id’ value with LEFT JOIN. If you remember, I used the sentence ‘I have more than one blog page, and each of these blog pages contains posts. Similarly, these blog pages contain contributors.’ Let’s try to visualize this sentence and see what the table equivalents will be like:

Let’s assume that there are 3 different Blogs in our project.

The posts on the blogs.

Contributors on my blogs.

Since there is a many to many relationship between Blog and Contributor, the intermediate table
Our tables were like this. If you want, let’s look at our query again and combine our tables:
SELECT
b.Id AS BlogId,
b.Name AS BlogName,
p.Id AS PostId,
p.BlogId AS PostBlogId,
p.Title AS PostTitle,
c.Id AS ContributorId,
c.Name AS ContributorName
FROM Blogs b
LEFT JOIN Posts p ON b.Id = p.BlogId
LEFT JOIN BlogContributors bc ON b.Id = bc.BlogId
LEFT JOIN Contributors c ON bc.ContributorId = c.Id;
First, let’s perform the join operation between the Blogs table and the Post table:
LEFT JOIN Posts p ON b.Id = p.BlogId
We can say that the following table will be created with the left join here:

Now let’s continue with the other join process:
LEFT JOIN BlogContributors bc ON b.Id = bc.BlogId
With this left join operation, the table we have will look like this:

We have only one left left to do:
LEFT JOIN Contributors c ON bc.ContributorId = c.Id;
After our last left join operation, the table corresponding to our query will be as follows:

When we run our query, the data we will pull from the database to the application side will be as follows. Entity Framework will receive this information for us, interpret it within itself and present it to us in the most appropriate way. Everything is fine, but there is a situation like this:
When we examine the table, we notice something, repeating data..

However, we only wanted the blog information, the post information in the blogs and the contributor information in the blocks.
I wanted to, but I didn’t want to a few times?
To avoid this problem, we use the AsSplitQuery() method.
var allInformations = blogContext.Blogs
.Include(b => b.Posts)
.Include(b => b.Contributors)
.AsSplitQuery()
.ToList();
When we run this code, the SQL query that will go to the database will be as follows:
SELECT [b].[Id], [b].[Name]
FROM [Blogs] AS [b]
SELECT [p].[Id], [p].[BlogId], [p].[Title]
FROM [Posts] AS [p]
WHERE EXISTS (
SELECT 1
FROM [Blogs] AS [b]
WHERE [b].[Id] = [p].[BlogId]
)
SELECT [c].[Id], [c].[Name], [bc].[BlogId]
FROM [Contributors] AS [c]
INNER JOIN [BlogContributors] AS [bc] ON [c].[Id] = [bc].[ContributorId]
WHERE EXISTS (
SELECT 1
FROM [Blogs] AS [b]
WHERE [b].[Id] = [bc].[BlogId]
)
When this query is run (actually 3 separate queries), there will be no copying of the data obtained. The information sent from the database will be interpreted by the Entity Framework and presented to us in the most appropriate way.
What will the resulting data be like after using AsSplitQuery()?
Data obtained before using AsSplitQuery():

Data obtained after using AsSplitQuery():

I have shown the differences in the obtained data in the same table to represent them. Since more than one SQL query will be executed, more than one table information will be sent to the application side.
As we can see, the dark colored data was not sent from the database to the application side because it was a copy. For this reason, our processing speed increased. After all, we got rid of the unnecessary data carrying burden.

I kept the number of data in each table low to be able to represent it in the table. The functionality of the AsSplitQuery() method is directly proportional to the amount of data in the tables and tables that are worked with. The more data, the more performance increase.
Of course, just as there are places where every method should be used, there are also places where it should not be used.
When to Use AsSplitQuery()?
- Cartesian Explosion) Risk:
When many related entities are pulled and there is a risk of Cartesian Explosion in the database, it may be more efficient to use AsSplitQuery().
In this case, separate queries are sent for related data and performance can be improved when pulling large datasets.
2. Big Datasets:
If the number of related entities is large and large data sets are being pulled, using AsSplitQuery() can lighten the load on the database server.
This can improve query performance and reduce memory usage.
3. Database Optimization:

In some cases, database optimization and performance testing may determine that using AsSplitQuery() yields better results. In these cases, database administrators and developers may prefer to use AsSplitQuery().
When Not to Use AsSplitQuery()?
- Small Datasets:
If the amount of data to be retrieved is small and the number of related entities is small, using AsSplitQuery() may be unnecessary.
In this case, it may be simpler and faster to retrieve the data with a single query.
2. Number of Database Calls:
When AsSplitQuery() is used, multiple queries are sent to the database, which increases the total number of database calls.
Excessive database calls can cause performance issues and increase network traffic.
3. Data Integrity:
If data integrity in the database needs to be maintained and related data needs to be retrieved with a single query, using AsSplitQuery() may not be appropriate.
Retrieving related data with a single query can better ensure data consistency.
In this article, I wanted to write down what I learned about the Cartesian Explosion problem and the AsSplitQuery() method that solves this problem. Since I just learned it, there may be mistakes, please excuse me :)
I wish you a good reading…
Comments 0
You cannot comment because you are not logged in.
Log in to comment.