LINQ to SQL: Single Data Context or Multiple Data Contexts?

One of the biggest questions that often arises when a LINQ to SQL newbie begins to think about how he/she is going to architect the data layer is whether to split the data model into multiple data contexts or put the entire model into a single context. This question is usually due to concerns about performance and/or difficulty of maintaining the model. The purpose of this article is to debunk those myths and explain why using at most one data context per strongly connected group of tables is not only the best solution, it is the only reasonable solution.

Myths About Large Data Contexts

Myth #1: Large data contexts have low performance

The first time a LINQ to SQL data context is created, the data model is loaded into memory: table mappings are created, foreign key relationships are analyzed, etc. This seems like a lot of overhead…and it is! But it only happens once in an application’s lifecycle! After the data model is loaded for the first time, it is cached in a static variable in the application’s memory. Subsequent instantiations of the data context use this cached variable and load almost instantly. To prove this, I installed two of Microsoft’s sample databases: Northwind (13 tables) and AdventureWorks (70 tables). I then ran a program that instantiated each data context 100,000 times in order to magnify any performance difference between the two contexts and recorded the total time. If the size of the data context affects performance, then the AdventureWorks context should have a higher time since it has 5x more tables than Northwind. Here is the code:

class Program
{
    const int MaxIterations = 100000;

    static void Main(string[] args)
    {
        DateTime start = DateTime.Now;

        for (int i = 0; i < MaxIterations; i++)
        {
            NorthwindDataContext nwdc = new NorthwindDataContext();
        }

        Console.WriteLine("Northwind: " + DateTime.Now.Subtract(start).TotalSeconds + " seconds");

        start = DateTime.Now;

        for (int i = 0; i < MaxIterations; i++)
        {
            AdventureWorksDataContext awdc = new AdventureWorksDataContext();
        }

        Console.WriteLine("AdventureWorks: " + DateTime.Now.Subtract(start).TotalSeconds + " seconds");
    }
}

…and here are the results:

Northwind: 3.025173 seconds
AdventureWorks: 3.0411739 seconds

After 100,000 iterations, they only had a total difference of 16 milliseconds!

Myth #2: Large data contexts are more difficult to maintain

OK, so I can’t disprove this one objectively. Everyone will have a different opinion about this. However, I have worked with LINQ to SQL on many different projects and have had the opportunity to manage small data models, large data models in one data context, and large data models split into multiple contexts. I’ve seen it all, folks…and here’s what I learned:

  1. The designer does not suffer from performance problems as the data context gets more complex.
  2. Although it is slightly more inconvenient to scroll around the designer to edit a larger model, the alternative (splitting it into multiple contexts) is more than just inconvenient: it causes extreme code complications and introduces additional load on the database (these will be discussed in more detail later).
  3. While large data contexts do result in more intellisense options, I have never found this to be a problem in practice. If you have a database with a large and complex schema, you should be splitting logical sections of the database into separate schemas in order to avoid name collisions and to make it easier to understand. If you are doing this, then you can name your LINQ to SQL entities like this: Schema_TableName (example: Person_PersonPerson_AddressOrder_OrderOrder_OrderDetail, etc.). Although this deviates from standard naming conventions a little bit, it is extremely practical, and it makes navigating the intellisense list easy and convenient.

Myth #3: Using multiple data contexts improves modularity

This is not actually a myth since multiple contexts improve modularity by definition, but I am listing it as a myth because 99% of the applications out there have absolutely nothing to gain from this. The only reason this argument would be valid is if you are selling a product that does not always ship with all of its features. For example, if you are selling a blogging application and your customers have to pay extra for the “reader comments” module, then you might benefit from putting the comment-related entities in a separate data context and not shipping it to the people who don’t pay for it. However, you will suffer from increased code complexity, so make sure you think this through.

If your application does not fit this scenario, then what do you possibly stand to gain from modularity? If you truly need this modularity, then the tables you’re worried about probably shouldn’t be sharing the same database to begin with.

Problems Caused by Multiple Data Contexts

Problem #1: No cross-context joins

This is easily the biggest problem with using multiple data contexts. Take the following schema as an example:

User.User(Id, Username, Password)
User.Profile(UserId, FavoriteMusic, FavoriteMovies, FavoriteGames, About)
User.Friend(UserId, FriendId)
Blog.Entry(Id, UserId, Title, Content)
Blog.Comment(Id, BlogId, UserId, Text)

Imagine that you’ve split your data contexts by schema, so you have a UserDataContext that contains the User.User, User.Profile, and User.Friend tables, and a BlogDataContext that contains the Blog.Entry and Blog.Comment tables. Let’s try to print a list of comments along with the username of the commentors. Pretty simple, right? Not with multiple data contexts. Here is the code to do so:

// Get the ID of the blog we're looking at
int blogId = int.Parse(args[0]);

// Get a list of comments to display.
// Notice the call to ToList(). I added this to prevent the query from running
// multiple times later.
var comments;
using (BlogDataContext blogContext = new BlogDataContext()) {
    comments = blogContext.Comments.Where(comment => comment.BlogId == blogId).ToList();
}

// We need to display the USERNAME of the commentors, which is not accessible
// by the BlogDataContext, so now we need to get a list of users from the UserDataContext.
// Notice the use of ToList() again. I had to do this because otherwise the
// UserDataContext would try to compile my comments query as part of a dynamic
// SQL query and an exception would result since it is part of another data context.
var users;
using (UserDataContext userContext = new UserDataContext()) {
    users = userContext.Users.Where(user => comments.Select(comment => comment.UserId).ToList().Contains(user.Id)).ToList();
}

// Now I have a list of comments and a list of users. I need to pull them together
// into one list so I can print them. Notice that the users list will be iterated once
// for each comment. This algorithm is O(comments*users).
var commentsToPrint = comments.Select(comment => new { Username = users.Single(user => user.UserId == comment.UserId), Text = comment.Text);

// Print each comment
foreach (var comment in commentsToPrint) {
    Console.WriteLine(comment.Username + " said: " + comment.Text);
}

You should be completely turned off by that code. It is extremely verbose, difficult to read…and just completely ridiculous. Now take a look at the single context version:

// Get the ID of the blog we're looking at
int blogId = int.Parse(args[0]);

using (MyDataContext myContext = new MyDataContext()) {
    // Get a list of comments to display.
    // Notice the Schema_TableName naming convention.
    var comments = from comment in myContext.Blog_Comments
                   where comment.BlogId == blogId
                   select new {
                       Username = comment.User_User.Username,
                       Text = comment.Text
                   };

    // Print the comments. This algorithm is O(comments).
    foreach (var comment in comments) {
        Console.WriteLine(comment.Username + " said: " + comment.Text);
    }
}

This is much more elegant and very easy to follow even if you have little understanding of LINQ to SQL.

Problem #2: Increased database load

Consider the examples from the previous section. The multiple context example uses two nearly simultaneous database connections, two queries, and multiple iterations through the “users” result set. The single context example uses one connection, one query, and one iteration through the results to print them. Clearly, using multiple contexts results in a higher load on the database and on the application server.

Conclusion

The LINQ to SQL data context was designed to be a one-to-one mapping with your database. Using it in any other way is counter to its design and will result in architectural problems. Your application might benefit from splitting your database into multiple contexts as long as the tables represented by those contexts are completely disconnected from each other.