EF Core bulk sync for MySQL

March 8th, 2024 · MySQL · EF Core · .NET

Using Entity Framework Core to load large amounts of data into a database is not very efficient. The bulk operations from the EFCore.BulkExtensions NuGet package can help in such scenarios. Unfortunately, not all of them work (yet) for all database types.
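If you want to follow along, the package is installed like any other NuGet dependency:

dotnet add package EFCore.BulkExtensions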

I wanted to use the synchronization operation (BulkInsertOrUpdateOrDelete) with a MySQL database. To make it work, I first had to enable local infile support, i.e., the option to load data from a CSV file on the client.

Before I did that, the synchronization operation failed with the following error:

System.NotSupportedException: 'To use MySqlBulkLoader.Local=true, set AllowLoadLocalInfile=true in the connection string. See https://fl.vu/mysql-load-data'
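The fix is exactly what the message suggests. Assuming the Pomelo provider, a minimal sketch of the configuration with the option enabled (the server, database, and credentials are placeholders matching the Docker setup shown below):

var connectionString =
    "Server=localhost;Port=3306;Database=blogsample;User=root;"
    + "Password=root;AllowLoadLocalInfile=true";

optionsBuilder.UseMySql(connectionString, ServerVersion.AutoDetect(connectionString));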

After adding this option to my connection string, I got another error:

MySqlConnector.MySqlException: 'Loading local data is disabled; this must be enabled on both the client and server sides'

The server side is controlled by the local_infile global variable, which is disabled by default. You can check its value with the following command:

SHOW VARIABLES LIKE 'local_infile';

And you can enable it with:

SET GLOBAL local_infile = ON;
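Keep in mind that SET GLOBAL only lasts until the server restarts. On MySQL 8 you can make the change survive restarts with SET PERSIST instead:

SET PERSIST local_infile = ON;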

You can also enable it with the --local-infile=1 command-line flag, which can be added to your docker-compose.yml file if you're using it to start the server locally:

version: "3"
services:
  mysql:
    image: mysql:8.0.35
    command: --local-infile=1
    container_name: "mysql"
    ports:
      - 3306:3306
    environment:
      MYSQL_ROOT_PASSWORD: root
      MYSQL_DATABASE: blogsample
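If you run MySQL outside of Docker, the same setting can also go into the server configuration file; a minimal sketch for my.cnf:

[mysqld]
local_infile=1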

With this variable properly configured, I tried my BulkInsertOrUpdateOrDelete call again:

await dbContext.BulkInsertOrUpdateOrDeleteAsync(achievements);

It failed with another error:

System.NotImplementedException: 'For MySql method InsertOrUpdateOrDelete is not yet supported. Use combination of InsertOrUpdate with Read and Delete'

This wasn't something I could fix with configuration. I'd have to change the code. There was a bit more information in the documentation, but unfortunately there was no sample code:

BulkInsertOrUpdateOrDelete effectively synchronizes table rows with input data. Those in Db that are not found in the list will be deleted. Partial Sync can be done on table subset using expression set on config with method: bulkConfig.SetSynchronizeFilter<Item>(a => a.Quantity > 0);
Not supported for SQLite (Lite has only UPSERT statement) nor currently for PostgreSQL. Way to achieve there sync functionality is to Select or BulkRead existing data from DB, split list into sublists and call separately Bulk methods for BulkInsertOrUpdate and Delete.

Even after checking the BulkRead examples, I still couldn't figure out why I'd need to call it to work around the unimplemented method. I'd have to do a SELECT anyway to get the keys for the BulkRead call, and having the keys was all I needed for BulkDelete:

// Capture the keys (Name) of all rows currently in the table;
// the remaining constructor arguments are just placeholders.
var existingAchievements = await dbContext
    .Achievements.Select(
        achievement => new Achievement(
            achievement.Name,
            string.Empty,
            0,
            DateTime.Now
        )
    )
    .AsNoTracking()
    .ToListAsync();

await dbContext.BulkInsertOrUpdateAsync(achievements);

// Rows that existed before but are missing from the new dataset
// are the ones to delete.
var achievementsToDelete = existingAchievements
    .ExceptBy(
        achievements.Select(achievement => achievement.Name),
        achievement => achievement.Name
    )
    .ToList();
await dbContext.BulkDeleteAsync(achievementsToDelete);
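One caveat compared to the unsupported single call: the insert/update and the delete are now two separate operations, so the table can briefly be in a half-synchronized state. If that matters to you, both calls can be wrapped in an explicit transaction; as far as I know, EFCore.BulkExtensions enlists in the context's current transaction (a sketch):

await using var transaction = await dbContext.Database.BeginTransactionAsync();

// ... the SELECT / BulkInsertOrUpdateAsync / BulkDeleteAsync sequence from above ...

await transaction.CommitAsync();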

Alternatively, I could do the InsertOrUpdate first and then use a large NOT IN query to select only the keys that weren't part of this call and had to be deleted. EF Core translates the Contains call over the entity list into a comparison on the primary key, which produces exactly that NOT IN:

await dbContext.BulkInsertOrUpdateAsync(achievements);

// Query key-only instances for every row whose key was not part of
// the new dataset; these are the rows to delete.
var achievementsToDelete = await dbContext
    .Achievements.Where(achievement => !achievements.Contains(achievement))
    .Select(achievement => new Achievement(
        achievement.Name,
        string.Empty,
        0,
        DateTime.Now
    ))
    .AsNoTracking()
    .ToListAsync();
await dbContext.BulkDeleteAsync(achievementsToDelete);

For my sample case, the performance of both approaches was similar. This could vary depending on the dataset size and the number of records that would typically have to be deleted.

Full source code for a sample application is available in my GitHub repository. It contains everything you need to run the synchronization yourself:

  • docker-compose.yml to run correctly configured MySQL in Docker,
  • SQL DDL script to create the destination database table,
  • sample dataset and code to randomly mutate it before each synchronization run,
  • all 3 synchronization approaches described here (1 not supported and 2 working).

Full logging to the console is enabled for MySQL, so you can see the exact SQL commands the extensions execute.

Bulk operations can significantly speed up loading massive amounts of data into a database. If you're using Entity Framework Core, you should try EFCore.BulkExtensions. Not all operations work with all database types, but even so, you should still be able to get it working faster than if you had to write all the SQL by hand yourself.
