Performance comparison of Regex in .NET
If you've used regular expressions a lot in .NET, you most likely already know that the Compiled option can noticeably improve run time performance at the cost of a higher initialization time. What you might not know is that since .NET 7 you can use a source generator instead to avoid that initialization cost.
The simplest way to use a regular expression for validating an input is by calling a static method on the Regex class:
Regex.IsMatch(input, pattern);
Since regular expressions are parsed and transformed into an optimized tree format that can be more efficiently interpreted, one would think that creating and reusing a static instance instead would benefit performance:
private static readonly Regex regex = new(pattern);
regex.IsMatch(input);
However, the regular expression engine caches the transformed regular expressions when static methods are used, which significantly reduces the performance difference between the two approaches.
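To make the caching behavior more concrete, here is a minimal sketch. It relies on the static Regex.CacheSize property, which controls how many patterns the static methods keep cached (15 by default). The pattern and the inputs are hypothetical placeholders chosen for illustration; they are not taken from the benchmark.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pattern and inputs, used only for illustration.
const string pattern = @"^\d{5}$";
string[] inputs = { "12345", "1234a", "98765" };

// Maximum number of patterns the static Regex methods keep cached.
Console.WriteLine($"Cache size: {Regex.CacheSize}");

foreach (string input in inputs)
{
    // The first call parses the pattern and caches the result;
    // later calls with the same pattern and options reuse the cached entry.
    Console.WriteLine($"{input}: {Regex.IsMatch(input, pattern)}");
}
```

If an application cycles through more distinct patterns than the cache can hold, entries get evicted and have to be parsed again, which is one situation where reusable instances can still pay off.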
To achieve even better performance, you need to use compiled regular expressions:
private static readonly Regex compiledRegex = new(pattern, RegexOptions.Compiled);
compiledRegex.IsMatch(input);
This instructs the regular expression engine to replace the tree representations with their compiled IL (intermediate language) counterparts. This significantly improves the run time performance of regular expression matching, but requires longer initialization time before the first use to actually emit the IL instructions.
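The initialization cost can be observed with a rough sketch like the one below, which times construction plus the first match for both options. The pattern, the input, and the TimeFirstUse helper are hypothetical and exist only for this illustration. A single Stopwatch measurement is noisy and depends on the order of the calls, so treat the numbers as a rough indication rather than a benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical pattern and input, used only for illustration.
const string pattern = @"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$";
const string input = "user@example.com";

Console.WriteLine($"Interpreted: {TimeFirstUse(pattern, input, RegexOptions.None).TotalMilliseconds} ms");
Console.WriteLine($"Compiled:    {TimeFirstUse(pattern, input, RegexOptions.Compiled).TotalMilliseconds} ms");

// Measures constructing the Regex and running the first match with the given options.
static TimeSpan TimeFirstUse(string pattern, string input, RegexOptions options)
{
    var stopwatch = Stopwatch.StartNew();
    var regex = new Regex(pattern, options);
    regex.IsMatch(input);
    stopwatch.Stop();
    return stopwatch.Elapsed;
}
```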
Since .NET 7, there is an even better alternative available: a source generator that generates the code at compile time instead of run time. There's a preconfigured code analyzer to inform you about this new feature:
SYSLIB1045: Use GeneratedRegexAttribute to generate the regular expression implementation at compile-time.
You can learn more about the regular expression source generator here. It's pretty easy to switch to using it:
Make your class partial, change your Regex field to a partial method, and add the GeneratedRegex attribute to it:
[GeneratedRegex(pattern)]
private static partial Regex SourceGeneratedRegex();
SourceGeneratedRegex().IsMatch(input);
In addition to avoiding the initialization cost of compiled regular expressions before the first use, this also slightly improves their performance on subsequent uses.
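Put together, a complete program using the source generator might look like the following sketch. The InputValidator class, the FiveDigits method, and the pattern are hypothetical names chosen for illustration. It assumes a project targeting .NET 7 or later, and the pattern passed to the attribute has to be a compile-time constant.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(InputValidator.IsValid("12345")); // True
Console.WriteLine(InputValidator.IsValid("1234a")); // False

// The containing class must be partial so that the generator can supply
// the implementation of the partial method.
public static partial class InputValidator
{
    // Hypothetical pattern: exactly five digits.
    [GeneratedRegex(@"^\d{5}$")]
    private static partial Regex FiveDigits();

    public static bool IsValid(string input) => FiveDigits().IsMatch(input);
}
```

Calling the partial method repeatedly is fine: the generated implementation hands back a cached Regex instance rather than building a new one on every call.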
Here's a performance comparison of all 4 approaches I described, using BenchmarkDotNet (the initialization time before the first call is not measured):
Method | Mean | Error | StdDev |
---|---|---|---|
ValidateUsingSingleUseRegex | 71.96 ns | 0.301 ns | 0.267 ns |
ValidateUsingStaticRegex | 68.44 ns | 0.397 ns | 0.331 ns |
ValidateUsingStaticCompiledRegex | 24.49 ns | 0.070 ns | 0.066 ns |
ValidateUsingSourceGeneratedRegex | 20.34 ns | 0.048 ns | 0.042 ns |
If you want to run the benchmark yourself, you can find the full source code in my GitHub repository.
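For orientation, a benchmark along these lines could be sketched as follows. This is not the code from the repository: the method names come from the table above, while the RegexBenchmarks class name, the pattern, and the input are assumptions made for this illustration. It requires the BenchmarkDotNet NuGet package and should be run in a Release build.

```csharp
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<RegexBenchmarks>();

public partial class RegexBenchmarks
{
    // Hypothetical pattern and input; the original benchmark may use different ones.
    private const string Pattern = @"^\d{5}$";
    private const string Input = "12345";

    // Static instances are created once, outside the measured methods.
    private static readonly Regex StaticRegex = new(Pattern);
    private static readonly Regex CompiledRegex = new(Pattern, RegexOptions.Compiled);

    [GeneratedRegex(Pattern)]
    private static partial Regex SourceGeneratedRegex();

    [Benchmark]
    public bool ValidateUsingSingleUseRegex() => Regex.IsMatch(Input, Pattern);

    [Benchmark]
    public bool ValidateUsingStaticRegex() => StaticRegex.IsMatch(Input);

    [Benchmark]
    public bool ValidateUsingStaticCompiledRegex() => CompiledRegex.IsMatch(Input);

    [Benchmark]
    public bool ValidateUsingSourceGeneratedRegex() => SourceGeneratedRegex().IsMatch(Input);
}
```

Absolute timings will differ with the pattern, the input, and the hardware, so only the relative ordering of the four methods is meaningful to compare against the table.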
The regular expression source generator is a great alternative to compiled regular expressions. If the regular expressions you're using are known at compile time, I can't think of a reason not to switch to it. You get even better performance without any extra initialization time before the first use.