Saturday, August 21, 2010

Cache in Regular Expressions

The chart illustrates the difference in performance between two examples: the first repeatedly instantiates a Regex object with the same regular expression pattern in order to call an instance matching method, and the second calls a static matching method.

// BAD: builds a new Regex, and re-parses the same pattern, on every call
Regex rgx = new Regex(pattern);
return rgx.IsMatch(input);

// GOOD: Take advantage of regex cache
return Regex.IsMatch(input, pattern);

The first example takes about 15 times as long to execute as the second. The difference in performance is due to the caching of regular expressions used in static method calls. Whereas the first example instantiates a regular expression object and converts the regular expression into opcodes in each of fourteen method calls (one for each element in the string array), the second example performs this conversion only once, on the first method call. Subsequently, it retrieves the interpreted regular expression from the cache each time the expression is needed.
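One way to observe the difference yourself is a rough timing harness like the sketch below. The pattern, the input strings, the iteration count, and the class and method names are all assumptions made for illustration; they are not the data behind the figures quoted above. Absolute timings and the exact ratio will depend on the runtime version and the pattern used.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexCacheDemo
{
    // Hypothetical pattern (finds a doubled word) and inputs, for illustration only.
    private const string Pattern = @"\b(\w+)\s+\1\b";
    private static readonly string[] Inputs =
    {
        "the the quick brown fox",
        "no repeated words here",
        "hello hello world"
    };

    // Instance approach: a new Regex is constructed, and the pattern parsed, on every call.
    public static bool IsMatchInstance(string input)
    {
        Regex rgx = new Regex(Pattern);
        return rgx.IsMatch(input);
    }

    // Static approach: after the first call the parsed pattern comes from the regex cache.
    public static bool IsMatchStatic(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    // Runs the matcher over every input, `iterations` times, and returns elapsed milliseconds.
    public static long TimeMs(Func<string, bool> matcher, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            foreach (string input in Inputs)
            {
                matcher(input);
            }
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 10000; // arbitrary; raise it if the timings are too small to compare
        Console.WriteLine("Instance: {0} ms", TimeMs(IsMatchInstance, iterations));
        Console.WriteLine("Static:   {0} ms", TimeMs(IsMatchStatic, iterations));
    }
}
```

Both methods return the same result for any given input; only the cost of getting there differs.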

Only regular expressions used in static method calls are cached; regular expressions used in instance methods are not. The size of the cache is defined by the Regex.CacheSize property. By default, 15 regular expressions are cached, although this value can be modified if necessary. If the number of regular expressions exceeds the cache size, the regular expression engine discards the least recently used regular expression to cache the newest one.
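If an application uses more distinct patterns in static calls than the cache holds, the cache size can be raised once at startup. The sketch below is a minimal illustration; the helper name RaiseCacheSize and the value 50 are assumptions, not recommendations.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCacheSizeDemo
{
    // Hypothetical helper: enlarges the static-call regex cache if newSize is
    // bigger than the current size, and returns the size it had beforehand.
    public static int RaiseCacheSize(int newSize)
    {
        int previous = Regex.CacheSize;
        if (newSize > previous)
        {
            Regex.CacheSize = newSize;
        }
        return previous;
    }

    public static void Main()
    {
        int before = RaiseCacheSize(50); // 50 is an arbitrary illustrative value
        Console.WriteLine("Cache size before: {0}, after: {1}", before, Regex.CacheSize);
    }
}
```

The helper only ever grows the cache, so calling it from several places cannot shrink a size that another part of the application has already chosen.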

Note that there is a breaking change in regular expression caching between version 1.1 and subsequent versions of the .NET Framework. In version 1.1, both instance and static regular expressions are cached; in version 2.0 and all subsequent versions, only regular expressions used in static method calls are cached.

1 comment:

  1. While we are generally advised to avoid static variables/methods as much as possible, this is a classic example of a case for using static methods.