Efficient string matching. Which is better?


#1

There are a ton of ways to search for strings in a trigger, but which one is better? On heavily loaded trigger CPUs, the more efficient, the better. But you also want to keep your code readable for others. I ran a 30-second test on a lightly loaded ExtraHop VM to get an idea of which searches are easiest on the processor.

I ran 4 different types of string searches:

  • String.indexOf(string)
  • String.search(string)
  • Regex.test(string)
  • String.match(regex)

For this test, I counted the number of times I was able to execute each function over 30 seconds. The idea is that the more executions, the more efficient the search function.

var string = "This text";
if ( string.indexOf("text") > -1) {
  // do something
}

if ( string.search("text") > -1) {
  // do something
}

var regex = /text/;
if ( regex.test(string) ) {
  // do something
}

// match() returns null when nothing matches, so check for null instead of reading .length
if (string.match(regex) !== null) {
  // do something
}

After the 30 seconds, the results show that Regex.test(string) was the best performing function, with String.indexOf(string) following not far behind.

The execution counts are not absolute. In other words, your mileage will vary depending on the size of your box. What IS important is the relative value: regex.test() is 2 times faster than string.search(), and in these tests regex.test() is the fastest.
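
For anyone who wants to repeat the comparison, here is a minimal sketch of the counting approach described above. The helper name countExecutions and the 200 ms window are my own assumptions for illustration (the original test ran for 30 seconds on an ExtraHop VM), and this version is meant for Node.js or a browser console rather than a trigger. Calling Date.now() on every pass adds the same overhead to each function, so only the relative counts mean anything.

```javascript
// Hypothetical helper: count how many times fn() completes within durationMs.
function countExecutions(fn, durationMs) {
  var count = 0;
  var deadline = Date.now() + durationMs;
  while (Date.now() < deadline) {
    fn();
    count++;
  }
  return count;
}

var string = "This text";
var regex = /text/;

// The four search styles from the post, each wrapped so it returns a boolean.
var candidates = {
  indexOf: function () { return string.indexOf("text") > -1; },
  search: function () { return string.search("text") > -1; },
  test: function () { return regex.test(string); },
  match: function () { return string.match(regex) !== null; }
};

// 200 ms per function keeps the sketch quick; raise it for steadier numbers.
var results = {};
Object.keys(candidates).forEach(function (name) {
  results[name] = countExecutions(candidates[name], 200);
});

console.log(results);
```

Compare the counts against each other rather than reading them as absolute throughput.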


#2

Great stuff @apax!

I’ve run similar tests on jsperf.com in the past. Unfortunately, as of the time of this post, jsperf has been taken down due to spam overload and is being rewritten from the ground up by the community (here). That’s great, but for now you can do a lot of this type of testing right in the browser by following these instructions from Google, using the same V8 engine we run on ExtraHop!

Also, for those who are ready to start building regular expressions for regex.test() instead of string.indexOf(): I often use RegExr.com as a place to build and test my regular expressions. It’s also a great way to learn, as the interface is very informative.

Happy string matching! :slight_smile:


#3

Testing is always the way to answer these sorts of questions, but you need to make sure the tests accurately reflect the situation under which they’ll be run. So, playing devil’s advocate here.

There’s a difference between this:

  • instantiate a test
  • a million times, do
    • do the test

and

  • a million times, do
    • instantiate a test
    • do the test

What if, under the hood, any of your methods have internal caching? For example, many regular expression engines have an expensive ‘compile’ phase, but subsequent uses of the same regex skip that and pull the compiled version from cache. In the case of ‘instantiate, then do this test a million times’ you get the benefit of this caching, but in ‘a million times, instantiate and do this test’ you start afresh, with no shared state in between. It’s the latter version that we’ll see in the product, where your triggers run anew on each flow.
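
The two loop shapes above can be sketched roughly as follows. The function names countHoisted and countFresh and the 200 ms window are hypothetical, chosen only for illustration. One caveat: building a new RegExp inside the loop is only an approximation of a fresh trigger run, because the engine may still cache compiled patterns internally within a single process.

```javascript
var text = "This text";
var source = "text"; // pattern source, compiled with new RegExp() below

// Shape 1: instantiate the regex once, then run the test repeatedly.
function countHoisted(durationMs) {
  var regex = new RegExp(source);
  var count = 0;
  var deadline = Date.now() + durationMs;
  while (Date.now() < deadline) {
    regex.test(text);
    count++;
  }
  return count;
}

// Shape 2: instantiate a new regex on every pass, then run the test.
function countFresh(durationMs) {
  var count = 0;
  var deadline = Date.now() + durationMs;
  while (Date.now() < deadline) {
    var regex = new RegExp(source);
    regex.test(text);
    count++;
  }
  return count;
}

var hoistedCount = countHoisted(200);
var freshCount = countFresh(200);
console.log({ hoisted: hoistedCount, fresh: freshCount });
```

If the two counts come out close, construction cost is probably not dominating for this pattern; a large gap would suggest the hoisted version is flattering the regex methods.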

Could be that the results are the exact same when run in the opposite way, but I’d love to see what you get when you flip it on its head!


#4

This is an excellent point that I wasn’t considering, thanks!


#5

Agreed. A great point. The expense of compiling a regular expression the first time is being missed in these tests. Perhaps I should call Date.now() before the pattern compile and again after the match, and take the difference.
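
A rough sketch of that timing idea is below. Date.now() only has millisecond resolution, so a single compile and match would usually measure as 0; this version repeats the pair and reports the total and the per-iteration average. The function name timeCompileAndMatch and the iteration count are assumptions for illustration, not something taken from the tests above.

```javascript
// Hypothetical helper: time `iterations` rounds of compile + match.
function timeCompileAndMatch(source, text, iterations) {
  var matches = 0;
  var start = Date.now(); // taken before the first pattern compile
  for (var i = 0; i < iterations; i++) {
    var regex = new RegExp(source); // compile inside the timed region
    if (regex.test(text)) {
      matches++;
    }
  }
  var elapsedMs = Date.now() - start; // taken after the last match
  return {
    elapsedMs: elapsedMs,
    matches: matches,
    perIterationMs: elapsedMs / iterations
  };
}

var timing = timeCompileAndMatch("text", "This text", 100000);
console.log(timing);
```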