DISQUS

Caffeinated Simpleton: Exploring memcmp

  • Michael Hanson · 7 months ago
    Interesting article. Your approach has poor runtime if your haystack contains many instances of a "needle prefix" -- that is, regions where some X bytes of the needle are present in the haystack but the pattern then fails to match. You'll be forced to examine those same bytes over and over again in order to get through the prefix, looking for the correct starting point.

    Take a look at search algorithms based on incremental approaches, for example the Boyer-Moore algorithm, for a much more efficient solution.
  • Michael Hanson · 7 months ago
    I got curious and dug up an implementation of Boyer-Moore-Horspool (which has a worse worst case than Boyer-Moore, but a comparable average case) and ran it on a theoretical worst-case file. The implementation I used was at http://www.dcc.uchile.cl/~rbaeza/handbook/algs/... but there are others (and better documented) elsewhere, I'm sure.

    Running on a 328 MB haystack, 16 KB needle, using mmap for I/O, with a match at the last possible spot, on a MB Pro, the BMH algorithm found the match in 0.625 sec of real time. The naive memcmp algorithm, even with -O3 and -march=opteron, took 157 seconds!

    As usual, algorithm choice is much, much more important than processor optimizations, especially when N is large.
  • justin · 7 months ago
    The naive implementation was fast enough, so I was just playing around. For any serious optimization, I agree that you definitely want to look at algorithmic optimizations first.

    Nice work on actually coming up with some numbers!
  • justin · 7 months ago
    I'm aware of how poor this is algorithmically, but it's fast enough for what
    I'm doing. This was just a fun experiment in the impact of SIMD
    instructions; I have no plans to use anything except the most naive approach
    with the dumbest compiler settings for actually accomplishing my task.

    That algorithm looks interesting though, and pretty much exactly what's
    needed. Thanks!
  • Jordan · 7 months ago
    You could speed this up even more by using a smarter searching method than memcmp. Something like http://en.wikipedia.org/wiki/Knuth–Morris–Pratt... should be much better than just moving ahead one character at a time, though depending on the size of the two files, memcmp might be faster just because of the (much) faster SIMD instructions, instead of the algorithmic improvements from KMP.