Always interesting to see applications/algorithms sped up with GPGPU implementations. It looks like they get a large speedup by processing each line in parallel (within the thread blocks) and then handling each individual regular expression in a separate block.
Admittedly, I assumed this was already the case after reading the article title, since parsing each line in parallel exposes more and more parallelism as file sizes grow. Indeed, they came to the same conclusion:
we realized that it was a much better idea to parallelize across the lines of the input file against which we had to match regexs
That was also a great choice for future scalability, as no doubt you would get greater parallelism with a growing input file (while also allowing opportunities to speed up more complex expressions).
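To make the decomposition concrete, here's a rough CPU-side sketch of how I read it. It is not their code: the function name `match_grid` and the `threads_per_block` parameter are made up for illustration, and the loops just simulate a GPU launch where each "block" owns one regex and each "thread" strides over the input lines. The point is that the number of independent (regex, line) work items grows with the file, which is why splitting across lines scales so well.

```python
import re
from typing import List


def match_grid(patterns: List[str], lines: List[str],
               threads_per_block: int = 4) -> List[List[bool]]:
    """Hypothetical simulation of a GPU grid for regex matching.

    Each simulated block handles one pattern; each simulated thread in that
    block checks lines t, t + threads_per_block, t + 2 * threads_per_block, ...
    Returns hits[p][i], which is True when pattern p matches somewhere in line i.
    """
    compiled = [re.compile(p) for p in patterns]
    hits = [[False] * len(lines) for _ in patterns]

    # Outer loop stands in for blockIdx.x: one block per regex.
    for block_idx, rx in enumerate(compiled):
        # Middle loop stands in for threadIdx.x within the block.
        for thread_idx in range(threads_per_block):
            # Stride loop: each thread covers every threads_per_block-th line,
            # so any number of lines is handled by a fixed number of threads.
            for line_idx in range(thread_idx, len(lines), threads_per_block):
                hits[block_idx][line_idx] = rx.search(lines[line_idx]) is not None
    return hits


if __name__ == "__main__":
    log = ["GET /index.html 200", "POST /login 401", "GET /favicon.ico 404"]
    print(match_grid([r"^GET", r"\b40\d\b"], log))
```

On a real GPU every (block, thread) pair would run concurrently, and a bigger input file simply means more lines for each thread to stride over, or more threads to launch.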