Faster Regexes: What to do when text matching is your bottleneck
By Aaron Crane from Edinburgh.pm
Date: Thursday August 14, 2008 15:10
Duration: 30 minutes
Tags: optimisation regex regexp
We all know how good Perl is at munging text. But what do you do when your Perl text-munging code isn't fast enough for what you're trying to do?
We needed to extract useful information from tens of gigabytes of web-server log files. Our Perl code was simple and obvious, but not fast enough for our purposes. When profiling revealed a frequently executed regex as the bottleneck, we tried several things to make it faster.
This talk looks at what we did to speed up our regex-heavy code (by a factor of well over 100 in some places), identifying a few general-purpose optimisation techniques on the way.
Attended by:
- Nicholas Clark
- Alberto Simões (ambs)
- Stephane Payrard (cognominal)
- Léon Brocard (acme)
- Barbie
- Mark Fowler (Trelane)
- Karen Pauley
- Nuno Carvalho (smash)
- Anton Berezin (Grrrr)
- Casiano Rodriguez-Leon (casiano)
- Dmitry Karasik (dk)
- Gertraud Unterreitmeier (Gertraud)
- Arne Sommer (Arne)
- Martin Schipany (ElCondor)
- Sue Spence (virtualsue)
- Andreas Hetey
- Wendy Van Dijk (woolfy)
- Jörg Plate (Patterner)
- Damian Conway (damian)
- Erik Johansen (uniejo)
- allan dystrup (ady)
- Søren Døygaard
- Martin Kjeldsen (baest)
- Francoise Dehinbo
- David Jack Wange Olrik (da5id)
- Patrick Michaud (Pm)
- Andrew Shitov (ash)
- William Travis Holton
- Alex Kapranoff (kappa)
- Matija Grabnar (matija)
- Alex Balhatchet (Kaoru)
- Lars Dɪᴇᴄᴋᴏᴡ (迪拉斯)
- Michael Zedeler
- Stefan Hornburg (racke)
- Steffen Mueller (tsee)
- Vincent Pit (VPIT)
- Stefan Hanski
- Stan Sawa
- Rasmus Hansen (rasmoo)
- Darius Jokilehto
- Jacob Bunk Nielsen
- Bart Lateur
- Roel de Cock