A Brief Origin Story
Back in the ’90s, when the World Wide Web was just getting started and you could teach yourself to publish for the web in a week, building HTML forms and processing them with CGI was my first real foray into text processing. I wrote some CGI-processing scripts in C, but it was hard! It was tedious!
About the same time a college intern I worked with introduced me to Perl. I couldn’t believe how easy it made everything. I could read in text files, process the lines, and output the results? Without allocating any memory or guessing how big the file was? I could work directly on entire strings, lines, or even files instead of processing them character by character? It was magic!
Regular expressions were (and still are) a fundamental part of Perl. The Learning Perl book, which is still an incredible resource, devotes three entire chapters to them:
“Regular expressions are actually tiny programs in their own special language, built inside Perl.”
Regular expressions are an integral part of Perl. Other languages seem to bolt on the functionality almost as an afterthought. I suspect that is why some programmers haven’t learned to use them.
My Own Regular Expression Usage
Using ack, I took a quick look at my Advent of Code solutions over the years to see how many of my programs contained regular expressions:
kacmbp23:~$ ls -1 adventofcode20*/*.pl | wc -l
266
kacmbp23:~$ ack -l \[\!\=\]\~ adventofcode*/*.pl | wc -l
148
148/266 = 55.6% of my Advent of Code programs had at least one regular expression. I was actually a little surprised it was that low.
Running the same check over a sample of Perl scripts at work (mostly web applications) shows even higher regular expression usage:
ramone8:/var/www/apps$ ls -1 */*.pl | wc -l
566
ramone8:/var/www/apps$ ack -l \[\!\=\]\~ */*.pl | wc -l
445
445/566 = 78.6% of our web application programs have at least one regular expression. That’s more like it!
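For anyone without ack installed, the same measurement can be roughly sketched in Perl itself. This is only an illustration of the heuristic, not the script I actually ran: the sub names (uses_regex, regex_usage) and the sample file contents are made up for the example. Like the ack pattern, it only looks for the binding operators =~ and !~, so it will miss programs that match against $_ implicitly or only use split with a pattern.

```perl
use strict;
use warnings;

# True if the source text contains a binding operator (=~ or !~).
# This mirrors the ack pattern [!=]~ and shares its blind spots.
sub uses_regex {
    my ($source) = @_;
    return $source =~ /[!=]~/ ? 1 : 0;
}

# Given filename => source text pairs, return (matches, total, percent).
sub regex_usage {
    my (%sources) = @_;
    my $total   = keys %sources;                            # number of files
    my $matches = grep { uses_regex($_) } values %sources;  # count of hits
    my $percent = $total ? sprintf('%.1f', 100 * $matches / $total) : '0.0';
    return ($matches, $total, $percent);
}

# Hypothetical sample data standing in for real script files.
my %demo = (
    'day01.pl' => 'my @n = grep { $_ =~ /^\d+$/ } @lines;',
    'day02.pl' => 'print "no patterns here\n";',
    'day03.pl' => 'next if $line !~ /\S/;',
);

my ($m, $t, $p) = regex_usage(%demo);
print "$m/$t = $p%\n";    # prints "2/3 = 66.7%"
```

To run it against real files, you would read each file into a string and pass the filename and contents in place of the sample hash.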
Conclusion and Recommendation
If you aren’t using regular expressions, um, regularly, in your code, you may be doing yourself a disservice! Take it from Learning Perl: they aren’t all that terribly difficult to learn. You don’t have to use all the features immediately. Some of the more advanced ones, like /(negative\s)?look(ahead|behind)/, can come later.
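When you do get around to them, the four lookaround forms are less scary than they sound. Here is a minimal sketch of each one; the sample string and variable names are invented for illustration. The key idea is that a lookaround checks what is next to the match without including it in the match.

```perl
use strict;
use warnings;

my $text = 'cost: $40, weight: 40kg, price: $15';

# Positive lookahead: digits followed by "kg" ("kg" is not part of the match).
my @weights = $text =~ /\d+(?=kg)/g;

# Negative lookahead: digits NOT followed by "kg". The \d alternative stops
# the engine from backtracking to a partial number such as the "4" in "40kg".
my @not_weights = $text =~ /\d+(?!\d|kg)/g;

# Positive lookbehind: digits preceded by a literal "$".
my @prices = $text =~ /(?<=\$)\d+/g;

# Negative lookbehind: digits NOT preceded by "$" or by another digit.
my @others = $text =~ /(?<![\$\d])\d+/g;

print "weights:     @weights\n";        # prints "weights:     40"
print "not weights: @not_weights\n";    # prints "not weights: 40 15"
print "prices:      @prices\n";         # prints "prices:      40 15"
print "others:      @others\n";         # prints "others:      40"
```

The backtracking note on the negative lookahead is the part that tends to bite: a plain /\d+(?!kg)/ would happily match the "4" in "40kg", because "4" is followed by "0" rather than "kg".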
Learn regex the easy way
A very nice, short summary of the basics of regular expressions
Learning Perl and Mastering Regular Expressions
Mastering Regular Expressions is supposed to be the absolute Tome of Enlightenment on the subject, but I wouldn’t know because I felt like I had enough background from Perl. It is probably a blind spot for me.
Mastering Lookahead and Lookbehind
I’ll admit that lookahead and lookbehind have always been troublesome for me, and in the past I would go out of my way to avoid them. But this is my reference for when I do use them.
This one is also good: