Inert Detritus The Internet's dust bunnies

5 October 2007 @ 6pm

Regular Expressions, and Nails

Every time I learn something new about regular expressions, I find myself reaching for it to solve every matching problem I come across (see “When all you have is a hammer, everything looks like a nail”).

I just learned lookahead and backreferences, and I had this great paragraph written about how I had used a lookahead unnecessarily. After some testing, though, I realized it’s not possible to match what I want without consuming extra characters, except by using the lookahead I had already used. Well, so much for that post.

For those who were wondering, the match I was performing is as follows. I wanted to match an ampersand (&) that’s not followed by amp; (and is therefore not properly encoded for HTML), while leaving a properly encoded & amp; alone. I used s/&(?!amp;)/& amp;/ to do it. I also tested &[^a][^m][^p][^;], but that consumes the next four characters after the &, so the substitution replaces them as well, which obviously doesn’t do what we’d like.
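To make the difference concrete, here is a minimal sketch in Python rather than the s/// form above. The function name, the sample string, and the choice of Python's re module are my own assumptions for illustration, not anything from the original substitution.

```python
import re

# Negative lookahead: match "&" only when "amp;" does not follow it.
# The lookahead is zero-width, so nothing after the "&" is consumed.
BARE_AMP = re.compile(r"&(?!amp;)")

# The character-class attempt, kept only for comparison.
# Each [^x] consumes one character, so four characters get replaced too.
CLASS_ATTEMPT = re.compile(r"&[^a][^m][^p][^;]")


def encode_ampersands(text: str) -> str:
    """Encode bare ampersands, leaving already-encoded ones untouched."""
    return BARE_AMP.sub("&amp;", text)


sample = "Tom & Jerry &amp; friends"  # hypothetical input

fixed = encode_ampersands(sample)
eaten = CLASS_ATTEMPT.sub("&amp;", sample)

print(fixed)  # Tom &amp; Jerry &amp; friends
print(eaten)  # Tom &amp;ry &amp; friends
```

The second result shows the problem: the character classes swallow " Jer" along with the ampersand. Like the original pattern, this sketch only knows about amp; so it would also re-encode the & at the start of any other entity.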

It’s good to have lots and lots of hammers in your toolbox. Then you can hit nails of every shape and size you come across.

(And apologies for the nasty spaces in the ampersand entities. Markdown is conspiring to properly encode every entity, which means I’ve got to space things out to stop it.)