Inert Detritus: The Internet's dust bunnies

Posted 5 October 2007 @ 6pm

Regular Expressions, and Nails

Every time I learn something new about regular expressions, I find myself using it to solve every regular expression matching problem I come across (see “When all you have is a hammer, everything looks like a nail”).

I just learned lookahead and backreferences. For example, I had written this great paragraph about how I’d used a lookahead unnecessarily, but after some testing I realized it’s not possible to match what I want without consuming extra characters unless I do exactly what I did: use a lookahead. Well, so much for that post.

For those who were wondering, here’s the match I was performing. I wanted to match an ampersand (&) that isn’t followed by amp; (and is therefore not properly encoded for HTML), but not match a properly encoded &amp;. I used s/&(?!amp;)/&amp;/ to do it. The (?!amp;) part is a negative lookahead: it checks that amp; doesn’t come next without consuming those characters, so only the & itself gets replaced. I also tested &[^a][^m][^p][^;], but that eats the next four characters after the &, which obviously doesn’t do what we’d like.

It’s good to have lots and lots of hammers in your toolbox. Then you can hit every shape and size of nail you come across.
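If you want to see the difference between the two patterns for yourself, here’s a small sketch in Python. The substitution above is written Perl-style, but Python’s re module supports the same (?!...) negative lookahead, so the pattern carries over unchanged. The names LOOKAHEAD, CHAR_CLASSES and encode_ampersands are just illustrative, not from any library.

```python
import re

# The lookahead version: match an & only when it is NOT followed by "amp;".
# (?!amp;) inspects the following text without consuming it, so the match
# is just the & character and nothing after it gets replaced.
LOOKAHEAD = re.compile(r"&(?!amp;)")

# The rejected attempt: an & followed by four characters that are not
# a, m, p and ; respectively. Those four characters become part of the match.
CHAR_CLASSES = re.compile(r"&[^a][^m][^p][^;]")


def encode_ampersands(text: str) -> str:
    """Encode bare ampersands as &amp;, leaving existing &amp; untouched."""
    return LOOKAHEAD.sub("&amp;", text)


if __name__ == "__main__":
    sample = "Tom & Jerry &amp; friends"
    # prints: Tom &amp; Jerry &amp; friends
    print(encode_ampersands(sample))
    # The character-class version swallows " Jer" along with the &.
    # prints: Tom &amp;ry &amp; friends
    print(CHAR_CLASSES.sub("&amp;", sample))
```

The character-class pattern also needs four characters after the &, so it can’t match an ampersand at the very end of a string at all. And note that the lookahead only guards &amp;: an already-encoded entity like &lt; would still have its & rewritten, giving &amp;lt;.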