30 June, 2007

Learned a new Perl trick today

So I was parsing HTML, which is always kind of an icky job. But Perl has a great regex engine I can put to work on it. The problem with regexes is that a bad expression is very hard to debug: when the match fails, nothing tells you which part of the pattern failed. I remember being confounded by them for hours in the past.


@bottle{qw{ upc_code year name varietal size }} = $bottling =~ m{
(\d+)</a></td> # this is the UPC code
<td>([^<]+)</td> # this is the year
<td>([^<]+)</td> # this should be the name
<td>([^<]+)</td> # this is the varietal
<td>([^<]+)</td> # bottle size
}x;

Luckily, Perl gives us the /x modifier for regexes, which lets us spread a pattern over several lines: whitespace in the pattern is ignored and # starts a comment. The expression above is a very simple one, but I'm sure you can imagine much more complicated ones. If we want to see where the expression is broken, we can just do this:


@bottle{qw{ upc_code year name varietal size }} = $bottling =~ m{
(\d+)</a></td> # this is the UPC code
# <td>([^<]+)</td> # this is the year
# <td>([^<]+)</td> # this should be the name
# <td>([^<]+)</td> # this is the varietal
# <td>([^<]+)</td> # bottle size
}x;


and run it, uncommenting another little piece of the expression on each run. This way we can "walk" down the expression until the match stops working, which shows where we goofed. In the case above, there was just a line that needed a \s* (which is easy to forget about when using /x, because whitespace written in the pattern is ignored and never matches the whitespace in the HTML!).
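
For anyone who wants to try this, here is a minimal runnable sketch of the same idea. The sample row in $bottling is made up for illustration (the real page markup isn't shown here), and putting a \s* after every cell is just one guess at where the whitespace lives. The second match shows what happens without the \s*: the pattern fails and hands back an empty list.

```perl
use strict;
use warnings;

# Hypothetical sample row; the markup of the real page is an assumption.
my $bottling = <<'HTML';
<tr><td><a href="#">012345678905</a></td>
    <td>2005</td>
    <td>Example Cellars Reserve</td>
    <td>Zinfandel</td>
    <td>750ml</td></tr>
HTML

# With /x, whitespace in the pattern is ignored, so the newlines and
# indentation between cells in the HTML must be matched with \s*.
my %bottle;
@bottle{qw{ upc_code year name varietal size }} = $bottling =~ m{
    (\d+)</a></td>   \s*   # UPC code
    <td>([^<]+)</td> \s*   # year
    <td>([^<]+)</td> \s*   # name
    <td>([^<]+)</td> \s*   # varietal
    <td>([^<]+)</td>       # bottle size
}x;

print "$_: $bottle{$_}\n" for qw{ upc_code year name varietal size };

# Same first two pieces without \s*: the match fails on this input,
# so the list of captures comes back empty.
my @without = $bottling =~ m{
    (\d+)</a></td>         # UPC code
    <td>([^<]+)</td>       # year
}x;
print "captures without \\s*: ", scalar(@without), "\n";
```

One thing to watch while walking the expression: a failed match in list context returns an empty list, so every key in the hash slice ends up undef rather than just the ones for the broken line.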
