Parsing HTML with Ruby

While working on a little school project, I found that at some point I would have to do some HTML parsing. Having heard of _why, I went to the Hpricot page and started working with it.

Not a good decision, it turned out, because when you look for Hpricot, there is not much out there in terms of examples, documentation, or simple code to read.

Nokogiri, on the other hand, has many more responses and Google results, all of them from within the last few months, as opposed to those for Hpricot, which are scattered, old, and don’t really answer many questions.

As an example, check this conversation from SO:

Pick nokogiri, for at least points 1-4, I don’t know about (5).

And the points are:

1. Support & Future enhancements
2. Community & general knowledge base (on the Internet)
3. Comprehensive (i.e proven to parse a wide range of *.*ml pages)
4. Performance
5. Memory Footprint (runtime, not the code-base)

Even mechanize is using Nokogiri instead of the more elegant Hpricot.

I will continue playing with Hpricot, but at what point will this start to cost me?
