Wednesday, September 30, 2009

Hpricot

As part of my work, I have to process a large number of XML files. Since my project is built with Ruby on Rails, I first used REXML to parse them. However, REXML takes quite a long time on large XML files, so I switched to Hpricot. Though Hpricot was originally written for HTML parsing, it turns out to handle XML as well. It suits my need to traverse an XML document and extract values using XPath queries, and it is tremendously faster than REXML.


However, while using Hpricot, I ran into this error:

TypeError: can't convert nil into String
        from /usr/local/lib/ruby/gems/1.8/gems/hpricot-0.6/lib/hpricot/parse.rb:51:in `scan'


As I found in this post, this is a known error: Hpricot can't handle files whose size is a multiple of 16384 bytes. The fix is to append some extra bytes to your "special-sized" files:

echo " " >> special_sized_file
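The same workaround can be applied from Ruby before a file is handed to the parser. This is only a minimal sketch: the helper name pad_if_special_sized and the BLOCK_SIZE constant are my own and are not part of Hpricot. It appends a single space rather than the space plus newline that echo writes, and it skips empty files on the assumption that they are not valid XML anyway.

```ruby
# Hypothetical helper, not part of Hpricot.
# 16384 is the problematic multiple described above.
BLOCK_SIZE = 16_384

# Appends one space to the file at +path+ when its size is a non-zero
# multiple of +block_size+. Returns true if the file was padded.
def pad_if_special_sized(path, block_size = BLOCK_SIZE)
  size = File.size(path)
  # Assumption: empty files are left alone, since they are not valid XML.
  return false if size.zero? || size % block_size != 0

  # Trailing whitespace after the root element keeps the XML well-formed.
  File.open(path, "ab") { |f| f.write(" ") }
  true
end
```

Calling pad_if_special_sized("special_sized_file") once before parsing should be enough, since a padded file is no longer an exact multiple of 16384 bytes.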
