As part of my work, I have to deal with a large number of XML files. Since my project is implemented in Ruby on Rails, I first used REXML to process them. However, REXML takes quite a long time to process large XML files, so I switched to Hpricot. Although Hpricot was originally written for HTML parsing, it turns out to be capable of handling XML as well. It suits my needs for traversing an XML document and extracting values with XPath queries, and it is tremendously faster than REXML.
However, while using Hpricot, I encountered this error:
TypeError: can't convert nil into String from /usr/local/lib/ruby/gems/1.8/gems/hpricot-0.6/lib/hpricot/parse.rb:51:in `scan'
As I found in this post, this is a known bug: Hpricot can't handle files whose size is a multiple of 16384 bytes. The fix is to append some extra bytes to your "special-sized" files:
echo " " >> special_sized_file
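If you process many files from Ruby, the same workaround can be applied automatically before parsing. Below is a minimal sketch, not part of Hpricot: the helper name `pad_special_sized_file` and the constant `CHUNK_SIZE` are my own, and skipping empty files is an assumption on my part. It appends a single space only when the file size is an exact multiple of 16384 bytes.

```ruby
# Size (in bytes) whose multiples trigger the error described above.
CHUNK_SIZE = 16384

# Hypothetical helper (not part of Hpricot): appends one space to the file
# at +path+ if its size is a non-zero multiple of +chunk_size+.
# Returns true if the file was padded, false otherwise.
# Assumption: empty files are left untouched.
def pad_special_sized_file(path, chunk_size = CHUNK_SIZE)
  size = File.size(path)
  return false if size.zero? || size % chunk_size != 0

  # Open in append mode so the existing content is not modified.
  File.open(path, "a") { |f| f.write(" ") }
  true
end
```

Call `pad_special_sized_file("some_file.xml")` before handing the file to Hpricot. Unlike the `echo` command, which also appends a newline, this adds exactly one byte; either way the size is no longer a multiple of 16384. Note that it modifies the file in place, so work on a copy if the original must stay untouched.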