Nokogiri gem issues

Hello all,

I was recently pointed towards the Nokogiri gem to find all HTML elements with a particular class, rather than attempting my own regular expression. (Thanks John-John Tedro and Hassan Schroeder!!!!)

It works perfectly on my local machine (OS X Lion and Passenger), but when I deployed it to my server (CentOS 5.5 and Passenger), Nokogiri does not seem to grab all the elements of the HTML file.

Here is my method:

I'll guess your server has an older (buggier) libxml2 than your Mac -- you can check versions using `xmllint --version`.

If it's older, try upgrading and see if that fixes it.

I ran $ yum list updates, and the only available update that was close to libxml2 was 'libxml2-python'.

I don't think that's relevant :slight_smile:

I updated that and it still does not work. When I checked the version, the output was this:

    $ xmllint --version
    xmllint: using libxml version 20626

On my Mac (Snow Leopard): version 20708
On a very old SuSE box (home office spare): version 20620
On an Ubuntu server provisioned in the last year: version 20626

So I would bet that difference could still account for your problem.
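For anyone comparing these numbers: libxml2 packs its version into a single integer as major * 10000 + minor * 100 + patch, so 20626 is 2.6.26 and 20708 is 2.7.8. Here is a minimal sketch that decodes the values reported in this thread; the helper name `decode_libxml_version` and the host labels are made up for illustration.

```ruby
# Decode the integer printed by `xmllint --version` into a dotted version.
# libxml2 encodes it as major * 10000 + minor * 100 + patch.
# (decode_libxml_version is a hypothetical helper, not part of any library.)
def decode_libxml_version(number)
  major, rest = number.divmod(10_000)
  minor, patch = rest.divmod(100)
  "#{major}.#{minor}.#{patch}"
end

# Version numbers reported in this thread (labels are illustrative).
reported = {
  'CentOS 5.5 server'  => 20626,
  'Mac (Snow Leopard)' => 20708,
  'old SuSE box'       => 20620,
  'Ubuntu server'      => 20626
}

reported.each do |host, number|
  puts "#{host}: libxml2 #{decode_libxml_version(number)}"
end
```

Note that `xmllint` reports the system library, which is not necessarily the one Nokogiri was built against. Running `nokogiri -v` should show the libxml2 version Nokogiri itself compiled against and loaded.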

I rarely use package managers like yum; usually it seems faster and easier to just install/update from source :slight_smile:

You might want to research the change history but I would probably just update and see what happens...

FWIW,

Keith Raymond wrote in post #1018677:

    # I tried using xpath, but was not able to get it to grab elements w/ 'class="my-class icon"'
    # only 'class="my-class"'

If you give up on installing a newer version of libxml (I've tried it in the past and found it impossible), you can use xpath() to do what you want:

<!DOCTYPE html>
<html>
  <head>
    <title>Test</title>
  </head>

  <body>
    <h1 class='editor highlight'>Hello world</h1>

    <div class="editor_fruit red">Apple</div>

    <div class="article_editor">
      <div>Not this node</div>
      <div class="hide editor">Papillon</div>
    </div>

  </body>

</html>

require 'nokogiri'

f = File.open('2html.htm')
doc = Nokogiri::HTML(f)

# Pad @class with spaces so that ' editor ' only matches a whole class name,
# not a substring such as 'editor_fruit' or 'article_editor'.
results = doc.xpath("//*[contains(concat(' ', @class, ' '), ' editor ')]").each do |el|
  p [
    el.attributes['class'].value,
    el.children[0].text
  ]
end

--output:--
["editor highlight", "Hello world"]
["hide editor", "Papillon"]
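To see why that predicate matches whole class names only, here is a rough plain-Ruby equivalent of the string test it performs. It needs neither Nokogiri nor libxml2. The helper name `class_token?` is made up for illustration, and the sample values are the class attributes from the HTML above.

```ruby
# Plain-Ruby sketch of the XPath predicate
#   contains(concat(' ', @class, ' '), ' editor ')
# (class_token? is a hypothetical helper, not a Nokogiri method.)
def class_token?(class_attr, token)
  # Pad both sides with a space so the token must be delimited by spaces.
  " #{class_attr} ".include?(" #{token} ")
end

# Class attribute values taken from the sample HTML.
samples = ['editor highlight', 'editor_fruit red', 'article_editor', 'hide editor']

matches = samples.select { |value| class_token?(value, 'editor') }
p matches
# prints ["editor highlight", "hide editor"]
```

This only treats plain spaces as separators. If the class attributes might contain tabs or newlines, wrapping `@class` in `normalize-space()` inside the XPath expression is the usual refinement.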

Just an update...

I removed the older versions of libxml2 and libxslt, reinstalled them from source, and it now appears to be working.

Thank you for all your help, again