Nokogiri gem issues

Hello all,

I was recently pointed towards the Nokogiri gem to find all
HTML elements with a particular class, rather than attempting my own
regular expression. (Thanks John-John Tedro and Hassan Schroeder!)

It works perfectly on my local machine (OS X Lion and Passenger), but
when I deployed it to my server (CentOS 5.5 and Passenger), Nokogiri
does not seem to grab all the elements of the HTML file.

Here is my method:

I'll guess your server has an older (buggier) libxml2 than your Mac --
you can check versions using `xmllint --version`.

If it's older, try upgrading and see if that fixes it.

I ran `yum list updates`, and the only available update that was
close to libxml2 was 'libxml2-python'.

I don't think that's relevant :)

I updated that and it still does not work. When I checked the version,
the output was this:

xmllint --version
xmllint: using libxml version 20626

On my Mac (Snow Leopard): version 20708
On a very old SuSE box (home office spare): version 20620
On an Ubuntu server provisioned in the last year: version 20626

So I would bet that difference could still account for your problem.
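
For anyone reading along, those integers are easier to compare once decoded. libxml2 packs its version as major * 10000 + minor * 100 + patch, so a minimal sketch for turning the `xmllint --version` number into a dotted version might look like this (`decode_libxml_version` and the host labels are made up for illustration, not part of any library):

```ruby
# Decode the integer printed by `xmllint --version` into [major, minor, patch].
# Assumes the libxml2 encoding: major * 10000 + minor * 100 + patch.
def decode_libxml_version(number)
  major, rest = number.divmod(10_000)
  minor, patch = rest.divmod(100)
  [major, minor, patch]
end

# Version numbers reported in this thread (labels are only for display).
reported = {
  'CentOS 5.5 server'  => 20626,
  'Mac (Snow Leopard)' => 20708,
  'old SuSE box'       => 20620,
  'Ubuntu server'      => 20626
}

reported.each do |host, number|
  puts format('%-20s %s', host, decode_libxml_version(number).join('.'))
end
```

So 20626 is libxml2 2.6.26 and 20708 is 2.7.8, which puts the server a full minor release behind the Mac.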

I rarely use package managers like yum; usually it seems faster and
easier to just install/update from source :)

You might want to research the change history but I would probably
just update and see what happens...

FWIW,

Keith Raymond wrote in post #1018677:

    # I tried using xpath, but was not able to get it to grab elements w/ 'class="my-class icon"'
    # only 'class="my-class"'

If you give up on installing a newer version of libxml (I've tried it in
the past and found it impossible), you can use xpath() to do what you
want:

<!DOCTYPE html>
<html>
  <head>
    <title>Test</title>
  </head>

  <body>
    <h1 class='editor highlight'>Hello world</h1>

    <div class="editor_fruit red">Apple</div>

    <div class="article_editor">
      <div>Not this node</div>
      <div class="hide editor">Papillon</div>
    </div>

  </body>

</html>

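require 'nokogiri'

# Parse the HTML file; the block form closes the file handle afterwards.
doc = File.open('2html.htm') { |f| Nokogiri::HTML(f) }

# Pad @class with spaces so ' editor ' matches only a whole class name,
# not substrings such as 'editor_fruit' or 'article_editor'.
# The expression must stay on one line: a line break inside the string
# literal would become part of the search text and nothing would match.
xpath = "//*[contains(concat(' ', @class, ' '), ' editor ')]"

doc.xpath(xpath).each do |el|
  p [
      el.attributes['class'].value,
      el.children[0].text
  ]
end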

--output:--
["editor highlight", "Hello world"]
["hide editor", "Papillon"]
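
To make it clearer why the space padding in that XPath matters, here is a rough plain-Ruby equivalent of the same test, run against the class attributes from the sample HTML above. `has_class?` is a hypothetical helper written only for this illustration; it is not part of Nokogiri, and it only mimics the string logic of `contains(concat(' ', @class, ' '), ' editor ')`.

```ruby
# Mimics contains(concat(' ', @class, ' '), ' name '):
# pad the attribute with spaces, then look for the space-delimited name.
def has_class?(class_attr, name)
  " #{class_attr} ".include?(" #{name} ")
end

# Class attributes taken from the sample HTML in this thread.
samples = ['editor highlight', 'editor_fruit red', 'article_editor', 'hide editor']

samples.each do |class_attr|
  puts format('%-18s padded: %-5s naive: %s',
              class_attr,
              has_class?(class_attr, 'editor'),
              class_attr.include?('editor'))
end
```

The naive substring check accepts all four attributes, while the padded check accepts only 'editor highlight' and 'hide editor', matching the output above. Like the XPath, this sketch assumes class names are separated by single spaces; attributes separated by tabs or newlines would need normalizing first.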

Just an update...

I removed the older versions of libxml2 and libxslt and reinstalled them
from source, and it appears to be working.

Thank you again for all your help.