I’m getting closer, but still having trouble.
In my Service Object (app/services/download.rb), I have:
request = HTTParty.get("https://bradonomics.com/about/")
document = Nokogiri::HTML(request.body)
readability = %x[node lib/readability/parse.js "#{document}"]
puts readability
I’ve created a new file, lib/readability/parse.js, with this:
var { Readability } = require('@mozilla/readability');
var JSDOM = require('jsdom').JSDOM;
var data = process.argv[2];
var doc = new JSDOM(data);
var article = new Readability(doc.window.document).parse();
console.log(article.content);
This is throwing the error: TypeError: Cannot read property 'content' of null
If I just try returning data with console.log(data), I get this:
sh: 14: amp: not found
sh: 6: initial-scale=1.0>
<meta name=generator content=Jekyll: not found
sh: 50: link>/canon</a>
<p>A collection of things that have shaped my thinking.</p>
</li>
<li>
<a href=https://bradonomics.com/blogroll/ class=section-title: not found
<!DOCTYPE html>
<html lang=en>
<head>
<meta http-equiv=Content-Type content=text/html
If I create a local HTML document and read it with fs.readFile, it returns content as expected. So it seems like I’ve got a problem passing a string from Rails to Node. Any ideas here?
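For context on the errors above: %x[] hands the whole interpolated string to sh, so the quotes, & and ; inside the HTML get parsed as shell syntax instead of reaching Node as one argument. A minimal sketch of one way around that, assuming Open3 from Ruby's standard library and sending the HTML over stdin. A Ruby child process stands in for node here so the snippet runs on its own; the inline HTML is made up for illustration.

```ruby
require "open3"
require "rbconfig"

# HTML containing characters the shell treats specially (quotes, &, ;, <, >).
html = %(<html lang="en"><body><p>Q&amp;A; "quoted" <b>text</b></p></body></html>)

# Stand-in for `node lib/readability/parse.js` (assumption for this sketch):
# a child process that simply echoes whatever arrives on stdin.
# Passing the command as separate arguments means no shell is involved.
child = [RbConfig.ruby, "-e", "print STDIN.read"]

# stdin_data sends the HTML to the child's standard input, untouched by sh.
output, status = Open3.capture2(*child, stdin_data: html)

puts status.success?
puts output == html
```

With Node as the child, parse.js would presumably need to read from standard input instead of process.argv[2]; I have not tested that side.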
Update:
Right after I typed the above, it occurred to me that maybe I should have Ruby write a file to disk and then have Node read that file. It seems a bit much, but it is returning the expected data to the Rails server log.
Here’s what I’ve got in the Service Object:
request = HTTParty.get("https://rossta.net/blog/rails-apps-overpacking-with-webpacker.html")
document = Nokogiri::HTML(request.body)
File.open("tmp/unique-string/document-title.html", 'w') do |outfile|
  outfile.puts document
end
file = Rails.root.join("tmp/unique-string/", "document-title.html")
readability = %x[node lib/readability/parse.js "#{file}"]
puts readability
Then in the JavaScript file:
var file = process.argv[2];
var fs = require('fs');
var { Readability } = require('@mozilla/readability');
var JSDOM = require('jsdom').JSDOM;
fs.readFile(file, 'utf8', function(err, data) {
  if (err) throw err;
  var doc = new JSDOM(data);
  var article = new Readability(doc.window.document).parse();
  console.log(article.content);
});
I’ve tested a few webpages and it seems to be working. Do you see any gotchas in this implementation?
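Two things I can already see myself: the file written above is never cleaned up, and File.open uses a relative path while Node gets an absolute one from Rails.root, so the two only agree when the working directory is the app root. A rough sketch of what I might move to, assuming Tempfile and Open3 from the standard library. Again a Ruby child process stands in for node so the snippet is runnable by itself, and the HTML is a placeholder.

```ruby
require "tempfile"
require "open3"
require "rbconfig"

html = "<html><body><p>Hello</p></body></html>"

# Stand-in for `node lib/readability/parse.js <path>` (assumption for this sketch):
# a child process that prints the contents of the file named in its first argument.
child = [RbConfig.ruby, "-e", "print File.read(ARGV[0])"]

# Tempfile picks a unique name, so concurrent downloads do not overwrite each other.
output = Tempfile.create(["document", ".html"]) do |file|
  file.write(html)
  file.flush # make sure the child process sees the full contents
  # Argument-list form: the path is passed directly, never parsed by a shell.
  stdout, _status = Open3.capture2(*child, file.path)
  stdout
end # the temporary file is removed when the block exits

puts output
```

The same absolute path goes to both the writer and the reader here, and nothing is left behind in tmp/.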