How to Call Javascript from Service Object?

I’d like to call a Javascript library from a Service Object.

I have the following Javascript in /app/javascript/packs/readability.js.

import Readability from '@mozilla/readability';

function articleBody(document) {
  var article = new Readability(document).parse();
  return article.content;
}

I found that webpacker-ed files are available via asset_pack_path for use in Views. I tried adding this to my Service Object, as shown below (using the therubyracer gem).

(I tried using ExecJS, which seems to be much more popular than therubyracer, but even using an unprocessed file with ES5 Javascript, it only returned errors.)

cxt = V8::Context.new
cxt.load(ActionController::Base.helpers.asset_pack_path 'readability.js')
puts cxt.eval("articleBody('http://example.com/')")

This is returning No such file or directory @ rb_sysopen - /packs/js/readability-20f57636.js.

localhost:3000/packs/js/readability-20f57636.js loads fine in the browser.

How can I load a webpacker processed file in a Service Object?

I’m assuming this is in local development while you’re using the webpack-dev-server—is that correct?

If so, the webpack-dev-server compiles JS in memory, so there are no files on disk for other processes to access.

You could turn off the webpack-dev-server and make sure compile: true is set in config/webpacker.yml to output the files, but—

What is the use case? This isn’t running in a browser, i.e., you want to use NPM package(s) on the server-side? If so, I would skip Webpacker altogether and treat this as a node.js script. If you have to use Ruby, you can shell out to the node.js script in your server-side process.

Yes, I’m still in local development.

The use-case: I’m trying to capture the content of a web article using Mozilla’s Readability library. I’d like to have the content returned as a string for further processing in the Service Object.

If I skip Webpacker and call the file directly from app/javascript/packs I get V8::Error Unexpected reserved word. As I understand it, this is because import is ES6 syntax and the file hasn’t been compiled.

Node doesn’t use import/module syntax as the default (yet). You can either force Node to treat the file as a module or use the old require syntax to import the readability library. I’d use require for now; last time I checked, using import still felt hacky. (With modules that are processed by webpack, though, I prefer ES6 import syntax.)

If I replace import Readability from '@mozilla/readability'; with require('@mozilla/readability'); I get V8::Error require is not defined.

Does that happen when you run it directly with node /app/javascript/packs/readability.js or is that a webpack/browser error? Maybe this Stack Overflow link is relevant to you. javascript - require is not defined? Node.js - Stack Overflow

I don’t understand almost anything in that Stack Overflow thread. If there is something relevant to me, I can’t find it.

The error I described above is being thrown in the browser when Rails balks, but you can also see it in the server logs.

If I run node app/javascript/packs/readability.js in the terminal, there is no error … but then, there is no output either. I’m trying to get data for processing and/or saving.

The README from the mozilla/readability page has an example for using this file with Node.js, which includes using require() (CommonJS) instead of import syntax.

Also, I’d not put this file in app/javascript but instead make it an executable file in your bin/ directory.

The problem is that it doesn’t seem clear what you actually want. If you put it in packs, then webpack will try to process it and make it available to the frontend and put it in the public path with the rest of your assets. If you don’t want that, you need to keep it away from webpacker’s asset pipeline so that it can run in the background somehow. Then you’re basically on your own, and it’s not a frontend Javascript question anymore. It’s a bit confusing to me whether you want one or the other.

I apologize for the confusion. Let me try an example in an effort to be clearer.

What I’m hoping to achieve is, given a URL, I’d like to save the content of that page. I’d like to use Mozilla’s Readability library to do this. Let’s say I want to save the content of Ross’s latest article: These Rails apps are overpacking their JavaScript bundles - rossta.net. I want the content, as captured by article.content, to be available in a variable in app/services/download.rb. I do not want the article content displayed in the browser or otherwise visible on the frontend.

I assumed the bin directory was not to be touched. I’ve never come across a tutorial that mentioned it.

Do you know of any good tutorials for doing as you’ve suggested?

I consulted Google by searching rails make javascript executable from bin directory. Nothing in the results made sense.

I still don’t really understand what you’re trying to do, so take that into consideration.

given a URL, I’d like to save the content of that page

Thanks for the extra info, but now I have more questions. In what context are you saving this URL? In a background job? Why do you need the Readability library? If all you want to do is save the DOM content of a URL, there are ways to do this in pure Ruby.

I assumed the bin directory was not to be touched.

Rails or Bundler may generate files in bin/; don’t edit those. Other than that, you can put whatever you want in bin/.

I think what you’re trying to do is create a Node.js script and call it from Rails. If you’re trying to run that JavaScript code server-side, outside the context of a Rails request (i.e., you don’t need to return results to a browser), then I’d consider that a script. Personally, I’d choose Node.js to run that script, which you can call from Ruby. You could also take a look at ExecJS, but I haven’t used it much.

My teams traditionally have put scripts in either the bin/ directory or scripts/ directory. I don’t know of any tutorials that say where you should put scripts because scripts are not treated specially by Rails, so it ends up being team preference.

I may not understand

Thank you for sticking with me on this. I’m sorry for my inability to describe what I’m after. It seems so clear in my mind. To that end let’s try a different example.

Let’s say you were going to create a read-later app like Pocket. You wouldn’t want to grab everything from a saved page. You’d want to strip the navigation, comments, et cetera. My first thought was to write my own parser using Nokogiri. It worked … but it was terrible. Once I saw how good Mozilla’s Readability is, I figured why reinvent the wheel? The thing is, I don’t know how to use it, a Javascript library, from a Rails project. The code in the first post of this thread was my best attempt.

If I move the script I have in app/javascript/packs/readability.js into bin/readability.js, how do I get the content from article.content for use in a method in the app/services/download.rb file?

Can you describe how to do this? Or if you know of a tutorial on the subject, I’ll be happy to read that.

Dumb question: is the Ruby Readability project very different from Mozilla’s Javascript implementation? GitHub - cantino/ruby-readability: Port of arc90's readability project to Ruby

I’m having trouble seeing why your use case is better served by a Node.js tool than a Ruby one if your primary environment is Ruby. The library @evenreven posted seems like a good alternative. I’d suggest starting there.

If you must use a Node.js tool, then you’d need to do at least these three things:

  • figure out how to execute a shell script from Ruby. Here’s a tutorial.
  • figure out how to pass data from your Node script to Ruby. You’d do that either by saving data to a file or printing that data to STDOUT (i.e., console.log()) in JavaScript. If the data is complex, you’d probably treat it as a JSON object and call JSON.stringify() to serialize it.
  • figure out how to read the file (probably File.read + JSON.parse) or parse STDOUT (JSON.parse) string result in Ruby
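The three steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name run_json_command and the script path bin/readability.js are hypothetical, and it assumes the Node script prints a single JSON document to STDOUT.

```ruby
require "json"
require "open3"

# Runs a command that prints JSON to STDOUT and returns the parsed result.
# `command` is an array (program plus arguments), so the arguments are
# handed to the program directly instead of going through a shell.
def run_json_command(command)
  stdout, stderr, status = Open3.capture3(*command)
  raise "command failed (#{status.exitstatus}): #{stderr}" unless status.success?

  JSON.parse(stdout)
end

# Hypothetical usage. bin/readability.js is assumed to end with something like
#   console.log(JSON.stringify({ content: article.content }));
#
# result = run_json_command(["node", "bin/readability.js", "https://example.com/"])
# puts result["content"]
```

Printing to STDOUT keeps everything in memory; writing to a file is the alternative if the output is too large or needs to outlive the process.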

Everything I just described is probably more complicated than using a Ruby alternative.

It is. After writing my own parser and realizing how bad it was, I tried Andrew’s gem. It did work better than mine, but not by much. The last meaningful update to that repo was in 2014. A lot has changed in seven years. It seems to struggle most with anything using a newfangled Javascript framework. The Mozilla library has made accommodations for React, Vue, et al.

I would, very much, rather use a Ruby gem. But I can only find two libraries that do a good job and seem worth using. Both are written in Javascript.

I’m getting closer, but still having trouble.

In my Service Object (app/services/download.rb), I have:

request = HTTParty.get("https://bradonomics.com/about/")
document = Nokogiri::HTML(request.body)

readability = %x[node lib/readability/parse.js "#{document}"]
puts readability

I’ve created a new file, lib/readability/parse.js, with this:

var { Readability } = require('@mozilla/readability');
var JSDOM = require('jsdom').JSDOM;


var data = process.argv[2];
var doc = new JSDOM(data);
var article = new Readability(doc.window.document).parse();
console.log(article.content);

This is throwing the error: TypeError: Cannot read property 'content' of null

If I just try returning data with console.log(data), I get this:

sh: 14: amp: not found
sh: 6: initial-scale=1.0>
<meta name=generator content=Jekyll: not found
sh: 50: link>/canon</a>
<p>A collection of things that have shaped my thinking.</p>
</li>
<li>
<a href=https://bradonomics.com/blogroll/ class=section-title: not found
<!DOCTYPE html>
<html lang=en>
<head>
<meta http-equiv=Content-Type content=text/html

If I create a local html document and call that with fs.readFile it’s returning content as expected. So it seems like I’ve got a problem passing a string from Rails to Node. Any ideas here?
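The "sh: ... not found" lines suggest the shell is interpreting the HTML: %x[] hands the whole string to sh, so the double quotes, ampersands, and semicolons inside the document break the command apart. One way around that, sketched below, is to skip the shell and send the HTML on STDIN. The helper name pipe_html is hypothetical, and this assumes parse.js is changed to read STDIN (for example with fs.readFileSync(0, 'utf8')) instead of process.argv[2].

```ruby
require "open3"

# Sends `html` to a child process on STDIN and returns whatever it prints.
# The command is an array, so no shell parses it and characters such as
# quotes, & and ; inside the HTML are passed through untouched.
# Assumption: lib/readability/parse.js reads its input from STDIN.
def pipe_html(html, command: ["node", "lib/readability/parse.js"])
  output, status = Open3.capture2(*command, stdin_data: html)
  raise "#{command.first} exited with status #{status.exitstatus}" unless status.success?

  output
end

# Hypothetical usage inside the Service Object:
#
# readability = pipe_html(document.to_html)
```

STDIN also avoids the operating system's limit on command-line argument length, which a large page could exceed.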

Update:

Right after I typed the above, it occurred to me that maybe I should have Ruby write a file to disk and then have Node read said file. It seems a bit much, but it is returning expected data to the Rails server log.

Here’s what I’ve got in the Service Object:

request = HTTParty.get("https://rossta.net/blog/rails-apps-overpacking-with-webpacker.html")
document = Nokogiri::HTML(request.body)

File.open("tmp/unique-string/document-title.html", 'w') do |outfile|
  outfile.puts document
end

file = Rails.root.join("tmp/unique-string/", "document-title.html")
readability = %x[node lib/readability/parse.js "#{file}"]
puts readability

Then in the Javascript file:

var file = process.argv[2];

var fs = require('fs');
var { Readability } = require('@mozilla/readability');
var JSDOM = require('jsdom').JSDOM;

fs.readFile(file, 'utf8', function(err, data) {
  if (err) throw err;
  var doc = new JSDOM(data);
  var article = new Readability(doc.window.document).parse();
  console.log(article.content);
});

I’ve tested a few webpages and it seems to be working. Do you see any gotchas in this implementation?
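A few possible gotchas, with a sketch of one way to handle them: a fixed path under tmp/ can collide if two downloads run at once, the directory has to exist beforehand, the file is never cleaned up, and the quoted "#{file}" still goes through the shell. The version below uses Tempfile and an argument array instead. The helper name run_on_tempfile is hypothetical, and it assumes the same parse.js that takes a file path as its first argument.

```ruby
require "open3"
require "tempfile"

# Writes `html` to a uniquely named temp file, passes the file's path to
# the command as its last argument, and returns the command's STDOUT.
# Tempfile.create with a block removes the file once the block finishes.
def run_on_tempfile(html, command: ["node", "lib/readability/parse.js"])
  Tempfile.create(["document", ".html"]) do |file|
    file.write(html)
    file.flush # make sure the child process sees the full contents

    output, status = Open3.capture2(*command, file.path)
    raise "#{command.first} exited with status #{status.exitstatus}" unless status.success?

    output
  end
end

# Hypothetical usage inside the Service Object:
#
# readability = run_on_tempfile(document.to_html)
```

Separately, the earlier "Cannot read property 'content' of null" error shows that parse() can return null, so the Node script may want to check for that before reading article.content.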

I think this could potentially leave behind a zombie process, similar to the problem described here: Ruby: check for zombie children processes, kill them, and spawn a new parent process - Eric London » Open Source » Software Blog
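For what it’s worth, backticks and %x[] wait for the child to exit, so the more likely failure mode is a Node process that hangs and blocks the request. Below is a minimal sketch of running the command with a time limit and always reaping the child. The helper name run_with_timeout and the 10-second default are assumptions, and it relies on Unix signals.

```ruby
require "timeout"

# Runs `command` (an array), returns its STDOUT, and gives up after
# `seconds`. On timeout the child is killed and then waited on, so it
# does not linger as a zombie.
def run_with_timeout(command, seconds: 10)
  reader, writer = IO.pipe
  pid = Process.spawn(*command, out: writer)
  writer.close # the parent only reads; the child holds the write end

  output = nil
  begin
    Timeout.timeout(seconds) do
      output = reader.read # returns once the child closes its STDOUT
      Process.wait(pid)    # reap the child
    end
  rescue Timeout::Error
    Process.kill("KILL", pid)
    Process.wait(pid)      # reap the killed child as well
    raise
  ensure
    reader.close
  end
  output
end

# Hypothetical usage:
#
# readability = run_with_timeout(["node", "lib/readability/parse.js", file.to_s])
```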

Since you’ve gotten to the point where you have a working Node script, I wrote a gem which may help you:

You can pass Ruby data into an inline JS script and get parsed JSON data back out…no shuttling of content in files needed, don’t even need a separate JS file anywhere. AFAIK, it’s the easiest method of executing reasonably modern Node scripts via Ruby.
