I have a program I'd like to write and I'd like to get off of Java and
start using something more productive. I have played around with
Rails, and I think I could write the same app in a fraction of the time.
There's one part of the app though that I am not sure if Rails/Ruby would
be a good fit... so I'd like to get some expert advice on this one
portion.
There's one part of the application that needs to check a thesaurus
about 100 or 200 times per request. There might be a lot of these
requests. There is also a grammar checker that is going to be used
(although I need to find one that works with Ruby).
I can imagine that these are some very expensive requests. In the Java
world, I can simply put the thesaurus and grammar checker (with rules)
entirely in memory as Java beans in a Spring container. When the app
starts up, we initialize these two services and they just stay in
memory, waiting for requests.
When user requests come in that need to use the thesaurus or grammar
checker, they happen extremely quickly because they are just sitting
in memory the entire time.
Since Ruby is a scripted language, does that mean this would not be
possible? Would there be a way to make these extremely performant
using the Rails paradigm? Or is this a bad fit for Ruby and Rails?
Doesn't sound to me like this would be a problem - it's entirely
possible to set something like this up during app startup.
The only issue you might have is that (unless you go down the JRuby
route) with Ruby you'd have multiple Rails application processes (MRI
never runs more than one thread of Ruby code at once), so each one
would have its own copy of the thesaurus and you might end up using a
lot of memory.
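As a rough illustration of the startup-time approach described above, here is a minimal sketch in plain Ruby. The `Thesaurus` class, its `load` and `synonyms` methods, and the one-entry-per-line data format are all assumptions made for the example, not something from this thread. In a Rails app the load call would typically sit in an initializer so that it runs once per process.

```ruby
require "stringio"

# Hypothetical in-memory thesaurus: parses "word: syn1, syn2" lines once
# and keeps the result in a frozen Hash for the life of the process.
class Thesaurus
  def self.load(io)
    entries = {}
    io.each_line do |line|
      word, list = line.split(":", 2)
      next if list.nil? # skip blank or malformed lines
      synonyms = list.split(",").map(&:strip).reject(&:empty?)
      entries[word.strip.downcase] = synonyms.freeze
    end
    new(entries.freeze)
  end

  def initialize(entries)
    @entries = entries
  end

  # Returns the synonyms for a word, or an empty array if it is unknown.
  def synonyms(word)
    @entries.fetch(word.to_s.downcase, [])
  end
end

# Assumed sample data; a real app would pass File.open(...) instead.
SAMPLE = StringIO.new(<<~DATA)
  happy: glad, cheerful, content
  fast: quick, rapid
DATA

# Loaded once at boot, then shared by every request in this process.
THESAURUS = Thesaurus.load(SAMPLE)

THESAURUS.synonyms("Happy") # a plain Hash lookup, no I/O per request
```

Each MRI process that boots the app would hold its own copy of this Hash, which is the memory cost mentioned above.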
You may want to consider going with a hybrid approach - keep the heavy
lifting in Java, with a lightweight API on top that the Ruby code can
call. There are plenty of examples of this sort of thing - Solr is
precisely this technique applied (with some enhancements) to the
Lucene libraries.
Ruby already has many C APIs that run very fast with a Ruby wrapper
around the C code. Also, you can run a service that responds to all of
this; it would work like memcached does: an extremely fast service with
a Ruby client. I don't recommend having the thesaurus as a pure Ruby
gem. If you can work around that, there will be no difference between
using Rails or Java.
Welcome! I think you wouldn't have to look far to find any number of
ex-Java developers on this list who could provide you with personal
experience supporting your thinking.
I, too, will recommend a hybrid approach, though I suggest looking at
an in-memory database (IMDB) approach to the shared thesaurus. Here's a
link to one approach that looks ready-made for integration with a RoR
app.
There are some really interesting approaches in all of these emails.
I like this one the best. I haven't learned how to set up 2 databases
in Rails... but is this complicated to do? I guess I could make a
database just like this person is doing, put all the thesaurus lookups
in it, and then keep it running just like any other database.
I want to avoid doing a Ruby<->Java bridge if I can. The last thing I
want is some bloated Java process needing 512 megabytes of RAM just to
do its thing without throwing some kind of out-of-PermGen-space or
out-of-memory errors. At least if it's all in Rails, I can just dedicate
all of the extra RAM to Rails and my database.
Are the in-memory databases fairly quick, such as SQLite? I've not
used it, only MySQL. If it can do at least 1000 individual select
statements very quickly (like in less than a second), then that could
work.
If a thesaurus is just a grouping of words then you can use memcached, which is a shared key-value store, fast, and integrated with Rails. You can do Rails.cache.fetch(word){ Words.thesaurus(word) }: the first time it will hit the db and write the result to memcached, and thereafter it will pull it from memcached. Memcached can be clustered and the keys are shared among the servers.
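The read-through pattern behind `Rails.cache.fetch` can be sketched without Rails. In the snippet below, `TinyCache` is a hypothetical stand-in for the memcached-backed store and `SlowWords` stands in for the `Words.thesaurus` database call; only the `fetch(key) { ... }` shape mirrors the real API, and the sample data is made up.

```ruby
# Hypothetical stand-in for Rails.cache: fetch returns the cached value,
# or runs the block once, stores its result, and returns it.
class TinyCache
  def initialize
    @store = {}
  end

  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

# Stand-in for the expensive database-backed lookup; counts its calls
# so the effect of the cache is visible.
class SlowWords
  attr_reader :hits

  def initialize
    @hits = 0
    @data = { "happy" => %w[glad cheerful] } # assumed sample data
  end

  def thesaurus(word)
    @hits += 1
    @data.fetch(word, [])
  end
end

WORDS = SlowWords.new
CACHE = TinyCache.new

FIRST  = CACHE.fetch("happy") { WORDS.thesaurus("happy") } # miss: block runs
SECOND = CACHE.fetch("happy") { WORDS.thesaurus("happy") } # hit: block skipped
```

After both calls `WORDS.hits` is 1, since the second `fetch` never reaches the lookup. With the real memcached store the cached values live outside the Rails processes, so they are not duplicated per process.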
I was also doing some other research, and JRuby had completely slipped
my mind. Is that a reasonable alternative too? Do you guys have any
experience using Rails 3 under JRuby? Is it still easy and
straightforward, or is the mismatch (like getting Rails to run on Tomcat
and using JDBC, among other things) going to be frustrating?
I'm using JRuby for a number of Rails projects. It works very well, with the very occasional glitch (I just had one running Capistrano, which depends on highline, which requires the ffi-ncurses gem under JRuby).
Generally it's pretty good though. And you can just use RVM like you would with anything else, so it's like using MRI, just with a slower startup time in dev mode.
Actually, this might be the best alternative, because the way I would do
it in Java is just load the entire thesaurus into memory (all the
nouns, verbs, adj's, etc.). I didn't care too much about memory, but
I'm reasonably certain that 90% of the values were never hit. I guess
it might be a bit slower at first, but caching the values that end up
getting used a lot is actually a much better use of memory.
I think I'm going to go with Rails. I want to do something different
than Java entirely. It'll be fun for a change. I haven't had any fun
in Java in years. I want to have fun AND make money. With Java
development, it's always been this torturous thing to bear, and money
was the only reason I could find for continuing to do it.
Why don’t you just run a thesaurus app as a standalone ruby application which you connect to via tcp sockets?
This way you only use the memory for the thesaurus once and use rails to send requests to your standalone server?
And, if even necessary, you could look at making it multithreaded…
Andrew
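A standalone thesaurus service along these lines could be sketched with Ruby's standard `socket` library. Everything specific here is an assumption for illustration: the line-based protocol (send a word, get back a comma-separated list), the `start_thesaurus_server` and `lookup` helper names, and the sample data.

```ruby
require "socket"

# Assumed sample data; a real server would load the full word lists once.
SYNONYMS = { "happy" => %w[glad cheerful], "fast" => %w[quick rapid] }.freeze

# Starts a server on a free port chosen by the OS (port 0) and answers
# each "word\n" line with a comma-separated list of synonyms.
def start_thesaurus_server(words)
  server = TCPServer.new("127.0.0.1", 0)
  thread = Thread.new do
    loop do
      client = server.accept
      # One thread per connection, so a slow client does not block others.
      Thread.new(client) do |conn|
        while (line = conn.gets)
          conn.puts(words.fetch(line.strip.downcase, []).join(","))
        end
        conn.close
      end
    end
  end
  [server, thread]
end

# Client side: roughly what the Rails app would call for a lookup.
def lookup(port, word)
  TCPSocket.open("127.0.0.1", port) do |sock|
    sock.puts(word)
    sock.gets.to_s.chomp.split(",")
  end
end

SERVER, SERVER_THREAD = start_thesaurus_server(SYNONYMS)
PORT = SERVER.addr[1] # the port the OS actually assigned

lookup(PORT, "happy")
```

This sketch opens a new connection per lookup for simplicity. With 100 or 200 lookups per request, keeping one connection open and sending several words over it would likely be the better design.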
Well, before, I didn't even have a thesaurus app. I just had 4 text
files and wrote some code to load them up in memory, creating the hash
maps with all the data. I put all of this in a facade/service that hid
it all away, and clients just called 1 simple method in order to get
the list of words that were related.
So if I use memcached, the memory would be used up multiple times?
Nope. What is more awesome is that you can pass an IP (or an array of them) to the config and it will use the memcached on that server (or those servers). So in the future, scaling your app is just a matter of putting the memcached server on another box (check the docs to find out how easy it is to cluster) and restarting your web server. It would take you 3 minutes.
My initial recommendation was based on the assumption that you're
going to need to store the data somewhere and a db is, to me, the
logical place for it. So, to me, memcached is a second step, not a
first. YMMV.
To answer your earlier question about multiple database connections,
it can be as simple as adding an entry for it to database.yml and a
before_filter in application.rb to establish a connection to same for
every Rails instance.