BackgrounDRb 1.0 pre-release available now

Hi Folks,

We are glad to announce a shiny new release of BackgrounDRb, which will
soon become 1.0.

A quick summary of changes:

- BackgrounDRb is no longer DRb; it's based on the event-driven network
  programming library packet ( http://www.packet.googlecode.com ) .

- Since we moved to packet, many nasty thread issues and result hash
  corruption issues are totally gone. Lots of work has gone into
  making the scheduler rock solid.

- Each worker still runs in its own process, but each worker has an
  event loop of its own, and all the events are triggered by the internal
  reactor loop. In a nutshell, you are not encouraged to use threads
  in your workers now. All the workers are already concurrent, but you
  are encouraged to use co-operative multitasking rather than
  pre-emptive. A simple example:

  To implement something like a progress bar in the old version of bdrb, you would:
    - start your processing in a thread (so that your worker can receive
      further requests from rails) and have an instance
      variable (protected by a mutex) which is updated on progress and
      can be sent to rails.

  With the new backgroundrb, a progress bar would be:
    - process your damn request and just use register_status() to
      register the status of your worker. Just because
      you are doing some processing doesn't mean that your worker will
      block. It can still receive requests from rails.
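
The two progress bar styles above can be sketched roughly like this. This
is plain Ruby, not the bdrb API: only the name register_status() comes from
the announcement, and StatusStub is a hypothetical stand-in for the real
worker base class, whose actual signature is not shown in this post.

```ruby
require "thread"

# Old style: run the work in a thread and guard the progress counter with a
# mutex so that another caller can read it safely while work is in flight.
class OldStyleWorker
  def initialize
    @progress = 0
    @lock = Mutex.new
  end

  def start(items, &work)
    @thread = Thread.new do
      items.each do |item|
        work.call(item)
        @lock.synchronize { @progress += 1 }
      end
    end
  end

  def progress
    @lock.synchronize { @progress }
  end

  def wait
    @thread.join
  end
end

# Hypothetical stand-in for the bdrb worker base class (an assumption made
# for illustration). It only remembers the last registered status.
class StatusStub
  attr_reader :status

  def register_status(value)
    @status = value
  end
end

# New style: no thread and no mutex, just report status as you go.
class NewStyleWorker < StatusStub
  def process(items)
    items.each_with_index do |item, index|
      yield item
      register_status(index + 1)
    end
  end
end

old_worker = OldStyleWorker.new
old_worker.start([1, 2, 3]) { |item| item * 2 }
old_worker.wait

new_worker = NewStyleWorker.new
new_worker.process([1, 2, 3]) { |item| item * 2 }
```

In the stub nothing reads the status while process() is running; in bdrb
that part is handled by the worker's event loop, which the sketch does not
try to reproduce.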

- Now, you can schedule multiple methods with their own triggers.

- Inside each worker, you can start a tcp server or connect to an
  external server. Two important methods available in all workers are:

   start_server("localhost",port,ModuleName)
   connect("localhost",port,ModuleName)

  A connected client or outgoing connection is integrated with the
  event loop, and you can process requests from these guys
  asynchronously. This mouse trap lets you build truly
  distributed workers across your network.
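
To give a feel for what "a module integrated with the event loop" means,
here is a toy reactor in plain Ruby built on IO.select. It is only a sketch
of the idea, not packet's or bdrb's code: TinyReactor, UpcaseHandler and
the receive_data callback name are assumptions made for illustration.

```ruby
require "socket"

# Hypothetical handler module, in the spirit of the ModuleName argument
# above. The callback name receive_data is an assumption.
module UpcaseHandler
  def receive_data(data)
    write(data.upcase)
  end
end

# Toy reactor: accepts clients, mixes the handler module into each client
# socket, and calls receive_data whenever a socket becomes readable.
class TinyReactor
  def initialize(host, port, handler)
    @server = TCPServer.new(host, port)
    @handler = handler
    @clients = []
  end

  # Port actually bound (useful when 0 was passed to pick a free port).
  def port
    @server.addr[1]
  end

  # One pass of the select loop.
  def tick(timeout = 1)
    ready, = IO.select([@server] + @clients, nil, nil, timeout)
    return unless ready

    ready.each do |io|
      if io.equal?(@server)
        client = @server.accept
        client.extend(@handler)
        @clients << client
      else
        begin
          io.receive_data(io.readpartial(4096))
        rescue EOFError
          @clients.delete(io)
          io.close
        end
      end
    end
  end
end

reactor = TinyReactor.new("127.0.0.1", 0, UpcaseHandler)
client = TCPSocket.new("127.0.0.1", reactor.port)
client.write("ping")
reactor.tick                      # first pass accepts the connection
reactor.tick                      # second pass reads "ping" and replies
reply = client.readpartial(4096)
client.close
```

Every socket lives in the same select loop, so a single process serves
many connections without threads.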

The detailed list of changes can be found here:

http://backgroundrb.rubyforge.org/

Please give it a try and let me know if you find any bugs.

Hi

Does this mean that slave/daemons are no longer dependencies?

Yes, they're gone. bdrb no longer depends on slave and daemons.

By 'not encouraged' do you mean that 1.0 does not support multiple
threads in the worker, or is it just general guidance?

Could you please comment on how you would approach the following
scenario with 1.0? Currently, we have a worker that creates threads
that process financial payment transactions. An http request sends
tens or hundreds of payment transaction records. They are handled by
a single worker instance. Within the worker there is a pool of
threads whose size is calculated from the number of
transactions. For example, for 200 transactions there will be 20
threads, where each thread handles 10 requests in sequence. Each
transaction takes about 3-5 seconds, so our throughput is
significantly improved by internal worker parallelization with a
thread pool. The worker periodically updates a custom backgroundjob
database record, so that a following ajax request from the client can
read the status of the worker process. The job is identified by the
worker key that is stored in the session.

It's not encouraged, that's all. You can still have threads in your
workers. However, I am planning to add a thread pool feature to bdrb
itself, which should simplify things a bit.
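
For reference, the kind of pool described in the question (200 transactions
split across 20 threads, 10 per thread, each handled in sequence) can be
sketched in plain Ruby as below. This is not the planned bdrb feature; the
method name run_in_thread_pool and the doubling "handler" are made up for
the example.

```ruby
require "thread"

# Splits the transactions into slices of per_thread items and gives each
# slice its own thread. Results and the completed counter are updated
# under a mutex so a status reader would always see consistent values.
def run_in_thread_pool(transactions, per_thread = 10, &handler)
  results = Array.new(transactions.size)
  completed = 0
  lock = Mutex.new

  threads = transactions.each_with_index.each_slice(per_thread).map do |slice|
    Thread.new do
      slice.each do |transaction, index|
        outcome = handler.call(transaction)
        lock.synchronize do
          results[index] = outcome
          completed += 1
        end
      end
    end
  end

  threads.each(&:join)
  { :results => results, :completed => completed, :threads => threads.size }
end

# Stand-in for the real payment call, which would block for a few seconds.
summary = run_in_thread_pool((1..200).to_a) { |amount| amount * 2 }
```

The completed counter is the value a worker would periodically write to
its status record.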

Also ideally, when using event-driven network programming, you want all
your sockets within the select loop for efficiency. So, you wouldn't need
any damn threads if you can use an HTTP handler that works in an evented
manner. What I mean to say is, you don't do this:

a = Net::HTTP.get(URI.parse("http://www.google.com/"))

but you do,

BackgrounDRb::HTTP.open("http://www.google.com") do |data|
  process_data(data)
end

What I am trying to illustrate is that when you ask to open the google.com
page, the evented model allows you to attach a callback (the block in this
case), which will be called when data arrives from google.com, rather
than waiting for it in a thread. So, BackgrounDRb::HTTP.open() returns
immediately. And you are concurrent as hell.
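
The control flow of such a callback can be shown with a toy class. No real
network I/O happens here: ToyEventedHTTP is a hypothetical stand-in, the
"response" is a canned string, and run() plays the role of the reactor
loop.

```ruby
# Toy stand-in for an evented HTTP client (an assumption for illustration,
# not a bdrb or packet class).
class ToyEventedHTTP
  def initialize
    @pending = []
  end

  # Registers the callback and returns immediately, without fetching.
  def open(url, &callback)
    @pending << [url, callback]
    nil
  end

  # Stands in for the reactor loop: hands data to each waiting callback.
  def run
    until @pending.empty?
      url, callback = @pending.shift
      callback.call("response from #{url}")
    end
  end
end

log = []
http = ToyEventedHTTP.new
http.open("http://www.google.com") { |data| log << data }
log << "open returned"   # reached before the callback has fired
http.run
```

The order of entries in log shows the point: open() comes back first, and
the block runs later, when the loop delivers the data.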

But this is not possible, because if you are charging cards, then you
are probably using ActiveMerchant, which uses Net::HTTP, which
blocks when you make a request. But trust me, writing a simple http client
is not that difficult, and there is already connect() available in all
workers.

How does this work with fastcgi or multiple mongrel based engines, where it
is not guaranteed that the next request hits the same process? We
are using custom database tables and code for sharing the status
information now, but I was wondering whether the plumbing includes
something to address this.

That's no problem at all. BackgrounDRb is a TCP server, so if you have
followed the README file, then no matter which machine you are making
the request from, if you specify worker X it's guaranteed to hit
the same worker (with an optional job_key if you are starting your worker
dynamically).

With the old version it was fairly straightforward to
test workers, but that broke at some point. Could you give any
pointers on writing tests for workers in the new version?

Hi Brandon,

update your bdrb copy from svn and run rake backgroundrb:setup and you
should have a RAILS_ROOT/test/bdrb_test_helper.rb file.

Now, all your worker test cases can go in RAILS_ROOT/test/unit
directory, just make sure that you require bdrb_test_helper file, and
you can write test cases.

For example:

require File.join(File.dirname(__FILE__), "..", "bdrb_test_helper")
require "god_worker"

context "When god worker starts" do
  setup do
    @god_worker = GodWorker.new
  end
end

I hope this helps.

Hemant, this looks great. Could one use BackgrounDRb to have workers
interact programmatically with a remote telnet service? Or would I
simply start a worker that does this interaction via a
shell/spawn/telnet/expect...

Great doco too, thanks.

George

Sure as hell.. with any tcp service actually, in an evented manner.
However, that area is not polished ( no one ever asked. :) )

@Hemant :

Has this been tested on Windows? If so, are there known issues? Previous versions did not work on Windows, although the original version did.

Hi

We are actually one of the ActiveMerchant providers (E-Xact), so
strictly we are talking about what is actually behind ActiveMerchant. There
are many protocols involved in financial networks, depending on where the
transaction is routed. We are very familiar with the Reactor engines and
patterns you are advocating, and they work great, especially in
uniform scenarios without throttling, sequencing etc. In our case, I
don't see a clear gain, I'm afraid, while a thread pool was done in
no time and is dead simple to maintain, test etc.

Cool. You can use your existing approach provided you handle your threads
with as much care. I will get back to this in some time. There are other
ways too that I am looking at, for example co-routines ( on top of
fibers ) from Ruby 1.9. Just watch the bdrb mailing list, or submit a
patch. I guess you guys are already running a somewhat customized
version of bdrb.

Our Rails cluster runs bdrb on each Rails server and uses domain
sockets. This is to avoid a single point of failure and to have a uniform
architecture. Would that work too? That is, does bdrb now work sort
of like memcache, where each server knows of every other instance? But
even with that in place, in fastcgi for example, the fastcgi processor may
recycle the Rails process where the callback has been registered.

Hmm, this is cool. So, how did you handle this situation earlier?
Probably what you can do is have bdrb instances running on each cluster,
each with a cluster-specific backgroundrb configuration file, so that
requests from mongrels running on cluster1 will be served by bdrb
running on cluster1 only. Then update some db/memcache key to indicate
it, so that even if the next request goes to another worker on another
machine, you know the state.
Again, I would love any patches or ideas from you. I am working on
something like this myself, which would avoid logging to db and stuff.

No, it won't work on Windows. Even though I removed "slave", we still need
unix domain sockets for internal communication, which are not available
on Windows.
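
To make the dependency concrete: a master talking to a forked worker over a
unix domain socket pair looks roughly like this in plain Ruby. This is only
an illustration of the mechanism, not packet's actual internals, and the
"charge_card 42" message format is made up. It runs on Unix-like systems
only, since it relies on fork and UNIXSocket.

```ruby
require "socket"

# A connected pair of unix domain sockets: one end per process.
master_end, worker_end = UNIXSocket.pair

pid = fork do
  # Worker process: read one request line, answer it, and exit.
  master_end.close
  request = worker_end.gets.chomp
  worker_end.puts("done: #{request}")
  worker_end.close
  exit!(0)
end

# Master process: send a request and wait for the worker's reply.
worker_end.close
master_end.puts("charge_card 42")
reply = master_end.gets.chomp
master_end.close
Process.wait(pid)
```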

@Hemant:

Thank you. That’s what I suspected.

Hi,

Looking forward to a chance to use this library. Thanks for the work!

Hi

How does this affect the licensing of BackgrounDRb (not to mention the
name of the project :-)? The packet library is GPLv2 (the url doesn't
have the leading www by the way), while BackgrounDRb is dual licensed
with the Ruby License or an MIT license.

Damn, I realized it after posting the message. But then I thought "packet"
may be irrelevant anyway ( to rails guys, I mean ).

Regarding license issues: since packet is dual licensed under GPL2 and
Ruby, you can take shit from packet, embed it in your app and forget
that it's under GPL2, since the Ruby license allows you to do that. There is a
clause in the Ruby license that says:

"place your modifications in the Public Domain or otherwise
make them Freely Available, such as by posting said
modifications to Usenet or an equivalent medium, or by allowing
the author to include your modifications in the software."

So, I guess it's OK to have that.

Sorry about the wrong link; the correct one is: http://packet.googlecode.com

Well, I think one of the strengths of packet is that it lets you write
workers tightly integrated with the master process. This way, you can
offload blocking tasks to these workers, which run in parallel, and
keep processing further requests in the master.

And it's pure ruby.