Ruby on Rails performance on lots of requests and DB updates per second

I’m developing a polling application that will deal with an average of 1000-2000 votes per second coming from different users. In other words, it’ll receive 1k to 2k requests per second with each request making a DB insert into the table that stores the voting data.

I’m using RoR 4 with MySQL and planning to push it to Heroku or AWS.

What performance issues related to database and the application itself should I be aware of?

How can I address this amount of inserts per second into the database?

EDIT

I was thinking of not inserting into the DB for each request, but instead writing the insert data to an in-memory stream. A scheduled job running every second would then read from this stream and generate a bulk insert, avoiding an atomic insert per request. But I can’t think of a clean way to implement this.
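One way the buffered idea could be sketched in plain Ruby: requests push rows onto a thread-safe queue, and a once-per-second job drains it into a single multi-row INSERT. The table and column names (`votes`, `poll_id`, `option_id`) are hypothetical, and a real version would flush through the MySQL driver with proper escaping rather than build a raw SQL string; `to_i` here only works because the values are integers.

```ruby
class VoteBuffer
  def initialize
    @queue = Queue.new # Queue is thread-safe, so request threads can push concurrently
  end

  # Called from the controller action instead of Vote.create
  def push(poll_id:, option_id:)
    @queue << [poll_id, option_id]
  end

  # Called by the once-per-second job. Returns one multi-row INSERT
  # statement, or nil when there is nothing to flush.
  def flush_sql
    rows = []
    begin
      rows << @queue.pop(true) while true
    rescue ThreadError
      # pop(true) raises ThreadError on an empty queue: we're done draining
    end
    return nil if rows.empty?

    values = rows.map { |poll_id, option_id| "(#{poll_id.to_i}, #{option_id.to_i})" }
    "INSERT INTO votes (poll_id, option_id) VALUES #{values.join(', ')}"
  end
end
```

The main caveat with an in-process buffer is durability: up to a second of votes lives only in dyno memory, so a crash or restart loses them.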

On Heroku your most important bottleneck (although not the only one) is the average response time of your requests. You want all your response times under 500ms, ideally under 200ms.

See this document for an explanation:

https://devcenter.heroku.com/articles/request-timeout

This is the most important thing you should worry about.

The performance of the database, and its proximity to the Heroku dynos, are also important, but those can be optimized by getting a bigger database.

Moving anything long-running into a job queue is definitely the way to go. Generally you do this with a Resque (or Delayed Job) back-end, and as of Rails 4.2 you can use the ActiveJob abstraction to define your job classes. Most of the time these queues use a Redis back-end, which fortunately for you is really, really fast.

As far as “bulk” operations go, you would have to write some logic yourself. You may want to experiment with a separate Redis instance (separate from the one keeping track of the job queue) as your temporary data store, then have your jobs do bulk reads from Redis and move the data into MySQL.
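A rough sketch of that bulk-move pattern, with the Redis client injected so anything answering `lpush`/`rpop` works (the list key, the `limit`, and the idea of handing the batch to a bulk-insert helper such as activerecord-import’s `Vote.import` are all assumptions, not a fixed API):

```ruby
require "json"

class BulkVoteMover
  LIST_KEY = "pending_votes" # hypothetical list name

  def initialize(redis)
    @redis = redis # anything answering lpush/rpop, e.g. Redis.new in production
  end

  # Called per request: one cheap O(1) Redis push, no MySQL round-trip.
  def record(vote_attrs)
    @redis.lpush(LIST_KEY, JSON.generate(vote_attrs))
  end

  # Called by the recurring job: drain up to `limit` votes and return
  # them as one batch for a bulk insert into MySQL.
  def drain(limit = 10_000)
    batch = []
    while batch.size < limit && (raw = @redis.rpop(LIST_KEY))
      batch << JSON.parse(raw)
    end
    batch
  end
end
```

Because `LPUSH`/`RPOP` work the list as a FIFO, votes come back out in arrival order, and capping each drain keeps any single bulk insert from growing unbounded.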

Check out this tool for load testing. I’ve found it slightly hard to work with, but it is very powerful:

https://www.blitz.io

In particular, if you can use it to measure your average response times on Heroku, you’ll want to make sure your response times don’t degrade at scale. Make sure you have a good understanding of Heroku’s random (aka “dumb”) routing and why scale creates request queuing.

-Jason

it'll receive 1k to 2k requests per second with each request making a DB
insert into the table that stores the voting data.

I'm using RoR 4 with MySQL and planning to push it to Heroku or AWS.

Heroku offers PostgreSQL only, so you might want to switch over
in development to avoid any incompatibilities.

How can I address this amount of inserts per second into the database?

This seems like a perfect example of premature optimization :slight_smile:

If you're really concerned, set up a test app on Heroku and fire up
jmeter or ab or something and see exactly how it performs. You may
find you have nothing to worry about.
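A quick smoke test with ab might look like this (the app URL and the vote.json payload file are placeholders, and you’d run it against a throwaway test app, not production):

```shell
# 10,000 POSTs total, 100 concurrent, each posting a small JSON vote payload
ab -n 10000 -c 100 -p vote.json -T application/json \
  https://your-test-app.herokuapp.com/votes
```

The summary ab prints (requests per second, mean time per request, the latency percentile table) tells you directly whether the 1-2K req/sec target is in reach.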

FWIW,

Heroku offers PostgreSQL only, so you might want to switch over
in development to avoid any incompatibilities.

That’s not entirely true; Heroku offers a MySQL add-on through ClearDB. You can also use Heroku with an Amazon RDS instance (obviously additional setup is required).

Since scale is an issue, you might want to test Postgres and MySQL themselves for the specific operations you’ll be doing (at scale), then choose the database based on the results of that test.

How can I address this amount of inserts per second into the database?

This seems like a perfect example of premature optimization :slight_smile:

If you’re really concerned, set up a test app on Heroku and fire up
jmeter or ab or something and see exactly how it performs. You may
find you have nothing to worry about.

Yes, I agree, although if it actually is thousands of web requests per second it will require more than 1 dyno, or a larger Heroku dyno (like the Performance dynos), and it certainly will hit some bottlenecks at some point.

I wouldn’t agree that thinking about how an app is going to scale (and how to architect it so that it will scale) is necessarily premature optimization. Over-architecting for scale before load testing to determine where the bottlenecks are certainly could lead to premature optimization (in particular, optimizing things that don’t need to be optimized). I think the point is: make sure you can’t disprove the hypothesis “If I optimize X, I will see performance gain Y” for any specific part of the stack you might optimize. If you can’t disprove that hypothesis (and of course you must actually try to disprove it), it is reasonable to conclude you have identified the bottleneck and can justify spending time optimizing that area of the stack.

Since no one else mentioned it, I would add that New Relic will help you significantly here. You will need the New Relic “plus” or “premium” plan so you can drill down into individual requests and see where the bottlenecks are.

By “ab” do you mean “Apache Bench”?

I think both of those tools will generate requests from the programmer’s connection. One thing I like about Blitz is that it can hit your website with thousands of concurrent requests from different data centers across the planet, so you can see how it performs differently depending on geographic region.

-Jason

Heroku offers PostgreSQL only

That's not entirely true, Heroku offers an addon through ClearDB for MySQL.

I hadn't looked through the addons for a while - good to know.

Also you can use Heroku with an Amazon RDS instance too (obviously
additional setup required)

Of course there are multiple options if you're not trying to keep the
deployment within Heroku's sphere, but that kind of gets away from
the whole 'no-admin-work' reason to use Heroku in the first place :slight_smile:

By "ab" do you mean "Apache Bench"?

Yes.

I think both those tools will generate request from the programmer's
connection.

Mostly; JMeter does have the ability to run via multiple distributed
engines so you can use any systems you have available.

Interestingly, at just 1K votes per second sustained for 10 hours, you’d record 36 million votes, roughly one for the entire population of California! At 2K/sec for 12 hours, you’d record over 86 million, more than the populations of California, Texas, and New York combined!
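The back-of-the-envelope arithmetic behind those comparisons:

```ruby
# Votes accumulated at a sustained rate over a voting window
def total_votes(rate_per_sec, hours)
  rate_per_sec * hours * 3600 # 3600 seconds per hour
end

total_votes(1_000, 10) # => 36_000_000 votes in a 10-hour day at 1K/sec
total_votes(2_000, 12) # => 86_400_000 votes in a 12-hour window at 2K/sec
```
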

As for the question at hand, depending on how long you expect your voting to run, you may want to make sure your database is set up well enough to handle inserts at that speed when you’re pushing past 80M votes, especially considering that the biggest rush probably occurs toward the end of a voting period.

Don’t implement that, certainly not as a first thing. Build something straightforward that does what you intend (collecting poll results), then load-test it. Building a hyper-scalable DB layer isn’t going to do much good until you’re sure the layers in front of it (app servers, etc) can handle the load.

You may also want to consider what part of these results needs to be durable. For instance, if an exact “user -> vote” mapping isn’t needed you could hack something together with Redis and its INCR command.
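If bare per-option tallies really are enough, that INCR approach could be as small as the sketch below. The key-naming scheme is my assumption; the client is injected, so the real redis-rb client (which answers `incr`/`get` with these signatures) drops straight in.

```ruby
class VoteTally
  def initialize(redis)
    @redis = redis # e.g. Redis.new in production
  end

  # Each vote is one atomic INCR in Redis: no row inserted anywhere.
  def vote!(poll_id, option_id)
    @redis.incr("poll:#{poll_id}:option:#{option_id}")
  end

  def count(poll_id, option_id)
    @redis.get("poll:#{poll_id}:option:#{option_id}").to_i # nil.to_i => 0 for untouched keys
  end
end
```

The trade-off is exactly the one Matt raises: you keep fast, contention-free counters but lose the durable user-to-vote mapping (and with it deduplication and auditing).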

–Matt Jones