Rails validations and concurrency

I'm finally back to Vitória, after a great time at RubyConfBR, even though I was the target of jokes from Aaron Patterson about RSpec because of the questions I asked David Chelimsky during an unrelated talk of his. But that's ok, it was fun and I deserved it :slight_smile:

When listening to Yehuda's talk about development on the client side, which was really great by the way, I got really worried when he commented about some Rails validations not being concurrent-safe nor even thread-safe.

While I can understand it is hard to guarantee uniqueness validation among different server instances, this can be easily avoided in a single server configuration with config.threadsafe! enabled.

Actually, I can't understand why thread-safe isn't enabled by default in production, since we have much better thread support in MRI 1.9.2 and always had on JRuby.

I've just read the documentation for validates_uniqueness_of and it explains well the problem:

http://api.rubyonrails.org/classes/ActiveRecord/Validations/ClassMethods.html#method-i-validates_uniqueness_of

But I was wondering if Rails could provide some way to avoid dealing with exceptions when using a single multi-threaded server environment. For instance:

@record.validate :lock => @shared_lock do
  # code run if @record.valid?
  ...
  @record.save
end or (render :edit; return)
redirect_to :list
...

The :lock parameter should be optional; when it is not specified, an internal lock (one per model, maybe) should be used, for the case where it is not necessary to share this code with another block that could also affect the record.

An error should be raised if 'config.threadsafe!' is not enabled, and the docs should point out that this won't work for multiple-server setups, to avoid confusing end developers. Maybe a warning instead of an error would suffice, to allow this usage by plugins that don't have control over deployment decisions.

If the user calls 'save' directly without validating first, the validation and save operations should be atomic in that case. So, 'save' should also support the :lock parameter.
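
Just to illustrate the kind of flow I have in mind, here is a rough sketch of how it could be approximated today with a plain Ruby Mutex (this only protects threads inside a single process; the lock constant, model and action names are made up):

require 'thread'

RECORD_LOCK = Mutex.new  # one lock shared by every thread in this process

def update
  @record = Record.find(params[:id])
  @record.attributes = params[:record]

  RECORD_LOCK.synchronize do
    unless @record.valid?
      render :edit
      return  # Mutex#synchronize releases the lock via ensure, even on an early return
    end
    @record.save
  end

  redirect_to :list
end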

Is this reasonable or am I missing something?

Rodrigo.

I'm finally back to Vitória, after a great time at RubyConfBR, even though I was the target of jokes from Aaron Patterson about RSpec because of the questions I asked David Chelimsky during an unrelated talk of his. But that's ok, it was fun and I deserved it :slight_smile:

I think I was poking fun at David, not you. :wink:

When listening to Yehuda's talk about development on the client side, which was really great by the way, I got really worried when he commented about some Rails validations not being concurrent-safe nor even thread-safe.

While I can understand it is hard to guarantee uniqueness validation among different server instances, this can be easily avoided in a single server configuration with config.threadsafe! enabled.

Actually, I can't understand why thread-safe isn't enabled by default in production, since we have much better thread support in MRI 1.9.2 and always had on JRuby.

I've just read the documentation for validates_uniqueness_of and it explains well the problem:

ActiveRecord::Validations::ClassMethods

But I was wondering if Rails could provide some way to avoid dealing with exceptions when using a single multi-threaded server environment. For instance:

@record.validate :lock => @shared_lock do
  # code run if @record.valid?
  ...
  @record.save
end or (render :edit; return)
redirect_to :list
...

The :lock parameter should be optional; when it is not specified, an internal lock (one per model, maybe) should be used, for the case where it is not necessary to share this code with another block that could also affect the record.

A shared lock like this would work if you only had one process running. It won't work for people that run multiple processes as they won't share the lock.

An error should be raised if 'config.threadsafe!' is not enabled, and the docs should point out that this won't work for multiple-server setups, to avoid confusing end developers. Maybe a warning instead of an error would suffice, to allow this usage by plugins that don't have control over deployment decisions.

If the user calls 'save' directly without validating first, the validation and save operations should be atomic in that case. So, 'save' should also support the :lock parameter.

Is this reasonable or am I missing something?

Even people running multi threaded servers run multiple processes. You could use the database connection to create a shared lock.

Though, IMHO if you want to guarantee a unique column, you should add a unique index on the column. The uniqueness validation should work most of the time, and for edge cases, the user could see an exception.
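
For reference, a minimal migration along those lines might look like this (table and column names are just examples):

class AddUniqueIndexToUsersEmail < ActiveRecord::Migration
  def self.up
    # the database will now reject duplicate emails, regardless of how many
    # processes, threads or machines are writing
    add_index :users, :email, :unique => true
  end

  def self.down
    remove_index :users, :email
  end
end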

If you're really paranoid, you could implement it now with something like this (note this is mysql specific):

def shared_lock(name)
  r = AR::Base.connection.execute("SELECT GET_LOCK('#{name}', 2)")
  # ... make sure to check the return value ...
  yield
ensure
  AR::Base.connection.execute("SELECT RELEASE_LOCK('#{name}')")
end

def some_function
  # this function must calculate a value that you can reproduce across
  # servers and processes
end

shared_lock(some_function) do
  my_model.save!
end

Read here for more info:

  http://dev.mysql.com/doc/refman/4.1/en/miscellaneous-functions.html#function_get-lock

There may be a way to do this in a non-database specific way (a lock server or something).

...

When listening to Yehuda's talk about development on the client side, which was really great by the way, I got really worried when he commented about some Rails validations not being concurrent-safe nor even thread-safe.

While I can understand it is hard to guarantee uniqueness validation among different server instances, this can be easily avoided in a single server configuration with config.threadsafe! enabled.

... I've just read the documentation for validates_uniqueness_of and it explains well the problem:

ActiveRecord::Validations::ClassMethods

But I was wondering if Rails could provide some way to avoid dealing with exceptions when using a single multi-threaded server environment. For instance:

@record.validate :lock => @shared_lock do
  # code run if @record.valid?
  ...
  @record.save
end or (render :edit; return)
redirect_to :list
...

The :lock parameter should be optional; when it is not specified, an internal lock (one per model, maybe) should be used, for the case where it is not necessary to share this code with another block that could also affect the record.

A shared lock like this would work if you only had one process running. It won't work for people that run multiple processes as they won't share the lock.

Yes, I know, but for lots of applications a single multi-threaded server would suffice, and Ruby locks are probably much faster than database locks, without their shortcomings.

For instance, I usually use PostgreSQL. There is no LOCK instruction in ANSI SQL, PostgreSQL only supports locking an entire table, and it is not possible to name the locks (maybe an alternative could be the pg_advisory_lock function added in 8.2). So it is probably better to rely on another kind of shared lock between different servers, but that probably won't be as efficient as it can be in a single multi-threaded server, even when using a solution similar to memcached, I mean an in-memory lock server accessed over a socket.
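
Just to sketch the idea (the helper name is made up, and pg_advisory_lock takes numeric keys, so a string name would have to be mapped to an integer first):

def with_advisory_lock(key)
  conn = ActiveRecord::Base.connection
  conn.execute("SELECT pg_advisory_lock(#{key.to_i})")
  yield
ensure
  # advisory locks are also released automatically when the session ends
  conn.execute("SELECT pg_advisory_unlock(#{key.to_i})")
end

with_advisory_lock(42) { @user.save! }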

An error should be raised if 'config.threadsafe!' is not enabled, and the docs should point out that this won't work for multiple-server setups, to avoid confusing end developers. Maybe a warning instead of an error would suffice, to allow this usage by plugins that don't have control over deployment decisions.

If the user calls 'save' directly without validating first, the validation and save operations should be atomic in that case. So, 'save' should also support the :lock parameter.

Is this reasonable or am I missing something?

Even people running multi threaded servers run multiple processes. You could use the database connection to create a shared lock.

Though, IMHO if you want to guarantee a unique column, you should add a unique index on the column. The uniqueness validation should work most of the time, and for edge cases, the user could see an exception.

Yes, the unique index should exist anyway. Maybe it would be great to have a 'db:create_unique_indexes' Rake task that generates a migration adding the missing unique indexes. But for a single multi-threaded server, such a lock would avoid raising the exception in that corner case. And my concerns aren't specific to uniqueness validations only. For instance, imagine a situation where a user needs credits to do something and the system provides the services 'put_credit' and 'consume_credit'.

# current user's credit = 5

# consume_credit is launched
[consume_service] (render :error; return) unless @user.credit > 0
[consume_service] do_consume_credit

# put_credit is launched in another thread
[put_credit_service] @user.credit += 1  # 5 + 1 = 6
[put_credit_service] @user.save

[consume_service] @user.credit -= 1  # 5 - 1 = 4
[consume_service] @user.save         # credit saved as 4

The expected user's credit should obviously be 5 instead of 4. For this simple case, a manual SQL UPDATE would solve the concurrency problem (UPDATE user SET credit = credit + 1). Also, this situation would probably be modeled as 'user has_many credits', in which case it wouldn't happen, but you get the idea.
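
For completeness, that atomic update can also be expressed through Active Record itself, something like this (model and column names assumed):

# Increments the credit column in a single SQL UPDATE, avoiding the
# read-modify-write race entirely
User.update_counters(@user.id, :credit => 1)

# Or roughly the same thing with update_all:
User.update_all("credit = credit + 1", ["id = ?", @user.id])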

The problem could easily be avoided in the above situation with a shared named lock between the two services. In this case, a normal Ruby lock could be used instead of modifying the Rails API to support locks. But the advantage of improving the API is that the user could choose, in a configuration file that could be changed later, between a shared lock across multiple servers (more expensive) and a cheaper Ruby lock. Also, maybe the Rails community could come up with a better implementation of locking, and it would be easier to apply it than to modify all the manually placed locking code.

Maybe we could start with a new Concurrency guide on Rails Guides.

If you're really paranoid, you could implement it now with something like this (note this is mysql specific):

def shared_lock(name)
  r = AR::Base.connection.execute("SELECT GET_LOCK('#{name}', 2)")
  # ... make sure to check the return value ...
  yield
ensure
  AR::Base.connection.execute("SELECT RELEASE_LOCK('#{name}')")
end

I'm just curious. What would happen in this case (MySQL) if the application is killed after the lock is acquired but before the ensure block is executed? The connection would probably be closed. Would this free the lock too?

def some_function
  # this function must calculate a value that you can reproduce across
  # servers and processes
end

shared_lock(some_function) do
  my_model.save!
end

Read here for more info:

   http://dev.mysql.com/doc/refman/4.1/en/miscellaneous-functions.html#function_get-lock

There may be a way to do this in a non-database specific way (a lock server or something).

Maybe a lock server or just temporary file locking would fit better. But then, what would happen in the case I presented above, where the app is killed between acquiring and releasing the lock?

Best regards,

Rodrigo.

I might be wrong in understanding the problem, but I believe I would use transactions to solve the problems you’re describing. I don’t see the need to manage locks manually.

Allen Madsen http://www.allenmadsen.com

If one cares about this edge case, I think the best approach is the one we use for model IDs: let the database take care of it. Set a UNIQUE constraint, and deal with it in the application.

Perhaps you could even wrap AR::Base#save in your application so that if violation of that constraint is detected (assuming the exception has information that allows you to do that), you intercept it and build a regular validation error on the model. Just off the top of my head, have not written it.
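
But a rough sketch might look something like this, assuming a Rails version and adapter that raise ActiveRecord::RecordNotUnique for such violations (the column name and message are made up):

class User < ActiveRecord::Base
  validates_uniqueness_of :email

  def save(*args)
    super
  rescue ActiveRecord::RecordNotUnique
    # turn the constraint violation into a regular validation error
    errors.add(:email, "has already been taken")
    false
  end
end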

Some other folks have. :slight_smile:

https://rails.lighthouseapp.com/projects/8994/tickets/3486

--Matt Jones

Awesome! :slight_smile:

>...
>>When listening to Yehuda's talk about development on the client
>>side, which was really great by the way, I got really worried when
>>he commented about some Rails validations not being concurrent-safe
>>nor even thread-safe.
>>
>>While I can understand it is hard to guarantee uniqueness validation
>>among different server instances, this can be easily avoided in a
>>single server configuration with config.threadsafe! enabled.
>>
>>...
>>I've just read the documentation for validates_uniqueness_of and it
>>explains well the problem:
>>
>>ActiveRecord::Validations::ClassMethods
>>
>>But I was wondering if Rails could provide some way to avoid
>>dealing with exceptions when using a single multi-threaded server
>>environment. For instance:
>>
>>@record.validate :lock => @shared_lock do
>>  # code run if @record.valid?
>>  ...
>>  @record.save
>>end or (render :edit; return)
>>redirect_to :list
>>...
>>
>>The :lock parameter should be optional; when it is not specified, an
>>internal lock (one per model, maybe) should be used, for the case
>>where it is not necessary to share this code with another block that
>>could also affect the record.
>A shared lock like this would work if you only had one process running.
>It won't work for people that run multiple processes as they won't share
>the lock.

Yes, I know, but for lots of applications a single multi-threaded server would suffice, and Ruby locks are probably much faster than database locks, without their shortcomings.

For instance, I usually use PostgreSQL. There is no LOCK instruction in ANSI SQL, PostgreSQL only supports locking an entire table, and it is not possible to name the locks (maybe an alternative could be the pg_advisory_lock function added in 8.2). So it is probably better to rely on another kind of shared lock between different servers, but that probably won't be as efficient as it can be in a single multi-threaded server, even when using a solution similar to memcached, I mean an in-memory lock server accessed over a socket.

So why don't you use a mutex and wrap your saves with that? I don't see a reason to change the AR API for this.

>>An error should be raised if 'config.threadsafe!' is not enabled, and
>>the docs should point out that this won't work for multiple-server
>>setups, to avoid confusing end developers. Maybe a warning instead of
>>an error would suffice, to allow this usage by plugins that don't have
>>control over deployment decisions.
>>
>>If the user calls 'save' directly without validating first, the
>>validation and save operations should be atomic in that case. So,
>>'save' should also support the :lock parameter.
>>
>>Is this reasonable or am I missing something?
>Even people running multi threaded servers run multiple processes.
>You could use the database connection to create a shared lock.
>
>Though, IMHO if you want to guarantee a unique column, you should add a
>unique index on the column. The uniqueness validation should work most
>of the time, and for edge cases, the user could see an exception.

Yes, the unique index should exist anyway. Maybe it would be great to have a 'db:create_unique_indexes' Rake task that generates a migration adding the missing unique indexes. But for a single multi-threaded server, such a lock would avoid raising the exception in that corner case. And my concerns aren't specific to uniqueness validations only. For instance, imagine a situation where a user needs credits to do something and the system provides the services 'put_credit' and 'consume_credit'.

# current user's credit = 5

# consume_credit is launched
[consume_service] (render :error; return) unless @user.credit > 0
[consume_service] do_consume_credit

# put_credit is launched in another thread
[put_credit_service] @user.credit += 1  # 5 + 1 = 6
[put_credit_service] @user.save

[consume_service] @user.credit -= 1  # 5 - 1 = 4
[consume_service] @user.save         # credit saved as 4

The expected user's credit should obviously be 5 instead of 4. For this simple case, a manual SQL UPDATE would solve the concurrency problem (UPDATE user SET credit = credit + 1). Also, this situation would probably be modeled as 'user has_many credits', in which case it wouldn't happen, but you get the idea.

Yes, this would seem reasonable if AR objects were thread safe, but they aren't. If you're going to use a threaded server, you need to know what data structures are thread safe and which ones aren't. Then you need to design your system accordingly.

My point is that we can document what is thread safe and what isn't. Then *you* can figure out how to handle concurrency.

The problem could easily be avoided in the above situation with a shared named lock between the two services. In this case, a normal Ruby lock could be used instead of modifying the Rails API to support locks. But the advantage of improving the API is that the user could choose, in a configuration file that could be changed later, between a shared lock across multiple servers (more expensive) and a cheaper Ruby lock. Also, maybe the Rails community could come up with a better implementation of locking, and it would be easier to apply it than to modify all the manually placed locking code.

You can decide this without changes to the AR api. Just write a method that returns a lock depending on a configuration file:

  def gimme_lock
    case YAML.load_file('/some/config')[Rails.env]['lock_type']
    when 'mutex'
      ...
    when 'database'
      ...
    else
      ...
    end
  end

  def safe_update
    gimme_lock.synchronize do
      yield
    end
  end

  safe_update { mymodel.save }

This seems like a feature easily implemented without the help of rails.

Maybe we could start with a new Concurrency guide on Rails Guides.

Probably documenting which things are thread safe and which things aren't would be good. For now you can assume no AR objects are thread safe. :slight_smile:

>If you're really paranoid, you could implement it now with something
>like this (note this is mysql specific):
>
>def shared_lock(name)
>  r = AR::Base.connection.execute("SELECT GET_LOCK('#{name}', 2)")
>  # ... make sure to check the return value ...
>  yield
>ensure
>  AR::Base.connection.execute("SELECT RELEASE_LOCK('#{name}')")
>end

I'm just curious. What would happen in this case (MySQL) if the application is killed after the lock is acquired but before the ensure block is executed? The connection would probably be closed. Would this free the lock too?

Yes. MySQL frees the lock when the connection is closed:

  http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_get-lock

>def some_function
>  # this function must calculate a value that you can reproduce across
>  # servers and processes
>end
>
>shared_lock(some_function) do
>  my_model.save!
>end
>
>Read here for more info:
>
>  http://dev.mysql.com/doc/refman/4.1/en/miscellaneous-functions.html#function_get-lock
>
>There may be a way to do this in a non-database specific way (a lock
>server or something).

Maybe a lock server or just temporary file locking would fit better. But then, what would happen in the case I presented above, where the app is killed between acquiring and releasing the lock?

That's something you need to work out with your lock server. :slight_smile:

Ok, I realize now that the Rails core team has no interest in integrating concurrency features into Rails, so maybe I'll create a plugin that adds concurrency helpers to Rails (mostly AR) when I need them.

Thanks for the feedback,

Rodrigo.

I may not have read thoroughly enough what you're trying to achieve. My understanding of concurrency control within Rails is that it would be mostly pointless given the current processing model where one process handles a single request at a time. Things are different if multiple threads (or fibers) are running in the same process, but that's far from mainstream Rails usage, as far as I can tell.

In production, any but the most trivial Rails application will run across multiple application server processes. The necessary(!) concurrency control among these can only be handled at the database level. First and foremost by using transactions and appropriate constraints. Occasionally with the help of pessimistic locks.

In an earlier message you wrote in favor of running a Rails app in a single process with multiple threads. I'm curious if you have considered, or even measured the trade-offs. Apparently, common assumptions are that resource consumption is lower for threads than for processes and switching among threads is faster than among processes. I don't know how significant these differences are in practice if at all. At least on Linux, threads and processes mark two points on a continuum, the real difference being what kind of resources are shared. Passenger makes good use of this by establishing a tree of processes who share (copy-on-write) common memory pages.

Also well worth considering is that a failure in a single thread can bring down all threads in the same process, whereas processes are much better isolated against each other.

By all this I don't mean to imply that processes are obviously the superior alternative to threads. I'm only trying to point out that the reverse isn't true: threads are not obviously better than processes. (And event-driven servers aren't obviously better than thread- or process-based ones -- but that's another discussion.)

Michael

Ok, I realize now that the Rails core team has no interest in integrating concurrency features into Rails, so maybe I'll create a plugin that adds concurrency helpers to Rails (mostly AR) when I need them.

We do have interest in concurrency, but we definitely don't have interest in adding a solution that would only work if you are deploying on one machine and it will bite you when you add another one. This proposal is even less attractive if we consider we have a better solution available as a patch in Lighthouse.

Don't treat a negative answer from the people involved in this discussion to your solution as lack of interest in the subject.

... So why don't you use a mutex and wrap your saves with that? I don't see a reason to change the AR API for this. ...

Ok, I realize now that the Rails core team has no interest in integrating concurrency features into Rails, so maybe I'll create a plugin that adds concurrency helpers to Rails (mostly AR) when I need them.

I may not have read thoroughly enough what you're trying to achieve. My understanding of concurrency control within Rails is that it would be mostly pointless given the current processing model where one process handles a single request at a time. Things are different if multiple threads (or fibers) are running in the same process, but that's far from mainstream Rails usage, as far as I can tell.

In production, any but the most trivial Rails application will run across multiple application server processes. The necessary(!) concurrency control among these can only be handled at the database level. First and foremost by using transactions and appropriate constraints. Occasionally with the help of pessimistic locks.

In an earlier message you wrote in favor of running a Rails app in a single process with multiple threads. I'm curious if you have considered, or even measured the trade-offs. Apparently, common assumptions are that resource consumption is lower for threads than for processes and switching among threads is faster than among processes. I don't know how significant these differences are in practice if at all. At least on Linux, threads and processes mark two points on a continuum, the real difference being what kind of resources are shared. Passenger makes good use of this by establishing a tree of processes who share (copy-on-write) common memory pages.

I've already benchmarked it, and the threaded model certainly consumes much less memory than a multi-process setup. Thread/process switching probably won't make any significant difference on Linux, for instance, but deployment cost is usually dominated by memory consumption (VPS plans tend to be priced by available memory).

Even REE (or the copy-on-write feature) won't reduce memory usage as much as the threaded model does.

Also well worth considering is that a failure in a single thread can bring down all threads in the same process, whereas processes are much better isolated against each other.

Yes, writing thread-safe code is harder, but I would like to take the risk...

By all this I don't mean to imply that processes are obviously the superior alternative to threads. I'm only trying to point out that the reverse isn't true: threads are not obviously better than processes. (And event-driven servers aren't obviously better than thread- or process-based ones -- but that's another discussion.)

Michael

Actually, deciding whether one approach is better than another depends on your criteria. In my case, I'm using two:

1. I want memory consumption to be as low as possible (well, not strictly; I wouldn't mind using some memory for memcached, for instance).
2. I would like to be able to deal with mutexes more easily, using minimal CPU and IO resources.

I believe that for these criteria the threaded model is way better, but I may be missing something, and I'll be glad if someone can point that out.

Best regards,

Rodrigo.

Hi José,

First, I'm sorry if I sounded rude. That was not my intention, but it seems I'm not good with words. I was just trying to say that I was quitting the thread because it seemed the Rails core team already had a solid opinion about the subject and I didn't want to bother them with my thoughts. Please understand that I had the best of intentions, and I'm sorry if I made myself misunderstood.

Now, regarding your first statement, I don't see it as a bad thing to have some features that are only possible in certain setups (like a single multi-threaded server). It won't hurt those using a different setup, but it can work great if your setup matches the one that makes those features possible (or easier).

Regarding the patch you mentioned, I guess you are talking about the uniqueness validation patch... Maybe I shouldn't have used this thread subject or mentioned the uniqueness validation example. My real intention was to bring up the subject of concurrency integration rather than the isolated uniqueness validation case, but it seems most of the replies to this thread focused on uniqueness validation issues, which was not my intention. Aaron Patterson and Michael Schuerig may have understood my intention better.

Again, sorry if I was rude. I guess I need to go back to my English classes as soon as possible... Changing the subject completely, I was setting up a new application yesterday and decided to try Devise. Although I missed i18n in generated views, I found your gem to be really great! Thank you!

Best Regards,

Rodrigo.

FWIW, I think dealing with concurrency issues in Rails is very important. I just think that in this case, we would be better served by having some sort of gem that could provide proper mutex and locking support based on your server configuration.

Once we have an easy-to-use, well-vetted solution, *then* fold it into the AR API.

Ok, now I get it, Aaron. Great!

Thank you,

Rodrigo.

I think there's too much misinformation and too many misunderstandings going on in this thread.

First, "Rails validations" aren't thread-unsafe or concurrency-unsafe, only validates_uniqueness_of potentially is. The problem has been known for years. In fact, a few years back I've explicitly documented the problem in detail, as well as possible solutions and a cost- benefit analysis of each solution. The documentation can be found in what I thought is the most obvious place to look: inside the API documentation for validates_uniqueness_of: http://apidock.com/rails/ActiveRecord/Validations/ClassMethods/validates_uniqueness_of

Some people here propose using mutexes, and other people quickly point out that mutexes are of no use in multi-process setups. People then try to advocate the use of multi-threaded setups over multi-process setups. In my opinion, any kind of solution involving either inter-thread locks (i.e. mutexes) or interprocess locks (e.g. lock files) is not the way to go, for one reason: neither of them works in multi-machine setups.

I am not advocating the superiority of multithreaded vs multiprocess vs evented or whatever. But any kind of good solution *must* work across multiple machines. That really only leaves 2 possible solutions:
- Inter-machine locks, e.g. a database lock.
- Optimistic concurrency control.

The preferred solution that I've documented at http://apidock.com/rails/ActiveRecord/Validations/ClassMethods/validates_uniqueness_of involves the use of unique indices and is a form of database-aided optimistic concurrency control. Unique indices:
- Are universal and a lowest common denominator. They work on every single SQL database, on every operating system, on any combination of multithreaded/multiprocess/multiple-machine setups.
- Are well-understood.
- Are easy to create.
- Are zero-maintenance. Lock files have to be set up during deployment and monitored.

The only downside of unique indices, as currently utilized by ActiveRecord, is the extremely small chance that you run into a race condition during a #save action. This race condition will result in an exception, but under no circumstances will duplicate records be saved into the database. In my opinion, the chance that such an exception occurs (which is bad for end-user usability) is so small that it's usually not worth the effort to check for it and handle it gracefully (i.e. in a manner other than telling the user that some internal server error occurred), though you can still do that yourself if you so desire. This too is documented at http://apidock.com/rails/ActiveRecord/Validations/ClassMethods/validates_uniqueness_of

Pretty much everything that is currently being discussed in this thread is covered by the documentation that I wrote, and it is for that reason that I think this issue does not warrant further discussion. Though better handling of the unique index race condition in AR would be nice.

The patch mentioned in the thread will be taken into account for sure. I think it would be great to have this robustness and at the same time be able to deal with it as if it were an ordinary validation error, assuming it is doable in a robust and portable way (I have not studied the patch in detail).

In such a solution there's a potential leak, I think: although the violation would be presented as an ordinary validation error, before_save and friends would still run. Nowadays those callbacks are allowed to assume the model is valid at that point.

In existing code in general that won't matter because there's a transaction going on, but in theory they could have side-effects not covered by the transaction.

That could be documented for future code, and I guess breakage in existing apps would be rare.

If you do things in before_save that cannot be put into the same database transaction (e.g. if you're modifying files on the filesystem) then yes, you'll still run into trouble. In that case there really is no other way than using a locking server, e.g. the database server. However, that is a fairly complex and non-standard case, and I do not think it would be warranted to pollute the validates_uniqueness_of documentation with references to that problem. Instead, a separate document dealing with common concurrency issues should be written.