Rails validations and concurrency

I'm finally back in Vitória, after a great time at RubyConfBR, even though I was the target of jokes from Aaron Patterson for asking David Chelimsky RSpec questions during an unrelated talk of his. But that's ok, it was fun and I deserved it :slight_smile:

When listening to Yehuda's talk about development on the client side, which was really great by the way, I got really worried when he commented about some Rails validations not being concurrent-safe nor even thread-safe.

While I can understand it is hard to guarantee uniqueness validation among different server instances, this can be easily avoided in a single server configuration with config.threadsafe! enabled.

Actually, I can't understand why thread-safe mode isn't enabled by default in production, since we have much better thread support in MRI 1.9.2 and JRuby has always had it.

I've just read the documentation for validates_uniqueness_of and it explains well the problem:

http://api.rubyonrails.org/classes/ActiveRecord/Validations/ClassMethods.html#method-i-validates_uniqueness_of

But I was thinking if Rails could provide some way for avoiding dealing with exceptions when using a single multi-thread server environment. For instance:

@record.validate :lock => @shared_lock do
   # code run if @record.valid?
   ...
   @record.save
end or (render :edit; return)
redirect_to :list
...

The :lock parameter should be optional; when it is not specified, an internal lock (one per model, maybe) should be used for the cases where this code does not need to be shared with another block that could also affect the record.

An error should be raised if 'config.threadsafe!' is not enabled, and the docs should point out that this won't work for multiple-server setups, to avoid confusing end developers. Maybe a warning instead of an error would suffice, to allow this usage by plugins that don't have control over deployment decisions.

If the user calls 'save' directly without validating first, the validation and save operations should be atomic. So 'save' should also support the :lock parameter.

Is this reasonable or am I missing something?

Rodrigo.

I'm finally back in Vitória, after a great time at RubyConfBR, even
though I was the target of jokes from Aaron Patterson for asking
David Chelimsky RSpec questions during an unrelated talk of his. But
that's ok, it was fun and I deserved it :slight_smile:

I think I was poking fun at David, not you. :wink:

When listening to Yehuda's talk about development on the client
side, which was really great by the way, I got really worried when
he commented about some Rails validations not being concurrent-safe
nor even thread-safe.

While I can understand it is hard to guarantee uniqueness validation
among different server instances, this can be easily avoided in a
single server configuration with config.threadsafe! enabled.

Actually, I can't understand why thread-safe mode isn't enabled by
default in production, since we have much better thread support in
MRI 1.9.2 and JRuby has always had it.

I've just read the documentation for validates_uniqueness_of and it
explains well the problem:

http://api.rubyonrails.org/classes/ActiveRecord/Validations/ClassMethods.html#method-i-validates_uniqueness_of

But I was thinking if Rails could provide some way for avoiding
dealing with exceptions when using a single multi-thread server
environment. For instance:

@record.validate :lock => @shared_lock do
  # code run if @record.valid?
  ...
  @record.save
end or (render :edit; return)
redirect_to :list
...

The :lock parameter should be optional; when it is not specified, an
internal lock (one per model, maybe) should be used for the cases
where this code does not need to be shared with another block that
could also affect the record.

A shared lock like this would work if you only had one process running.
It won't work for people that run multiple processes as they won't share
the lock.

An error should be raised if 'config.threadsafe!' is not enabled,
and the docs should point out that this won't work for
multiple-server setups, to avoid confusing end developers. Maybe a
warning instead of an error would suffice, to allow this usage by
plugins that don't have control over deployment decisions.

If the user calls 'save' directly without validating first, the
validation and save operations should be atomic. So 'save' should
also support the :lock parameter.

Is this reasonable or am I missing something?

Even people running multi threaded servers run multiple processes.
You could use the database connection to create a shared lock.

Though, IMHO if you want to guarantee a unique column, you should add a
unique index on the column. The uniqueness validation should work most
of the time, and for edge cases, the user could see an exception.
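
For reference, a unique index is a one-line migration. A minimal sketch (the table, column and migration names here are made up):

  class AddUniqueIndexToUsersEmail < ActiveRecord::Migration
    def self.up
      # Enforces uniqueness at the database level, no matter how many
      # threads, processes or machines are writing.
      add_index :users, :email, :unique => true
    end

    def self.down
      remove_index :users, :email
    end
  end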

If you're really paranoid, you could implement it now with something
like this (note this is mysql specific):

def shared_lock(name)
  r = AR::Base.connection.execute("SELECT GET_LOCK('#{name}', 2)")
  # ... make sure to check the return value ...
  yield
ensure
  AR::Base.connection.execute("SELECT RELEASE_LOCK('#{name}')")
end

def some_function
  # this function must calculate a value that you can reproduce across
  # servers and processes
end

shared_lock(some_function) do
  my_model.save!
end

Read here for more info:

  http://dev.mysql.com/doc/refman/4.1/en/miscellaneous-functions.html#function_get-lock

There may be a way to do this in a non-database specific way (a lock
server or something).
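
For what it's worth, a non-database-specific variant could be a plain OS file lock. A rough sketch (single machine only; the lock path is made up, and the OS releases the flock automatically if the process dies):

  require 'tmpdir'

  def file_lock(name)
    path = File.join(Dir.tmpdir, "#{name}.lock") # illustrative location
    File.open(path, File::RDWR | File::CREAT) do |f|
      f.flock(File::LOCK_EX) # blocks until the exclusive lock is acquired
      begin
        yield
      ensure
        f.flock(File::LOCK_UN)
      end
    end
  end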

...

When listening to Yehuda's talk about development on the client
side, which was really great by the way, I got really worried when
he commented about some Rails validations not being concurrent-safe
nor even thread-safe.

While I can understand it is hard to guarantee uniqueness validation
among different server instances, this can be easily avoided in a
single server configuration with config.threadsafe! enabled.

...
I've just read the documentation for validates_uniqueness_of and it
explains well the problem:

http://api.rubyonrails.org/classes/ActiveRecord/Validations/ClassMethods.html#method-i-validates_uniqueness_of

But I was thinking if Rails could provide some way for avoiding
dealing with exceptions when using a single multi-thread server
environment. For instance:

@record.validate :lock => @shared_lock do
   # code run if @record.valid?
   ...
   @record.save
end or (render :edit; return)
redirect_to :list
...

The :lock parameter should be optional; when it is not specified, an
internal lock (one per model, maybe) should be used for the cases
where this code does not need to be shared with another block that
could also affect the record.

A shared lock like this would work if you only had one process running.
It won't work for people that run multiple processes as they won't share
the lock.

Yes, I know, but for lots of applications a single multi-threaded server would suffice, and Ruby locks are probably much faster than database locks and don't have their shortcomings.

For instance, I usually use PostgreSQL. There is no LOCK instruction in ANSI SQL. PostgreSQL only supports locking an entire table, and it is not possible to name the locks (maybe an alternative could be the pg_advisory_lock function added in 8.2). So it is probably better to rely on some other kind of shared lock between different servers, but that probably won't be as efficient as what is possible in a single multi-threaded server, even with a memcached-like solution, I mean an in-memory socket lock server.
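
Just to illustrate, a rough sketch of a named lock on top of pg_advisory_lock (untested; advisory locks take integer keys, so the name is hashed here with CRC32, which is an arbitrary choice for the example):

  require 'zlib'

  def with_pg_advisory_lock(name)
    key  = Zlib.crc32(name.to_s) # derive an integer key from the name
    conn = ActiveRecord::Base.connection
    conn.execute("SELECT pg_advisory_lock(#{key})")
    yield
  ensure
    conn.execute("SELECT pg_advisory_unlock(#{key})")
  end

  with_pg_advisory_lock('user_credit') do
    # critical section shared by every process/server using the same database
  end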

An error should be raised if 'config.threadsafe!' is not enabled,
and the docs should point out that this won't work for
multiple-server setups, to avoid confusing end developers. Maybe a
warning instead of an error would suffice, to allow this usage by
plugins that don't have control over deployment decisions.

If the user calls 'save' directly without validating first, the
validation and save operations should be atomic. So 'save' should
also support the :lock parameter.

Is this reasonable or am I missing something?

Even people running multi threaded servers run multiple processes.
You could use the database connection to create a shared lock.

Though, IMHO if you want to guarantee a unique column, you should add a
unique index on the column. The uniqueness validation should work most
of the time, and for edge cases, the user could see an exception.

Yes, the unique index should exist anyway. Maybe it would be great to have a 'db:create_unique_indexes' Rake task that generates a migration adding the missing unique indexes. But for a single multi-threaded server, such a lock would avoid raising the exception in that corner case. And my concerns aren't specific to uniqueness validations only. For instance, imagine a situation where a user needs credits to do something and the system provides the services 'put_credit' and 'consume_credit'.

# current user's credit = 5

# consume_credit is launched
[consume_service] (render :error; return) unless @user.credit > 0
[consume_service] do_consume_credit

# put_credit is launched in another thread
[put_credit_service] @user.credit += 1 # 5 + 1 = 6
[put_credit_service] @user.save

[consume_service] @user.credit -= 1 # 5 - 1 = 4
[consume_service] @user.save # user's credit is now 4 in the database

The correct expected user's credit should obviously be 5 instead of 4. For this simple case, a manual SQL UPDATE would solve the concurrency problem (UPDATE user SET credit = credit + 1). Also, this would probably be modeled as 'user has_many credits', in which case the problem wouldn't happen, but you get the idea.
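
For what it's worth, a minimal sketch of the atomic-UPDATE version (illustrative model/column names; update_all pushes the arithmetic into the database, so concurrent writers cannot lose each other's updates):

  class User < ActiveRecord::Base
    def self.put_credit(user_id)
      update_all("credit = credit + 1", ["id = ?", user_id])
    end

    def self.consume_credit(user_id)
      # Returns the number of rows updated: 0 means there was no credit left.
      update_all("credit = credit - 1", ["id = ? AND credit > 0", user_id])
    end
  end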

The problem could easily be avoided in the above situation with a shared named lock between the two services. In this case, a normal Ruby lock could be used instead of modifying the Rails API to support locks. But the advantage of improving the API is that the user could choose, in some configuration file that could be changed later, whether to use a shared lock between multiple servers (more expensive) or a less expensive Ruby lock. Also, maybe the Rails community could come up with a better implementation of locking, and it would be easier to apply it than to modify all the manually placed locking code.

Maybe we could start with a new Concurrency guide on Rails Guides.

If you're really paranoid, you could implement it now with something
like this (note this is mysql specific):

def shared_lock(name)
  r = AR::Base.connection.execute("SELECT GET_LOCK('#{name}', 2)")
  # ... make sure to check the return value ...
  yield
ensure
  AR::Base.connection.execute("SELECT RELEASE_LOCK('#{name}')")
end

I'm just curious. What would happen in this case (MySQL) if the application is killed after the lock is acquired but before the ensure block is executed? The connection would probably be closed. Would this free the lock too?

def some_function
   # this function must calculate a value that you can reproduce across
   # servers and processes
end

shared_lock(some_function) do
   my_model.save!
end

Read here for more info:

   http://dev.mysql.com/doc/refman/4.1/en/miscellaneous-functions.html#function_get-lock

There may be a way to do this in a non-database specific way (a lock
server or something).

Maybe a lock server or just temporary file locking would fit better. But then, what would happen again in the case I presented above, where the app is killed between acquiring and releasing the lock?

Best regards,

Rodrigo.

I might be wrong in understanding the problem, but I believe I would use transactions to solve the problems you’re describing. I don’t see the need to manage locks manually.
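
Something along these lines, presumably: a sketch using a transaction plus a pessimistic row lock (SELECT ... FOR UPDATE); the model and column names are only for illustration:

  User.transaction do
    user = User.find(user_id, :lock => true) # SELECT ... FOR UPDATE
    if user.credit > 0
      user.credit -= 1
      user.save!
    end
  end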

Allen Madsen
http://www.allenmadsen.com

If one cares about this edge case, I think the best approach is the
one we use for model IDs: let the database take care of it. Set a
UNIQUE constraint, and deal with it in the application.

Perhaps you could even wrap AR::Base#save in your application so that
if violation of that constraint is detected (assuming the exception
has information that allows you to do that), you intercept it and
build a regular validation error on the model. Just off the top of my
head, have not written it.
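
Something in this spirit, perhaps (only a sketch; it assumes an ActiveRecord version that raises ActiveRecord::RecordNotUnique on unique-index violations, and the model, attribute and message are made up -- a real version would have to figure out which attribute the violated index covers):

  class User < ActiveRecord::Base
    def save_with_uniqueness_rescue(*args)
      save_without_uniqueness_rescue(*args)
    rescue ActiveRecord::RecordNotUnique
      # Turn the database error into a regular validation error.
      errors.add(:email, "has already been taken")
      false
    end
    alias_method_chain :save, :uniqueness_rescue
  end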

Some other folks have. :slight_smile:

https://rails.lighthouseapp.com/projects/8994/tickets/3486

--Matt Jones

Awesome! :slight_smile:

>...
>>When listening to Yehuda's talk about development on the client
>>side, which was really great by the way, I got really worried when
>>he commented about some Rails validations not being concurrent-safe
>>nor even thread-safe.
>>
>>While I can understand it is hard to guarantee uniqueness validation
>>among different server instances, this can be easily avoided in a
>>single server configuration with config.threadsafe! enabled.
>>
>>...
>>I've just read the documentation for validates_uniqueness_of and it
>>explains well the problem:
>>
>>http://api.rubyonrails.org/classes/ActiveRecord/Validations/ClassMethods.html#method-i-validates_uniqueness_of
>>
>>But I was thinking if Rails could provide some way for avoiding
>>dealing with exceptions when using a single multi-thread server
>>environment. For instance:
>>
>>@record.validate :lock => @shared_lock do
>> # code run if @record.valid?
>> ...
>> @record.save
>>end or (render :edit; return)
>>redirect_to :list
>>...
>>
>>The :lock parameter should be optional; when it is not specified, an
>>internal lock (one per model, maybe) should be used for the cases
>>where this code does not need to be shared with another block that
>>could also affect the record.
>A shared lock like this would work if you only had one process running.
>It won't work for people that run multiple processes as they won't share
>the lock.

Yes, I know, but for lots of applications a single multi-threaded
server would suffice, and Ruby locks are probably much faster than
database locks and don't have their shortcomings.

For instance, I usually use PostgreSQL. There is no LOCK instruction
in ANSI SQL. PostgreSQL only supports locking an entire table, and it
is not possible to name the locks (maybe an alternative could be the
pg_advisory_lock function added in 8.2). So it is probably better to
rely on some other kind of shared lock between different servers, but
that probably won't be as efficient as what is possible in a single
multi-threaded server, even with a memcached-like solution, I mean an
in-memory socket lock server.

So why don't you use a mutex and wrap your saves with that? I don't
see a reason to change the AR API for this.

>>An error should be raised if 'config.threadsafe!' is not enabled,
>>and the docs should point out that this won't work for
>>multiple-server setups, to avoid confusing end developers. Maybe a
>>warning instead of an error would suffice, to allow this usage by
>>plugins that don't have control over deployment decisions.
>>
>>If the user calls 'save' directly without validating first, the
>>validation and save operations should be atomic. So 'save' should
>>also support the :lock parameter.
>>
>>Is this reasonable or am I missing something?
>Even people running multi threaded servers run multiple processes.
>You could use the database connection to create a shared lock.
>
>Though, IMHO if you want to guarantee a unique column, you should add a
>unique index on the column. The uniqueness validation should work most
>of the time, and for edge cases, the user could see an exception.

Yes, the unique index should exist anyway. Maybe it would be great
to have a 'db:create_unique_indexes' Rake task that generates a
migration adding the missing unique indexes. But for a single
multi-threaded server, such a lock would avoid raising the exception
in that corner case. And my concerns aren't specific to uniqueness
validations only. For instance, imagine a situation where a user
needs credits to do something and the system provides the services
'put_credit' and 'consume_credit'.

# current user's credit = 5

# consume_credit is launched
[consume_service] (render :error; return) unless @user.credit > 0
[consume_service] do_consume_credit

# put_credit is launched in another thread
[put_credit_service] @user.credit += 1 # 5 + 1 = 6
[put_credit_service] @user.save

[consume_service] @user.credit -= 1 # 5 - 1 = 4
[consume_service] @user.save # user's credit is now 4 in the database

The correct expected user's credit should obviously be 5 instead of
4. For this simple case, a manual SQL UPDATE would solve the
concurrency problem (UPDATE user SET credit = credit + 1). Also,
this would probably be modeled as 'user has_many credits', in which
case the problem wouldn't happen, but you get the idea.

Yes, this would seem reasonable if AR objects were thread safe, but they
aren't. If you're going to use a threaded server, you need to know what
data structures are thread safe and which ones aren't. Then you need to
design your system accordingly.

My point is that we can document what is thread safe and what isn't.
Then *you* can figure out how to handle concurrency.

The problem could easily be avoided in the above situation with a
shared named lock between the two services. In this case, a normal
Ruby lock could be used instead of modifying the Rails API to support
locks. But the advantage of improving the API is that the user could
choose, in some configuration file that could be changed later,
whether to use a shared lock between multiple servers (more
expensive) or a less expensive Ruby lock. Also, maybe the Rails
community could come up with a better implementation of locking, and
it would be easier to apply it than to modify all the manually placed
locking code.

You can decide this without changes to the AR api. Just write a method
that returns a lock depending on a configuration file:

  def gimme_lock
    case YAML.load_file('/some/config')[Rails.env]['lock_type']
    when 'mutex'
      ...
    when 'database'
      ...
    else
      ...
    end
  end

  def safe_update
    gimme_lock.synchronize do
      yield
    end
  end

  safe_update { mymodel.save }

This seems like a feature easily implemented without the help of rails.

Maybe we could start with a new Concurrency guide on Rails Guides.

Probably documenting which things are thread safe and which things
aren't would be good. For now you can assume no AR objects are thread
safe. :slight_smile:

>If you're really paranoid, you could implement it now with something
>like this (note this is mysql specific):
>
>def shared_lock(name)
> r = AR::Base.connection.execute("SELECT GET_LOCK('#{name}', 2)")
> # ... make sure to check the return value ...
> yield
> ensure
> AR::Base.connection.execute("SELECT RELEASE_LOCK('#{name}')")
>end
>

I'm just curious. What would happen in this case (MySQL) if the
application is killed after the lock is acquired but before the
ensure block is executed? The connection would probably be closed.
Would this free the lock too?

Yes. MySQL frees the lock when the connection is closed:

  http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_get-lock

>def some_function
> # this function must calculate a value that you can reproduce across
> # servers and processes
>end
>
>shared_lock(some_function) do
> my_model.save!
>end
>
>Read here for more info:
>
> http://dev.mysql.com/doc/refman/4.1/en/miscellaneous-functions.html#function_get-lock
>
>There may be a way to do this in a non-database specific way (a lock
>server or something).

Maybe a lock server or just temporary file locking would fit better.
But then, what would happen again in the case I presented above,
where the app is killed between acquiring and releasing the lock?

That's something you need to work out with your lock server. :slight_smile:

Ok, I already realized there is no interest from the Rails core team to integrate concurrency features in Rails and maybe I'll create a plugin for integrating concurrency helpers to Rails (mostly AR) when I need to.

Thanks for the feedback,

Rodrigo.

I may not have read thoroughly enough what you're trying to achieve. My
understanding of concurrency control within Rails is that it would be
mostly pointless given the current processing model where one process
handles a single request at a time. Things are different if multiple
threads (or fibers) are running in the same process, but that's far from
mainstream Rails usage, as far as I can tell.

In production, any but the most trivial Rails application will run
across multiple application server processes. The necessary(!)
concurrency control among these can only be handled at the database
level. First and foremost by using transactions and appropriate
constraints. Occasionally with the help of pessimistic locks.

In an earlier message you wrote in favor of running a Rails app in a
single process with multiple threads. I'm curious if you have
considered, or even measured the trade-offs. Apparently, common
assumptions are that resource consumption is lower for threads than for
processes and switching among threads is faster than among processes. I
don't know how significant these differences are in practice if at all.
At least on Linux, threads and processes mark two points on a continuum,
the real difference being what kind of resources are shared. Passenger
makes good use of this by establishing a tree of processes who share
(copy-on-write) common memory pages.

Also well worth considering is that a failure in a single thread can
bring down all threads in the same process, whereas processes are much
better isolated against each other.

By all this I don't mean to imply that processes are obviously the
superior alternative to threads. I'm only trying to point out that the
reverse isn't true: threads are not obviously better than processes.
(And event-driven servers aren't obviously better than thread- or
process-based ones -- but that's another discussion.)

Michael

Ok, I already realized there is no interest from the Rails core team to
integrate concurrency features in Rails and maybe I'll create a plugin
for integrating concurrency helpers to Rails (mostly AR) when I need to.

We do have interest in concurrency, but we definitely don't have
interest in adding a solution that would only work if you are
deploying on one machine and it will bite you when you add another
one. This proposal is even less attractive if we consider we have a
better solution available as a patch in Lighthouse.

Don't treat a negative answer from the people involved in this
discussion to your solution as lack of interest in the subject.

...
So why don't you use a mutex and wrap your saves with that? I
don't see a reason to change the AR API for this.
...

Ok, I already realized there is no interest from the Rails core team
to integrate concurrency features in Rails and maybe I'll create a
plugin for integrating concurrency helpers to Rails (mostly AR) when
I need to.

I may not have read thoroughly enough what you're trying to achieve. My
understanding of concurrency control within Rails is that it would be
mostly pointless given the current processing model where one process
handles a single request at a time. Things are different if multiple
threads (or fibers) are running in the same process, but that's far from
mainstream Rails usage, as far as I can tell.

In production, any but the most trivial Rails application will run
across multiple application server processes. The necessary(!)
concurrency control among these can only be handled at the database
level. First and foremost by using transactions and appropriate
constraints. Occasionally with the help of pessimistic locks.

In an earlier message you wrote in favor of running a Rails app in a
single process with multiple threads. I'm curious if you have
considered, or even measured the trade-offs. Apparently, common
assumptions are that resource consumption is lower for threads than for
processes and switching among threads is faster than among processes. I
don't know how significant these differences are in practice if at all.
At least on Linux, threads and processes mark two points on a continuum,
the real difference being what kind of resources are shared. Passenger
makes good use of this by establishing a tree of processes who share
(copy-on-write) common memory pages.

I've already benchmarked it, and the threaded model certainly consumes much less memory than a multi-process setup. Thread/process switching probably won't make any significant difference on Linux, for instance, but the deployment cost is usually dominated by memory consumption (VPS plans tend to be priced by available memory).

Even REE (or the copy-on-write feature) won't reduce memory usage as much as the threaded model does.

Also well worth considering is that a failure in a single thread can
bring down all threads in the same process, whereas processes are much
better isolated against each other.

Yes, writing thread-safe code is harder, but I would like to take the risk...

By all this I don't mean to imply that processes are obviously the
superior alternative to threads. I'm only trying to point out that the
reverse isn't true: threads are not obviously better than processes.
(And event-driven servers aren't obviously better than thread- or
process-based ones -- but that's another discussion.)

Michael

Actually, deciding whether one approach is better than another depends on the criteria. In my case, I'm using two:

1- I want memory consumption to be as low as possible (well, not strictly; I wouldn't mind using some memory for memcached, for instance)
2- I would like to be able to deal with mutexes more easily, while using minimal CPU and IO resources

I believe that for these criteria the threaded model is way better, but I may be missing something, and I'd be glad if someone could point it out.

Best regards,

Rodrigo.

Hi José,

First, I'm sorry if I sounded rude. That was not my intention, but it seems I'm not good with words. I was just trying to say that I was quitting the thread because it seemed the Rails core team already had a solid opinion on the subject and I did not want to bother them with my thoughts. Please understand that I had the best of intentions, and I'm sorry if I made myself misunderstood.

Now, regarding your first statement, I don't see it as a bad thing to have some features that are only possible in certain setups (like a single threaded server). It won't hurt those using a different setup, but it can work great if your setup matches the one that makes those features possible (or easier).

Regarding the patch you mentioned, I guess you are talking about the uniqueness validation patch... Maybe I shouldn't have used this thread subject or mentioned the uniqueness validation example. My real intention was to bring up the subject of concurrency integration rather than the isolated uniqueness validation case. But it seems most of the replies to this thread were about uniqueness validation issues, which was not my intention. Aaron Patterson and Michael Schuerig may have understood my intention better.

Again, sorry if I was rude. I guess I need to go back to my English classes as soon as possible... Changing the subject completely, I was setting up a new application yesterday and decided to try Devise. Although I missed i18n in generated views, I found your gem to be really great! Thank you!

Best Regards,

Rodrigo.

FWIW, I think dealing with concurrency issues in Rails is very
important. I just think that in this case, we would be better served by
having some sort of gem that could provide proper mutex and locking
support based on your server configuration.

Once we have an easy-to-use and well-vetted solution, *then* fold it
into the AR API.

Ok, now I get it, Aaron. Great!

Thank you,

Rodrigo.

I think there's too much misinformation and too many misunderstandings
going on in this thread.

First, "Rails validations" aren't thread-unsafe or concurrency-unsafe,
only validates_uniqueness_of potentially is. The problem has been
known for years. In fact, a few years back I've explicitly documented
the problem in detail, as well as possible solutions and a cost-
benefit analysis of each solution. The documentation can be found in
what I thought is the most obvious place to look: inside the API
documentation for validates_uniqueness_of:
http://apidock.com/rails/ActiveRecord/Validations/ClassMethods/validates_uniqueness_of

Some people here propose using mutexes, and other people quickly point
out that mutexes are of no use in multi-process setups. People then
try to advocate the use of multi-threaded setups over multi-process
setups. In my opinion any kind of solution involving either inter-
thread locks (i.e. mutexes) or interprocess locks (e.g. lock files)
is not the way to go, for one reason: neither of them works in
multi-machine setups.

I am not advocating the superiority of multithreaded vs multiprocess
vs evented or whatever. But any kind of good solution *must* work
across multiple machines. That really only leaves 2 possible
solutions:
- Inter-machine locks, e.g. a database lock.
- Optimistic concurrency control.

The preferred solution that I've documented at
http://apidock.com/rails/ActiveRecord/Validations/ClassMethods/validates_uniqueness_of
involves the use of unique indices and is a form of database-aided
optimistic concurrency control. Unique indices:
- Are universal and a lowest common denominator. They work on every
single SQL database, on every operating system, on any combination of
multithreaded/multiprocess/multiple-machine setups.
- Are well-understood.
- Are easy to create.
- Are zero-maintenance. Lock files have to be set up during deployment
and monitored.

The only down side of unique indices, as currently utilized by
ActiveRecord, is the extremely small chance that you run into a race
condition during a #save action. This race condition will result in an
exception, but in no circumstance will duplicate records be saved into
the database. In my opinion the chance that such an exception occurs
(which is bad for end-user usability) is so small that it's usually
not worth the effort to check for it and handling it gracefully (i.e.
in a manner other than telling the user that some internal server
error occurred), though you still have the possibility to do it
yourself if you so desire. This too is documented at
http://apidock.com/rails/ActiveRecord/Validations/ClassMethods/validates_uniqueness_of

Pretty much everything that is currently being discussed in this
thread is covered by the documentation that I wrote, and it is for
that reason that I think this issue does not warrant further
discussion. Though better handling of the unique index race condition
in AR would be nice.

The patch mentioned in the thread will be taken into account for sure.
I think it would be great to have this robustness and at the same
time be able to deal with it as if it were an ordinary validation
error. Assuming it is doable in a robust and portable way (I have not
studied the patch in detail).

In such a solution there's a potential leak, I think: although it
would be presented as an ordinary validation error, before_save and
friends still run. Nowadays those callbacks are allowed to assume
the model is valid at that point.

In existing code in general that won't matter because there's a
transaction going on, but in theory they could have side-effects not
covered by the transaction.

That could be documented for future code, and I guess breakage in
existing apps would be rare.

If you do things in before_save that cannot be put into the same
database transaction (e.g. if you're modifying files on the
filesystem) then yes, you'll still run into trouble. In that case
there really is no other way than using a locking server, e.g. the
database server. However that is a fairly complex and non-standard
case, and I do not think it would be warranted to pollute the
validates_uniqueness_of documentation with references to that problem.
Instead, a separate document dealing with common concurrency issues
should be written.