Why BufferedLogger?

Hi,

BufferedLogger appears to just implement write buffering, which is normally the job of the operating system. I'm assuming I'm missing something, but I'm having trouble finding any justification for it anywhere. Can someone explain why we have it?

Thanks,

John.

I'd guess it helps with handling concurrency on production systems. If you have a couple workers writing to one log file at the same time you will get messages from different requests interleaved.

Cheers, Adam

This is a different kind of buffering. It's for thread-safety, not for improving performance like OS-level buffering. Each thread is assigned its own buffer so that log data from different threads don't interleave each other.

Actually BufferedLogger does not protect you against inter-request interleaving. It only protects against inter-message interleaving. That is, if one thread calls Rails.logger.info("hello") and another calls Rails.logger.info("world") then you will either get "helloworld" or "worldhello", but not "helworldlo". This is all the guarantees that it gives. To prevent inter-request interleaving one will need to buffer the entire request's logger output and print everything at once. Something like https://github.com/FooBarWidget/workling/blob/master/lib/workling/thread_local_logger.rb

You don't need to buffer your own writes to protect against inter-message interleaving. Opening the log file in O_APPEND mode should get you that (assuming each message is made with one write call)

John.

That only works if the data you're writing is smaller than PIPBUF, which seems to be 512 bytes on most systems. And don't forget Windows, I don't what kind of guarantees it gives to file append operations.

A POSIX write call to a file in O_APPEND writes to the end of the file atomically - writes should never get interleaved. Linux does this. I can't find any mention of this not working in Windows (it does support O_APPEND apparently), but I haven't tested it. It isn't guaranteed over NFS though.

I'm speaking slightly more from theory than practice, but I can't reproduce an interleaving write on Linux.

I'm pretty sure PIPE_BUF doesn't apply to files.

John.

Is there anything in the standards that say appending to files is always atomic? I can't find anything that indicate that it isn't just undocumented linux behavior.

BufferedLogger appears to just implement write buffering, which is normally the job of the operating system. I'm assuming I'm missing something, but I'm having trouble finding any justification for it anywhere. Can someone explain why we have it?

The short version is that logging showed up in hello world benchmarks and as a result some optimisation, of questionable real world utility, took place. Buffering up a bunch of writes into a single write does make a difference though and it seems mostly harmless. It also has some nice benefits now that it prevents interleaving of requests in multi-threaded mode.

You can set your own logger instance in application.rb if you want to use something simpler / customised.

http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

"If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation."

and then:

"If the value of nbyte is greater than {SSIZE_MAX}, the result is implementation-defined."

SSIZE_MAX is apparently guaranteed to be a minimum of 32k-1

John.

> BufferedLogger appears to just implement write buffering, which is > normally the job of the operating system. I'm assuming I'm missing > something, but I'm having trouble finding any justification for it > anywhere. Can someone explain why we have it?

The short version is that logging showed up in hello world benchmarks and as a result some optimisation, of questionable real world utility, took place. Buffering up a bunch of writes into a single write does make a difference though and it seems mostly harmless.

BufferedLogger will reduce the number of write system calls, so there is some performance benefits there but I also do rather suspect it would be of little real world gain.

Coupled with the default auto_flushing value in production (1000 lines!) it's not actually that harmless. If the Rails instance crashes or is killed, it loses up to 1000 lines of logs.

I've just done a quick test and BufferedLogger isn't automatically flushed at the end of rake tasks (or when rails console exits), so up to 1000 log lines get lost there too. I'm not that familiar with the full execution path, but I can't actually see where Rails.logger gets closed or flushed on Rails app instance shutdown either.

I came across BufferedLogger's behaviour trying to debug a problem in production by tailing the log file and getting no output (even without the app being killed). So it makes troubleshooting pretty hard too.

It also has some nice benefits now that it prevents interleaving of requests in multi-threaded mode.

as discussed with Hongli, it certainly shouldn't be needed to prevent interleaving of individual log lines. It doesn't really prevent interleaving of requests either, it just reduces how often they occur (which in an app busy enough to notice system call overhead, probably isn't enough :wink:

You can set your own logger instance in application.rb if you want to use something simpler / customised.

We're using SysLogLogger in a few places, which is pretty neat.

IMHO though, I think that something simpler than BufferedLogger should be used by default - it looks to have enough caveats that it should be only used when fully understanding it's behaviour.

John.

The short version is that logging showed up in hello world benchmarks and as a result some optimisation, of questionable real world utility, took place. Buffering up a bunch of writes into a single write does make a difference though and it seems mostly harmless.

I've been working quite extensively with logging recently and can also confirm that there is little if any performance benefit from BufferedLogger's buffering (buffering can improve performance, but not the way BufferedLogger does it). Its only real advantage is grouping together the log messages from each request. There are also bugs with the implementation in multi-threaded applications addressed here #6671 BufferedLogger threading issues - Ruby on Rails - rails.

You can set your own logger instance in application.rb if you want to use something simpler / customised.

My problem with Rails logging has always been that it is way too sparse in that messages are logged without any sort of context (timestamp, process id, log level). This can make interpreting logs from a busy application quite painful and makes buffering necessary to provide some sort of context by grouping all messages from each request.

Logging has always been one of the least customizable and rudimentary parts of Rails. It is true that you can swap out different logging solutions, but it practice it either requires monkey patching BufferedLogger or reimplementing all of its methods. I would advocate that Rails should use something like the lumberjack gem (http:// github.com/bdurand/lumberjack) in place of BufferedLogger. This sort of architecture would provide Rails with a standard, supported logging interface, and allow for better log formats and a simpler means of writing messages to non-file based devices (syslog, mongodb, etc.).

Brian

Logging has always been one of the least customizable and rudimentary parts of Rails. It is true that you can swap out different logging solutions, but it practice it either requires monkey patching BufferedLogger or reimplementing all of its methods.

I was under the possibly mistaken impression that with the exception of our log initialization code, we only relied on the public API for the Logger class from ruby's stdlib. I've used both Logger.new and sys log logger without any issues, what did you have to monkeypatch? That definitely sounds lame.

I would advocate that Rails should use something like the lumberjack gem (http:// github.com/bdurand/lumberjack) in place of BufferedLogger. This sort of architecture would provide Rails with a standard, supported logging interface, and allow for better log formats and a simpler means of writing messages to non-file based devices (syslog, mongodb, etc.).

All of this stuff should be possible by simply providing an alternative implementation of Logger, no?

I was under the possibly mistaken impression that with the exception of our log initialization code, we only relied on the public API for the Logger class from ruby's stdlib. I've used both Logger.new and sys log logger without any issues, what did you have to monkeypatch? That definitely sounds lame.

In general, you're safe relying the public API methods of fatal, error, warn, info, debug and the corresponding level check methods (i.e. fatal?), but I have never seen this formally stated anywhere. In addition, there are public methods in Logger that are not implemented in Logger and vice versa (i.e. BufferedLogger#flush and Logger#<<, Logger#log).

Now that I think about it, I haven't had to technically monkey patch logging since Rails 1.2. However, if you want to change the log format using BufferedLogger (since Logger is declared deprecated for Rails in the core_ext/logger.rb code), you need to reimplement the add method which and call undocumented, protected methods from within your code rather than simply setting a formatter as you do with Logger.

> I would advocate > that Rails should use something like the lumberjack gem (http:// > github.com/bdurand/lumberjack) in place of BufferedLogger. This sort > of architecture would provide Rails with a standard, supported logging > interface, and allow for better log formats and a simpler means of > writing messages to non-file based devices (syslog, mongodb, etc.).

All of this stuff should be possible by simply providing an alternative implementation of Logger, no?

Kind of. One thing I found in trying to make it extensible to write to other devices is that I couldn't just write the formatted message as a string to the device. Logging to Syslog requires you to conform to the Syslog conventions. Writing to something like MongoDB is most powerful if you are writing structured data that can be sorted and sifted instead of just strings. That required moving the formatting from the logger to the device.

Brian

Kind of. One thing I found in trying to make it extensible to write to other devices is that I couldn't just write the formatted message as a string to the device. Logging to Syslog requires you to conform to the Syslog conventions. Writing to something like MongoDB is most powerful if you are writing structured data that can be sorted and sifted instead of just strings. That required moving the formatting from the logger to the device.

We have the notification / subscription code in 3.x now which is intended for tracking more detailed / structured data, if there's data you can't subscribe to right now we should add the necessary hook. Log messages are just strings and the API reflects that.