Failure cases for assignment of binary string to AR model

I've run into a strange bug that I need some help with. I have a model with a binary column. I'm using Postgres 8.2.4 on the backend. For the majority of cases, I'm able to store and retrieve binary content using this setup with no problem. However, I've found two test cases that fail. Consider the following test code, taken from one of my actions:

# The test file I write out here contains the binary data as expected,
# so I know that @incoming_blob is correct.
f = File.open("/tmp/incoming_blob", "w+")
f.write @incoming_blob
f.close

# Set the value in the AR record, but don't even save it.
@staged_extension.value = @incoming_blob

# Now write out the value from the AR record, and it is truncated to the
# first null byte.
f = File.open("/tmp/from_ar_#{params[:extension_local_ref]}", "w+")
f.write @staged_extension.value
f.close

I've done the same thing from the console. It seems that the act of assigning the binary String to the ActiveRecord model truncates it to the first null byte. The really weird thing is that it works for other blobs (jpgs and other binary data). At first my code was saving to the database (of course) and I thought the bug was somewhere in the Postgres driver, but it's not -- as you can see here, there is no database activity going on at all.

There's nothing fancy going on with the model -- I've stripped out all validations on this field, and I still run into this problem.

The two test cases that fail with the code above can be found here:

http://rubycloud.com/pg/1288.1024.jpg
http://rubycloud.com/pg/base.tsz

I have to think given the simplicity of this code that there is a bug in the ActiveRecord column assignment methods. I'm looking into it, but any ideas/insight/help you can give would be greatly appreciated.

Thanks, Matt

I have discovered the problem. I was wrong to assume that the Postgres adapter wasn't involved just because nothing touched the database. It turns out that setting and retrieving values in the attributes hash calls type_cast, which in turn calls string_to_binary and binary_to_string.

The problem lies in ActiveRecord::ConnectionAdapters::PostgreSQLColumn. The heuristic used to determine whether a column's value has been previously encoded by escape_bytea is not sufficient in all cases.

In binary_to_string, it checks whether it should unescape a value by looking for the tell-tale sequence \nnn. However, in the two source files that are causing so much trouble, these sequences appear in the original data. So unescape_bytea gets called on a block of data that was never escaped in the first place.

In base.tsz, the sequence \690 appears at byte offset 64,688. In 1288.1024.jpg, the sequence \754 appears at byte offset 27,316.
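
To illustrate the false positive, here is a minimal sketch in plain Ruby. It is not the adapter's actual source: the LOOKS_ESCAPED pattern and the looks_escaped? helper are hypothetical stand-ins, and I'm assuming the check accepts a backslash followed by any three digits (which is what a sequence like \690 triggering it suggests).

```ruby
# Hypothetical stand-in for the adapter's "was this escaped?" check.
# Assumption: a backslash followed by any three digits counts as escaped.
LOOKS_ESCAPED = /\\\d{3}/

def looks_escaped?(data)
  data.b.match?(LOOKS_ESCAPED)
end

# Raw binary data that was never escaped, but happens to contain "\690".
raw = "\x00\xFFabc\\690xyz".b

# Raw binary data without any backslash-digit sequence.
plain = "\x00\x01\x02 no backslashes".b

puts looks_escaped?(raw)    # true: raw data misclassified as escaped
puts looks_escaped?(plain)  # false
```

Any check of this kind inspects only the bytes themselves, so it cannot tell escaped data apart from raw data that happens to contain the same byte sequence.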

It seems that we need a more reliable way to mark a data block as previously escaped by escape_bytea. This is a somewhat tricky problem. We could save a marker with the data and then remove it when unescaping it, but this would mean that the persisted data would only be usable from within Rails, or that any other code that works with the database would have to know about this trick. Kind of ugly. Any other ideas?
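
For discussion, the marker idea could be sketched roughly like this in plain Ruby. Everything here is an assumption for illustration: MARKER, escape_with_marker and unescape_if_marked are made-up names, and the escaping is a simplified pure-Ruby octal scheme rather than the driver's escape_bytea.

```ruby
# Hypothetical marker prepended to data we escaped ourselves.
MARKER = "\\ESC1:".b

# Simplified octal escaping (not the driver's escape_bytea): every byte
# outside printable ASCII, plus the backslash itself, becomes \nnn.
def escape_with_marker(data)
  body = data.b.gsub(/[^\x20-\x5B\x5D-\x7E]/n) { |c| format("\\%03o", c.ord) }
  MARKER + body
end

# Only unescape when the marker is present; otherwise return data as-is.
def unescape_if_marked(stored)
  stored = stored.b
  return stored unless stored.start_with?(MARKER)
  body = stored.byteslice(MARKER.bytesize, stored.bytesize)
  body.gsub(/\\([0-7]{3})/) { $1.oct.chr }
end

raw = "\x00\xFFabc\\690xyz".b
round_trip = unescape_if_marked(escape_with_marker(raw))
puts round_trip == raw               # escaped data survives the round trip
puts unescape_if_marked(raw) == raw  # unmarked raw data is left untouched
```

This keeps the drawback mentioned above: anything else reading the column has to know about the marker. It also only narrows the ambiguity, since raw data could in principle begin with the marker bytes.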

--Matt