PGError: ERROR: invalid byte sequence for encoding "UTF8": 0xa0

Hello, my app reads emails with attachments and inserts each email message into the database so it can be handed to Delayed Job for processing.

When inserting an email with attachments into the database, I get the following error:

PGError: ERROR: invalid byte sequence for encoding "UTF8": 0xa0

Has anyone seen this before? Googling doesn't turn up much that is Rails-related.

Right now I'm saving the email with attachments to a text column. Has anyone tried using a blob column to resolve this?

Thanks

I haven't been using attachments, but for files it sounds like a blob might be better. "Files" don't have character sets, they're just binary data, right?

This Postgres error did happen to me, though, because I was receiving emails from Sendgrid in all sorts of different encodings. My code reconstructed a Mail::Message object from the Sendgrid params and then pulled specific fields off of it to put in the database. I wrongly assumed that all incoming messages would have the same encoding as the test messages I was sending, so I just forced 'charset=UTF-8;' on the Message objects. I started getting errors very similar to yours when Windows users began sending my service emails from Outlook.
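To make the failure mode concrete, here is a minimal sketch of why a lone 0xa0 byte upsets a UTF-8 database. It assumes Ruby 1.9 or later (these String methods do not exist on 1.8.7), and the sample string and the Windows-1252 source encoding are made up for illustration. In Windows-1252 and ISO-8859-1, 0xA0 is a non-breaking space, but by itself it is not a valid UTF-8 sequence.

```ruby
# Raw bytes as they might arrive from a Windows-1252 mail client (illustrative data).
raw = "Hello\xA0world".b

# Forcing the charset only changes the label; the bytes stay the same.
mislabeled = raw.dup.force_encoding('UTF-8')

# Transcoding rewrites the bytes: 0xA0 becomes the two-byte sequence 0xC2 0xA0.
transcoded = raw.encode('UTF-8', 'Windows-1252')

puts mislabeled.valid_encoding?   # false
puts transcoded.valid_encoding?   # true
```

The mislabeled string is the kind of value Postgres rejects, since it claims to be UTF-8 but contains a byte sequence that UTF-8 does not allow.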

To remedy this, I used Iconv to convert the incoming data to a standard charset, the same one used by the Postgres DB I was using on Heroku. Sendgrid sends emails as POST parameters and includes a JSON hash giving the encoding of each of the other fields, so I was able to use these to tell Iconv what to convert from. I also told it to ignore invalid characters by appending "//IGNORE" to the encoding name. The code looks like this:

    encodings = ActiveSupport::JSON.decode(params[:charsets])
    # Sendgrid auto-decodes the headers into UTF8
    mail = Mail.new(params[:headers])
    if params[:text].present?
      # Iconv.conv takes (to, from, str); "//IGNORE" belongs on the target encoding
      body = Iconv.conv('UTF-8//IGNORE', encodings['text'], params[:text])
      mail.text_part = Mail::Part.new(:charset => 'UTF-8', :content_type => "text/plain;", :body => body)
    end

If you aren't using Sendgrid, make certain that the text you insert into the database is in the encoding the database expects. I believe something in the stack (ActiveRecord, the PostgreSQL adapter, the Ruby postgres bindings, or whatever) assumed that the incoming string was properly encoded and sent it to the database as-is, and the database rejected it because it was not.

I'm on REE 1.8.7, so maybe this whole thing would go away if you used 1.9.2. I've read that string encodings are a lot more magical there, and wycats has a good blog post explaining how it all works if you are curious.
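For anyone on a newer Ruby, the same conversion can probably be done without Iconv. The sketch below is not what I run in production. It assumes Ruby 2.1 or later (for String#scrub), and the helper name to_utf8 and the '?' replacement character are my own choices for illustration. It transcodes from the charset the sender declared, and falls back to scrubbing the bytes as UTF-8 when the declared name is unknown.

```ruby
# Hypothetical helper: return a valid UTF-8 copy of `text`,
# given the charset name the sender declared (e.g. from Sendgrid's charsets param).
def to_utf8(text, declared_charset)
  source = begin
    Encoding.find(declared_charset.to_s)
  rescue ArgumentError
    # Unknown or missing charset name: treat the bytes as UTF-8.
    Encoding::UTF_8
  end

  if source == Encoding::UTF_8
    # Already labeled as the target encoding, so just replace invalid byte sequences.
    text.dup.force_encoding(source).scrub('?')
  else
    # Interpret the bytes as `source` and rewrite them as UTF-8,
    # replacing anything invalid or unmappable instead of raising.
    text.encode(Encoding::UTF_8, source, invalid: :replace, undef: :replace, replace: '?')
  end
end

puts to_utf8("caf\xE9".b, 'ISO-8859-1')
```

Replacing bad bytes with '?' loses information, so if you care about fidelity you may prefer to let the conversion raise and handle the failure explicitly.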