String Encoding / Importing Feeds

Howdy,

I have a Rails app that grabs info from RSS feeds and then tries to insert it into a database.

The problem I'm having is that some feeds appear to use funky characters and my INSERTs are failing. The actual error is below.

Any idea how I can make this work reliably, not really knowing if a feed will have these or not?

I'm using PostgreSQL and the DB was initialized as UTF-8.

Any help is very much appreciated.

feeds#update_feeds (ActiveRecord::StatementInvalid) "PGError: ERROR: invalid byte sequence for encoding \"UTF8\": 0xb9\n: INSERT INTO feed_items (\"item_id\", \"updated_at\", \"title\", \"item_updated\", \"description\", \"feed_id\", \"item_link\", \"created_at\") VALUES(NULL, '2007-05-27 20:40:32.433422', 'Quote du jour', '2007-05-25 15:57:55.000000', '<p><a href=\"http://www.blah.lhah.com\">My Name</

: <i>There\271s really only one rule for community as far as I\271m

concerned, and it\271s this - in order to call some gathering of people a \"community\", it is a requirement that if you\271re a member of the community, and one day you stop showing up, people will come looking for you to see where you went.</i></p>', 6, 'feed_url', '2007-05-27 20:40:32.433422')"
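For reference, the \271 in the message is octal for the byte 0xb9 that PostgreSQL complains about. A minimal Ruby sketch of checking for this kind of problem before the INSERT is attempted is shown here. The helper name valid_utf8? is hypothetical, and the sketch assumes Ruby 2.0 or later (String#b, String#force_encoding), which is newer than the Ruby this thread was written against.

```ruby
# Hypothetical helper: report whether a string's bytes form valid UTF-8.
# It works on a copy, so the caller's string is left untouched.
def valid_utf8?(text)
  text.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# The bytes as they might arrive from the feed: 0xB9 where an apostrophe
# was intended. String#b tags the literal as raw binary data.
raw = "There\xB9s really only one rule for community".b

bad_ok  = valid_utf8?(raw)             # 0xB9 is a lone continuation byte
good_ok = valid_utf8?("Quote du jour") # plain ASCII is always valid UTF-8

puts "feed text valid?  #{bad_ok}"
puts "title valid?      #{good_ok}"
```

A check like this only detects the problem. What to do with a string that fails it (transcode, replace the bad bytes, or skip the item) is a separate decision.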

You need to encode your string rather than using it raw. Then reverse the process on read.

Probably the easiest and safest approach is to use base64 encoding, or to store the string in a blob column rather than a string field.
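A rough sketch of the base64 round trip suggested here, using Ruby's standard Base64 module. The helper names encode_for_storage and decode_from_storage are made up for illustration, and the sample bytes are an assumption modelled on the error message earlier in the thread.

```ruby
require "base64"

# Hypothetical helper: turn arbitrary bytes into plain ASCII text that any
# UTF-8 text column will accept.
def encode_for_storage(raw)
  Base64.strict_encode64(raw)
end

# Hypothetical helper: reverse the process on read. The result is a binary
# string; its real character encoding is still unknown at this point.
def decode_from_storage(stored)
  Base64.strict_decode64(stored)
end

raw      = "There\xB9s really only one rule".b  # contains the 0xB9 byte
stored   = encode_for_storage(raw)
restored = decode_from_storage(stored)

puts stored
puts "round trip intact? #{restored == raw}"
```

The trade-off is that the stored value is roughly a third larger than the original and can no longer be searched or compared as text in SQL, and the application still has to work out the real encoding when it wants to display the text.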

Michael

Hunter, did you ever find a solution to this problem? I am having very similar issues:

RAW RESPONSE TEXT: [Salut Alex, écoute c'est Alex je teste un peu et puis bonjour français françaises. ]
UTF-8 Response text: [Salut Alex, écoute c'est Alex je teste un peu et puis bonjour français françaises. ]
Unique Id: 1212337033-59
SQL (0.000081) BEGIN
GlobalInbox Update (0.000000) PGError: ERROR: invalid byte sequence for encoding "UTF8": 0xe9636f
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
: UPDATE global_inboxes SET "created_at" = '2008-06-01 12:17:13.416834', "voicemail_status_id" = 4, "deleted_at" = NULL, "voicemail_folder_id" = 1, "deleted" = 'f', "sender_cid" = '953794484', "conversion_to_text" = 'Salut Alex, écoute c'est Alex je teste un peu et puis bonjour français françaises. ', "notes" = NULL, "voicemail_id" = 491, "updated_at" = '2008-06-01 12:18:09.957716', "user_id" = 28 WHERE "id" = 491

The output above is from my production.log file. The RAW response is what gets printed to the console when I simply print the string holding the data. The UTF-8 response is the result of calling the .chars method on that string.

I am unable to insert the data into the DB using either the plain string or the result of the .chars method.

Did you find a solution?
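One observation on the error above: the rejected bytes 0xe9636f are "é", "c", "o" when read as ISO-8859-1, which matches "écoute" and suggests the text is arriving as Latin-1 rather than UTF-8. Under that assumption, a minimal sketch of transcoding before the save could look like the following. The helper name to_utf8 is hypothetical, the ISO-8859-1 fallback is a guess rather than something the thread confirms, and String#force_encoding and String#encode require Ruby 1.9 or later (on the Ruby 1.8 of the time, Iconv was the usual tool).

```ruby
# Hypothetical helper: return a valid UTF-8 copy of the text.
# Assumption: anything that is not already valid UTF-8 is in the fallback
# encoding (ISO-8859-1 by default). That is a guess about the data source.
def to_utf8(text, fallback = Encoding::ISO_8859_1)
  utf8 = text.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # Reinterpret the bytes as the fallback encoding, then transcode.
  text.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

# 0xE9 followed by "co", the same bytes PostgreSQL rejected.
raw   = "Salut Alex, \xE9coute".b
clean = to_utf8(raw)

puts clean
puts "valid UTF-8? #{clean.valid_encoding?}"
```

Text that is already valid UTF-8 passes through unchanged, so the helper can be applied to every incoming string. If the source turns out to use a different encoding, the fallback will produce wrong characters without raising an error, so it is worth confirming the encoding the source declares.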

Hunter Hillegas wrote: