HTML character codes in ActionMailer subject

Jaap_Haagmans · May 27, 2010, 2:28pm

Hi,

We have our e-mail subjects stored in I18n locale files. For the English version of the website, we had no problem, but now we're translating to German, which uses a lot of symbols like ü and Ü for example. We store them as character codes in the YAML files, but most e-mail clients won't parse HTML-code in the subjects. Is it fine to just use the actual characters in the YAML files? Will that work on any system?

Thanks in advance!

11155 · May 27, 2010, 4:25pm

Jaap Haagmans wrote:

> Hi,
>
> We have our e-mail subjects stored in I18n locale files. For the English version of the website, we had no problem, but now we're translating to German, which uses a lot of symbols like ü and Ü for example. We store them as character codes in the YAML files,

Why? That's a really bad idea.

> but most e-mail clients won't parse HTML-code in the subjects. Is it fine to just use the actual characters in the YAML files?

Yes.

> Will that work on any system?

Yes, provided you specify the correct encoding (I recommend UTF-8 for everything).

> Thanks in advance!

Best,

Jaap_Haagmans · May 28, 2010, 5:35am

>> We have our e-mail subjects stored in I18n locale files. For the English version of the website, we had no problem, but now we're translating to German, which uses a lot of symbols like ü and Ü for example. We store them as character codes in the YAML files,
>
> Why? That's a really bad idea.

Because the subject lines are used on our website as well. XHTML Strict needs our "strange" characters to be character codes.

>> but most e-mail clients won't parse HTML-code in the subjects. Is it fine to just use the actual characters in the YAML files?
>
> Yes.

>> Will that work on any system?
>
> Yes, provided you specify the correct encoding (I recommend UTF-8 for everything).

Thanks. Is there a way to convert these HTML character codes to the actual characters?

Jaap
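
For existing locale strings that already contain character codes, a one-off conversion could look roughly like the sketch below. It is not from the thread: `decode_entities` and the `NAMED_ENTITIES` table are hypothetical names, and the table covers only a handful of German characters. A complete entity list (for example from a library such as the htmlentities gem) would be the safer choice for real data.

```ruby
# Hypothetical lookup table: only a few named entities, enough for German text.
NAMED_ENTITIES = {
  "auml" => "ä", "ouml" => "ö", "uuml" => "ü",
  "Auml" => "Ä", "Ouml" => "Ö", "Uuml" => "Ü",
  "szlig" => "ß", "amp" => "&"
}.freeze

# Replaces decimal (&#252;), hexadecimal (&#xFC;) and the named entities
# listed above with the actual UTF-8 characters. Unknown names are left as-is.
def decode_entities(text)
  text.gsub(/&(?:#(\d+)|#[xX](\h+)|(\w+));/) do |match|
    if Regexp.last_match(1)
      [Regexp.last_match(1).to_i].pack("U")   # decimal code point
    elsif Regexp.last_match(2)
      [Regexp.last_match(2).hex].pack("U")    # hexadecimal code point
    else
      NAMED_ENTITIES.fetch(Regexp.last_match(3), match)
    end
  end
end
```

Running each locale value through a helper like this once, and saving the file as UTF-8, would leave the YAML holding the real characters.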

hassan · May 28, 2010, 3:21pm

> xHTML strict needs our "strange" characters to be character codes.

What makes you think that? Or to put it another way: no, it doesn't.

11155 · May 28, 2010, 5:50pm

Hassan Schroeder wrote:

>>> We have our e-mail subjects stored in I18n locale files. For the English version of the website, we had no problem, but now we're translating to German, which uses a lot of symbols like ü and Ü for example. We store them as character codes in the YAML files,
>>
>> Because the subject lines are used on our website as well. XHTML Strict needs our "strange" characters to be character codes.
>
> What makes you think that? Or to put it another way: no, it doesn't.

Right. In both HTML 4 and XHTML, any Unicode character is valid.

But even if that weren't so, you should be storing the actual "strange" characters and escaping them on output. And you've just discovered why.

Best,

Jaap_Haagmans · May 28, 2010, 6:45pm

Let me put it another way: we have multiple people updating the language strings in the YAML files. Characters entered by people working on OS X show up as squares on Linux, and vice versa. I only control one of these systems. How would you approach this? Our approach has worked this way for some time now, and we only have problems with ActionMailer subjects.

11155 · May 28, 2010, 6:52pm

Jaap Haagmans wrote:

> Let me put it another way: we have multiple people updating the language strings in the YAML files. Characters entered by people working on OS X show up as squares on Linux, and vice versa. I only control one of these systems. How would you approach this?

Put the UTF-8 characters in the files. Make sure everyone's editor is set to UTF-8.
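
One way to catch a mis-saved file before it reaches the other platform is a small check like the sketch below. It assumes the locale files live under `config/locales/*.yml` (the Rails default; adjust the glob otherwise). The helper names `valid_utf8?` and `non_utf8_locale_files` are made up for this example.

```ruby
# Returns true when the raw bytes form a valid UTF-8 sequence.
def valid_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Lists the files matching the glob whose contents are not valid UTF-8.
# The default path is an assumption (the usual Rails locale directory).
def non_utf8_locale_files(glob = "config/locales/*.yml")
  Dir.glob(glob).reject { |path| valid_utf8?(File.binread(path)) }
end
```

Calling `non_utf8_locale_files.each { |path| warn "#{path} is not valid UTF-8" }` from a rake task or a pre-commit hook would flag files saved in another encoding, such as Latin-1 or MacRoman.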

> Our approach has worked this way for some time now

It's the wrong approach. You should apply any necessary escaping on output. As you've discovered, pre-escaped data is unusable if you're generating multiple formats.

> and we only have problems with ActionMailer subjects.

No, you'll have problems with anything that isn't HTML. So...do it right. Store the actual characters. Let the renderer escape characters on output. This is the correct and flexible approach.
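
As a rough illustration of "store the actual characters, escape on output", the sketch below uses the same stored string in two contexts. The constant `SUBJECT` stands in for an I18n lookup, and the text and helper names are invented for the example. `ERB::Util.html_escape` is the standard-library escaper, and it only touches markup-significant characters.

```ruby
require "erb"

# Stand-in for a value loaded from a UTF-8 locale file; the text is made up.
SUBJECT = "Bestätigung für Müller & Söhne"

# Plain-text contexts (such as a mail subject) take the string unchanged.
def subject_for_mail(text)
  text
end

# HTML contexts escape only & < > " and '. The umlauts pass through
# untouched, which is fine when the page itself is served as UTF-8.
def subject_for_html(text)
  ERB::Util.html_escape(text)
end
```

Here `subject_for_html(SUBJECT)` gives `Bestätigung für Müller &amp; Söhne`, while the mail subject keeps the raw `&` and the umlauts, so neither format depends on the other's escaping.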

Best,