Regular Expression Help Please?

Hi all

I'm having a bit of trouble trying to achieve something; maybe someone can help?

I have a model Article, which has an attribute 'body'

The body is a text column in which people can add text and HTML.

I'd like to edit certain properties of some of the HTML tags, for example, converting all spaces inside <code> </code> tags to &nbsp;

I presume a regular expression is the best way to achieve this, but I'm not sure how to write one that only scans the characters inside the HTML tags.

Any ideas?

Ta

Gavin

You don't want to fall into the rat hole of parsing HTML with regexes. You need a parsing library like hpricot or similar.

http://wiki.github.com/why/hpricot

good luck! Tim

The body is a text column in which people can add text and HTML.

If this is the use case, then you want to protect what the user enters.

Using RedCloth or another markup library might do a better job of what you are trying to achieve.

Use a CSS rule to style it?

code { white-space: pre; }

or:

code { white-space: nowrap; }

You're just asking for a headache if you try to match paired and possibly nested HTML tags with a Regexp. (Not that it can't be done, but it gets ugly fast and you need a very capable regular expression engine like Oniguruma from Ruby 1.9.)
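
One way to sidestep paired-tag matching, offered only as a rough sketch: split the text into tags and text runs, and keep a nesting counter for <code>. The helper name nbsp_inside_code is hypothetical, and the tokenising pattern is deliberately naive (it ignores comments, CDATA, and a ">" inside an attribute value), so a real parser is still the safer choice.

```ruby
# Hypothetical helper: replaces spaces with &nbsp; only in text that sits
# inside <code>...</code>, tracking nesting depth with a counter instead of
# trying to match paired tags with one regular expression.
def nbsp_inside_code(html)
  depth = 0
  # The capture group makes split keep the tags as separate tokens.
  tokens = html.split(/(<[^>]*>)/)
  tokens.map do |token|
    case token
    when /\A<code\b[^>]*>\z/i
      depth += 1
      token
    when %r{\A</code\s*>\z}i
      depth = [depth - 1, 0].max   # tolerate a stray closing tag
      token
    when /\A</
      token                        # any other tag: leave attributes untouched
    else
      depth > 0 ? token.gsub(" ", "&nbsp;") : token
    end
  end.join
end

puts nbsp_inside_code("a b <code>c d</code> e f")
```

Spaces inside tag attributes are left alone because tag tokens are passed through unchanged; only text runs seen while the counter is above zero are rewritten.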

-Rob

http://www.w3.org/TR/CSS2/text.html#white-space-prop

I'd like to edit certain properties of some of the HTML tags, for example, converting all spaces inside <code> </code> tags to &nbsp;

Take a look at CGI.escape and CGI.escapeElement
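
A small sketch of what those two standard-library calls do, assuming the cgi library is available to require. The sample body string is made up for illustration. Note that CGI.escape performs URL encoding, while CGI.escapeElement HTML-escapes only the tags you name, which may be closer to what is wanted for neutralising an unwanted tag.

```ruby
require "cgi"

# CGI.escape is URL encoding, not HTML escaping.
puts CGI.escape("a b")   # prints "a+b"

# Made-up sample text for illustration only.
body = 'Intro <script>alert("x")</script> and <code>puts 1</code>'

# Escape only the <script> open/close tags; other tags are left as they are.
safe = CGI.escapeElement(body, "script")
puts safe
```

This escapes the named tags rather than removing them, and it does not touch spaces inside <code>, so it only covers part of the original question.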

I was planning on editing the text blob before it's saved to the database.

So a blob like: "This is a big blob of text <code>This is the code part</code> This is another line"

would be converted to: "This is a big blob of text <code>This&nbsp;is&nbsp;the&nbsp;code&nbsp;part</code> This is another line"

And then called back as <%= @article.body %> in the view.

I thought that was a simple option.
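
For the simple case in the example blob above (plain <code> tags, no attributes, no nesting), the conversion could be sketched with a single gsub. The method name nbsp_in_code is hypothetical, and this is only an illustration of the idea, not a robust HTML transform.

```ruby
# Hypothetical helper: handles only plain, non-nested <code>...</code> pairs.
def nbsp_in_code(body)
  # Non-greedy match so each <code>...</code> pair is handled separately;
  # the m flag lets "." match newlines inside the code block.
  body.gsub(%r{<code>(.*?)</code>}m) do
    inner = Regexp.last_match(1)
    "<code>#{inner.gsub(' ', '&nbsp;')}</code>"
  end
end

blob = "This is a big blob of text <code>This is the code part</code> This is another line"
puts nbsp_in_code(blob)
```

In a Rails model, a method like this could be called from a before_save callback so the body is converted before it reaches the database.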

Would hpricot be appropriate here?

I should also add that I plan on having a safe-list of tags, so any potentially harmful tags like <script> would be removed.

Does that clarify at all?

Thanks

I had planned on formatting the code tags with CSS as you suggested, Rob, but I also need to wrap specific words in spans to specify their colour.

Actually...

Just found this => http://coderay.rubychan.de/

looks perfect for my needs

Thanks for your suggestions guys

Hi all

Managed to solve this issue with CodeRay

The site is now up and running. Here is a quick tutorial on how I did it, in case anybody else wants to do the same:

:)