Parsing html tags to ruby characters

11175 · May 20, 2008, 5:03am

Hi,

I have a string containing some ruby code and html tags in-between. For example,

str = "require 'my_class.rb'<br>require 'your_class.rb'<br>&nbsp;&nbsp;:key=&gt;'hello'"

I want these html tags ('<br>', '&nbsp;', '&gt;', '&lt;' etc...) to be replaced by the equivalent ruby characters ("\n", " ", ">", "<" etc...).

These html tags can change dynamically according to the inputs.

Is there any way to parse these html tags to equivalent ruby characters?

Thanks in advance...

11175 · May 20, 2008, 5:58am

Thanks Ryan. But I can't predict which tags I will be getting, because they are dynamic. Any possible tag can come. So if I have to use the 'gsub' method, I will have to write a call for each and every html tag, and that will get big.

So I am looking for an easier way to implement this (something like an html parser).

radar · May 20, 2008, 6:04am

You never specified what you wanted the

and tags replaced with either.

11175 · May 20, 2008, 6:12am

Sorry, that's my mistake. What I finally want from the string is runnable ruby code, so those tags can be removed from the string without any replacement.

Now I think the only way to implement this is to use the 'gsub' method for each and every possible tag.

Frederick_Cheung · May 20, 2008, 8:15am

Well, assuming the only tag with special meaning is <br>, you can
just convert entities to their respective characters (there are tables
of these), <br> to "\n", and then just replace every other tag with ''.
No need for one regexp per tag for that!

Fred
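A minimal sketch of that three-step idea, under some assumptions: the method name `html_to_ruby` and the `ENTITIES` table are hypothetical, and the table holds only the entities mentioned in this thread. Fred lists entity conversion first; this sketch decodes entities last instead, so that a decoded `<` or `>` belonging to the Ruby code is not mistaken for part of a tag.

```ruby
# Hypothetical table: only the entities mentioned in this thread.
ENTITIES = {
  '&nbsp;' => ' ',
  '&gt;'   => '>',
  '&lt;'   => '<',
  '&amp;'  => '&'
}.freeze

def html_to_ruby(markup)
  # <br>, <br/> and <br /> (any case) become newlines.
  text = markup.gsub(/<br\s*\/?>/i, "\n")
  # Every other tag, with or without attributes, is dropped.
  text = text.gsub(/<\/?[A-Za-z][^>]*>/, '')
  # Entities are decoded in a single pass, so "&amp;lt;" ends up as
  # "&lt;" rather than being decoded twice into "<".
  text.gsub(/&(?:nbsp|gt|lt|amp);/) { |entity| ENTITIES[entity] }
end

str = "require 'my_class.rb'<br>require 'your_class.rb'<br><b>puts</b> :key=&gt;'hello'"
puts html_to_ruby(str)
```

One pattern handles the line break and one handles all remaining tags, so no per-tag `gsub` is needed. Entities outside the table are left untouched.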

11175 · May 20, 2008, 8:42am

But "&gt;" and "&lt;" need to be replaced with ">" and "<" respectively, because I will be having some ruby hash code in the string.

Also I need to find out all the html tags in that string. Is there any way to find that?

Frederick_Cheung · May 20, 2008, 9:59am

I'm not seeing the problem. Replace entities and then look for
everything between < and >. Change it to a newline if it's a br, or
just replace it with blank and add it to your list of html tags.

Fred
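The "replace it and add it to your list" step could be sketched with the block form of `gsub`. The helper name `strip_and_collect_tags` is hypothetical, and entity decoding is left out here to keep the example short.

```ruby
# Hypothetical helper: returns the cleaned text plus every tag it removed.
def strip_and_collect_tags(markup)
  found = []
  text = markup.gsub(/<\/?([A-Za-z][A-Za-z0-9]*)[^>]*>/) do |tag|
    found << tag                        # remember the tag exactly as written
    $1.downcase == 'br' ? "\n" : ''     # br becomes a newline, the rest vanish
  end
  [text, found]
end

text, tags = strip_and_collect_tags("x = 1<br><b>y</b> = 2")
p text
p tags
```

The second return value answers the "find out all the html tags" question without a separate pass over the string.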

11175 · May 21, 2008, 11:30am

Thanks for your replies. I have done what I wanted. The following is the code for that.

markup = markup.gsub('<br>', "\n")
markup = markup.gsub(/[\<]([\/])*([A-Za-z0-9])*[\>]/, '')
markup = markup.gsub('&gt;', ">")
markup = markup.gsub('&lt;', "<")
markup = markup.gsub('&nbsp;', " ")
markup = markup.gsub('&amp;', "&")

It's working fine now. But I am not sure whether I have covered all the tags and characters or not.
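One way to check coverage is to run the tag pattern against a few sample tags. The sketch below uses the pattern from the post unchanged; the constant name `TAG_PATTERN` and the sample list are made up for illustration.

```ruby
# The tag pattern from the post above, unchanged.
TAG_PATTERN = /[\<]([\/])*([A-Za-z0-9])*[\>]/

samples = ['<p>', '</p>', '<h1>', '<br/>', '<br />', '<a href="x">', '<div class="c">']
samples.each do |tag|
  # A sample counts as removed when nothing is left after the substitution.
  removed = tag.gsub(TAG_PATTERN, '').empty?
  puts format('%-18s %s', tag, removed ? 'removed' : 'left in place')
end
```

Bare tags such as `<p>` and `</p>` are removed, but anything with attributes, a space, or a self-closing slash is left in place, because the pattern only allows letters and digits between the brackets.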

Frederick_Cheung · May 21, 2008, 11:53am

It depends on what you are trying to do. There are far more html entities than that (a partial list is here: http://www.w3schools.com/tags/ref_entities.asp), and of course there are the unicode style ones (http://theorem.ca/~mvcorks/code/charsets/auto.html).

Fred
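The numeric ("unicode style") entities follow a fixed shape, `&#NNN;` in decimal or `&#xHH;` in hex, so they can be decoded without a lookup table. A minimal sketch, with the hypothetical method name `decode_numeric_entities`:

```ruby
# Decode numeric character references such as &#233; (decimal) or &#xE9; (hex).
# A code point outside the valid range makes pack raise a RangeError.
def decode_numeric_entities(text)
  text.gsub(/&#(?:[xX]([0-9A-Fa-f]+)|(\d+));/) do
    codepoint = $1 ? $1.hex : $2.to_i
    [codepoint].pack('U')   # 'U' packs a code point as a UTF-8 character
  end
end

puts decode_numeric_entities("caf&#233; &#x263A; &#38;")
```

Named entities such as `&nbsp;` still need a table, so this would run alongside the named-entity replacements rather than replace them.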

AndyV · May 21, 2008, 2:56pm

I have a very, very strong suspicion that the need is only to translate character encoding (e.g., &amp; => '&').

It might be worth considering iterating over an array of hashes rather than repeating the same code with different parameters:

[{:regex=>/\<br\>/, :decoded=>"\n"},
 {:regex=>/[\<]([\/])*([A-Za-z0-9])*[\>]/, :decoded=>''},
 {:regex=>/&gt;/, :decoded=>'>'} ... ].each do |decoding_hash|
  markup.gsub!(decoding_hash[:regex], decoding_hash[:decoded])
end

The advantage is in keeping the code DRY and making the intentions of the block a bit clearer.
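A runnable version of that loop might look like the following. It is a sketch limited to the replacements that appear in this thread; the names `DECODINGS` and `decode_markup` are hypothetical.

```ruby
# Order matters: tags are handled before entities, and &amp; comes last
# so that "&amp;lt;" decodes to "&lt;" and not all the way to "<".
DECODINGS = [
  { :regex => /<br>/,              :decoded => "\n" },
  { :regex => /<\/*[A-Za-z0-9]*>/, :decoded => ''   },
  { :regex => /&gt;/,              :decoded => '>'  },
  { :regex => /&lt;/,              :decoded => '<'  },
  { :regex => /&nbsp;/,            :decoded => ' '  },
  { :regex => /&amp;/,             :decoded => '&'  }
].freeze

def decode_markup(markup)
  # Apply each replacement in turn, feeding the result into the next one.
  DECODINGS.inject(markup) do |text, decoding|
    text.gsub(decoding[:regex], decoding[:decoded])
  end
end

puts decode_markup("h = {<br>&nbsp;&nbsp;:key=&gt;'a &amp; b'<br>}")
```

Supporting another entity then means adding one line to the table instead of another `gsub` call.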
