improving on: truncate(..) + rendered text ?

Hello all,

On a summary page I need to show the first 100 characters of textilized messages. Problem: truncate(..) often cuts in the middle of an HTML tag, which leaves broken markup and an unpredictable rendering.

My first idea was to "repair" the broken text with Hpricot (as I use it elsewhere in the project), but it's not perfect:

   <h1>abcd</h

would give

   <h1>abcd</h</h1>

(I also use white_list to clean the <script>..)

I guess there are only 2 alternatives:
  - a smart html_truncate(..), or
  - "unrender" the text (html => plain text)
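
To make the first option concrete, here is a rough sketch of what I mean by a tag-aware truncate. It is plain Ruby with a regex tokenizer, not a real HTML parser, and html_truncate, VOID_TAGS and the parameter names are all made up for illustration. It counts only the visible text, never cuts inside a tag, and closes whatever is still open at the cut point. Entities such as &amp; are counted character by character, so they can still be cut in half.

```ruby
# Hypothetical helper, not part of Rails: truncate HTML to `length`
# visible characters, keep tags whole, and close any tag left open.
VOID_TAGS = %w(br hr img input meta link).freeze

def html_truncate(html, length = 100, omission = "...")
  out = String.new
  open_tags = []
  remaining = length

  # Split the input into tags (<...>) and runs of text between them.
  html.scan(/<[^>]*>|[^<]+/) do |token|
    if token.start_with?("<")
      out << token
      if token =~ %r{\A</\s*(\w+)}
        # Closing tag: forget the most recent matching open tag
        # (and anything opened after it).
        idx = open_tags.rindex($1.downcase)
        open_tags.slice!(idx..-1) if idx
      elsif token =~ %r{\A<\s*(\w+)} && !token.end_with?("/>")
        name = $1.downcase
        open_tags << name unless VOID_TAGS.include?(name)
      end
    elsif token.length > remaining
      # The cut falls inside this text run: keep what fits and stop.
      out << token[0, remaining] << omission
      break
    else
      out << token
      remaining -= token.length
    end
  end

  # Close whatever is still open, innermost first.
  open_tags.reverse_each { |name| out << "</#{name}>" }
  out
end

puts html_truncate("<h1>abcd</h1><p>hello <b>world</b></p>", 6)
```

The drawback is that the markup (headings, bold, links) survives into the summary, which may or may not be what you want on a summary page.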

Has anybody explored those directions?


Alain Ravet.

Here is my 'improved' truncate that transforms html to text:

In sequence, it:
  - sanitizes (with white_list) and removes images
  - strips html tags => you're only left with plain text
  - truncates
  - applies simple_format => you get newlines and paragraphs back.


   WhiteListHelper.bad_tags = %w(script img)

   def strip_and_truncate(text, length = 30, truncate_string = "...")
      return if text.nil?
      snip = truncate(strip_tags(white_list(text)), length, truncate_string)
      simple_format(snip)
   end
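
For anyone who wants to try the same sequence outside a Rails view, here is a rough plain-Ruby approximation. The regexes are only a crude stand-in for white_list and strip_tags, plain_summary is a name I made up, and the final simple_format step is left out. In this sketch the omission string counts toward the length limit.

```ruby
# Hypothetical stand-alone version of the pipeline above (no Rails helpers).
def plain_summary(html, length = 30, omission = "...")
  return if html.nil?

  # 1. Drop <script> blocks entirely, contents included.
  text = html.gsub(%r{<script\b.*?</script>}mi, "")

  # 2. Strip every remaining tag (this removes <img> too); crude, regex-based.
  text = text.gsub(/<[^>]*>/, "")

  # 3. Truncate, reserving room for the omission string.
  if text.length > length
    keep = [length - omission.length, 0].max
    text = text[0, keep] + omission
  end
  text
end

sample = "<p>Hello <b>world</b><script>alert(1)</script>, this is a long message</p>"
puts plain_summary(sample, 20)
```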