Improving on truncate(..) for rendered text?

Hello all,

On a summary page I need to show the first 100 characters of textilized messages.
Problem: truncate(..) often cuts in the middle of an HTML tag, so the
result is broken markup that renders unpredictably.

My first idea was to "repair" the broken text with Hpricot (which I already
use elsewhere in the project), but the result isn't perfect:
   <h1>abcd</h
would give
   <h1>abcd</h</h1>

(I also use white_list to remove <script> tags.)

As far as I can tell, there are only two alternatives:
- a smart html_truncate(..)
or
- "unrender" the text (html => plain text)

Has anybody explored those directions?
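For the first alternative, a tag-aware truncate might look roughly like the sketch below. It is plain Ruby with no Rails or Hpricot dependency. html_truncate and VOID_TAGS are hypothetical names, and the sketch assumes the input is already reasonably well-formed (for example, after white_list). It counts only the text between tags, never cuts inside a tag, and closes whatever tags are still open at the cut point. Entities such as &amp; count as several characters, and unlike Rails' truncate, the truncate_string is appended beyond the limit rather than counted within it.

```ruby
# Tags that never take a closing tag, so they are not pushed on the stack.
VOID_TAGS = %w(br hr img input meta link area base col param).freeze

# Hypothetical helper: keep the first `length` characters of visible text,
# never cut inside a tag, and close any tags left open at the cut point.
# Assumes reasonably well-formed input; entities count as several characters.
def html_truncate(html, length = 100, truncate_string = "...")
  return if html.nil?
  result = +""
  open_tags = []        # names of tags opened but not yet closed
  remaining = length    # visible characters we may still emit

  # Tokens are whole tags, runs of text, or a stray "<" (treated as text).
  html.scan(/<[^>]*>|[^<]+|</) do |token|
    if token.length > 1 && token.start_with?("<") && token.end_with?(">")
      result << token
      name = token[/\A<\/?([a-zA-Z][a-zA-Z0-9]*)/, 1]
      next if name.nil?  # comment, doctype, etc.
      name = name.downcase
      if token.start_with?("</")
        # Pop back to the matching open tag; stray closers are left alone.
        idx = open_tags.rindex(name)
        open_tags.slice!(idx..-1) if idx
      elsif !token.end_with?("/>") && !VOID_TAGS.include?(name)
        open_tags.push(name)
      end
    elsif token.length > remaining
      # Cut this text run and stop reading input.
      result << token[0, remaining] << truncate_string
      break
    else
      result << token
      remaining -= token.length
    end
  end

  # Close still-open tags, innermost first.
  result + open_tags.reverse.map { |t| "</#{t}>" }.join
end
```

For example, html_truncate("<p>Hello <b>world</b> again</p>", 8) returns "<p>Hello <b>wo...</b></p>": the cut lands inside the text, and both <b> and <p> get closed. Because the tags are never cut, there is nothing for a parser to "repair" afterwards.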

TIA

Alain Ravet.

Here is my 'improved' truncate, which converts the HTML to plain text before truncating:

In sequence, it:
  - sanitizes the text (with white_list) and removes images
  - strips the HTML tags, so you're left with plain text only
  - truncates
  - calls simple_format, so you get newlines and paragraphs back.

code:

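   # treat <script> and <img> as bad tags so white_list removes them
   WhiteListHelper.bad_tags = %w(script img)

   def strip_and_truncate(text, length = 30, truncate_string = "...")
      return if text.nil?
      # sanitize, strip every remaining tag, then cut the plain text
      snip = truncate(strip_tags(white_list(text)), length, truncate_string)
      # turn newlines back into <br /> and <p> paragraphs
      simple_format(snip)
   end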
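The four steps above rely on Rails helpers (white_list, strip_tags, truncate, simple_format). To experiment with the same pipeline outside Rails, here is a rough stand-in in plain Ruby. strip_and_truncate_plain is a hypothetical name, and its regexes are crude assumptions, not the real implementations of those helpers (they will mishandle, for example, a '>' inside an attribute value). Like Rails' truncate, it keeps truncate_string within the length limit.

```ruby
# Hypothetical, Rails-free approximation of strip_and_truncate.
# The regexes are rough assumptions, far cruder than white_list/strip_tags.
def strip_and_truncate_plain(text, length = 30, truncate_string = "...")
  return if text.nil?
  # 1. drop <script>...</script> blocks and <img> tags entirely
  cleaned = text.gsub(%r{<script\b.*?</script>}mi, "").gsub(/<img\b[^>]*>/i, "")
  # 2. strip every remaining tag, leaving plain text
  plain = cleaned.gsub(/<[^>]*>/, "")
  # 3. truncate, reserving room for truncate_string inside the limit
  if plain.length > length
    keep = [length - truncate_string.length, 0].max
    plain = plain[0, keep] + truncate_string
  end
  # 4. simple_format-like step: blank lines => paragraphs, newline => <br />
  paragraphs = plain.split(/\n\s*\n/).map { |para| para.strip.gsub(/\n/, "<br />\n") }
  paragraphs.map { |para| "<p>#{para}</p>" }.join("\n\n")
end
```

For instance, strip_and_truncate_plain("one two three four five six seven", 10) gives "<p>one two...</p>". Paragraph breaks only come back if the rendered HTML has newlines between its blocks, since stripping tags removes the <p> boundaries themselves; the same holds for the strip_tags version.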