We have twenty-or-so MS Word 2000 documents that we want to display on
our website.
What we did was convert the MS Word documents to Compact HTML. We then
display a document via an
<object data="/doc/somedoc.htm" height='100%' id='xyz' width='100%'>
This all works great except for a bit of a fly in the ointment.
Doing an SEO (Search Engine Optimization) analysis shows that at least
one analyzer does not analyze the contents of "/doc/somedoc.htm".
I guess it is reasonable not to count the contents of the document
pointed to because it might not even be owned by the displaying page.
- - - -
But these MS Word 2000 documents are _ours_. So does anyone know of a way to
automatically convert the htm file produced so that I can render its
content inline in the page rather than refer to the document via object/data?
If these documents all share the same internal structure, you could write a little script using Nokogiri to capture only the node containing the page content (the one with id="whatever"), and then write that back out as a sort of partial that your page includes directly. Or load it into an ActiveRecord object and persist it in your database.