tl;dr - how do you future-proof database content, i.e. avoid specific HTML in the database? (Sorry if this has been discussed properly before, I couldn’t find anything relevant.)
I’ve been maintaining a Rails project for the past couple of years, and we’re running into a couple of problems with what you could call content maintainability. We publish several new pages a day (public event pages and some news stories) and we have a large repository of more static pages too. We’re using Refinery CMS, which is… okay. A good choice in 2014 when we first implemented this version of the site, a bad choice in 2021. But the real problem isn’t the CMS per se, but the fact that the body text for each page is stored as site-specific HTML. We’ve tried to use page parts and custom fields that isolate specific content like images and video, but it’s unreasonable and counterproductive to stop editors (there are about 7 people actively creating content) from using images or other content in the main document flow, e.g. in a long interview.
All this could have been avoided with a better way to store content. Jeff Eaton’s excellent 2014 article about this is still largely correct (except the stuff about Web Components being the future, which, I think, turned out to be wrong). CMSes, headless or traditional, should separate the content stored in the database from the final HTML, with a custom transformation layer in between. When presentational markup (like in my example above) changes, you change the code in the transformation layer, not 100 separate CMS pages.
Jeff Eaton’s 2014 solution was to use XML under the hood for custom content editor elements and let the backend compile/transform them to the desired HTML. This seems more cumbersome, but at least this way you can keep HTML for most of the content (like bog-standard p’s and regular links) and use custom XML elements for specifics (like pull quotes, specific tables, video embeds, etc.). For example,
<Pullquote><p>Quote here</p></Pullquote> is compiled to the desired
<aside class="pullquote"><blockquote><p>Quote here</p></blockquote></aside>
Writing my own, say, Nokogiri-based Rails helper logic is doable, but surely some Rails content solution has already considered these questions and even solved them? I could probably implement something like this for Refinery, but it would be trying to fit a square peg into a round hole. At minimum, it would break preview and I would have to loosen the strict HTML allowlist, and I don’t particularly want to do this. Most (all?) CMS or CMS-adjacent solutions (like Trestle with the TinyMCE extension) store raw HTML in the database, so this must be a problem for a lot of people. Any ideas how to tackle this, obvious or not so obvious?
PS: If a good solution for future-proofing this exists, I’m willing to spend a lot of time fixing existing pages/db entries if it means I won’t have to do it again in two years.
PPS: There might be some things to leverage from ActionText, but sadly it’s off the table for me because of this Trix bug.