resource_feeder feedback

Let me apologize in advance, as this is a topic that I feel passionate and opinionated about. If I'm coming on too strong, please forgive me.

=== Be opinionated ===

My first piece of advice is simple: be opinionated. Change the method name to simply <feed_for>. Have it default to doing the right thing. Perhaps give an option to change the default feed format, but default to providing only a single feed format. That's what Microsoft [1], FeedDemon [2], Bloglines [3] and many others would prefer.

[1] Microsoft RSS Blog | Microsoft Docs
[2] Nick Bradbury: RSS Good Practices: Pick a Format (Any Format)
[3] http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=cce109de-de25-4852-a8cf-319ccbf0013

If you want, pick RSS 1.0. Or RSS 2.0. Or Atom 1.0. Just pick one. I'd personally recommend that you be a bit forward-thinking, like Mozilla [4] or the intelligence community [5], but if for some reason you want to pick another format from that list, I won't complain.

[4] 313441 - [SECURITY] Query RSS should HTML-escape summary in <title>
[5] https://dnidata.org/education/xnotes/RSS_and_Atom_Considerations.xml
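To make "default to a single format" concrete, here is a minimal Ruby sketch. `feed_format`, `FEED_FORMATS` and the choice of Atom 1.0 as the default are assumptions for illustration, not part of resource_feeder; any one format from the list above would serve equally well as the default. An ambiguous `:rss` is rejected rather than guessed.

```ruby
# Hypothetical sketch: one opinionated default format, plus an explicit override.
FEED_FORMATS = %i[atom10 rss20 rss10].freeze
DEFAULT_FEED_FORMAT = :atom10 # assumption: Atom 1.0 as the single default

def feed_format(options = {})
  format = options.fetch(:format, DEFAULT_FEED_FORMAT)
  # Reject unknown or ambiguous names (such as :rss) instead of guessing a version.
  unless FEED_FORMATS.include?(format)
    raise ArgumentError, "unsupported feed format: #{format.inspect}"
  end
  format
end

feed_format                 # => :atom10
feed_format(format: :rss20) # => :rss20
```

The caller who does nothing gets one well-defined feed; the caller who cares can still ask for another format by name.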

=== Summary and Content ===

Independent of feed format, my biggest piece of input is that you think a bit more about description. It has a number of problems. The first can be expressed thus: the database column from which this element is often sourced is undoubtedly of some string type. In such cases, what would a '<' character in that column mean? Does it signal the start of markup? Or is it simply a less-than character?

My experience is that most string columns in a database are simply plain text. That also tends to be the 'safe' choice in most cases: treating markup as plain text will expose the markup. This, while suboptimal, is at least obvious. In the reverse case (treating plain text as markup), a stray '<' gets parsed as the start of a tag, and what you have is silent data loss [6].

[6] Détente

If you want to put plain text in an RSS 1.0 or RSS 2.0 <description>, and you are using Builder 2.0 (you *are* using Builder 2.0, aren't you?), then you need to add a .to_xs to escape the string. If you want to put plain text into an Atom <summary> or <content> element, you don't need to do anything, as that is the default.
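The extra `.to_xs` is needed in RSS but not in Atom because RSS readers treat the decoded <description> as HTML, so plain text has to be escaped once for the HTML layer and once more for the XML layer. A stdlib-only sketch of the two layers follows; `escape_markup`, `rss_description` and `atom_summary` are hypothetical names, and `escape_markup` is a simplified stand-in for Builder's escaping (Builder's `to_xs` also deals with non-ASCII characters, which this skips).

```ruby
# Simplified stand-in for XML escaping of element content: only &, < and >.
def escape_markup(text)
  text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# RSS <description> is read as HTML: plain text is escaped once for the HTML
# layer (the role of the explicit .to_xs) and once more for the XML layer
# (the role of Builder's own escaping of element text).
def rss_description(text, content_type = :text)
  html = content_type == :html ? text : escape_markup(text)
  "<description>#{escape_markup(html)}</description>"
end

# Atom declares the content type with @type, so one XML-level escape suffices.
def atom_summary(text, content_type = :text)
  "<summary type=\"#{content_type}\">#{escape_markup(text)}</summary>"
end

rss_description("a < b") # => "<description>a &amp;lt; b</description>"
atom_summary("a < b")    # => "<summary type=\"text\">a &lt; b</summary>"
```

In both cases a feed reader ends up displaying `a < b`; only the number of escaping layers differs.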

Now you need a convention (possibly as simple as checking the column name with .downcase.ends_with?("html")) and a way to override it (perhaps options[:feed][:content_type]?). If content_type == 'html', you want to omit the call to .to_xs for RSS 1.0 and RSS 2.0. For Atom, simply do

   xml.content(:type => options[:feed][:content_type] || ... || 'text')
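One possible shape for that convention plus override, as a hypothetical `content_type_for` helper. The column-name rule and the `options[:feed][:content_type]` key follow the suggestion above; the helper itself is illustrative. Its result is the kind of value that could feed the `:type` in the Atom line above, or decide whether RSS gets the extra `.to_xs`.

```ruby
# Hypothetical helper: guess the content type from the column name, unless
# options[:feed][:content_type] overrides it. Ruby core spells the predicate
# end_with?; Rails (ActiveSupport) also provides ends_with?.
def content_type_for(column, options = {})
  override = options.dig(:feed, :content_type)
  return override.to_s if override
  column.to_s.downcase.end_with?("html") ? "html" : "text"
end

content_type_for(:body_html)                           # => "html"
content_type_for(:body)                                # => "text"
content_type_for(:body, feed: { content_type: :html }) # => "html"
```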

Next, consider internationalization / character encoding issues. Builder prefers utf-8, but with Builder 2.0 it will fall back to iso-8859-1/win-1252. This is the right default for a large number of cases, but it will be wrong in a case that you might care about: native Mac applications.

Finally, consider splitting summary and content. Summary should go into description in RSS 1.0 and RSS 2.0; but content should go into content:encoded. This extension enjoys wide support, so it is safe to use [7]. For Atom, summary and content go into summary and content respectively. Additionally, with Atom, you can put XML directly into the content; this may be a reasonable default.

[7] RSS Best Practices Profile
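A sketch of the split, assuming the summary is plain text and the content is an HTML fragment. `summary_and_content` and the format symbols are illustrative names, and the RSS branch assumes the channel declares `xmlns:content="http://purl.org/rss/1.0/modules/content/"`. The Atom `type="xhtml"` option (inline XML content) mentioned above is left out to keep the sketch short.

```ruby
# Simplified XML escaping of element content: only &, < and >.
def escape_markup(text)
  text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Hypothetical helper: summary_text is plain text, content_html is HTML.
def summary_and_content(format, summary_text, content_html)
  case format
  when :rss10, :rss20
    # <description> is interpreted as HTML, so plain text is escaped twice;
    # <content:encoded> carries the HTML, escaped once for XML.
    ["<description>#{escape_markup(escape_markup(summary_text))}</description>",
     "<content:encoded>#{escape_markup(content_html)}</content:encoded>"]
  when :atom10
    ["<summary type=\"text\">#{escape_markup(summary_text)}</summary>",
     "<content type=\"html\">#{escape_markup(content_html)}</content>"]
  else
    raise ArgumentError, "unknown feed format: #{format.inspect}"
  end
end
```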

=== Other details ===

You declare xmlns:dc; you really want to use this. Put some author information into the feed [8]. This goes for all feed versions, though Atom structures this information differently. Author information can either be a single author for the entire feed, or an author for each entry, or both.

[8] RSS Best Practices Profile
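A sketch of where an author name could go in each format; the same element works at the channel/feed level and at the item/entry level. `author_xml` is a hypothetical helper. It uses <dc:creator> for RSS because RSS 2.0's own <author> element expects an e-mail address rather than a plain name.

```ruby
# Simplified XML escaping of element content: only &, < and >.
def escape_markup(text)
  text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Hypothetical helper. RSS: <dc:creator> (xmlns:dc must be declared).
# Atom: the name is nested inside <author>.
def author_xml(format, name)
  case format
  when :rss10, :rss20 then "<dc:creator>#{escape_markup(name)}</dc:creator>"
  when :atom10 then "<author><name>#{escape_markup(name)}</name></author>"
  else raise ArgumentError, "unknown feed format: #{format.inspect}"
  end
end

author_xml(:atom10, "Sam Ruby") # => "<author><name>Sam Ruby</name></author>"
```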

The default feed title probably should include the database name.

You omit the channel description if no option specifies a value for it. That is only valid in Atom 1.0; in RSS 1.0 and RSS 2.0 the channel description is required.

Omitting language if it isn't specified is probably better than defaulting to en-us.
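The description and language points, sketched as a hypothetical `channel_elements` helper that returns element names and values rather than XML. Falling back to the title for a missing RSS description is an assumed policy; emitting an empty description or raising an error are alternatives.

```ruby
# Hypothetical helper: channel-level element names and values per format.
def channel_elements(format, title:, description: nil, language: nil)
  elements = { "title" => title }
  if format == :atom10
    # Atom 1.0: <subtitle> is optional, so leave it out when none was given.
    elements["subtitle"] = description if description
  else
    # RSS 1.0 and 2.0 require a channel description. Falling back to the
    # title is an assumed policy, not something either spec prescribes.
    elements["description"] = description || title
  end
  if language
    # Emit a language only when one was given, instead of assuming en-us.
    # For Atom this would be the xml:lang attribute on <feed>.
    key = { rss20: "language", rss10: "dc:language", atom10: "xml:lang" }.fetch(format)
    elements[key] = language
  end
  elements
end
```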

Re: TTL, before you go there, read [9].

[9] RSS Best Practices Profile

Entry/item titles have much the same problems as descriptions in RSS 2.0, only there isn't nearly as much consensus on how to escape them. In particular, it generally isn't possible to find a representation that allows you to express a '<' character in a title. This problem doesn't exist in Atom 1.0.
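In Atom the `type` attribute removes the guesswork: both titles below tell a reader to display `x < y`, one as text and one as HTML. A minimal sketch with a hypothetical `atom_title` helper:

```ruby
# Simplified XML escaping of element content: only &, < and >.
def escape_markup(text)
  text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# As text, a literal '<' is escaped once for XML. As HTML, it is first written
# as the HTML entity &lt; and that is then escaped for XML.
def atom_title(value, type = :text)
  "<title type=\"#{type}\">#{escape_markup(value)}</title>"
end

atom_title("x < y")           # => "<title type=\"text\">x &lt; y</title>"
atom_title("x &lt; y", :html) # => "<title type=\"html\">x &amp;lt; y</title>"
```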

Overall on the nomenclature: be consistent. You currently have options[:feed][:item]. This should either be options[:channel][:item] (matching RSS) or options[:feed][:entry] (matching Atom).

=== PDI ===

Since PDI is potentially the response that this post deserves, let me point out that resource_feeder doesn't really require much code to implement. :)

Hi Sam, thanks for the feedback! Comments below...

> === Be opinionated ===
>
> My first piece of advice is simple: be opinionated. Change the method name to simply <feed_for>. Have it default to doing the right thing. Perhaps give an option to change the default feed format, but default to providing only a single feed format. That's what Microsoft [1], FeedDemon [2], Bloglines [3] and many others would prefer.

I like this idea. I tend to think of Atom as the correct feed format to use, but I don't know if anyone else will agree.

> === Summary and Content ===
>
> Independent of feed format, my biggest piece of input is that you think a bit more about description. It has a number of problems. The first can be expressed thus: the database column from which this element is often sourced is undoubtedly of some string type. In such cases, what would a '<' character in that column mean? Does it signal the start of markup? Or is it simply a less-than character?
>
> My experience is that most string columns in a database are simply plain text. That also tends to be the 'safe' choice in most cases: treating markup as plain text will expose the markup. This, while suboptimal, is at least obvious. In the reverse case (treating plain text as markup), a stray '<' gets parsed as the start of a tag, and what you have is silent data loss [6].
>
> [6] Détente
>
> If you want to put plain text in an RSS 1.0 or RSS 2.0 <description>, and you are using Builder 2.0 (you *are* using Builder 2.0, aren't you?), then you need to add a .to_xs to escape the string. If you want to put plain text into an Atom <summary> or <content> element, you don't need to do anything, as that is the default.
>
> Now you need a convention (possibly as simple as checking the column name with .downcase.ends_with?("html")) and a way to override it (perhaps options[:feed][:content_type]?). If content_type == 'html', you want to omit the call to .to_xs for RSS 1.0 and RSS 2.0. For Atom, simply do
>
>    xml.content(:type => options[:feed][:content_type] || ... || 'text')

I think this is a good idea. I had a feeling I'd be revisiting this part as I wrote the initial version.

> Finally, consider splitting summary and content. Summary should go into description in RSS 1.0 and RSS 2.0; but content should go into content:encoded. This extension enjoys wide support, so it is safe to use [7]. For Atom, summary and content go into summary and content respectively. Additionally, with Atom, you can put XML directly into the content; this may be a reasonable default.
>
> [7] RSS Best Practices Profile

I thought about doing that too; I just wasn't sure how to handle it cleanly. I wasn't sure whether most resources would have a summary field. Perhaps they would, though, if you're providing a feed for them.

> === Other details ===

> You omit the channel description if no option specifies a value for it. That is only valid in Atom 1.0; in RSS 1.0 and RSS 2.0 the channel description is required.

Should I just leave it blank for RSS? I figured it was up to the developer to provide a description if they wanted valid RSS.

> Omitting language if it isn't specified is probably better than defaulting to en-us.

If that's the best practice, that's cool.

> Overall on the nomenclature: be consistent. You currently have options[:feed][:item]. This should either be options[:channel][:item] (matching RSS) or options[:feed][:entry] (matching Atom).

I'm in favor of matching Atom terminology, since the current restful/resource stuff was heavily inspired by it.

> === PDI ===
>
> Since PDI is potentially the response that this post deserves, let me point out that resource_feeder doesn't really require much code to implement. :)

We accept patches :) Since implementing this on a small side project, I've been rethinking the approach a bit. I'm wondering if a little DSL would work better?

Example:

   # simple
   format.atom { render_feed_for(@rumors) }

   # actual usage in my app
   format.atom do
     render_atom_feed_for(@rumors, {
       :feed => { :title => 'Upcoming Rumors | is it fake or not?',
         :description => 'Upcoming rumors on isitfakeornot.com',
         :link => upcoming_rumors_url },
       :item => { :description => :body,
         :title => lambda { |r| "#{r.title} by #{r.user.login}" } }
       }
     )
   end

   # to
   format.atom { render :action => 'index.rfeed' }

   # index.rfeed
   feed_for @rumors, :format => :rss do |f|
     f.title 'Upcoming Rumors | is it fake or not?'
     f.description 'Upcoming rumors on isitfakeornot.com'
     f.link upcoming_rumors_url
     f.time_to_live 45

     f.entry do |e|
       e.title do |resource|
         "#{resource.title} by #{resource.user.login}"
       end

       e.description :body, :content_type => :html
     end
   end

I think the expanded DSL will look better, especially once you start adding options for summaries, authors, etc. Also, since it's rendered with ActionView, it has access to any helpers that you don't have in the controller.
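To get a feel for how little machinery such a DSL needs, here is a minimal, hypothetical Ruby sketch. It is not resource_feeder's code: it borrows the renames suggested later in this thread (`content` and `type` instead of `description` and `content_type`), renders deliberately incomplete Atom (no `id`, `updated` or `link`), and uses a `Struct` in place of an ActiveRecord model.

```ruby
# Simplified XML escaping of element content: only &, < and >.
def escape_markup(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

class EntryBuilder
  def initialize
    # Defaults: read same-named attributes; content is plain text.
    @fields = { title: :title, content: [:description, :text] }
  end

  # Each field accepts either an attribute name (Symbol) or a block.
  def title(attribute = nil, &block)
    @fields[:title] = block || attribute
  end

  def content(attribute = nil, type: :text, &block)
    @fields[:content] = [block || attribute, type]
  end

  def render(resource)
    source, type = @fields[:content]
    "<entry><title>#{escape_markup(value_for(resource, @fields[:title]))}</title>" \
      "<content type=\"#{type}\">#{escape_markup(value_for(resource, source))}</content></entry>"
  end

  private

  # Blocks/lambdas receive the resource; symbols name an attribute to read.
  def value_for(resource, source)
    source.respond_to?(:call) ? source.call(resource) : resource.public_send(source)
  end
end

class FeedBuilder
  def initialize
    @title = nil
    @entry = EntryBuilder.new
  end

  def title(text)
    @title = text
  end

  def entry
    yield @entry
  end

  def render(resources)
    entries = resources.map { |resource| @entry.render(resource) }.join
    %(<feed xmlns="http://www.w3.org/2005/Atom"><title>#{escape_markup(@title)}</title>#{entries}</feed>)
  end
end

def feed_for(resources)
  builder = FeedBuilder.new
  yield builder
  builder.render(resources)
end

# Usage, mirroring the index.rfeed example with a stand-in model:
Rumor = Struct.new(:title, :body, :login)
rumors = [Rumor.new("Moon landing?", "<p>Fake.</p>", "rick")]

xml = feed_for(rumors) do |f|
  f.title "Upcoming Rumors"
  f.entry do |e|
    e.title { |r| "#{r.title} by #{r.login}" }
    e.content :body, type: :html
  end
end
puts xml
```

Because the builder objects only record what to read from each resource, the same configuration could later drive RSS output as well.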

Now, I just need to write this because I really want it in Mephisto...

Rick Olson wrote:

> # index.rfeed
>
> feed_for @rumors, :format => :rss do |f|

:rss should be :rss20. Leave room for other versions of RSS. RSS 1.0, in particular, is still quite popular.

>   f.title 'Upcoming Rumors | is it fake or not?'

Note: title should accept an optional :type attribute, defaulting to :text.

>   f.description 'Upcoming rumors on isitfakeornot.com'

s/description/subtitle/. :type is still available.

>   f.link upcoming_rumors_url

Optional attributes: :rel, :type, :hreflang, :title, :length.

>   f.time_to_live 45

Again, what does this mean (is it a maximum? is it a minimum?) and who supports it? YAGNI.

>   f.entry do |e|
>     e.title do |resource|
>       "#{resource.title} by #{resource.user.login}"
>     end
>
>     e.description :body, :content_type => :html

s/description/content/ s/content_type/type/

>   end
> end

> I think the expanded DSL will look better, especially once you start adding options for summaries, authors, etc. Also, since it's rendered with ActionView, it has access to any helpers that you don't have in the controller.

These two approaches are not incompatible. An active-record aware DSL for feeds would make it easy to produce a full function and tailored feed. A set of conventions could make it even easier to get started for tables that follow the conventions.

> Now, I just need to write this because I really want it in Mephisto...

I'm checking out Mephisto as I write this. :)

I'd be more than glad to flesh out an initial implementation for you to critique, adapt, and/or adopt.

- Sam Ruby

> These two approaches are not incompatible. An active-record aware DSL for feeds would make it easy to produce a full function and tailored feed. A set of conventions could make it even easier to get started for tables that follow the conventions.

One issue I have is that my models usually store raw textile/markdown content. I usually run a few filters like RedCloth, the auto_link helper, or the white_list plugin on my content before displaying them in a feed. Though, I suppose one compromise is to include the modules into the model and do the processing in a callback.

I'm also discovering the importance of proper xml:base usage (or just converting relative links to absolute links): http://mephistoblog.com/assets/2006/9/26/SafariScreenSnapz003.png
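For the "convert relative links to absolute links" route, a minimal sketch using Ruby's standard `URI.join`. The `absolutize_links` helper is hypothetical, and the regex only handles simple double-quoted `href`/`src` attributes; an HTML parser (or `xml:base`) would be more robust for real content.

```ruby
require "uri"

# Hypothetical helper: resolve href/src values against base_url.
# Only simple double-quoted attributes are handled, and values that are not
# valid URIs make URI.join raise URI::InvalidURIError.
def absolutize_links(html, base_url)
  html.gsub(/\b(href|src)="([^"]*)"/) do
    attribute, value = Regexp.last_match(1), Regexp.last_match(2)
    %(#{attribute}="#{URI.join(base_url, value)}")
  end
end

absolutize_links('<a href="/tips">Tips</a>', "http://example.com/2006/9/26/post")
# => "<a href=\"http://example.com/tips\">Tips</a>"
```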

> > Now, I just need to write this because I really want it in Mephisto...
>
> I'm checking out Mephisto as I write this. :)
>
> I'd be more than glad to flesh out an initial implementation for you to critique, adapt, and/or adopt.

That would be great. It would be nice to have a good library for this so I can feel confident my future projects output good feeds.

I strongly agree. Atom is the right format to choose if you want to be sure there's no silent data loss.

Thijs

Rick Olson wrote:

> > These two approaches are not incompatible. An active-record aware DSL for feeds would make it easy to produce a full function and tailored feed. A set of conventions could make it even easier to get started for tables that follow the conventions.
>
> One issue I have is that my models usually store raw textile/markdown content. I usually run a few filters like RedCloth, the auto_link helper, or the white_list plugin on my content before displaying them in a feed. Though, I suppose one compromise is to include the modules into the model and do the processing in a callback.

I may have misunderstood or read too much into the use of :body in your example. Taking a look at Mephisto, I see

  xm << %{<content type="html">#{sanitize_feed_content article.body_html}</content>}

Can you explain how :body was intended to work?

> I'm also discovering the importance of proper xml:base usage (or just converting relative links to absolute links): http://mephistoblog.com/assets/2006/9/26/SafariScreenSnapz003.png

That's another one of the areas that is grossly underspecified with RSS 2.0. If I read app/views/feed/_article.rxml correctly, the following would address the issue:

   xm.entry "xml:base" => site.permalink_for(article) do ...

- Sam Ruby

P.S. Off topic for this mailing list, but I'm going to take a look at importing my weblog into Mephisto. Be forewarned that I'm a tough customer to please. I'm particularly unhappy about sites like mephistoblog.com that purport to be XHTML, but aren't even well-formed XML, as they contain strings like

   <a href="/tips">Tips & Tricks</a>

> Next, consider internationalization / character encoding issues. Builder prefers utf-8, but with Builder 2.0 it will fall back to iso-8859-1/win-1252. This is the right default for a large number of cases, but it will be wrong in a case that you might care about: native Mac applications.

I had a little trouble following this. Is UTF-8 the right default for a number of cases, or Latin-1? Our position in the other thread of 'utf-8 everywhere' seems to me to be a natural position to take here, but I'm bound to be missing something.

> I may have misunderstood or read too much into the use of :body in your example. Taking a look at Mephisto, I see
>
>   xm << %{<content type="html">#{sanitize_feed_content article.body_html}</content>}
>
> Can you explain how :body was intended to work?

That Mephisto code was put in place because I had entities being encoded twice. I think it had to do with Textile encoding ampersands, and then Builder was encoding it again. It's ugly, and is a good reason why I'm really interested in resource feeder.

Rather than looping through the resources and creating atom entries, resource feeder lets you map attributes to atom elements. By default, it uses @resource.description to fill that in. If you provide :item => { :description => :body }, then it uses @resource.body. Finally, you can pass a block and customize it even further:

   :item => { :title => lambda { |r| ... } }

   # or in my mock feed dsl:
   entry.item do |resource|
     ...
   end
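The mapping rules just described fit in a few lines. Here is a hypothetical `resolve_field` (not resource_feeder's actual code), with a `Struct` standing in for the model:

```ruby
# Hypothetical sketch of the lookup rules: no mapping means the same-named
# attribute; a Symbol names another attribute; anything callable (a lambda
# or block) receives the resource.
def resolve_field(resource, field, mapping = {})
  source = mapping.fetch(field, field)
  source.respond_to?(:call) ? source.call(resource) : resource.public_send(source)
end

Rumor = Struct.new(:title, :description, :body, :login)
rumor = Rumor.new("Moon landing?", "Short", "Long body", "rick")

resolve_field(rumor, :description)                     # => "Short"
resolve_field(rumor, :description, description: :body) # => "Long body"
resolve_field(rumor, :title, title: ->(r) { "#{r.title} by #{r.login}" })
# => "Moon landing? by rick"
```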

> > I'm also discovering the importance of proper xml:base usage (or just converting relative links to absolute links): http://mephistoblog.com/assets/2006/9/26/SafariScreenSnapz003.png
>
> That's another one of the areas that is grossly underspecified with RSS 2.0. If I read app/views/feed/_article.rxml correctly, the following would address the issue:
>
>    xm.entry "xml:base" => site.permalink_for(article) do ...

Oh, I'll try that. Thanks.

> P.S. Off topic for this mailing list, but I'm going to take a look at importing my weblog into Mephisto. Be forewarned that I'm a tough customer to please. I'm particularly unhappy about sites like mephistoblog.com that purport to be XHTML, but aren't even well-formed XML, as they contain strings like
>
>    <a href="/tips">Tips & Tricks</a>

Ouch! I have a Google Group set up at http://groups.google.com/group/MephistoBlog. I'd love to hear all about why Mephisto sucks :)

Michael Koziarski wrote:

> > Next, consider internationalization / character encoding issues. Builder prefers utf-8, but with Builder 2.0 it will fall back to iso-8859-1/win-1252. This is the right default for a large number of cases, but it will be wrong in a case that you might care about: native Mac applications.
>
> I had a little trouble following this. Is UTF-8 the right default for a number of cases, or Latin-1? Our position in the other thread of 'utf-8 everywhere' seems to me to be a natural position to take here, but I'm bound to be missing something.

From a Builder 2.0 perspective, utf-8 everywhere is ideal.

Now, not every random stream of bytes is valid utf-8. Builder 2.0 tries really hard not to produce invalid XML, so if the input isn't utf-8, it falls back to the web-centric (or US-centric) iso-8859-1 default, as embraced and extended by Microsoft in the form of win-1252.

And by fall back, I mean convert to utf-8.
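The behaviour described above (keep valid UTF-8, otherwise treat the bytes as win-1252 and convert them to UTF-8) can be sketched with Ruby's own encoding support. This is an illustration, not Builder's code: `to_utf8` is a hypothetical name, and Builder 2.0 itself predates `String#encode`.

```ruby
# Sketch of "utf-8 if valid, else assume win-1252 and convert to utf-8".
WINDOWS_1252 = Encoding.find("Windows-1252")

def to_utf8(raw)
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?
  # Not valid UTF-8: reinterpret as win-1252 and convert. The few byte values
  # win-1252 leaves undefined are replaced rather than raising.
  raw.dup.force_encoding(WINDOWS_1252)
     .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

to_utf8("caf\xC3\xA9".b) # => "café" (already valid UTF-8, kept as is)
to_utf8("caf\xE9".b)     # => "café" (0xE9 is é in win-1252)
```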

- Sam Ruby

I agree that we should promote Atom over RSS. If you're going to make a new feed, you might as well make it RSS. Of course, you should be able to override this to still output RSS. With RSS, we might as well be opinionated there too and just say RSS 2.0.

Regarding rfeed, I think it's a nice solution for involved feeds. But I definitely think we should strive to make it a no-option call to generate a feed for a class that looks as we expect.

In the specifics, it all looks like good suggestions. I'm waiting to svn up and see them all implemented ;)

Thanks for taking the time to be so meticulous, Sam.

I'm confused. Maybe you meant to say 'you might as well make it Atom'?

Kind regards, Thijs

> > I agree that we should promote Atom over RSS. If you're going to make a new feed, you might as well make it RSS.
>
> I'm confused. Maybe you meant to say 'you might as well make it Atom'?

Sorry, yes, Atom. We'll assume Atom is what people want to do.

No problem. Atom is indeed what we, the people, want.

Thijs.