resource_feeder feedback

Let me apologize in advance as this is a topic that I feel passionate and opinionated about. If I'm coming on too strong, please forgive me.

=== Be opinionated ===

My first piece of advice is simple: be opinionated. Change the method name to simply <feed_for>. Have it default to doing the right thing. Perhaps give an option to change the default feed format, but default to providing only a single feed format. That's what Microsoft [1], FeedDemon [2], Bloglines [3] and many others would prefer.

[1] http://blogs.msdn.com/rssteam/archive/2005/08/03/446904.aspx
[2] http://nick.typepad.com/blog/2006/05/pick_a_format_a.html
[3] http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=cce109de-de25-4852-a8cf-319ccbf0013

If you want, pick RSS 1.0. Or RSS 2.0. Or Atom 1.0. Just pick one. I'd personally recommend that you be a bit forward thinking, like Mozilla [4] or the Intelligence community [5]; but if for some reason you want to pick another format from that list, I won't complain.

[4] http://bugzilla.mozilla.org/show_bug.cgi?id=313441
[5] https://dnidata.org/education/xnotes/RSS_and_Atom_Considerations.xml

=== Summary and Content ===

Independent of feed format, my biggest input is that you think a bit more about description. It has a number of problems. The first can be expressed thus: undoubtedly the column in the database from which this element is often sourced is of some string type; in such cases: what would a '<' character in such a column mean? Does it signal the start of markup? Or is it simply a less than character?

My experience is that most string columns in a database are simply plain text. That also tends to be the 'safe' choice in most cases; treating markup as plain text will expose the markup. This, while suboptimal, at least is obvious. In the reverse case, what you have is silent data loss [6].

[6] http://www.intertwingly.net/blog/2004/05/28/detente

If you want to put plain text in an RSS 1.0 or RSS 2.0 <description>, and you are using Builder 2.0 (you *are* using Builder 2.0, aren't you?), then you need to add a .to_xs to escape the string. If you want to put plain text into an Atom <summary> or <content> element, you need do nothing as that is the default.

Now you need a convention (possibly as simple as .downcase.ends_with?("html")) and a way to override this (perhaps options[:feed][:content_type]?). If content_type=='html', you want to omit the call to .to_xs for RSS 1.0 and RSS 2.0. For Atom, simply do

   xml.content(:type => options[:feed][:content_type] || ... || 'text')
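To make the convention-plus-override idea concrete, here is a rough sketch in plain Ruby. All helper names are invented for illustration and are not part of resource_feeder; escape_xml is only a stand-in for Builder's .to_xs so that the example runs without Builder installed.

```ruby
# Hypothetical helpers; none of these names come from resource_feeder.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Plain-Ruby stand-in for Builder's String#to_xs.
def escape_xml(text)
  text.to_s.gsub(/[&<>]/, XML_ESCAPES)
end

# Convention: a column whose name ends in "html" holds markup.
# Override: options[:feed][:content_type] wins when present.
def content_type_for(column, options = {})
  override = options.dig(:feed, :content_type)
  return override.to_s if override

  column.to_s.downcase.end_with?('html') ? 'html' : 'text'
end

# Value to hand to xml.description for RSS 1.0 / RSS 2.0.
# Plain text gets the extra escaping pass; markup is passed through.
def rss_description_value(value, content_type)
  content_type == 'html' ? value.to_s : escape_xml(value)
end
```

With this, content_type_for(:body_html) yields 'html' and content_type_for(:body) yields 'text', unless the caller overrides it through the options hash.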

Next, consider internationalization / character encoding issues. Builder prefers utf-8; but with Builder 2.0 it will fall back to iso-8859-1/win-1252. This is the right default for a wide number of cases, but it will be wrong in a case that you might care about: native Mac applications.

Finally, consider splitting summary and content. Summary should go into
description in RSS 1.0 and RSS 2.0; but content should go into
content:encoded. This extension enjoys wide support, so it is safe to use [7]. For Atom, summary and content go into summary and content respectively. Additionally, with Atom, you can put XML directly into the content; this may be a reasonable default.

[7] http://www.rssboard.org/rss-profile#namespace-elements-content-encoded
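The summary/content split can be written down as a small lookup table. This is only a sketch; the constant, the method, and the format keys are made up here.

```ruby
# Hypothetical lookup: which element carries the summary and which the full
# content in each format. content:encoded needs the content module namespace
# (http://purl.org/rss/1.0/modules/content/) declared on the feed.
FEED_ELEMENTS = {
  rss10:  { summary: 'description', content: 'content:encoded' },
  rss20:  { summary: 'description', content: 'content:encoded' },
  atom10: { summary: 'summary',     content: 'content' }
}.freeze

def element_for(format, part)
  FEED_ELEMENTS.fetch(format).fetch(part)
end
```

Using fetch rather than [] means an unknown format or part raises a KeyError instead of silently producing an element named nil.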

=== Other details ===

You declare xmlns:dc; you really want to use this. Put some author
information into the feed [8]. This goes for all feed versions, though Atom structures this information differently. Author information can either be a single author for the entire feed, or an author for each entry, or both.

[8] http://www.rssboard.org/rss-profile#namespace-elements-dublin-creator
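For illustration, the structural difference in author markup might look like this. The author_element helper and the format keys are hypothetical, and a real implementation would go through Builder rather than string interpolation.

```ruby
# Hypothetical helper: renders author information for one feed or entry.
# RSS uses the Dublin Core creator element; Atom nests a name inside author.
def author_element(format, name)
  escaped = name.to_s.gsub(/[&<>]/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;')
  case format
  when :rss10, :rss20 then "<dc:creator>#{escaped}</dc:creator>"
  when :atom10        then "<author><name>#{escaped}</name></author>"
  else raise ArgumentError, "unknown feed format: #{format.inspect}"
  end
end
```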

The default feed title probably should include the database name.

You omit channel description if there isn't an option that specifies this value; doing this is only valid in Atom 1.0; with RSS 1.0 and RSS 2.0 channel description is required.

Omitting language if it isn't specified is probably better than defaulting to en-us.

Re: TTL, before you go there, read [9].

[9] http://www.rssboard.org/rss-profile#element-channel-ttl

Entry/Item titles have much the same problems as descriptions in RSS 2.0, only there isn't nearly as much consensus on how to escape them. In particular, in general it isn't possible to find a representation that allows you to express a '<' character in a title. This problem doesn't exist in Atom 1.0.

Overall on the nomenclature: be consistent. You currently have
options[:feed][:item]. This should either be options[:channel][:item]
(matching RSS) or options[:feed][:entry] (matching Atom).

=== PDI ===

Since PDI is potentially the response that this post deserves, let me
point out that resource_feeder doesn't really require much code to
implement. :-)

Hi Sam, thanks for the feedback! Comments below...

Let me apologize in advance as this is a topic that I feel passionate
and opinionated about. If I'm coming on too strong, please forgive me.

=== Be opinionated ===

My first piece of advice is simple: be opinionated. Change the method
name to simply <feed_for>. Have it default to doing the right thing.
Perhaps give an option to change the default feed format, but default to
providing only a single feed format. That's what Microsoft [1],
FeedDemon [2], Bloglines [3] and many others would prefer.

I like this idea. I tend to think of Atom as the correct feed format
to use, but I don't know if anyone else will agree.

=== Summary and Content ===

Independent of feed format, my biggest input is that you think a bit
more about description. It has a number of problems. The first can be
expressed thus: undoubtedly the column in the database from which this
element is often sourced is of some string type; in such cases: what
would a '<' character in such a column mean? Does it signal the start
of markup? Or is it simply a less than character?

My experience is that most string columns in a database are simply plain
text. That also tends to be the 'safe' choice in most cases; treating
markup as plain text will expose the markup. This, while suboptimal,
at least is obvious. In the reverse case, what you have is silent data
loss [6].

[6] http://www.intertwingly.net/blog/2004/05/28/detente

If you want to put plain text in an RSS 1.0 or RSS 2.0 <description>,
and you are using Builder 2.0 (you *are* using Builder 2.0, aren't
you?), then you need to add a .to_xs to escape the string. If you want
to put plain text into an Atom <summary> or <content> element, you need
do nothing as that is the default.

Now you need a convention (possibly as simple as
.downcase.ends_with?("html")) and a way to override this (perhaps
options[:feed][:content_type]?). If content_type=='html', you want to
omit the call to .to_xs for RSS 1.0 and RSS 2.0. For Atom, simply do

   xml.content(:type => options[:feed][:content_type] || ... || 'text')

I think this is a good idea. I had a feeling I'd be revisiting this
part as I wrote the initial version.

Next, consider internationalization / character encoding issues.
Builder prefers utf-8; but with Builder 2.0 it will fall back to
iso-8859-1/win-1252. This is the right default for a wide number of
cases, but it will be wrong in a case that you might care about: native
Mac applications.

Finally, consider splitting summary and content. Summary should go into
description in RSS 1.0 and RSS 2.0; but content should go into
content:encoded. This extension enjoys wide support, so it is safe to
use [7]. For Atom, summary and content go into summary and content
respectively. Additionally, with Atom, you can put XML directly into
the content; this may be a reasonable default.

[7] http://www.rssboard.org/rss-profile#namespace-elements-content-encoded

I thought about doing that too. I just wasn't really sure how to
handle that cleanly. I wasn't sure if most resources would have a
summary field or not. Perhaps they would if you're providing a feed
for it however.

=== Other details ===

You declare xmlns:dc; you really want to use this. Put some author
information into the feed [8]. This goes for all feed versions, though
Atom structures this information differently. Author information can
either be a single author for the entire feed, or an author for each
entry, or both.

[8] http://www.rssboard.org/rss-profile#namespace-elements-dublin-creator

The default feed title probably should include the database name.

You omit channel description if there isn't an option that specifies
this value; doing this is only valid in Atom 1.0; with RSS 1.0 and RSS
2.0 channel description is required.

Should I just leave it blank for RSS? I figured it was up to the
developer to provide a description if they wanted valid RSS.

Omitting language if it isn't specified is probably better than
defaulting to en-us.

If that's the best practice, that's cool.

Re: TTL, before you go there, read [9].

[9] http://www.rssboard.org/rss-profile#element-channel-ttl

Entry/Item titles have much the same problems as descriptions in RSS
2.0, only there isn't nearly as much consensus on how to escape it. In
particular, in general it isn't possible to find a representation that
allows you to express a '<' character in a title. This problem doesn't
exist in Atom 1.0.

Overall on the nomenclature: be consistent. You currently have
options[:feed][:item]. This should either be options[:channel][:item]
(matching RSS) or options[:feed][:entry] (matching Atom).

I'm in favor of matching Atom terminology, since the current
restful/resource stuff was heavily inspired by it.

=== PDI ===

Since PDI is potentially the response that this post deserves, let me
point out that resource_feeder doesn't really require much code to
implement. :-)

We accept patches :-) Since implementing this on a small side project,
I've been rethinking the approach a bit. I'm wondering if a little
DSL would work better?

Example:

# simple
format.atom { render_feed_for(@rumors) }

# actual usage in my app
format.atom do
  render_atom_feed_for(@rumors, {
    :feed => { :title => 'Upcoming Rumors | is it fake or not?',
      :description => 'Upcoming rumors on isitfakeornot.com',
      :link => upcoming_rumors_url },
    :item => { :description => :body,
      :title => lambda { |r| "#{r.title} by #{r.user.login}"} }
    }
  )
end

# to
format.atom { render :action => 'index.rfeed' }

# index.rfeed

feed_for @rumors, :format => :rss do |f|
  f.title 'Upcoming Rumors | is it fake or not?'
  f.description 'Upcoming rumors on isitfakeornot.com'
  f.link upcoming_rumors_url
  f.time_to_live 45

  f.entry do |e|
    e.title do |resource|
      "#{resource.title} by #{resource.user.login}"
    end

    e.description :body, :content_type => :html
  end
end

I think the expanded DSL will look better, especially once you start
adding options for summaries, authors, etc. Also, since it's rendered
with ActionView, it has access to any helpers that you don't have in
the controller.

Now, I just need to write this because I really want it in Mephisto...

Rick Olson wrote:

# index.rfeed

feed_for @rumors, :format => :rss do |f|

:rss should be :rss20. Leave room for other versions of RSS. RSS 1.0, in particular, is still quite popular.

  f.title 'Upcoming Rumors | is it fake or not?'

Note: title should accept an optional :type attribute, defaulting to :text.

  f.description 'Upcoming rumors on isitfakeornot.com'

s/description/subtitle/. :type is still available.

  f.link upcoming_rumors_url

Optional attributes: :rel, :type, :hreflang, :title, :length.

  f.time_to_live 45

Again, what does this mean (is it a maximum? is it a minimum?) and who supports it? YAGNI.

  f.entry do |e|
    e.title do |resource|
      "#{resource.title} by #{resource.user.login}"
    end

    e.description :body, :content_type => :html

s/description/content/
s/content_type/type/

  end
end

I think the expanded DSL will look better, especially once you start
adding options for summaries, authors, etc. Also, since it's rendered
with ActionView, it has access to any helpers that you don't have in
the controller.

These two approaches are not incompatible. An active-record aware DSL for feeds would make it easy to produce a fully functional, tailored feed. A set of conventions could make it even easier to get started for tables that follow the conventions.
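As a very rough sketch of how such a DSL could collect its settings, using the renames suggested above (subtitle, content, :type): FeedSpec, EntrySpec and this feed_for are invented names, nothing here touches ActiveRecord or Builder, and the result is just a data structure that a renderer could walk.

```ruby
# Hypothetical sketch of a feed DSL; all class and method names are invented.
class EntrySpec
  attr_reader :fields

  def initialize
    @fields = {}
  end

  # e.title { |resource| ... }  or  e.content :body, :type => :html
  %i[title summary content].each do |name|
    define_method(name) do |source = nil, options = {}, &block|
      @fields[name] = { source: block || source, options: options }
    end
  end
end

class FeedSpec
  attr_reader :settings, :entry_spec

  def initialize
    @settings = {}
    @entry_spec = EntrySpec.new
  end

  # f.title '...', f.subtitle '...', f.link '...'
  %i[title subtitle link].each do |name|
    define_method(name) { |value| @settings[name] = value }
  end

  def entry
    yield @entry_spec
  end
end

# Collects the declarations; rendering is deliberately left out.
def feed_for(resources, options = {})
  spec = FeedSpec.new
  yield spec
  { format: options.fetch(:format, :atom), resources: resources, spec: spec }
end
```

Keeping the declarations separate from rendering means the same spec could be fed to an Atom or an RSS writer, and a convention-based default could simply pre-populate the spec before the block runs.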

Now, I just need to write this because I really want it in Mephisto...

I'm checking out Mephisto as I write this. :-)

I'd be more than glad to flesh out an initial implementation for you to critique, adapt, and/or adopt.

- Sam Ruby

These two approaches are not incompatible. An active-record aware DSL
for feeds would make it easy to produce a fully functional, tailored
feed. A set of conventions could make it even easier to get started for
tables that follow the conventions.

One issue I have is that my models usually store raw textile/markdown
content. I usually run a few filters like RedCloth, the auto_link
helper, or the white_list plugin on my content before displaying them
in a feed. Though, I suppose one compromise is to include the modules
into the model and do the processing in a callback.

I'm also discovering the importance of proper xml:base usage (or just
converting relative links to absolute links):
http://mephistoblog.com/assets/2006/9/26/SafariScreenSnapz003.png

> Now, I just need to write this because I really want it in Mephisto...

I'm checking out Mephisto as I write this. :-)

I'd be more than glad to flesh out an initial implementation for you to
critique, adapt, and/or adopt.

That would be great. It would be nice to have a good library for this
so I can feel confident my future projects output good feeds.

I strongly agree. Atom is the right format to choose if you want to be sure there's no silent data loss.

Thijs


Rick Olson wrote:

These two approaches are not incompatible. An active-record aware DSL
for feeds would make it easy to produce a fully functional, tailored
feed. A set of conventions could make it even easier to get started for
tables that follow the conventions.

One issue I have is that my models usually store raw textile/markdown
content. I usually run a few filters like RedCloth, the auto_link
helper, or the white_list plugin on my content before displaying them
in a feed. Though, I suppose one compromise is to include the modules
into the model and do the processing in a callback.

I may have misunderstood or read too much into the use of :body in your example. Taking a look at Mephisto, I see

  xm << %{<content type="html">#{sanitize_feed_content article.body_html}</content>}

Can you explain how :body was intended to work?

I'm also discovering the importance of proper xml:base usage (or just
converting relative links to absolute links):
http://mephistoblog.com/assets/2006/9/26/SafariScreenSnapz003.png

That's another one of the areas that is grossly underspecified with RSS 2.0. If I read app/views/feed/_article.rxml correctly, the following would address the issue:

   xm.entry "xml:base" => site.permalink_for(article) do ...
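The other option mentioned, converting relative links to absolute ones before the content goes into the feed, can be sketched with Ruby's standard URI library. The absolutize_links helper is hypothetical, and the regular expression is deliberately crude (double-quoted href/src attributes only); a real HTML parser would be more robust, and URI.join will raise on malformed targets.

```ruby
require 'uri'

# Hypothetical helper: rewrites double-quoted href/src values so they are
# absolute, resolving each one against the entry's own URL.
def absolutize_links(html, base_url)
  html.gsub(/\b(href|src)="([^"]*)"/) do
    attribute = Regexp.last_match(1)
    target = Regexp.last_match(2)
    %(#{attribute}="#{URI.join(base_url, target)}")
  end
end
```

Links that are already absolute pass through unchanged, since URI.join returns the second argument when it carries its own scheme and host.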

- Sam Ruby

P.S. Off topic for this mailing list, but I'm going to take a look at importing my weblog into Mephisto. Be forewarned that I'm a tough customer to please. I'm particularly unhappy about sites like mephistoblog.com that purport to be XHTML, but aren't even well formed XML as they contain strings like

   <a href="/tips">Tips & Tricks</a>

Next, consider internationalization / character encoding issues.
Builder prefers utf-8; but with Builder 2.0 it will fall back to
iso-8859-1/win-1252. This is the right default for a wide number of
cases, but it will be wrong in a case that you might care about: native
Mac applications.

I had a little trouble following this. Is UTF-8 the right default
for a number of cases? or latin1? Our position in the other thread
of 'utf-8 everywhere' seems to me to be a natural position to take
here, but I'm bound to be missing something.

I may have misunderstood or read too much into the use of :body in your
example. Taking a look at Mephisto, I see

  xm << %{<content type="html">#{sanitize_feed_content article.body_html}</content>}

Can you explain how :body was intended to work?

That Mephisto code was put in place because I had entities being
encoded twice. I think it had to do with Textile encoding ampersands,
and then Builder was encoding it again. It's ugly, and is a good
reason why I'm really interested in resource feeder.

Rather than looping through the resources and creating atom entries,
resource feeder lets you map attributes to atom elements. By default,
it uses @resource.description to fill that in. If you provide :item
=> { :description => :body }, then it uses @resource.body. Finally,
you can pass a block and customize it even further:

:item => { :title => lambda { |r| ... } }

# or in my mock feed dsl:
entry.item do |resource|
  ...
end
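The three cases described here (default attribute, symbol, block) boil down to a small dispatch. The following is a guess at the shape of it, not the plugin's actual code; field_value and the Rumor struct are invented for the example.

```ruby
# Hypothetical helper: works out the value for one feed element.
#   nil    -> call the default attribute (e.g. :description)
#   Symbol -> call that attribute instead (e.g. :body)
#   Proc   -> hand the resource to the block and use its result
def field_value(resource, mapping, default)
  case mapping
  when nil            then resource.public_send(default)
  when Symbol, String then resource.public_send(mapping)
  when Proc           then mapping.call(resource)
  else raise ArgumentError, "unsupported mapping: #{mapping.inspect}"
  end
end

# Stand-in for a model, just for the example.
Rumor = Struct.new(:title, :description, :body)
rumor = Rumor.new('Launch', 'short text', 'long text')

field_value(rumor, nil, :description)                         # uses rumor.description
field_value(rumor, :body, :description)                       # uses rumor.body
field_value(rumor, lambda { |r| r.title.upcase }, :title)     # uses the block
```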

> I'm also discovering the importance of proper xml:base usage (or just
> converting relative links to absolute links):
> http://mephistoblog.com/assets/2006/9/26/SafariScreenSnapz003.png

That's another one of the areas that is grossly underspecified with RSS
2.0. If I read app/views/feed/_article.rxml correctly, the following
would address the issue:

   xm.entry "xml:base" => site.permalink_for(article) do ...

Oh, I'll try that. Thanks.

- Sam Ruby

P.S. Off topic for this mailing list, but I'm going to take a look at
importing my weblog into Mephisto. Be forewarned that I'm a tough
customer to please. I'm particularly unhappy about sites like
mephistoblog.com that purport to be XHTML, but aren't even well formed
XML as they contain strings like

   <a href="/tips">Tips & Tricks</a>

Ouch! I have a Google Group set up at
http://groups.google.com/group/MephistoBlog. I'd love to hear all
about why Mephisto sucks :-)

Michael Koziarski wrote:

Next, consider internationalization / character encoding issues.
Builder prefers utf-8; but with Builder 2.0 it will fall back to
iso-8859-1/win-1252. This is the right default for a wide number of
cases, but it will be wrong in a case that you might care about: native
Mac applications.

I had a little trouble following this. Is UTF-8 the right default
for a number of cases? or latin1? Our position in the other thread
of 'utf-8 everywhere' seems to me to be a natural position to take
here, but I'm bound to be missing something.

From a Builder 2.0 perspective, utf-8 everywhere is ideal.

Now, not every random stream of bytes is valid utf-8. Builder 2.0 tries really hard not to produce invalid XML, so if the input isn't utf-8, it falls back to the web-centric (or US centric) iso-8859-1 default, as embraced and extended by Microsoft in the form of win-1252.

And by fall back, I mean convert to utf-8.
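In modern Ruby the same idea can be approximated as follows. This is a sketch of the behaviour described, not Builder's actual code, and to_utf8 is an invented name.

```ruby
# Hypothetical helper: keep the input if it is already valid utf-8,
# otherwise assume win-1252 and convert it to utf-8.
def to_utf8(bytes)
  candidate = bytes.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?

  # win-1252 leaves a few byte values undefined; replace those rather than raise.
  bytes.dup.force_encoding(Encoding::Windows_1252)
       .encode(Encoding::UTF_8, undef: :replace)
end
```

The guess is wrong whenever the bytes are in some other single-byte encoding, which would be the native Mac application case: the text still comes out as valid utf-8, but with the wrong characters.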

- Sam Ruby

I agree that we should promote Atom over RSS. If you're going to make a
new feed, you might as well make it RSS. Of course, you should be able
to override this to still output RSS. With RSS, we might as well be
opinionated there too and just say RSS 2.0.

Regarding rfeed, I think it's a nice solution for involved feeds. But I
definitely think we should strive to make it a no-option call to
generate a feed for a class that looks as we expect.

In the specifics, they all look like good suggestions. I'm waiting to
svn up and see them all implemented ;-)

Thanks for taking the time to be so meticulous, Sam.

I'm confused. Maybe you meant to say 'you might as well make it Atom'?

Kind regards,
Thijs

> I agree that we should promote Atom over RSS. If you're going to
> make a new feed, you might as well make it RSS.

I'm confused. Maybe you meant to say 'you might as well make it Atom'?

Sorry, yes, Atom. We'll assume Atom is what people want to do.

No problem. Atom is indeed what we, the people, want.

Thijs.