Parsing multiple RSS & Atom feed formats

I'm working on an RSS aggregator, and I've based the parser on a script from this post:

http://www.superwick.com/archives/2007/06/09/rss-feed-parsing-in-ruby-on-rails/

But, being a complete newbie, I've found that this parser only works for feeds formatted in one specific way. For example, some feeds throw a 'nil text' error. I know that I could make this script handle 'nil' attributes, but I'm betting that someone out there has already found a good solution for handling all Atom/RSS 1.0/RSS 2.0 formats.

I've scoured Google, but there only seem to be snippets or short posts on how to handle individual feeds or one type of format. Would anyone be able to enlighten me on this one? Are there any glaringly well-documented snippets, gems, plugins, tutorials, or books which I'm completely missing?

http://simple-rss.rubyforge.org/

Philip Hallstrom wrote:

I've scoured Google, but there only seem to be snippets or short posts on how to handle individual feeds or one type of format. Would anyone be able to enlighten me on this one? Are there any glaringly well-documented snippets, gems, plugins, tutorials, or books which I'm completely missing?

http://simple-rss.rubyforge.org/

Thanks Philip.

I've seen simple-rss; I just haven't come across a definitive example of how to pull down all elements of a feed, like in the tutorial link in the original post. Am I right in presuming I can use simple-rss to handle all feeds in place of the RSSParser defined in the aforementioned link, so that I just need to write one parser rather than conditionals for Atom/RSS?

I'm actually having problems installing the open-uri gem:

"could not find open-uri locally or in a repository"

I've tried the usual fixes (capital letters etc.), but no luck...

That's because open-uri is part of the Ruby standard library, rather than a gem. Just "require 'open-uri'" in your script/application and you're good to go.
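A minimal sketch of what that looks like, assuming a reasonably recent Ruby (2.5 or later, where URI.open is available; on older Rubies the same library was used through a plain open(url) call). The fetch_feed helper name and the example URL are hypothetical, purely for illustration:

```ruby
require 'open-uri' # ships with Ruby; there is nothing to "gem install"

# Hypothetical helper: download a feed and return its body as a String.
def fetch_feed(url)
  URI.open(url) { |io| io.read }
end

# Usage (needs network access; the URL is a placeholder):
# xml = fetch_feed('http://example.com/feed.xml')
```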

SimpleRSS does take care of some of the difficulties of parsing multiple feeds, and does take care of the Atom/RSS differences. You still might need some conditional or chained assignment to deal with things like the publishing date. (Is it <pubDate>? <published>? <dc:date>? or whatever else ...?)
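To illustrate the chained lookup, here is a rough sketch that does not depend on any particular parser. It assumes each item can be read like a Hash keyed by symbol, and the key names in DATE_KEYS (including :dc_date for <dc:date>) are assumptions, so check how your parser actually exposes those tags. The published_at helper is a made-up name:

```ruby
require 'time' # standard library; adds Time.parse

# Candidate keys, tried in order. These names are assumptions about how
# the parser exposes <pubDate>, <published>, <updated> and <dc:date>.
DATE_KEYS = [:pubDate, :published, :updated, :dc_date].freeze

# Hypothetical helper: return the first date found as a Time, or nil if
# the item has no usable date.
def published_at(item)
  raw = DATE_KEYS.map { |key| item[key] }.compact.first
  return nil if raw.nil?
  return raw if raw.is_a?(Time) # some parsers hand back Time objects already

  Time.parse(raw.to_s)
rescue ArgumentError
  nil # the value was present but not parseable as a date
end

rss_item  = { pubDate: 'Sat, 09 Jun 2007 10:00:00 GMT' }
atom_item = { published: '2007-06-09T10:00:00Z' }
puts published_at(rss_item).utc.year
puts published_at(atom_item).utc.year
```

Returning nil instead of raising means one feed with an odd or missing date doesn't stop the rest of the aggregator run.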

hth Jon

Jonathan Stott wrote:

I've tried the usual fixes (capital letters etc.), but no luck...

That's because open-uri is part of the Ruby standard library, rather than a gem. Just "require 'open-uri'" in your script/application and you're good to go.

SimpleRSS does take care of some of the difficulties of parsing multiple feeds, and does take care of the Atom/RSS differences. You still might need some conditional or chained assignment to deal with things like the publishing date. (Is it <pubDate>? <published>? <dc:date>? or whatever else ...?)

hth Jon

That helps, thanks Jon. I definitely needed some clarification on how streamlined I could expect the parser to be, and I've only seen pubDate so far, so that example was useful.

At the risk of sounding like a lazy fool, I'm quite surprised there isn't a one-stop chunk of Ruby code out there for parsing generic blog post feeds and handling the differences between Atom and RSS (like the publishing date). If I work it out, I'll be sure to share it.

Feed normalizer might work for you. http://code.google.com/p/feed-normalizer/

I've also had really good luck with FeedTools.

Jamey