Mashup Tutorial

Antonio Eggberg wrote:

Hi:

I am looking for a good hands-on web mashup tutorial, or books for that
matter. Google gives me bits and pieces of the info, but not a step-by-step
or hands-on tutorial.

I cannot really answer this question, but if you would like to hack on these things in Ruby/Rails, I am about to release a web extraction framework written in Ruby (in the next releases I plan to add functionality so that every task needed to build a mashup is possible). I will add a lot of tutorials (e.g. how to build a list of Ruby books by scraping, say, Amazon, B&N and Buy.com and putting the results together for price comparison) and possibly some theory, too, so you could check it out. Unfortunately I have tons of other tasks right now, so I am not progressing as fast as I would like, but if you can wait a few weeks at most, it will happen sooner or later for sure.
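The "put the results together for price comparison" step boils down to grouping offers by title and keeping the cheapest one. Below is a minimal plain-Ruby sketch, assuming the scraping itself has already produced offer records; the Offer struct, the cheapest_offers helper and the shop names are hypothetical placeholders, not part of the framework, and the titles and prices are made-up sample data.

```ruby
# Hypothetical record for one scraped offer (not a framework class).
Offer = Struct.new(:shop, :title, :price)

# Groups offers by a normalized title (trimmed, lower-cased) so small
# formatting differences between shops do not split one book into two
# entries, then keeps the cheapest offer in each group.
def cheapest_offers(offers)
  offers
    .group_by { |o| o.title.strip.downcase }
    .transform_values { |group| group.min_by(&:price) }
end

# Made-up sample offers, for illustration only.
offers = [
  Offer.new("shop_a", "Example Ruby Book", 39.99),
  Offer.new("shop_b", "example ruby book ", 34.50),
  Offer.new("shop_c", "Another Ruby Book", 29.00)
]

cheapest_offers(offers).each do |title, offer|
  puts "#{title}: #{offer.shop} at $#{'%.2f' % offer.price}"
end
```

Normalizing titles this naively is only a starting point; real book listings usually need a stronger key (an ISBN, for instance) to match reliably across shops.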

I would be really happy to get any feedback and constructive ideas on which new features to add etc.

Peter

> I cannot really answer this question, but if you would like to hack on
> these things in Ruby/Rails, I am about to release a web extraction
> framework written in Ruby (in the next releases I plan to add
> functionality so that every task needed to build a mashup is possible).

This is such wonderful news! I am very, very interested; as a matter of fact,
I have been to your site a couple of times this week for the screen scraping
article :-)

Something for Antonio maybe ...

http://www.rubyrailways.com/data-extraction-for-web-20-screen-scraping-in-rubyrails/

> I will add a lot of tutorials (e.g. how to build a list of Ruby books by
> scraping, say, Amazon, B&N and Buy.com and putting the results together
> for price comparison) and possibly some theory, too, so you could check
> it out. Unfortunately I have tons of other tasks right now, so I am not
> progressing as fast as I would like, but if you can wait a few weeks at
> most, it will happen sooner or later for sure.

YES, I would like to have a GO! Please send me the URL/instructions
directly. Cooool!

> I would be really happy to get any feedback and constructive ideas on
> which new features to add etc.

Of course, that's the least I can do. Super! Looking forward to hearing from you!

> Something for Antonio maybe ...
>
> http://www.rubyrailways.com/data-extraction-for-web-20-screen-scraping-in-rubyrails/

Ah, the good old screen scraping article. I am planning to beef it up with some more theory, as well as 'chapters' on Hpricot, (Fire)Watir and more...

> YES, I would like to have a GO! Please send me the URL/instructions
> directly. Cooool!

Well, you will have to wait a few days at least... Until then, here is an example of what to expect in 0.1.0:

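ebay_data = Scrubyt::Extractor.define do
  # Navigate to the page of interest
  fetch 'ebay.com'
  fill_textfield 'dell laptop'
  submit
  click_link 'Laptops/Notebooks'

  # Construct the scraper from an example record copied off the page
  record do
    name "DELL LATITUDE C600 P3 1.0 LAPTOP NOTEBOOK 256MB 20GB HD"
    price "$192.50" do
      # Keep only the numeric part of the price; it must exceed 150
      value(/\d+\.\d\d/).ensure_greater_than 150
    end
    shipping "$37.00"
  end.ensure_presence_of_pattern(:price) # drop records without a price

  # Follow the "Next" link to up to 5 further result pages
  next_page "8 9 Next", :limit => 5
end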

So, this extractor will automatically navigate to the page of interest (powered by Mechanize; later I am planning to add (Fire)Watir for JavaScript navigation, or maybe even Selenium). There, it will extract all records (specified by the example) that have a price higher than 150. Once finished, it will navigate to the next 5 result pages and extract the records there too.
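To make the filtering rules concrete, here is a plain-Ruby sketch of the same semantics: keep records that have a price, read the numeric part with the /\d+\.\d\d/ pattern from the example, require it to exceed 150, and look at the first page plus at most 5 following pages. This is not how scRUBYt! implements it; the extract and price_value helpers, the record hashes and the sample values are hypothetical, and the "first page plus 5 next pages" reading of :limit is an assumption based on the description above.

```ruby
# Same price pattern as in the extractor example.
PRICE_PATTERN = /\d+\.\d\d/

# Returns the numeric part of a price string such as "$192.50", or nil
# when the record has no recognizable price.
def price_value(text)
  match = PRICE_PATTERN.match(text.to_s)
  match && match[0].to_f
end

# pages: an array of result pages, each an array of record hashes.
# Reads the first page plus at most next_page_limit following pages and
# keeps only records whose price exceeds min_price.
def extract(pages, min_price: 150, next_page_limit: 5)
  pages.first(next_page_limit + 1).flat_map do |records|
    records.select do |record|
      value = price_value(record[:price])
      value && value > min_price
    end
  end
end

# Made-up sample page, for illustration only.
pages = [[
  { name: "Laptop A", price: "$192.50", shipping: "$37.00" },
  { name: "Laptop B", price: "$99.99", shipping: "$20.00" },
  { name: "Laptop C", price: nil, shipping: "$15.00" }
]]
p extract(pages).map { |r| r[:name] } # prints ["Laptop A"]
```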

What you get back in 'ebay_data' can be used in various ways: either by calling one of the to_xxx methods on it (for example to_xml; I am planning to add to_csv, to_atom etc.), or by accessing the results directly, like this:

ebay_data.record[3].price

etc.
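For readers wondering what such a result object might look like, the sketch below shows one way to offer both styles of access: indexed access like record[3].price and a to_xml conversion. ExtractionResult, Record and the XML layout (a root element wrapping one record element per result) are hypothetical illustrations, not scRUBYt!'s actual result API; the sample records are made up.

```ruby
# Hypothetical record type with the fields from the eBay example.
Record = Struct.new(:name, :price, :shipping)

# Hypothetical result wrapper: `record` returns the array of records,
# so result.record[3].price works as in the example above.
class ExtractionResult
  # The five predefined XML entities.
  XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
                  '"' => "&quot;", "'" => "&apos;" }.freeze

  attr_reader :record

  def initialize(records)
    @record = records
  end

  # Serializes every record as a <record> element with one child per field.
  def to_xml
    rows = record.map do |r|
      fields = r.to_h.map { |key, val| "    <#{key}>#{escape(val)}</#{key}>" }
      "  <record>\n#{fields.join("\n")}\n  </record>"
    end
    "<root>\n#{rows.join("\n")}\n</root>"
  end

  private

  def escape(value)
    value.to_s.gsub(/[&<>"']/, XML_ESCAPES)
  end
end

result = ExtractionResult.new([
  Record.new("Laptop A", "$192.50", "$37.00"),
  Record.new("Laptop B & bag", "$210.00", "$25.00")
])
puts result.to_xml
puts result.record[1].price # prints $210.00
```

Escaping is done by hand here only to keep the sketch dependency-free; a real implementation would more likely build the document with an XML library such as REXML.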

This is just the tip of the iceberg: I have tons of to-do lists for future releases :-). Also, the DSL can be extended very easily by anyone: besides constraints like 'ensure_greater_than', you can add your own, for example 'ensure_contains_string'. So it is not a fixed, use-it-as-is tool but a rather flexible one for this kind of task.
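One common way to make a Ruby DSL extensible like this is to have each ensure_* method register a predicate and return self so calls can be chained; users then add new constraints by reopening the class. The sketch below illustrates that general technique only; the Pattern class and its methods are hypothetical and are not scRUBYt!'s internals.

```ruby
# Hypothetical pattern object holding a list of constraint predicates.
class Pattern
  attr_reader :name

  def initialize(name)
    @name = name
    @constraints = []
  end

  # Constraint: the value, read as a number, must exceed the limit.
  def ensure_greater_than(limit)
    @constraints << ->(value) { value.to_f > limit }
    self # return self so constraints can be chained
  end

  # A value is accepted only if every registered constraint passes.
  def accept?(value)
    @constraints.all? { |check| check.call(value) }
  end
end

# User extension: reopen the class and add a new constraint method.
class Pattern
  def ensure_contains_string(fragment)
    @constraints << ->(value) { value.to_s.include?(fragment) }
    self
  end
end

name_pattern = Pattern.new(:name).ensure_contains_string("DELL")
price_pattern = Pattern.new(:price).ensure_greater_than(150)

p name_pattern.accept?("DELL LATITUDE C600") # prints true
p price_pattern.accept?("99.99")             # prints false
```

Note that ensure_greater_than here calls to_f on the raw value, so it expects the already-extracted number (e.g. "192.50"), not a string like "$192.50".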

Stay tuned,
Peter