Where to store scrape results?

Why not write the results to a file? You could write the raw (pre-scraped)
data to a file and re-scrape it, or you could save the data structure in
some format (YAML is an option here).
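
Dumping and reloading is only a couple of lines. A rough sketch, with a
made-up record shape and file name:

  require 'yaml'

  # Scraped records as plain hashes (shape invented for illustration)
  results = [
    { 'title' => 'Widget A', 'price' => 9.99 },
    { 'title' => 'Widget B', 'price' => 14.50 }
  ]

  # Save the structure to disk...
  File.open('results.yml', 'w') { |f| f.write(results.to_yaml) }

  # ...and restore it later without re-scraping
  restored = YAML.load_file('results.yml')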

Andrew Timberlake
http://ramblingsonrails.com
http://www.linkedin.com/in/andrewtimberlake

"I have never let my schooling interfere with my education" - Mark Twain

You can easily create a table and stick each result in as a row.
In Rails, SQLite is easy enough; if your site is bigger you
can use DB2.
If it's like most sites, you make a "result" table
that is associated with a user table.
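
For example, a hypothetical migration and model (the columns are just
guesses at what a scraped result might carry):

  class CreateResults < ActiveRecord::Migration
    def self.up
      create_table :results do |t|
        t.integer :user_id
        t.string  :title
        t.string  :url
        t.timestamps
      end
    end

    def self.down
      drop_table :results
    end
  end

  class Result < ActiveRecord::Base
    belongs_to :user
  end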

The benefit of YAML is that once you've scraped the data, you probably
already have a structure in place which can easily be saved and
restored.
You could combine the two by storing the YAML in the database.
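
ActiveRecord's serialize will do that for you. A quick sketch with a
made-up ScrapeCache model, where payload is a TEXT column:

  class ScrapeCache < ActiveRecord::Base
    # ActiveRecord stores this column as YAML by default
    serialize :payload
  end

  cache = ScrapeCache.create(:query => 'widgets', :payload => results)
  cache.reload.payload  # => the original array of hashes, restored from YAML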

From a performance perspective, consider caching the results of the
scraping for at least some period of time so that you don't have to
scrape on every search (unless the source websites change VERY
frequently).
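
Something along these lines; the ScrapeCache model, its query column and
the scrape method are placeholders:

  CACHE_TTL = 2.hours  # pick whatever window suits how often the sources change

  def results_for(query)
    cache = ScrapeCache.find_by_query(query)
    if cache && cache.created_at > CACHE_TTL.ago
      cache.payload                    # still fresh, skip the scrape
    else
      cache.destroy if cache
      results = scrape(query)          # stand-in for your scraping code
      ScrapeCache.create(:query => query, :payload => results)
      results
    end
  end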

Andrew Timberlake
http://ramblingsonrails.com
http://www.linkedin.com/in/andrewtimberlake

"I have never let my schooling interfere with my education" - Mark Twain

Thanks, Andrew, for ruling out any doubts I had regarding using YAML.

I will cache the results for around 2 hours in a DB, then.

I'm now wondering how this will affect the performance of filtering.

My guess is that when a user selects some filters on the results screen,
these get passed as params back to the controller's index action. Logic
there will determine it's a request to filter existing results and will
access the cache in the DB and grab the YAML, then use YAML to turn the
info back into the relevant objects and use Enumerable's find_all method
to filter the results...
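
Roughly this, I'm thinking (the max_price and site filter params are just
examples I made up):

  cache   = ScrapeCache.find_by_query(params[:query])
  results = cache.payload   # deserialized back from YAML

  @results = results.find_all do |r|
    (params[:max_price].blank? || r['price'] <= params[:max_price].to_f) &&
    (params[:site].blank?      || r['site'] == params[:site])
  end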

Do you think that approach is OK, or is there a better way of doing it?

Many thanks once again. You have been a great help.

Sounds good to me.
I always focus on getting the job done in the simplest way possible
first, then work on optimisation if you see a bottleneck.
Your biggest problem is likely to be fetching all the other sites for
scraping, which caching will hopefully help with.

Andrew Timberlake
http://ramblingsonrails.com
http://www.linkedin.com/in/andrewtimberlake

"I have never let my schooling interfere with my education" - Mark Twain

Excellent, thanks once again, Andrew! I appreciate your advice.

Just thinking, your scrape should probably be in a worker; stick the
results in a DB. Depending on what you're using, you could even configure
it to be a temp table. Then in your search window you can do AJAX-based
updates from the scrape, with the ability to then clear up the cache.
You get more concurrency, and with the right JavaScript you could cancel
the scrape in process.

I think this would scale and be more responsive.
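
For example, with something like Delayed::Job (any background-worker
library would do; SITES and scrape_site are stand-ins for your own code):

  class ScrapeJob < Struct.new(:query, :user_id)
    def perform
      SITES.each do |site|
        rows = scrape_site(site, query)   # stand-in per-site scraper
        rows.each { |r| Result.create(r.merge(:user_id => user_id)) }
      end
    end
  end

  # enqueue from the controller and return to the user straight away
  Delayed::Job.enqueue ScrapeJob.new(params[:query], current_user.id)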

Thanks glennswest, I'm relatively new to Rails. Whilst I think I
understood what you said, can you (or anyone else) elaborate further on
the points below? I really appreciate your help.

Just thinking, your scrape should probably be in a worker,

When you say a worker, I take it you mean some temporary database?

Depending on what you're using, you could even configure
it to be a temp table. Then in your search window you can do AJAX-based
updates from the scrape,

From the above, do you mean that whilst I'm scraping results from sites,
when one site's results get added to the DB and I go off scraping another
site's results, I can simultaneously show the results that were just
added on the screen?
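
If that's the idea, I'm picturing the page polling a controller action
every few seconds, something like this (the action name and after param
are my guesses):

  def fresh_results
    @results = Result.find(:all,
      :conditions => ['user_id = ? AND id > ?', current_user.id, params[:after].to_i])
    render :partial => 'result', :collection => @results
  end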

With the ability to then clear up the cache.

After I get all the results and display them on the screen, I can then
clear the table?
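
Something like this, I assume:

  # throw away this user's cached rows once the search is done with them
  Result.delete_all(['user_id = ?', current_user.id])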

You get more concurrency,

I wasn't too sure what you meant by this, but that's because I'm fresh to
Rails and can't gather it from the context.

and with
the right JavaScript you could cancel the scrape in process.

Ahh, so whilst I'm scraping and simultaneously presenting already-scraped
data from the DB, if the user decides to cancel the request, I can
terminate the outstanding scrape tasks via some JavaScript call and
move on?
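
I'm guessing the worker would check a flag between sites, roughly like
this (the ScrapeRequest model and its cancelled flag are made up):

  class ScrapeJob < Struct.new(:query, :user_id, :request_id)
    def perform
      SITES.each do |site|
        # stop early if the user's cancel click has set this flag via AJAX
        break if ScrapeRequest.find(request_id).cancelled?
        rows = scrape_site(site, query)   # stand-in per-site scraper
        rows.each { |r| Result.create(r.merge(:user_id => user_id)) }
      end
    end
  end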

I think this would scale and be more responsive.

In general, how fast/slow is it to update a table with around 1000
results? Is it fast enough to handle this situation? I'd prefer to stick
the objects in a temporary DB because then I'd get to use existing
ActiveRecord methods and MySQL statements. I'm just worried about the
performance.
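
From what I've read, wrapping the inserts in a single transaction keeps
that quick:

  # one commit for the whole batch instead of ~1000 separate ones
  Result.transaction do
    rows.each { |r| Result.create(r) }
  end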

Here's your problem in Rails:

Your web server is "single" threaded, so while you're scraping, it's not
doing anything else, so you will need more Mongrels to take care of the
users.

Generally you scale by having more threads and CPUs working on the
problem. The database is probably not going to be your bottleneck for a
while; it's more the style of the architecture.

Why don't I train you a bit? We can do a screen share/Skype session.

Hi glennswest, sorry for the late reply. I'd be up for chatting over
Skype if you are. Let me know either here or via a message. Thank you for
your kind offer!

Adam.