ActiveRecord Classes

To expand upon the issue:

There are approximately 37 statistical categories for College Football.
I will be parsing 37 different URLs to retrieve the data and push it to
my database. The Scraper class is the tool for doing that.

Each call within my rake task will fetch a specific URL using the
methods located in the Scraper class, but will write to a different
table.

Example:

rushing_offense.rb ---> connects to the rushing_offenses table
passing_offense.rb ---> connects to the passing_offenses table
scoring_offense.rb ---> connects to the scoring_offenses table

Call to scraper.rb to parse data from a rushing offense URL
Call to scraper.rb to update data to rushing_offenses table
Call to scraper.rb to parse data from a passing offense URL
Call to scraper.rb to update data to passing_offenses table
Call to scraper.rb to parse data from a scoring offense URL
Call to scraper.rb to update data to scoring_offenses table
etc. etc.
-- for 37 different categories
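
The call sequence above could be driven from a single list of categories rather than 37 hand-written pairs of calls. What follows is only a sketch: the example.com URLs and the fetch_rows stub are placeholders I made up for illustration, and only the table names come from the post.

```ruby
# Hypothetical sketch: one generic scraper, many destination tables.
# The URLs and fetch_rows are placeholders, not real endpoints or real
# parsing code. Only three of the 37 categories are listed.
CATEGORIES = {
  "rushing_offenses" => "http://example.com/stats/rushing_offense",
  "passing_offenses" => "http://example.com/stats/passing_offense",
  "scoring_offenses" => "http://example.com/stats/scoring_offense"
}

# Stand-in for the real parsing step; returns an array of rows.
def fetch_rows(url)
  [[1, "Team A", 5], [2, "Team B", 5]]
end

# Walk every category, parse its URL, and hand the rows to a block
# that knows how to store them in the matching table.
def scrape_all(categories)
  categories.each do |table_name, url|
    rows = fetch_rows(url)
    yield table_name, rows
  end
end

stored = Hash.new { |hash, key| hash[key] = [] }
scrape_all(CATEGORIES) { |table_name, rows| stored[table_name].concat(rows) }
puts stored.keys.inspect
```

Adding a category then means adding one line to the hash instead of two more calls in the rake task.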

To add another thought to the mix:

The only reason why I'm defining a rake task is that eventually the rake
task will be managed by a cron job for populating the data for my
database on a weekly basis (say every Sunday night).

The bulk of the remainder of my project will just be dealing with
controllers and views for how the data is displayed on the site.

So, the population of data from an external source is the big issue
right now.

Does Scraper need to be an ActiveRecord class at all? You could pass
to it the class whose table needs to be updated, i.e.

def do_something(some_klass)
  some_klass.update_all(...)
end

or perhaps you might want to couple things a little more loosely

def do_something(some_klass)
  some_klass.handle_scraper_data(...)
end

Fred
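
To illustrate the loosely coupled variant suggested above, here is a minimal sketch in which Scraper is a plain Ruby object and the storing class decides what to do with the rows. The names push_to and FakeRushingOffense are hypothetical, and the fake class keeps records in memory where a real model would write to its table.

```ruby
# Plain Ruby object: no database table behind it.
class Scraper
  attr_reader :rows

  def initialize(rows)
    @rows = rows # in the real task these would come from parsing a URL
  end

  # Hand the parsed rows to whatever class is responsible for storing them.
  def push_to(some_klass)
    some_klass.handle_scraper_data(rows)
  end
end

# Stand-in for a model such as RushingOffense. A real model would
# write to its table instead of an in-memory array.
class FakeRushingOffense
  @records = []

  class << self
    attr_reader :records

    def handle_scraper_data(rows)
      rows.each { |row| @records << { :name => row[1], :games => row[2] } }
    end
  end
end

Scraper.new([[1, "Team A", 5]]).push_to(FakeRushingOffense)
puts FakeRushingOffense.records.inspect
```

The scraper never needs to know column names; each of the 37 classes can map rows to its own columns.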

Any ideas of what I might be doing wrong?

You're not using update_all correctly - check the documentation

Fred

> Any ideas of what I might be doing wrong?

> You're not using update_all correctly - check the documentation

Well the documentation may not mention the usage you are using, but it
does exist, sorry about that. You do seem to be using it slightly
oddly though: you call update_all multiple times, but you don't
specify any conditions, so each call to update_all overwrites the
changes made by the previous one.

Fred
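
The overwriting effect described above can be seen in a small simulation. This is not ActiveRecord: FakeTable is a hypothetical in-memory stand-in that only mimics the rule that an update with no conditions applies to every row.

```ruby
# Tiny in-memory stand-in for a table. This is NOT ActiveRecord; it only
# mimics the "no conditions means every row" behaviour.
class FakeTable
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  # Apply the attributes to every row that matches the conditions.
  # With no conditions, every row matches. Returns the number of rows changed.
  def update_all(attributes, conditions = {})
    matching = @rows.select do |row|
      conditions.all? { |key, value| row[key] == value }
    end
    matching.each { |row| row.merge!(attributes) }
    matching.size
  end
end

table = FakeTable.new([
  { :id => 1, :name => "Team A", :games => 4 },
  { :id => 2, :name => "Team B", :games => 4 }
])

# Two unconditioned calls: the second one wins for every row.
table.update_all(:name => "Team A", :games => 5)
table.update_all(:name => "Team B", :games => 6)
puts table.rows.inspect

# With a condition, only the intended row changes.
table.update_all({ :games => 7 }, { :id => 1 })
puts table.rows.inspect
```

After the two unconditioned calls both rows carry the second team's values, which is the symptom you would see in the real table.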

Hi Fred,

Yeah I'm stuck with this one. I've checked the documentation but I'm
just not following it.

What I basically need it to do is to update the table with the data
that's parsed into @rows.

In this case @rows is indexed like this:

offensive_rushing.rows[i][1] (:name)
offensive_rushing.rows[i][2] (:games)

I was trying to do a for loop to go through all of the rows and send the
new data to the database. I'm just not sure how to do it properly. I
catch on quick but I've been searching the web and reading the
documentation and I just don't see a very detailed model for what I'm
trying to do.

So, in a readable format, what I see is:

# Start my loop; it's going to repeat approx 120 times (120 teams)
for i in 0..offensive_rushing.numrows-1
  # Print me out an update to show me that you are updating the teams
  puts "Updating Team Name = #{offensive_rushing.rows[i][1]}."
  # Update the :name with the name of the team
  # Update the :games with the number of games that team has played
  # Update it if the team already exists (not sure how to do this part)
  # Add new data if the team doesn't exist (don't know how to do this part)
  RushingOffense.update_all(:name => offensive_rushing.rows[i][1],
                            :games => offensive_rushing.rows[i][2])
end

I hope that helps.

Sounds like you shouldn't be using update_all at all here, rather you
should be using find to find an appropriate row to update and if there
is none, create a new one.

Fred
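
The find-then-create approach described above might look roughly like this. It is a sketch against a hypothetical in-memory TeamStore rather than a real model; with ActiveRecord the lookup would be a finder call such as a dynamic find_by_name, and the insert or update would be a save on the record.

```ruby
# In-memory stand-in for a model. With ActiveRecord the lookup would be
# a finder call and the changes would be saved to the table.
class TeamStore
  attr_reader :records

  def initialize
    @records = []
  end

  def find_by_name(name)
    @records.find { |record| record[:name] == name }
  end

  # Update the row for this team if it exists, otherwise add a new one.
  def upsert(name, attributes)
    record = find_by_name(name)
    if record
      record.merge!(attributes)
    else
      record = { :name => name }.merge(attributes)
      @records << record
    end
    record
  end
end

store = TeamStore.new
rows = [[1, "Team A", 5], [2, "Team B", 5], [1, "Team A", 6]]
rows.each { |row| store.upsert(row[1], :games => row[2]) }
puts store.records.inspect
```

Each row touches only the record for its own team, so a later row no longer clobbers the earlier ones.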

Frederick Cheung wrote:

> Sounds like you shouldn't be using update_all at all here, rather you
> should be using find to find an appropriate row to update and if there
> is none, create a new one.
>
> Fred

Again, the problem is I don't know how. I'm simply guessing based on
what I see in the documentation. I don't have any working examples
and most of the tutorials I see are very basic.

How I plan to manage the data is important as well.

For instance, I want to keep weekly data snapshots. So, as an example
just using the rushing offense table:

A user will be able to check by a particular week (the cron job will run
the rake task once per week)

Therefore, my database table needs to account for "new data" every
single week.

Scenario:

Rake Task begins
Check for weekly snapshot data (for current week)
-- If no snapshot data then create it
-- If data already exists for current week do nothing
Next Week
Rake Task begins
Check for weekly snapshot data (for current week)
-- If no snapshot data then create it
-- If data already exists for current week do nothing
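
The scenario above amounts to "create once per week, otherwise skip". A minimal sketch, assuming weeks start on Monday and that each snapshot is keyed by a hypothetical week_of date, neither of which is stated in the post:

```ruby
require "date"

# Monday of the week containing the given date (an assumed convention).
def week_start(date)
  date - ((date.wday - 1) % 7)
end

# In-memory stand-in for a table with a hypothetical week_of column.
class SnapshotStore
  attr_reader :snapshots

  def initialize
    @snapshots = {}
  end

  # Create the snapshot only when none exists for that week.
  # Returns true if a snapshot was created, false if it was skipped.
  def record_week(date, rows)
    key = week_start(date)
    return false if @snapshots.key?(key)
    @snapshots[key] = rows
    true
  end
end

store = SnapshotStore.new
puts store.record_week(Date.new(2008, 10, 5), [["Team A", 5]])  # first run of the week
puts store.record_week(Date.new(2008, 10, 4), [["Team A", 5]])  # same week, skipped
puts store.record_week(Date.new(2008, 10, 12), [["Team A", 6]]) # following week
```

Because the check and the key both use the start of the week, running the task twice in one week is harmless.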

So, let's look at my current table structure:

:rank
:name
:games
:carries
:net
:avg
:tds
:ydspg
:wins
:losses
:ties

So, the first issue I see is that I do not have a column that records
which weekly snapshot a row belongs to. Would you recommend tying this
to a timestamp? How would I check (based on the conditions above)
against a particular timestamp range and produce the results?

Or should I create another column for this?

And, lastly, is there somewhere online that code is available to view
for "advanced table manipulation"? Much of the code that I have found
is either very outdated, very basic, or not something I can use. The
documentation is a decent start but it does not contain a lot of
advanced examples.

I know I may be asking a lot of questions (and I apologize if I am).
However, I do learn quickly and I'm the type of person that likes to
dive in and get started. I've read one full ruby book and am midway
through my first Rails book. However, even these books do not provide
scenario-based examples.

This is why I'm here. I am better at understanding code when I see
code. I don't mind working through code that contains errors and trying
to get it to work. That just helps me gain an understanding of what
occurs. The API docs only work as a reference for small bits of code. I
always look there first, but you have to know which code you are looking
for. If you know exactly what method you are going to be working with,
looking in the API and then scouring the web for information is a little
easier. In the
case of my example above, I'm not sure which methods I will be working
with exactly to accomplish my task.

Thanks.

Hi Fred,

I think I will use this for my find parameter:

start_date = Time.now.beginning_of_week
end_date = Time.now.end_of_week
@rushing_offenses = RushingOffense.find(:all, :conditions =>
['created_at > ? and created_at < ?', start_date, end_date])

That will let me find anything created within the set week. Now I just
have to figure out how to check whether or not it returns nil and create
data.

It will never return nil. It will return an array (possibly an empty
one). You might want to set your own timestamp and use that rather
than relying on created_at (so that the date is one that is
significant to your data and not just when you happened to run your
scraper).

Fred
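
As a sketch of the point above, the check becomes "is the result empty?" rather than "is it nil?". The snapshots_between helper and the week_of key are hypothetical stand-ins for a finder over a date column you set yourself; the data is in memory only.

```ruby
require "date"

# Stand-in for a finder that returns every record in a date range.
# Like find(:all), it returns an array, which may be empty but is never nil.
def snapshots_between(records, start_date, end_date)
  records.select do |record|
    record[:week_of] >= start_date && record[:week_of] <= end_date
  end
end

records = [{ :name => "Team A", :week_of => Date.new(2008, 9, 29) }]

found = snapshots_between(records, Date.new(2008, 10, 6), Date.new(2008, 10, 12))
if found.empty?
  puts "No snapshot for this week yet; create one."
else
  puts "Snapshot already exists; nothing to do."
end
```

Using a date you assign yourself also means a late or repeated scraper run still lands in the week the data belongs to.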

Frederick Cheung wrote:

Just to throw another spanner in the works for you, I wonder if this
wouldn't be achieved more easily using scRUBYt!. The latest skimr
branch (http://github.com/scrubber/scrubyt/tree/skimr) lets you quite
easily store the results of a scrape directly into an ActiveRecord
model.

Drop me a line if you need me to provide a more concrete example.

Glenn