1. Where do custom .rb files go inside of my Rails project? (For
instance, I understand MVC, but a rake task feels like it lives
outside of the project, and I'm not sure how it is supposed to
communicate with controllers or pull/associate variables from those
areas.)
This custom file should be called scraper.rb and should be placed in
the lib/ folder of your application. In a rake task you don't really
access or call controllers; you just run the task, which tells the
scraper to load the data and then save it to the DB.
2. With my custom .rb I'm also requiring 'hpricot'. Is there anything
special I need to do with a .rake file to make sure that it knows to
pull this gem? And, if I export to my real site, how do I ensure that
hpricot is loaded there too? In other words, what expectations should I
be relying on?
You don't need to do anything else. Rails will automatically enable
rubygems, and by requiring hpricot you will tell it to load the gem.
The gem does have to be installed on the production server as well.
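To check whether a library such as hpricot can actually be loaded on a given machine, something like the following could be run there. This is only a minimal sketch; the helper name library_available? is made up for illustration and is not part of Rails or RubyGems.

```ruby
# Hypothetical helper: returns true if `require` succeeds for the given
# library name, false if Ruby cannot find it (LoadError).
def library_available?(name)
  require name
  true
rescue LoadError
  false
end

# 'json' ships with Ruby; 'hpricot' is only found if the gem is installed.
puts "json available:    #{library_available?('json')}"
puts "hpricot available: #{library_available?('hpricot')}"
```

If the second line prints false on the server, installing the gem there (for example with gem install hpricot) should be enough for the require in scraper.rb to work.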
3. When I run a rake task and need to communicate with my database (for
uploading purposes) is there an easy way to do this? Can I utilize
.rake with my DB inside of my rails environment? Or, are rake tasks
completely separate and distinct and need to be considered outside of
scope?
Now you have to learn the Rails database access framework,
ActiveRecord; you should find plenty of material about it.
Because your task depends on :environment, Rails is loaded. In
particular, its dependency management is loaded, so it will find your
Scraper class as long as it's in scraper.rb somewhere on its search
path. Don't take my word for it though, try it!
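As a rough sketch of what saving from a rake task can look like: the task loops over the scraped rows and calls create on a model. Everything named here is an assumption for illustration. TeamStat is a stand-in class so the sketch runs without Rails (in a real app it would be an ActiveRecord model and create would insert a row), the empty :environment task stands in for the one Rails provides, and the rows are placeholder values, not real data.

```ruby
require 'rake'

# Stand-in for an ActiveRecord model (assumption). In a Rails app this
# would be `class TeamStat < ActiveRecord::Base; end` and `create`
# would write to the database instead of an in-memory array.
class TeamStat
  RECORDS = []

  def self.create(attributes)
    RECORDS << attributes
    attributes
  end
end

module SaveDemo
  extend Rake::DSL

  # Stand-in for Rails' :environment task, which loads the application.
  task :environment

  desc "Save placeholder rows through the model"
  task :save_stats => :environment do
    # Placeholder rows; a real task would get these from the scraper.
    rows = [['Team A', '100.0'], ['Team B', '90.5']]
    rows.each do |name, yards|
      TeamStat.create(:name => name, :rushing_yards => yards.to_f)
    end
  end
end

Rake::Task['save_stats'].invoke
puts TeamStat::RECORDS.size
```

Inside a real Rails project only the save_stats task body would be needed, since the model and the :environment task already exist there.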
I put scraper.rb in my lib folder.
I put scraper.rake in my lib/tasks folder.
I took the end portion of scraper.rb, removed it, and placed it in my
rake file:
desc "This task will parse data from ncaa.org and upload the data to our db"
task :scraper => :environment do
  # In our search we are supplying the website url to parse, the type of
  # element (ex: table), the class name of that element
  # and the child element that contains the data we wish to retrieve.
  offensive_rushing = Scraper.new(
    'http://web1.ncaa.org/mfb/natlRank.jsp?year=2008&rpt=IA_teamrush&site=org',
    'table', 'statstable', '//tr')
  offensive_rushing.scrape_data
  offensive_rushing.clean_celldata
  offensive_rushing.print_values
end
And it printed the values when I ran the rake task. So now I'll have to
test this with the database and see how it works.
Thanks a ton (I understand it now).
The part that was => :environment do was telling my rake task to make
sure that the environment was fully loaded before running it.
So, if I wanted to run another rake task in the same rake file and I
wanted to make sure the first was done, I'd do something like:
task :next_task => :scraper do
  # code
end
which would make it run only after the scraper task had finished.
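To see the ordering for yourself outside of Rails, a small standalone script along these lines should do it. It is only a sketch: the :environment task here is a stand-in for the one Rails defines, and the task bodies just record their names in a LOG array so the order can be inspected.

```ruby
require 'rake'

module TaskOrderDemo
  extend Rake::DSL

  # Records the order in which task bodies actually run.
  LOG = []

  # Stand-in for Rails' :environment task.
  task :environment do
    LOG << :environment
  end

  task :scraper => :environment do
    LOG << :scraper
  end

  # :scraper is a prerequisite, so it runs before this body.
  task :next_task => :scraper do
    LOG << :next_task
  end
end

Rake::Task['next_task'].invoke
puts TaskOrderDemo::LOG.inspect
# prints [:environment, :scraper, :next_task]
```

Rake runs each prerequisite at most once per run, so :environment is not loaded a second time even if several tasks depend on it.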