Rake Tasks

1. Where do custom .rb files go inside my Rails project? (For instance, I understand MVC, but a rake task seems to me to sit outside the project, and I'm not sure how it is supposed to communicate with controllers or pull/associate variables from those areas.)

This custom file should be called scraper.rb and placed in the lib/ folder of your application. A rake task doesn't access or call controllers; you just run the task, which tells the scraper to load the data and then save it to the DB.

2. With my custom .rb I'm also requiring 'hpricot'. Is there anything special I need to do with a .rake file to make sure that it knows to pull this gem? And, if I export to my real site, how do I ensure that hpricot is loaded there too? In other words, what expectations should I be relying on?

You don't need to do anything else. Rails automatically enables RubyGems, so requiring hpricot tells it to load the gem. The same holds on your real site, provided the hpricot gem is installed on that server.

3. When I run a rake task and need to communicate with my database (for uploading purposes) is there an easy way to do this? Can I utilize .rake with my DB inside of my Rails environment? Or, are rake tasks completely separate and distinct and need to be considered outside of scope?

For that you need to learn the Rails database access framework, ActiveRecord. You should find plenty of material about it.

Because your task depends on :environment, Rails is loaded. In particular, its dependency management is loaded, so it will find your Scraper class as long as it's in scraper.rb somewhere on its search path. Don't take my word for it though, try it!
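
To make the database part concrete, here is a minimal sketch of a task that saves rows. The names TeamStat and save_stats and the row data are made up for illustration. TeamStat is a plain-Ruby stand-in so the snippet runs outside Rails; in your app it would be an ActiveRecord model and create would insert a row. The empty :environment task is likewise a stand-in for the one Rails defines.

```ruby
require 'rake'
extend Rake::DSL

# Stand-in for a model so the sketch runs outside Rails. In a real app this
# would be `class TeamStat < ActiveRecord::Base; end` and `create` would
# insert a row into the database.
class TeamStat
  ROWS = []

  def self.create(attributes)
    ROWS << attributes
    attributes
  end
end

# Stand-in for the :environment task that Rails defines; the real one boots
# the application so that models and lib/ classes are available.
task :environment

desc "Save scraped rows to the database"
task :save_stats => :environment do
  # Hypothetical rows, shaped like what a scraper might hand back.
  scraped_rows = [
    { :team => 'Team A', :rushing_yards => 100 },
    { :team => 'Team B', :rushing_yards => 200 }
  ]
  scraped_rows.each { |row| TeamStat.create(row) }
end

# From the command line this would be `rake save_stats`.
Rake::Task[:save_stats].invoke
```

Inside a real Rails project you would drop the two stand-ins and keep only the desc/task part in a .rake file under lib/tasks.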

Fred

Thanks - I understand that part now!

I put scraper.rb in my lib folder and scraper.rake in lib/tasks.

I removed the end portion of scraper.rb and placed it in my rake file:

desc "This task will parse data from ncaa.org and upload the data to our db"
task :scraper => :environment do
  # In our search we are supplying the website url to parse, the type of element (ex: table), the class name of that element
  # and the child element that contains the data we wish to retrieve.
  offensive_rushing = Scraper.new('http://web1.ncaa.org/mfb/natlRank.jsp?year=2008&rpt=IA_teamrush&site=org',
    'table', 'statstable', '//tr')
  offensive_rushing.scrape_data
  offensive_rushing.clean_celldata
  offensive_rushing.print_values
end

And it printed the values when I ran the rake task. So now I'll have to test this with the database and see how it works...

Thanks a ton (I understand it now)..

The "=> :environment do" part was telling my rake task to make sure that the environment was fully loaded before running it.

So, if I wanted to run another rake task in the same rake file and I wanted to make sure the first was done, I'd do something like:

task :next_task => :scraper do
  # code
end

which would make it run only after the scraper task had finished.
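
A quick way to check that ordering outside Rails is a small standalone script. This is only a sketch: the task bodies just record their names in a local array, and the :environment task here is a stand-in for the one Rails provides.

```ruby
require 'rake'
extend Rake::DSL

# Records the order in which the task bodies actually run.
run_order = []

# Stand-in for the Rails-provided :environment task.
task :environment do
  run_order << :environment
end

task :scraper => :environment do
  run_order << :scraper
end

# :scraper is a prerequisite, so it (and its own prerequisite) runs first.
task :next_task => :scraper do
  run_order << :next_task
end

# Same effect as running `rake next_task` from the command line.
Rake::Task[:next_task].invoke

puts run_order.inspect
```

Rake resolves the chain from the task you ask for, so invoking next_task runs environment, then scraper, then next_task, each of them once.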