# Flexible Web Crawler Designed for Product Scraping #
## Dependencies ##
- open-uri
- openssl
- hpricot
- anemone
## Usage ##
- Create a list of hashes describing the objects to search for.
[{
  :location => "class/id name | html location (i.e. ul[@class='vehicleDescription']/li[9])",
  :type     => "custom | class | id",
  :name     => "Name"
}]
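For example, a search list for a vehicle listing page might look like the sketch below. The field names and locations are illustrative, not part of the library; only the three-key hash shape comes from the README.

```ruby
# Hypothetical search specification: three fields to extract per product.
# :location is a class/id name when :type is "class" or "id",
# or an Hpricot-style path when :type is "custom".
search_items = [
  { :location => "productTitle", :type => "class", :name => "Title" },
  { :location => "price",        :type => "id",    :name => "Price" },
  { :location => "ul[@class='vehicleDescription']/li[9]",
    :type => "custom", :name => "Mileage" }
]

# Each entry needs all three keys before it is handed to the crawler.
search_items.each do |item|
  raise ArgumentError, "incomplete search item" unless
    item.key?(:location) && item.key?(:type) && item.key?(:name)
end
```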
- Instantiate a copy of the scraper.
scraper = Scrapy::Crawler.new(<website url>, options)
- Start the crawl session.
scraper.crawl
Note: this call returns immediately; crawling continues in the background and may take some time.
- You can poll the crawler to see whether it has finished.
scraper.crawling_complete? => boolean
- Once crawling has finished, use retrieve_products to get the list of matched products.
scraper.retrieve_products
Note: each matched field is available under its :name (e.g. products[0][:Name]), and the source page is products[0][:page_url]
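Assuming the result is an array of hashes keyed by each search item's :name plus :page_url, as described above, processing the matches might look like this. The product values here are stand-in sample data; in real use the array would come from the crawler.

```ruby
# Illustrative result shape: what retrieve_products is described as returning.
# In real use this would be: products = scraper.retrieve_products
products = [
  { :Name => "2008 Honda Civic",  :page_url => "http://example.com/listing/1" },
  { :Name => "2010 Toyota Camry", :page_url => "http://example.com/listing/2" }
]

# Each matched field is read via the :name given in the search list;
# :page_url records the page the product was found on.
summaries = products.map do |product|
  "#{product[:Name]} (#{product[:page_url]})"
end
```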