This repository was archived by the owner on Nov 14, 2024. It is now read-only.

flexible threaded web crawler based on hpricot and anemone

dealerignition/Scrapy


# Flexible Web Crawler designed for Product Scraping #

## Dependencies ##

* open-uri
* openssl
* hpricot
* anemone
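The first two dependencies ship with Ruby's standard library; `hpricot` and `anemone` are gems that must be installed separately. A minimal setup sketch (assuming a standard RubyGems installation):

```ruby
require 'open-uri'   # stdlib: lets open() fetch URLs
require 'openssl'    # stdlib: SSL support for https pages

# hpricot and anemone are third-party gems; install them first:
#   gem install hpricot anemone
# then require them in your script:
# require 'hpricot'
# require 'anemone'
```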

## Usage ##

1. Create a list of hashes describing the objects to search for:

   ```ruby
   [{
     :location => "class/id name | html location (e.g. ul[@class='vehicleDescription']/li[9])",
     :type     => "custom | class | id",
     :name     => "Name"
   }]
   ```
2. Instantiate the scraper:

   ```ruby
   scraper = Scrapy::Crawler.new(<website url>, options)
   ```
3. Start the crawl session:

   ```ruby
   scraper.crawl
   ```

   Note: this call returns immediately, but crawling will take some time.
4. You can poll the crawler to see whether it has finished:

   ```ruby
   scraper.crawling_complete? # => true or false
   ```
5. Once crawling has finished, use `retrieve_products` to get the list of matched products:

   ```ruby
   products = scraper.retrieve_products
   ```

   Note: each product hash holds the matched attributes under the names you supplied (e.g. `products[0][:Name]`), and the source page URL under `products[0][:page_url]`.
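The steps above follow a start-then-poll pattern. A self-contained sketch of that calling pattern, using a hypothetical `StubCrawler` stand-in (the real `Scrapy::Crawler` crawls a live site; the stand-in fakes one matched product so the example runs without network access):

```ruby
# StubCrawler mimics the interface described above: crawl returns
# immediately and runs in a background thread, crawling_complete? is
# polled, and retrieve_products returns the matched product hashes.
class StubCrawler
  def initialize(url, attributes)
    @url = url
    @attributes = attributes   # the list of search hashes from step 1
    @done = false
    @products = []
  end

  def crawl
    Thread.new do
      # a real crawler would visit pages here; we fake one match
      @products << { :Name => "Example Vehicle", :page_url => @url }
      @done = true
    end
    nil   # returns immediately, like scraper.crawl
  end

  def crawling_complete?
    @done
  end

  def retrieve_products
    @products
  end
end

attributes = [{ :location => "ul[@class='vehicleDescription']/li[9]",
                :type     => "custom",
                :name     => "Name" }]

scraper = StubCrawler.new("http://example.com", attributes)
scraper.crawl
sleep 0.05 until scraper.crawling_complete?   # poll, as in step 4

products = scraper.retrieve_products
puts products[0][:Name]       # => "Example Vehicle"
puts products[0][:page_url]   # => "http://example.com"
```

Polling keeps the caller decoupled from the crawler's threading; an application can do other work between checks instead of blocking on the crawl.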
