Arachpyd - Your friendly neighborhood spider and web crawler.
denisgomes/arachpyd
**arackpy**
===========
**arackpy** is a simple but powerful web crawler and scraper. Although it is
good-natured and respectful at heart, it can be used to do evil. Remember: with
great power comes great responsibility.
Requirements
------------
**arackpy** currently supports Python 2.7 to 3.6+ out of the box. Depending on
how you want to extract data, one or more of the dependencies listed below must
be installed to support the various backends:
* lxml - for HTML parsing and URL extraction
* requests - for downloading HTML pages
* pysocks - for making Tor-based connections
* fake_useragent - for browser spoofing
* selenium (coming soon!)
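
As a quick way to see which of the optional packages above are already available, here is a minimal sketch using only the standard library (Python 3). The helper name `missing_backends` and the `OPTIONAL` mapping are hypothetical and not part of arackpy; the only library-specific assumption is that pysocks is imported under the module name `socks`.

```python
import importlib.util

# Hypothetical mapping of package name -> import name for the optional
# dependencies listed above. Note that pysocks is imported as "socks".
OPTIONAL = {
    "lxml": "lxml",
    "requests": "requests",
    "pysocks": "socks",
    "fake_useragent": "fake_useragent",
}


def missing_backends(modules=OPTIONAL):
    """Return the sorted package names whose import module cannot be found."""
    return sorted(
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_backends()
    if missing:
        print("Missing optional packages: %s" % ", ".join(missing))
    else:
        print("All optional packages are installed")
```
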
Installation
------------
For a vanilla **arackpy** install with no other dependencies:
    pip install arackpy
For proxy and Tor support:
    pip install lxml requests fake_useragent pysocks
Quickstart
----------
Open your favorite Python editor and type the following:
    # hello_spider.py
    from __future__ import print_function  # python 2 support

    from arackpy.spider import Spider


    class HelloSpider(Spider):
        """A simple spider in just ten lines of working code"""

        start_urls = ["https://www.python.org"]

        def parse(self, url, html):
            """Extract data from the raw html"""
            print("Crawling url, %s" % url)


    if __name__ == "__main__":
        print("Press Ctrl-c to stop crawling")
        spider = HelloSpider()
        spider.crawl()
Run the program using:
    $ python hello_spider.py
**Note:** Press Ctrl-c to terminate crawling. To use proxies or Tor, change the
backend accordingly.
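
The `parse` method in the quickstart only prints the URL. As a rough sketch of what data extraction inside `parse` might look like, the helper below pulls the page title and absolute links out of raw HTML using only the Python 3 standard library. The names `LinkTitleParser` and `extract_title_and_links` are hypothetical and not part of arackpy, and the sketch assumes `html` arrives as a text string rather than bytes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTitleParser(HTMLParser):
    """Collect the <title> text and every <a href> value from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href attribute
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text between <title> and </title> is accumulated.
        if self._in_title:
            self.title += data


def extract_title_and_links(url, html):
    """Return (title, links), with links resolved against the page url."""
    parser = LinkTitleParser()
    parser.feed(html)
    parser.close()
    links = [urljoin(url, href) for href in parser.links]
    return parser.title.strip(), links
```

Inside a spider, `parse(self, url, html)` could then call `title, links = extract_title_and_links(url, html)` and store or print the result. If lxml is installed, it would likely be a faster and more forgiving choice for the same job.
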
Documentation
-------------
To learn more, read the documentation at https://arackpy.readthedocs.org. The
**arackpy** logo was taken from https://clipart-library.com.