OCrawler

A small crawler written in Go that retrieves the site map of any domain.

Why O? Just because my name starts with O, nothing special 😊.

Prerequisites

Go 1.8 or later

github.com/beego/bee/logger/colors ...Adds colour to the output of the program

golang.org/x/net/html ...Parses HTML documents

Installation

To download and install the libraries, run the following on the command line:

$ go get github.com/beego/bee/logger/colors
$ go get golang.org/x/net/html

Don't forget to set up your GOPATH environment variable!

Build

To build the source code, go to the working directory and run:

$ go build

This generates the executable file.

Run

To run, execute the file that was generated by the build, in this case OCrawler.

The first argument is the domain to crawl, without http:// or https:// at the beginning and without any path, not even a single trailing slash (/).

The second argument is the maximum crawl depth. It sets a threshold on how many levels of links the crawler follows beyond the starting page; a good value is 1 or 2.
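To illustrate what the depth threshold means, here is a minimal sketch of a depth-limited, breadth-first walk. It is not taken from OCrawler's source: the `crawlDepth` function and the in-memory `links` map (a stand-in for fetching and parsing real pages) are hypothetical, and the sketch assumes that depth 0 means "only the starting page".

```go
package main

import "fmt"

// crawlDepth walks a link graph breadth-first from start and returns the
// pages reached within maxDepth levels of the start page, in discovery order.
// links is a hypothetical in-memory stand-in for fetching and parsing pages.
func crawlDepth(links map[string][]string, start string, maxDepth int) []string {
	type item struct {
		url   string
		depth int
	}
	visited := map[string]bool{start: true}
	order := []string{start}
	queue := []item{{url: start, depth: 0}}

	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur.depth >= maxDepth {
			continue // threshold reached: do not follow links from this page
		}
		for _, next := range links[cur.url] {
			if visited[next] {
				continue // already discovered
			}
			visited[next] = true
			order = append(order, next)
			queue = append(queue, item{url: next, depth: cur.depth + 1})
		}
	}
	return order
}

func main() {
	// Assumed example site structure.
	links := map[string][]string{
		"/":     {"/doc", "/blog"},
		"/doc":  {"/doc/effective"},
		"/blog": {"/"},
	}
	fmt.Println(crawlDepth(links, "/", 1)) // start page plus its direct links
	fmt.Println(crawlDepth(links, "/", 2)) // one level further
}
```

Each extra level can multiply the number of pages visited, which is why small values such as 1 or 2 are usually enough.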

The third argument is the maximum number of processes to keep alive at the same time while the crawl is in progress. A good rule of thumb is to use the number of threads your computer has.

$ ./OCrawler [domain] [depth] [max processes]
$ ./OCrawler golang.org 2 8
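The three arguments could be validated along the following lines. This is a sketch under assumptions, not OCrawler's actual code: the `config` type, the `parseArgs` function, and the error messages are all hypothetical. It rejects any domain containing a slash, which covers both a leading http:// or https:// and a trailing path.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// config holds the three positional arguments described above (hypothetical type).
type config struct {
	domain       string
	depth        int
	maxProcesses int
}

// parseArgs validates [domain] [depth] [max processes].
func parseArgs(args []string) (config, error) {
	if len(args) != 3 {
		return config{}, errors.New("usage: OCrawler [domain] [depth] [max processes]")
	}
	domain := args[0]
	// A slash means a scheme (http://) or a path was included.
	if domain == "" || strings.Contains(domain, "/") {
		return config{}, fmt.Errorf("domain %q must be a bare host such as golang.org", domain)
	}
	depth, err := strconv.Atoi(args[1])
	if err != nil || depth < 0 {
		return config{}, fmt.Errorf("depth %q must be a non-negative integer", args[1])
	}
	procs, err := strconv.Atoi(args[2])
	if err != nil || procs < 1 {
		return config{}, fmt.Errorf("max processes %q must be a positive integer", args[2])
	}
	return config{domain: domain, depth: depth, maxProcesses: procs}, nil
}

func main() {
	// In a real program the slice would be os.Args[1:]; the values here
	// mirror the example invocation above.
	cfg, err := parseArgs([]string{"golang.org", "2", "8"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("crawling %s to depth %d with up to %d processes\n",
		cfg.domain, cfg.depth, cfg.maxProcesses)
}
```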

Output

The tree uses 3 main colours:

Blue: Newly discovered links
Magenta: Already discovered links
Green: Assets
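The colour scheme above can be sketched as a small classification function. For the sake of a self-contained example this uses raw ANSI escape codes rather than the beego colours package, and it assumes assets are recognised by file extension; the `colourFor` function and the `assetExts` list are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Standard ANSI escape sequences for terminal colours.
const (
	blue    = "\x1b[34m"
	magenta = "\x1b[35m"
	green   = "\x1b[32m"
	reset   = "\x1b[0m"
)

// assetExts is an assumed list of file extensions treated as assets.
var assetExts = map[string]bool{
	".css": true, ".js": true, ".png": true, ".jpg": true, ".gif": true,
}

// colourFor picks the colour for a link and marks new links as seen.
func colourFor(link string, seen map[string]bool) string {
	if assetExts[strings.ToLower(path.Ext(link))] {
		return green // asset
	}
	if seen[link] {
		return magenta // already discovered
	}
	seen[link] = true
	return blue // newly discovered
}

func main() {
	seen := map[string]bool{}
	for _, link := range []string{"/doc", "/static/site.css", "/doc"} {
		fmt.Println(colourFor(link, seen) + link + reset)
	}
}
```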

Developed by Omar Contreras
