
Commit c7d8082

Author: Abhijit Roy
Commit message: Python script to download one piece episodes from the CLI
1 parent 372a47d

File tree

2 files changed (+70, −0 lines)


OnePieceScraper/README.md

+35
### One Piece Scraper

This Python script can be used to scrape and download all One Piece episodes from the CLI.

### Technology

Python 3 is needed to run the script.

### Usage

#### Installing Python 3

Install Python 3 from the official [Link](https://www.python.org/downloads/).

#### Installing Required Packages

* Once Python is installed, install pip using the steps in this [Link](https://pip.pypa.io/en/stable/installing/)

* Install the requests module using pip

```
$ pip install requests
```

* Install BeautifulSoup using pip

```
$ pip install beautifulsoup4
```

* Install tqdm using pip

```
$ pip install tqdm
```

#### Running the script

Simply run the script using Python

```
$ python ./one_piece_scraper.py
```
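Before running the script, it can be worth confirming that the three third-party packages installed above import cleanly. A minimal sanity check (the module names are the import names, which differ from the pip package name in BeautifulSoup's case):

```python
import importlib.util

# The pip packages requests, beautifulsoup4, and tqdm are imported
# under these module names.
for name in ("requests", "bs4", "tqdm"):
    if importlib.util.find_spec(name) is None:
        print(name, "missing - install it with pip")
    else:
        print(name, "installed")
```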

OnePieceScraper/one_piece_scraper.py

+35
@@ -0,0 +1,35 @@
#! python3

import math
import re

import requests
from bs4 import BeautifulSoup
from tqdm import tqdm


def check_link(link):
    # Only video files should be downloaded.
    return link.endswith(('.mkv', '.avi'))


def sort_links(links):
    # Sort by the first number in the link text, i.e. the episode number.
    return sorted(links, key=lambda x: int(re.search(r'\d+', x.text).group()))


if __name__ == '__main__':
    DN_STR = 'https://storage.kanzaki.ru/ANIME___/One_Piece/'
    BLOCK_SIZE = 1024

    html = requests.get(DN_STR)
    soup = BeautifulSoup(html.text, 'html.parser')
    links = soup.find_all('a')  # every <a> tag in the directory listing
    links = links[2:]           # skip the header/parent-directory links
    links = sort_links(links)

    for link in links:
        if not check_link(link['href']):
            continue
        print('Downloading Link : {}'.format(link.text))
        res = requests.get(DN_STR + link['href'], stream=True)
        total_size = int(res.headers.get('content-length', 0))
        bytes_written = 0
        with open('./{}'.format(link.text), 'wb') as f_ep:
            for data in tqdm(res.iter_content(BLOCK_SIZE),
                             total=math.ceil(total_size / BLOCK_SIZE),
                             unit='KB',
                             unit_scale=True):
                bytes_written += f_ep.write(data)
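The two helper functions in the script can be exercised on their own, without hitting the network. A minimal sketch, using a `namedtuple` as a hypothetical stand-in for BeautifulSoup's tag objects (only the `.text` attribute is needed by `sort_links`):

```python
import re
from collections import namedtuple

# Stand-in for a BeautifulSoup <a> tag; only .text is used below.
Link = namedtuple('Link', ['text', 'href'])

def check_link(link):
    # Only video files should be downloaded.
    return link.endswith(('.mkv', '.avi'))

def sort_links(links):
    # Sort by the first number in the link text, i.e. the episode number.
    return sorted(links, key=lambda x: int(re.search(r'\d+', x.text).group()))

episodes = [Link('Episode 10.mkv', 'Episode_10.mkv'),
            Link('Episode 2.mkv', 'Episode_2.mkv')]

print(check_link('Episode_2.mkv'))   # True
print(check_link('index.html'))      # False
print([e.text for e in sort_links(episodes)])
# Numeric sort puts Episode 2 before Episode 10, which a plain
# string sort would not.
```

Note that `sort_links` assumes every link text contains at least one digit; a link text with no digits would make `re.search` return `None` and raise an `AttributeError`.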
