(fix): Bunkrr child item fix and improve scraping performance#1689

Draft
Was213xzc wants to merge 59 commits into Cyberdrop-DL:main from Was213xzc:bunkrr-child-item-fix

Conversation

Contributor

@Was213xzc Was213xzc commented Apr 17, 2026

Fixes

  • Fix Bunkrr child album discovery so downloadable items nested under album links are scraped correctly.
  • Fix Bunkrr album pagination so users do not need to manually add ?page=2, ?page=3, etc. to urls.txt.
  • Preserve Fileditch Turnstile handling with regression coverage.

Features

  • Add batched/parallel Bunkrr album page discovery to improve scraping throughput on large albums.
  • Add history-table indexes to speed up repeated scrape/queue lookups.
  • Optimize completed-referer checks to avoid scanning every matching history row.
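
The batched/parallel page discovery described above can be sketched as follows. This is a hypothetical illustration, not the PR's actual implementation: the `fetch_page`, `discover_album_items`, and `BATCH_SIZE` names are assumptions, and the real crawler's HTTP and parsing logic is stubbed out.

```python
import asyncio

# Hypothetical batch size; the PR does not state the real value.
BATCH_SIZE = 5

async def fetch_page(url: str) -> list[str]:
    # Stand-in for the real HTTP request + HTML parse; returns item slugs.
    await asyncio.sleep(0)
    return [f"{url}#item"]

async def discover_album_items(base_url: str, total_pages: int) -> list[str]:
    seen: set[str] = set()
    items: list[str] = []
    page_urls = [f"{base_url}?page={n}" for n in range(1, total_pages + 1)]
    for start in range(0, len(page_urls), BATCH_SIZE):
        batch = page_urls[start : start + BATCH_SIZE]
        # Fetch one batch of album pages concurrently, then de-dupe slugs
        # so the same item queued from two pages is only scraped once.
        for slugs in await asyncio.gather(*(fetch_page(u) for u in batch)):
            for slug in slugs:
                if slug not in seen:
                    seen.add(slug)
                    items.append(slug)
    return items

result = asyncio.run(discover_album_items("https://example.com/a/abc", 12))
```

Batching bounds the number of in-flight requests per album, which keeps throughput high on large albums without opening one connection per page all at once.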

Why

Large Bunkrr albums can contain downloadable items across child album links and many paginated pages. Previously, users had to manually enumerate page URLs, and large urls.txt runs could spend too much time scraping/queueing due to repeated page and history lookups.

Notes

  • This PR intentionally combines the child-item fix and Bunkrr performance work because the batching/parallelism depends on the pagination and child-album discovery changes.
  • I don't have hard numbers for the performance improvements, but in testing, scraping was noticeably faster than the baseline.

jbsparrow and others added 19 commits March 13, 2026 11:55
refactor: add `nextjs` flight data parsing utils
fix: download of multipage profile albums (Chevereto)
fix: handle additional redirect links (Xenforo)
merge from dev
fix: series name and chapter selection (Toonily)
fix(bunkr): fixes bunkr 404 not found, switch download API from id-based to slug-based endpoint
@Was213xzc Was213xzc changed the title (fix): Bunkrr child item fix (fix): Bunkrr child item fix and improve scraping performance Apr 18, 2026

Copilot AI left a comment


Pull request overview

This PR fixes Bunkrr album scraping correctness (child items + pagination) while improving throughput on large albums, and adds performance optimizations to history lookups plus regression coverage for Fileditch Turnstile handling.

Changes:

  • Add Bunkrr album pagination discovery with batched parallel page fetching and slug de-duping.
  • Speed up history lookups by adding SQLite indexes and optimizing referer-completion checks.
  • Add/extend tests for Bunkrr pagination behavior, history-table indexing, and Fileditch Turnstile detection.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.

Summary per file:

  • cyberdrop_dl/crawlers/bunkrr.py: Adds album page discovery/batching, pagination URL normalization, and download-src resolution changes.
  • cyberdrop_dl/crawlers/fileditch.py: Detects Cloudflare Turnstile challenge pages and raises DDOSGuardError.
  • cyberdrop_dl/database/tables/history.py: Creates history-table indexes on startup; optimizes the check_complete_by_referer query.
  • cyberdrop_dl/database/tables/definitions.py: Defines the create_history_indexes DDL script.
  • tests/test_bunkrr.py: Unit coverage for new Bunkrr helpers plus pagination/behavior regressions.
  • tests/crawlers/test_cases/bunkr.py: Adds an additional Bunkrr crawler integration test case for album child-count coverage.
  • tests/test_fileditch_turnstile.py: Adds regression tests for Turnstile challenge detection.
  • tests/test_history_table.py: Verifies index creation and referer completion check behavior.
  • .gitignore: Ignores pytest/ruff caches and local crawler debugging artifacts.
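
The create_history_indexes DDL referenced above could look like the sketch below. The index names and column choices here are assumptions for illustration; the actual definitions in definitions.py may differ.

```python
import sqlite3

# Hypothetical index DDL; real index names/columns in definitions.py may differ.
CREATE_HISTORY_INDEXES = """
CREATE INDEX IF NOT EXISTS idx_media_referer ON media (referer);
CREATE INDEX IF NOT EXISTS idx_media_referer_domain ON media (referer, domain);
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (referer TEXT, domain TEXT, completed INTEGER)")
conn.executescript(CREATE_HISTORY_INDEXES)
index_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
```

`CREATE INDEX IF NOT EXISTS` makes the script safe to run on every startup, which matches the "creates indexes on startup" behavior described for history.py.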


Comment thread on cyberdrop_dl/crawlers/bunkrr.py (Outdated)
@NTFSvolume NTFSvolume self-assigned this Apr 22, 2026
Member


This is a good change. Can you please move these database changes to a different PR?

Comment on lines 132 to +140

```diff
  if domain is None:
-     query = "SELECT completed FROM media WHERE referer = ?"
+     query = "SELECT 1 FROM media WHERE referer = ? and completed != 0 LIMIT 1"
      params = (str(referer),)
  else:
-     query = "SELECT completed FROM media WHERE referer = ? and domain = ?"
+     query = "SELECT 1 FROM media WHERE referer = ? and domain = ? and completed != 0 LIMIT 1"
      params = str(referer), domain

  cursor = await self.db_conn.execute(query, params)
- if domain is None:
-     rows = await cursor.fetchall()
- else:
-     row = await cursor.fetchone()
-     if row is None:
-         return False
-     rows = [row]
- return bool(rows and any(row[0] != 0 for row in rows))
+ return await cursor.fetchone() is not None
```
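
The rewritten query can be exercised against an in-memory SQLite table to confirm it matches the old "any row with completed != 0" semantics. This is a minimal standalone sketch, not the PR's code: the table here is simplified and the helper name mirrors check_complete_by_referer only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (referer TEXT, domain TEXT, completed INTEGER)")
conn.executemany(
    "INSERT INTO media VALUES (?, ?, ?)",
    [
        ("https://example.com/x", "bunkrr", 0),  # incomplete row
        ("https://example.com/x", "bunkrr", 1),  # completed row, same referer
        ("https://example.com/y", "bunkrr", 0),  # referer never completed
    ],
)

def check_complete_by_referer(referer: str) -> bool:
    # Optimized form: SQLite stops at the first completed row (LIMIT 1)
    # instead of returning every matching row for Python to scan.
    row = conn.execute(
        "SELECT 1 FROM media WHERE referer = ? AND completed != 0 LIMIT 1",
        (referer,),
    ).fetchone()
    return row is not None

done_x = check_complete_by_referer("https://example.com/x")
done_y = check_complete_by_referer("https://example.com/y")
```

With an index on referer, this turns a full scan of matching rows into a single indexed probe that exits on the first hit.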
Member


This is great as well. Please move to a database PR

Member


I don't understand why you made pagination changes to bunkr. We can get all files in an album with a single request using the advanced query param, which is what CDL already does.

No need for pagination

Contributor Author

@Was213xzc Was213xzc Apr 22, 2026


I did not realize there was an advanced query param that did this; I was encountering errors where it failed to scrape items on other pages.

Comment on lines 42 to +43
```diff
  soup = await self.request_soup(scrape_item.url)
+ _check_turnstile(soup)
```
Member


This is no longer required but if you want to force it, CDL has a dedicated module for that

Suggested change

```diff
  soup = await self.request_soup(scrape_item.url)
- _check_turnstile(soup)
+ from cyberdrop_dl import ddos_guard
+ await ddos_guard.check(soup)
```

@NTFSvolume
Member

I will cherry-pick the database changes into a new PR

@NTFSvolume NTFSvolume changed the base branch from dev to main April 29, 2026 04:21
@NTFSvolume NTFSvolume marked this pull request as draft April 30, 2026 17:25


5 participants