compile_hdock_excel

Scrapes HDOCK job result pages, extracts the “Summary of the Top 10 Models” tables for any number of complexes, and builds a single-sheet Excel workbook with embedded download links to each job’s full results archive.

compile_hdock_excel.py

Scrape any number of HDOCK job result pages, capture each job’s
“Summary of the Top 10 Models” table, and combine everything into a single Excel workbook.
Every complex is stacked in one worksheet as a block of rows:

Rank
Docking Score
Confidence Score
Ligand RMSD
Interface residues
All results package (clickable hyperlink to all_results.tar.gz)
(blank line)

Features

1. Single-sheet output – Easy filtering, sorting, or pivot-table analysis.
2. Embedded hyperlinks – One-click download of all_results.tar.gz for each complex.
3. Flexible scraping
   3.1 Tries the plaintext ranked_poses.txt first (fast).
   3.2 Falls back to the HTML result page if needed.
   3.3 Handles both row-oriented (native) and column-oriented table variants.
4. Polite scraping (1 s delay between requests) and descriptive logging.
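The plaintext-first fallback and the polite delay can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name fetch_summary and the injected fetch callable are hypothetical. In practice fetch could wrap a requests.get call and return None on failure.

```python
import time
from typing import Callable, Optional, Tuple


def fetch_summary(base_url: str,
                  fetch: Callable[[str], Optional[str]],
                  delay: float = 1.0) -> Tuple[str, str]:
    """Return (source, text) for one job, preferring the plaintext file.

    `fetch` is a hypothetical callable that returns the body of a URL,
    or None when the resource is missing or unreachable.
    """
    base = base_url.rstrip("/") + "/"  # trailing "/" is optional in the input
    # Candidate resources in priority order: plaintext first, HTML second.
    candidates = [("text", base + "ranked_poses.txt"), ("html", base)]
    for source, url in candidates:
        text = fetch(url)
        time.sleep(delay)  # polite pause after every request
        if text:
            return source, text
    raise LookupError(f"Top-10 table not found for {base}")
```

Passing the fetcher in as an argument keeps the fallback order testable without touching the network.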

Installation

git clone https://github.com/SidSin0809/hdock_batch.git

cd hdock_batch

pip install -r requirements.txt

Required packages:

pandas>=2.0
openpyxl>=3.1
requests>=2.30
beautifulsoup4>=4.12

All dependencies are installable with pip install -r requirements.txt.

Usage

Prepare an input list

Create hdock_urls.txt (or any filename you like) with one job per line:

6PB0-CRH http://hdock.phys.hust.edu.cn/data/xxxxxxxxxxxxx/

6WZG-Secretin http://hdock.phys.hust.edu.cn/data/xxxxxxxxxxxxx/

Lines starting with # are ignored.

First column = sheet header / complex ID.

Second column = base URL of the job directory (trailing “/” optional).
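The input-file rules above could be implemented along these lines. This is a sketch under assumptions: the function name parse_job_list is hypothetical, the columns are assumed to be whitespace-separated, and lines with no URL are silently skipped, which the real script may handle differently.

```python
from typing import List, Tuple


def parse_job_list(text: str) -> List[Tuple[str, str]]:
    """Parse the input list into (complex_id, base_url) pairs."""
    jobs = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue  # assumption: skip lines that lack a URL column
        complex_id, url = parts
        # Normalise the URL so it always ends with exactly one "/".
        jobs.append((complex_id, url.strip().rstrip("/") + "/"))
    return jobs
```

Normalising the trailing slash once at parse time means later code can append file names such as all_results.tar.gz without checking again.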

Run the script

python compile_hdock_excel.py -i hdock_urls.txt -o compiled_hdock_results.xlsx

You’ll see progress messages in the console. If a job directory is unreachable or malformed, it’s reported and skipped.
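The report-and-skip behaviour might look something like the loop below. It is an illustrative sketch only: collect, scrape, and the logger name "hdock" are hypothetical names, and the actual script's log messages will differ.

```python
import logging
from typing import Callable, Dict, Iterable, Tuple

log = logging.getLogger("hdock")  # hypothetical logger name


def collect(jobs: Iterable[Tuple[str, str]],
            scrape: Callable[[str], str]) -> Dict[str, str]:
    """Scrape every job, logging and skipping the ones that fail."""
    results = {}
    for complex_id, url in jobs:
        try:
            results[complex_id] = scrape(url)
            log.info("%s: parsed %s", complex_id, url)
        except Exception as exc:
            # Broad on purpose: any network or parsing failure skips
            # this job only, so the remaining jobs are still processed.
            log.warning("%s: skipped (%s)", complex_id, exc)
    return results
```

One unreachable job therefore costs a warning line rather than the whole run.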

Open the workbook

compiled_hdock_results.xlsx → worksheet Summary

Each complex block begins with a bold ID row.

The five-row matrix (Rank → Interface residues) follows.

The “All results package” row contains an Excel HYPERLINK formula pointing to all_results.tar.gz.
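As a rough sketch of how one complex block could be assembled, the helper below builds the rows as plain Python lists. The names hyperlink_formula and build_block, the dictionary-per-model input, and the link label are assumptions made for illustration. Bold formatting of the ID row is not covered here.

```python
from typing import Dict, List, Sequence

FIELDS = ["Rank", "Docking Score", "Confidence Score",
          "Ligand RMSD", "Interface residues"]


def hyperlink_formula(url: str, label: str) -> str:
    """Build an Excel HYPERLINK formula; embedded quotes are doubled."""
    safe_url = url.replace('"', '""')
    safe_label = label.replace('"', '""')
    return f'=HYPERLINK("{safe_url}","{safe_label}")'


def build_block(complex_id: str, base_url: str,
                models: Sequence[Dict[str, object]]) -> List[list]:
    """Return the rows for one complex: ID, five fields, link, blank."""
    archive = base_url.rstrip("/") + "/all_results.tar.gz"
    rows: List[list] = [[complex_id]]  # ID row that opens the block
    for field in FIELDS:
        # One row per field, one column per model; missing values stay empty.
        rows.append([field] + [m.get(field, "") for m in models])
    rows.append(["All results package",
                 hyperlink_formula(archive, "all_results.tar.gz")])
    rows.append([])  # blank separator row before the next complex
    return rows
```

Each returned row could then be written to the Summary worksheet, for example with an openpyxl worksheet's append method.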

Troubleshooting

“Top-10 table not found”

Check the URL and confirm that the job has finished and still exists (HDOCK sometimes removes old jobs).

Empty workbook

All jobs failed to parse. Run the script with a single URL first to diagnose the problem.
