
Commit 5248c34

add GitHub action, pre-commit and format code
Signed-off-by: Zhiyuan Chen <[email protected]>
ZhiyuanChen committed on Jun 30, 2023 · 1 parent 880619c
Showing 49 changed files with 1,215 additions and 1,184 deletions.
7 changes: 7 additions & 0 deletions .github/merge_rules.yaml
@@ -0,0 +1,7 @@
- name: merge
  patterns:
    - icrawler/**
  approved_by:
    - ZhiyuanChen
  mandatory_checks_name:
    - push
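This new file appears to configure an auto-merge bot (the specific bot is not identified in this diff): pull requests touching icrawler/** become mergeable once approved by ZhiyuanChen and the mandatory "push" check, the workflow added below, has passed.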
86 changes: 86 additions & 0 deletions .github/workflows/push.yaml
@@ -0,0 +1,86 @@
name: push
on: [push, pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: 3.x
          cache: "pip"
      - uses: pre-commit/[email protected]
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
          cache: "pip"
      - name: Install dependencies
        run: pip install -r requirements.txt && pip install -e .
      - name: Install dependencies for testing
        run: pip install pytest pytest-cov
      - name: pytest
        run: pytest .
  release:
    if: startsWith(github.event.ref, 'refs/tags/v')
    needs: [lint, test]
    environment: pypi
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: 3.x
          cache: "pip"
      - name: Install dependencies for building
        run: pip install wheel setuptools_scm
      - name: build package
        run: python setup.py sdist bdist_wheel
      - name: create release
        uses: "marvinpinto/action-automatic-releases@latest"
        with:
          repo_token: "${{ secrets.GITHUB_TOKEN }}"
          prerelease: false
          files: |
            dist/*
      - name: publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          password: ${{ secrets.PYPI_API_TOKEN }}
  develop:
    if: contains(fromJson('["refs/heads/master", "refs/heads/main"]'), github.ref)
    needs: [lint, test]
    environment: pypi
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v4
        with:
          python-version: 3.x
          cache: "pip"
      - name: Install dependencies for building
        run: pip install wheel setuptools_scm
      - name: build package
        run: python setup.py sdist bdist_wheel
      - name: create release
        uses: "marvinpinto/action-automatic-releases@latest"
        with:
          repo_token: "${{ secrets.GITHUB_TOKEN }}"
          automatic_release_tag: "latest"
          prerelease: true
          title: "Development Build"
          files: |
            dist/*
      # - name: publish to Test PyPI
      #   uses: pypa/gh-action-pypi-publish@release/v1
      #   with:
      #     password: ${{ secrets.TEST_PYPI_API_TOKEN }}
      #     repository_url: https://test.pypi.org/legacy/
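In short: every push and pull request runs the lint and test jobs; a tag starting with "v" additionally builds the package and publishes it to PyPI, while pushes to master or main cut a rolling "latest" prerelease with development builds attached (publishing to Test PyPI is left commented out).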
3 changes: 3 additions & 0 deletions .gitignore
@@ -33,3 +33,6 @@ README.md

 # Local test scripts
 icrawler/utils/test_proxy.py
+
+# version
+icrawler/version.py
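The newly ignored icrawler/version.py is presumably generated at build time by setuptools_scm, which the release jobs above install before building. A minimal sketch of how setup.py might request this (the repository's actual setup.py is not shown in this diff, so the configuration below is an assumption):

.. code:: python

    # Hypothetical excerpt: setuptools_scm derives the version from git tags
    # and writes it to icrawler/version.py, hence the new .gitignore entry.
    from setuptools import setup

    setup(
        name="icrawler",
        use_scm_version={"write_to": "icrawler/version.py"},
        setup_requires=["setuptools_scm"],
    )

This would also explain fetch-depth: 0 in the develop job: setuptools_scm needs the full tag history to compute a version.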
71 changes: 71 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,71 @@
default_language_version:
  python: python3
repos:
  - repo: https://github.com/PSF/black
    rev: 23.3.0
    hooks:
      - id: black
        args: [--safe, --quiet]
  - repo: https://github.com/PyCQA/isort
    rev: 5.12.0
    hooks:
      - id: isort
        name: isort
  # - repo: https://github.com/PyCQA/flake8
  #   rev: 6.0.0
  #   hooks:
  #     - id: flake8
  #       additional_dependencies:
  #         - flake8-bugbear
  #         - flake8-comprehensions
  #         - flake8-simplify
  - repo: https://github.com/asottile/pyupgrade
    rev: v3.7.0
    hooks:
      - id: pyupgrade
        args: [--py37-plus]
  - repo: https://github.com/tox-dev/pyproject-fmt
    rev: 0.12.1
    hooks:
      - id: pyproject-fmt
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.4.1
    hooks:
      - id: mypy
        additional_dependencies:
          - types-requests
          - types-six
  - repo: https://github.com/codespell-project/codespell
    rev: v2.2.5
    hooks:
      - id: codespell
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: v3.0.0-alpha.9-for-vscode
    hooks:
      - id: prettier
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-added-large-files
      - id: check-ast
      - id: check-byte-order-marker
      - id: check-builtin-literals
      - id: check-case-conflict
      - id: check-docstring-first
      - id: check-merge-conflict
      - id: check-vcs-permalinks
      - id: check-symlinks
      - id: pretty-format-json
      - id: check-json
      - id: check-xml
      - id: check-toml
      - id: check-yaml
      - id: debug-statements
      - id: end-of-file-fixer
      - id: fix-byte-order-marker
      - id: fix-encoding-pragma
        args: ["--remove"]
      - id: mixed-line-ending
        args: ["--fix=lf"]
      - id: requirements-txt-fixer
      - id: trailing-whitespace
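These are the same checks the lint job runs through pre-commit/action. Locally, pre-commit install enables them as a git hook and pre-commit run --all-files applies them to the whole tree, which is presumably how the whitespace-only changes in the rest of this commit were produced.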
26 changes: 0 additions & 26 deletions .travis.yml

This file was deleted.

2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -1 +1 @@
-include README.rst
+include README.rst
2 changes: 1 addition & 1 deletion README.rst
@@ -85,7 +85,7 @@ they are connected with each other with FIFO queues. The workflow is shown in
 the following figure.

 .. figure:: http://7xopqn.com1.z0.glb.clouddn.com/workflow.png
-   :alt:
+   :alt:

 - ``url_queue`` stores the url of pages which may contain images
 - ``task_queue`` stores the image url as well as any meta data you
2 changes: 1 addition & 1 deletion docs/Makefile
@@ -17,4 +17,4 @@ help:
 # Catch-all target: route all unknown targets to Sphinx using the new
 # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile
-	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
2 changes: 1 addition & 1 deletion docs/api.rst
@@ -41,4 +41,4 @@ utils

 .. automodule:: icrawler.utils
    :members:
-   :show-inheritance:
+   :show-inheritance:
12 changes: 6 additions & 6 deletions docs/builtin.rst
@@ -14,7 +14,7 @@ Search engine crawlers
 ----------------------

 The search engine crawlers (Google, Bing, Baidu) have universal APIs.
-Here is an example of how to use the built-in crawlers.
+Here is an example of how to use the built-in crawlers.

 .. code:: python
@@ -77,7 +77,7 @@ When using ``GoogleImageCrawler``, language can be specified via the argument ``
 to view the result page. The limitation is usually 1000 for many search engines such as google and bing. To crawl more than 1000 images with a single keyword, we can specify different date ranges.

 .. code:: python

     google_crawler.crawl(
         keyword='cat',
         filters={'date': ((2016, 1, 1), (2016, 6, 30))},
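The snippet above is truncated by the diff view. As an illustrative aside (not part of this commit), the date-range technique it describes could be applied over several ranges like this, assuming the GoogleImageCrawler API shown elsewhere in this document:

.. code:: python

    from icrawler.builtin import GoogleImageCrawler

    google_crawler = GoogleImageCrawler(storage={'root_dir': 'images/cat'})
    # Each date range is a separate query, sidestepping the ~1000-result cap.
    for date_range in [((2016, 1, 1), (2016, 6, 30)),
                       ((2016, 7, 1), (2016, 12, 31))]:
        google_crawler.crawl(keyword='cat',
                             filters={'date': date_range},
                             max_num=1000,
                             file_idx_offset='auto')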
@@ -134,8 +134,8 @@ are also supported. Valid arguments and values are shown as follows.
   corresponding relations between the colors and the codes.
 - ``styles`` -- A comma-delimited list of styles, including ``blackandwhite``,
   ``depthoffield``, ``minimalism`` and ``pattern``.
-- ``orientation`` -- A comma-delimited list of image orientation. It can be
-  ``landscape``, ``portrait``, ``square`` and ``panorama``. The default
+- ``orientation`` -- A comma-delimited list of image orientation. It can be
+  ``landscape``, ``portrait``, ``square`` and ``panorama``. The default
   includes all of them.

 Another parameter ``size_preference`` is available for Flickr crawler, it define
@@ -155,7 +155,7 @@ the preferred order of image sizes. Valid values are shown as follows.
 - square: 75x75

 ``size_preference`` can be either a list or a string, if not specified, all
-sizes are acceptable and larger sizes are prior to smaller ones.
+sizes are acceptable and larger sizes are prior to smaller ones.
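As an illustrative aside (not part of this commit), a size preference might be passed like this; the API-key value is a placeholder, and passing ``size_preference`` to ``crawl()`` is an assumption based on the description above:

.. code:: python

    from icrawler.builtin import FlickrImageCrawler

    # Sketch only: 'YOUR_APIKEY' is a placeholder for a real Flickr API key.
    flickr_crawler = FlickrImageCrawler('YOUR_APIKEY',
                                        storage={'root_dir': 'flickr'})
    # Prefer the largest sizes first, falling back down the list.
    flickr_crawler.crawl(max_num=100, tags='cat',
                         size_preference=['original', 'large'])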

.. note::

@@ -174,7 +174,7 @@ If you just want to crawl all the images from some website, then
     from icrawler.builtin import GreedyImageCrawler
     greedy_crawler = GreedyImageCrawler(storage={'root_dir': 'your_image_dir'})
-    greedy_crawler.crawl(domains='http://www.bbc.com/news', max_num=0,
+    greedy_crawler.crawl(domains='http://www.bbc.com/news', max_num=0,
                          min_size=None, max_size=None)

 The argument ``domains`` can be either an url string or list.
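As a final illustrative aside (not part of this commit; example.org is a placeholder), passing a list of domains would look like:

.. code:: python

    # Crawl several sites in one call by passing a list of domains.
    greedy_crawler.crawl(domains=['http://www.bbc.com/news',
                                  'http://example.org'],
                         max_num=0, min_size=None, max_size=None)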
(Diffs for the remaining 39 of the 49 changed files were not loaded.)
