Commit 5e32488

Add Sphinx-based documentation
To build the docs, move to the `docs` directory and run `make html`. The resulting docs will be in `docs/build/html`.
1 parent 13324a8 commit 5e32488

20 files changed, +462 -22 lines changed

.editorconfig

+3
@@ -6,3 +6,6 @@ trim_trailing_whitespace = true
 
 [*.py]
 max_line_length = 79
+
+[*.rst]
+indent_size = 2

.gitignore

+3
@@ -1,4 +1,7 @@
 # Created by .gitignore support plugin (hsz.mobi)
+
+.vscode
+
 ### Python template
 # Byte-compiled / optimized / DLL files
 __pycache__/

docs/Makefile

+20
@@ -0,0 +1,20 @@
1+
# Minimal makefile for Sphinx documentation
2+
#
3+
4+
# You can set these variables from the command line, and also
5+
# from the environment for the first two.
6+
SPHINXOPTS ?= "-W" # This flag turns warnings into errors.
7+
SPHINXBUILD ?= sphinx-build
8+
SOURCEDIR = source
9+
BUILDDIR = build
10+
11+
# Put it first so that "make" without argument is like "make help".
12+
help:
13+
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
14+
15+
.PHONY: help Makefile
16+
17+
# Catch-all target: route all unknown targets to Sphinx using the new
18+
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
19+
%: Makefile
20+
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
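
For readers who prefer to drive the build from Python rather than `make`, the catch-all target above can be approximated with a small wrapper. This is a minimal sketch, not part of the commit: the function names `sphinx_command` and `build_docs` are hypothetical, and the defaults simply mirror the `SPHINXBUILD`, `SOURCEDIR`, `BUILDDIR`, and `SPHINXOPTS` values in the Makefile.

```python
import subprocess


def sphinx_command(target, sourcedir="source", builddir="build",
                   sphinxbuild="sphinx-build", sphinxopts=("-W",)):
    """Return the argument list the Makefile's catch-all target would run.

    Mirrors: $(SPHINXBUILD) -M <target> "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS)
    """
    return [sphinxbuild, "-M", target, sourcedir, builddir, *sphinxopts]


def build_docs(target="html", cwd="docs"):
    """Run the build from the docs directory (assumed to be ./docs).

    Raises subprocess.CalledProcessError if Sphinx exits with an error,
    which includes any warning when -W is passed.
    """
    return subprocess.run(sphinx_command(target), cwd=cwd, check=True)
```

Calling `build_docs()` from the repository root should behave roughly like running `make html` inside `docs`, assuming `sphinx-build` is on the `PATH`.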

docs/make.bat

+35
@@ -0,0 +1,35 @@
1+
@ECHO OFF
2+
3+
pushd %~dp0
4+
5+
REM Command file for Sphinx documentation
6+
7+
if "%SPHINXBUILD%" == "" (
8+
set SPHINXBUILD=sphinx-build
9+
)
10+
set SOURCEDIR=source
11+
set BUILDDIR=build
12+
13+
if "%1" == "" goto help
14+
15+
%SPHINXBUILD% >NUL 2>NUL
16+
if errorlevel 9009 (
17+
echo.
18+
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
19+
echo.installed, then set the SPHINXBUILD environment variable to point
20+
echo.to the full path of the 'sphinx-build' executable. Alternatively you
21+
echo.may add the Sphinx directory to PATH.
22+
echo.
23+
echo.If you don't have Sphinx installed, grab it from
24+
echo.http://sphinx-doc.org/
25+
exit /b 1
26+
)
27+
28+
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
29+
goto end
30+
31+
:help
32+
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
33+
34+
:end
35+
popd

docs/requirements.txt

+15
@@ -0,0 +1,15 @@
1+
-r ../requirements.txt
2+
-r ../requirements-experimental.txt
3+
-r ../requirements-dev.txt
4+
5+
# ../requirements-server.txt includes some binary dependencies readthedocs
6+
# doesn't support, so just include the minimum here.
7+
# See `autodoc_mock_imports` in `docs/source/conf.py` for where other
8+
# dependencies we aren't actually installing get mocked.
9+
tornado >=6.0.0,<7
10+
11+
# We need to tell readthedocs to install this package itself as well.
12+
# Confusingly, we have to specify this directory relative to the repo root
13+
# rather than relative to the directory where this file resides. That is
14+
# inconsistent with the lines above...the mysteries of pip!
15+
./

docs/source/api-reference.rst

+76
@@ -0,0 +1,76 @@
1+
=============
2+
API Reference
3+
=============
4+
5+
6+
Diff Types
7+
----------
8+
9+
*web-monitoring-diff* provides a variety of diff algorithms for use in comparing web content. They all follow a similar standardized signature and return format.
10+
11+
**Diff Signatures**
12+
13+
All diffs should have parameters named ``a_<body|text>`` and ``b_<body|text>`` as their first two arguments. These represent the two pieces of content to compare, where ``a`` represents the “from” or left-hand side and ``b`` represents the “to” or right-hand side of the comparison. The name indicates whether the function takes bytes (``a_body``/``b_body``) or a decoded string (``a_text``/``b_text``). The web server inspects argument names to determine what to pass to a given diff type.
14+
15+
Additionally, diffs may take several other standardized parameters:
16+
17+
* ``a_body``, ``b_body``: Raw HTTP response body (bytes), described above.
18+
* ``a_text``, ``b_text``: Decoded text of HTTP response body (str), described above.
19+
* ``a_url``, ``b_url``: URL at which the content being diffed is found. (This is useful when content contains location-relative information, like links.)
20+
* ``a_headers``, ``b_headers``: Dict of HTTP headers.
21+
22+
Finally, some diffs take additional, diff-specific parameters.
23+
24+
**Return Values**
25+
26+
All diffs return a :class:`dict` with a key named ``"diff"``. The value of this dict entry varies by diff type, but is usually:
27+
28+
- An array of changes. Each entry will be a 2-tuple, where the first item is an :class:`int` representing the type of change (``-1`` for removal, ``0`` for unchanged, ``1`` for addition, or other numbers for diff-specific meanings) and the second item is the data or string that was added/removed/unchanged.
29+
30+
- A string representing a custom view of the diff, e.g. an HTML document.
31+
32+
- A bytestring representing a custom binary view of the diff, e.g. an image.
33+
34+
Each diff may add additional, diff-specific keys to the dict. For example, :func:`web_monitoring_diff.html_diff_render` includes a ``"change_count"`` key indicating how many changes there were, since it’s tough
35+
to inspect the HTML of the resulting diff and count yourself.
36+
37+
38+
.. autofunction:: web_monitoring_diff.compare_length
39+
40+
.. autofunction:: web_monitoring_diff.identical_bytes
41+
42+
.. autofunction:: web_monitoring_diff.side_by_side_text
43+
44+
.. autofunction:: web_monitoring_diff.html_text_diff
45+
46+
.. autofunction:: web_monitoring_diff.html_source_diff
47+
48+
.. autofunction:: web_monitoring_diff.links_diff
49+
50+
.. autofunction:: web_monitoring_diff.links_diff_json
51+
52+
.. autofunction:: web_monitoring_diff.links_diff_html
53+
54+
.. autofunction:: web_monitoring_diff.html_diff_render
55+
56+
.. automodule:: web_monitoring_diff.experimental
57+
58+
.. autofunction:: web_monitoring_diff.experimental.htmldiffer.diff
59+
60+
.. autofunction:: web_monitoring_diff.experimental.htmltreediff.diff
61+
62+
63+
Web Server
64+
----------
65+
66+
.. autofunction:: web_monitoring_diff.server.make_app
67+
68+
.. autofunction:: web_monitoring_diff.server.cli
69+
70+
71+
Exception Classes
72+
-----------------
73+
74+
.. autoclass:: web_monitoring_diff.exceptions.UndecodableContentError
75+
76+
.. autoclass:: web_monitoring_diff.exceptions.UndiffableContentError
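
The signature and return-value conventions described in `api-reference.rst` can be illustrated with a small, self-contained sketch. Everything here is hypothetical and is not the package's actual implementation: `simple_text_diff` is an invented diff built on the standard library's `difflib`, and `call_diff` only approximates the idea of inspecting argument names to decide what to pass.

```python
import difflib
import inspect


def simple_text_diff(a_text, b_text):
    """Hypothetical diff: compare two decoded strings word by word.

    Returns a dict with a "diff" key holding (change_type, text) tuples,
    where -1 is a removal, 0 is unchanged, and 1 is an addition.
    """
    a_words, b_words = a_text.split(), b_text.split()
    matcher = difflib.SequenceMatcher(None, a_words, b_words, autojunk=False)
    changes = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag == "equal":
            changes.append((0, " ".join(a_words[a1:a2])))
        else:
            # "replace", "delete", and "insert" reduce to removals/additions.
            if a2 > a1:
                changes.append((-1, " ".join(a_words[a1:a2])))
            if b2 > b1:
                changes.append((1, " ".join(b_words[b1:b2])))
    # An extra, diff-specific key, in the spirit of "change_count".
    count = sum(1 for change_type, _ in changes if change_type != 0)
    return {"diff": changes, "change_count": count}


def call_diff(diff_func, available):
    """Pass only the standardized arguments that diff_func names."""
    wanted = inspect.signature(diff_func).parameters
    kwargs = {name: value for name, value in available.items() if name in wanted}
    return diff_func(**kwargs)


# Illustrative inputs covering the standardized parameter names.
available = {
    "a_body": b"the quick fox", "b_body": b"the slow fox",
    "a_text": "the quick fox", "b_text": "the slow fox",
    "a_url": "https://example.com/a", "b_url": "https://example.com/b",
    "a_headers": {}, "b_headers": {},
}
result = call_diff(simple_text_diff, available)
```

Because `simple_text_diff` names only `a_text` and `b_text`, the dispatcher hands it the decoded strings and ignores the bytes, URLs, and headers. A diff that declared `a_body` and `b_body` would receive the raw bytes instead.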

docs/source/conf.py

+88
@@ -0,0 +1,88 @@
1+
# Configuration file for the Sphinx documentation builder.
2+
#
3+
# This file only contains a selection of the most common options. For a full
4+
# list see the documentation:
5+
# https://www.sphinx-doc.org/en/master/usage/configuration.html
6+
7+
# -- Path setup --------------------------------------------------------------
8+
9+
# If extensions (or modules to document with autodoc) are in another directory,
10+
# add these directories to sys.path here. If the directory is relative to the
11+
# documentation root, use os.path.abspath to make it absolute, like shown here.
12+
#
13+
# import os
14+
# import sys
15+
# sys.path.insert(0, os.path.abspath('.'))
16+
17+
18+
# -- Project information -----------------------------------------------------
19+
20+
project = 'web-monitoring-diff'
21+
copyright = '2017-2020, Environmental Data & Governance Initiative'
22+
author = 'Environmental Data & Governance Initiative'
23+
24+
# The version info for the project you're documenting, acts as replacement for
25+
# |version| and |release|, also used in various other places throughout the
26+
# built documents.
27+
#
28+
import web_monitoring_diff
29+
# The short X.Y version.
30+
version = web_monitoring_diff.__version__
31+
# The full version, including alpha/beta/rc tags.
32+
release = web_monitoring_diff.__version__
33+
34+
35+
# -- General configuration ---------------------------------------------------
36+
37+
# Add any Sphinx extension module names here, as strings. They can be
38+
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
39+
# ones.
40+
extensions = [
41+
'sphinx.ext.autodoc',
42+
'sphinx.ext.autosummary',
43+
'sphinx.ext.githubpages',
44+
'sphinx.ext.intersphinx',
45+
'sphinx.ext.mathjax',
46+
'sphinx.ext.viewcode',
47+
'IPython.sphinxext.ipython_directive',
48+
'IPython.sphinxext.ipython_console_highlighting',
49+
'numpydoc',
50+
'sphinx_copybutton',
51+
]
52+
53+
# Generate the API documentation when building
54+
autosummary_generate = True
55+
numpydoc_show_class_members = False
56+
autodoc_mock_imports = ['html5_parser', 'pycurl', 'sentry']
57+
58+
# Add any paths that contain templates here, relative to this directory.
59+
templates_path = ['_templates']
60+
61+
# List of patterns, relative to source directory, that match files and
62+
# directories to ignore when looking for source files.
63+
# This pattern also affects html_static_path and html_extra_path.
64+
exclude_patterns = []
65+
66+
# Example configuration for intersphinx: refer to the Python standard library.
67+
intersphinx_mapping = {
68+
'python': ('https://docs.python.org/3/', None),
69+
'numpy': ('https://numpy.org/doc/stable/', None),
70+
'scipy': ('https://docs.scipy.org/doc/scipy/reference/', None),
71+
'pandas': ('https://pandas.pydata.org/pandas-docs/stable', None),
72+
'matplotlib': ('https://matplotlib.org', None),
73+
}
74+
75+
76+
# -- Options for HTML output -------------------------------------------------
77+
78+
# The theme to use for HTML and HTML Help pages. See the documentation for
79+
# a list of builtin themes.
80+
#
81+
html_theme = 'sphinx_rtd_theme'
82+
# import sphinx_rtd_theme
83+
# html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
84+
85+
# Add any paths that contain custom static files (such as style sheets) here,
86+
# relative to this directory. They are copied after the builtin static files,
87+
# so a file named "default.css" will overwrite the builtin "default.css".
88+
html_static_path = ['_static']
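
The comments in `conf.py` describe `version` as the short X.Y form and `release` as the full version, yet both are assigned `web_monitoring_diff.__version__`. If a distinct short form were wanted, it could be derived along these lines. This is only a sketch: `short_version` is a hypothetical helper, and it assumes the version string starts with dot-separated integers, as in PEP 440.

```python
import re


def short_version(full_version):
    """Return the leading X.Y of a version string, e.g. '0.1.0rc1' -> '0.1'.

    Falls back to the input unchanged if it does not start with a number.
    """
    match = re.match(r"\d+(?:\.\d+)?", full_version)
    return match.group(0) if match else full_version
```

In `conf.py` this would be used as `version = short_version(web_monitoring_diff.__version__)`, leaving `release` as the full string.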

docs/source/index.rst

+14
@@ -0,0 +1,14 @@
1+
web-monitoring-diff
2+
===================
3+
4+
*Web-monitoring-diff* is a suite of functions that *diff* (find the differences between) types of content commonly found on the web, such as HTML, text files, etc. in a variety of ways. It also includes an optional web server that generates diffs as an HTTP service.
5+
6+
This package was originally built as a component of EDGI’s `Web Monitoring Project <https://github.com/edgi-govdata-archiving/web-monitoring>`_, but is also used by other organizations and tools.
7+
8+
.. toctree::
9+
:maxdepth: 2
10+
11+
installation
12+
usage
13+
api-reference
14+
release-history

docs/source/installation.rst

+66
@@ -0,0 +1,66 @@
1+
============
2+
Installation
3+
============
4+
5+
*web-monitoring-diff* requires **Python 3.7 or newer**. Before anything else, make sure you’re using a supported version of Python. If you need to support different local versions of Python on your computer, we recommend using `pyenv`_ or `Conda`_.
6+
7+
1. **System-level dependencies:** web-monitoring-diff depends on several system-level, non-Python libraries that you may need to install first. Specifically, you’ll need: ``libxml2``, ``libxslt``, ``openssl``, and ``libcurl``.
8+
9+
**On MacOS,** we recommend installing these with `Homebrew`_:
10+
11+
.. code-block:: bash
12+
13+
brew install libxml2
14+
brew install libxslt
15+
brew install openssl
16+
# libcurl is built-in, so you generally don't need to install it
17+
18+
**On Debian Linux,** use ``apt``:
19+
20+
.. code-block:: bash
21+
22+
apt-get install libxml2-dev libxslt-dev libssl-dev openssl libcurl4-openssl-dev
23+
24+
**Other systems** may have different package managers or names for the packages, so you may need to look them up.
25+
26+
2. **Install this package** with *pip*. Be sure to include the ``--no-binary lxml`` option:
27+
28+
.. code-block:: bash
29+
30+
pip install web-monitoring-diff --no-binary lxml
31+
32+
Or, to also install the web server for generating diffs on demand, install the ``server`` extras:
33+
34+
.. code-block:: bash
35+
36+
pip install web-monitoring-diff[server] --no-binary lxml
37+
38+
The ``--no-binary`` flag ensures that pip downloads and builds a fresh copy of ``lxml`` (one of web-monitoring-diff’s dependencies) rather than using a pre-built version. It’s slower to install, but is required for all the dependencies to work correctly together. **If you publish a package that depends on web-monitoring-diff, your package will need to be installed with this flag, too.**
39+
40+
**On MacOS,** you may need additional configuration to get ``pycurl`` to use the Homebrew openssl. Try the following:
41+
42+
.. code-block:: bash
43+
44+
PYCURL_SSL_LIBRARY=openssl \
45+
LDFLAGS="-L/usr/local/opt/openssl/lib" \
46+
CPPFLAGS="-I/usr/local/opt/openssl/include" \
47+
pip install web-monitoring-diff --no-binary lxml --no-cache-dir
48+
49+
The ``--no-cache-dir`` flag tells *pip* to re-build the dependencies instead of using versions it’s built already. If you tried to install once before but had problems with ``pycurl``, this will make sure pip actually builds it again instead of re-using the version it built last time around.
50+
51+
**For local development,** clone the git repository and then make sure to do an editable installation instead.
52+
53+
.. code-block:: bash
54+
55+
pip install -e .[server,dev] --no-binary lxml
56+
57+
3. **(Optional) Install experimental diffs.** Some additional types of diffs are considered “experimental” — they may be new and still have lots of edge cases, may not be publicly available via PyPI or another package server, or may have any number of other issues. To install them, run:
58+
59+
.. code-block:: bash
60+
61+
pip install -r requirements-experimental.txt
62+
63+
64+
.. _pyenv: https://github.com/pyenv/pyenv
65+
.. _conda: https://docs.conda.io/en/latest/
66+
.. _Homebrew: https://brew.sh/
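
Before running the install steps in `installation.rst`, it can help to confirm the prerequisites it lists. The following is a rough sketch, not part of the package: `check_environment` is a hypothetical helper that checks the Python version and asks the standard library's `ctypes.util.find_library` to locate the four system libraries. A `None` result does not prove a library is missing, since `find_library` may not search non-default locations such as a Homebrew prefix.

```python
import sys
from ctypes.util import find_library

# Short names for libxml2, libxslt, openssl (libssl), and libcurl.
REQUIRED_LIBRARIES = ("xml2", "xslt", "ssl", "curl")


def check_environment(min_python=(3, 7)):
    """Report whether Python is new enough and where each library was found.

    Library entries hold whatever find_library returns: a name or path
    string, or None if the library could not be located.
    """
    report = {"python_ok": sys.version_info >= min_python}
    for name in REQUIRED_LIBRARIES:
        report[name] = find_library(name)
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```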

docs/source/release-history.rst

+8
@@ -0,0 +1,8 @@
1+
===============
2+
Release History
3+
===============
4+
5+
In Development: v0.1.0
6+
----------------------
7+
8+
This project used to be a part of `web-monitoring-processing <https://github.com/edgi-govdata-archiving/web-monitoring-processing/>`_, which contains a wide variety of libraries, scripts, and other tools for working with data across all the various parts of EDGI’s Web Monitoring project. The goal of this initial release is to create a new, more focused package containing the diff-related tools so they can be more easily used by others.
