Start of documentation #7
apmoore1 committed Oct 12, 2021
1 parent 6414c57 commit edc7ff1
Showing 13 changed files with 284 additions and 16 deletions.
1 change: 1 addition & 0 deletions .dockerignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
*
9 changes: 9 additions & 0 deletions Docs_Docker.dockerfile
@@ -0,0 +1,9 @@
FROM node:14.18-alpine3.14

USER node

WORKDIR /home/node/website

EXPOSE 3000

ENTRYPOINT ["/bin/sh"]
17 changes: 16 additions & 1 deletion README.md
@@ -161,4 +161,19 @@ The following table shows the 21 labels at the top level of the hierarchy.
<td><strong>Z</strong></br>names and grammar</td>
</tr>
</tbody>
</table>
</table>


## Documentation (under development)

The documentation is built with [Docusaurus v2](https://docusaurus.io/), a static site generator based on the [Jamstack](https://jamstack.org/): pages are generated from markup and can be enhanced with JavaScript, e.g. React components.

### Commands

By default all webpages are hosted locally at: [http://localhost:3000/pymusas/](http://localhost:3000/pymusas/)

* To create the documentation from scratch (this should never be needed, but just in case it is): `make create-docs`
* To run the docs locally in development mode: `make develop-docs`
* To build the static documentation files and serve them locally: `make serve-built-docs`

To automatically generate the API documentation: `pydoc-markdown`
3 changes: 2 additions & 1 deletion dev_requirements.txt
@@ -3,4 +3,5 @@ pytest-cov
responses
mypy==0.910
types-requests
flake8>=3.8.0,<3.10.0
flake8>=3.8.0,<3.10.0
pydoc-markdown>=4.0.0,<5.0.0
46 changes: 46 additions & 0 deletions docs/docs/API/basic_tagger.md
@@ -0,0 +1,46 @@
---
sidebar_label: basic_tagger
title: basic_tagger
---

#### load\_lexicon

```python
def load_lexicon(lexicon_path: Path, has_headers: bool = True, include_pos: bool = True) -> Dict[str, List[str]]
```

**Arguments**:

- `lexicon_path`: File path to the lexicon data. This data should be in
  TSV format with the following data in this column / field
  order: 1. lemma, 2. Part Of Speech (POS) label / tag,
  3. USAS / Semantic label.
- `has_headers`: This should be set to True if the first line of the
  lexicon file contains a header row, e.g. the first line
  contains no lexicon data. When this is set to True the
  first line of the lexicon file is ignored.
- `include_pos`: Whether or not the returned dictionary uses POS
  within its key.

**Returns**:

A dictionary whereby the key is a tuple of

## RuleBasedTagger Objects

```python
class RuleBasedTagger()
```

#### tag\_data

```python
def tag_data(tokens: List[Tuple[str, str, str]]) -> List[List[str]]
```

**Arguments**:

- `tokens`: Each tuple represents a token. The tuple must contain the
  following linguistic information per token: 1. token text,
  2. lemma, 3. Part Of Speech.

23 changes: 23 additions & 0 deletions docs/docs/API/file_utils.md
@@ -0,0 +1,23 @@
---
sidebar_label: file_utils
title: file_utils
---

#### download\_url\_file

```python
def download_url_file(url: str) -> str
```

Reference AllenNLP:
https://github.com/allenai/allennlp/blob/e5d332a592a8624e1f4ee7a9a7d30a90991db83c/allennlp/common/file_utils.py#L536

This function will first check if the downloaded content already exists
based on a cached file within the `config.PYMUSAS_CACHE_HOME` directory.
If it does then the cached file path will be returned, else the content
will be downloaded and cached.

**Returns**:

A path to the contents downloaded from the `url`.
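
The check-then-download behaviour described above can be sketched as follows. This is a hedged illustration, not the PyMUSAS implementation: the function name `cached_download`, the SHA-256 file naming scheme, the `cache_dir` parameter (standing in for `config.PYMUSAS_CACHE_HOME`), and the injectable `fetch` callable are all assumptions made so the example runs without network access.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_download(url: str, cache_dir: Path,
                    fetch: Callable[[str], bytes]) -> str:
    """Return a path to the content of `url`, fetching it at most once."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the cached file is named after a hash of the URL.
    file_name = hashlib.sha256(url.encode('utf-8')).hexdigest()
    cached_path = cache_dir / file_name
    if not cached_path.exists():
        # Cache miss: fetch the content and store it for later calls.
        cached_path.write_bytes(fetch(url))
    return str(cached_path)


# Usage with a stand-in fetcher that records how often it is called.
calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b'lemma\tsemantic_tags\n'


with tempfile.TemporaryDirectory() as temp_dir:
    url = 'https://example.com/lexicon.tsv'
    first = cached_download(url, Path(temp_dir), fake_fetch)
    second = cached_download(url, Path(temp_dir), fake_fetch)
    same_path = first == second
    fetch_count = len(calls)
```

The second call finds the cached file and returns its path without invoking the fetcher again.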

124 changes: 124 additions & 0 deletions docs/docs/API/lexicon_collection.md
@@ -0,0 +1,124 @@
---
sidebar_label: lexicon_collection
title: lexicon_collection
---

## LexiconEntry Objects

```python
@dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=True)
class LexiconEntry()
```

As `frozen` is true, no values can be assigned after creation of an instance of
this class.
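
A minimal sketch of what `frozen=True` means in practice, using a hypothetical `ExampleEntry` class. Its field names and order are assumptions inferred from the `LexiconEntry('London', [...], 'noun')` call shown on this page; they are not taken from the PyMUSAS source.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import List, Optional


@dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=True)
class ExampleEntry:
    # Hypothetical fields, for illustration only.
    lemma: str
    semantic_tags: List[str]
    pos: Optional[str] = None


entry = ExampleEntry('London', ['Z3', 'Z1', 'A1'], 'noun')
try:
    # Assigning to any field of a frozen dataclass instance raises an error.
    entry.lemma = 'Paris'  # type: ignore[misc]
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True
```

Note that `frozen=True` only blocks attribute assignment; a mutable field such as a list can still be changed in place.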

## LexiconCollection Objects

```python
class LexiconCollection(MutableMapping)
```

This is a dictionary object that will hold LexiconEntry data in a fast to
access object. The keys of the dictionary are expected to be either just a
lemma or a combination of lemma and pos in the following format:
{lemma}|{pos}

The value to each key is the associated semantic tags, whereby the semantic
tags are in rank order, the most likely tag is the first tag in the list.
For example in the collection below, for the lemma London with a POS tag noun
the most likely semantic tag is Z3 and the least likely tag is A1:

```
from pymusas.lexicon_collection import LexiconEntry, LexiconCollection
lexicon_entry = LexiconEntry('London', ['Z3', 'Z1', 'A1'], 'noun')
collection = LexiconCollection()
collection.add_lexicon_entry(lexicon_entry)
most_likely_tag = collection['London|noun'][0]
least_likely_tag = collection['London|noun'][-1]
```

#### \_\_str\_\_

```python
def __str__() -> str
```

Human readable string.

#### \_\_repr\_\_

```python
def __repr__() -> str
```

Machine readable string. If you print this string and run `eval()` over it,
you should be able to recreate the object.
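
The `eval()` round trip can be illustrated with a small stand-in class. `TagStore` is hypothetical and is not part of PyMUSAS; it only shows the general convention of returning a constructor call from `__repr__`.

```python
from typing import Dict, List, Optional


class TagStore:
    """Hypothetical stand-in class used only to illustrate the convention."""

    def __init__(self, data: Optional[Dict[str, List[str]]] = None) -> None:
        self.data: Dict[str, List[str]] = dict(data) if data else {}

    def __repr__(self) -> str:
        # The string is a valid constructor call for this class.
        return f'{type(self).__name__}(data={self.data!r})'

    def __eq__(self, other: object) -> bool:
        return isinstance(other, TagStore) and self.data == other.data


store = TagStore({'London|noun': ['Z3', 'Z1', 'A1']})
# Evaluating the repr string builds a new, equal object.
recreated = eval(repr(store))
```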

#### add\_lexicon\_entry

```python
def add_lexicon_entry(value: LexiconEntry, include_pos: bool = True) -> None
```

Will add the LexiconEntry to the collection, whereby the key is the
combination of the lemma and pos and the value is the semantic tags.

The lemma and pos are combined as follows:
{lemma}|{pos}

If the pos value is None then only the lemma is used, e.g.:
{lemma}

**Arguments**:

- `value`: A LexiconEntry.
- `include_pos`: Whether to include the POS tag within the key.
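
The key format described above can be sketched as a small helper. `build_key` is a hypothetical name used for illustration; the source does not say how `add_lexicon_entry` builds the key internally.

```python
from typing import Optional


def build_key(lemma: str, pos: Optional[str], include_pos: bool = True) -> str:
    """Combine lemma and POS as `{lemma}|{pos}`, or fall back to `{lemma}`."""
    if include_pos and pos is not None:
        return f'{lemma}|{pos}'
    # Either the POS tag is missing or the caller asked to leave it out.
    return lemma
```

For instance, `build_key('London', 'noun')` gives `'London|noun'`, while `build_key('London', None)` gives `'London'`.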

#### to\_dictionary

```python
def to_dictionary() -> Dict[str, List[str]]
```

**Returns**:

The dictionary object that stores all of the data.

#### from\_tsv

```python
@staticmethod
def from_tsv(tsv_file_path: Union[PathLike, str], include_pos: bool = True) -> Dict[str, List[str]]
```

If `include_pos` is True and the TSV file does not contain a
`pos` field heading then this will return a LexiconCollection that is
identical to a collection that ran this method with `include_pos` equal
to False.

If the file path is a URL, the file will be downloaded and cached using
the `file_utils.download_url_file` function.

Reference: the identification of a URL, and the idea to do this, come
from the AllenNLP library:
https://github.com/allenai/allennlp/blob/main/allennlp/common/file_utils.py#L205

**Arguments**:

- `tsv_file_path`: A path or URL to a TSV file that contains at least two
  fields with the following headings: 1. `lemma`,
  and 2. `semantic_tags`. With an optional field
  `pos`. All other fields will be ignored.
  Each row will be used to create a `LexiconEntry`
  which will then be added to the returned
  `LexiconCollection`
- `include_pos`: Whether to include the POS tag in the key when
  adding the `LexiconEntry` into the returned
  `LexiconCollection`. For more information on this
  see the `add_lexicon_entry` method.

**Returns**:

A dictionary object that can be used to create a
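
A hedged sketch of reading a TSV file with the headings named above. The function name `read_lexicon_tsv` is hypothetical, and two details are assumptions not confirmed by the source: that the tags in the `semantic_tags` field are separated by whitespace, and that a later row with the same key replaces an earlier one.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def read_lexicon_tsv(tsv_file_path: Path,
                     include_pos: bool = True) -> Dict[str, List[str]]:
    collection: Dict[str, List[str]] = {}
    with tsv_file_path.open('r', newline='', encoding='utf-8') as tsv_file:
        reader = csv.DictReader(tsv_file, delimiter='\t')
        for row in reader:
            lemma = row['lemma']
            # Assumption: tags within the field are whitespace separated.
            semantic_tags = row['semantic_tags'].split()
            # `pos` is None when the file has no `pos` heading.
            pos = row.get('pos')
            key = f'{lemma}|{pos}' if include_pos and pos else lemma
            collection[key] = semantic_tags
    return collection


with tempfile.TemporaryDirectory() as temp_dir:
    tsv_path = Path(temp_dir, 'lexicon.tsv')
    tsv_path.write_text('lemma\tsemantic_tags\tpos\nLondon\tZ3 Z1 A1\tnoun\n',
                        encoding='utf-8')
    with_pos = read_lexicon_tsv(tsv_path, include_pos=True)
    without_pos = read_lexicon_tsv(tsv_path, include_pos=False)
```

When the file has no `pos` heading, the sketch falls back to lemma-only keys, which matches the documented behaviour for `include_pos` equal to True on such files.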

9 changes: 9 additions & 0 deletions docs/docs/API/sidebar.json
@@ -0,0 +1,9 @@
{
"items": [
"API/basic_tagger",
"API/file_utils",
"API/lexicon_collection"
],
"label": "Reference",
"type": "category"
}
20 changes: 6 additions & 14 deletions docs/docusaurus.config.js
@@ -44,7 +44,7 @@ const config = {
/** @type {import('@docusaurus/preset-classic').ThemeConfig} */
({
navbar: {
title: 'My Site',
title: 'PyMUSAS',
logo: {
alt: 'My Site Logo',
src: 'img/logo.svg',
@@ -54,11 +54,11 @@ const config = {
type: 'doc',
docId: 'intro',
position: 'left',
label: 'Tutorial',
label: 'Documentation',
},
{to: '/blog', label: 'Blog', position: 'left'},
{
href: 'https://github.com/facebook/docusaurus',
href: 'https://github.com/ucrel/pymusas',
label: 'GitHub',
position: 'right',
},
@@ -71,22 +71,14 @@ const config = {
title: 'Docs',
items: [
{
label: 'Tutorial',
label: 'Documentation',
to: '/docs/intro',
},
],
},
{
title: 'Community',
items: [
{
label: 'Stack Overflow',
href: 'https://stackoverflow.com/questions/tagged/docusaurus',
},
{
label: 'Discord',
href: 'https://discordapp.com/invite/docusaurus',
},
{
label: 'Twitter',
href: 'https://twitter.com/docusaurus',
@@ -102,12 +94,12 @@ const config = {
},
{
label: 'GitHub',
href: 'https://github.com/facebook/docusaurus',
href: 'https://github.com/ucrel/pymusas',
},
],
},
],
copyright: `Copyright © ${new Date().getFullYear()} My Project, Inc. Built with Docusaurus.`,
copyright: `Copyright © ${new Date().getFullYear()} UCREL. Built with Docusaurus.`,
},
prism: {
theme: lightCodeTheme,
5 changes: 5 additions & 0 deletions docs/sidebars.js
@@ -12,6 +12,11 @@
module.exports = {
// By default, Docusaurus generates a sidebar from the docs folder structure
tutorialSidebar: [{type: 'autogenerated', dirName: '.'}],
apiSidebar: {
"API Documentation": [
require("./docs/API/sidebar.json")
],
},

// But you can create a sidebar manually
/*
28 changes: 28 additions & 0 deletions makefile
@@ -0,0 +1,28 @@
.ONESHELL:
SHELL := /bin/bash

WORKING_DIR = /home/node/website
CONTAINER_NAME = pymusas-docs:latest

create-docs: build-docker-docs
	@docker run -it --name docusaurus ${CONTAINER_NAME} -c "npm init docusaurus@latest docs classic"
	@docker cp docusaurus:${WORKING_DIR}/docs ${PWD}/docs
	@docker rm -f docusaurus

develop-docs: build-docker-docs
	@docker run -p 127.0.0.1:3000:3000 --rm -it -v ${PWD}/docs:${WORKING_DIR} ${CONTAINER_NAME} -c "yarn start -h 0.0.0.0"

build-docs: build-docker-docs
	@docker run --rm -v ${PWD}/docs:${WORKING_DIR} ${CONTAINER_NAME} -c "yarn build"

serve-built-docs: build-docs
	@docker run -p 127.0.0.1:3000:3000 --rm -it -v ${PWD}/docs:${WORKING_DIR} ${CONTAINER_NAME} -c "yarn serve -h 0.0.0.0"

install-package-for-docs: build-docker-docs
	@docker run --rm -v ${PWD}/docs:${WORKING_DIR} ${CONTAINER_NAME} -c "yarn install"

interactive: build-docker-docs
	@docker run -it --rm --name docusaurus -v ${PWD}/docs:${WORKING_DIR} ${CONTAINER_NAME}

build-docker-docs:
	@docker build -t ${CONTAINER_NAME} -f Docs_Docker.dockerfile .
14 changes: 14 additions & 0 deletions pydoc-markdown.yml
@@ -0,0 +1,14 @@
loaders:
  - type: python
    search_path: [./pymusas]
processors:
  - type: filter
    skip_empty_modules: true
  - type: smart
  - type: crossref
renderer:
  type: docusaurus
  docs_base_path: docs/docs
  relative_output_path: API
  relative_sidebar_path: sidebar.json
  sidebar_top_level_label: 'Reference'
1 change: 1 addition & 0 deletions setup.cfg
@@ -38,6 +38,7 @@ tests =
mypy==0.910
types-requests
flake8>=3.8.0,<3.10.0
pydoc-markdown>=4.0.0,<5.0.0

[flake8]
ignore = E266, E501, W503, W293