Showing 13 changed files with 284 additions and 16 deletions.
@@ -0,0 +1 @@
```
*
```
@@ -0,0 +1,9 @@
```dockerfile
FROM node:14.18-alpine3.14

USER node

WORKDIR /home/node/website

EXPOSE 3000

ENTRYPOINT ["/bin/sh"]
```
@@ -0,0 +1,46 @@
---
sidebar_label: basic_tagger
title: basic_tagger
---

#### load\_lexicon

```python
def load_lexicon(lexicon_path: Path, has_headers: bool = True, include_pos: bool = True) -> Dict[str, List[str]]
```

**Arguments**:

- `lexicon_path`: File path to the lexicon data. This data should be in TSV format with the following data in this column / field order: 1. lemma, 2. Part Of Speech (POS) label / tag, 3. USAS / Semantic label.
- `has_headers`: This should be set to True if the lexicon file on its first line contains a header row, i.e. the first line contains no lexicon data. When this is set to True the first line of the lexicon file is ignored.
- `include_pos`: Whether or not the returned dictionary uses POS within its key.

**Returns**:

A dictionary whereby the key is the lemma, optionally combined with the POS, and the value is the associated list of semantic tags.
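A quick illustration of the call shape documented above (a sketch, not part of this commit; the file name is hypothetical and the key format is an assumption based on the `{lemma}|{pos}` convention used by `LexiconCollection` later in this diff):

```python
from pathlib import Path

# Hypothetical lexicon file laid out as: lemma <TAB> POS <TAB> USAS tag.
lexicon_file = Path('semantic_lexicon.tsv')
lexicon = load_lexicon(lexicon_file, has_headers=True, include_pos=True)

# With include_pos=True the key presumably combines lemma and POS.
semantic_tags = lexicon.get('London|noun', [])
```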
## RuleBasedTagger Objects

```python
class RuleBasedTagger()
```

#### tag\_data

```python
def tag_data(tokens: List[Tuple[str, str, str]]) -> List[List[str]]
```

**Arguments**:

- `tokens`: Each tuple represents a token. The tuple must contain the following linguistic information per token: 1. token text, 2. lemma, 3. Part Of Speech.
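A minimal sketch of the input and output shapes (the constructor argument is a guess, as this diff does not show `RuleBasedTagger.__init__`; the tokens are illustrative):

```python
# Assumption: the tagger is built from the lexicon dictionary loaded above.
tagger = RuleBasedTagger(lexicon)

# Each tuple is (token text, lemma, POS).
tokens = [('Cars', 'car', 'noun'), ('are', 'be', 'verb'), ('fast', 'fast', 'adj')]
tags_per_token = tagger.tag_data(tokens)

# One list of candidate semantic tags per input token, in rank order.
assert len(tags_per_token) == len(tokens)
```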
@@ -0,0 +1,23 @@
---
sidebar_label: file_utils
title: file_utils
---

#### download\_url\_file

```python
def download_url_file(url: str) -> str
```

Reference AllenNLP:
https://github.com/allenai/allennlp/blob/e5d332a592a8624e1f4ee7a9a7d30a90991db83c/allennlp/common/file_utils.py#L536

This function will first check if the downloaded content already exists as a cached file within the `config.PYMUSAS_CACHE_HOME` directory. If it does, the cached file path is returned; otherwise the content is downloaded and cached.

**Returns**:

A path to the content downloaded from the `url`.
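A sketch of the caching behaviour described above (the URL is illustrative, not a real resource):

```python
url = 'https://example.org/semantic_lexicon.tsv'  # hypothetical resource

first_path = download_url_file(url)   # downloads and caches the content
second_path = download_url_file(url)  # cache hit: returns the same path

assert first_path == second_path
```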
@@ -0,0 +1,124 @@
---
sidebar_label: lexicon_collection
title: lexicon_collection
---

## LexiconEntry Objects

```python
@dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=True)
class LexiconEntry()
```

As `frozen` is True, no values can be assigned after creation of an instance of this class.
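A short sketch of the immutability this gives (the `lemma` field name is an assumption inferred from the positional constructor shown in the collection example below):

```python
import dataclasses

entry = LexiconEntry('London', ['Z3', 'Z1', 'A1'], 'noun')
try:
    entry.lemma = 'Paris'  # assumed field name; frozen=True forbids assignment
except dataclasses.FrozenInstanceError:
    pass  # expected: frozen dataclasses raise on attribute assignment
```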
## LexiconCollection Objects

```python
class LexiconCollection(MutableMapping)
```

This is a dictionary object that will hold `LexiconEntry` data in a fast-to-access object. The keys of the dictionary are expected to be either just a lemma or a combination of lemma and POS in the following format: `{lemma}|{pos}`

The value of each key is the associated list of semantic tags, whereby the semantic tags are in rank order: the most likely tag is the first tag in the list. For example, in the collection below, for the lemma London with the POS tag noun, the most likely semantic tag is Z3 and the least likely tag is A1:

```python
from pymusas.lexicon_collection import LexiconEntry, LexiconCollection
lexicon_entry = LexiconEntry('London', ['Z3', 'Z1', 'A1'], 'noun')
collection = LexiconCollection()
collection.add_lexicon_entry(lexicon_entry)
most_likely_tag = collection['London|noun'][0]
least_likely_tag = collection['London|noun'][-1]
```
#### \_\_str\_\_

```python
def __str__() -> str
```

Human readable string.

#### \_\_repr\_\_

```python
def __repr__() -> str
```

Machine readable string. Running eval() over the printed string should recreate the object.
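A sketch of the round trip this promises (comparing reprs rather than objects, since an `__eq__` over collections is not shown in this diff):

```python
collection = LexiconCollection()
collection.add_lexicon_entry(LexiconEntry('London', ['Z3', 'Z1', 'A1'], 'noun'))

# eval over the machine readable repr should rebuild an equivalent collection.
recreated = eval(repr(collection))
assert repr(recreated) == repr(collection)
```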
#### add\_lexicon\_entry

```python
def add_lexicon_entry(value: LexiconEntry, include_pos: bool = True) -> None
```

Will add the `LexiconEntry` to the collection, whereby the key is the combination of the lemma and POS and the value is the list of semantic tags.

The lemma and POS are combined as follows: `{lemma}|{pos}`

If the POS value is None then only the lemma is used, e.g.: `{lemma}`

**Arguments**:

- `value`: A `LexiconEntry`.
- `include_pos`: Whether to include the POS tag within the key.
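A sketch of the two key formats, building on the collection example above:

```python
collection = LexiconCollection()
entry = LexiconEntry('London', ['Z3', 'Z1', 'A1'], 'noun')

collection.add_lexicon_entry(entry)                     # key: 'London|noun'
collection.add_lexicon_entry(entry, include_pos=False)  # key: 'London'

assert collection['London|noun'] == collection['London']
```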
#### to\_dictionary

```python
def to_dictionary() -> Dict[str, List[str]]
```

**Returns**:

The dictionary object that stores all of the data.
#### from\_tsv

```python
@staticmethod
def from_tsv(tsv_file_path: Union[PathLike, str], include_pos: bool = True) -> Dict[str, List[str]]
```

If `include_pos` is True and the TSV file does not contain a `pos` field heading then this will return a `LexiconCollection` that is identical to a collection that ran this method with `include_pos` equal to False.

If the file path is a URL, the file will be downloaded and cached using the `file_utils.download_url_file` function.

Reference: the identification of a URL, and the idea to do this, comes from the AllenNLP library:
https://github.com/allenai/allennlp/blob/main/allennlp/common/file_utils.py#L205

**Arguments**:

- `tsv_file_path`: A path or URL to a TSV file that contains at least two fields with the following headings: 1. `lemma`, and 2. `semantic_tags`, with an optional field `pos`. All other fields will be ignored. Each row will be used to create a `LexiconEntry` which will then be added to the returned `LexiconCollection`.
- `include_pos`: Whether to include the POS tag in the key when adding the `LexiconEntry` into the returned `LexiconCollection`. For more information on this, see the `add_lexicon_entry` method.

**Returns**:

A dictionary object that can be used to create a `LexiconCollection`.
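A sketch of the usage described above (the file path is illustrative):

```python
# A TSV file with `lemma`, `semantic_tags` and, optionally, `pos` headings.
lexicon_dict = LexiconCollection.from_tsv('semantic_lexicon.tsv', include_pos=True)

# The returned dictionary can be queried directly, or, presumably, used to
# initialise a LexiconCollection (the constructor is not shown in this diff).
tags = lexicon_dict['London|noun']  # ranked semantic tags, most likely first
```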
@@ -0,0 +1,9 @@
```json
{
  "items": [
    "API/basic_tagger",
    "API/file_utils",
    "API/lexicon_collection"
  ],
  "label": "Reference",
  "type": "category"
}
```
@@ -0,0 +1,28 @@
```makefile
.ONESHELL:
SHELL := /bin/bash

WORKING_DIR = /home/node/website
CONTAINER_NAME = pymusas-docs:latest

create-docs: build-docker-docs
	@docker run -it --name docusaurus ${CONTAINER_NAME} -c "npm init docusaurus@latest docs classic"
	@docker cp docusaurus:${WORKING_DIR}/docs ${PWD}/docs
	@docker rm -f docusaurus

develop-docs: build-docker-docs
	@docker run -p 127.0.0.1:3000:3000 --rm -it -v ${PWD}/docs:${WORKING_DIR} ${CONTAINER_NAME} -c "yarn start -h 0.0.0.0"

build-docs: build-docker-docs
	@docker run --rm -v ${PWD}/docs:${WORKING_DIR} ${CONTAINER_NAME} -c "yarn build"

serve-built-docs: build-docs
	@docker run -p 127.0.0.1:3000:3000 --rm -it -v ${PWD}/docs:${WORKING_DIR} ${CONTAINER_NAME} -c "yarn serve -h 0.0.0.0"

install-package-for-docs: build-docker-docs
	@docker run --rm -v ${PWD}/docs:${WORKING_DIR} ${CONTAINER_NAME} -c "yarn install"

interactive: build-docker-docs
	@docker run -it --rm --name docusaurus -v ${PWD}/docs:${WORKING_DIR} ${CONTAINER_NAME}

build-docker-docs:
	@docker build -t ${CONTAINER_NAME} -f Docs_Docker.dockerfile .
```
@@ -0,0 +1,14 @@
```yaml
loaders:
  - type: python
    search_path: [./pymusas]
processors:
  - type: filter
    skip_empty_modules: true
  - type: smart
  - type: crossref
renderer:
  type: docusaurus
  docs_base_path: docs/docs
  relative_output_path: API
  relative_sidebar_path: sidebar.json
  sidebar_top_level_label: 'Reference'
```