Revise documentation
hankcs committed Jan 27, 2022
1 parent 006c323 commit ca76dc6
Showing 4 changed files with 72 additions and 3 deletions.
9 changes: 8 additions & 1 deletion docs/api/hanlp/pretrained/amr.md
@@ -17,6 +17,13 @@ AMR captures “who is doing what to whom” in a sentence. Each sentence is rep

To parse a raw sentence into AMR:

```{eval-rst}
.. margin:: Batching is Faster
.. Hint:: Parse multiple sentences at once for faster processing!
```


```{code-cell} ipython3
:tags: [output_scroll]
import hanlp
@@ -26,7 +33,7 @@ amr = amr_parser('The boy wants the girl to believe him.')
print(amr)
```
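The margin note added in this commit recommends batching. Below is a minimal sketch of what that could look like, assuming `amr_parser` is the parser loaded in the lines collapsed by this diff and that it treats a list of raw sentences as one batch (the exact model constant is not shown here):

```python
# Hypothetical batching sketch: `amr_parser` is assumed to be the AMR parser
# loaded above; a list of raw sentences is assumed to be parsed as one batch.
graphs = amr_parser([
    'The boy wants the girl to believe him.',
    'The girl believes the boy.',
])
for graph in graphs:
    print(graph)  # each item should print in the usual bracketed AMR notation
```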

A list of pre-trained parsers and their scores are listed below.
All the pre-trained parsers and their scores are listed below.

```{eval-rst}
54 changes: 54 additions & 0 deletions docs/api/hanlp/pretrained/constituency.md
@@ -1,5 +1,59 @@
---
jupytext:
formats: ipynb,md:myst
text_representation:
extension: .md
format_name: myst
format_version: '0.8'
jupytext_version: 1.4.2
kernelspec:
display_name: Python 3
language: python
name: python3
---

# constituency

Constituency Parsing is the process of analyzing a sentence by breaking it down into sub-phrases, also known as constituents.

To parse a tokenized sentence into a constituency tree, first load a parser:

```{eval-rst}
.. margin:: Batching is Faster
.. Hint:: To speed things up, parse multiple sentences at once and use a GPU.
```

```{code-cell} ipython3
:tags: [output_scroll]
import hanlp
con = hanlp.load(hanlp.pretrained.constituency.CTB9_FULL_TAG_ELECTRA_SMALL)
```

Then pass one or more sequences of tokens to it.

```{code-cell} ipython3
:tags: [output_scroll]
tree = con(["2021年", "HanLPv2.1", "带来", "最", "先进", "的", "多", "语种", "NLP", "技术", "。"])
```
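As the hint above notes, batching is faster. A minimal sketch, assuming `con` treats a list of token sequences as one batch and returns one tree per sequence (the second sentence below is only an illustration):

```python
# Hypothetical batching sketch: a list of token sequences is assumed to be
# parsed as a single batch, yielding one constituency tree per sequence.
trees = con([
    ["2021年", "HanLPv2.1", "带来", "最", "先进", "的", "多", "语种", "NLP", "技术", "。"],
    ["我", "爱", "自然", "语言", "处理", "。"],
])
for t in trees:
    print(t)  # bracketed form of each tree
```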

The constituency tree is a nested list of constituencies:

```{code-cell} ipython3
:tags: [output_scroll]
tree
```
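Since the tree is described below as a nested list, ordinary list operations can be used to walk into it. A small sketch under that assumption (node-level accessors may differ between versions):

```python
# Sketch: drill into the nested-list tree with plain indexing.
root = tree[0]      # the top-level constituent of the sentence
print(len(root))    # number of immediate sub-constituents
print(root[0])      # the first sub-constituent, in bracketed form
```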

You can call `str` on it or `print` it to get its bracketed form:

```{code-cell} ipython3
:tags: [output_scroll]
print(tree)
```

All the pre-trained parsers and their scores are listed below.

```{eval-rst}
.. automodule:: hanlp.pretrained.constituency
10 changes: 9 additions & 1 deletion docs/tutorial.md
@@ -102,6 +102,8 @@ doc.pretty_print()

## Native API

### Multi-Task Learning

If you want to run our models locally or implement your own RESTful server,
you can [install the native API](https://hanlp.hankcs.com/docs/install.html#install-native-package)
and call it just like the RESTful one.
@@ -123,4 +125,10 @@ print(HanLP(['In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP tech
```

Because the service provider is very likely running a different model or using different settings, the
RESTful and native results might be slightly different.

To process Chinese or Japanese, HanLP provides mono-lingual models for each language, which significantly outperform the multi-lingual model. See the [docs](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/mtl.html) for the list of models.
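For illustration, loading one of those mono-lingual models could look like the sketch below; the constant name is an assumption taken from `hanlp.pretrained.mtl` and may change, so check the linked docs for the current list:

```python
# Sketch: a Chinese mono-lingual joint model (constant name assumed; see the docs).
import hanlp
HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH)
print(HanLP(['2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。']))
```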

### Single-Task Learning

HanLP also provides a full spectrum of single-task learning models for core NLP tasks, including tagging and parsing. Please refer to the documentation of the [`pretrained`](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/index.html) models for details.
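As a hedged sketch of the single-task route, two components could be chained by hand; the constants below are illustrative picks from `hanlp.pretrained.tok` and `hanlp.pretrained.pos` and may not match the current catalogue:

```python
# Sketch: single-task components chained manually (constant names assumed).
import hanlp
tok = hanlp.load(hanlp.pretrained.tok.COARSE_ELECTRA_SMALL_ZH)   # word segmenter
pos = hanlp.load(hanlp.pretrained.pos.CTB9_POS_ELECTRA_SMALL)    # part-of-speech tagger
tokens = tok('晓美焰来到北京立方庭参观自然语义科技公司')
print(pos(tokens))  # tags aligned with the tokens above
```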
2 changes: 1 addition & 1 deletion hanlp/pretrained/mtl.py
@@ -20,7 +20,7 @@
'XLM-R (:cite:`conneau-etal-2020-unsupervised`) base version of joint tok, pos, lem, fea, ner, srl, dep, sdp and con model trained on UD and OntoNotes5 corpus.'

NPCMJ_UD_KYOTO_TOK_POS_CON_BERT_BASE_CHAR_JA = HANLP_URL + 'mtl/npcmj_ud_kyoto_tok_pos_ner_dep_con_srl_bert_base_char_ja_20210914_133742.zip'
'BERT (:cite:`devlin-etal-2019-bert`) base char encoder trained on NPCMJ/UD/Kyoto corpora with encoders including tok, pos, ner, dep, con, srl.'
'BERT (:cite:`devlin-etal-2019-bert`) base char encoder trained on NPCMJ/UD/Kyoto corpora with decoders including tok, pos, ner, dep, con, srl.'

# Will be filled up during runtime
ALL = {}
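For context, a hedged usage sketch of the Japanese constant named in this hunk, assuming the standard `hanlp.load` interface and that the joint model handles raw Japanese text directly:

```python
# Sketch: loading the Japanese joint model from this diff (usage assumed).
import hanlp
ja = hanlp.load(hanlp.pretrained.mtl.NPCMJ_UD_KYOTO_TOK_POS_CON_BERT_BASE_CHAR_JA)
print(ja(['奈良は日本の古都です。']))
```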
