Revise documentation
hankcs committed Jan 27, 2022
1 parent 006c323 commit ca76dc6
Showing 4 changed files with 72 additions and 3 deletions.
9 changes: 8 additions & 1 deletion docs/api/hanlp/pretrained/amr.md
@@ -17,6 +17,13 @@ AMR captures “who is doing what to whom” in a sentence. Each sentence is rep

To parse a raw sentence into AMR:

```{eval-rst}
.. margin:: Batching is Faster
.. Hint:: Parse multiple sentences at once for faster processing!
```


```{code-cell} ipython3
:tags: [output_scroll]
import hanlp
@@ -26,7 +33,7 @@ amr = amr_parser('The boy wants the girl to believe him.')
print(amr)
```
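The margin note added in this commit recommends batching. Below is a minimal sketch of what that could look like, assuming `amr_parser` is the parser loaded in the lines collapsed by this diff and that it treats a list of raw sentences as one batch (the exact model constant is not shown here):

```python
# Hypothetical batching sketch: `amr_parser` is assumed to be the AMR parser
# loaded above; a list of raw sentences is assumed to be parsed as one batch.
graphs = amr_parser([
    'The boy wants the girl to believe him.',
    'The girl believes the boy.',
])
for graph in graphs:
    print(graph)  # each item should print in the usual bracketed AMR notation
```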

A list of pre-trained parsers and their scores are listed below.
All the pre-trained parsers and their scores are listed below.

```{eval-rst}
54 changes: 54 additions & 0 deletions docs/api/hanlp/pretrained/constituency.md
@@ -1,5 +1,59 @@
---
jupytext:
formats: ipynb,md:myst
text_representation:
extension: .md
format_name: myst
format_version: '0.8'
jupytext_version: 1.4.2
kernelspec:
display_name: Python 3
language: python
name: python3
---

# constituency

Constituency Parsing is the process of analyzing a sentence by breaking it down into sub-phrases, also known as constituents.

To parse a tokenized sentence into a constituency tree, first load a parser:

```{eval-rst}
.. margin:: Batching is Faster
.. Hint:: To speed things up, parse multiple sentences at once and use a GPU.
```

```{code-cell} ipython3
:tags: [output_scroll]
import hanlp
con = hanlp.load(hanlp.pretrained.constituency.CTB9_FULL_TAG_ELECTRA_SMALL)
```

Then pass one or more sequences of tokens to it.

```{code-cell} ipython3
:tags: [output_scroll]
tree = con(["2021年", "HanLPv2.1", "带来", "最", "先进", "的", "多", "语种", "NLP", "技术", "。"])
```
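As the hint above notes, batching is faster. A minimal sketch, assuming `con` treats a list of token sequences as one batch and returns one tree per sequence (the second sentence below is only an illustration):

```python
# Hypothetical batching sketch: a list of token sequences is assumed to be
# parsed as a single batch, yielding one constituency tree per sequence.
trees = con([
    ["2021年", "HanLPv2.1", "带来", "最", "先进", "的", "多", "语种", "NLP", "技术", "。"],
    ["我", "爱", "自然", "语言", "处理", "。"],
])
for t in trees:
    print(t)  # bracketed form of each tree
```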

The constituency tree is a nested list of constituencies:

```{code-cell} ipython3
:tags: [output_scroll]
tree
```
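Since the tree is described below as a nested list, ordinary list operations can be used to walk into it. A small sketch under that assumption (node-level accessors may differ between versions):

```python
# Sketch: drill into the nested-list tree with plain indexing.
root = tree[0]      # the top-level constituent of the sentence
print(len(root))    # number of immediate sub-constituents
print(root[0])      # the first sub-constituent, in bracketed form
```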

You can call `str` on it or `print` it to get its bracketed form:

```{code-cell} ipython3
:tags: [output_scroll]
print(tree)
```

All the pre-trained parsers and their scores are listed below.

```{eval-rst}
.. automodule:: hanlp.pretrained.constituency
10 changes: 9 additions & 1 deletion docs/tutorial.md
@@ -102,6 +102,8 @@ doc.pretty_print()

## Native API

### Multi-Task Learning

If you want to run our models locally or implement your own RESTful server,
you can [install the native API](https://hanlp.hankcs.com/docs/install.html#install-native-package)
and call it just like the RESTful one.
@@ -123,4 +125,10 @@ print(HanLP(['In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP tech
```

Because the service provider is very likely running a different model or using different settings, the
RESTful and native results might be slightly different.

To process Chinese or Japanese, HanLP provides mono-lingual models for each language, which significantly outperform the multi-lingual model. See the [docs](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/mtl.html) for the list of models.
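For illustration, loading one of those mono-lingual models could look like the sketch below; the constant name is an assumption taken from `hanlp.pretrained.mtl` and may change, so check the linked docs for the current list:

```python
# Sketch: a Chinese mono-lingual joint model (constant name assumed; see the docs).
import hanlp
HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH)
print(HanLP(['2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。']))
```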

### Single-Task Learning

HanLP also provides a full spectrum of single-task learning models for core NLP tasks, including tagging and parsing. Please refer to the documentation of the [`pretrained`](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/index.html) models for details.
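As a hedged sketch of the single-task route, two components could be chained by hand; the constants below are illustrative picks from `hanlp.pretrained.tok` and `hanlp.pretrained.pos` and may not match the current catalogue:

```python
# Sketch: single-task components chained manually (constant names assumed).
import hanlp
tok = hanlp.load(hanlp.pretrained.tok.COARSE_ELECTRA_SMALL_ZH)   # word segmenter
pos = hanlp.load(hanlp.pretrained.pos.CTB9_POS_ELECTRA_SMALL)    # part-of-speech tagger
tokens = tok('晓美焰来到北京立方庭参观自然语义科技公司')
print(pos(tokens))  # tags aligned with the tokens above
```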
2 changes: 1 addition & 1 deletion hanlp/pretrained/mtl.py
@@ -20,7 +20,7 @@
'XLM-R (:cite:`conneau-etal-2020-unsupervised`) base version of joint tok, pos, lem, fea, ner, srl, dep, sdp and con model trained on UD and OntoNotes5 corpus.'

NPCMJ_UD_KYOTO_TOK_POS_CON_BERT_BASE_CHAR_JA = HANLP_URL + 'mtl/npcmj_ud_kyoto_tok_pos_ner_dep_con_srl_bert_base_char_ja_20210914_133742.zip'
'BERT (:cite:`devlin-etal-2019-bert`) base char encoder trained on NPCMJ/UD/Kyoto corpora with encoders including tok, pos, ner, dep, con, srl.'
'BERT (:cite:`devlin-etal-2019-bert`) base char encoder trained on NPCMJ/UD/Kyoto corpora with decoders including tok, pos, ner, dep, con, srl.'

# Will be filled up during runtime
ALL = {}
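For context, a hedged usage sketch of the Japanese constant named in this hunk, assuming the standard `hanlp.load` interface and that the joint model handles raw Japanese text directly:

```python
# Sketch: loading the Japanese joint model from this diff (usage assumed).
import hanlp
ja = hanlp.load(hanlp.pretrained.mtl.NPCMJ_UD_KYOTO_TOK_POS_CON_BERT_BASE_CHAR_JA)
print(ja(['奈良は日本の古都です。']))
```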
