The parsing model of GiNZA v4 is trained on a part of [UD Japanese BCCWJ](https://github.com/UniversalDependencies/UD_Japanese-BCCWJ).
We use two of the named entity label systems, both Sekine's Extended Named Entity Hierarchy and extended [OntoNotes5](https://catalog.ldc.upenn.edu/docs/LDC2013T19/OntoNotes-Release-5.0.pdf).
This model is developed by the National Institute for Japanese Language and Linguistics, and Megagon Labs.

### mC4

The GiNZA v5 Transformers model (ja-ginza-electra) is trained using [transformers-ud-japanese-electra-base-discriminator](https://huggingface.co/megagonlabs/transformers-ud-japanese-electra-base-discriminator), which is pretrained on more than 200 million Japanese sentences extracted from [mC4](https://huggingface.co/datasets/mc4).

Contains information from mC4 which is made available under the ODC Attribution License.

```
@article{2019t5,
  author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
  title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  journal = {arXiv e-prints},
  year = {2019},
  archivePrefix = {arXiv},
  eprint = {1910.10683},
}
```

## Runtime Environment
This project is developed with Python >= 3.6 and pip.
We do not recommend using an Anaconda environment because the pip install step may not work properly.
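As a quick sanity check for the stated requirement, the running interpreter can be verified with a trivial stdlib snippet (this check is not part of GiNZA itself):

```python
# Verify the running interpreter satisfies the documented Python >= 3.6 requirement.
import sys

REQUIRED = (3, 6)
supported = sys.version_info >= REQUIRED
print(supported)  # True on any Python 3.6+ interpreter
```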

Please also see the Development Environment section below.

### Runtime set up
#### 1. Install GiNZA NLP Library with Transformer-based Model

Uninstall previous version:

```console
$ pip uninstall ginza ja-ginza
```

Then, install the latest version of `ginza` and `ja-ginza-electra`:

```console
$ pip install -U ginza ja-ginza-electra
```

The package of `ja-ginza-electra` does not include `pytorch_model.bin` due to PyPI's archive size restrictions.
This large model file is downloaded automatically at the first run, and the locally cached file is used for subsequent runs.
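The download-on-first-run behavior is a standard download-on-miss cache. As an illustrative sketch only (the paths and the fetch step are stand-ins, not GiNZA's actual implementation):

```python
# Illustrative sketch of a download-on-miss cache: fetch the model file once,
# then reuse the locally cached copy on every later call.
from pathlib import Path
import tempfile

def fetch_model(cache_dir: Path, name: str, download) -> Path:
    """Return the cached model file, downloading it only on the first call."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():          # first run: download and cache
        target.write_bytes(download())
    return target                    # later runs: reuse the cached file

# Demo with a fake download function standing in for the real HTTP fetch.
calls = []
def fake_download():
    calls.append(1)
    return b"model-weights"

cache = Path(tempfile.mkdtemp())
fetch_model(cache, "pytorch_model.bin", fake_download)
fetch_model(cache, "pytorch_model.bin", fake_download)
print(len(calls))  # → 1: the second call hits the cache
```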
If you need to install `ja-ginza-electra` together with `pytorch_model.bin` at install time, you can point pip directly at a GitHub release archive as follows:
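The original archive link was lost here; the command takes the following shape, where the URL is an assumption — check the project's GitHub releases page for the current archive name:

```console
$ pip install -U ginza https://github.com/megagonlabs/ginza/releases/download/latest/ja_ginza_electra-latest-with-model.tar.gz
```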
If you encounter install problems related to Cython, try setting `CFLAGS` as below:

```console
$ CFLAGS='-stdlib=libc++' pip install ginza
```

If you want to accelerate the transformers-based models by using GPUs with CUDA support, install `spacy` with the matching CUDA version as follows:

```console
$ pip install -U "spacy[cuda110]"
```
#### 2. Install GiNZA NLP Library with Standard Model

Uninstall previous version:

```console
$ pip uninstall ginza ja-ginza
```

Then, install the latest version of `ginza` and `ja-ginza`:

```console
$ pip install -U ginza ja-ginza
```

### Execute ginza command
Run the `ginza` command from the console, then input some Japanese text.
After pressing the Enter key, you will get the parsed result in [CoNLL-U Syntactic Annotation](https://universaldependencies.org/format.html#syntactic-annotation) format.

```console
$ ginza
4	を	を	ADP	助詞-格助詞	_	3	case	_	SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=ヲ
```
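Each CoNLL-U token line carries ten tab-separated fields, and the last (`MISC`) column holds `|`-separated `key=value` annotations such as GiNZA's `Reading`. A minimal stdlib parser for such a line (field names follow the CoNLL-U specification; the sample line is the one shown above):

```python
# Minimal parser for a single CoNLL-U token line, stdlib only.
# Field names follow the Universal Dependencies CoNLL-U specification.
CONLLU_FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                 "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]

def parse_token_line(line: str) -> dict:
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("a CoNLL-U token line must have exactly 10 tab-separated columns")
    token = dict(zip(CONLLU_FIELDS, cols))
    # MISC holds |-separated key=value pairs (GiNZA adds Reading, BunsetuBILabel, etc.).
    token["MISC"] = (dict(kv.split("=", 1) for kv in cols[9].split("|"))
                     if cols[9] != "_" else {})
    return token

line = ("4\tを\tを\tADP\t助詞-格助詞\t_\t3\tcase\t_\t"
        "SpaceAfter=No|BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|Reading=ヲ")
token = parse_token_line(line)
print(token["UPOS"], token["DEPREL"], token["MISC"]["Reading"])  # → ADP case ヲ
```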