Commit
Showing 10 changed files with 171 additions and 88 deletions.
@@ -2,7 +2,5 @@
*.py[cod]
.ipynb_checkpoints
__pycache__

*.tmx
*.gz
bak
*.pkl
@@ -0,0 +1,4 @@

*.tmx
*.gz
*.pkl
@@ -1,22 +1,36 @@

# English-to-Chinese Translation Test

## 1. Download the data

Download page:

http://opus.nlpl.eu/OpenSubtitles2016.php
http://opus.nlpl.eu/OpenSubtitles2018.php

Download link:

Download link (not sure whether it can be used directly):
wget -O "en-zh_cn.tmx.gz" "http://opus.nlpl.eu/download.php?f=OpenSubtitles2018/en-zh_cn.tmx.gz"

http://opus.nlpl.eu/download.php?f=OpenSubtitles2016/en-zh_zh.tmx.gz
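
For environments without `wget`, the download step could be sketched in Python roughly as follows. This is a minimal sketch using only the standard library; the helper name `download` is hypothetical, and the URL is simply the one from the `wget` command, so it carries the same uncertainty about whether it can be fetched directly.

```python
import shutil
import urllib.request

# URL copied from the wget command; whether it still resolves is not verified here.
URL = "http://opus.nlpl.eu/download.php?f=OpenSubtitles2018/en-zh_cn.tmx.gz"


def download(url, dest):
    """Stream `url` into the local file `dest` without holding it all in memory.

    `download` is a hypothetical helper, not part of this repository.
    """
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Uncomment to fetch the archive (it is large):
# download(URL, "en-zh_cn.tmx.gz")
```

The output file name mirrors the `-O` argument of the `wget` command, so the later steps can be followed unchanged.
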
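## 2. Decompress the data

This dataset is an `English-Chinese` parallel corpus.

Download and decompress the data, then rename it to `en-zh_zh.tmx`
Decompress:

gunzip -k en-zh_cn.tmx.gz

Download and decompress the data, then rename it to `en-zh_zh.tmx` (if necessary)

The result should be an XML file (on Linux, you can use the `head` command to check that it looks correct).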
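
The decompress, rename, and `head` check described above could also be done from Python. The following is a rough sketch using only the standard library; `gunzip_keep` and `head` are hypothetical helper names, and writing straight to `en-zh_zh.tmx` assumes that is the file name the later scripts expect.

```python
import gzip
import shutil
from itertools import islice


def gunzip_keep(src, dest):
    """Decompress `src` into `dest` and leave `src` in place (like `gunzip -k`)."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def head(path, n=5):
    """Return the first `n` lines of a UTF-8 text file (like `head -n`)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


# Example usage (file names follow the README; adjust if yours differ):
# gunzip_keep("en-zh_cn.tmx.gz", "en-zh_zh.tmx")
# print("\n".join(head("en-zh_zh.tmx")))
```

If the first lines show an XML declaration followed by a `<tmx ...>` element, the file is most likely intact.
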

## 3. Preprocess the data

Run `extract_tmx.py` to produce `data.pkl`.
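
The contents of `extract_tmx.py` are not shown in this diff, so the following is only an illustrative sketch of what a TMX-to-pickle extraction might look like, not the repository's actual script. It assumes the standard TMX layout (`<tu>` units containing `<tuv xml:lang="...">` variants with a `<seg>` child), the language codes `en` and `zh_cn`, and a pickle holding a list of `(source, target)` tuples; all function names are hypothetical.

```python
import pickle
import xml.etree.ElementTree as ET

# ElementTree exposes the `xml:lang` attribute under this namespace-qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def iter_pairs(tmx_path, src_lang="en", tgt_lang="zh_cn"):
    """Yield (source, target) sentence pairs, one per complete <tu> element.

    The language codes are assumptions; check the actual file with `head`.
    """
    for _, elem in ET.iterparse(tmx_path, events=("end",)):
        if elem.tag != "tu":
            continue
        segs = {}
        for tuv in elem.iter("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                segs[tuv.get(XML_LANG)] = "".join(seg.itertext()).strip()
        # Skip units that lack either language.
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]
        elem.clear()  # drop the finished unit's children to limit memory use


def build_pickle(tmx_path, out_path):
    """Write all pairs to `out_path` as a pickled list and return the pair count."""
    pairs = list(iter_pairs(tmx_path))
    with open(out_path, "wb") as f:
        pickle.dump(pairs, f)
    return len(pairs)


# Example usage (hypothetical file names taken from the README):
# build_pickle("en-zh_zh.tmx", "data.pkl")
```

Parsing incrementally with `iterparse` is chosen here because subtitle corpora tend to be large; loading the whole tree at once would work for small files but may exhaust memory on the full dataset.
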

## 4. Training

Run `train.py` to train (by default, output goes to the `/tmp/s2ss_en2zh` directory).

## 5. Testing (test translation)

Run `test.py` to see the test results.