Everyone is welcome to contribute, and we value everybody's contribution. Code is not the only way to help the community: answering questions, helping others, reaching out, and improving the documentation are all immensely valuable.

It also helps us if you spread the word: reference the library from blog posts on the awesome projects it made possible, shout out on Twitter every time it has helped you, or simply star the repo to say "thank you".

We will also record all contributors and contributions [here](https://github.com/sooftware/OpenSpeech/blob/main/CONTRIBUTORS.md).
### You can contribute in so many ways!

There are 5 ways you can contribute to OpenSpeech:

- Add a new dataset recipe.
- Implement a new model.
- Share the weight file you trained.
- Fix outstanding issues in the existing code.
- Submit issues for bugs or desired new features.
### Do you want to add a new dataset recipe?

Great! Please provide the following information:

- Give a short description of the dataset and a link to the paper.
- Indicate the license of the dataset.
- Write a test that proves the code works (a minimal sketch follows this list).
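By "test", we mean something as small as the sketch below. Everything in it (`DATASET_DIR`, the `.wav`/`.txt` layout) is a placeholder for illustration, not an OpenSpeech convention; adapt it to however your recipe lays out its prepared files.

```python
import os

# Illustrative only: point this at wherever your recipe writes its prepared data.
DATASET_DIR = "path/to/prepared_dataset"


def test_every_audio_file_has_a_transcript():
    audio_files = [f for f in os.listdir(DATASET_DIR) if f.endswith(".wav")]
    assert audio_files, "expected at least one prepared audio file"
    for name in audio_files:
        transcript = os.path.join(DATASET_DIR, name.replace(".wav", ".txt"))
        assert os.path.exists(transcript), f"missing transcript for {name}"
```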
We want to cover as many datasets as possible. Help us!
### Do you want to implement a new model?

Awesome! Please provide the following information:

- Short description of the model and a link to the paper.
- Link to the implementation if it is open source.
- Link to the model weights if they are available.
- Write a test that proves the code works (see the sketch after this list).
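A simple shape test is usually enough to get a review started. Here is a minimal sketch; `my_new_model` and `MyNewModelEncoder` are placeholder names for the module you are contributing, not part of the existing OpenSpeech API.

```python
import torch

# Placeholder import: swap in the module and class you are contributing.
from my_new_model import MyNewModelEncoder


def test_encoder_output_shape():
    batch_size, time_steps, num_mels = 3, 128, 80
    encoder = MyNewModelEncoder(input_dim=num_mels, output_dim=512)

    inputs = torch.rand(batch_size, time_steps, num_mels)
    input_lengths = torch.IntTensor([128, 100, 64])

    outputs, output_lengths = encoder(inputs, input_lengths)

    # The encoder should preserve the batch dimension and report valid lengths.
    assert outputs.size(0) == batch_size
    assert (output_lengths <= time_steps).all()
```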
If you are willing to contribute the model yourself, let us know so we can best guide you.
### Do you want to share the weight file?

Nice, nice, so nice! Because OpenSpeech supports multiple datasets and many models, such contributions are essential.
Please provide the following information:

- Indicate which dataset and which model you trained.
- Share the script you used to start training.
- Share a link from which the weight file can be downloaded.
### Did you find a bug?

First, we would really appreciate it if you could **make sure the bug was not already reported** (use the search bar on GitHub under Issues).

Did not find it? :( So we can act quickly on it, please follow these steps:

- Include your OS type and version, and the versions of Python and PyTorch when applicable (a quick way to gather these is sketched after this list).
- Give us a short, self-contained code example that reproduces the bug.
- Provide the full traceback if an exception is raised.
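For the environment details, a few lines of standard Python are enough to gather everything we ask for (nothing here is OpenSpeech-specific):

```python
import platform

import torch

# Prints the environment details we ask for in bug reports.
print("OS:", platform.platform())
print("Python:", platform.python_version())
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```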
### Do you want a new feature (that is not a model)?

A world-class feature request addresses the following points:

1. Motivation first:
- Is it related to a problem/frustration with the library? If so, please explain why. Providing a code snippet that demonstrates the problem is best.
- Is it related to something you would need for a project? We'd love to hear about it!
- Is it something you worked on and think could benefit the community? Awesome! Tell us what problem it solved for you.
2. Write a full paragraph describing the feature.
3. Provide a code snippet that demonstrates its future use.
4. In case this is related to a paper, please attach a link.
5. Attach any additional information (drawings, screenshots, etc.) you think may help.

If your issue is well written, we're already 80% of the way there by the time you post it.
### Submitting a new issue or feature request

Do your best to follow these guidelines when submitting an issue or a feature request. It will make it easier for us to come back to you quickly and with good feedback.

Also, please write issues and pull requests in English, so that as many people as possible can understand and follow them.
This page records the developers who have contributed to OpenSpeech and their contributions.
### [Soohwan Kim](https://github.com/sooftware/)

- Creator, Lead Developer, Main Contributor
- Program architecture design
- Model implementation list:
1. [**DeepSpeech2**](https://sooftware.github.io/openspeech/architectures/DeepSpeech2.html) (from Baidu Research) released with paper [Deep Speech 2: End-to-End Speech Recognition in English and Mandarin](https://arxiv.org/abs/1512.02595.pdf), by Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu.
2. [**RNN-Transducer**](https://sooftware.github.io/openspeech/architectures/RNN%20Transducer.html) (from University of Toronto) released with paper [Sequence Transduction with Recurrent Neural Networks](https://arxiv.org/abs/1211.3711.pdf), by Alex Graves.
3. [**LSTM Language Model**](https://sooftware.github.io/openspeech/architectures/LSTM%20LM.html) (from RWTH Aachen University) released with paper [LSTM Neural Networks for Language Modeling](http://www-i6.informatik.rwth-aachen.de/publications/download/820/Sundermeyer-2012.pdf), by Martin Sundermeyer, Ralf Schluter, and Hermann Ney.
4. [**Listen Attend Spell**](https://sooftware.github.io/openspeech/architectures/Listen%20Attend%20Spell.html) (from Carnegie Mellon University and Google Brain) released with paper [Listen, Attend and Spell](https://arxiv.org/abs/1508.01211), by William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals.
5. [**Location-aware attention based Listen Attend Spell**](https://sooftware.github.io/openspeech/architectures/Listen%20Attend%20Spell.html) (from University of Wrocław and Jacobs University and Universite de Montreal) released with paper [Attention-Based Models for Speech Recognition](https://arxiv.org/abs/1506.07503), by Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio.
6. [**Joint CTC-Attention based Listen Attend Spell**](https://sooftware.github.io/openspeech/architectures/Listen%20Attend%20Spell.html) (from Mitsubishi Electric Research Laboratories and Carnegie Mellon University) released with paper [Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning](https://arxiv.org/abs/1609.06773), by Suyoun Kim, Takaaki Hori, Shinji Watanabe.
7. [**Deep CNN Encoder with Joint CTC-Attention Listen Attend Spell**](https://sooftware.github.io/openspeech/architectures/Listen%20Attend%20Spell.html) (from Mitsubishi Electric Research Laboratories and Massachusetts Institute of Technology and Carnegie Mellon University) released with paper [Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM](https://arxiv.org/abs/1706.02737), by Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan.
8. [**Multi-head attention based Listen Attend Spell**](https://sooftware.github.io/openspeech/architectures/Listen%20Attend%20Spell.html) (from Google) released with paper [State-of-the-art Speech Recognition With Sequence-to-Sequence Models](https://arxiv.org/abs/1712.01769), by Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani.
9. [**Speech-Transformer**](https://sooftware.github.io/openspeech/architectures/Transformer.html) (from University of Chinese Academy of Sciences and Institute of Automation and Chinese Academy of Sciences) released with paper [Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition](https://ieeexplore.ieee.org/document/8462506), by Linhao Dong, Shuang Xu, Bo Xu.
10. [**VGG-Transformer**](https://sooftware.github.io/openspeech/architectures/Transformer.html) (from Facebook AI Research) released with paper [Transformers with convolutional context for ASR](https://arxiv.org/abs/1904.11660), by Abdelrahman Mohamed, Dmytro Okhonko, Luke Zettlemoyer.
11. [**Transformer with CTC**](https://sooftware.github.io/openspeech/architectures/Transformer.html) (from NTT Communication Science Laboratories, Waseda University, Center for Language and Speech Processing, Johns Hopkins University) released with paper [Improving Transformer-based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration](https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1938.pdf), by Shigeki Karita, Nelson Enrique Yalta Soplin, Shinji Watanabe, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani.
12. [**Joint CTC-Attention based Transformer**](https://sooftware.github.io/openspeech/architectures/Transformer.html) (from NTT Corporation) released with paper [Self-Distillation for Improving CTC-Transformer-based ASR Systems](https://www.isca-speech.org/archive/Interspeech_2020/pdfs/1223.pdf), by Takafumi Moriya, Tsubasa Ochiai, Shigeki Karita, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, Marc Delcroix.
13. [**Transformer Language Model**](https://sooftware.github.io/openspeech/architectures/Transformer%20LM.html) (from Amazon Web Services) released with paper [Language Models with Transformers](https://arxiv.org/abs/1904.09408), by Chenguang Wang, Mu Li, Alexander J. Smola.
14. [**Jasper**](https://sooftware.github.io/openspeech/modules/Encoders.html#module-openspeech.encoders.jasper) (from NVIDIA and New York University) released with paper [Jasper: An End-to-End Convolutional Neural Acoustic Model](https://arxiv.org/pdf/1904.03288.pdf), by Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde.
15. [**QuartzNet**](https://sooftware.github.io/openspeech/modules/Encoders.html#module-openspeech.encoders.quartznet) (from NVIDIA and Univ. of Illinois and Univ. of Saint Petersburg) released with paper [QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions](https://arxiv.org/abs/1910.10261.pdf), by Samuel Kriman, Stanislav Beliaev, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Yang Zhang.
16. [**Conformer**](https://sooftware.github.io/openspeech/architectures/Conformer.html) (from Google) released with paper [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100), by Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang.
17. [**Conformer with CTC**](https://sooftware.github.io/openspeech/architectures/Conformer.html) (from Northwestern Polytechnical University and University of Bordeaux and Johns Hopkins University and Human Dataware Lab and Kyoto University and NTT Corporation and Shanghai Jiao Tong University and Chinese Academy of Sciences) released with paper [Recent Developments on ESPNET Toolkit Boosted by Conformer](https://arxiv.org/abs/2010.13956.pdf), by Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, Yuekai Zhang.
18. [**Conformer with LSTM Decoder**](https://sooftware.github.io/openspeech/architectures/Conformer.html) (from IBM Research AI) released with paper [On the limit of English conversational speech recognition](https://arxiv.org/abs/2105.00982.pdf), by Zoltán Tüske, George Saon, Brian Kingsbury.
- Recipe:
1. [LibriSpeech](https://www.openslr.org/12)
2. [AISHELL-1](https://www.openslr.org/33/)
3. [KsponSpeech](https://aihub.or.kr/aidata/105)
### [Sangchun Ha](https://github.com/upskyy)

- Maintainer, Main Contributor.
- Code validation
- Model implementation list:
1. [**Transformer Transducer**](https://sooftware.github.io/openspeech/architectures/Transformer%20Transducer.html) (from Facebook AI) released with paper [Transformer-Transducer: End-to-End Speech Recognition with Self-Attention](https://arxiv.org/abs/1910.12977.pdf), by Ching-Feng Yeh, Jay Mahadeokar, Kaustubh Kalgaonkar, Yongqiang Wang, Duc Le, Mahaveer Jain, Kjell Schubert, Christian Fuegen, Michael L. Seltzer.
2. [**ContextNet**](https://sooftware.github.io/openspeech/architectures/ContextNet.html) (from Google) released with paper [ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context](https://arxiv.org/abs/2005.03191), by Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu.
3. [**Squeezeformer**](https://github.com/upskyy/Squeezeformer) (from Berkeley) released with paper [Squeezeformer: An Efficient Transformer for Automatic Speech Recognition](https://arxiv.org/pdf/2206.00888.pdf), by Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer.
- Beam search:
1. RNN Transducer beam search
2. Transformer Transducer beam search
### [Soyoung Cho](https://github.com/SoYoungCho)

- Main Contributor.