
Commit 6d855a3

Apply pre-commit config, github actions
1 parent 8fafea4 commit 6d855a3

431 files changed (+15964, -16530 lines)

.github/.github/ISSUE_TEMPLATE/bug_report.md (+1, -1)

The single changed line (17) is a whitespace-only fix:

@@ -14,7 +14,7 @@ assignees: ''
 - PyTorch version (GPU?):
 - Using GPU in script?:
 
-
+
 ## Information
 
 Model I am using (ListenAttendSpell, Transformer, Conformer ...):

.github/ISSUE_TEMPLATE/bug_report.md (+1, -1)

The single changed line (17) is a whitespace-only fix:

@@ -14,7 +14,7 @@ assignees: ''
 - PyTorch version (GPU?):
 - Using GPU in script?:
 
-
+
 ## Information
 
 Model I am using (ListenAttendSpell, Transformer, Conformer ...):

.github/workflows/pre-commit.yaml (+22, new file)

@@ -0,0 +1,22 @@
+name: pre-commit
+
+on:
+  pull_request:
+  push:
+    branches: [main]
+
+jobs:
+  check_and_test:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
+        with:
+          python-version: '3.8'
+          cache: 'pip'
+      - name: pre-commit
+        run: |
+          pip install -U pre-commit
+          pre-commit install --install-hooks
+          pre-commit run -a
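
Contributors who want to reproduce this check before pushing can run the same commands the job runs. This is a minimal local sketch, not part of the commit; it assumes Python and pip are already available and that the repository root contains the .pre-commit-config.yaml added below.

# Install pre-commit and register the git hooks so the checks run on every commit.
pip install -U pre-commit
pre-commit install --install-hooks

# Run every configured hook against all files, mirroring what CI does on pushes and pull requests.
pre-commit run -a    # equivalent to: pre-commit run --all-files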

.pre-commit-config.yaml (+22, new file)

@@ -0,0 +1,22 @@
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.0.1
+    hooks:
+      - id: check-yaml
+      - id: end-of-file-fixer
+        types: [file, python]
+      - id: trailing-whitespace
+      - id: mixed-line-ending
+      - id: check-added-large-files
+        args: [--maxkb=4096]
+  - repo: https://github.com/psf/black
+    rev: 22.3.0
+    hooks:
+      - id: black
+        args: ["--line-length", "120"]
+  - repo: https://github.com/pycqa/isort
+    rev: 5.10.1
+    hooks:
+      - id: isort
+        name: isort (python)
+        args: ["--profile", "black", "-l", "120"]

CONTRIBUTING.md (+38, -38)

All 38 changed lines in this file are whitespace-only fixes (trailing whitespace removed); the text itself is unchanged. The two affected hunks, shown once in their cleaned-up form:

@@ -1,65 +1,65 @@
## How to contribute to OpenSpeech?

Everyone is welcome to contribute, and we value everybody's contribution. Code is thus not the only way to help the community. Answering questions, helping others, reaching out and improving the documentations are immensely valuable to the community.

It also helps us if you spread the word: reference the library from blog posts on the awesome projects it made possible, shout out on Twitter every time it has helped you, or simply star the repo to say "thank you".

We will also record all contributors and contributions [here](https://github.com/sooftware/OpenSpeech/blob/main/CONTRIBUTORS.md).

### You can contribute in so many ways!

There are 5 ways you can contribute to OpenSpeech:

- Add new dataset recipe.
- Implementing new models.
- Share the weight file you trained.
- Fixing outstanding issues with the existing code.
- Submitting issues related to bugs or desired new features.

### Do you want to add a new dataset recipe?

Grreat!! Please provide the following information:

- Short description of the dataset and link to the paper.
- Indicate the license of the dataset.
- Write a test code to prove that the code works well.

We want to cover as many datasets as possible. Help us!

### Do you want to implement a new model?

Awesome! Please provide the following information:

- Short description of the model and link to the paper.
- Link to the implementation if it is open-source.
- Link to the model weights if they are available.
- Please write a test code to prove that the code works well.

If you are willing to contribute the model yourself, let us know so we can best guide you.

### Do you want to share the weight file?

Nice, Nice, So Nice!! Because OpenSpeech supports multiple datasets and many models, such contribution is essential.
Please provide the following information:

- Indicate which dataset and which model you trained.
- Share the script you used when you started training.
- Please share the link that can download the weight file.

#### Did you find a bug?

First, we would really appreciate it if you could **make sure the bug was not already reported** (use the search bar on Github under Issues).

Did not find it? :( So we can act quickly on it, please follow these steps:

- Include your OS type and version, the versions of Python, PyTorch when applicable
- Give to us a simple example of a code that we can reproduce.
- Provide the full traceback if an exception is raised.

### Do you want a new feature (that is not a model)?

A world-class feature request addresses the following points:

1. Motivation first:
- Is it related to a problem/frustration with the library? If so, please explain why. Providing a code snippet that demonstrates the problem is best.
- Is it related to something you would need for a project? We'd love to hear about it!

@@ -68,12 +68,12 @@ A world-class feature request addresses the following points:
3. Provide a code snippet that demonstrates its future use.
4. In case this is related to a paper, please attach a link.
5. Attach any additional information (drawings, screenshots, etc.) you think may help.

If your issue is well written we're already 80% of the way there by the time you post it.

### Submitting a new issue or feature request

Do your best to follow these guidelines when submitting an issue or a feature request. It will make it easier for us to come back to you quickly and with good feedback.
Also, I want you to write in English when you write an issue or pull request. Because we hope as many people as possible can understand and see the issue or pull request.

CONTRIBUTORS.md (+23, -22)

Besides whitespace fixes, this file carries two substantive changes: Sangchun Ha's GitHub link changes from https://github.com/hasangchun to https://github.com/upskyy, and a Squeezeformer entry is added to his model implementation list. The affected hunks, shown once in their updated form:

@@ -1,53 +1,54 @@
## OpenSpeech's Contributors

It records developers and contributions that contributed to OpenSpeech.

### [Soohwan Kim](https://github.com/sooftware/)

- Creator, Lead Development, Main Contributor
- Program architecture design
- Model implementation list:

1. [**DeepSpeech2**](https://sooftware.github.io/openspeech/architectures/DeepSpeech2.html) (from Baidu Research) released with paper [Deep Speech 2: End-to-End Speech Recognition in English and Mandarin](https://arxiv.org/abs/1512.02595.pdf), by Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu.
2. [**RNN-Transducer**](https://sooftware.github.io/openspeech/architectures/RNN%20Transducer.html) (from University of Toronto) released with paper [Sequence Transduction with Recurrent Neural Networks](https://arxiv.org/abs/1211.3711.pdf), by Alex Graves.
3. [**LSTM Language Model**](https://sooftware.github.io/openspeech/architectures/LSTM%20LM.html) (from RWTH Aachen University) released with paper [LSTM Neural Networks for Language Modeling](http://www-i6.informatik.rwth-aachen.de/publications/download/820/Sundermeyer-2012.pdf), by Martin Sundermeyer, Ralf Schluter, and Hermann Ney.
3. [**Listen Attend Spell**](https://sooftware.github.io/openspeech/architectures/Listen%20Attend%20Spell.html) (from Carnegie Mellon University and Google Brain) released with paper [Listen, Attend and Spell](https://arxiv.org/abs/1508.01211), by William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals.
4. [**Location-aware attention based Listen Attend Spell**](https://sooftware.github.io/openspeech/architectures/Listen%20Attend%20Spell.html) (from University of Wrocław and Jacobs University and Universite de Montreal) released with paper [Attention-Based Models for Speech Recognition](https://arxiv.org/abs/1506.07503), by Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio.
5. [**Joint CTC-Attention based Listen Attend Spell**](https://sooftware.github.io/openspeech/architectures/Listen%20Attend%20Spell.html) (from Mitsubishi Electric Research Laboratories and Carnegie Mellon University) released with paper [Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning](https://arxiv.org/abs/1609.06773), by Suyoun Kim, Takaaki Hori, Shinji Watanabe.
6. [**Deep CNN Encoder with Joint CTC-Attention Listen Attend Spell**](https://sooftware.github.io/openspeech/architectures/Listen%20Attend%20Spell.html) (from Mitsubishi Electric Research Laboratories and Massachusetts Institute of Technology and Carnegie Mellon University) released with paper [Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM](https://arxiv.org/abs/1706.02737), by Takaaki Hori, Shinji Watanabe, Yu Zhang, William Chan.
7. [**Multi-head attention based Listen Attend Spell**](https://sooftware.github.io/openspeech/architectures/Listen%20Attend%20Spell.html) (from Google) released with paper [State-of-the-art Speech Recognition With Sequence-to-Sequence Models](https://arxiv.org/abs/1712.01769), by Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani.
8. [**Speech-Transformer**](https://sooftware.github.io/openspeech/architectures/Transformer.html) (from University of Chinese Academy of Sciences and Institute of Automation and Chinese Academy of Sciences) released with paper [Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition](https://ieeexplore.ieee.org/document/8462506), by Linhao Dong; Shuang Xu; Bo Xu.
9. [**VGG-Transformer**](https://sooftware.github.io/openspeech/architectures/Transformer.html) (from Facebook AI Research) released with paper [Transformers with convolutional context for ASR](https://arxiv.org/abs/1904.11660), by Abdelrahman Mohamed, Dmytro Okhonko, Luke Zettlemoyer.
10. [**Transformer with CTC**](https://sooftware.github.io/openspeech/architectures/Transformer.html) (from NTT Communication Science Laboratories, Waseda University, Center for Language and Speech Processing, Johns Hopkins University) released with paper [Improving Transformer-based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration](https://www.isca-speech.org/archive/Interspeech_2019/pdfs/1938.pdf), by Shigeki Karita, Nelson Enrique Yalta Soplin, Shinji Watanabe, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani.
11. [**Joint CTC-Attention based Transformer**](https://sooftware.github.io/openspeech/architectures/Transformer.html)(from NTT Corporation) released with paper [Self-Distillation for Improving CTC-Transformer-based ASR Systems](https://www.isca-speech.org/archive/Interspeech_2020/pdfs/1223.pdf), by Takafumi Moriya, Tsubasa Ochiai, Shigeki Karita, Hiroshi Sato, Tomohiro Tanaka, Takanori Ashihara, Ryo Masumura, Yusuke Shinohara, Marc Delcroix.
12. [**Transformer Language Model**](https://sooftware.github.io/openspeech/architectures/Transformer%20LM.html) (from Amazon Web Services) released with paper [Language Models with Transformers](https://arxiv.org/abs/1904.09408), by Chenguang Wang, Mu Li, Alexander J. Smola.
12. [**Jasper**](https://sooftware.github.io/openspeech/modules/Encoders.html#module-openspeech.encoders.jasper) (from NVIDIA and New York University) released with paper [Jasper: An End-to-End Convolutional Neural Acoustic Model](https://arxiv.org/pdf/1904.03288.pdf), by Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde.
13. [**QuartzNet**](https://sooftware.github.io/openspeech/modules/Encoders.html#module-openspeech.encoders.quartznet) (from NVIDIA and Univ. of Illinois and Univ. of Saint Petersburg) released with paper [QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions](https://arxiv.org/abs/1910.10261.pdf), by Samuel Kriman, Stanislav Beliaev, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Yang Zhang.
15. [**Conformer**](https://sooftware.github.io/openspeech/architectures/Conformer.html) (from Google) released with paper [Conformer: Convolution-augmented Transformer for Speech Recognition](https://arxiv.org/abs/2005.08100), by Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang.
16. [**Conformer with CTC**](https://sooftware.github.io/openspeech/architectures/Conformer.html) (from Northwestern Polytechnical University and University of Bordeaux and Johns Hopkins University and Human Dataware Lab and Kyoto University and NTT Corporation and Shanghai Jiao Tong University and Chinese Academy of Sciences) released with paper [Recent Developments on ESPNET Toolkit Boosted by Conformer](https://arxiv.org/abs/2010.13956.pdf), by Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, Yuekai Zhang.
17. [**Conformer with LSTM Decoder**](https://sooftware.github.io/openspeech/architectures/Conformer.html) (from IBM Research AI) released with paper [On the limit of English conversational speech recognition](https://arxiv.org/abs/2105.00982.pdf), by Zoltán Tüske, George Saon, Brian Kingsbury.

- Recipe:
1. [LibriSpeech](https://www.openslr.org/12)
2. [AISHELL-1](https://www.openslr.org/33/)
3. [KsponSpeech](https://aihub.or.kr/aidata/105)

### [Sangchun Ha](https://github.com/upskyy)

- Maintainer, Main Contributor.
- Code validation
- Model implementation list:

1. [**Transformer Transducer**](https://sooftware.github.io/openspeech/architectures/Transformer%20Transducer.html) (from Facebook AI) released with paper [Transformer-Transducer:
End-to-End Speech Recognition with Self-Attention](https://arxiv.org/abs/1910.12977.pdf), by Ching-Feng Yeh, Jay Mahadeokar, Kaustubh Kalgaonkar, Yongqiang Wang, Duc Le, Mahaveer Jain, Kjell Schubert, Christian Fuegen, Michael L. Seltzer.
2. [**ContextNet**](https://sooftware.github.io/openspeech/architectures/ContextNet.html) (from Google) released with paper [ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context](https://arxiv.org/abs/2005.03191), by Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu.
3. [**Squeezeformer**](https://github.com/upskyy/Squeezeformer) (from Berkeley) released with paper [Squeezeformer: An Efficient Transformer for Automatic Speech Recognition](https://arxiv.org/pdf/2206.00888.pdf), by Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer.

- Beam search:
1. RNN Transducer beam search
2. Transformer Transducer beam search

### [Soyoung Cho](https://github.com/SoYoungCho)
- Main Contributor.

@@ -58,4 +59,4 @@

### [Younghun Kim](https://github.com/dudgns0908)
- Contributor.
- Optimizing the KsponSpeech preprocessing
