
Commit c0d38b5 ("release", 0 parents)

File tree: 154 files changed, +1833 −0 lines

Some content of this large commit is hidden by default; only a subset of the changed files is shown below.

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
*.swp
*.DS*
*__pycache__*
*.pyc

.gitmodules

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
[submodule "vctk-silence-labels"]
    path = vctk-silence-labels
    url = https://github.com/nii-yamagishilab/vctk-silence-labels.git
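If the repository is cloned without `--recursive`, the submodule can be fetched afterwards with standard git commands (not specific to this project):

```shell script
git submodule update --init --recursive
```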

Dockerfile

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
FROM nvcr.io/nvidia/pytorch:20.11-py3

EXPOSE 6006 6007 6008 6009

RUN apt-get update && apt-get install -y \
    software-properties-common
RUN add-apt-repository universe
RUN apt-get update && apt-get install -y \
    curl \
    git \
    ffmpeg \
    libjpeg-dev \
    libpng-dev

RUN pip3 install --upgrade pip
RUN pip3 uninstall tensorboard -y \
    nvidia-tensorboard -y \
    jupyter-tensorboard -y \
    tensorboard-plugin-wit -y \
    tensorboard-plugin-dlprof -y
RUN pip3 install ffmpeg
RUN pip3 install prefetch_generator
RUN pip3 install librosa==0.8.0
RUN pip3 install omegaconf==2.0.6
RUN pip3 install pytorch_lightning==1.2.10

RUN ldconfig && \
    apt-get clean && \
    apt-get autoremove && \
    rm -rf /var/lib/apt/lists/* /tmp/*

WORKDIR /workspace

COPY *py /workspace/
COPY *yaml /workspace/
COPY utils /workspace/utils
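A minimal sketch of building and running this image; the `nuwave2` tag and the port mapping are arbitrary choices, and `--gpus all` assumes the NVIDIA Container Toolkit is installed:

```shell script
docker build -t nuwave2 .
docker run --gpus all -it -p 6006:6006 nuwave2
```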

LICENSE

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
BSD 3-Clause License

Copyright (c) 2022, MINDsLab
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
   contributors may be used to endorse or promote products derived from
   this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md

Lines changed: 186 additions & 0 deletions
@@ -0,0 +1,186 @@
# NU-Wave2 — Official PyTorch Implementation

**NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates**<br>
Seungu Han, Junhyeok Lee @ [MINDsLab Inc.](https://github.com/mindslab-ai), SNU

[![arXiv](https://img.shields.io/badge/arXiv-2206.08545-brightgreen.svg?style=flat-square)](https://arxiv.org/abs/2206.08545) [![GitHub Repo stars](https://img.shields.io/github/stars/mindslab-ai/nuwave2?color=yellow&label=NU-Wave2&logo=github&style=flat-square)](https://github.com/mindslab-ai/nuwave2) [![githubio](https://img.shields.io/badge/GitHub.io-Audio_Samples-blue?logo=Github&style=flat-square)](https://mindslab-ai.github.io/nuwave2/)

Official PyTorch + [Lightning](https://github.com/PyTorchLightning/pytorch-lightning) implementation of NU-Wave 2.

![](./docs/sampling.gif)

## Requirements
- [PyTorch](https://pytorch.org/) >= 1.7.0, for `nn.SiLU` (Swish activation).
- [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning) == 1.2.10.
- The requirements are listed in [requirements.txt](./requirements.txt); a minimal install sketch follows this list.
- We also provide a Docker setup; see the [Dockerfile](./Dockerfile).
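For a non-Docker setup, assuming a standard Python environment with pip available, one minimal way to install the dependencies is:

```bash
pip install -r requirements.txt
```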
## Clone our Repository
```bash
git clone --recursive https://github.com/mindslab-ai/nuwave2.git
cd nuwave2
```

## Preprocessing
Before running our project, you need to download the dataset and preprocess it into `.wav` files.
1. Download the [VCTK dataset](https://datashare.ed.ac.uk/handle/10283/3443).
2. Remove speakers `p280` and `p315`.
3. Modify the path of the downloaded dataset (`data: base_dir`) in `hparameter.yaml`.
4. Run `utils/flac2wav.py`:
```shell script
python utils/flac2wav.py
```
## Training
1. Adjust `hparameter.yaml`, especially the `train` section.
```yaml
train:
  batch_size: 12 # Dependent on GPU memory size
  lr: 2e-4
  weight_decay: 0.00
  num_workers: 8 # Dependent on CPU cores
  gpus: 2 # number of GPUs
  opt_eps: 1e-9
  beta1: 0.9
  beta2: 0.99
```
- Adjust the `data` section in `hparameter.yaml`.
```yaml
data:
  timestamp_path: 'vctk-silence-labels/vctk-silences.0.92.txt'
  base_dir: '/DATA1/VCTK-0.92/wav48_silence_trimmed/'
  dir: '/DATA1/VCTK-0.92/wav48_silence_trimmed_wav/' #dir/spk/format
  format: '*mic1.wav'
  cv_ratio: (100./108., 8./108., 0.00) #train/val/test
```
2. Run `trainer.py`:
```shell script
$ python trainer.py
```
- If you want to resume training from a checkpoint, check the parser arguments (a hypothetical invocation follows the snippet below).
```python
parser = argparse.ArgumentParser()
parser.add_argument('-r', '--resume_from', type=int,
                    required=False, help="Resume Checkpoint epoch number")
parser.add_argument('-s', '--restart', action="store_true",
                    required=False, help="Significant change occurred, use this")
parser.add_argument('-e', '--ema', action="store_true",
                    required=False, help="Start from ema checkpoint")
args = parser.parse_args()
```
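For example, a hypothetical invocation resuming from the EMA checkpoint of epoch 50 (the epoch number is a placeholder):

```shell script
$ python trainer.py -r 50 -e
```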
- During training, the TensorBoard logger logs the loss, spectrograms, and audio:
```shell script
$ tensorboard --logdir=./tensorboard --bind_all
```

![](./docs/images/train_loss.png)
![](./docs/images/spec.png)
## Evaluation
Run `for_test.py`:
```shell script
python for_test.py -r {checkpoint_number} --sr {input sampling rate} {-e:option, if ema} {--save:option}
```
Please check the parser:
```python
parser = argparse.ArgumentParser()
parser.add_argument('-r', '--resume_from', type=int,
                    required=True, help="Resume Checkpoint epoch number")
parser.add_argument('-e', '--ema', action="store_true",
                    required=False, help="Start from ema checkpoint")
parser.add_argument('--save', action="store_true",
                    required=False, help="Save file")
parser.add_argument('--sr', type=int,
                    required=True, help="input sampling rate")
```
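For instance, a hypothetical run evaluating the EMA checkpoint of epoch 100 on 16 kHz inputs and saving the outputs (all values are placeholders):

```shell script
python for_test.py -r 100 --sr 16000 -e --save
```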
96+
97+
## Inference
98+
- run `inference.py`
99+
```shell script
100+
python inference.py -c {checkpoint_path} -i {input audio} --sr {Sampling rate of input audio} {--steps:option} {--gt:option}
101+
```
102+
Please check parser.
103+
```python
104+
parser = argparse.ArgumentParser()
105+
parser.add_argument('-c',
106+
'--checkpoint',
107+
type=str,
108+
required=True,
109+
help="Checkpoint path")
110+
parser.add_argument('-i',
111+
'--wav',
112+
type=str,
113+
default=None,
114+
help="audio")
115+
parser.add_argument('--sr',
116+
type=int,
117+
required=True,
118+
help="Sampling rate of input audio")
119+
parser.add_argument('--steps',
120+
type=int,
121+
required=False,
122+
help="Steps for sampling")
123+
parser.add_argument('--gt', action="store_true",
124+
required=False, help="Whether the input audio is 48 kHz ground truth audio.")
125+
parser.add_argument('--device',
126+
type=str,
127+
default='cuda',
128+
required=False,
129+
help="Device, 'cuda' or 'cpu'")
130+
```
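For example, a hypothetical invocation that upsamples a 16 kHz recording (the checkpoint path, file name, and 8-step sampling schedule are placeholders, not values fixed by this repository):

```shell script
python inference.py -c ./checkpoint.ckpt -i ./input_16k.wav --sr 16000 --steps 8
```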
## References
This implementation uses code from the following repositories:
- [Official NU-Wave PyTorch implementation](https://github.com/mindslab-ai/nuwave)
- [revsic's Jax/Flax implementation of Variational-DiffWave](https://github.com/revsic/jax-variational-diffwave)
- [ivanvovk's WaveGrad PyTorch implementation](https://github.com/ivanvovk/WaveGrad)
- [lmnt-com's DiffWave PyTorch implementation](https://github.com/lmnt-com/diffwave)
- [NVlabs' SPADE PyTorch implementation](https://github.com/NVlabs/SPADE)
- [pkumivision's FFC PyTorch implementation](https://github.com/pkumivision/FFC)

This README and the webpage for the audio samples are inspired by:
- [Tips for Publishing Research Code](https://github.com/paperswithcode/releasing-research-code)
- [Audio samples webpage of DCA](https://google.github.io/tacotron/publications/location_relative_attention/)
- [Cotatron](https://github.com/mindslab-ai/cotatron/)
- [Audio samples webpage of WaveGrad](https://wavegrad.github.io)

The audio samples on our [webpage](https://mindslab-ai.github.io/nuwave2/) are partially derived from:
- [VCTK dataset (0.92)](https://datashare.ed.ac.uk/handle/10283/3443): 46 hours of English speech from 108 speakers.
- [LJSpeech](https://keithito.com/LJ-Speech-Dataset/): a single-speaker English dataset consisting of 13,100 short audio clips of a female speaker reading passages from 7 non-fiction books, approximately 24 hours in total.
151+
## Repository Structure
152+
```
153+
.
154+
|-- Dockerfile
155+
|-- LICENSE
156+
|-- README.md
157+
|-- dataloader.py # Dataloader for train/val(=test)
158+
|-- diffusion.py # DPM
159+
|-- for_test.py # Test with for_loop.
160+
|-- hparameter.yaml # Config
161+
|-- inference.py # Inference
162+
|-- lightning_model.py # NU-Wave 2 implementation.
163+
|-- model.py # NU-Wave 2 model based on lmnt-com's DiffWave implementation
164+
|-- requirements.txt # requirement libraries
165+
|-- trainer.py # Lightning trainer
166+
|-- utils
167+
| |-- flac2wav.py # Preprocessing
168+
| |-- stft.py # STFT layer
169+
| `-- tblogger.py # Tensorboard Logger for lightning
170+
|-- docs # For github.io
171+
| |-- ...
172+
`-- vctk-silence-labels # For trimming
173+
|-- ...
174+
```
## Citation & Contact
If this repository is useful for your research, please consider citing!
```bib
@article{han2022nu,
  title={NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates},
  author={Han, Seungu and Lee, Junhyeok},
  journal={arXiv preprint arXiv:2206.08545},
  year={2022}
}
```
If you have a question or any kind of inquiries, please contact Seungu Han at [[email protected]](mailto:[email protected])
