Changes from all commits (27 commits)
85e1152  update hubert and mert in opencpop (South-Twilight, Aug 28, 2023)
be6e500  update opencpop hubert (in progress) (South-Twilight, Aug 29, 2023)
121a151  add f0 in hubert (WIP) (South-Twilight, Sep 5, 2023)
95160c2  update gitignore (South-Twilight, Sep 5, 2023)
d68da20  discrete unit: add process of mfcc (South-Twilight, Sep 5, 2023)
2b4db7e  merge: update merge (South-Twilight, Sep 5, 2023)
c39020d  to(1): update base info (South-Twilight, Sep 5, 2023)
5e0e580  sync hubert/mert/mfcc (South-Twilight, Sep 6, 2023)
6c27921  sync with remote (2023.9.6) (South-Twilight, Sep 6, 2023)
8a8dc8f  feat: add f0 (South-Twilight, Sep 14, 2023)
f601448  fix merge conflict: merge add_f0 (South-Twilight, Sep 14, 2023)
bd56257  update run.sh (South-Twilight, Sep 14, 2023)
a10a575  feat: use pretrain feature embedding as input (for top-line) (South-Twilight, Sep 14, 2023)
94add27  feat: merge f0 and use pretrain feature (South-Twilight, Sep 14, 2023)
1c0f192  feat: use different layer of pretrain model (South-Twilight, Sep 18, 2023)
42f8aac  feat(egs/opencpop/hubert_voc1, hifigan.py): (South-Twilight, Oct 23, 2023)
8a9b948  fix(hifigan.py: DiscreteSymbolF0Generator) (South-Twilight, Oct 25, 2023)
56ecd19  feat(hubert_voc): add evaluate MCD (South-Twilight, Oct 26, 2023)
c4c3478  fix(hubert_voc/conf): (South-Twilight, Oct 31, 2023)
9850df5  Merge remote-tracking branch 'origin/master' (South-Twilight, Oct 31, 2023)
f805522  fix(hubert_voc1/run.sh): (South-Twilight, Oct 31, 2023)
b27b05b  Merge remote-tracking branch 'origin/master' (South-Twilight, Oct 31, 2023)
e3359da  fix(hubert/voc1): update the way to calculate f0 (South-Twilight, Nov 13, 2023)
f7e9c96  Merge branch 'master' of https://github.com/South-Twilight/ParallelWa… (South-Twilight, Nov 13, 2023)
7427151  feat(egs/opencpop/hubert_voc1): add multi-stream layer and fix some b… (South-Twilight, Dec 23, 2023)
f7249ed  refactor(egs/*/hubert_voc1->egs/*/token_voc1): refactor hubert_voc to… (South-Twilight, Feb 2, 2024)
6107aaf  Merge branch 'kan-bayashi:master' into master (South-Twilight, Feb 2, 2024)
3 changes: 3 additions & 0 deletions .gitignore
@@ -28,6 +28,9 @@ coverage.xml*
 egs/*/*/data
 egs/*/*/downloads
 egs/*/*/dump
+egs/*/*/dump*
+egs/*/*/wav_dump
+egs/*/*/wav_dump*
 egs/*/*/exp
 egs/*/*/conf/tuning
2 changes: 1 addition & 1 deletion egs/csd/voc1/run.sh
@@ -18,7 +18,7 @@ n_jobs=8 # number of parallel jobs in feature extraction
 conf=conf/hifigan.v1.yaml

 # directory path setting
-download_dir=downloads # set the directory to your database
+download_dir="/data2/tyx/dataset" # set the directory to your database
 dumpdir=dump # directory to dump features

 # training related setting
4 changes: 2 additions & 2 deletions egs/ofuton_p_utagoe_db/voc1/run.sh
@@ -17,7 +17,7 @@ n_jobs=8 # number of parallel jobs in feature extraction
 conf=conf/hifigan.v1.yaml

 # directory path setting
-download_dir=downloads # set the directory to your database
+download_dir=/data3/tyx/dataset # set the directory to your database
 dumpdir=dump # directory to dump features

 # training related setting
@@ -33,7 +33,7 @@ checkpoint="" # checkpoint path to be used for decoding
 # shellcheck disable=SC1091
 . utils/parse_options.sh || exit 1;

-train_set="train_nodev" # name of training data directory
+train_set="train" # name of training data directory
 dev_set="dev" # name of development data directory
 eval_set="eval" # name of evaluation data directory
91 changes: 91 additions & 0 deletions egs/opencpop/token_voc1/cmd.sh
@@ -0,0 +1,91 @@
# ====== About run.pl, queue.pl, slurm.pl, and ssh.pl ======
# Usage: <cmd>.pl [options] JOB=1:<nj> <log> <command...>
# e.g.
# run.pl --mem 4G JOB=1:10 echo.JOB.log echo JOB
#
# Options:
# --time <time>: Limit the maximum time to execute.
# --mem <mem>: Limit the maximum memory usage.
# --max-jobs-run <njob>: Limit the number of parallel jobs. This is ignored for non-array jobs.
# --num-threads <nthreads>: Specify the number of CPU cores.
# --gpu <ngpu>: Specify the number of GPU devices.
# --config: Change the configuration file from default.
#
# "JOB=1:10" is used for "array jobs" and it can control the number of parallel jobs.
# The left string of "=", i.e. "JOB", is replaced by <N>(Nth job) in the command and the log file name,
# e.g. "echo JOB" is changed to "echo 3" for the 3rd job and "echo 8" for 8th job respectively.
# Note that the number must start with a positive number, so you can't use "JOB=0:10" for example.
#
# run.pl, queue.pl, slurm.pl, and ssh.pl share a unified interface that does not depend on the backend.
# These options are mapped to backend-specific options, configured by
# "conf/queue.conf" and "conf/slurm.conf" by default.
# If jobs fail, your configuration might be wrong for your environment.
#
#
# The official documentation for run.pl, queue.pl, slurm.pl, and ssh.pl:
# "Parallelization in Kaldi": http://kaldi-asr.org/doc/queue.html
# =========================================================


# Select the backend used by run.sh from "local", "stdout", "sge", "slurm", or "ssh"
cmd_backend="local"

# Local machine, without any Job scheduling system
if [ "${cmd_backend}" = local ]; then

# Used for the other scripts
export train_cmd="utils/run.pl"
# Used for "*_train.py": "--gpu" is appended optionally by run.sh
export cuda_cmd="utils/run.pl"
# Used for "*_recog.py"
export decode_cmd="utils/run.pl"

# Local machine, without any job scheduling system, logging to stdout
elif [ "${cmd_backend}" = stdout ]; then

# Used for the other scripts
export train_cmd="utils/stdout.pl"
# Used for "*_train.py": "--gpu" is appended optionally by run.sh
export cuda_cmd="utils/stdout.pl"
# Used for "*_recog.py"
export decode_cmd="utils/stdout.pl"

# "qsub" (SGE, Torque, PBS, etc.)
elif [ "${cmd_backend}" = sge ]; then
# The default setting is written in conf/queue.conf.
# You must change "-q g.q" to a queue that exists in your environment.
# To list the queue names, type "qhost -q".
# Note that to use "--gpu *", you have to set up "complex_value" for the system scheduler.

export train_cmd="utils/queue.pl"
export cuda_cmd="utils/queue.pl"
export decode_cmd="utils/queue.pl"

# "sbatch" (Slurm)
elif [ "${cmd_backend}" = slurm ]; then
# The default setting is written in conf/slurm.conf.
# You must change "-p cpu" and "-p gpu" to partitions that exist in your environment.
# To list the partition names, type "sinfo".
# You can use "--gpu *" by default for Slurm, and it is interpreted as "--gres gpu:*".
# The devices are allocated exclusively using "${CUDA_VISIBLE_DEVICES}".

export train_cmd="utils/slurm.pl"
export cuda_cmd="utils/slurm.pl"
export decode_cmd="utils/slurm.pl"

elif [ "${cmd_backend}" = ssh ]; then
# You have to create ".queue/machines" to specify the hosts on which to execute jobs.
# e.g. .queue/machines
#   host1
#   host2
#   host3
# This assumes you can log in to them without a password, i.e. you have set up SSH keys.

export train_cmd="utils/ssh.pl"
export cuda_cmd="utils/ssh.pl"
export decode_cmd="utils/ssh.pl"

else
echo "$0: Error: Unknown cmd_backend=${cmd_backend}" 1>&2
return 1
fi
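To make the dispatch interface concrete, here is a minimal sketch of running an array job through the exported variables, assuming the default "local" backend above; the log path and the echoed command are hypothetical placeholders that follow the JOB=1:<nj> usage documented at the top of this file.

```sh
# A minimal sketch, assuming the "local" backend (utils/run.pl); the log path
# and the command are hypothetical placeholders, not part of this recipe.
. ./cmd.sh
${train_cmd} --num-threads 2 JOB=1:4 exp/demo/echo.JOB.log \
    echo "running shard JOB"
```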
185 changes: 185 additions & 0 deletions egs/opencpop/token_voc1/conf/hifigan_token_16k_duration.v1.yaml
@@ -0,0 +1,185 @@
# This configuration is based on HiFi-GAN V1, derived
# from the official repository (https://github.com/jik876/hifi-gan).

# For discrete tokens from 16 kHz pre-trained models such as HuBERT, WavLM, and XLS-R.

###########################################################
# FEATURE EXTRACTION SETTING #
###########################################################
sampling_rate: 16000 # Sampling rate.
fft_size: null # FFT size.
hop_size: 320 # Hop size.
win_length: null # Window length.
# If set to null, it will be the same as fft_size.
window: null # Window function.
num_mels: 1 # Number of mel basis.
fmin: null # Minimum freq in mel basis calculation.
fmax: null # Maximum frequency in mel basis calculation.
global_gain_scale: 1.0 # Will be multiplied with the entire waveform.
trim_silence: false # Whether to trim the start and end of silence.
trim_threshold_in_db: 20 # Need to tune carefully if the recording is not good.
trim_frame_size: 1024 # Frame size in trimming.
trim_hop_size: 256 # Hop size in trimming.
format: "hdf5" # Feature file format. "npy" or "hdf5" is supported.

###########################################################
# GENERATOR NETWORK ARCHITECTURE SETTING #
###########################################################
generator_type: DiscreteSymbolDurationGenerator
generator_params:
in_channels: 1024 # Number of input channels.
out_channels: 1 # Number of output channels.
channels: 512 # Number of initial channels.
num_embs: 1024 # Number of discrete token embeddings (token vocabulary size).
kernel_size: 7 # Kernel size of initial and final conv layers.
upsample_scales: [10, 8, 2, 2] # Upsampling scales.
upsample_kernel_sizes: [20, 16, 4, 4] # Kernel size for upsampling layers.
resblock_kernel_sizes: [3, 7, 11] # Kernel size for residual blocks.
resblock_dilations: # Dilations for residual blocks.
- [1, 3, 5]
- [1, 3, 5]
- [1, 3, 5]
use_additional_convs: true # Whether to use additional conv layer in residual blocks.
bias: true # Whether to use bias parameter in conv.
nonlinear_activation: "LeakyReLU" # Nonlinear activation type.
nonlinear_activation_params: # Nonlinear activation parameters.
negative_slope: 0.1
use_weight_norm: true # Whether to apply weight normalization.
duration_layers: 2 # Number of duration predictor layers.
duration_chans: 384 # Number of duration predictor conv channels.
duration_kernel_size: 3 # Duration predictor kernel size.
duration_offset: 1.0 # Duration predictor offset.
duration_dropout_rate: 0.5 # Duration predictor dropout rate.
num_spk_embs: 0 # Number of speaker embeddings; 0 disables them for the single-speaker case.
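# Note: the product of upsample_scales, 10 * 8 * 2 * 2 = 320, matches hop_size
# above, so each input token is expanded to exactly one hop of waveform samples;
# at 16 kHz this corresponds to 16000 / 320 = 50 tokens per second, the frame
# rate of 16 kHz HuBERT-style models.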

###########################################################
# DISCRIMINATOR NETWORK ARCHITECTURE SETTING #
###########################################################
discriminator_type: HiFiGANMultiScaleMultiPeriodDiscriminator
discriminator_params:
scales: 3 # Number of multi-scale discriminator.
scale_downsample_pooling: "AvgPool1d" # Pooling operation for scale discriminator.
scale_downsample_pooling_params:
kernel_size: 4 # Pooling kernel size.
stride: 2 # Pooling stride.
padding: 2 # Padding size.
scale_discriminator_params:
in_channels: 1 # Number of input channels.
out_channels: 1 # Number of output channels.
kernel_sizes: [15, 41, 5, 3] # List of kernel sizes.
channels: 128 # Initial number of channels.
max_downsample_channels: 1024 # Maximum number of channels in downsampling conv layers.
max_groups: 16 # Maximum number of groups in downsampling conv layers.
bias: true
downsample_scales: [4, 4, 4, 4, 1] # Downsampling scales.
nonlinear_activation: "LeakyReLU" # Nonlinear activation.
nonlinear_activation_params:
negative_slope: 0.1
follow_official_norm: true # Whether to follow the official norm setting.
periods: [2, 3, 5, 7, 11] # List of period for multi-period discriminator.
period_discriminator_params:
in_channels: 1 # Number of input channels.
out_channels: 1 # Number of output channels.
kernel_sizes: [5, 3] # List of kernel sizes.
channels: 32 # Initial number of channels.
downsample_scales: [3, 3, 3, 3, 1] # Downsampling scales.
max_downsample_channels: 1024 # Maximum number of channels in downsampling conv layers.
bias: true # Whether to use bias parameter in conv layer.
nonlinear_activation: "LeakyReLU" # Nonlinear activation.
nonlinear_activation_params: # Nonlinear activation parameters.
negative_slope: 0.1
use_weight_norm: true # Whether to apply weight normalization.
use_spectral_norm: false # Whether to apply spectral normalization.
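# Note: with follow_official_norm enabled above, the first scale discriminator
# uses spectral norm and the remaining ones use weight norm, following the
# official HiFi-GAN implementation.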

###########################################################
# STFT LOSS SETTING #
###########################################################
use_stft_loss: false # Whether to use multi-resolution STFT loss.
use_duration_loss: true # Whether to use duration prediction loss (needed for the duration predictor).
duration_loss_params:
offset: 1.0
reduction: mean
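# Note: the offset of 1.0 is presumably added before taking the log of durations
# (log(d + offset)), as in FastSpeech-style duration predictors.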
use_mel_loss: true # Whether to use Mel-spectrogram loss.
mel_loss_params: # Mel-spectrogram loss parameters.
fs: 16000
fft_size: 1024
hop_size: 256
win_length: null
window: "hann"
num_mels: 80
fmin: 0
fmax: 8000
log_base: null # Log base. If set to null, use natural logarithm.
generator_adv_loss_params:
average_by_discriminators: false # Whether to average loss by #discriminators.
discriminator_adv_loss_params:
average_by_discriminators: false # Whether to average loss by #discriminators.
use_feat_match_loss: true
feat_match_loss_params:
average_by_discriminators: false # Whether to average loss by #discriminators.
average_by_layers: false # Whether to average loss by #layers in each discriminator.
include_final_outputs: true # Whether to include final outputs in feat match loss calculation.

###########################################################
# ADVERSARIAL LOSS SETTING #
###########################################################
lambda_aux: 45.0 # Loss balancing coefficient for the auxiliary (mel-spectrogram) loss.
lambda_adv: 1.0 # Loss balancing coefficient for adversarial loss.
lambda_feat_match: 2.0 # Loss balancing coefficient for feat match loss.

###########################################################
# DATA LOADER SETTING #
###########################################################
batch_size: 16 # Batch size.
batch_max_steps: 10240 # Length of each audio in batch. Make sure it is divisible by hop_size.
pin_memory: true # Whether to pin memory in PyTorch DataLoader.
num_workers: 0 # Number of workers in PyTorch DataLoader.
remove_short_samples: false # Whether to remove samples shorter than batch_max_steps.
allow_cache: true # Whether to allow caching in the dataset. If true, it requires more CPU memory.
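# Note: with batch_max_steps 10240 and hop_size 320, each batch item covers
# 10240 / 320 = 32 token frames, i.e. 10240 / 16000 = 0.64 s of audio.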

###########################################################
# OPTIMIZER & SCHEDULER SETTING #
###########################################################
generator_optimizer_type: Adam
generator_optimizer_params:
lr: 2.0e-4
betas: [0.5, 0.9]
weight_decay: 0.0
generator_scheduler_type: MultiStepLR
generator_scheduler_params:
gamma: 0.5
milestones:
- 200000
- 400000
- 600000
- 800000
generator_grad_norm: -1
discriminator_optimizer_type: Adam
discriminator_optimizer_params:
lr: 2.0e-4
betas: [0.5, 0.9]
weight_decay: 0.0
discriminator_scheduler_type: MultiStepLR
discriminator_scheduler_params:
gamma: 0.5
milestones:
- 200000
- 400000
- 600000
- 800000
discriminator_grad_norm: -1
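# Note: with gamma 0.5 and four milestones, both learning rates decay from
# 2.0e-4 to 2.0e-4 * 0.5^4 = 1.25e-5 by step 800000; a grad norm of -1
# disables gradient clipping here.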

###########################################################
# INTERVAL SETTING #
###########################################################
generator_train_start_steps: 1 # Number of steps before starting generator training.
discriminator_train_start_steps: 0 # Number of steps before starting discriminator training.
train_max_steps: 2500000 # Number of training steps.
save_interval_steps: 50000 # Interval steps to save checkpoint.
eval_interval_steps: 1000 # Interval steps to evaluate the network.
log_interval_steps: 100 # Interval steps to record the training log.

###########################################################
# OTHER SETTING #
###########################################################
num_save_intermediate_results: 4 # Number of results to be saved as intermediate results.
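For orientation, a hedged sketch of how a config like this is typically passed to the repository's training entry point; the dump and output paths below are hypothetical placeholders, and in practice the recipe's run.sh drives this invocation.

```sh
# A sketch, assuming the repository's parallel-wavegan-train entry point;
# the dump and exp paths are hypothetical placeholders.
. ./cmd.sh
${cuda_cmd} --gpu 1 exp/token_hifigan/train.log \
    parallel-wavegan-train \
        --config conf/hifigan_token_16k_duration.v1.yaml \
        --train-dumpdir dump/train/norm \
        --dev-dumpdir dump/dev/norm \
        --outdir exp/token_hifigan
```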