Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

conformer test script in example output none #220

Open
LeonKennedy opened this issue Aug 30, 2021 · 5 comments
Open

conformer test script in example output none #220

LeonKennedy opened this issue Aug 30, 2021 · 5 comments

Comments

@LeonKennedy
Copy link

hi, thank you developed this great toolkit and open source.but i have some problem and cant solve by myself.

i run train.py in examples/conforme.After 20 epoches, loss reduce to 6.92。 but i get none from test.py,so wer and cer is all 1. (none means greedy and beamsearch all empty).

i think i was follow the instruction in readme.

python version:3.9
tf version: 2.5
rnnloss in tf
use characters

@LeonKennedy
Copy link
Author

config.yml :

speech_config:
sample_rate: 16000
frame_ms: 25
stride_ms: 10
num_feature_bins: 80
feature_type: log_mel_spectrogram
preemphasis: 0.97
normalize_signal: True
normalize_feature: True
normalize_per_frame: False

decoder_config:
vocabulary: /mnt/data/conformer/alphabet.txt
target_vocab_size: 1000
max_subword_length: 10
blank_at_zero: True
beam_width: 0
norm_score: True
corpus_files:
- /mnt/data/conformer/corpus

model_config:
name: conformer
encoder_subsampling:
type: conv2d
filters: 144
kernel_size: 3
strides: 2
encoder_positional_encoding: sinusoid
encoder_dmodel: 144
encoder_num_blocks: 16
encoder_head_size: 36
encoder_num_heads: 4
encoder_mha_type: relmha
encoder_kernel_size: 32
encoder_fc_factor: 0.5
encoder_dropout: 0.1
prediction_embed_dim: 320
prediction_embed_dropout: 0
prediction_num_rnns: 1
prediction_rnn_units: 320
prediction_rnn_type: lstm
prediction_rnn_implementation: 2
prediction_layer_norm: True
prediction_projection_units: 0
joint_dim: 320
prejoint_linear: True
joint_activation: tanh
joint_mode: add

learning_config:
train_dataset_config:
use_tf: True
augmentation_config:
feature_augment:
time_masking:
num_masks: 10
mask_factor: 100
p_upperbound: 0.05
freq_masking:
num_masks: 1
mask_factor: 27
data_paths:
- /mnt/data/conformer/train.tsv
tfrecords_dir: /mnt/data/conformer/tfrecords_1030
shuffle: True
cache: True
buffer_size: 100
drop_remainder: True
stage: train

eval_dataset_config:
use_tf: True
data_paths:
- /mnt/data/conformer/test.tsv
tfrecords_dir: /mnt/data/conformer/tfrecords_1031
shuffle: False
cache: True
buffer_size: 100
drop_remainder: True
stage: eval

test_dataset_config:
use_tf: False
data_paths:
- /mnt/data/conformer/test.tsv
tfrecords_dir: null
shuffle: False
cache: True
buffer_size: 100
drop_remainder: True
stage: test

optimizer_config:
warmup_steps: 40000
beta_1: 0.9
beta_2: 0.98
epsilon: 1e-9

running_config:
verbose: 2
batch_size: 8
num_epochs: 20
checkpoint:
filepath: /mnt/data/conformer/checkpoints/{epoch:02d}.h5
save_best_only: True
save_weights_only: True
save_freq: epoch
states_dir: /mnt/data/conformer/states
tensorboard:
log_dir: /mnt/data/conformer/tensorboard
histogram_freq: 1
write_graph: True
write_images: True
update_freq: epoch
profile_batch: 2

@LeonKennedy
Copy link
Author

only one 2080ti / 10G
train log:

2021-08-27 17:25:35.729887: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Please provide a TPU Name to connect to.
2021-08-27 17:25:36.701098: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-08-27 17:25:36.763453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:b3:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.73GiB deviceMemoryBandwidth: 573.69GiB/s
2021-08-27 17:25:36.763954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
pciBusID: 0000:17:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.695GHz coreCount: 30 deviceMemorySize: 5.79GiB deviceMemoryBandwidth: 312.97GiB/s
2021-08-27 17:25:36.763979: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-08-27 17:25:36.766569: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-08-27 17:25:36.766640: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-08-27 17:25:36.767434: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-08-27 17:25:36.767659: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-08-27 17:25:36.768415: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-08-27 17:25:36.769071: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-08-27 17:25:36.769187: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-08-27 17:25:36.771323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1
INFO:tensorflow:Run on 1 Physical GPUs
2021-08-27 17:25:36.771845: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-27 17:25:36.773595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:b3:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.73GiB deviceMemoryBandwidth: 573.69GiB/s
2021-08-27 17:25:36.774851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-08-27 17:25:36.774887: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-08-27 17:25:37.270748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-27 17:25:37.270789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-08-27 17:25:37.270797: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-08-27 17:25:37.272713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8919 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:b3:00.0, compute capability: 7.5)
WARNING:tensorflow:NCCL is not supported when using virtual GPUs, fallingback to reduction to one device
WARNING:tensorflow:Collective ops is not configured at program startup. Some performance features may not be enabled.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
INFO:tensorflow:Use RNNT loss in TensorFlow
INFO:tensorflow:Use characters ...
INFO:tensorflow:Reading /mnt/data/conformer/train.tsv ...
2021-08-27 17:25:41.096444: I tensorflow_io/core/kernels/cpu_check.cc:128] Your CPU supports instructions that this TensorFlow IO binary was not compiled to use: AVX2 AVX512F FMA
WARNING:tensorflow:From /home/hello/anaconda3/envs/tf2_5/lib/python3.9/site-packages/tensorflow/python/ops/parallel_for/pfor.py:2380: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version.
Instructions for updating:
The validate_indices argument has no effect. Indices are always validated on CPU and never validated on GPU.
WARNING:tensorflow:Using a while_loop for converting IO>AudioResample
INFO:tensorflow:Reading /mnt/data/conformer/test.tsv ...
WARNING:tensorflow:Using a while_loop for converting IO>AudioResample
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).

....

Epoch 15/20
6250/6250 - 2291s - loss: 6.9487 - val_loss: 6.9367
Epoch 16/20
6250/6250 - 2272s - loss: 6.9390 - val_loss: 6.9335
Epoch 17/20
6250/6250 - 2272s - loss: 6.9364 - val_loss: 6.9258
Epoch 18/20
6250/6250 - 2273s - loss: 6.9333 - val_loss: 6.9232
Epoch 19/20
6250/6250 - 2271s - loss: 6.9280 - val_loss: 6.9276
Epoch 20/20
6250/6250 - 2269s - loss: 6.9253 - val_loss: 6.9219

@LeonKennedy
Copy link
Author

i used voice data collected by myself . the data have been certificated by deepspeech another toolkit.

here is test output file:
greedy and beamsearch is empty
截屏2021-08-30 下午6 08 05

@torthefirst
Copy link

I'm seeing the same result :0

Config.yml:

speech_config:
sample_rate: 16000
frame_ms: 25
stride_ms: 10
num_feature_bins: 80
feature_type: log_mel_spectrogram
preemphasis: 0.97
normalize_signal: True
normalize_feature: True
normalize_per_frame: False

decoder_config:
vocabulary: /home/sl-t01/skavenger.agent-2.3/nn/speech/conformer/librispeech_train_4_4076.subwords
target_vocab_size: 1000
max_subword_length: 10
blank_at_zero: True
beam_width: 0
norm_score: True
corpus_files:
- /home/sl-t01/skavenger.agent-2.3/nn/speech/conformer/train/transcripts.tsv

model_config:
name: conformer
encoder_subsampling:
type: conv2d
filters: 144
kernel_size: 3
strides: 2
encoder_positional_encoding: sinusoid
encoder_dmodel: 144
encoder_num_blocks: 16
encoder_head_size: 36
encoder_num_heads: 4
encoder_mha_type: relmha
encoder_kernel_size: 32
encoder_fc_factor: 0.5
encoder_dropout: 0.1
prediction_embed_dim: 320
prediction_embed_dropout: 0
prediction_num_rnns: 1
prediction_rnn_units: 320
prediction_rnn_type: lstm
prediction_rnn_implementation: 2
prediction_layer_norm: True
prediction_projection_units: 0
joint_dim: 320
prejoint_linear: True
joint_activation: tanh
joint_mode: add

learning_config:
train_dataset_config:
use_tf: True
augmentation_config:
feature_augment:
time_masking:
num_masks: 10
mask_factor: 100
p_upperbound: 0.05
freq_masking:
num_masks: 1
mask_factor: 27
data_paths:
- /home/sl-t01/skavenger.agent-2.3/nn/speech/conformer/train/transcripts.tsv
tfrecords_dir: null
shuffle: True
cache: True
buffer_size: 100
drop_remainder: True
stage: train

eval_dataset_config:
use_tf: True
data_paths:
- /home/sl-t01/skavenger.agent-2.3/nn/speech/conformer/eval/transcripts.tsv
tfrecords_dir: null
shuffle: False
cache: True
buffer_size: 100
drop_remainder: True
stage: eval

test_dataset_config:
use_tf: True
data_paths:
- /home/sl-t01/skavenger.agent-2.3/nn/speech/conformer/test/transcripts.tsv
tfrecords_dir: null
shuffle: False
cache: True
buffer_size: 100
drop_remainder: True
stage: test

optimizer_config:
warmup_steps: 40000
beta_1: 0.9
beta_2: 0.98
epsilon: 1e-9

running_config:
batch_size: 32
num_epochs: 1
checkpoint:
filepath: /home/sl-t01/skavenger.agent-2.3/nn/speech/conformer/checkpoints/{epoch:02d}.h5
save_best_only: True
save_weights_only: True
save_freq: epoch
states_dir: /home/sl-t01/skavenger.agent-2.3/nn/speech/conformer/states
tensorboard:
log_dir: /home/sl-t01/skavenger.agent-2.3/nn/speech/conformer/tensorboard
histogram_freq: 1
write_graph: True
write_images: True
update_freq: epoch
profile_batch: 32

@jinmo-yang
Copy link

I have a same problem. Any updates on this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants