ESPnet: end-to-end speech processing toolkit

ESPnet is an end-to-end speech processing toolkit covering end-to-end speech recognition, text-to-speech, speech translation, speech enhancement, speaker diarization, spoken language understanding, and so on. ESPnet uses PyTorch as its deep learning engine and follows Kaldi-style data processing, feature extraction/format, and recipes to provide a complete setup for various speech processing experiments.

- Supports a number of ASR recipes (WSJ, Switchboard, CHiME-4/5, Librispeech, TED, CSJ, AMI, HKUST, Voxforge, REVERB, etc.).
- Supports a number of TTS recipes in a manner similar to the ASR recipes (LJSpeech, LibriTTS, M-AILABS, etc.).
- Supports a number of ST recipes (Fisher-CallHome Spanish, Libri-trans, IWSLT'18, How2, Must-C, Mboshi-French, etc.).
- Supports a number of MT recipes (IWSLT'14, IWSLT'16, the above ST recipes, etc.).
- Supports a number of SLU recipes (CATSLU-MAPS, FSC, Grabo, IEMOCAP, JDCINAL, SNIPS, SLURP, SWBD-DA, etc.).
- Supports a number of SE/SS recipes (DNS-IS2020, LibriMix, SMS-WSJ, VCTK-noisyreverb, WHAM!, WHAMR!, WSJ-2mix, etc.).
- Supports a voice conversion recipe (VCC2020 baseline).
- Supports a speaker diarization recipe (mini_librispeech, librimix).
- Supports a singing voice synthesis recipe (ofuton_p_utagoe_db).

- State-of-the-art performance on several ASR benchmarks (comparable or superior to hybrid DNN/HMM and CTC).
- Hybrid CTC/attention based end-to-end ASR.
  - Fast and accurate training with CTC/attention multitask training.
  - CTC/attention joint decoding to boost monotonic alignment decoding.
  - Encoder: VGG-like CNN + BiRNN (LSTM/GRU), sub-sampling BiRNN (LSTM/GRU), Transformer, Conformer, Branchformer, or E-Branchformer.
  - Decoder: RNN (LSTM/GRU), Transformer, or S4.
- Attention: dot product, location-aware attention, variants of multi-head.
- Incorporates RNNLM/LSTMLM/TransformerLM/N-gram language models trained only with text data.
- Custom encoder and decoder supporting Transformer, Conformer (encoder), 1D Conv / TDNN (encoder), and causal 1D Conv (decoder) blocks.
- VGG2L (RNN/custom encoder) and Conv2D (custom encoder) bottlenecks.
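
The hybrid CTC/attention features listed above (multitask training, joint decoding, and an external language model) are usually described as weighted interpolations of log-probabilities. The following is a minimal sketch of that idea in plain Python, not ESPnet's actual implementation: the `Hypothesis` class, the function names, and the example scores are all hypothetical, and the weights `ctc_weight` and `lm_weight` are illustrative values.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """Hypothetical container for one decoding candidate."""
    text: str
    att_logp: float        # log-probability from the attention decoder
    ctc_logp: float        # log-probability from the CTC branch
    lm_logp: float = 0.0   # log-probability from a text-only language model


def multitask_loss(ctc_loss: float, att_loss: float, ctc_weight: float = 0.3) -> float:
    """Training objective: interpolate the CTC and attention losses."""
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


def joint_score(hyp: Hypothesis, ctc_weight: float = 0.3, lm_weight: float = 0.0) -> float:
    """Decoding score: interpolate CTC and attention scores, then add the LM term."""
    return (
        ctc_weight * hyp.ctc_logp
        + (1.0 - ctc_weight) * hyp.att_logp
        + lm_weight * hyp.lm_logp
    )


def rescore(hyps, ctc_weight: float = 0.3, lm_weight: float = 0.0):
    """Return hypotheses sorted from best to worst joint score."""
    return sorted(
        hyps,
        key=lambda h: joint_score(h, ctc_weight, lm_weight),
        reverse=True,
    )


# Made-up scores: "A" is preferred by attention alone, but CTC penalizes it
# heavily, which is the kind of misaligned output joint decoding is meant to demote.
hyps = [
    Hypothesis("A", att_logp=-1.0, ctc_logp=-5.0),
    Hypothesis("B", att_logp=-1.5, ctc_logp=-2.0),
]
best_attention_only = rescore(hyps, ctc_weight=0.0)[0]
best_joint = rescore(hyps, ctc_weight=0.3)[0]
```

With the attention score alone the first candidate ranks highest; once the CTC score is mixed in, the candidate that both branches find plausible moves to the top.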