Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models

12/28/2019
by Abhinav Garg, et al.

In this paper, we propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models. A three-stage training scheme based on three levels of architectural granularity, namely the character encoder, the byte pair encoding (BPE) based encoder, and the attention decoder, is proposed. In addition, multi-task learning based on two levels of linguistic granularity, character and BPE, is used. We explore different pre-training strategies for the encoders, including transfer learning from a bidirectional encoder. Our encoder-decoder models with online attention show 35% and 10% relative improvement over their baselines for the smaller and bigger models, respectively. After fusion with a long short-term memory (LSTM) based external language model (LM), our models achieve word error rates (WER) of 5.04% and 4.48% on the Librispeech test-clean data for the smaller and bigger models, respectively.
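To make the recipe concrete, below is a minimal PyTorch sketch of the stacked character/BPE encoders, the two-granularity multi-task loss, and a per-stage freezing schedule. All layer sizes, loss weights, vocabulary sizes, and the freezing schedule are illustrative assumptions rather than the paper's exact configuration, and the stage-three online attention decoder is only described in comments.

```python
import torch
import torch.nn as nn

class StackedEncoder(nn.Module):
    """Character-level encoder (lower LSTM stack) with a BPE-level
    encoder stacked on top; each granularity has its own CTC head
    for the two-level multi-task supervision. Sizes are assumptions."""
    def __init__(self, n_mels=80, hidden=320, n_chars=30, n_bpe=1000):
        super().__init__()
        self.char_enc = nn.LSTM(n_mels, hidden, num_layers=3, batch_first=True)
        self.char_head = nn.Linear(hidden, n_chars + 1)  # +1 for CTC blank
        self.bpe_enc = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.bpe_head = nn.Linear(hidden, n_bpe + 1)

    def forward(self, feats):
        h_char, _ = self.char_enc(feats)   # (N, T, hidden)
        h_bpe, _ = self.bpe_enc(h_char)
        return self.char_head(h_char), self.bpe_head(h_bpe)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)  # targets must avoid id 0

def multitask_loss(model, feats, feat_lens, char_tgt, char_lens,
                   bpe_tgt, bpe_lens, w_char=0.3, w_bpe=0.7):
    """Weighted sum of character- and BPE-level CTC losses
    (the weights here are illustrative)."""
    char_logits, bpe_logits = model(feats)
    # nn.CTCLoss expects (T, N, C) log-probabilities.
    l_char = ctc(char_logits.log_softmax(-1).transpose(0, 1),
                 char_tgt, feat_lens, char_lens)
    l_bpe = ctc(bpe_logits.log_softmax(-1).transpose(0, 1),
                bpe_tgt, feat_lens, bpe_lens)
    return w_char * l_char + w_bpe * l_bpe

def set_stage(model, stage):
    """Stage 1: train the character encoder alone.
    Stage 2: train the BPE stack on top of a frozen character encoder.
    Stage 3 (not shown) attaches the online attention decoder and
    fine-tunes; the encoders may also be initialized from a
    bidirectional model via transfer learning."""
    for p in model.parameters():
        p.requires_grad = True
    if stage == 1:
        for p in model.bpe_enc.parameters():
            p.requires_grad = False
        for p in model.bpe_head.parameters():
            p.requires_grad = False
    elif stage == 2:
        for p in model.char_enc.parameters():
            p.requires_grad = False

# Example: a stage-1 training step on random data (smoke test).
model = StackedEncoder()
set_stage(model, 1)
feats = torch.randn(4, 120, 80)
feat_lens = torch.full((4,), 120, dtype=torch.long)
char_tgt = torch.randint(1, 30, (4, 20))
char_lens = torch.full((4,), 20, dtype=torch.long)
bpe_tgt = torch.randint(1, 1000, (4, 10))
bpe_lens = torch.full((4,), 10, dtype=torch.long)
loss = multitask_loss(model, feats, feat_lens, char_tgt, char_lens,
                      bpe_tgt, bpe_lens, w_char=1.0, w_bpe=0.0)
loss.backward()
```

Freezing the lower stack between stages keeps the character-level representation stable while the BPE encoder and, later, the attention decoder are trained, which is one plausible reading of the staged recipe the abstract describes.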
