Adversarial Self-Attention for Language Understanding

06/25/2022
by Hongqiu Wu, et al.

An ideal language system should generalize well and remain robust when adapting to diverse scenarios. Unfortunately, recent pre-trained language models (PrLMs) mostly achieve higher performance by stacking ever more parameters onto an already over-parameterized Transformer architecture. This paper therefore proposes the Adversarial Self-Attention mechanism (ASA), which adversarially reconstructs the Transformer attention and makes the model learn from contaminated model structures, coupled with a fast and simple implementation for better PrLM building. We conduct comprehensive evaluations across a wide range of tasks at both the pre-training and fine-tuning stages. In pre-training, ASA yields remarkable performance gains over regular training across longer training periods. In fine-tuning, ASA-empowered models consistently outperform naive models by a large margin in terms of both generalization and robustness.
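The abstract only sketches the mechanism at a high level. As a rough illustration of the underlying min-max idea (an adversary masks attention links to "contaminate" the structure, and the model is then trained against that worst-case mask), here is a minimal PyTorch sketch. All names (`MaskedSelfAttention`, `TinyClassifier`, `train_step`), the sigmoid relaxation, and the sign-gradient ascent are assumptions made for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedSelfAttention(nn.Module):
    """Single-head self-attention whose links can be suppressed by an
    externally supplied adversarial logit mask (illustrative only)."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x, adv_logits=None):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) * self.scale
        if adv_logits is not None:
            # Relaxed binary mask in (0, 1); low values cut attention links
            # ("contaminating" the structure) while gradients still flow
            # back to the adversary's logits.
            scores = scores + torch.log(torch.sigmoid(adv_logits) + 1e-9)
        attn = scores.softmax(dim=-1)
        return self.out(attn @ v)

class TinyClassifier(nn.Module):
    """Toy sequence classifier built on the maskable attention layer."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.attn = MaskedSelfAttention(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x, adv_logits=None):
        return self.head(self.attn(x, adv_logits).mean(dim=1))

def train_step(model, opt, x, y, adv_lr=0.5, adv_iters=2):
    """One min-max step: the adversary picks mask logits that maximize the
    loss, then the model is updated on the masked forward pass."""
    b, n, _ = x.shape
    adv = torch.zeros(b, n, n, requires_grad=True)
    for _ in range(adv_iters):                       # inner maximization
        loss = F.cross_entropy(model(x, adv), y)
        grad, = torch.autograd.grad(loss, adv)
        adv = (adv + adv_lr * grad.sign()).detach().requires_grad_(True)
    opt.zero_grad()                                  # outer minimization
    loss = F.cross_entropy(model(x, adv.detach()), y)
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage on random data.
model = TinyClassifier(dim=32, num_classes=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(4, 10, 32), torch.randint(0, 2, (4,))
print(train_step(model, opt, x, y))
```

The key design choice in this sketch is that the adversary's budget and relaxation are kept deliberately simple (a few sign-gradient ascent steps on a soft mask); the actual ASA formulation in the paper may constrain or parameterize the mask differently.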
