Improving Truthfulness of Headline Generation

05/02/2020
by Kazuki Matsumaru, et al.

Most studies on abstractive summarization report ROUGE scores between system and reference summaries. However, we have a concern about the truthfulness of generated summaries: whether all facts of a generated summary are mentioned in the source text. This paper explores improving the truthfulness in headline generation on two popular datasets. Analyzing headlines generated by the state-of-the-art encoder-decoder model, we show that the model sometimes generates untruthful headlines. We conjecture that one of the reasons lies in untruthful supervision data used for training the model. In order to quantify the truthfulness of article-headline pairs, we consider the textual entailment of whether an article entails its headline. After confirming quite a few untruthful instances in the datasets, this study hypothesizes that removing untruthful instances from the supervision data may remedy the problem of the untruthful behaviors of the model. Building a binary classifier that predicts an entailment relation between an article and its headline, we filter out untruthful instances from the supervision data. Experimental results demonstrate that the headline generation model trained on filtered supervision data shows no clear difference in ROUGE scores but remarkable improvements in automatic and manual evaluations of the generated headlines.
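As an illustrative sketch, not the paper's own implementation, the filtering step could be realized with an off-the-shelf textual-entailment model: score each article-headline pair for entailment and keep only pairs above a threshold. The checkpoint name `roberta-large-mnli`, the 0.5 cutoff, and the helper functions below are assumptions for illustration, not details taken from the paper.

```python
# Illustrative sketch: filter article-headline pairs with a public NLI model,
# keeping only pairs where the article (premise) entails the headline (hypothesis).
# Model choice, threshold, and label ordering are assumptions, not the authors' setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "roberta-large-mnli"   # assumed public NLI checkpoint
ENTAILMENT_THRESHOLD = 0.5          # hypothetical cutoff

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def entailment_prob(article: str, headline: str) -> float:
    """Probability that the article entails the headline."""
    inputs = tokenizer(article, headline, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    # Label order for roberta-large-mnli: 0=contradiction, 1=neutral, 2=entailment
    return probs[2].item()

def filter_supervision_data(pairs):
    """Keep only article-headline pairs the classifier judges as entailed (truthful)."""
    return [(article, headline) for article, headline in pairs
            if entailment_prob(article, headline) >= ENTAILMENT_THRESHOLD]

if __name__ == "__main__":
    # Toy data: the second headline is not supported by its article and should be dropped.
    data = [
        ("The city council approved the new budget on Tuesday.",
         "City council approves budget"),
        ("The city council approved the new budget on Tuesday.",
         "Mayor resigns amid scandal"),
    ]
    print(filter_supervision_data(data))
```

The filtered pairs would then serve as the supervision data for training the headline generation model, in place of the full (partly untruthful) dataset.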
