research
          
      
      ∙
      02/08/2023
    Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy
Despite the recent success of stochastic gradient descent in deep learni...
          
            research
          
      
      ∙
      10/03/2022
    A Non-monotonic Self-terminating Language Model
Recent large-scale neural autoregressive sequence models have shown impr...
          
            research
          
      
      ∙
      09/25/2019
    Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
In natural language processing, it has been observed recently that gener...
          
            research
          
      
      ∙
      09/29/2018