
A Tour of Language Model Reasoning Abilities in 100 Papers

Chen Zhipeng | RUC AI Box | 2022-12-14
© Author | Chen Zhipeng
Affiliation | Gaoling School of Artificial Intelligence, Renmin University of China

Focusing on the reasoning abilities of language models, this article collects nearly 100 papers covering the relevant datasets, models, and methods, including high-quality work from major conferences and the arXiv preprint server. The article is also published on the AI Box column on Zhihu (search for "AI Box专栏" on Zhihu); you are welcome to leave comments under the column post for discussion!

Introduction

As one of the most active directions in artificial intelligence, natural language processing occupies an important place in both academia and industry, and language models are a core component of the field that has attracted wide attention. With continued technical progress in recent years, the capabilities of language models have grown rapidly, matching or even surpassing human performance on some tasks. This article focuses on their reasoning abilities and organizes nearly 100 papers, covering the relevant datasets, models, and methods drawn from major conferences and arXiv. Comments and discussion are welcome.

Contents

  • Reading Comprehension & Question Answering

  • Numerical Reasoning

  • Math Word Problem Reasoning

  • Code Generation

  • Pre-trained Models

  • Analysis and Discussion

1. Reading Comprehension & Question Answering

1.1 Datasets

  • SQuAD: 100,000+ Questions for Machine Comprehension of Text 【the SQuAD dataset】

  • Your Answer is Incorrect... Would you like to know why? Introducing a Bilingual Short Answer Feedback Dataset 【a dataset with detailed answer feedback】

1.2 Models and Methods

  • AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension 【incorporates graph structure into reading comprehension】

  • Deep Inductive Logic Reasoning for Multi-Hop Reading Comprehension

  • Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge 【injects commonsense knowledge into reading comprehension】

  • Lite Unified Modeling for Discriminative Reading Comprehension 【models different types of reading comprehension tasks with one unified approach】

  • Answering Open-Domain Multi-Answer Questions via a Recall-then-Verify Framework 【a two-stage approach to open-domain question answering】

  • Generated Knowledge Prompting for Commonsense Reasoning 【uses prompting to strengthen commonsense reasoning; a minimal sketch of this pattern appears at the end of this subsection】

  • Modeling Multi-hop Question Answering as Single Sequence Prediction

  • Open Domain Question Answering with A Unified Knowledge Interface 【handles open-domain QA through a data-to-text interface】

  • Program Transfer for Answering Complex Questions over Knowledge Bases 【complex question answering over knowledge bases】

  • Retrieval-guided Counterfactual Generation for QA

  • RNG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering 【uses retrieval to strengthen answer generation in KBQA】

  • Sequence-to-Sequence Knowledge Graph Completion and Question Answering

  • Simulating Bandit Learning from User Feedback for Extractive Question Answering 【learns from user feedback】

  • Subgraph Retrieval Enhanced Model for Multi-hop Knowledge Base Question Answering 【proposes a new subgraph retrieval module】

  • From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension 【applies multi-stage training to reading comprehension】

  • Language Models of Code are Few-Shot Commonsense Learners 【uses code pre-trained language models for commonsense reasoning】

  • Retrieval Augmentation for Commonsense Reasoning: A Unified Approach 【uses retrieval augmentation for commonsense reasoning】
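
The prompting-based approaches above (e.g., generated knowledge prompting) share a simple pattern: first elicit relevant statements from the model, then condition the answer on them. Below is a minimal Python sketch of that pattern, not any paper's implementation; `generate` is a hypothetical placeholder returning canned text so the example runs as-is.

```python
# Minimal sketch of generated-knowledge prompting for commonsense QA.
# `generate` is a hypothetical stub, not a real API; swap in any text-generation call.

def generate(prompt: str, n: int = 1) -> list[str]:
    """Placeholder LM call that returns canned completions so the sketch is runnable."""
    if prompt.rstrip().endswith("Answer:"):
        return ["no"] * n
    return ["Penguins are birds whose wings are adapted for swimming, not flight."] * n

def answer_with_generated_knowledge(question: str, choices: list[str]) -> str:
    # Step 1: ask the model to write short knowledge statements about the question.
    facts = generate(f"Generate a fact relevant to the question:\n{question}", n=3)
    # Step 2: prepend each statement to the question and answer conditioned on it.
    votes: dict[str, int] = {}
    for fact in facts:
        prompt = (f"Knowledge: {fact}\n"
                  f"Question: {question}\n"
                  f"Choices: {', '.join(choices)}\n"
                  f"Answer:")
        prediction = generate(prompt)[0]
        # Step 3: vote for the choice mentioned in the prediction.
        for choice in choices:
            if choice.lower() in prediction.lower():
                votes[choice] = votes.get(choice, 0) + 1
    return max(votes, key=votes.get) if votes else choices[0]

print(answer_with_generated_knowledge("Can a penguin fly?", ["yes", "no"]))  # -> "no"
```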

2. Numerical Reasoning

2.1 Datasets

  • MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data 【a dataset in the financial domain】

2.2 Models and Methods

  • ELASTIC: Numerical Reasoning with Adaptive Symbolic Compiler 【decouples symbolic operators from numerical values; a generic sketch of this idea appears at the end of this subsection】

  • Inversely Eliciting Numerical Reasoning in Language Models via Solving Linear Systems 【improves how numbers are represented in numerical reasoning】
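
A recurring trick behind work like the above is to decouple the symbolic program from the concrete numbers: literal values in the question are replaced by placeholder symbols, the model predicts a program over those symbols, and the real values are substituted back only at execution time. The sketch below illustrates that generic mapping, not the ELASTIC compiler itself; `predicted_program` stands in for hypothetical model output.

```python
import re

def mask_numbers(question: str):
    """Replace literal numbers with NUM0, NUM1, ... and remember their values."""
    values = {}
    def _repl(match: re.Match) -> str:
        symbol = f"NUM{len(values)}"
        values[symbol] = float(match.group())
        return symbol
    masked = re.sub(r"\d+(?:\.\d+)?", _repl, question)
    return masked, values

question = "A company earned 120 million in 2021 and 150 million in 2022. How much more did it earn in 2022?"
masked, values = mask_numbers(question)
# masked == "A company earned NUM0 million in NUM1 and NUM2 million in NUM3. How much more did it earn in NUM4?"

# A model trained on masked questions would emit a symbolic program, e.g.:
predicted_program = "NUM2 - NUM0"  # hypothetical model output

# Substitute the recorded values back in and evaluate (eval is fine for this toy sketch).
expression = predicted_program
for symbol, value in sorted(values.items(), key=lambda kv: len(kv[0]), reverse=True):
    expression = expression.replace(symbol, str(value))  # longest symbols first avoids NUM1/NUM10 clashes
print(expression, "=", eval(expression))  # 150.0 - 120.0 = 30.0
```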

3. Math Word Problem Reasoning

3.1 Datasets

  • Are NLP Models really able to Solve Simple Math Word Problems? 【the SVAMP English dataset】

  • Deep Neural Solver for Math Word Problems 【the Math23K Chinese dataset】

  • MAWPS: A Math Word Problem Repository 【the MAWPS English dataset】

  • A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers 【the ASDiv English dataset】

  • Training Verifiers to Solve Math Word Problems 【the GSM8K English dataset】

  • Measuring Mathematical Problem Solving With the MATH Dataset 【the MATH English dataset】

  • How Well Do Computers Solve Math Word Problems? Large-Scale Dataset Construction and Evaluation 【the Dolphin18K English dataset】

  • MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms 【the MathQA English dataset】

  • NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks 【the NumGLUE dataset】

  • Lila: A Unified Benchmark for Mathematical Reasoning 【a large-scale benchmark】

  • Unbiased Math Word Problems Benchmark for Mitigating Solving Bias 【an unbiased math word problem benchmark】

3.2 Models and Methods

  • Mapping to Declarative Knowledge for Word Problem Solving

  • Deep Neural Solver for Math Word Problems 【solves math word problems with a Seq2Seq model】

  • A Goal-Driven Tree-Structured Neural Model for Math Word Problems 【decodes the math expression as a tree; the target representation is sketched at the end of this subsection】

  • Semantically-Aligned Universal Tree-Structured Solver for Math Word Problems 【decodes the math expression as a tree with semantic alignment】

  • Graph-to-Tree Learning for Solving Math Word Problems 【encodes the problem as a graph and decodes the expression as a tree】

  • LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning 【strengthens the model's mathematical knowledge by retrieving formulas】

  • Generate & Rank: A Multi-task Framework for Math Word Problems 【a generate-then-rank approach to math word problems】

  • On the Advance of Making Language Models Better Reasoners 【combines prompting with a verifier to strengthen reasoning】

  • Tackling Math Word Problems with Fine-to-Coarse Abstracting and Reasoning 【trains the model at multiple levels of granularity】

  • Heterogeneous Line Graph Transformer for Math Word Problems 【solves math word problems with graph structure and a Transformer】

  • Improving Compositional Generalization in Math Word Problem Solving 【studies compositional generalization on math word problems】

  • Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction 【casts math word problem solving as relation extraction】

  • Learning by Fixing: Solving Math Word Problems with Weak Supervision 【math word problem solving under weak supervision】

  • UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression 【unifies the reasoning process】

  • Multi-View Reasoning: Consistent Contrastive Learning for Math Word Problem 【jointly predicts the answer from multiple equivalent expressions】

  • Solving Math Word Problem via Cooperative Reasoning induced Language Models

  • Seeking Diverse Reasoning Logic: Controlled Equation Expression Generation for Solving Math Word Problems 【uses different control tokens to guide the model toward diverse solutions】

  • Practice Makes a Solver Perfect: Data Augmentation for Math Word Problem Solvers 【improves solving ability with data augmentation】

  • HyperTree Proof Search for Neural Theorem Proving

  • NaturalProver: Grounded Mathematical Proof Generation with Language Models

  • Autoformalization with Large Language Models

  • Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers
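
Several of the solvers above (the goal-driven tree-structured model and its graph-to-tree variants, as referenced in the list) decode the solution as an expression tree emitted in prefix order rather than as a flat token string. The sketch below only shows that target representation and how it evaluates to an answer; it is not any specific paper's decoder.

```python
# Evaluate a prefix-order token sequence as an expression tree: an operator token
# opens an internal node with two children, any other token is a number leaf.
# This is the target representation used by tree-structured math word problem decoders.
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def eval_prefix(tokens):
    token = next(tokens)
    if token in OPS:
        left = eval_prefix(tokens)   # recursively build/evaluate the left subtree
        right = eval_prefix(tokens)  # then the right subtree
        return OPS[token](left, right)
    return float(token)

# Toy problem: "Tom has 3 boxes with 12 apples each and eats 5. How many remain?"
# A tree-structured decoder would emit the prefix sequence below for (3 * 12) - 5.
prefix_tokens = iter("- * 3 12 5".split())
print(eval_prefix(prefix_tokens))  # 31.0
```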

4. Code Generation

  • Multilingual Code Snippets Training for Program Translation 【applies several forms of data augmentation】

5. Pre-trained Models

  • Pretrained Language Models are Symbolic Mathematics Solvers too!

  • Large Language Models are Zero-Shot Reasoners 【uses zero-shot Chain-of-Thought prompting to strengthen the reasoning of large models; a sketch combining it with self-consistency appears at the end of this section】

  • Self-Consistency Improves Chain of Thought Reasoning in Language Models

  • Show Your Work: Scratchpads for Intermediate Computation with Language Models 【uses a scratchpad to improve the model's reasoning】

  • Autoformalization with Large Language Models

  • MWP-BERT: Numeracy-Augmented Pre-training for Math Word Problem Solving

  • CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation

  • Unsupervised Translation of Programming Languages 【strengthens pre-training with a translation objective】

  • Solving Quantitative Reasoning Problems with Language Models 【pre-trains on web pages and arXiv papers】

  • reStructured Pre-training 【unifies pre-training tasks through prompts】

  • A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level 【uses a model to generate code from the problem, then runs the code to obtain the answer】

  • MMTM: Multi-Tasking Multi-Decoder Transformer for Math Word Problems 【multi-task pre-training with a shared encoder】

  • Continual Pre-training of Language Models for Math Problem Understanding with Syntax-Aware Memory Network 【a pre-trained model built on semantic graph structure】

  • JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding 【a Chinese pre-trained model built on mathematical corpora】

  • LinkBERT: Pretraining Language Models with Document Links 【uses document links to strengthen the model's ability to connect related knowledge】

  • Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks 【replaces the intermediate steps of CoT with code; a short execution sketch appears at the end of this section】

  • CBEAF-Adapting: Enhanced Continual Pretraining for Building Chinese Biomedical Language Model 【adds a small number of parameters so a PLM can quickly be pre-trained for a new domain】

  • STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning 【the model learns from its own generated examples, including failed ones】

  • Complexity-Based Prompting for Multi-step Reasoning 【uses prompting techniques to strengthen the reasoning of large models】
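
To make two of the prompting ideas above concrete (zero-shot chain-of-thought and self-consistency, as referenced in the list), the sketch below shows the prompt construction and the majority vote over sampled answers. `sample_completion` is a hypothetical stub rather than a real model API; it returns canned reasoning chains so the example runs as-is.

```python
import random
import re
from collections import Counter

def sample_completion(prompt: str) -> str:
    """Hypothetical stand-in for a temperature-sampled LM completion."""
    canned_chains = [
        "There are 3 * 12 = 36 apples in total. Eating 5 leaves 36 - 5 = 31. The answer is 31.",
        "Each box has 12 apples and there are 3 boxes, so 36 apples; 36 - 5 = 31. The answer is 31.",
        "12 + 12 + 12 = 36, minus 5 is 30. The answer is 30.",  # an occasional faulty chain
    ]
    return random.choice(canned_chains)

def zero_shot_cot_with_self_consistency(question: str, n_samples: int = 10) -> str:
    # Zero-shot CoT: append the trigger phrase so the model spells out intermediate steps.
    prompt = f"Q: {question}\nA: Let's think step by step."
    final_answers = []
    for _ in range(n_samples):
        chain = sample_completion(prompt)               # one sampled reasoning chain
        match = re.search(r"The answer is (-?\d+)", chain)
        if match:
            final_answers.append(match.group(1))        # keep only the final answer
    # Self-consistency: marginalize over reasoning paths with a majority vote.
    return Counter(final_answers).most_common(1)[0][0]

question = "Tom has 3 boxes with 12 apples each and eats 5. How many apples remain?"
print(zero_shot_cot_with_self_consistency(question))  # usually "31"
```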
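
The program-of-thoughts entry above replaces verbal intermediate steps with executable code whose output is taken as the answer. A minimal sketch of the execution step only, assuming the model has already produced the hypothetical program string shown below:

```python
# Minimal sketch of program-of-thoughts style answering: the model writes code
# instead of a natural-language chain, and the runtime computes the final answer.
generated_program = """
boxes = 3
apples_per_box = 12
eaten = 5
answer = boxes * apples_per_box - eaten
"""  # hypothetical model output for "3 boxes with 12 apples each, 5 eaten; how many remain?"

namespace: dict = {}
exec(generated_program, namespace)  # in practice this should run in a sandboxed interpreter
print(namespace["answer"])          # 31
```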

6. Analysis and Discussion

  • Language models show human-like content effects on reasoning 【discusses the limits of model reasoning abilities】

  • ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models 【analyzes the various abilities of pre-trained language models】

  • Exploring Length Generalization in Large Language Models 【analyzes length generalization in model reasoning】

  • Why are NLP Models Fumbling at Elementary Math? A Survey of Deep Learning based Word Problem Solvers 【a survey of math word problem solving】

  • What Makes Reading Comprehension Questions Difficult? 【analyzes model performance on reading comprehension tasks requiring different types of reasoning】

  • How Do We Answer Complex Questions: Discourse Structure of Long-form Answers 【analyzes how models handle long-form answers】

  • Do Language Models Understand Measurements? 【analyzes language models' understanding of numerical values】

  • Investigating Math Word Problems using Pretrained Multilingual Language Models 【investigates how well multilingual pre-trained language models solve math word problems】

  • Limitations of Language Models in Arithmetic and Symbolic Induction 【limitations of models on reasoning tasks】

  • A Systematic Investigation of Commonsense Knowledge in Large Language Models 【evaluates the commonsense reasoning ability of language models】
