
Must-haves for the ACL 2023 deadline rush: ready-to-use Experiments

Peiyu Liu · RUC AI Box · 2022-12-14

© Author | Peiyu Liu

Affiliation | Gaoling School of Artificial Intelligence, Renmin University of China

Research interests | natural language processing, model compression


This article compiles common dataset descriptions and baseline methods from ACL 2022 papers, covering 20 papers across 14 subfields. It is also published on the AI Box Zhihu column (search for "AI Box" on Zhihu); feel free to leave comments under the column article to discuss!





This compilation has the following features:
  1. Ready to use, as a writing aid. Every paper includes a section describing its datasets, and for widely used datasets there is usually a relatively "standard" way of describing them. This article collects and organizes such descriptions from top-conference papers, which helps us learn and accumulate well-established phrasing;
  2. A comprehensive collection of datasets and baseline methods. The article covers 20 papers across 14 subfields; while completeness cannot be guaranteed, it aims to cover the vast majority of mainstream datasets. It gives readers an overview of frontier NLP tasks, and for any area of interest it lets you quickly locate baseline methods, evaluation datasets, and metrics, making it easier to start reproducing related papers and improving on them;
  3. An aid for planning experiments. There are a great many public datasets, but no one can run experiments on all of them to demonstrate a method's effectiveness. The material compiled here helps readers choose experimental tasks and datasets in a targeted way, with neither detours nor important omissions.


A previous post shared ready-to-use Abstract and Related Work material, aimed at helping readers quickly form an accurate picture of a field from others' papers; if interested, see:

"Must-haves for the ACL 2022 deadline rush: ready-to-use Abstract and Related Work"


There are already many articles that catalog datasets, but they mostly serve as dataset "resource pools", collecting official links, descriptions, and so on. This article stays true to the original intent of the "deadline-rush" series: supporting research and writing. Best of luck with your ACL 2023 submissions!


Pre-trained Language Models

[1]On the Sensitivity and Stability of Model Interpretations in NLP

Keywords: interpretability

Tasks and datasets:

  • text classification: SST-2, Yelp, AGNews

SST-2 and Yelp are sentiment classification tasks where models predict whether a review is negative (0) or positive (1). AGNews is to discriminate between world (0) and business (1) articles.
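
As a quick reference, here is a minimal sketch (not from the paper) of loading two of the classification benchmarks above with the Hugging Face `datasets` library; the Hub identifiers below are assumptions and may not match the exact splits used in the paper.

```python
# Minimal sketch: loading SST-2 and AG News with Hugging Face `datasets`.
# The Hub identifiers ("glue"/"sst2", "ag_news") are assumptions, not taken from the paper.
from datasets import load_dataset

# SST-2: binary sentiment classification (0 = negative, 1 = positive)
sst2 = load_dataset("glue", "sst2")
print(sst2["train"][0])  # {'sentence': ..., 'label': 0 or 1, 'idx': ...}

# AG News: 4-way topic classification on the Hub; the paper keeps only a binary
# world-vs-business subset, relabeled as 0/1.
ag_news = load_dataset("ag_news")
print(ag_news["train"].features["label"].names)  # ['World', 'Sports', 'Business', 'Sci/Tech']
```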

Baselines:

  • VaGrad, GradInp (gradient-based)

  • IngGrad, DeepLIFT (reference-based)

  • Occlusion, LIME (perturbation-based)

[2]Composable Sparse Fine-Tuning for Cross-Lingual Transfer

Keywords: cross-lingual fine-tuning

Tasks and datasets:

  • part-of-speech tagging (POS), dependency parsing (DP): Universal Dependencies 2.7

  • named entity recognition (NER): MasakhaNER

  • natural language inference (NLI): AmericasNLI

Baselines:

  • MAD-X (adapter-based framework)

  • BITFIT

[3]Compression of Generative Pre-trained Language Models via Quantization

Keywords: model compression

Tasks and datasets:

  • Language Modeling: WikiText2, Penn Treebank (PTB), WikiText103

The task of language modeling is to predict the probability distribution over a sequence of words.

  • Next Utterance Prediction: Persona-Chat

The task of next utterance prediction predicts the next utterance given the dialogue context. It tests the language understanding ability of generative models.

  • Abstractive Summarization: XSum

Abstractive summarization aims at generating a terse summary that captures the main ideas of the source article.
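
The language modeling entry above is typically scored with perplexity; as a reminder, the standard definition (not a detail taken from this paper) for a held-out sequence w_1, ..., w_N is:

```latex
% Perplexity of a language model p over a held-out sequence w_1, ..., w_N
\mathrm{PPL} = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p\left(w_i \mid w_{<i}\right) \right)
```

Lower perplexity means the model assigns higher probability to the held-out text.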

Baselines:

  • PACT, LSQ, LAQ

[4]AdapLeR: Speeding up Inference by Adaptive Length Reduction

Keywords: model acceleration

Tasks and datasets:

  • sentiment: SST-2, IMDB

  • paraphrase: MRPC

  • topic classification: AG’s News

  • knowledge extraction: DBpedia

  • NLI: MNLI

  • question answering: QNLI

  • hate speech: HateXplain

Baselines:

  • BERT-base (the backbone)

  • DistilBERT (static compression method)

  • PoWER-BERT, TR-BERT (length reduction methods)

[5]ABC: Attention with Bounded-memory Control

Keywords: model acceleration

Tasks and datasets:

  • Language Modeling: WikiText-103

  • Machine Translation: WMT14 EN-DE (sentence-level translation), IWSLT14 ES-EN (document-level translation)

  • Masked Language Model Finetuning: BookCorpus, English Wikipedia, OpenWebText, and RealNews (pretraining); GLUE (fine-tuning)

Baselines:

  • Linformer

[6]PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models

Keywords: prompting

Tasks and datasets:

  • sentiment analysis datasets: SST-2, SST-5, MR, CR

  • subjectivity classification: SUBJ

  • question classification: TREC

  • natural language inference: CB, RTE

  • question answering: QNLI

  • word sense disambiguation: WiC

  • paraphrase detection: MRPC, QQP

Baselines:

  • PET

  • The standard fine-tuning

[7]A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models

Keywords: prompting, multimodal

Tasks and datasets:

  • visual question answering: VQAv2, OKVQA, GQA

  • image captioning: NoCaps, Flickr30k

  • categorical learning: miniImageNet

Baselines:

  • Frozen, PICa, SimVLM, Unified VLP (zero/few-shot vision-language learners)

  • Uniter_large, Oscar, SimVLM, VinVL, Unified VLP (full fine-tuned models)

  • VL-T5_no-vqa (pre-trained without visual question answering dataset)

  • Frozen and AFHN (miniImageNet)


Representation Learning

[8]A Contrastive Framework for Learning Sentence Representations from Pairwise and Triple-wise Perspective in Angular Space

Keywords: contrastive learning

Tasks and datasets:

  • unsupervised semantic textual similarity: STS tasks 2012-2016, STS Benchmark, SICK-Relatedness

  • SentEval transfer tasks: MR, CR, SUBJ, MPQA, SST-2, TREC, MRPC

Baselines:

  • GloVe embeddings, Skip-thought, average BERT embeddings from the last layer, BERT-Flow, BERT-Whitening (representative methods)

  • ISBERT, CT-BERT, ConSERT, SimCSE (contrastive learning methods)


Machine Translation

[9]Universal Conditional Masked Language Pre-training for Neural Machine Translation

Keywords: pre-training

Tasks and datasets:

  • autoregressive neural machine translation: En-Kk, De-En, En-Tr, En-Ro, En-Et, En-Fi, En-Lv, En-De, En-Cs, En-De, En-Fr

  • non-autoregressive neural machine translation: WMT14 En-De, WMT16 En-Ro and IWSLT14 En-De

Baselines:

  • mBART, mRASP, MASS, XLM, mBERT


Information Retrieval

[10]Compact Token Representations with Contextual Quantization for Efficient Document Re-ranking

Keywords: model acceleration

Tasks and datasets:

  • passage and document ranking: MS MARCO

Baselines:

  • Choices of first-stage retrieval models: fast BM25 method, uniCOIL, ColBERT

  • Re-ranking models and quantizers compared: BECR, PreTTR, BERT-base, TILDEv2


Dialogue

[11]A Model-Agnostic Data Manipulation Method for Persona-based Dialogue Generation

Tasks and datasets:

  • PersonaChat

Baselines:

  • Back Translation (BT)

  • CVAE

  • Entropy Filter


Reasoning

[12]Generated Knowledge Prompting for Commonsense Reasoning

Tasks and datasets:

  • NumerSense

NumerSense (Lin et al., 2020) consists of numerical statements about common objects and concepts where for each sentence we need to recover a masked number word.

  • CommonsenseQA (CSQA)

CommonsenseQA (CSQA) (Talmor et al., 2019) is a 5-way multiple-choice QA dataset about common world scenarios.

  • CommonsenseQA 2.0 (CSQA2)

CommonsenseQA 2.0 (CSQA2) (Talmor et al., 2021) is a binary classification dataset where we need to judge whether commonsense statements are true or false.

  • QASC

QASC (Khot et al., 2020) is an 8-way multiple-choice QA dataset about grade school science.
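
For readers who want to try these benchmarks quickly, here is a minimal sketch (my own illustration, not the paper's code) of loading CSQA and formatting one instance as a 5-way multiple-choice prompt; the Hub identifier "commonsense_qa" is an assumption.

```python
# Minimal sketch: load CommonsenseQA and format one 5-way multiple-choice prompt.
# The Hub identifier "commonsense_qa" is an assumption, not taken from the paper.
from datasets import load_dataset

csqa = load_dataset("commonsense_qa", split="validation")
example = csqa[0]

prompt = example["question"] + "\n" + "\n".join(
    f"({label}) {text}"
    for label, text in zip(example["choices"]["label"], example["choices"]["text"])
)
print(prompt)                          # question followed by choices (A)-(E)
print("gold answer:", example["answerKey"])
```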

Knowledge generation baselines:

  • No knowledge, Random sentences, Context sentences, Template-based, Retrieval-based


Sentiment Analysis

[13]Adversarial Soft Prompt Tuning for Cross-Domain Sentiment Analysis

Tasks and datasets:

  • Amazon reviews dataset

Baselines:

  • BERT-DAAT

  • SENTIX_Fix

  • Standard fine-tuning

  • Fine-tuning + AT (adds adversarial training on top of standard fine-tuning of vanilla PLMs)

  • Prompt-tuning (Hard) (uses a manually defined template "It is [MASK]" for prompt-tuning)

  • Prompt-tuning (Hard) + AT (adds adversarial training on top of Prompt-tuning (Hard))


Simile Interpretation

[14]Can Pre-trained Language Models Interpret Similes as Smart as Human?

Tasks and datasets:

  • The Simile Property Probing Task: General Corpus, Teacher-designed Quizzes

Baselines:

  • EMB

  • Meta4meaning

  • ConScore

  • MIUWE


Multimodal

[15]Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis

Tasks and datasets:

  • TWITTER-2015, TWITTER-2017

Baselines:

  • RAN, UMT, OSCGA, RpBERT (multimodal aspect term extraction (MATE))

  • TomBERT, CapTrBERT (multimodal aspect sentiment classification (MASC))

  • SPAN, D-GCN, BART (joint aspect sentiment analysis (JASA))

  • UMT+TomBERT, OSCGA+TomBERT, UMT-collapsed, OSCGA-collapsed, RpBERT-collapsed, JML (joint multimodal aspect-sentiment analysis (JMASA))


Text Generation

[16]A Multi-Document Coverage Reward for RELAXed Multi-Document Summarization

Keywords: multi-document summarization

Datasets:

  • Multi-News

  • Wikipedia Current Events Portal (WCEP)

Baselines:

  • HiMAP, Hierarchical Transformer, GraphSum, GraphSum + RoBERTa, BART-Long (Multi-News)

  • TSR, BERTReg, Submodular+ABS, BART-WCEP-DynE-5 (WCEP)


Reading Comprehension

[17]AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension

Datasets:

  • ReClor

  • LogiQA

Baselines:

  • BERT, RoBERTa, XLNet (pre-trained language model based methods)

  • DAGN, Focal Reasoner, LReasoner


Code Understanding

[18]A Neural Network Architecture for Program Understanding Inspired by Human Behaviors

Tasks and datasets:

  • code summarization: TL-CodeSum, Java subset of CodeSearchNet

  • code clone detection: BigCloneBench 2014 (BCB), BCB-F (new dataset)

Baselines:

  • CodeNN, NCS, Rencos, CodeBERT, PLBART (code summarization)

  • CodeBERT, PLBART, ASTNN, FA-AST (code clone detection)


Information Extraction

[19]FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction

Tasks and datasets:

  • CORD

We evaluate on CORD (Park et al., 2019), which stands for the Consolidated Receipt Dataset for post-OCR parsing. The annotations are provided in 30 fine-grained semantic entities such as store name, menu price, table number, discount, etc.

  • FUNSD

FUNSD (Jaume et al., 2019) is a public dataset for form understanding in noisy scanned documents. It is a subset of the Truth Tobacco Industry Document (TTID). The dataset consists of 199 annotated forms with 9,707 entities and 31,485 word-level annotations for 4 entity types: header, question, answer, and other.

  • Payment

We use the large-scale payment data (Majumder et al., 2020) that consists of around 10K documents and 7 semantic entity labels from human annotators. The corpus comes from different vendors with different layout templates.
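
These form-extraction benchmarks are commonly reported with entity-level precision, recall, and F1. Below is a minimal sketch (my own illustration, not code released with the paper), assuming predictions and gold annotations are given per document as lists of (entity_type, entity_text) pairs.

```python
# Minimal sketch of micro-averaged entity-level P/R/F1 (an assumption about the
# evaluation protocol, not code from the paper).
from collections import Counter

def entity_f1(pred_docs, gold_docs):
    """pred_docs / gold_docs: list of documents, each a list of
    (entity_type, entity_text) tuples."""
    tp = n_pred = n_gold = 0
    for pred, gold in zip(pred_docs, gold_docs):
        pred_c, gold_c = Counter(pred), Counter(gold)
        tp += sum((pred_c & gold_c).values())  # exact (type, text) matches
        n_pred += sum(pred_c.values())
        n_gold += sum(gold_c.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Toy usage with hypothetical entities
pred = [[("store_name", "ACME Mart"), ("total_price", "12.00")]]
gold = [[("store_name", "ACME Mart"), ("total_price", "12.50")]]
print(entity_f1(pred, gold))  # (0.5, 0.5, 0.5)
```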

Baselines:

  • SPADE

  • UniLMv2

  • LayoutLMv1

  • DocFormer

  • LayoutLMv2

  • TILT



Table Processing

[20]FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining

Tasks and datasets:

  • Formula Prediction: Enron

  • Table Question Answering: HiTab

  • Cell Type Classification: DeEx

Baselines:

  • SpreadsheetCoder, TaPEx, TUTA (Formula Prediction)

  • TaPas, BERT, TaPEx, TUTA (Table Question Answering)

  • CNN^BERT, Bi-LSTM, TaBERT, TaPas, TUTA (Cell Type Classification)


References:

1. https://aclanthology.org/2022.acl-long.188

2. https://aclanthology.org/2022.acl-long.125

3. https://aclanthology.org/2022.acl-long.331

4. https://aclanthology.org/2022.acl-long.1

5. https://aclanthology.org/2022.acl-long.515

6. https://aclanthology.org/2022.acl-long.254

7. https://aclanthology.org/2022.acl-long.197

8. https://aclanthology.org/2022.acl-long.336

9. https://aclanthology.org/2022.acl-long.442

10. https://aclanthology.org/2022.acl-long.51

11. https://aclanthology.org/2022.acl-long.550

12. https://aclanthology.org/2022.acl-long.225

13. https://aclanthology.org/2022.acl-long.174

14. https://aclanthology.org/2022.acl-long.543

15. https://aclanthology.org/2022.acl-long.152

16. https://aclanthology.org/2022.acl-long.351

17. https://aclanthology.org/2022.acl-long.494

18. https://aclanthology.org/2022.acl-long.353

19. https://aclanthology.org/2022.acl-long.260

20. https://aclanthology.org/2022.acl-long.82


More recommendations

Which pre-trained model is strongest? Prompt transfer learning offers new ideas for text generation: a NAACL 2022 paper walkthrough

Diffusion models and their applications in text-to-image generation


An overview of click-through-rate prediction models on graphs


