Intro

MS MARCO (Microsoft MAchine Reading COmprehension) is a widely used benchmark for research in natural language processing, information retrieval, and question-answering (QA) systems.

MS MARCO began with a paper released at NIPS 2016 and is now a collection of datasets focused on deep learning in search.

The first dataset was a question-answering dataset featuring 100,000 real Bing questions, each with a human-generated answer. Since then, we have released a 1,000,000-question dataset, a natural language generation dataset, a passage ranking dataset, a keyphrase extraction dataset, a crawling dataset, and a conversational search dataset.