
Deep Dive into MS MARCO Web Search: Unpacking Dataset Characteristics
29 Jun 2025
Explore a comprehensive analysis of the MS MARCO Web Search dataset, detailing its multilingual distribution and significant data skew.

Crafting Real-World Queries: MS MARCO Web Search's Authentic Data
29 Jun 2025
Discover how MS MARCO Web Search selects and labels millions of real queries from Bing search logs.

Introducing MS MARCO Web Search: A New Era for LLM and IR Data
28 Jun 2025
Witness the arrival of MS MARCO Web Search, the first large-scale, authentic, and information-rich web dataset with millions of clicked query-document labels.

Why New Datasets are Needed for Deep Learning-Enhanced IR
28 Jun 2025
This section critiques existing information retrieval benchmarks, noting their lack of web-scale data and highly skewed multilingual queries.

Challenges in Web-Scale Information Retrieval: From Keywords to Embeddings
27 Jun 2025
Explore the evolution of web-scale information retrieval, detailing the limitations of keyword matching and advancements in embedding-based retrieval.

MS MARCO Web Search: Powering Next-Gen Information Access & Neural Indexers
27 Jun 2025
The MS MARCO Web Search dataset provides real-world web data to mitigate LLM hallucination and update challenges, fostering research in neural indexers.

16 Best Sklearn Datasets for Building Machine Learning Models
15 Apr 2023
Sklearn datasets are bundled with the scikit-learn (sklearn) library, so they are available as soon as it is installed.

11 Torchvision Datasets for Computer Vision You Need to Know
26 Mar 2023
With torchvision datasets, developers can train and test their machine learning models on a range of tasks, such as image classification and object detection.

15 Excel Datasets for Data Analytics Beginners
19 Mar 2023
Excel is an indispensable tool for data manipulation, data visualization and statistical analysis. These are 15 Excel datasets for data analytics beginners.