Deep Dive into MS MARCO Web Search: Unpacking Dataset Characteristics

29 Jun 2025

Explore a comprehensive analysis of the MS MARCO Web Search dataset, detailing its multilingual distribution and significant data skew.

Crafting Real-World Queries: MS MARCO Web Search's Authentic Data

29 Jun 2025

Discover how MS MARCO Web Search selects and labels millions of real queries from Bing search logs.

Introducing MS MARCO Web Search: A New Era for LLM and IR Data

28 Jun 2025

Meet MS MARCO Web Search, the first large-scale, authentic, information-rich web dataset with millions of clicked query-document labels.

Why New Datasets are Needed for Deep Learning-Enhanced IR

28 Jun 2025

This section critiques existing information retrieval benchmarks, noting that they lack both web-scale data and highly skewed multilingual queries.

Challenges in Web-Scale Information Retrieval: From Keywords to Embeddings

27 Jun 2025

Explore the evolution of web-scale information retrieval, from the limitations of keyword matching to advances in embedding-based retrieval.

MS MARCO Web Search: Powering Next-Gen Information Access & Neural Indexers

27 Jun 2025

The MS MARCO Web Search dataset provides real-world web data to mitigate LLM hallucination and update challenges, fostering research in neural indexers.

16 Best Sklearn Datasets for Building Machine Learning Models

15 Apr 2023

Sklearn datasets ship with the scikit-learn (sklearn) library, so they are available as soon as the library is installed.

11 Torchvision Datasets for Computer Vision You Need to Know

26 Mar 2023

With torchvision datasets, developers can train and test their machine learning models on a range of tasks, such as image classification and object detection.

15 Excel Datasets for Data Analytics Beginners

19 Mar 2023

Excel is an indispensable tool for data manipulation, data visualization, and statistical analysis. Here are 15 Excel datasets for data analytics beginners.