
References for Web-Scale Information Retrieval Challenges
2 Jul 2025
A list of scholarly references at the intersection of deep learning in information retrieval, large-scale approximate nearest neighbor search

Navigating Skew: Addressing Language & Domain Biases in Web Data
2 Jul 2025
Explore the challenges posed by highly skewed language and topic distributions in web data, acknowledging potential model biases

Mind the Gap: End-to-End Quality Drop with ANN in Web Search AI
2 Jul 2025
Discover how integrating ANN indices leads to a substantial drop in final retrieval quality compared to brute-force search

From Embeddings to ANN: Practical Performance on MS MARCO Web Search
1 Jul 2025
Dive into the practical evaluation of embedding models and ANN algorithms on MS MARCO Web Search, revealing insights into real-world search system behavior.

Measuring Search Excellence: Result Quality and System Performance
1 Jul 2025
Explore the robust evaluation framework for MS MARCO Web Search baselines, covering both result quality and system performance under resource constraints.

Establishing Baselines: MS MARCO Web Search's Foundational Methods
1 Jul 2025
Explore the cutting-edge embedding models and disk-based ANN algorithms selected as initial baselines for the new MS MARCO search benchmark.

MS MARCO Web Search: Unveiling Initial Benchmark Results
1 Jul 2025
Explore the foundational benchmark results on the MS MARCO Web Search 100M dataset, featuring state-of-the-art embedding models

Unlocking Web Search AI: MS MARCO's Three Grand Challenges
1 Jul 2025
Discover how MS MARCO Web Search sparks new research, posing formidable challenges in large-scale embedding model generalization

Deep Dive into MS MARCO Web Search: Unpacking Dataset Characteristics
29 Jun 2025
Explore a comprehensive analysis of the MS MARCO Web Search dataset, detailing its multilingual distribution and significant data skew