LLMSearchRecommender

Awesome Generative AI in Search, Recommendation, Personalization

Generative AI and LLMs for Search, Recommender, Personalization Engines

The goal of this repository is to survey and review generative AI and LLM-based methods for building large-scale search and recommender engines. See also the LLM Evaluation methods repository.

LLM and Search and Recommender Engines

Search Surveys

A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval, Mar 2025, arxiv
A Survey on Knowledge-Oriented Retrieval-Augmented Generation, Mar 2025, arxiv
From Matching to Generation: A Survey on Generative Information Retrieval, Journal Version, ACM Transactions on Information Systems, Feb 2025
Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions, Jan 2025, IEEE
Improving Recommendation Systems & Search in the Age of LLMs by Eugene Yan, Mar 2025, blog post
A Survey of Model Architectures in Information Retrieval, Jan 2025, arxiv
A Survey of Conversational Search, Oct 2024, arxiv
Large language models for generative information extraction: a survey, 2024, Frontiers of Computer Science
From Matching to Generation: A Survey on Generative Information Retrieval, Apr 2024, arxiv
Dense Text Retrieval Based on Pretrained Language Models: A Survey, Feb 2024, ACM
Retrieval-Augmented Generation for Large Language Models: A Survey, 2023, simg
Large Language Models for Information Retrieval: A Survey, Aug 2023, arxiv
Recommender Engine Surveys
A Comprehensive Survey on Cross-Domain Recommendation: Taxonomy, Progress, and Prospects, Mar 2025, arxiv
A Survey on LLM-powered Agents for Recommender Systems, Feb 2025, arxiv: RE: based on DeepSeek-R methods for training reasoning and interleaved LLMs calling search as a tool.
How Can Recommender Systems Benefit from Large Language Models: A Survey, ACM Transactions on Information Systems 2025
Graph Foundation Models for Recommendation: A Comprehensive Survey, Feb 2025, arxiv
Recommender Systems in the Era of Large Language Models (LLMs), TKDE Nov 2024 by subscription
Multimodal Pretraining, Adaptation, and Generation for Recommendation: A Survey, Jul 2024, arxiv
A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys), KDD 2024 pdf
A Comprehensive Survey on Retrieval Methods in Recommender Systems, Jul 2024, arxiv
A Survey of Generative Search and Recommendation in the Era of Large Language Models, Apr 2024, arxiv
A survey on large language models for recommendation, WWW 2024 Springer
Towards Next-Generation LLM-based Recommender Systems: A Survey and Beyond, Oct 2024, arxiv
Exploring the Impact of Large Language Models on Recommender Systems: An Extensive Review, Feb 2024, arxiv
Pre-train, Prompt, and Recommendation: A Comprehensive Survey of Language Modeling Paradigm Adaptations in Recommender Systems, Dec 2023, MIT
Conferences, Workshops
2025 SIGIR Workshop on eCommerce
RecSys
KDD 2024 Workshop on Generative AI for Recommender Systems and Personalization
CIKM 2024 1st Workshop on Multimodal Search and Recommendations
SIGIR 2024 Workshop on eCommerce ECOM24
WWW 2024 The 2nd Workshop on Recommendation with Generative Models
EACL 2024 Workshop on Personalization of Generative AI Systems
SIGIR 2024 The First Workshop on Large Language Models (LLMs) for Evaluation in Information Retrieval
SIGIR 2024 The Second Workshop on Generative Information Retrieval
Industrial conferences
Haystack
Activate
Tutorials

Software, libraries, frameworks
Nvidia Merlin Recommender systems, including Transformers4Rec
OpenP5 RecSys23 tutorial
FreshLLM and similar architectures (LLM and large scale search)
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning, Mar 2025, arxiv
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning, Mar 2025, arxiv
When Search Engine Services Meet Large Language Models: Visions and Challenges, Dec 2024, IEEE
Long-form factuality in large language models, Mar 2024, arxiv
When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively, Apr 2024, arxiv
Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training, May 2024, arxiv
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation, Oct 2023, arxiv
Gorilla: Large Language Model Connected with Massive APIs, May 2023, arxiv
Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions, Dec 2022, arxiv
Conversational Search
A Survey of Conversational Search, Oct 2024, arxiv
Engineering Conversational Search Systems: A Review of Applications, Architectures, and Functional Components, Jul 2024, arxiv
ChatRetriever: Adapting Large Language Models for Generalized and Robust Conversational Dense Retrieval, Apr 2024, arxiv
CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models, Feb 2024, arxiv
Generalizing Conversational Dense Retrieval via LLM-Cognition Data Augmentation, Feb 2024, arxiv
History-Aware Conversational Dense Retrieval, Jan 2024, arxiv
Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search, Findings of EMNLP 2023
Improving Conversational Passage Re-ranking with View Ensemble, SIGIR 2023 short paper
ConvGQR: Generative Query Reformulation for Conversational Search, May 2023, arxiv ACL 2023
Phrase Retrieval for Open-Domain Conversational Question Answering with Conversational Dependency Modeling via Contrastive Learning, arxiv Findings of ACL 2023
Curriculum Contrastive Context Denoising for Few-shot Conversational Dense Retrieval, SIGIR 2022
Open-Retrieval Conversational Question Answering, SIGIR 2020
Search Assistance

autocomplete/autosuggest and other search assistance tasks, search clarification, query recommendation and other techniques guiding users in search
Evaluating auto-complete ranking for diversity and relevance, ECIR 2025
Enhancing Discoverability in Enterprise Conversational Systems with Proactive Question Suggestions, Dec 2024, arxiv
DiAL: Diversity aware listwise ranking for query auto-complete, EMNLP 2024
Evaluation and Continual Improvement for an Enterprise AI Assistant, Jun 2024, arxiv
Generating Query Recommendations via LLMs, May 2024, arxiv
Towards Asking Clarification Questions for Information Seeking on Task-Oriented Dialogues, May 2023, arxiv
Asking Clarification Questions to Handle Ambiguity in Open-Domain QA, May 2023, arxiv
Asking Clarifying Questions in Open-Domain Information-Seeking Conversations, SIGIR 2019
Multi Turn
MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems, Jan 2025, arxiv
Task Solving
Investigating Users’ Search Behavior and Outcome with ChatGPT in Learning-oriented Search Tasks, SIGIR 2024
Mind2Web: Towards a Generalist Agent for the Web, NeurIPS 2023
Personalization
Can Large Language Models Understand Preferences in Personalized Recommendation?, Jan 2025, arxiv
Unified Embedding Based Personalized Retrieval in Etsy Search, Sep 2024, arxiv
IntentRec: Predicting User Session Intent with Hierarchical Multi-Task Learning, Jul 2024 Netflix, arxiv
LLM-based Medical Assistant Personalization with Short- and Long-Term Memory Coordination, NAACL 2024
Multi modal
Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions, Jan 2025, IEEE https://ieeexplore.ieee.org/abstract/document/10843094
RAMQA: A Unified Framework for Retrieval-Augmented Multi-Modal Question Answering, Jan 2025, arxiv
EA-VTR: Event-Aware Video-Text Retrieval, ECCV 2024
ColPali: Efficient Document Retrieval with Vision Language Models, Jun 2024, arxiv; useful practical info in the Vespa blog
UrbanCross: Enhancing Satellite Image-Text Retrieval with Cross-Domain Adaptation, MM 2024
Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond, Feb 2024, arxiv
Listen, Think, and Understand, OpenAQA dataset, May 2023, arxiv
Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering, Apr 2022, arxiv
Question Answering
CoReQA: Uncovering Potentials of Language Models in Code Repository Question Answering, Jan 2025, arxiv
Unveiling the power of language models in chemical research question answering, Jan 2025, Nature
Toward expert-level medical question answering with large language models, Jan 2025, Nature
LLM-MedQA: Enhancing Medical Question Answering through Case Studies in Large Language Models, Jan 2025, arxiv
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation, Jan 2025, arxiv
Assessing The Potential Of Mid-Sized Language Models For Clinical QA, Apr 2024, arxiv
Listen, Think, and Understand, OpenAQA dataset, May 2023, arxiv
Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering, Apr 2022, arxiv
Querying Structured Information
Querying Databases with Function Calling, Jan 2025, arxiv
RAG
A Survey on Knowledge-Oriented Retrieval-Augmented Generation, Mar 2025, arxiv
Sufficient Context: A New Lens on Retrieval Augmented Generation Systems, Google Research, ICLR 2025
Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG, Jan 2025, arxiv
In Defense of RAG in the Era of Long-Context Language Models, Sep 2024, arxiv
RAFT: Adapting Language Model to Domain Specific RAG, Jul 2024, open review
RAGAs: Automated Evaluation of Retrieval Augmented Generation, EACL Demo 2024
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?, Jun 2024, arxiv
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement, Dec 2024, arxiv
A Survey on Retrieval-Augmented Text Generation for Large Language Models, Apr 2024, arxiv
RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation, Mar 2024, arxiv
Retrieval
DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers, Meta, University of Waterloo, Feb 2025, arxiv
CAME: Competitively Learning a Mixture-of-Experts Model for First-stage Retrieval, Jan 2025, ACM
On the Robustness of Generative Information Retrieval Models: An Out-of-Distribution Perspective, Jan 2025, link
Fine-Tuning LLaMA for Multi-Stage Text Retrieval, Oct 2023, arxiv
How Does Generative Retrieval Scale to Millions of Passages?, Google Research, May 2023, arxiv
How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?, July 2024, arxiv
Ranking for Search
Cross-Encoder Rediscovers a Semantic Variant of BM25, Feb 2025, arxiv
Orbit: A framework for designing and evaluating multi-objective rankers, ACM Conference on Intelligent User Interfaces (IUI) 2025
Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation, Mar 2025, arxiv
DISKCO: Disentangling knowledge from cross-encoder to bi-encoder, WWW 2024
Adaptive Neural Ranking Framework: Toward Maximized Business Goal for Cascade Ranking Systems, WWW 2024
RankTower: A Synergistic Framework for Enhancing Two-Tower Pre-Ranking Model, Jul 2024, arxiv
Bi-CAT: Improving robustness of LLM-based text rankers to conditional distribution shifts, Amazon Science, WWW 2024 workshop
Fine-Tuning LLaMA for Multi-Stage Text Retrieval, SIGIR 2024
RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!, Dec 2023
Classical bi-encoder and cross-encoder ranking

BERT-based ranking, hybrid encoder-based ranking
A Thorough Comparison of Cross-Encoders and LLMs for Reranking SPLADE, Mar 2024
RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses, SIGIR 2023
ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction, 2022, arxiv
Pretrained Transformers for Text Ranking: BERT and Beyond, 2021, ACM
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering, 2021, arxiv
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT, 2020, arxiv
Dense Passage Retrieval for Open-Domain Question Answering, 2020, arxiv
Understanding the Behaviors of BERT in Ranking, 2019, arxiv
Passage Re-ranking with BERT, 2019, arxiv
Query Understanding
Two Heads Are Better Than One: Improving Search Effectiveness Through LLM-Generated Query Variants, 2025, RMIT University
Large Language Model based Long-tail Query Rewriting in Taobao Search, WWW 2024
Near-duplicate question detection, WWW 2024
Hierarchical query classification in e-commerce search, WWW 2024
Query Understanding in the Age of Large Language Models, Jun 2023, arxiv
Decomposing Complex Queries for Tip-of-the-tongue Retrieval, May 2023, arxiv
ConvGQR: Generative Query Reformulation for Conversational Search, May 2023, arxiv
Query Rewriting in Retrieval-Augmented Large Language Models, EMNLP 2023
Query2doc: Query Expansion with Large Language Models, Mar 2023, arxiv
Few-Shot Generative Conversational Query Rewriting, SIGIR 2020
Embedding models
Granite Embedding Models (multi-lingual embedding models from IBM), Feb 2025 arxiv
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data, Feb 2025, arxiv
SFR-Embedding from Salesforce in Salesforce blog Oct 2024
BGE-en-ICL, BGE-ICL embedding model, Making Text Embedders Few-Shot Learners, Sep 2024, arxiv
Multilingual E5 Text Embeddings: A Technical Report, Feb 2024, arxiv
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models, May 2024, from Nvidia arxiv
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation, Feb 2024, arxiv
E5-Mistral embeddings from Microsoft in Improving Text Embeddings with Large Language Models Dec 2023
Embedding models evaluation
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks, Apr 2025, arxiv
MMTEB: Massive Multilingual Text Embedding Benchmark, Feb 2025, arxiv
MTEB: Massive Text Embedding Benchmark, Oct 2022, arxiv, Leaderboard
Marqo embedding benchmark for eCommerce at Huggingface, text to image and category to image tasks
The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding, openreview pdf
MMTEB: Community driven extension to MTEB repository
Chinese MTEB C-MTEB repository
French MTEB repository
Document understanding
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion, Mar 2025, arxiv
Qwen2.5-VL Technical Report (see Section 3.3.2, Document Understanding and OCR), Feb 2025, arxiv
ColPali: Efficient Document Retrieval with Vision Language Models, Jun 2024, arxiv
Response Generation
Neural headline generation: A comprehensive survey, Mar 2025, Neurocomputing
Cite Before You Speak: Enhancing Context-Response Grounding in E-commerce Conversational LLM-Agents, Mar 2025, arxiv
Agentic Search
Open Deep Search: Democratizing Search with Open-source Reasoning Agents, Mar 2025, arxiv
A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval, Mar 2025, arxiv
Search-o1: Agentic Search-Enhanced Large Reasoning Models, Jan 2025, arxiv
Plan*RAG: Efficient Test-Time Planning for Retrieval Augmented Generation, Oct 2024, arxiv
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher, Jul 2024, arxiv
Recommender Engines
EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration, Feb 2025, arxiv
360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation, Jan 2025, arxiv
Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations, Baidu, Mar 2025, arxiv
Personalised outfit recommendation via history-aware transformers, Amazon Science, WSDM 2025
Representation Learning with Large Language Models for Recommendation, WWW 2024
Improved Estimation of Ranks for Learning Item Recommenders with Negative Sampling, Google CIKM 2024
Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach, May 2023, arxiv
Towards Open-World Recommendation with Knowledge Augmentation from Large Language Models, RecSys 2024
Data-efficient Fine-tuning for LLM-based Recommendation, SIGIR 2024
LLMRec: Large Language Models with Graph Augmentation for Recommendation, WSDM 2024
DiffKG: Knowledge Graph Diffusion Model for Recommendation, WSDM 2024
Bridging Language and Items for Retrieval and Recommendation, Mar 2024 arxiv BLAIR paper
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations, Feb 2024, arxiv
Trinity: Syncretizing Multi-/Long-tail/Long-term Interests All in One, ByteDance, Feb 2024, arxiv
Tapping the Potential of Large Language Models as Recommender Systems: A Comprehensive Framework and Empirical Analysis, Jan 2024 arxiv
Leveraging Large Language Models for Sequential Recommendation, RecSys 2023
Text Is All You Need: Learning Language Representations for Sequential Recommendation, May 2023, arxiv
On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models, Sep 2022, arxiv
Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5), Mar 2022, arxiv
Augmenting Netflix Search with In-Session Adapted Recommendations, RecSys 2022
Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations, Google RecSys 2019
Sequential Recommendation
Unsupervised Graph Embeddings for Session-based Recommendation with Item Features, Feb 2025, arxiv
TagRec: Temporal-Aware Graph Contrastive Learning with Theoretical Augmentation for Sequential Recommendation, IEEE TKDE 2025
LLMCDSR: Enhancing Cross-Domain Sequential Recommendation with Large Language Models, ACM Transactions on Information Systems 2025
Plug-In Diffusion Model for Sequential Recommendation, AAAI 2024
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations, Feb 2024, arxiv
EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration (EAGER, a two-strEAm GEnerative Recommender), KDD 2024, arxiv
Mamba4Rec: Towards Efficient Sequential Recommendation with Selective State Space Models, Mar 2024, arxiv
Leveraging Large Language Models for Sequential Recommendation, RecSys 2023
Recommender Systems with Generative Retrieval (Transformer Index for GEnerative Recommenders, TIGER), NeurIPS 2023
Efficient On-Device Session-Based Recommendation, ACM Transactions on Information Systems 2023
XLNet4Rec: Recommendations Based on Users’ Long-Term and Short-Term Interests Using Transformer, ICMLA 2023
How to Index Item IDs for Recommendation Foundation Models, P5, SIGIR 2023
Text Is All You Need: Learning Language Representations for Sequential Recommendation, May 2023, arxiv
Multi-Behavior Sequential Transformer Recommender, SIGIR 2024, arxiv
Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5), RecSys 2022
Transformers4Rec: Bridging the Gap between NLP and Sequential / Session-Based Recommendation, RecSys 2021
BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer, CIKM 2019
Self-Attentive Sequential Recommendation (SASRec), 2018, IEEE Xplore
Session-based Recommendations with Recurrent Neural Networks (GRU4Rec), 2015, arxiv
Discovery
LLMs for User Interest Exploration in Large-scale Recommendation Systems, Generative AI and Recommender Systems Workshop at KDD 2024, work by Google, pdf
Unclassified

Unclassified methods used in search engines (TODO: classify)
Translational Generative Retrieval via Potential Query Generation, ICASSP 2025
RouteLLM: Learning to Route LLMs with Preference Data, Jun 2024, arxiv
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning, Jan 2024, arxiv
A Comprehensive Study of Knowledge Editing for Large Language Models, Jan 2024, arxiv
Zero-shot Document Retrieval with Hybrid Pseudo-document Retriever, ICASSP 2025
Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach, Dec 2024, ACM
Representation Learning with Large Language Models for Recommendation, WWW 2024
Recommender Rankers
Large Language Models are Zero-Shot Rankers for Recommender Systems, Mar 2024, LLMRank, Springer
Industrial approaches
ZeroEntropy 🔎 - Advanced AI Search Over Complex Documents launch doc
Evaluation of Search engines
Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation, DeepMind, Mar 2025, arxiv
LLM-Assisted Relevance Assessments: When Should We Ask LLMs for Help?, Jan 2025, arxiv
AI Search Has A Citation Problem, Mar 2025, CJR Columbia Journalism Review
Large Language Models for Relevance Judgment in Product Search, Jul 2024, arxiv
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases, Apr 2024, arxiv
Report on the 1st Workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval 2024) at SIGIR 2024, SIGIR
Evaluation of RAG

and Question Answering and knowledge assistants and information seeking LLM based systems
MMTEB: Massive Multilingual Text Embedding Benchmark, Feb 2025, hugging face, leaderboard. Brief: 1043 languages in total, primarily in bitext mining (text pairing), but also 255 in classification, 209 in clustering, and 142 in retrieval. 550 tasks, anything from sentiment analysis, question-answering reranking, to long-document retrieval. 17 domains, like legal, religious, programming, web, social, medical, blog, academic, etc. Across this collection of tasks, we subdivide into a lot of separate benchmarks, like MTEB(eng, v2), MTEB(Multilingual, v1), MTEB(Law, v1). Our new MTEB(eng, v2) is much smaller and faster than the original English MTEB, making submissions much cheaper and simpler. (from Tom Aarsen’s LinkedIn)
MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems, Jan 2025, arxiv
RAD-Bench: Evaluating Large Language Models Capabilities in Retrieval Augmented Dialogues, Sep 2024, arxiv
IRSC: A Zero-shot Evaluation Benchmark for Information Retrieval through Semantic Comprehension in Retrieval-Augmented Generation Scenarios, Sep 2024, arxiv
Evaluating Retrieval Quality in Retrieval-Augmented Generation, Apr 2024, arxiv
RAGAS: Automated Evaluation of Retrieval Augmented Generation, Sep 2023, arxiv
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems, Nov 2023, arxiv
TREC iKAT 2023: A Test Collection for Evaluating Conversational and Interactive Knowledge Assistants, SIGIR 2024
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries, Jan 2024, arxiv
FaithDial: A Faithful Benchmark for Information-Seeking Dialogue, Dec 2022, MIT Press
Open-Retrieval Conversational Question Answering, SIGIR 2020
XOR QA: Cross-lingual Open-Retrieval Question Answering, Oct 2020, arxiv
QA Benchmarks

QA is used in many vertical domains; see the Verticals section below
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines, Mar 2025, arxiv
CoReQA: Uncovering Potentials of Language Models in Code Repository Question Answering, Jan 2025, arxiv
Unveiling the power of language models in chemical research question answering, Jan 2025, Nature, Communications Chemistry, ScholarChemQA dataset
Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses, Oct 2024, Salesforce, arxiv Answer Engine (RAG) Evaluation Repository
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly, Oct 2024, arxiv
Introducing SimpleQA, OpenAI, Oct 2024 OpenAI
NovelQA: A Benchmark for Long-Range Novel Question Answering, Mar 2024, arxiv
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens, Mar 2024, arxiv
Are Large Language Models Consistent over Value-laden Questions?, Jul 2024, arxiv
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding, Aug 2023, arxiv
L-Eval: Instituting Standardized Evaluation for Long Context Language Models, Jul 2023, arxiv
A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers, QASPER, May 2021, arxiv
MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents, EMNLP 2021, ACL
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge, Jun 2019, ACL
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering, Sep 2018, arxiv OpenBookQA dataset at AllenAI
What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams, 2020, arxiv MedQA
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge, 2018, arxiv ARC Easy dataset ARC dataset
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions, 2019, arxiv BoolQ dataset
BookQA: Stories of Challenges and Opportunities, Oct 2019, arxiv
HellaSwag: Can a Machine Really Finish Your Sentence?, 2019, arxiv Paper + code + dataset https://rowanzellers.com/hellaswag/
PIQA: Reasoning about Physical Commonsense in Natural Language, Nov 2019, arxiv PIQA dataset
Crowdsourcing Multiple Choice Science Questions arxiv SciQ dataset
The NarrativeQA Reading Comprehension Challenge, Dec 2017, arxiv dataset at deepmind
WinoGrande: An Adversarial Winograd Schema Challenge at Scale, 2019, arxiv WinoGrande dataset
TruthfulQA: Measuring How Models Mimic Human Falsehoods, Sep 2021, arxiv
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages, 2020, arxiv data
Natural Questions: A Benchmark for Question Answering Research, Transactions ACL 2019
Blog posts, whitepapers
Perplexity, firmly Build Merchant Network to Power GenAI Commerce, Mar 2025, press release
Adobe Analytics: Traffic to U.S. retail websites from Generative AI sources jumps 1,200 percent, Mar 2025, adobe
Foundation Model for Personalized Recommendation by Netflix, Mar 2025, Netflix blog
Improving Recommendation Systems & Search in the Age of LLMs by Eugene Yan, Mar 2025, blog post
A Coding Implementation to Build a Conversational Research Assistant with FAISS, Langchain, Pypdf, and TinyLlama-1.1B-Chat-v1.0, Mar 2025, marktechpost
Investigating ChatGPT Search: Insights from 80 Million Clickstream Records, Feb 2025, SemRush blog
Query Expansion with LLMs: Searching Better by Saying More, Feb 2025, Jina ai
Transformers in music recommendation, Google on how transformers are used for music recommendation at YouTube, Google Research blog
Scaling the Instagram Explore Recommendations Systems, Meta, Aug 2023
PDF Retrieval with Vision Language Models, about ColPali and using it for document search from Vespa
Evaluating search relevance part 2 - Phi-3 as relevance judge, a series of articles from Elasticsearch on practical experience using the Phi-3 LLM family for relevance evaluation
Verticals

Product Search
Generative Retrieval and Alignment Model: A New Paradigm for E-commerce Retrieval, Apr 2025, arxiv
Personalised outfit recommendation via history-aware transformers, Amazon Science, WSDM 2025
Cite Before You Speak: Enhancing Context-Response Grounding in E-commerce Conversational LLM-Agents, Mar 2025, arxiv
Automated Query-Product Relevance Labeling using Large Language Models for E-commerce Search, Feb 2025, arxiv
Behavior Modeling Space Reconstruction for E-Commerce Search, Jan 2025, arxiv
Enhancing Relevance of Embedding-based Retrieval at Walmart, Oct 2024, CIKM 2024
Towards translating objective product attributes into customer language, Amazon Science
Manipulating Large Language Models to Increase Product Visibility, Sep 2024, arxiv
Hierarchical query classification in e-commerce search, WWW 2024
An interpretable ensemble of graph and language models for improving search relevance in e-commerce, WWW 2024
Web-Scale Semantic Product Search with Large Language Models, Amazon Science, May 2023, PAKDD 2023
Behavior-driven query similarity prediction based on pre-trained language models for e-commerce search, Amazon Science, SIGIR 2023 eCommerce workshop
Rethinking E-Commerce Search, Instacart, 2023, arxiv
Overview of the TREC 2023 Product Search Track, TREC 2023
Location Aware (Maps, real estate, local, travel)
Learning to Rank for Maps at Airbnb, KDD 2024
Transforming Location Retrieval at Airbnb: A Journey from Heuristics to Reinforcement Learning, CIKM 2024
Optimizing Airbnb Search Journey with Multi-task Learning, SIGKDD 2023
Learning To Rank Diversely At Airbnb, CIKM 2023
Improving Deep Learning for Airbnb Search, KDD 2020
Real-time Personalization using Embeddings for Search Ranking at Airbnb, KDD 2018
Ads / advertisement
Set-based state estimation of nonlinear discrete-time systems using constrained zonotopes and polyhedral relaxations, Mar 2025, arxiv
Semantic Ads Retrieval at Walmart eCommerce with Language Models Progressively Trained on Multiple Knowledge Domains, Mar 2025, arxiv
Applying Deep Learning to Ads Conversion Prediction in Last Mile Delivery Marketplace, Feb 2025, DoorDash, arxiv
Scaling Laws for Online Advertisement Retrieval, Nov 2024, arxiv
Real estate
Beyond Relevance: A Demand Balancer Model for Rental Platforms with Single-Unit Inventory, WSDM 2025
Healthcare
MedExpQA: Multilingual benchmarking of Large Language Models for Medical Question Answering, Sep 2024, AI in Medicine
Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries, WWW 2024
JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability, Jun 2024, arxiv
BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains, Feb 2024, arxiv
DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation, Aug 2023, arxiv
Clinical Camel: An Open Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding, May 2023, arxiv
MedAlpaca – An Open-Source Collection of Medical Conversational AI Models and Training Data, Apr 2023, arxiv
Science
PaSa: An LLM Agent for Comprehensive Academic Paper Search, Jan 2025, arxiv
Finance

Legal
Conversational vs Traditional: Comparing Search Behavior and Outcome in Legal Case Retrieval, SIGIR 2021 short paper
Search Engine Optimization

Search Engine Optimization, adversarial
Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines, Jan 2025, arxiv
Adversarial Search Engine Optimization for Large Language Models, Jul 2024, arxiv
Stealthy Attack on Large Language Model based Recommendation, Feb 2024, arxiv
