Awesome Generative AI in Search, Recommendation, Personalization
Generative AI and LLMs for Search, Recommender, Personalization Engines
The goal of this repository is to survey and review generative AI and LLM-based methods for building large-scale search and recommender engines.
See also the LLM Evaluation methods repository.
Search Surveys
- A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval, Mar 2025, arxiv
- A Survey on Knowledge-Oriented Retrieval-Augmented Generation, Mar 2025, arxiv
- From Matching to Generation: A Survey on Generative Information Retrieval, journal version, ACM Transactions on Information Systems, Feb 2025
- Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions, Jan 2025, IEEE
- Improving Recommendation Systems & Search in the Age of LLMs by Eugene Yan, Mar 2025, blog post
- A Survey of Model Architectures in Information Retrieval, Jan 2025, arxiv
- A Survey of Conversational Search, Oct 2024, arxiv
- Large language models for generative information extraction: a survey, 2024, Front Comp Sci
- From Matching to Generation: A Survey on Generative Information Retrieval, Apr 2024, arxiv
- Dense Text Retrieval Based on Pretrained Language Models: A Survey, Feb 2024, ACM
- Retrieval-Augmented Generation for Large Language Models: A Survey, 2023, simg
- Large Language Models for Information Retrieval: A Survey, Aug 2023, arxiv
Recommender Engine Surveys
- A Comprehensive Survey on Cross-Domain Recommendation: Taxonomy, Progress, and Prospects, Mar 2025, arxiv
- A Survey on LLM-powered Agents for Recommender Systems, Feb 2025, arxiv: RE: based on DeepSeek-R methods for training reasoning and interleaved LLMs calling search as a tool.
- How Can Recommender Systems Benefit from Large Language Models: A Survey, ACM Transactions on Information Systems 2025
- Graph Foundation Models for Recommendation: A Comprehensive Survey, Feb 2025, arxiv
- Recommender Systems in the Era of Large Language Models (LLMs), TKDE Nov 2024 by subscription
- Multimodal Pretraining, Adaptation, and Generation for Recommendation: A Survey, Jul 2024, arxiv
- A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys), KDD 2024 pdf
- A Comprehensive Survey on Retrieval Methods in Recommender Systems, Jul 2024, arxiv
- A Survey of Generative Search and Recommendation in the Era of Large Language Models, Apr 2024, arxiv
- A survey on large language models for recommendation, WWW 2024 Springer
- Towards Next-Generation LLM-based Recommender Systems: A Survey and Beyond, Oct 2024, arxiv
- Exploring the Impact of Large Language Models on Recommender Systems: An Extensive Review, Feb 2024, arxiv
- Pre-train, Prompt, and Recommendation: A Comprehensive Survey of Language Modeling Paradigm Adaptations in Recommender Systems, Dec 2023, MIT
Conferences, Workshops
- 2025 SIGIR Workshop on eCommerce
- RecSys
- KDD 2024 Workshop on Generative AI for Recommender Systems and Personalization
- CIKM 2024 1st Workshop on Multimodal Search and Recommendations
- SIGIR 2024 Workshop on eCommerce ECOM24
- WWW 2024 The 2nd Workshop on Recommendation with Generative Models
- EACL 2024 Workshop on Personalization of Generative AI Systems
- SIGIR 2024 The First Workshop on Large Language Models (LLMs) for Evaluation in Information Retrieval
- SIGIR 2024 The Second Workshop on Generative Information Retrieval
Industrial conferences
- Haystack
- Activate
Tutorials
Software, libraries, frameworks
- Nvidia Merlin Recommender systems, including Transformer4Rec
- OpenP5 RecSys23 tutorial
FreshLLM and similar architectures (LLM and large scale search)
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning, Mar 2025, arxiv
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning, Mar 2025, arxiv
- When Search Engine Services Meet Large Language Models: Visions and Challenges, Dec 2024, IEEE
- Long-form factuality in large language models, Mar 2024, arxiv
- When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively, Apr 2024, arxiv
- Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training, May 2024, arxiv
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation, Oct 2023, arxiv
- Gorilla: Large Language Model Connected with Massive APIs, May 2023, arxiv
- Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions, Dec 2022, arxiv
Conversational Search
- A Survey of Conversational Search, Oct 2024, arxiv
- Engineering Conversational Search Systems: A Review of Applications, Architectures, and Functional Components, Jul 2024, arxiv
- ChatRetriever: Adapting Large Language Models for Generalized and Robust Conversational Dense Retrieval, Apr 2024, arxiv
- CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models, Feb 2024, arxiv
- Generalizing Conversational Dense Retrieval via LLM-Cognition Data Augmentation, Feb 2024, arxiv
- History-Aware Conversational Dense Retrieval, Jan 2024, arxiv
- Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search, Findings of EMNLP 2023
- Improving Conversational Passage Re-ranking with View Ensemble, SIGIR 2023 short paper
- ConvGQR: Generative Query Reformulation for Conversational Search, May 2023, arxiv, ACL 2023
- Phrase Retrieval for Open-Domain Conversational Question Answering with Conversational Dependency Modeling via Contrastive Learning, arxiv, Findings of ACL 2023
- Curriculum Contrastive Context Denoising for Few-shot Conversational Dense Retrieval, SIGIR 2022
- Open-Retrieval Conversational Question Answering, SIGIR 2020
Search Assistance
Autocomplete/autosuggest and other search assistance tasks: search clarification, query recommendation, and other techniques that guide users in search.
- Evaluating auto-complete ranking for diversity and relevance, ECIR 2025
- Enhancing Discoverability in Enterprise Conversational Systems with Proactive Question Suggestions, Dec 2024, arxiv
- DiAL: Diversity aware listwise ranking for query auto-complete, EMNLP 2024
- Evaluation and Continual Improvement for an Enterprise AI Assistant, Jun 2024, arxiv
- Generating Query Recommendations via LLMs, May 2024, arxiv
- Towards Asking Clarification Questions for Information Seeking on Task-Oriented Dialogues, May 2023, arxiv
- Asking Clarification Questions to Handle Ambiguity in Open-Domain QA, May 2023, arxiv
- Asking Clarifying Questions in Open-Domain Information-Seeking Conversations, SIGIR 2019
Multi Turn
- MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems, Jan 2025, arxiv
Task Solving
- Investigating Users’ Search Behavior and Outcome with ChatGPT in Learning-oriented Search Tasks, SIGIR 2024
- Mind2Web: Towards a Generalist Agent for the Web, NeurIPS 2023
Personalization
- Can Large Language Models Understand Preferences in Personalized Recommendation?, Jan 2025, arxiv
- Unified Embedding Based Personalized Retrieval in Etsy Search, Sep 2024, arxiv
- IntentRec: Predicting User Session Intent with Hierarchical Multi-Task Learning, Netflix, Jul 2024, arxiv
- LLM-based Medical Assistant Personalization with Short- and Long-Term Memory Coordination, NAACL 2024
Multi modal
- Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions, Jan 2025, IEEE, https://ieeexplore.ieee.org/abstract/document/10843094
- RAMQA: A Unified Framework for Retrieval-Augmented Multi-Modal Question Answering, Jan 2025, arxiv
- EA-VTR: Event-Aware Video-Text Retrieval, ECCV 2024
- ColPali: Efficient Document Retrieval with Vision Language Models, Jun 2024, arxiv; useful practical info in the Vespa blog
- UrbanCross: Enhancing Satellite Image-Text Retrieval with Cross-Domain Adaptation, MM 2024
- Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond, Feb 2024, arxiv
- Listen, Think, and Understand, OpenAQA dataset, May 2023, arxiv
- Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering, Apr 2022, arxiv
Question Answering
- CoReQA: Uncovering Potentials of Language Models in Code Repository Question Answering, Jan 2025, arxiv
- Unveiling the power of language models in chemical research question answering, Jan 2025, Nature
- Toward expert-level medical question answering with large language models, Jan 2025, Nature
- LLM-MedQA: Enhancing Medical Question Answering through Case Studies in Large Language Models, Jan 2025, arxiv
- Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation, Jan 2025, arxiv
- Assessing The Potential Of Mid-Sized Language Models For Clinical QA, Apr 2024, arxiv
- Listen, Think, and Understand, OpenAQA dataset, May 2023, arxiv
- Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering, Apr 2022, arxiv
- Querying Databases with Function Calling, Jan 2025, arxiv
RAG
- A Survey on Knowledge-Oriented Retrieval-Augmented Generation, Mar 2025, arxiv
- Sufficient Context: A New Lens on Retrieval Augmented Generation Systems, Google Research, ICLR 2025
- Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG, Jan 2025, arxiv
- In Defense of RAG in the Era of Long-Context Language Models, Sep 2024, arxiv
- RAFT: Adapting Language Model to Domain Specific RAG, Jul 2024, open review
- RAGAs: Automated Evaluation of Retrieval Augmented Generation, EACL Demo 2024
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?, Jun 2024, arxiv
- RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement, Dec 2024, arxiv
- A Survey on Retrieval-Augmented Text Generation for Large Language Models, Apr 2024, arxiv
- RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation, Mar 2024, arxiv
Retrieval
- DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers, Meta, University of Waterloo, Feb 2025, arxiv
- CAME: Competitively Learning a Mixture-of-Experts Model for First-stage Retrieval, Jan 2025, ACM
- On the Robustness of Generative Information Retrieval Models: An Out-of-Distribution Perspective, Jan 2025, link
- Fine-Tuning LLaMA for Multi-Stage Text Retrieval, Oct 2023, arxiv
- How Does Generative Retrieval Scale to Millions of Passages?, Google Research, May 2023, arxiv
- How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?, Jul 2024, arxiv
Ranking for Search
- Cross-Encoder Rediscovers a Semantic Variant of BM25, Feb 2025, arxiv
- Orbit: A framework for designing and evaluating multi-objective rankers, ACM Conference on Intelligent User Interfaces 2025
- Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation, Mar 2025, arxiv
- DISKCO: Disentangling knowledge from cross-encoder to bi-encoder, WWW 2024
- Adaptive Neural Ranking Framework: Toward Maximized Business Goal for Cascade Ranking Systems, WWW 2024
- RankTower: A Synergistic Framework for Enhancing Two-Tower Pre-Ranking Model, Jul 2024, arxiv
- Bi-CAT: Improving robustness of LLM-based text rankers to conditional distribution shifts, Amazon Science, WWW 2024 workshop
- Fine-Tuning LLaMA for Multi-Stage Text Retrieval, SIGIR 2024
- RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!, Dec 2023
Classical bi-encoder and cross-encoder ranking, BERT-based ranking, hybrid encoder-based ranking
- A Thorough Comparison of Cross-Encoders and LLMs for Reranking SPLADE, Mar 2024
- RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses, SIGIR 2023
- ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction, 2022, arxiv
- Pretrained Transformers for Text Ranking: BERT and Beyond, 2021, ACM
- Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering, 2021, arxiv
- ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT, 2020, arxiv
- Dense Passage Retrieval for Open-Domain Question Answering, 2020, arxiv
- Understanding the Behaviors of BERT in Ranking, 2019, arxiv
- Passage Re-ranking with BERT, 2019, arxiv
Query Understanding
- Two Heads Are Better Than One: Improving Search Effectiveness Through LLM-Generated Query Variants, 2025, RMIT University
- Large Language Model based Long-tail Query Rewriting in Taobao Search, WWW 2024
- Near-duplicate question detection, WWW 2024
- Hierarchical query classification in e-commerce search, WWW 2024
- Query Understanding in the Age of Large Language Models, Jun 2023, arxiv
- Decomposing Complex Queries for Tip-of-the-tongue Retrieval, May 2023, arxiv
- ConvGQR: Generative Query Reformulation for Conversational Search, May 2023, arxiv
- Query Rewriting in Retrieval-Augmented Large Language Models, EMNLP 2023
- Query2doc: Query Expansion with Large Language Models, Mar 2023, arxiv
- Few-Shot Generative Conversational Query Rewriting, SIGIR 2020
Embedding models
- Granite Embedding Models (multi-lingual embedding models from IBM), Feb 2025, arxiv
- mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data, Feb 2025, arxiv
- SFR-Embedding from Salesforce, Salesforce blog, Oct 2024
- BGE-en-ICL, BGE-ICL embedding model, Making Text Embedders Few-Shot Learners, Sep 2024, arxiv
- Multilingual E5 Text Embeddings: A Technical Report, Feb 2024, arxiv
- NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models, Nvidia, May 2024, arxiv
- BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation, Feb 2024, arxiv
- E5-Mistral embeddings from Microsoft, in Improving Text Embeddings with Large Language Models, Dec 2023
Embedding models evaluation
- The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks, Apr 2025, arxiv
- MMTEB: Massive Multilingual Text Embedding Benchmark, Feb 2025, arxiv
- MTEB: Massive Text Embedding Benchmark, Oct 2022, arxiv, Leaderboard
- Marqo embedding benchmark for eCommerce at Huggingface, text to image and category to image tasks
- The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding, openreview pdf
- MMTEB: Community driven extension to MTEB repository
- Chinese MTEB C-MTEB repository
- French MTEB repository
Document understanding
- SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion, Mar 2025, arxiv
- Qwen2.5-VL Technical Report, see Section 3.3.2, Document Understanding and OCR, Feb 2025, arxiv
- ColPali: Efficient Document Retrieval with Vision Language Models, Jun 2024, arxiv
Response Generation
- Neural headline generation: A comprehensive survey, Mar 2025, Neurocomputing
- Cite Before You Speak: Enhancing Context-Response Grounding in E-commerce Conversational LLM-Agents, Mar 2025, arxiv
Agentic Search
- Open Deep Search: Democratizing Search with Open-source Reasoning Agents, Mar 2025, arxiv
- A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval, Mar 2025, arxiv
- Search-o1: Agentic Search-Enhanced Large Reasoning Models, Jan 2025, arxiv
- Plan*RAG: Efficient Test-Time Planning for Retrieval Augmented Generation, Oct 2024, arxiv
- MindSearch: Mimicking Human Minds Elicits Deep AI Searcher, Jul 2024, arxiv
Recommender Engines
- EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration, Feb 2025, arxiv
- 360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation, Jan 2025, arxiv
- Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations, Baidu, Mar 2025, arxiv
- Personalised outfit recommendation via history-aware transformers, Amazon Science, WSDM 2025
- Representation Learning with Large Language Models for Recommendation, WWW 2024
- Improved Estimation of Ranks for Learning Item Recommenders with Negative Sampling, Google, CIKM 2024
- Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach, May 2023, arxiv
- Towards Open-World Recommendation with Knowledge Augmentation from Large Language Models, RecSys 2024
- Data-efficient Fine-tuning for LLM-based Recommendation, SIGIR 2024
- LLMRec: Large Language Models with Graph Augmentation for Recommendation, WSDM 2024
- DiffKG: Knowledge Graph Diffusion Model for Recommendation, WSDM 2024
- Bridging Language and Items for Retrieval and Recommendation, Mar 2024, arxiv (BLAIR paper)
- Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations, Feb 2024, arxiv
- Trinity: Syncretizing Multi-/Long-tail/Long-term Interests All in One, ByteDance, Feb 2024, arxiv
- Tapping the Potential of Large Language Models as Recommender Systems: A Comprehensive Framework and Empirical Analysis, Jan 2024, arxiv
- Leveraging Large Language Models for Sequential Recommendation, RecSys 2023
- Text Is All You Need: Learning Language Representations for Sequential Recommendation, May 2023, arxiv
- On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models, Sep 2022, arxiv
- Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5), Mar 2022, arxiv
- Augmenting Netflix Search with In-Session Adapted Recommendations, RecSys 2022
- Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations, Google, RecSys 2019
Sequential Recommendation
- Unsupervised Graph Embeddings for Session-based Recommendation with Item Features, Feb 2025, arxiv
- TagRec: Temporal-Aware Graph Contrastive Learning with Theoretical Augmentation for Sequential Recommendation, IEEE TKDE 2025
- LLMCDSR: Enhancing Cross-Domain Sequential Recommendation with Large Language Models (Large Language Models Cross-Domain Sequential Recommendation), ACM Transactions on Information Systems 2025
- Plug-In Diffusion Model for Sequential Recommendation, AAAI 2024
- Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations, Feb 2024, arxiv
- EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration (EAGER, a two-strEAm GEnerative Recommender), KDD 2024, arxiv
- Mamba4Rec: Towards Efficient Sequential Recommendation with Selective State Space Models, Mar 2024, arxiv
- Leveraging Large Language Models for Sequential Recommendation, RecSys 2023
- Recommender Systems with Generative Retrieval (TIGER: Transformer Index for GEnerative Recommenders), NeurIPS 2023
- Efficient On-Device Session-Based Recommendation, ACM Transactions on Information Systems 2023
- XLNet4Rec: Recommendations Based on Users’ Long-Term and Short-Term Interests Using Transformer, ICMLA 2023
- How to Index Item IDs for Recommendation Foundation Models, P5, SIGIR 2023
- Text Is All You Need: Learning Language Representations for Sequential Recommendation, May 2023, arxiv
- Multi-Behavior Sequential Transformer Recommender, SIGIR 2024, arxiv
- Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5), RecSys 2022
- Transformers4Rec: Bridging the Gap between NLP and Sequential / Session-Based Recommendation, RecSys 2021
- BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer, CIKM 2019
- Self-Attentive Sequential Recommendation (SASRec), 2018, IEEE Xplore
- Session-based Recommendations with Recurrent Neural Networks (GRU4Rec), 2015, arxiv
Discovery
- LLMs for User Interest Exploration in Large-scale Recommendation Systems, Generative AI and Recommender Systems Workshop at KDD 2024, work by Google, pdf
Unclassified
Methods used in search engines (unclassified; TODO: classify)
- Translational Generative Retrieval via Potential Query Generation, ICASSP 2025
- RouteLLM: Learning to Route LLMs with Preference Data, Jun 2024, arxiv
- INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning, Jan 2024, arxiv
- A Comprehensive Study of Knowledge Editing for Large Language Models, Jan 2024, arxiv
- Zero-shot Document Retrieval with Hybrid Pseudo-document Retriever, ICASSP 2025
- Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach, Dec 2024, ACM
- Representation Learning with Large Language Models for Recommendation, WWW 2024
Recommender Rankers
- Large Language Models are Zero-Shot Rankers for Recommender Systems, Mar 2024, LLMRank, Springer
Industrial approaches
- ZeroEntropy 🔎 - Advanced AI Search Over Complex Documents launch doc
Evaluation of Search engines
- Rankers, Judges, and Assistants: Towards Understanding the Interplay of LLMs in Information Retrieval Evaluation, DeepMind, Mar 2025, arxiv
- LLM-Assisted Relevance Assessments: When Should We Ask LLMs for Help?, Jan 2025, arxiv
- AI Search Has A Citation Problem, Mar 2025, Columbia Journalism Review (CJR)
- Large Language Models for Relevance Judgment in Product Search, Jul 2024, arxiv
- STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases, Apr 2024, arxiv
- Report on the 1st Workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval 2024) at SIGIR 2024, SIGIR
Evaluation of RAG, Question Answering, knowledge assistants, and information-seeking LLM-based systems
- MMTEB: Massive Multilingual Text Embedding Benchmark, Feb 2025, Hugging Face, leaderboard. Brief, from Tom Aarsen’s LinkedIn: 1043 languages in total, primarily in bitext mining (text pairing), but also 255 in classification, 209 in clustering, and 142 in retrieval. 550 tasks, anything from sentiment analysis and question-answering reranking to long-document retrieval. 17 domains, such as legal, religious, programming, web, social, medical, blog, and academic. The collection of tasks is subdivided into many separate benchmarks, such as MTEB(eng, v2), MTEB(Multilingual, v1), and MTEB(Law, v1). The new MTEB(eng, v2) is much smaller and faster than the original English MTEB, making submissions much cheaper and simpler.
- MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems, Jan 2025, arxiv
- RAD-Bench: Evaluating Large Language Models Capabilities in Retrieval Augmented Dialogues, Sep 2024, arxiv
- IRSC: A Zero-shot Evaluation Benchmark for Information Retrieval through Semantic Comprehension in Retrieval-Augmented Generation Scenarios, Sep 2024, arxiv
- Evaluating Retrieval Quality in Retrieval-Augmented Generation, Apr 2024, arxiv
- RAGAS: Automated Evaluation of Retrieval Augmented Generation, Jul 2023, arxiv
- ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems, Nov 2023, arxiv
- TREC iKAT 2023: A Test Collection for Evaluating Conversational and Interactive Knowledge Assistants, SIGIR 2024
- MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries, Jan 2024, arxiv
- FaithDial: A Faithful Benchmark for Information-Seeking Dialogue, Dec 2022, MIT Press
- Open-Retrieval Conversational Question Answering, SIGIR 2020
- XOR QA: Cross-lingual Open-Retrieval Question Answering, Oct 2020, arxiv
QA Benchmarks
QA is used in many vertical domains; see the Verticals section below.
- SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines, Mar 2025, arxiv
- CoReQA: Uncovering Potentials of Language Models in Code Repository Question Answering, Jan 2025, arxiv
- Unveiling the power of language models in chemical research question answering, Jan 2025, Nature, Communications Chemistry; ScholarChemQA dataset
- Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses, Oct 2024, Salesforce, arxiv; Answer Engine (RAG) Evaluation Repository
- HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly, Oct 2024, arxiv
- Introducing SimpleQA, OpenAI, Oct 2024
- NovelQA: A Benchmark for Long-Range Novel Question Answering, Mar 2024, arxiv
- NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens, Mar 2024, arxiv
- Are Large Language Models Consistent over Value-laden Questions?, Jul 2024, arxiv
- LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding, Aug 2023, arxiv
- L-Eval: Instituting Standardized Evaluation for Long Context Language Models, Jul 2023, arxiv
- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers, QASPER, May 2021, arxiv
- MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents, EMNLP 2021, ACL
- CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge, Jun 2019, ACL
- Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering, Sep 2018, arxiv OpenBookQA dataset at AllenAI
- Jin, Di, et al. “What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams”, 2020, arxiv, MedQA
- Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge, 2018, arxiv, ARC Easy dataset, ARC dataset
- BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions, 2019, arxiv, BoolQ dataset
- BookQA: Stories of Challenges and Opportunities, Oct 2019, arxiv
- HellaSwag: Can a Machine Really Finish Your Sentence?, 2019, arxiv; paper, code, and dataset at https://rowanzellers.com/hellaswag/
- PIQA: Reasoning about Physical Commonsense in Natural Language, Nov 2019, arxiv, PIQA dataset
- Crowdsourcing Multiple Choice Science Questions, arxiv, SciQ dataset
- The NarrativeQA Reading Comprehension Challenge, Dec 2017, arxiv, dataset at DeepMind
- WinoGrande: An Adversarial Winograd Schema Challenge at Scale, 2019, arxiv, WinoGrande dataset
- TruthfulQA: Measuring How Models Mimic Human Falsehoods, Sep 2021, arxiv
- TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages, 2020, arxiv, data
- Natural Questions: A Benchmark for Question Answering Research, Transactions ACL 2019
Blog posts, whitepapers
- Perplexity, firmly Build Merchant Network to Power GenAI Commerce, Mar 2025, press release
- Adobe Analytics: Traffic to U.S. retail websites from Generative AI sources jumps 1,200 percent, Mar 2025, adobe
- Foundation Model for Personalized Recommendation by Netflix, Mar 2025, Netflix blog
- Improving Recommendation Systems & Search in the Age of LLMs by Eugene Yan, Mar 2025, blog post
- A Coding Implementation to Build a Conversational Research Assistant with FAISS, Langchain, Pypdf, and TinyLlama-1.1B-Chat-v1.0, Mar 2025, marktechpost
- Investigating ChatGPT Search: Insights from 80 Million Clickstream Records, Feb 2025, SemRush blog
- Query Expansion with LLMs: Searching Better by Saying More, Feb 2025, Jina ai
- Transformers in music recommendation, Google on how transformers are used for music recommendation at YouTube, Google Research blog
- Scaling the Instagram Explore Recommendations Systems, Meta, Aug 2023
- PDF Retrieval with Vision Language Models, about ColPali and using it for document search from Vespa
- Evaluating search relevance part 2 - Phi-3 as relevance judge, part of a series of articles from Elasticsearch on practical experience using the Phi-3 LLM family for relevance evaluation
Verticals
Product Search
- Generative Retrieval and Alignment Model: A New Paradigm for E-commerce Retrieval, Apr 2025, arxiv
- Personalised outfit recommendation via history-aware transformers, Amazon Science, WSDM 2025
- Cite Before You Speak: Enhancing Context-Response Grounding in E-commerce Conversational LLM-Agents, Mar 2025, arxiv
- Automated Query-Product Relevance Labeling using Large Language Models for E-commerce Search, Feb 2025, arxiv
- Behavior Modeling Space Reconstruction for E-Commerce Search, Jan 2025, arxiv
- Enhancing Relevance of Embedding-based Retrieval at Walmart, Oct 2024, CIKM 2024
- Towards translating objective product attributes into customer language, Amazon Science
- Manipulating Large Language Models to Increase Product Visibility, Sep 2024, arxiv
- Hierarchical query classification in e-commerce search, WWW 2024
- An interpretable ensemble of graph and language models for improving search relevance in e-commerce, WWW 2024
- Web-Scale Semantic Product Search with Large Language Models, Amazon Science, May 2023, PAKDD 2023
- Behavior-driven query similarity prediction based on pre-trained language models for e-commerce search, Amazon Science, SIGIR 2023 eCommerce workshop
- Rethinking E-Commerce Search, Instacart, 2023, arxiv
- Overview of the TREC 2023 Product Search Track, TREC 2023
Location Aware (Maps, real estate, local, travel)
- Learning to Rank for Maps at Airbnb, KDD 2024
- Transforming Location Retrieval at Airbnb: A Journey from Heuristics to Reinforcement Learning, CIKM 2024
- Optimizing Airbnb Search Journey with Multi-task Learning, SIGKDD 2023
- Learning To Rank Diversely At Airbnb, CIKM 2023
- Improving Deep Learning for Airbnb Search, KDD 2020
- Real-time Personalization using Embeddings for Search Ranking at Airbnb, KDD 2018
Ads / advertisement
- Set-based state estimation of nonlinear discrete-time systems using constrained zonotopes and polyhedral relaxations, Mar 2025, arxiv
- Semantic Ads Retrieval at Walmart eCommerce with Language Models Progressively Trained on Multiple Knowledge Domains, Mar 2025, arxiv
- Applying Deep Learning to Ads Conversion Prediction in Last Mile Delivery Marketplace, Feb 2025, DoorDash, arxiv
- Scaling Laws for Online Advertisement Retrieval, Nov 2024, arxiv
Real estate
- Beyond Relevance: A Demand Balancer Model for Rental Platforms with Single-Unit Inventory, WSDM 2025
Healthcare
- MedExpQA: Multilingual benchmarking of Large Language Models for Medical Question Answering, Sep 2024, AI in Medicine
- Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries, WWW 2024
- JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability, Jun 2024, arxiv
- BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains, Feb 2024, arxiv
- DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation, Aug 2023, arxiv
- Clinical Camel: An Open Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding, May 2023, arxiv
- MedAlpaca – An Open-Source Collection of Medical Conversational AI Models and Training Data, Apr 2023, arxiv
Science
- PaSa: An LLM Agent for Comprehensive Academic Paper Search, Jan 2025, arxiv
Finance
Legal
- Conversational vs Traditional: Comparing Search Behavior and Outcome in Legal Case Retrieval, SIGIR 2021 short paper
Search Engine Optimization
Search Engine Optimization, adversarial
- Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines, Jan 2025, arxiv
- Adversarial Search Engine Optimization for Large Language Models, Jul 2024, arxiv
- Stealthy Attack on Large Language Model based Recommendation, Feb 2024, arxiv