Information Retrieval
Instructor: Venkatesh Vinayakarao
Term: Aug - Nov 2022



Welcome to the Information Retrieval (IR) course! It is difficult to imagine living without search engines. The availability of big data has made a systematic study of retrieval techniques necessary. The principles and practices of information retrieval have been a focus of researchers and practitioners alike. This course is not just about search engines; it is about dealing with big data and retrieving information from it, which opens up several interesting applications. The course introduces students to key parts of IR such as indexing techniques, challenges in query processing, and well-known retrieval models.

Key Learning Objectives

At the end of this course, you should be able to:
  • Understand and apply text indexing techniques to big data.
  • Understand and apply text ranking techniques.
  • Analyze and evaluate existing retrieval systems.

Lecture Resources

Lecture # | Topic | Readings | Slides/Material
Part 1: Building a Search System - An Overview
1 | Introduction to Information Retrieval | Chapter 1 from CPS | [Video] [Slides]
2 | Building a Simple Retrieval System | Chapter 1 from CPS | [Video] [Slides]
3 | Query Processing with Inverted Index | Chapter 2 from CPS | [Video] [Slides]
4 | Evaluating Retrieval Systems | Chapter 8 from CPS | [Video] [Slides]
- | Tutorial: Lucene Demo | Lucene Tutorial | [Video] [LuceneDemo.zip]
Part 2: Components of a Retrieval System
2.1 Indexing
5 | Indexing: Query Processing Order | Section 1.3 from CPS | [Video] [Slides]
6 | Indexing: Challenges | Chapter 2 from CPS | [Video] [Slides]
- | The Power of Indexing | - | [Video] [Slides]
2.2 Query Understanding
7 | Query Understanding: Segmentation and Spelling Correction | Chapter 3 from CPS | [Video] [Slides]
8 | Query Understanding: Phonetic Correction | Chapter 3 from CPS | [Video] [Slides]
- | Handling Wildcard Queries | - | -
9 | Tutorial: Solr | - | [Video] [Slides] [Code]
2.3 Index Compression
10 | Index Compression | Chapter 5 from CPS | [Video] [Slides]
2.4 Crawling
11, 12 | Crawlers | Chapter 20 from CPS, Chapter 3 from BDT | [Video] [Slides]
- | Challenges in Web Crawling | - | -
2.5 Relevance and Ranking
13 | Scoring and Term-Weighting | Chapter 6 (Sections 6.2, 6.3, 6.4) from CPS | [Video] [Slides]
14 | Scoring with Zonal Indexes | Chapter 6 (Section 6.1) from CPS | [Video] [Slides]
Mid-Term Exam
15 | Tutorial: Solr: Analyzers, Tokenizers and Filters | Solr Guide | [Video] [Code] [Demo-Video]
16 | Efficient Scoring and Ranking | Chapter 7 from CPS | [Video] [Slides]
2.6 Evaluation
17, 18 | Evaluation | Chapter 8 from CPS | [Video] [Slides]
Part 3: Advanced Topics in Information Retrieval
19 | Relevance Feedback | Chapter 9 from CPS | [Video] [Slides]
20, 21 | Probabilistic IR | Chapter 11 from CPS | [Video] [Slides]
Note: A shorter video that skips the revision of the first part, but explains the RSV (Retrieval Status Value) computation slightly better, is available here: [Video]
22 | String Handling | - | [Video] [Slides]
23 | Popularity of Web Pages - Link Analysis | Chapter 21 from CPS | [Video] [Slides]
24, 25 | Topics in IR - An Overview | - | [Knowledge Graphs and Entity Retrieval] [Mixed Initiative Search] [Online Learning to Rank] [Personalization in Web Search] [Query Understanding and Ranking]
- | Overview of Searching in Solr | - | [Video] [Solr Guide]


Evaluation
Instrument | Max Marks
Final Exam | 30%
Mid-Semester Exam | 20%
Assignments (3 × 10% each) | 30%
Mini-Project (Ideas) | 20%

Pre-requisites
Familiarity with Java will help when coding with Lucene. You may use your favourite programming language for the assignments, as long as the objectives of each assignment are met. A basic understanding of linear algebra, set theory and probability will be useful for understanding the IR models. However, there are no specific pre-requisites for this course; we will revise the fundamentals wherever necessary.

Resources

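Text References
  • [BDT] Search Engines: Information Retrieval in Practice. W. Bruce Croft, Donald Metzler, Trevor Strohman
  • [CPS] Introduction to Information Retrieval. Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze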


If you are not having fun, you are not the best student you can be!