The Mathematics behind Search Engines: Mathematical models play a significant role in designing elegant products. Have you ever wondered how search engines can index trillions of documents and yet respond to queries within seconds? How does a search engine understand queries? How does the indexing work? We also explore some ranking techniques. In this talk, we discuss the vector space models and probabilistic retrieval models from scratch. I will cover the necessary preliminaries and explain the way various models lend elegance and efficiency. Towards the end, we shall discuss current trends in web search research. Tech Talk (delivered live online) for Qualcomm, 4th August, 2021. Slides
Knowledge Graphs - Going Beyond Data! Digital enablement is the new mantra. Towards this purpose, we have come a long way leveraging the power of data. A key step in this direction is the mining and representation of knowledge. Various domains, ranging from clinical healthcare to fintech, need to capture and use knowledge. In this talk, we discuss how knowledge graphs serve this purpose. Specifically, we focus on knowledge graph construction and discuss a case in entity retrieval. Tech Talk (delivered live online) for Qualcomm, 24th Jun, 2021. Slides
Source Code Search - An Overview: With the growing volume of information and information needs, technology support to retrieve information has become indispensable. We focus on Information Retrieval (IR) of a specific type of content, namely, source code. Source code retrieval has attracted the attention of several researchers in the last decade. This research has resulted in several useful applications such as code completion, example search, plagiarism detection, automated feedback for programming assignments, feature location, and bug localization. We discuss literature from the software engineering (SE) domain to understand the role of IR in SE, specifically where source code is involved, either as content or query. While this talk gives an overview of source code search, we hope it also inspires you to explore topics such as program analysis, software engineering and advanced information retrieval for future studies. Guest Lecture at CS317 - Information Retrieval, National University of Computer & Emerging Sciences, Karachi Campus, 08th Apr, 2021. Slides
Researchers Beware! The Devil is in the (Mathematical) Models: Research in Information and Communication Engineering has to deal with an enormous amount of complexity. Researchers often seek refuge in mathematical models. Models abstract away unwanted information in the context of the problem at hand and thus lead us to simpler representations that are tenable for research. In this talk, we focus specifically on mathematical models in real-world software products and related research. We start with Edgar Codd's Turing Award-winning idea that drives modern-day databases. After this ice breaker, we explore two popular underlying models used in search engines. The Vector Space Model and the Probabilistic Retrieval Model present two different ways of modeling data to solve the same problem of retrieving documents. These models demonstrate the rigor and elegance that mathematical models can bring to research. Two Day National Workshop on "An Insight into Mathematical Modelling in Information and Communication Engineering", Jointly Organized by Department of CSE and Department of Mathematics, Sri Sivasubramaniya Nadar College of Engineering (delivered live online), 26th Mar, 2021. Slides
Why should software developers care for mathematics? Software products have grown in scale and complexity. We have built amazing software products such as operating systems and search engines that are not only large, but also fast. Mere knowledge of programming languages has become insufficient for effective software development. Logical thinking, data structures, and algorithmic skills are essential, but they leave a gap for mathematical models to fill. Have you ever wondered how massive search engines that index trillions of documents are able to respond in under a second? In this talk, we discuss the fundamental question, "Why should software developers care for mathematics?". We will take relational databases and search engines as examples to discuss the role of mathematical models in building large-scale complex software systems. Tech Talk (delivered live online) for Qualcomm, 18th Feb, 2021. Slides
Data Analytics - Key Enablers and Associated Challenges: In the current digital era, businesses depend heavily on the power of analytics. Data Analytics is typically looked at as a discipline that explores various aspects of data management including collecting, curating, organizing, storing and analyzing data. In the first part of this talk, after a gentle introduction to analytics, we discuss the evolution of two key enablers of data analytics systems, namely data storage and processing mechanisms. We observe that both storage and processing suffer from the issue of scale. In the second part, we focus on the challenges attributed to scale, which lead to cloud computing. Specifically, we discuss the role of cloud frameworks and platforms (such as Storm and Kafka) in addressing this problem. Faculty Development Programme on End to End Process in Data Analytics, Department of Information Technology, SSN College of Engineering, Chennai, India, 02nd Nov, 2020. Slides
The Art and Science Behind Search Engines: Can you imagine a life without search engines? It might not be an exaggeration even to claim that our ability to access the right information at the right time is crucial for our success. In the post-COVID era of digital dominance, we are expected to be inundated with data. Availability of big data has necessitated a systematic study of data storage and retrieval techniques. Principles and practices of information retrieval have been a focus of researchers and practitioners alike. The field of study that deals with search and search engines is popularly known as "Information Retrieval". This session introduces "Information Retrieval". It touches upon two major questions: 1) What is the science behind search engines? In other words, how do these search engines work? and 2) What career opportunities (both industrial and research) exist for search enthusiasts? NPTEL - Live Sessions - Special Lecture Series, 26th Jun, 2020. Slides, Video
Code Variants Retrieval: Code variants represent alternative implementations of a code snippet, where each alternative provides the same functionality, but has different properties that make some of them better suited to the overall project requirements. Developers routinely need to analyze existing code, find better reuse alternatives, and look to develop high-quality code that meets some desired properties. However, searching for such code variants over the web has several challenges. Existing text-retrieval models do not work well on source code. Expressing natural language queries on source code is an open problem. Many query terms in natural language have multiple surface forms in source code. We address this problem by perceiving source code as a collection of entities. In this talk, we present new techniques to search for code variants. The ability to perform semantic search over source code snippets assisted by developer knowledge in the form of discussion forum data opens up a new way to solve several important problems. 3rd International Conference on Computational Intelligence in Data Science, Chennai, India, 21st Feb, 2020.
Demystifying Datascience: This talk is split into two parts. In the first part, I address two key questions, "What is data science?" and "Why study data science?". In the second part, I take search as an example of the science behind the data, showing how seeing data in a new way can bring great benefits. SSN Institutions, 14th Dec, 2019. Slides
Graphs in Program Analysis: Software has become ubiquitous. Understanding and reasoning about program properties have become necessary. In this talk, I introduce program analysis using one of the four classic data flow analyses as an example. Specifically, the talk will focus on the role of graphs in program analysis. Towards the end, we discuss some research trends and potential open problems for future research.
Delhi Technological University, 29th Nov, 2019. Slides
The Magic of Models (With Two Case Studies): The journey from complex problems to beautiful solutions often requires us to take a stop at a station called models! Abstracting out unnecessary details helps us to focus on the core of the problem and therefore to find elegant solutions. In this talk, I discuss a few interesting models from diverse areas including 1) the application of vectors to model a real-world problem that often occurs wherever we need to compare natural language (say English) sentences, and 2) the application of our knowledge of forms and functions to Bayesian data analysis.
Agilisium Consulting, 24th Sep, 2019. Slides
The Mathematics behind Web Search Engines: Mathematical models play a significant role in designing elegant products. In this talk, we discuss how web search engines work by diving into the vector space models. I will cover the necessary preliminaries and explain several use cases of search engines and the way various models lend elegance and efficiency. Towards the end, we shall discuss current trends in web search research. SRM University, Chennai, 31st July, 2019. Slides
The Magic of Models: The journey from complex problems to beautiful solutions often requires us to take a stop at a station called models! Abstracting out unnecessary details helps us to focus on the core of the problem and therefore to find elegant solutions. In this talk, we apply our knowledge of vectors to model a real-world problem that often occurs wherever we need to compare natural language (say English) sentences. Research Science Initiative - Summer Programme, CMI, Chennai, 28th May, 2019. Slides
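As a rough illustration of comparing sentences with vectors, the sketch below represents each sentence as a bag-of-words count vector and measures the cosine of the angle between the two. The whitespace tokenization and the function name `cosine_similarity` are assumptions made for this example, not necessarily the formulation used in the talk.

```python
import math
from collections import Counter


def cosine_similarity(sentence_a, sentence_b):
    """Cosine similarity between two sentences as bag-of-words vectors.

    Assumption: lowercased whitespace tokenization is good enough for a demo.
    """
    vec_a = Counter(sentence_a.lower().split())
    vec_b = Counter(sentence_b.lower().split())

    # Dot product over the terms of the first vector; a Counter returns 0
    # for terms that are missing from the second vector.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())

    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)


score = cosine_similarity("the cat sat on the mat", "the cat lay on the rug")
```

Sentences that share many words score close to 1, and sentences with no words in common score 0.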
Big Data - Text Processing: Principles and Practices: Text Processing is essential for several applications. Our primary medium of communication over the web remains text. In this talk, I explain the need for data structures using the example of suffix trees. In the second part of the talk, I show how vector space models have been used in search engines. With these two examples, we gather a glimpse of principles and practices of dealing with big textual data. SSN Analytica, SSN Institutions, Chennai, 11th May, 2019. Slides
Information Retrieval: An Overview: This talk starts with the notion and role of "information" in daily life. I will then discuss the library and information sciences practices that led to the field of digital libraries and further to search engines. I introduce Information Retrieval (IR), covering the basics of indexing and query processing. As part of the indexing step, I discuss the vector space model to represent text, further leading to postings lists. As part of query processing, I discuss simple Boolean queries (AND, OR, NOT) answered by merging the postings. We end the talk with a note on comparing and evaluating IR systems. Post-IOI (International Olympiad in Informatics) Training Camp Workshop, Chennai Mathematical Institute (CMI), 9th May, 2019. Slides
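Answering an AND query by merging postings can be sketched as follows. This is a minimal illustration that assumes each postings list is a sorted list of integer document IDs; the toy index and the name `intersect` are hypothetical.

```python
def intersect(postings_a, postings_b):
    """Answer `a AND b` by merging two sorted postings lists."""
    i, j = 0, 0
    result = []
    while i < len(postings_a) and j < len(postings_b):
        if postings_a[i] == postings_b[j]:
            # The document contains both terms.
            result.append(postings_a[i])
            i += 1
            j += 1
        elif postings_a[i] < postings_b[j]:
            i += 1  # advance the list with the smaller document ID
        else:
            j += 1
    return result


# Hypothetical toy inverted index: term -> sorted document IDs.
index = {
    "search": [1, 2, 4, 7],
    "engine": [2, 3, 4, 8],
}
matches = intersect(index["search"], index["engine"])
```

Because both lists are sorted, the merge visits each posting at most once, so the cost is linear in the combined length of the two lists.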
Code Variants and Their Retrieval: Information plays a crucial role in our day-to-day life. In this talk, I introduce you to the principles and practices behind information retrieval using a case of code variants search. Developers routinely analyze existing code, find better reuse alternatives, and look to develop high-quality code. Our research shows that 25% to 40% of developer discussions in bug reports are about variants. However, searching for such code variants over the web has several challenges. Using knowledge-driven approaches, we propose a system to automate the search for code variants. Chennai Mathematical Institute (CMI), 9th January, 2019.
Principles and Practices Behind Building the Search Engines: Search engines play a key role in our daily life. Yet, building a search engine is scientifically involved due to the sub-second response requirement on top of fetching only relevant content and fetching all relevant content. We discuss some principles and practices of building search engines in this talk. IIIT Sri City, 6th October, 2018. Research Expo Material: Download here (zip file [5.01 MB])
Knowledge-Discovery based Approaches for Code Snippet Search: Code snippets pose several challenges compared to text for retrieval. In this talk, we propose a MATF-based retrieval model, and hope to inspire the audience to appreciate code search as a potential area of research. IIIT Sri City, 20th April, 2018.
Term Frequency and its Variants in Retrieval Models: Term frequency models and term frequency weighting schemes have appeared in several variants while designing retrieval models for text. In this talk, we review the background, intuitions and formulations of some of the variants. IIIT Delhi, 11th April, 2018.
Code Search - Challenges and Applications: Searching for source code is challenging for several reasons. The naturalness of source code cannot be compared with that of natural languages like English. How does this impact indexing and querying? In this talk, I highlight the major challenges and research opportunities in this field. Neemrana University, Rajasthan, 24th September, 2017.
Porting InnerEye to Azure Cloud: InnerEye is a research project that uses state-of-the-art machine learning technology to build innovative tools for the automatic, quantitative analysis of three-dimensional radiological images. In this talk, I summarize my 12 weeks at Microsoft Research, where I successfully migrated the InnerEye ML models to Azure cloud. Microsoft Research, Cambridge, United Kingdom, June 29, 2017.
Modeling Source Code to Support Retrieval-Based Applications: Advances in text retrieval do not apply directly to source code retrieval because of the difference in characteristics of source code when compared to text. Here, I discuss the role of source code models in retrieval. The Doctoral Consortium at the Tenth ACM International Conference on Web Search and Data Mining, WSDM 2017, Cambridge, United Kingdom, February 6, 2017.
Reflections from a PM Research Fellow: My PhD is supported by the Prime Minister's Research Fellowship Scheme. In this talk, I first give a glimpse of my research, then discuss the challenges of doing a PhD in India, and finally elaborate on ways in which the PM fellowship helps in overcoming some of these hurdles. Panjab University, Chandigarh, 24th March, 2015.
Bayesian Data Analysis: Bayesian data analysis provides us a framework to systematically approach our belief update process. Suppose we know nothing about the fairness of a coin, and we see four consecutive heads in the first four flips. What should we infer about the coin? Is it biased towards heads? Should we bet on another head? IIIT Delhi, 12th November, 2014.
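One common way to work through the coin question is a Beta-Bernoulli update, sketched below. It assumes that "knowing nothing" is modeled as a uniform Beta(1, 1) prior over the probability of heads. That modeling choice and the function names are assumptions for illustration, not necessarily the treatment in the talk.

```python
from fractions import Fraction


def update_beta(alpha, beta, heads, tails):
    """Posterior Beta parameters after observing coin flips.

    The Beta prior is conjugate to the Bernoulli likelihood, so the update
    just adds the observed counts to the prior parameters.
    """
    return alpha + heads, beta + tails


def prob_next_head(alpha, beta):
    """Posterior predictive probability of heads: the mean of Beta(alpha, beta)."""
    return Fraction(alpha, alpha + beta)


# Assumed prior: uniform Beta(1, 1). Observation: four heads, zero tails.
alpha, beta = update_beta(1, 1, heads=4, tails=0)
p_head = prob_next_head(alpha, beta)  # mean of the Beta(5, 1) posterior
```

Under this assumed prior, the posterior is Beta(5, 1) and the probability that the fifth flip is a head is 5/6. The belief shifts towards heads, but four flips do not make the coin certainly biased.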
Swarm Intelligence: How can loosely connected agents display collective intelligence? Can we learn something from them? Models based on metric distance and topological distance simulate swarm behavior. Several algorithms, such as Ant Colony Optimization and Particle Swarm Optimization, find applications in crowd simulation and ad-hoc networks. Let us take a look. IIIT Delhi, 01st March, 2014.
Beauty of Information Retrieval: Information retrieval is a fast-growing field concerned with search that satisfies an information need. In this talk, we discuss some key challenges and interesting applications. IIIT Delhi, 01st March, 2014.