Welcome to the Distributed Computing and Big Data course! The massive increase in the availability of data has made its storage, management, and analysis extremely challenging. Various tools, technologies, and frameworks have surfaced to address this challenge. Apache Hadoop is one such framework: it makes distributed computing easier and thereby enables us to handle big data. Hadoop abstracts away concerns such as reliability, distributed file management, and distributed processing. In this course, we start by understanding the characteristics of big data and the fundamental concepts of cloud computing. We then explore the Hadoop ecosystem, specifically HDFS, MapReduce, Pig, and NoSQL databases. Our objective is to handle big data effectively and to build web applications and RESTful services over the cloud. This is an introductory course focused on the breadth of the big data landscape.
Key Learning Objectives
At the end of this course, you should be able to:
- Understand how distributed file systems work, and be able to use Hadoop HDFS.
- Understand the fundamentals of distributed processing, and write programs using the MapReduce framework and Pig scripts.
- Understand NoSQL database concepts using MongoDB and HBase.
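The MapReduce programming model mentioned in the objectives above can be illustrated without a cluster. The following is a minimal, illustrative pure-Python sketch (not the Hadoop API; production Hadoop jobs are typically written in Java or run via Hadoop Streaming) showing the map, shuffle/sort, and reduce phases of the classic word-count job:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce phase: sum all counts emitted for the same word.
    return (word, sum(counts))

def run_job(lines):
    # Apply the mapper to every input line.
    pairs = [kv for line in lines for kv in mapper(line)]
    # Shuffle/sort: group pairs by key, as the framework would between phases.
    pairs.sort(key=itemgetter(0))
    # One reducer call per distinct key.
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )

print(run_job(["big data is big", "data is everywhere"]))
# → {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```

In real Hadoop, the mapper and reducer run on different machines and the framework handles the shuffle, fault tolerance, and data locality; this sketch only conveys the programming model you will use in the assignments.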
Lecture Schedule
Evaluation
| Instrument | Weight |
| --- | --- |
| Mid Exam | 20% |
| Final Exam | 35% |
| Assignments (3 × 10%) | 30% |
| Student Presentation and Demos | 15% |
Student Presentation and Demos
The presentation topics will be released soon. You may work individually or in a group of up to four students. Students may be called individually for a viva. Further instructions about the submission format and expectations will be discussed in class.
Pre-requisites
None.
Resources
Text
There is no prescribed text for this course. Readings will be shared during the lectures.
References
Optional Readings