Distributed Computing and Big Data
Instructor: Venkatesh Vinayakarao
Term: Jan - Apr 2023
Welcome to the Distributed Computing and Big Data course! The massive increase in the availability of data has made its storage, management, and analysis extremely challenging. Various tools, technologies, and frameworks have emerged to help address this challenge. Apache Hadoop is one such framework; it enables us to handle big data by making distributed computing easier. Hadoop abstracts away concerns such as reliability, distributed file management, and distributed processing. In this course, we start by understanding the characteristics of big data and the fundamental concepts of cloud computing. We then explore the Hadoop ecosystem, specifically HDFS, Map-Reduce, Pig, and NoSQL databases. Our objective is to handle big data effectively and to build web applications and RESTful services over the cloud. This is an introductory course focused on the breadth of the big data landscape.
Key Learning Objectives
At the end of this course, you should be able to:
- Understand how distributed file systems work. Be able to use Hadoop HDFS.
- Understand distributed processing fundamentals. Write code using the Map-Reduce framework and Pig scripts.
- Understand NoSQL DB concepts using MongoDB and HBase.
Lecture Schedule
Lecture # | Topic | Readings | Slides/Material |
Part 1: Introduction |
1 | Introduction to Big Data | The Complete Beginner's Guide To Big Data Everyone Can Understand Basics About Cloud Computing | Lecture 1 Lecture 2 Lecture 1 and 2 (new) |
2 | Hands-On Tutorial: A Tour of Big Data Stack with Cloudera VM | CDH Overview | Tutorial 1 |
3 | Distributed File Systems | Files and Directories File System The Hadoop Distributed File System | Lecture 3 Lecture 3.1 (DC Model) DFS (Lecture 4 Updated) |
4 | Hands-On Tutorial: HDFS | Exploring the File System | Tutorial 2 |
5 | Distributed Processing with Map-Reduce and Pig | Overview of Map-Reduce Pig-Latin | Lecture 5 |
6 | Hands-On Tutorial: Map-Reduce | | Tutorial 3 |
7 | Introduction to OOAD and UML | | Lecture 6 Lecture 7 Tut4-JavaCode |
8 | Big Data Design Patterns | Chapter 1 from Thinking in Patterns Map-Reduce Design Patterns | Lecture 8 |
9 | Apache Pig | | Lecture 9 |
10 | Hands-On Tutorial: Map-Reduce and Pig | MapReduce Tutorial Pig Tutorial | Tutorial 5: MR [zip], [video], Pig [zip] |
11 | NoSQL DB | Chapter 4 from BDA Book Columnar Storage NoSQL Explained | Lecture 10 |
12 | Hands-On Tutorial: MongoDB, HBase | MongoDB CRUD Operations | Tut6-MongoDB Exercise Tut6-HBase |
13 | Web Application Development and Service Oriented Architecture | Web Application Development Sections 1, 2 and 3 of Web Services | Lecture 11 Notes Video Neo4j Neo4j Commands |
14 | Hands-On Tutorial: Apache Tomcat, JSON, RESTful Services | Building Web Applications with Tomcat RESTful Services | Hive and Solr Tut7-WS Tut7-Code Video |
Part 2: Additional Topics |
15 | Big Data in the Real World (Media, Streaming, Transportation, Sports, Finance) |
16 | Data Warehouse, Pond and Lakes |
17 | Cloud Storage and Computing - AWS |
Evaluation
Instrument | Max Marks |
Mid Exam | 25% |
Final Exam | 35% |
Assignments (4 x 10% each) | 40% |
Pre-requisites
None.
Resources
Text
There is no prescribed text for this course. Readings will be shared during the lectures.
References
Optional Readings
If you are not having fun, you are not the best student you can be!