- Level: Foundation
- Duration: 21 hours
- Course by University of California San Diego
About
Welcome to the Capstone Project for Big Data! In this culminating project, you will build a big data ecosystem using tools and methods from the earlier courses in this specialization. You will analyze a data set simulating big data generated by a large number of users playing our imaginary game "Catch the Pink Flamingo". During the five-week Capstone Project, you will walk through the typical big data science steps of acquiring, exploring, preparing, analyzing, and reporting. In the first two weeks, we will introduce you to the data set and guide you through some exploratory analysis using tools such as Splunk and OpenOffice. Then we will move on to more challenging big data problems that require the more advanced tools you have learned, including KNIME, Spark's MLlib, and Gephi. Finally, during the fifth and final week, we will show you how to bring it all together to create engaging and compelling reports and slide presentations. As a result of our collaboration with Splunk, a software company focused on analyzing machine-generated big data, learners with the top projects will be eligible to present to Splunk and meet Splunk recruiters and engineering leadership.
Modules
Introduction to the Capstone Project
4 Videos
- Welcome to the Big Data Capstone Project
- Welcome from Splunk: Rob Reed, World Education Evangelist
- A Summary of Catch the Pink Flamingo
- A Conceptual Schema for Catch the Pink Flamingo
4 Readings
- Planning, Preparation, and Review
- A Game by Eglence Inc.: Catch the Pink Flamingo
- Overview of the Catch the Pink Flamingo Data Model
- Overview of Final Project Design
Acquiring and Understanding the Game Data
2 Readings
- Downloading the Game Data and Associated Scripts
- Understanding the CSV Files Generated by the Scripts
Let's Do It: Exploring and Preparing the Data
1 Assignment
- Data Exploration With Splunk
1 Peer Review
- Data Exploration Technical Appendix
4 Readings
- Optional Review of Splunk
- “Catch the Pink Flamingo” Data Exploration with Splunk
- Aggregate Calculations Using Splunk
- Filtering the Data With Splunk
Get Thinking: Classifying Players' Spending Habits
2 Readings
- Review: Classification Using Decision Tree in KNIME
- Review: Interpreting a Decision Tree in KNIME
Let's Do It
1 Peer Review
- Classifying in KNIME to identify big spenders in Catch the Pink Flamingo
2 Readings
- Workflow Overview for Building a Decision Tree in KNIME
- Description of combined_data.csv
Get Thinking: Clustering to Improve Eglence Inc.'s Revenue
3 Discussions
- Is there only “one way” to cluster a client base?
- How many clusters?
- What kind of criteria might provide actionable information for Eglence Inc.?
1 Reading
- Informing business strategies based on client base
Let's Do It
1 Peer Review
- Recommending Actions from Clustering Analysis
1 Reading
- Practice with PySpark MLlib Clustering
Get Thinking: A Graph Analytics Approach to Simulated Chat Data
1 Reading
- Understanding the Simulated Chat Data Generated by the Scripts
Let's Do It: Working with Simulated Chat Data in Neo4j
1 Peer Review
- Graph Analytics With Chat Data Using Neo4j
1 Reading
- Graph Analytics of Catch the Pink Flamingo Chat Data Using Neo4j
Final Project Instructions
1 Video
- Week 5: Bringing It All Together
1 Reading
- Final project preparation
Final Project Submission
1 Peer Review
- Final Project
1 Video
- Congratulations! Some Final Words...
Optional Splunk Submission
1 Peer Review
- Optional 3-minute video: Splunk opportunity
1 Reading
- Part 2: Help us connect your video to your LinkedIn profile
Auto Summary
Embark on the Capstone Project for Big Data, a course in the Data Science & AI domain offered on Coursera by the University of California San Diego. Over five weeks, learners apply what they learned in the earlier courses of the specialization by building a big data ecosystem. The project centers on analyzing a data set from the fictional game "Catch the Pink Flamingo," which simulates real-world big data. Participants follow the core big data science steps: acquisition, exploration, preparation, analysis, and reporting. The first weeks introduce the data set and cover exploratory analysis with tools such as Splunk and OpenOffice. Later, learners tackle more complex big data problems using advanced tools such as KNIME, Spark's MLlib, and Gephi. In the final week, learners create compelling reports and presentations. Through the course's collaboration with Splunk, top performers can present their projects to Splunk recruiters and engineering leadership. The course totals 1260 minutes (21 hours) and is available under several subscription options, including Starter, Professional, and Paid plans. This foundation-level course suits aspiring data scientists and AI enthusiasts who want to build practical skills and gain recognition in the industry. Join now and take a significant step toward mastering big data.

Ilkay Altintas

Amarnath Gupta