Monthly Subscription Starting at AED 99 + VAT

Level: Foundation
Duration: 20 hours
Offered by: University of Washington

About

Data analysis has replaced data acquisition as the bottleneck to evidence-based decision making: we are drowning in data. Extracting knowledge from large, heterogeneous, and noisy datasets requires not only powerful computing resources, but also the programming abstractions to use them effectively. The abstractions that emerged in the last decade blend ideas from parallel databases, distributed systems, and programming languages to create a new class of scalable data analytics platforms that form the foundation for data science at realistic scales.

In this course, you will learn the landscape of relevant systems, the principles on which they rely, their tradeoffs, and how to evaluate their utility against your requirements. You will learn how practical systems were derived from the frontier of research in computer science and what systems are coming on the horizon. Cloud computing, SQL and NoSQL databases, MapReduce and the ecosystem it spawned, Spark and its contemporaries, and specialized systems for graphs and arrays will be covered. You will also learn the history and context of data science, the skills, challenges, and methodologies the term implies, and how to structure a data science project.

Learning Goals: At the end of this course, you will be able to:

1. Describe common patterns, challenges, and approaches associated with data science projects, and what makes them different from projects in related fields.
2. Identify and use the programming models associated with scalable data manipulation, including relational algebra, MapReduce, and other data flow models.
3. Use database technology adapted for large-scale analytics, including the concepts driving parallel databases, parallel query processing, and in-database analytics.
4. Evaluate key-value stores and NoSQL systems, and describe their tradeoffs with comparable systems, the details of important examples in the space, and future trends.
5. "Think" in MapReduce to effectively write algorithms for systems including Hadoop and Spark, understand their limitations, design details, relationship to databases, and associated ecosystem of algorithms, extensions, and languages, and write programs in Spark.
6. Describe the landscape of specialized Big Data systems for graphs, arrays, and streams.

Modules

Lesson 1: Examples and the Diversity of Data Science

6 Videos

Appetite Whetting: Politics
Appetite Whetting: Extreme Weather
Appetite Whetting: Digital Humanities
Appetite Whetting: Bibliometrics
Appetite Whetting: Food, Music, Public Health
Appetite Whetting: Public Health cont'd, Earthquakes, Legal

Lesson 2: Working Definitions of Data Science

4 Videos

Characterizing Data Science
Characterizing Data Science, cont'd
Distinguishing Data Science from Related Topics
Four Dimensions of Data Science

Lesson 3: Characterizing this Course

5 Videos

Tools vs. Abstractions
Desktop Scale vs. Cloud Scale
Hackers vs. Analysts
Structs vs. Stats
Structs vs. Stats cont'd

Lesson 4: Related Topics

5 Videos

A Fourth Paradigm of Science
Data-Intensive Science Examples
Big Data and the 3 Vs
Big Data Definitions
Big Data Sources

Lesson 5: Course Logistics

1 Video

Course Logistics

2 Readings

Supplementary: Three-Course Reading List
Supplementary: Resources for Learning Python

Assignment 1: Twitter Sentiment Analysis

1 Programming Assignment

Twitter Sentiment Analysis

1 Video

Twitter Assignment: Getting Started

2 Readings

Supplementary: Class Virtual Machine
Supplementary: Github Instructions

Lesson 6: Principles of Data Manipulation and Management

5 Videos

Data Models, Terminology
From Data Models to Databases
Pre-Relational Databases
Motivating Relational Databases
Relational Databases: Key Ideas

Lesson 7: Relational Algebra

7 Videos

Algebraic Optimization Overview
Relational Algebra Overview
Relational Algebra Operators: Union, Difference, Selection
Relational Algebra Operators: Projection, Cross Product
Relational Algebra Operators: Cross Product cont'd, Join
Relational Algebra Operators: Outer Join
Relational Algebra Operators: Theta-Join

Lesson 8: SQL for Data Science

6 Videos

From SQL to RA
Thinking in RA: Logical Query Plans
Practical SQL: Binning Timeseries
Practical SQL: Genomic Intervals
User-Defined Functions
Support for User-Defined Functions

Lesson 9: Key Principles of Relational Databases

6 Videos

Optimization: Physical Query Plans
Optimization: Choosing Physical Plans
Declarative Languages
Declarative Languages: More Examples
Views: Logical Data Independence
Indexes

Assignment 2: SQL

1 Programming Assignment

SQL for Data Science Assignment

Lesson 10: Reasoning about Scale

5 Videos

What Does Scalable Mean?
A Sketch of Algorithmic Complexity
A Sketch of Data-Parallel Algorithms
"Pleasingly Parallel" Algorithms
More General Distributed Algorithms

Lesson 11: The MapReduce Programming Model

7 Videos

MapReduce Abstraction
MapReduce Data Model
Map and Reduce Functions
MapReduce Simple Example
MapReduce Simple Example cont'd
MapReduce Example: Word Length Histogram
MapReduce Examples: Inverted Index, Join
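
As a rough illustration of the programming model this lesson covers, the word length histogram example named above can be sketched in plain Python. This is a minimal single-process sketch, not the course's own code: the names `map_fn`, `reduce_fn`, and `run_mapreduce` and the sample documents are hypothetical, and keying on the exact word length (rather than on size bins) is an assumption.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Map: emit one (word_length, 1) pair per word in the document."""
    for word in text.split():
        yield len(word), 1


def reduce_fn(length, counts):
    """Reduce: sum the counts collected for one word length."""
    return length, sum(counts)


def run_mapreduce(documents, mapper, reducer):
    """Single-process stand-in for a MapReduce runtime (hypothetical helper)."""
    groups = defaultdict(list)
    # Map phase plus shuffle: group every emitted value by its key.
    for doc_id, text in documents.items():
        for key, value in mapper(doc_id, text):
            groups[key].append(value)
    # Reduce phase: call the reducer once per distinct key.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


# Hypothetical input: document id -> document text.
documents = {
    "d1": "the quick brown fox",
    "d2": "jumps over the lazy dog",
}
histogram = run_mapreduce(documents, map_fn, reduce_fn)
print(histogram)
```

In a real cluster the grouping step is the distributed shuffle, and the map and reduce calls run in parallel across machines; only the two user-supplied functions would stay the same.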

Lesson 12: Algorithms in MapReduce

8 Videos

Relational Join: Map Phase
Relational Join: Reduce Phase
Simple Social Network Analysis: Counting Friends
Matrix Multiply Overview
Matrix Multiply Illustrated
Shared Nothing Computing
MapReduce Implementation
MapReduce Phases

Lesson 13: Parallel Databases vs. MapReduce

6 Videos

A Design Space for Large-Scale Data Systems
Parallel and Distributed Query Processing
Teradata Example, MR Extensions
RDBMS vs. MapReduce: Features
RDBMS vs. Hadoop: Grep
RDBMS vs. Hadoop: Select, Aggregate, Join

Assignment 3: MapReduce

1 Programming Assignment

Thinking in MapReduce

Lesson 14: What problems do NoSQL systems aim to solve?

6 Videos

NoSQL Context and Roadmap
NoSQL Roundup
Relaxing Consistency Guarantees
Two-Phase Commit and Consensus Protocols
Eventual Consistency
CAP Theorem

Lesson 15: Early key-value systems and key concepts

6 Videos

Types of NoSQL Systems
ACID, Major Impact Systems
Memcached: Consistent Hashing
Consistent Hashing, cont'd
DynamoDB: Vector Clocks
Vector Clocks, cont'd

Lesson 16: Document Stores and Extensible Record Stores

4 Videos

CouchDB Overview
CouchDB Views
BigTable Overview
BigTable Implementation

Lesson 17: Extended NoSQL Systems

6 Videos

HBase, Megastore
Spanner
Spanner cont'd, Google Systems
MapReduce-based Systems
Bringing Back Joins
NoSQL Rebuttal

Lesson 18: Pig: Programming with Relational Algebra

5 Videos

Almost SQL: Pig
Pig Architecture and Performance
Data Model
Load, Filter, Group
Group, Distinct, Foreach, Flatten

Lesson 19: Pig Analytics

6 Videos

CoGroup, Join
Join Algorithms
Skew
Other Commands
Evaluation Walkthrough
Review

Lesson 20: Spark

3 Videos

Context
Spark Examples
RDDs, Benefits

Lesson 21: Structural Tasks

4 Videos

Graph Overview
Structural Analysis
Degree Histograms, Structure of the Web
Connectivity and Centrality

Lesson 22: Traversal Tasks

4 Videos

PageRank
PageRank in more Detail
Traversal Tasks: Spanning Trees and Circuits
Traversal Tasks: Maximum Flow

Lesson 23: Pattern Matching Tasks and Graph Query

5 Videos

Pattern Matching
Querying Edge Tables
Relational Algebra and Datalog for Graphs
Querying Hybrid Graph/Relational Data
Graph Query Example: NSA

Lesson 24: Recursive Queries

4 Videos

Graph Query Example: Recursion
Evaluation of Recursive Programs
Recursive Queries in MapReduce
The End-Game Problem

Lesson 25: Representations and Algorithms

4 Videos

Representation: Edge Table, Adjacency List
Representation: Adjacency Matrix
PageRank in MapReduce
PageRank in Pregel

Auto Summary

Unlock the potential of large-scale data analytics with "Data Manipulation at Scale: Systems and Algorithms," designed for data science and AI enthusiasts. Led by expert instructors, this foundational course delves into scalable data platforms, cloud computing, SQL/NoSQL databases, MapReduce, Spark, and more. Over 1200 minutes, you'll master programming models, parallel query processing, and specialized systems for graphs and arrays. Ideal for those seeking comprehensive knowledge in handling vast datasets, the course offers both Starter and Professional subscription options. Join now to elevate your data science skills!

Instructor

Bill Howe

Data Manipulation at Scale: Systems and Algorithms

Modules

Data Science Context and Concepts

Lesson 1: Examples and the Diversity of Data Science

Lesson 2: Working Definitions of Data Science

Lesson 3: Characterizing this Course

Lesson 4: Related Topics

Lesson 5: Course Logistics

Assignment 1: Twitter Sentiment Analysis

Relational Databases and the Relational Algebra

Lesson 6: Principles of Data Manipulation and Management

Lesson 7: Relational Algebra

Lesson 8: SQL for Data Science

Lesson 9: Key Principles of Relational Databases

Assignment 2: SQL

MapReduce and Parallel Dataflow Programming

Lesson 10: Reasoning about Scale

Lesson 11: The MapReduce Programming Model

Lesson 12: Algorithms in MapReduce

Lesson 13: Parallel Databases vs. MapReduce

Assignment 3: MapReduce

NoSQL: Systems and Concepts

Lesson 14: What problems do NoSQL systems aim to solve?

Lesson 15: Early key-value systems and key concepts

Lesson 16: Document Stores and Extensible Record Stores

Lesson 17: Extended NoSQL Systems

Lesson 18: Pig: Programming with Relational Algebra

Lesson 19: Pig Analytics

Lesson 20: Spark

Graph Analytics

Lesson 21: Structural Tasks

Lesson 22: Traversal Tasks

Lesson 23: Pattern Matching Tasks and Graph Query

Lesson 24: Recursive Queries

Lesson 25: Representations and Algorithms