- Level Foundation
- Duration 17 hours
- Course by University of Washington
About
Case Studies: Finding Similar Documents

A reader is interested in a specific news article and you want to find similar articles to recommend. What is the right notion of similarity? Moreover, what if there are millions of other documents? Each time you want to retrieve a new document, do you need to search through all other documents? How do you group similar documents together? How do you discover new, emerging topics that the documents cover?

In this third case study, finding similar documents, you will examine similarity-based algorithms for retrieval. You will also examine structured representations for describing the documents in the corpus, including clustering and mixed membership models, such as latent Dirichlet allocation (LDA). You will implement expectation maximization (EM) to learn the document clusterings, and see how to scale the methods using MapReduce.

Learning Outcomes: By the end of this course, you will be able to:
- Create a document retrieval system using k-nearest neighbors.
- Identify various similarity metrics for text data.
- Reduce computations in k-nearest neighbor search by using KD-trees.
- Produce approximate nearest neighbors using locality sensitive hashing.
- Compare and contrast supervised and unsupervised learning tasks.
- Cluster documents by topic using k-means.
- Describe how to parallelize k-means using MapReduce.
- Examine probabilistic clustering approaches using mixture models.
- Fit a mixture of Gaussians model using expectation maximization (EM).
- Perform mixed membership modeling using latent Dirichlet allocation (LDA).
- Describe the steps of a Gibbs sampler and how to use its output to draw inferences.
- Compare and contrast initialization techniques for non-convex optimization objectives.
- Implement these techniques in Python.

Modules
What is this course about?
4
Videos
- Welcome and introduction to clustering and retrieval tasks
- Course overview
- Module-by-module topics covered
- Assumed background
5
Readings
- Important Update regarding the Machine Learning Specialization
- Slides presented in this module
- Software tools you'll need for this course
- A big week ahead!
- Get help and meet other learners. Join your Community!
Introduction to nearest neighbor search and algorithms
3
Videos
- Retrieval as k-nearest neighbor search
- 1-NN algorithm
- k-NN algorithm
1
Readings
- Slides presented in this module
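The retrieval-as-k-NN idea covered in these videos can be sketched in a few lines of NumPy (an illustrative sketch only; the document vectors and function name are assumptions, not course materials):

```python
import numpy as np

def knn_search(query, corpus, k):
    """Brute-force k-nearest-neighbor search under Euclidean distance.

    corpus: (n_docs, n_features) array of document vectors.
    Returns the indices of the k closest documents, nearest first.
    """
    dists = np.linalg.norm(corpus - query, axis=1)
    return np.argsort(dists)[:k]

# Three toy "documents" as 2-D feature vectors.
docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
nearest = knn_search(np.array([1.0, 0.0]), docs, k=2)
```

Setting k=1 recovers the 1-NN algorithm as a special case.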
The importance of data representations and distance metrics
1
Assignment
- Representations and metrics
5
Videos
- Document representation
- Distance metrics: Euclidean and scaled Euclidean
- Writing (scaled) Euclidean distance using (weighted) inner products
- Distance metrics: Cosine similarity
- To normalize or not and other distance considerations
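The metrics discussed in this module have compact formulas; a minimal sketch (the weight vector and example inputs are illustrative assumptions):

```python
import numpy as np

def scaled_euclidean(a, b, weights=None):
    # Scaled Euclidean distance: sqrt(sum_i w_i * (a_i - b_i)^2).
    # With no weights this reduces to ordinary Euclidean distance.
    w = np.ones_like(a) if weights is None else weights
    return np.sqrt(np.sum(w * (a - b) ** 2))

def cosine_similarity(a, b):
    # Cosine similarity: inner product of the two normalized vectors.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

d = scaled_euclidean(np.array([0.0, 0.0]), np.array([3.0, 4.0]))
s = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Writing the scaled distance with a weight vector makes the "(weighted) inner products" view explicit: the squared distance is a weighted inner product of the difference vector with itself.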
Programming Assignment 1
1
Assignment
- Choosing features and metrics for nearest neighbor search
1
Readings
- Choosing features and metrics for nearest neighbor search
Scaling up k-NN search using KD-trees
1
Assignment
- KD-trees
6
Videos
- Complexity of brute force search
- KD-tree representation
- NN search with KD-trees
- Complexity of NN search with KD-trees
- Visualizing scaling behavior of KD-trees
- Approximate k-NN search using KD-trees
1
Readings
- (OPTIONAL) A worked-out example for KD-trees
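As a sketch of KD-tree search in practice (assuming SciPy is available; the data here is random and purely illustrative), `scipy.spatial.cKDTree` supports both exact and approximate queries:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.random((1000, 3))

tree = cKDTree(points)                  # build the tree once
dist, idx = tree.query(points[0], k=3)  # exact 3-NN search

# Approximate search: eps > 0 lets the search prune more aggressively,
# returning neighbors within (1 + eps) of the true nearest distance.
dist_approx, idx_approx = tree.query(points[0], k=3, eps=0.5)
```

The `eps` knob mirrors the approximate k-NN idea in the last video: trade a bounded loss in accuracy for faster queries.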
Locality sensitive hashing for approximate NN search
1
Assignment
- Locality Sensitive Hashing
7
Videos
- Limitations of KD-trees
- LSH as an alternative to KD-trees
- Using random lines to partition points
- Defining more bins
- Searching neighboring bins
- LSH in higher dimensions
- (OPTIONAL) Improving efficiency through multiple tables
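The random-lines idea in these videos generalizes to random hyperplanes in higher dimensions; a minimal sketch (the data, seed, and function name are assumptions for illustration):

```python
import numpy as np

def lsh_bins(data, n_planes, seed=0):
    """Hash each row vector to a bin via signs of random hyperplane projections."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((data.shape[1], n_planes))
    bits = (data @ planes >= 0).astype(int)   # one sign bit per plane
    # Interpret the bit pattern as an integer bin index in [0, 2^n_planes).
    return bits @ (1 << np.arange(n_planes))

docs = np.array([[1.0, 0.1], [0.9, 0.2], [-1.0, 0.3]])
bins = lsh_bins(docs, n_planes=4)
```

Nearby vectors tend to fall on the same side of most random planes, so they usually land in the same bin; searching neighboring bins (bit patterns at small Hamming distance) recovers candidates that a single unlucky plane separated.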
Programming Assignment 2
1
Assignment
- Implementing Locality Sensitive Hashing from scratch
1
Readings
- Implementing Locality Sensitive Hashing from scratch
Summarizing nearest neighbor search
1
Videos
- A brief recap
Introduction to clustering
3
Videos
- The goal of clustering
- An unsupervised task
- Hope for unsupervised learning, and some challenge cases
1
Readings
- Slides presented in this module
Clustering via k-means
1
Assignment
- k-means
4
Videos
- The k-means algorithm
- k-means as coordinate descent
- Smart initialization via k-means++
- Assessing the quality and choosing the number of clusters
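The k-means algorithm alternates the two coordinate-descent steps described in these videos; a minimal NumPy sketch (random initialization rather than k-means++, and toy data, both assumptions for illustration):

```python
import numpy as np

def kmeans(X, k, n_iters=20, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest center.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=-1), axis=1)
        # Update step: each center moves to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Two well-separated toy clusters, at the origin and at (10, 10).
X = np.vstack([np.zeros((5, 2)), np.ones((5, 2)) * 10])
centers, labels = kmeans(X, k=2)
```

Each step can only lower the within-cluster sum of squares, which is why the algorithm converges (to a local optimum, hence the value of smart initialization).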
Programming Assignment
1
Assignment
- Clustering text data with k-means
1
Readings
- Clustering text data with k-means
MapReduce for scaling k-means
1
Assignment
- MapReduce for k-means
4
Videos
- Motivating MapReduce
- The general MapReduce abstraction
- MapReduce execution overview and combiners
- MapReduce for k-means
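One k-means iteration fits the MapReduce abstraction directly: map each point to its nearest center, then reduce per center to a new mean. A single-machine simulation of that data flow (the helper names and toy data are assumptions, not a real MapReduce framework):

```python
import numpy as np
from collections import defaultdict

def mapper(point, centers):
    # Map: emit (nearest-center id, (point, 1)) for one data point.
    j = int(np.argmin(((centers - point) ** 2).sum(axis=1)))
    return j, (point, 1)

def reducer(values):
    # Reduce: sum points and counts for one center id, then divide.
    total = sum(p for p, _ in values)
    count = sum(c for _, c in values)
    return total / count

centers = np.array([[0.0, 0.0], [10.0, 10.0]])
points = [np.array([1.0, 1.0]), np.array([9.0, 9.0]), np.array([0.0, 2.0])]

groups = defaultdict(list)
for p in points:
    key, value = mapper(p, centers)
    groups[key].append(value)            # shuffle: group values by center id
new_centers = {k: reducer(v) for k, v in groups.items()}
```

Because the reduce is just a sum, a combiner can pre-aggregate (partial sum, partial count) pairs on each mapper machine before the shuffle, which is the efficiency point made in the execution-overview video.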
Summarizing clustering with k-means
2
Videos
- Other applications of clustering
- A brief recap
Motivating and setting the foundation for mixture models
4
Videos
- Motivating probabilistic clustering models
- Aggregating over unknown classes in an image dataset
- Univariate Gaussian distributions
- Bivariate and multivariate Gaussians
1
Readings
- Slides presented in this module
Mixtures of Gaussians for clustering
3
Videos
- Mixture of Gaussians
- Interpreting the mixture of Gaussian terms
- Scaling mixtures of Gaussians for document clustering
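The mixture-of-Gaussians density interpreted in these videos is a weighted sum of component densities; a univariate sketch (the parameter values are illustrative assumptions):

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    # Univariate Gaussian density N(x | mu, var).
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def mixture_pdf(x, weights, mus, variances):
    # Mixture density: sum_k pi_k * N(x | mu_k, var_k),
    # where the weights pi_k are nonnegative and sum to one.
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, mus, variances))

p = mixture_pdf(0.0, weights=[0.5, 0.5], mus=[0.0, 4.0], variances=[1.0, 1.0])
```

Each term's weight is the prior probability of that cluster, and the component density says how likely the observation is under that cluster, which is exactly the interpretation the second video walks through.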
Expectation Maximization (EM) building blocks
4
Videos
- Computing soft assignments from known cluster parameters
- (OPTIONAL) Responsibilities as Bayes' rule
- Estimating cluster parameters from known cluster assignments
- Estimating cluster parameters from soft assignments
The EM algorithm
1
Assignment
- EM for Gaussian mixtures
3
Videos
- EM iterates in equations and pictures
- Convergence, initialization, and overfitting of EM
- Relationship to k-means
1
Readings
- (OPTIONAL) A worked-out example for EM
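The EM iterates shown in equations and pictures can be sketched for a univariate Gaussian mixture (a minimal sketch; the toy data and initialization are assumptions for illustration):

```python
import numpy as np

def em_step(x, weights, mus, variances):
    """One EM iteration for a univariate Gaussian mixture."""
    w, mu, var = map(np.asarray, (weights, mus, variances))
    # E-step: responsibilities r[i, k] proportional to pi_k * N(x_i | mu_k, var_k).
    dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = w * dens
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the soft assignments.
    n_k = r.sum(axis=0)
    new_w = n_k / len(x)
    new_mu = (r * x[:, None]).sum(axis=0) / n_k
    new_var = (r * (x[:, None] - new_mu) ** 2).sum(axis=0) / n_k
    return new_w, new_mu, new_var

x = np.array([-3.0, -2.8, 3.0, 3.2])       # two well-separated groups
w, mu, var = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
for _ in range(30):
    w, mu, var = em_step(x, w, mu, var)
```

On this toy data the means converge near -2.9 and 3.1; with hard (0/1) responsibilities instead of soft ones, the same two steps reduce to k-means, the relationship covered in the last video.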
Summarizing mixture models
1
Videos
- A brief recap
Programming Assignment 1
1
Assignment
- Implementing EM for Gaussian mixtures
1
Readings
- Implementing EM for Gaussian mixtures
Programming Assignment 2
1
Assignment
- Clustering text data with Gaussian mixtures
1
Readings
- Clustering text data with Gaussian mixtures
Introduction to latent Dirichlet allocation
1
Assignment
- Latent Dirichlet Allocation
4
Videos
- Mixed membership models for documents
- An alternative document clustering model
- Components of latent Dirichlet allocation model
- Goal of LDA inference
1
Readings
- Slides presented in this module
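The components of the LDA model can be made concrete by sampling from its generative process (a minimal sketch; the tiny topic-word table, vocabulary, and function name are assumptions for illustration):

```python
import numpy as np

def generate_document(topic_word, alpha, doc_len, rng):
    """Sample one document from the LDA generative process.

    topic_word: (n_topics, vocab_size) per-topic word distributions.
    alpha: Dirichlet prior over this document's topic proportions.
    """
    theta = rng.dirichlet(alpha)                  # topic proportions for the doc
    words = []
    for _ in range(doc_len):
        z = rng.choice(len(alpha), p=theta)       # topic assignment for one word
        words.append(rng.choice(topic_word.shape[1], p=topic_word[z]))
    return words

rng = np.random.default_rng(0)
topic_word = np.array([[0.9, 0.1, 0.0, 0.0],      # topic 0 favors words 0-1
                       [0.0, 0.0, 0.1, 0.9]])     # topic 1 favors words 2-3
doc = generate_document(topic_word, alpha=[1.0, 1.0], doc_len=10, rng=rng)
```

LDA inference runs this story in reverse: given only the words, recover the topic-word distributions, the per-document proportions theta, and the per-word assignments z.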
Bayesian inference via Gibbs sampling
3
Videos
- The need for Bayesian inference
- Gibbs sampling from 10,000 feet
- A standard Gibbs sampler for LDA
Collapsed Gibbs sampling for LDA
4
Videos
- What is collapsed Gibbs sampling?
- A worked example for LDA: Initial setup
- A worked example for LDA: Deriving the resampling distribution
- Using the output of collapsed Gibbs sampling
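The resampling distribution derived in the worked example has a simple closed form over counts; a sketch (the toy counts and hyperparameters are assumptions for illustration):

```python
import numpy as np

def resample_probs(doc_topic_counts, topic_word_counts, topic_counts,
                   word, alpha, beta, vocab_size):
    """Collapsed Gibbs resampling distribution for one word's topic.

    p(z = k | rest) is proportional to
        (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta),
    where all counts exclude the word currently being resampled.
    """
    probs = ((doc_topic_counts + alpha)
             * (topic_word_counts[:, word] + beta)
             / (topic_counts + vocab_size * beta))
    return probs / probs.sum()

# Toy counts for 2 topics and a 4-word vocabulary.
p = resample_probs(doc_topic_counts=np.array([3.0, 1.0]),
                   topic_word_counts=np.array([[5.0, 1.0, 1.0, 1.0],
                                               [1.0, 1.0, 1.0, 5.0]]),
                   topic_counts=np.array([8.0, 8.0]),
                   word=0, alpha=0.1, beta=0.1, vocab_size=4)
```

Because the topic and document distributions are integrated out ("collapsed"), the sampler only tracks counts, which is what makes this variant practical at scale.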
Summarizing latent Dirichlet allocation
1
Assignment
- Learning LDA model via Gibbs sampling
1
Videos
- A brief recap
Programming Assignment
1
Assignment
- Modeling text topics with Latent Dirichlet Allocation
1
Readings
- Modeling text topics with Latent Dirichlet Allocation
What we've learned
4
Videos
- Module 1 recap
- Module 2 recap
- Module 3 recap
- Module 4 recap
1
Readings
- Slides presented in this module
Hierarchical clustering and clustering for time series segmentation
6
Videos
- Why hierarchical clustering?
- Divisive clustering
- Agglomerative clustering
- The dendrogram
- Agglomerative clustering details
- Hidden Markov models
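Agglomerative clustering and the dendrogram covered above can be sketched with SciPy (assuming SciPy is available; the four-point toy dataset is an illustrative assumption):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two tight pairs of points, far apart from each other.
X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]])

# Agglomerative clustering: repeatedly merge the two closest clusters.
# Z encodes the merge tree (the dendrogram) bottom-up.
Z = linkage(X, method="ward")

# Cut the dendrogram to obtain at most two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
```

Cutting the tree at different heights yields coarser or finer clusterings from one run, which is the practical appeal of the hierarchical approach.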
Programming Assignment
1
Assignment
- Modeling text data with a hierarchy of clusters
1
Readings
- Modeling text data with a hierarchy of clusters
Summary and what's ahead in the specialization
2
Videos
- What we didn't cover
- Thank you!
Auto Summary
Explore document similarity with "Machine Learning: Clustering & Retrieval" on Coursera, a foundational 17-hour course from the University of Washington. You will learn to find and recommend similar documents using similarity-based algorithms, covering k-nearest neighbors, KD-trees, locality sensitive hashing, k-means clustering, MapReduce, Gaussian mixture models, and latent Dirichlet allocation, all with hands-on implementation in Python. Ideal for data science learners starting out in document clustering and retrieval.

Instructors
- Emily Fox
- Carlos Guestrin