

Our Courses

Big Data Science with the BD2K-LINCS Data Coordination and Integration Center
The Library of Integrative Network-based Cellular Signatures (LINCS) was an NIH Common Fund program that ran for 10 years, from 2012 to 2021. The idea behind the LINCS program was to perturb different types of human cells with many different types of perturbations: drugs and other small molecules; genetic manipulations such as single-gene knockdown, knockout, or overexpression; manipulation of the extracellular microenvironment, for example by growing cells on different surfaces; and more.
-
Course by
-
Self Paced
-
9 hours
-
English

Apache Spark (TM) SQL for Data Analysts
Apache Spark is one of the most widely used technologies in big data analytics. In this course, you will learn how to leverage your existing SQL skills to start working with Spark immediately. You will also learn how to work with Delta Lake, a highly performant, open-source storage layer that brings reliability to data lakes. By the end of this course, you will be able to use Spark SQL and Delta Lake to ingest, transform, and query data to extract valuable insights that can be shared with your team.
-
Course by
-
Self Paced
-
14 hours
-
English

Machine Learning Using SAS Viya
This course covers the theoretical foundation for different techniques associated with supervised machine learning models. In addition, a business case study is defined to guide participants through all steps of the analytical life cycle, from problem understanding to model deployment, through data preparation, feature selection, model training and validation, and model assessment. A series of demonstrations and exercises is used to reinforce the concepts and the analytical approach to solving business problems.
-
Course by
-
Self Paced
-
34 hours
-
English

Data Manipulation at Scale: Systems and Algorithms
Data analysis has replaced data acquisition as the bottleneck to evidence-based decision making: we are drowning in data. Extracting knowledge from large, heterogeneous, and noisy datasets requires not only powerful computing resources, but also the programming abstractions to use them effectively.
-
Course by
-
Self Paced
-
20 hours
-
English

Big-O Time Complexity in Python Code
In the field of data science, the volumes of data can be enormous, hence the term Big Data.
-
Course by
-
Self Paced
-
2 hours
-
English

Machine Learning with Apache Spark
Explore the exciting world of machine learning with this IBM course. Start by learning ML fundamentals before unlocking the power of Apache Spark to build and deploy ML models for data engineering applications. Dive into supervised and unsupervised learning techniques and discover the revolutionary possibilities of Generative AI through instructional readings and videos. Gain hands-on experience with Spark structured streaming, develop an understanding of data engineering and ML pipelines, and become proficient in evaluating ML models using SparkML.
-
Course by
-
Self Paced
-
15 hours
-
English

Big data and Language 1
In this course, students will understand characteristics of language through big data. Students will learn how to collect and analyze big data, and find linguistic features from the data. A number of approaches to the linguistic analysis of written and spoken texts will be discussed. The class will consist of lecture videos which are approximately 1 hour and a quiz for each week. There will be a final project which requires students to conduct research on text data and language.
-
Course by
-
Self Paced
-
5 hours
-
English

Data and Statistics Foundation for Investment Professionals
Aimed at investment professionals or those with investment industry knowledge, this course offers an introduction to the basic data and statistical techniques that underpin data analysis and lays an essential foundation in the techniques that are used in big data and machine learning. It introduces the topics and gives practical examples of how they are used by investment professionals, including the importance of presenting the “data story” by using appropriate visualizations and report writing.
In this course you will learn how to:
-
Course by
-
Self Paced
-
21 hours
-
English

Statistics for Genomic Data Science
An introduction to the statistics behind the most popular genomic data science projects. This is the sixth course in the Genomic Big Data Science Specialization from Johns Hopkins University.
-
Course by
-
Self Paced
-
9 hours
-
English

Digital Governance
Big data, artificial intelligence, machine learning, autonomous cars, chatbots: these are just a few terms that have become part of our professional legal and political vocabulary. Emerging technologies and technological advancement have confronted us in our daily practice and will continue to do so in the future, whether we’re buying something online, taking part in an election, or chatting with friends across the globe. Technology is here and it is here to stay.
-
Course by
-
Self Paced
-
28 hours
-
English

Communicating Data Science Results
Important note: The second assignment in this course covers the topic of Graph Analysis in the Cloud, in which you will use Elastic MapReduce and the Pig language to perform graph analysis over a moderately large dataset, about 600GB. In order to complete this assignment, you will need to make use of Amazon Web Services (AWS). Amazon has generously offered to provide up to $50 in free AWS credit to each learner in this course to allow you to complete the assignment.
-
Course by
-
Self Paced
-
8 hours
-
English

Data Science with NumPy, Sets, and Dictionaries
Become proficient in NumPy, a fundamental Python package crucial for careers in data science. This comprehensive course is tailored to novice programmers aspiring to become data scientists, software developers, data analysts, machine learning engineers, data engineers, or database administrators. Starting with foundational computer science concepts, such as object-oriented programming and data organization using sets and dictionaries, you'll progress to more intricate data structures like arrays, vectors, and matrices.
-
Course by
-
Self Paced
-
31 hours
-
English

Population Health: Responsible Data Analysis
In most areas of health, data is being used to make important decisions. As a health population manager, you will have the opportunity to use data to answer interesting questions. In this course, we will discuss data analysis from a responsible perspective, which will help you to extract useful information from data and deepen your knowledge of the specific aspects of the population that interest you. First, you will learn how to obtain, safely gather, clean and explore data.
-
Course by
-
Self Paced
-
20 hours
-
English

Teaching Impacts of Technology: Data Collection, Use, and Privacy
In this course you’ll focus on how constant data collection and big data analysis have impacted us, exploring the interplay between using your data and protecting it, as well as thinking about what it could do for you in the future. This will be done through a series of paired teaching sections, exploring a specific “Impact of Computing” in your typical day and the “Technologies and Computing Concepts” that enable that impact, all at a K12-appropriate level.
-
Course by
-
Self Paced
-
13 hours
-
English

Leading Change in Health Informatics
Do you dream of being a CMIO or a Senior Director of Clinical Informatics? If you are aiming to rise up in the ranks in your health system or looking to pivot your career in the direction of big data and health IT, this course is made for you. You'll hear from experts at Johns Hopkins about their experiences harnessing the power of big data in healthcare, improving EHR adoption, and separating out the hope vs hype when it comes to digital medicine.
-
Course by
-
Self Paced
-
15 hours
-
English

Tools for Data Science
In order to be successful in Data Science, you need to be skilled with using tools that Data Science professionals employ as part of their jobs. This course teaches you about the popular tools in Data Science and how to use them. You will become familiar with the Data Scientist’s tool kit which includes: Libraries & Packages, Data Sets, Machine Learning Models, Kernels, as well as the various Open source, commercial, Big Data and Cloud-based tools. Work with Jupyter Notebooks, JupyterLab, RStudio IDE, Git, GitHub, and Watson Studio.
-
Course by
-
Self Paced
-
18 hours
-
English

Ubiquitous Learning and Instructional Technologies
This course will analyze currently available technologies for learning. Areas addressed include: learning management systems, intelligent tutors, computer adaptive testing, gamification, simulations, learning in and through social media and peer interaction, universal design for learning, differentiated instruction systems, big data and learning analytics, attention monitoring, and affect-aware systems.
-
Course by
-
Self Paced
-
14 hours
-
English

Command Line Tools for Genomic Data Science
Introduces the commands that you need to manage and analyze directories, files, and large sets of genomic data. This is the fourth course in the Genomic Big Data Science Specialization from Johns Hopkins University.
-
Course by
-
Self Paced
-
12 hours
-
English

Bigtable: Qwik Start - Command Line
This is a self-paced lab that takes place in the Google Cloud console. Cloud Bigtable is Google's NoSQL Big Data database service. It's the same database that powers many core Google services, including Search, Analytics, Maps, and Gmail. In this lab you'll use Bigtable with the cbt command line. Watch the short videos Bigtable: Qwik Start - Qwiklabs Preview and Handle Massive Worklo…
-
Course by
-
Self Paced
-
1 hour
-
English

Big data and Language 2
In this course, students will understand characteristics of language through big data. Students will learn how to collect and analyze big data, and find linguistic features from the data. A number of approaches to the linguistic analysis of written and spoken texts will be discussed.
-
Course by
-
Self Paced
-
5 hours
-
English

Explore Core Data Concepts in Microsoft Azure
In this course, you will learn the fundamentals of database concepts in a cloud environment, gain basic skills in cloud data services, and build your foundational knowledge of cloud data services within Microsoft Azure. You will identify and describe core data concepts such as relational, non-relational, big data, and analytics, and explore how this technology is implemented with Microsoft Azure.
-
Course by
-
Self Paced
-
9 hours
-
English

Designing the Future of Work
The workplace of tomorrow is an uncertain place. We live in a rapidly changing world, and design innovations such as artificial intelligence (AI), robotics, and big data are rapidly changing the fundamental nature of how we live and work. As these technologies continue to evolve at an exponential rate, it is becoming critical to understand their impact on contemporary work practices, and for businesses and employees to understand how to design a secure future amidst this disruption. What new, disruptive technologies are on the horizon? How will jobs change?
-
Course by
-
Self Paced
-
13 hours
-
English

Bioinformatics Capstone: Big Data in Biology
In this course, you will learn how to use the BaseSpace cloud platform developed by Illumina (our industry partner) to apply several standard bioinformatics software approaches to real biological data. In particular, in a series of Application Challenges, you will see how genome assembly can be used to track the source of a food poisoning outbreak, how RNA-Sequencing can help us analyze gene expression data on the tissue level, and compare the pros and cons of whole genome vs.
-
Course by
-
Self Paced
-
13 hours
-
English

Leveraging Unstructured Data with Cloud Dataproc on Google Cloud (in Brazilian Portuguese)
This one-week intensive course builds on the previous courses in the Data Engineering on Google Cloud Platform specialization.
-
Course by
-
Self Paced
-
English

Big Data, Genes, and Medicine
This course distills for you the expert knowledge and skills mastered by professionals in Health Big Data Science and Bioinformatics. You will learn exciting facts about the biology and chemistry of the human body, genetics, and medicine, intertwined with the science of Big Data and the skills to harness the avalanche of data that is openly available at your fingertips and that we are just starting to make sense of.
-
Course by
-
Self Paced
-
40 hours
-
English