- Level Professional
- Duration 14 hours
- Published by Google Cloud
Offered by
About
Service level indicators (SLIs) and service level objectives (SLOs) are fundamental tools for measuring and managing reliability. In this course, students learn approaches for devising appropriate SLIs and SLOs and managing reliability through the use of an error budget.
Modules
About this Course
1
Videos
- Course structure
What Is SRE? How Does It Differ from DevOps?
1
Assignment
- DevOps/SRE
2
Videos
- What's the difference between DevOps and SRE? - Intro
- What's the difference between DevOps and SRE?
Who are CRE? How can they help you be more reliable?
4
Videos
- Now SRE Everyone Else with CRE! - Intro
- Now SRE Everyone Else with CRE!
- CRE's Three Reliability Principles
- Reliability in the Cloud
Why Are SLOs Important for Your Organization?
4
Videos
- How SLOs help your business make decisions
- How SLOs help you build features faster
- How SLOs help you balance operational and project work
- Making SLOs work for your organization
Introduction
1
Videos
- Introduction
Promises, Promises. SLOs vs SLAs. Happiness Test.
2
Assignment
- A working service
- SLOs and SLAs
2
Videos
- SLOs vs SLAs
- The happiness test
How Do We Measure Reliability? Edge Cases
2
Videos
- How do we measure reliability?
- Edge cases
How Reliable Should a Service Be? Setting Targets for Reliability. Iterate!
1
Assignment
- Reliability and iterating
2
Videos
- 100% is the wrong target
- Iterating
Graded Quiz
1
Assignment
- Targeting Reliability Assessment
Introduction
1
Videos
- Introduction
When Do We Need to Make a Service More Reliable? Error Budgets.
1
Videos
- Error budgets
Trading off Reliability Against Features
1
Assignment
- Error budgets
2
Videos
- Everything is a trade-off
- Error budgets: advanced concepts
How Do We Make a Service More Reliable?
1
Assignment
- Increasing reliability
3
Videos
- Axes of improvement
- Operational approach to increasing reliability
- Module summary
Graded Quiz
1
Assignment
- Operating for Reliability Assessment
Introduction
1
Videos
- Introduction
Metrics and Measurement
1
Assignment
- Measuring happiness
1
Discussions
- Which measurement strategies have complementary pros/cons?
3
Videos
- User happiness in metric form
- The properties of good SLI metrics
- Ways of measuring SLIs
Commonly Used SLIs
2
Assignment
- Commonly used SLIs
- Correctness and Coverage
3
Discussions
- Defining Freshness and Correctness
- When should you set your latency thresholds?
- Correctness SLIs for everything?
4
Videos
- The SLI menu
- The SLI equation
- Request / Response SLIs
- Data processing SLIs
Managing Complexity
1
Discussions
- Is bucketing useful elsewhere?
3
Videos
- But my system is really complex!
- Managing complexity with aggregation
- Managing complexity with bucketing
Setting Reliability Targets
3
Videos
- Achievable SLOs
- Aspirational SLOs
- Continuous improvement
Introduction
1
Videos
- Introduction
Introducing Our Example Game
3
Videos
- The 4 step process
- Our example game
- Loading the profile page
Refining SLI Specifications
1
Assignment
- Postmortem!
1
Peer Review
- Choose SLI specifications and refine them into SLI implementations for another, more complex user journey.
1
Videos
- Refining SLI specifications
Do the SLIs cover the failure modes of the service? What SLOs?
1
Assignment
- Setting Achievable SLO targets
1
Peer Review
- Walk the user journey and set aspirational SLO targets for another, more complex user journey.
2
Videos
- Looking for observability gaps
- Failure modes
Introduction
1
Videos
- Introduction
Reliability Risks
1
Peer Review
- Brainstorm SLO risks for our example service.
1
Videos
- Is your error budget realistic?
Characterizing Risk
1
Videos
- Modeling risks in our spreadsheet
Analyzing Risk
1
Peer Review
- Fill in the Risk Catalog sheet, estimate SLO impact, and propose fixes or mitigations to meet the desired availability target.
1
Videos
- Analyzing risk
Introduction
1
Discussions
- Production outages
1
Videos
- Introduction
Why You Should Document Your SLO: what to document and where
1
Discussions
- SLO Metadata
2
Videos
- No surprises
- A dashboard example
Why You Need an Error Budget Policy
1
Assignment
- Error budget policies
1
Discussions
- Metadata usage
1
Videos
- Why an error budget policy?
How to Write an Error Budget Policy
1
Assignment
- Error budget policy - considerations
2
Videos
- Fundamentals of an error budget policy
- How to draft an error budget policy
An Example Error Budget Policy
2
Videos
- Example policy thresholds
- A hypothetical policy scenario
Course Conclusion
1
Videos
- Course conclusion and video wrap up
Graded Quiz
1
Assignment
- Consequences of SLO Misses
Auto Summary
Unlock the secrets of maintaining robust and reliable systems with the "Site Reliability Engineering: Measuring and Managing Reliability" course. This professional-level program, offered by Coursera, is perfect for IT and computer science enthusiasts aiming to master the essentials of reliability management. Dive deep into the world of Service Level Indicators (SLIs) and Service Level Objectives (SLOs), learning how to craft these crucial metrics to ensure your systems remain dependable. The course also introduces the concept of error budgets, a powerful tool for balancing innovation and reliability. Spanning approximately 840 minutes of comprehensive content, this course provides a thorough understanding of reliability engineering principles. It's available through the Starter subscription, making it accessible for those looking to enhance their skills without breaking the bank. Ideal for professionals in the IT and computer science fields, this course equips you with the knowledge to measure, manage, and maintain system reliability effectively. Join now and take the first step towards becoming an expert in site reliability engineering.

Google Cloud Training