Player FM - Internet Radio Done Right
13 subscribers
Checked 13d ago
Added four years ago
Content provided by Stephen Townshend. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Stephen Townshend or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://da.player.fm/legal.
Slight Reliability
Learning SRE, one day at a time.
94 episodes
All episodes
Slight Reliability Episode 92 - Observability Maturity with Ádám Tóth (30:09)
This week Adam and I get philosophical about what constitutes maturity in the field of observability. We tackle questions such as... 💸 Does your org treat observability as a cost centre or a value add? 🔥 Are you using observability reactively to solve problems? Or proactively to build better products and services? 👤 Is your observability connected to your users and business in a meaningful way? 🌐 Is monitoring the social media sentiment of your product part of observability? ...and much more. You can find Adam at: LinkedIn: https://www.linkedin.com/in/adam-toth-innovateq/ InnovaTeQ website: https://innovateq.io/ I mentioned the 'This Is Fine!' podcast about resilience engineering. Find it on Spotify or at https://www.thisisfinepod.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Bluesky: https://bsky.app/profile/slightreliability.bsky.social YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre…

Slight Reliability Episode 91 - Head in the Clouds (15:43)
In this episode I explore the challenges of achieving unified observability when integrating with SaaS products and services. I cover: 🌊 The new wave of mega-complex SaaS ⚗️ Challenges integrating SaaS with our observability pipelines 👩‍🦯 How the lack of SaaS autonomy limits the effectiveness of OpenTelemetry 💰 Paying twice to ingest, store, and search telemetry 📈 Monitoring and predicting SaaS observability costs ...and much more. Shout out to Mark Chiavaroli (and apologies for mispronouncing your surname multiple times), Damian Sharrock, and Reece Hewitt for bouncing ideas on this topic. The 'Is it observable?' series can be found here: https://isitobservable.io/ ...and you can find Henrik on LinkedIn: https://www.linkedin.com/in/hrexed/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Bluesky: https://bsky.app/profile/slightreliability.bsky.social YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre…

Slight Reliability Episode 90 - Non-Prod Reliability Engineering + 2024 Wrap (18:13)
This week I check in and give an update on work, life, and my attempts at bringing to life SRE practices in the world of non-production environment management. You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…

Slight Reliability Episode 89 - Blameless Post-mortems with Karanveer Anand (26:06)
This week I'm joined by Karanveer Anand, SRE Technical Program Manager at Google, to discuss blameless post-mortems. We cover: 🦅 The recent CrowdStrike outage and their public post-mortem 🚑 When do we do a blameless post-mortem? 😕 How do we do a blameless post-mortem? ✅ How do we make sure action items are followed through? 📰 The power of learning from post-mortems created by other teams and orgs ...and much more. You can find Karanveer on LinkedIn: https://www.linkedin.com/in/karanveer/ You can find CrowdStrike's preliminary post incident report here: https://www.crowdstrike.com/blog/falcon-content-update-preliminary-post-incident-report/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…

Slight Reliability Episode 88 - OpenTelemetry Revisited with Zach Michel (26:51)
This week Zach Michel from https://middleware.io/ and I discuss the state of OpenTelemetry and what it means to adopt it. We cover: 🌩️ Achieving observability in a SaaS world 🥫 Context propagation - the magic sauce of OTEL 🚪 The telemetry gateway concept and leveraging the OTEL collector 🪵 The state of OpenTelemetry logging 🫂 Making use of the OpenTelemetry community ...and much more. You can find Zach on LinkedIn: https://www.linkedin.com/in/zamichel/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ For a list of ways to interact with the OpenTelemetry community go to: https://opentelemetry.io/community/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…

Slight Reliability Episode 87 - Measuring the value of SRE with Artem Yakimenko (35:33)
In Episode 80 Niall Murphy talked about the need for SREs to be better at articulating the value of our work. In this episode I'm joined by Artem Yakimenko, ex-Googler and Engineering Director (SRE) at Culture Amp, to discuss how we might achieve this. We discuss both quantifiable and qualitative approaches, including leveraging the untapped data in support tickets, customer sentiment and rankings, the relationship between finance and performance, the link between user design and performance, and so much more. Books mentioned in the episode: 100 Things Every Designer Needs to Know About People by Susan Weinschenk https://www.amazon.com.au/Things-Every-Designer-Needs-People/dp/0321767535 You can find Artem on LinkedIn: https://www.linkedin.com/in/temikus/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…

Slight Reliability Episode 86 - Evolving SLOs with Dom Finn (25:57)
In the world of SRE we constantly talk about defining SLOs, but what about evolving them over time? This week I chat with SRE Tech Lead Dom Finn about just that. We cover the relationship between reliability and user analytics, latency classes as a way to speak SLOs with business stakeholders, the role of NFRs and how their thresholds differ from SLOs, and much more. Books mentioned in the episode: The Beginning of Infinity: Explanations That Transform the World by David Deutsch https://www.amazon.com.au/Beginning-Infinity-Explanations-Transform-World/dp/0143121359 Turn The Ship Around! by L. David Marquet https://davidmarquet.com/turn-the-ship-around-book/ You can find Dom on LinkedIn: https://www.linkedin.com/in/dom-finn/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…

Slight Reliability Episode 85 - Feeling SaaSsy (11:08)
This week I talk about the impact of SaaS-first technology strategies on the work of an SRE. I pose questions about observability, ownership, on-call, and how much control we have over reliability. You can find the Bleeding Tech blog on Medium: https://medium.com/@stownshend You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre…

Slight Reliability Episode 84 - Clinical Troubleshooting with Dan Slimmon (27:40)
This week I chat with Dan Slimmon about applying the approach doctors use to treat patient symptoms during incident response. You can find Dan's blog at https://blog.danslimmon.com/ or connect with him on LinkedIn here: https://www.linkedin.com/in/danslimmon/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…

Slight Reliability Episode 83 - An Unfulfilled Promise with Itiel Shwartz (30:32)
This week I hear about all things Kubernetes from Komodor CTO and co-founder Itiel Shwartz. We chat about the promise that was made when Kubernetes first entered the industry, the challenge of getting developers engaged and capable of working in Kubernetes, my hate/hate relationship with Helm but its important contribution to the Kubernetes project, Kubernetes observability, and so much more. You can find the Kubernetes for Humans podcast here: https://komodor.com/blog/the-kubernetes-for-humans-podcast/ Or find out more about Komodor here: https://komodor.com/ Or find Itiel on LinkedIn: https://www.linkedin.com/in/itiel-shwartz-18542853/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…

Slight Reliability Episode 82 - CI/CD with Amin Astaneh (25:47)
This week I sit down and have a discussion with Amin Astaneh (from Certo Modo) about CI/CD. We cover the power of the standard change as a way to navigate ITIL while still implementing DevOps practices, what to monitor to make your CI/CD observable, single piece flow, testing in production, and so much more. You can find Amin on his company website https://certomodo.io , LinkedIn: https://www.linkedin.com/in/aminastaneh/ and Twitter: https://twitter.com/aastaneh You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre This episode was sponsored by SquaredUp. SquaredUp combines all your data with awesome dashboards, analytics, health rollup, and notifications, into a unified observability portal. Using a data mesh architecture, SquaredUp is a beautifully simple way to get instant access to the insights that matter, whenever you need them. If you want to know more head over to https://squaredup.com/ to sign up for your free account.…

Slight Reliability Episode 81 - Incident Management in Non-Prod Environments (10:09)
"Environment issues are just incidents that happened to occur in a non-production environment"... so why do we treat them so differently? In this first episode of the 2024 season I reflect on how we handle incidents in non-prod environments. (Note: Had a few issues with noise suppression in OBS Studio cutting off the start of some words, will sort it for the next episode) You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ Twitter: https://twitter.com/the_kiwi_sre YouTube: https://www.youtube.com/c/SlightReliability Instagram: https://www.instagram.com/slight_reliability/ TikTok: https://www.tiktok.com/@the_kiwi_sre…

Slight Reliability Episode 80 - What's Been Bugging Niall Murphy (36:45)
This week I speak with co-author of the original SRE book + the SRE workbook, and renowned speaker Niall Murphy. We chat about the state of SRE in the current macro-economic climate and how we're not yet doing a very good job at articulating the value of SRE to leaders, the relationship that velocity and reliability have, the value of new features versus reliability improvements, and *much* more. You can find Niall at: LinkedIn: https://www.linkedin.com/in/niallm/ X: https://twitter.com/niallm Website: https://relyabilit.ie/ (and his company Stanza: https://www.stanza.systems/ ) You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/…

Slight Reliability Episode 76 - Sampling Distributed Traces with Paige Cruz (45:27)
Paige Cruz (from Chronosphere) is back. This week we discuss sampling. What is sampling? Why do it? What kinds of sampling are there? You can check out Chronosphere's cloud native observability platform here: https://chronosphere.io/ You can find Paige on: LinkedIn: https://www.linkedin.com/in/paigerduty/ X: https://twitter.com/paigerduty You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/…

Slight Reliability Episode 79 - Incident Story Time with Valeska Victoria (37:51)
This week Valeska Victoria returns to share some of her experiences working as an SRE at eBay. We look at the cascading effect of production issues in complex integrated environments (how there's often no single root cause), developer literacy of how infrastructure works, the importance of ownership and accountability of reliability, and much more. You can find Valeska on: LinkedIn: https://www.linkedin.com/in/valeska-victoria/ You can find the official Slight Reliability podcast website at: https://slightreliability.com/ You can find Stephen at: LinkedIn: https://www.linkedin.com/in/stephentownshend/ X: https://twitter.com/the_kiwi_sre Instagram: https://www.instagram.com/slight_reliability/…