32 subscribers
Gå offline med appen Player FM !
Podcasts der er værd at lytte til
SPONSORERET


Next-Gen Data Modeling, Integrity, and Governance with YODA
Manage episode 357217977 series 2510642
In this episode, Kris interviews Doron Porat, Director of Infrastructure at Yotpo, and Liran Yogev, Director of Engineering at ZipRecruiter (formerly at Yotpo), about their experiences and strategies in dealing with data modeling at scale.
Yotpo has a vast and active data lake, comprising thousands of datasets that are processed by different engines, primarily Apache Spark™. They wanted to provide users with self-service tools for generating and utilizing data with maximum flexibility, but encountered difficulties, including poor standardization, low data reusability, limited data lineage, and unreliable datasets.
The team realized that Yotpo's modeling layer, which defines the structure and relationships of the data, needed to be separated from the execution layer, which defines and processes operations on the data.
This separation would give programmers better visibility into data pipelines across all execution engines, storage methods, and formats, as well as more governance control for exploration and automation.
To address these issues, they developed YODA, an internal tool that combines excellent developer experience, DBT, Databricks, Airflow, Looker and more, with a strong CI/CD and orchestration layer.
Yotpo is a B2B, SaaS e-commerce marketing platform that provides businesses with the necessary tools for accurate customer analytics, remarketing, support messaging, and more.
ZipRecruiter is a job site that utilizes AI matching to help businesses find the right candidates for their open roles.
EPISODE LINKS
- Current 2022 Talk: Next Gen Data Modeling in the Open Data Platform
- Data Mesh 101
- Data Mesh Architecture: A Modern Distributed Data Model
- Watch the video version of this podcast
- Kris Jenkins’ Twitter
- Streaming Audio Playlist
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Intro to Event-Driven Microservices with Confluent
- Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
Kapitler
1. Intro (00:00:00)
2. What is Yotpo? (00:02:29)
3. Building an ETL framework based on Spark (00:05:25)
4. What is Apache Spark? (00:10:18)
5. Decoupling the data model (00:15:40)
6. Using data mesh principles (00:18:51)
7. How to address different data personas (00:22:24)
8. What is the "shift left" movement? (00:26:35)
9. How can organizations change the way they treat their data? (00:28:47)
10. Use-cases for tooling and documenting data sets (00:31:01)
11. Schema vs. schema-less (00:32:07)
12. What is YODA? (00:40:07)
13. Takeaways from the conversation with Doron and Liran (00:48:35)
14. It's a wrap! (00:52:45)
265 episoder
Manage episode 357217977 series 2510642
In this episode, Kris interviews Doron Porat, Director of Infrastructure at Yotpo, and Liran Yogev, Director of Engineering at ZipRecruiter (formerly at Yotpo), about their experiences and strategies in dealing with data modeling at scale.
Yotpo has a vast and active data lake, comprising thousands of datasets that are processed by different engines, primarily Apache Spark™. They wanted to provide users with self-service tools for generating and utilizing data with maximum flexibility, but encountered difficulties, including poor standardization, low data reusability, limited data lineage, and unreliable datasets.
The team realized that Yotpo's modeling layer, which defines the structure and relationships of the data, needed to be separated from the execution layer, which defines and processes operations on the data.
This separation would give programmers better visibility into data pipelines across all execution engines, storage methods, and formats, as well as more governance control for exploration and automation.
To address these issues, they developed YODA, an internal tool that combines excellent developer experience, DBT, Databricks, Airflow, Looker and more, with a strong CI/CD and orchestration layer.
Yotpo is a B2B, SaaS e-commerce marketing platform that provides businesses with the necessary tools for accurate customer analytics, remarketing, support messaging, and more.
ZipRecruiter is a job site that utilizes AI matching to help businesses find the right candidates for their open roles.
EPISODE LINKS
- Current 2022 Talk: Next Gen Data Modeling in the Open Data Platform
- Data Mesh 101
- Data Mesh Architecture: A Modern Distributed Data Model
- Watch the video version of this podcast
- Kris Jenkins’ Twitter
- Streaming Audio Playlist
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Intro to Event-Driven Microservices with Confluent
- Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
Kapitler
1. Intro (00:00:00)
2. What is Yotpo? (00:02:29)
3. Building an ETL framework based on Spark (00:05:25)
4. What is Apache Spark? (00:10:18)
5. Decoupling the data model (00:15:40)
6. Using data mesh principles (00:18:51)
7. How to address different data personas (00:22:24)
8. What is the "shift left" movement? (00:26:35)
9. How can organizations change the way they treat their data? (00:28:47)
10. Use-cases for tooling and documenting data sets (00:31:01)
11. Schema vs. schema-less (00:32:07)
12. What is YODA? (00:40:07)
13. Takeaways from the conversation with Doron and Liran (00:48:35)
14. It's a wrap! (00:52:45)
265 episoder
همه قسمت ها
×
1 Apache Kafka 3.5 - Kafka Core, Connect, Streams, & Client Updates 11:25

1 A Special Announcement from Streaming Audio 1:18

1 How to use Data Contracts for Long-Term Schema Management 57:28

1 Next-Gen Data Modeling, Integrity, and Governance with YODA 55:55

1 Migrate Your Kafka Cluster with Minimal Downtime 1:01:30

1 Real-Time Data Transformation and Analytics with dbt Labs 43:41

1 What is the Future of Streaming Data? 41:29

1 What can Apache Kafka Developers learn from Online Gaming? 55:32

1 Apache Kafka 3.4 - New Features & Improvements 5:13

1 How to use OpenTelemetry to Trace and Monitor Apache Kafka Systems 50:01

1 What is Data Democratization and Why is it Important? 47:27

1 Git for Data: Managing Data like Code with lakeFS 30:42

1 Using Kafka-Leader-Election to Improve Scalability and Performance 51:06

1 Real-Time Machine Learning and Smarter AI with Data Streaming 38:56

1 The Present and Future of Stream Processing 31:19

1 Top 6 Worst Apache Kafka JIRA Bugs 1:10:58

1 Learn How Stream-Processing Works The Simplest Way Possible 31:29

1 Building and Designing Events and Event Streams with Apache Kafka 53:06

1 Rethinking Apache Kafka Security and Account Management 41:23

1 Real-time Threat Detection Using Machine Learning and Apache Kafka 29:18

1 Improving Apache Kafka Scalability and Elasticity with Tiered Storage 29:32

1 Decoupling with Event-Driven Architecture 38:38

1 If Streaming Is the Answer, Why Are We Still Doing Batch? 43:58

1 Security for Real-Time Data Stream Processing with Confluent Cloud 48:33


1 Build a Real Time AI Data Platform with Apache Kafka 37:18

1 Optimizing Apache JVMs for Apache Kafka 1:11:42

1 Apache Kafka 3.3 - KRaft, Kafka Core, Streams, & Connect Updates 6:42

1 Application Data Streaming with Apache Kafka and Swim 39:10

1 International Podcast Day - Apache Kafka Edition | Streaming Audio Special 1:02:22

1 How to Build a Reactive Event Streaming App - Coding in Motion 1:26

1 Real-Time Stream Processing, Monitoring, and Analytics With Apache Kafka 34:07

1 Reddit Sentiment Analysis with Apache Kafka-Based Microservices 35:23

1 Capacity Planning Your Apache Kafka Cluster 1:01:54

1 Streaming Real-Time Sporting Analytics for World Table Tennis 34:29

1 Real-Time Event Distribution with Data Mesh 48:59

1 Apache Kafka Security Best Practices 39:10

1 What Could Go Wrong with a Kafka JDBC Connector? 41:10

1 Apache Kafka Networking with Confluent Cloud 37:22

1 Event-Driven Systems and Agile Operations 53:22

1 Streaming Analytics and Real-Time Signal Processing with Apache Kafka 1:06:33

1 Blockchain Data Integration with Apache Kafka 50:59

1 Automating Multi-Cloud Apache Kafka Cluster Rollouts 48:29

1 Common Apache Kafka Mistakes to Avoid 1:09:43

1 Tips For Writing Abstracts and Speaking at Conferences 48:56


1 Data Mesh Architecture: A Modern Distributed Data Model 48:42

1 Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools 55:55

1 Practical Data Pipeline: Build a Plant Monitoring System with ksqlDB 33:56

1 Apache Kafka 3.2 - New Features & Improvements 6:54

1 Scaling Apache Kafka Clusters on Confluent Cloud ft. Ajit Yagaty and Aashish Kohli 49:07

1 Streaming Analytics on 50M Events Per Day with Confluent Cloud at Picnic 34:41

1 Build a Data Streaming App with Apache Kafka and JS - Coding in Motion 2:03

1 Optimizing Apache Kafka's Internals with Its Co-Creator Jun Rao 48:54

1 Using Event-Driven Design with Apache Kafka Streaming Applications ft. Bobby Calderwood 51:09

1 Monitoring Extreme-Scale Apache Kafka Using eBPF at New Relic 38:25

1 Confluent Platform 7.1: New Features + Updates 10:01

1 Scaling an Apache Kafka Based Architecture at Therapie Clinic 1:10:56

1 Bridging Frontend and Backend with GraphQL and Apache Kafka ft. Gerard Klijs 23:13

1 Building Real-Time Data Governance at Scale with Apache Kafka ft. Tushar Thole 42:58

1 Handling 2 Million Apache Kafka Messages Per Second at Honeycomb 41:36


1 Serverless Stream Processing with Apache Kafka ft. Bill Bejeck 42:23

1 The Evolution of Apache Kafka: From In-House Infrastructure to Managed Cloud Service ft. Jay Kreps 46:32

1 What’s Next for the Streaming Audio Podcast ft. Kris Jenkins 2:39

1 On to the Next Chapter ft. Tim Berglund 6:45

1 Intro to Event Sourcing with Apache Kafka ft. Anna McDonald 30:14

1 Expanding Apache Kafka Multi-Tenancy for Cloud-Native Systems ft. Anna Povzner and Anastasia Vela 31:01

1 Apache Kafka 3.1 - Overview of Latest Features, Updates, and KIPs 4:43

1 Optimizing Cloud-Native Apache Kafka Performance ft. Alok Nikhil and Adithya Chandra 30:40

1 From Batch to Real-Time: Tips for Streaming Data Pipelines with Apache Kafka ft. Danica Fine 29:50

1 Real-Time Change Data Capture and Data Integration with Apache Kafka and Qlik 34:51

1 Modernizing Banking Architectures with Apache Kafka ft. Fotios Filacouris 34:59

1 Running Hundreds of Stream Processing Applications with Apache Kafka at Wise 31:08

1 Lessons Learned From Designing Serverless Apache Kafka ft. Prachetaa Raghavan 28:20

1 Using Apache Kafka as Cloud-Native Data System ft. Gwen Shapira 33:57

1 ksqlDB Fundamentals: How Apache Kafka, SQL, and ksqlDB Work Together ft. Simon Aubury 30:42

1 Explaining Stream Processing and Apache Kafka ft. Eugene Meidinger 29:28

1 Handling Message Errors and Dead Letter Queues in Apache Kafka ft. Jason Bell 37:41

1 Confluent Platform 7.0: New Features + Updates 12:16

1 Real-Time Stream Processing with Kafka Streams ft. Bill Bejeck 35:32

1 Automating Infrastructure as Code with Apache Kafka and Confluent ft. Rosemary Wang 30:08

1 Getting Started with Spring for Apache Kafka ft. Viktor Gamov 32:44

1 Powering Event-Driven Architectures on Microsoft Azure with Confluent 38:42

1 Automating DevOps for Apache Kafka and Confluent ft. Pere Urbón-Bayes 26:08

1 Intro to Kafka Connect: Core Components and Architecture ft. Robin Moffatt 31:18

1 Designing a Cluster Rollout Management System for Apache Kafka ft. Twesha Modi 30:08

1 Apache Kafka 3.0 - Improving KRaft and an Overview of New Features 15:17

1 How to Build a Strong Developer Community with Global Engagement ft. Robin Moffatt and Ale Murray 35:18
Velkommen til Player FM!
Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.