Katharine Jarmul on using Python for data analysis
The O’Reilly Programming Podcast: Wrangling data with Python’s libraries and packages.
In this episode of the O’Reilly Programming Podcast, I talk with Katharine Jarmul, a Python developer and data analyst whose company, Kjamistan, provides consulting and training on topics surrounding machine learning, natural language processing, and data testing. Jarmul is the co-author (along with Jacqueline Kazil) of the O’Reilly book Data Wrangling with Python, and she has presented the live online training course Practical Data Cleaning with Python.
Discussion points:
- How data wrangling enables you to take real-world data and “clean it, organize it, validate it, and put it in some format you can actually work with,” says Jarmul.
- Why Python has become a preferred language for use in data science: Jarmul cites the accessibility of the language and the emergence of packages such as NumPy, pandas, SciPy, and scikit-learn.
- Jarmul calls pandas “Excel on steroids” and says, “it allows you to manipulate tabular data, and transform it quite easily. For anyone using structured, tabular data, you can’t go wrong with doing some part of your analysis in pandas.”
- She cites gensim and spaCy as her favorite NLP Python libraries, praising them for “the ability to just install a library and have it do quite a lot of deep learning or machine learning tasks for you.”
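As a rough illustration of the "clean it, organize it, validate it" steps quoted above, here is a minimal sketch that uses only Python's standard library. The sample data, the column names, and the `wrangle` helper are hypothetical and do not come from the episode; pandas, which Jarmul recommends for tabular data, can express the same steps more concisely.

```python
import csv
import io

# Hypothetical raw data: stray whitespace, inconsistent case, a missing value.
RAW = """name,city,age
 Ada ,london,36
Grace,NEW YORK,
 Linus,helsinki,29
"""


def wrangle(text):
    """Clean, validate, and organize rows from CSV text (illustrative only)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Clean: strip whitespace and normalize capitalization.
        name = record["name"].strip()
        city = record["city"].strip().title()
        age_text = record["age"].strip()
        # Validate: skip rows whose age is missing or not a whole number.
        if not age_text.isdigit():
            continue
        rows.append({"name": name, "city": city, "age": int(age_text)})
    # Organize: sort by age so the result is easy to work with.
    return sorted(rows, key=lambda row: row["age"])


if __name__ == "__main__":
    for row in wrangle(RAW):
        print(row)
```

Dropping invalid rows is only one possible policy; depending on the analysis, you might instead keep them and flag the missing value.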
Other links:
- Check out the video Building Data Pipelines with Python, presented by Jarmul.
- Check out the video Data Wrangling and Analysis with Python, presented by Jarmul.
- Jarmul is one of the founders of the group PyLadies, which focuses on helping more women become active participants and leaders in the Python open source community.
All episodes:
- Kyle Simpson and Tammy Everts on the challenges of the modern web (49:02)
- Rebecca Parsons on evolutionary architecture (25:42)
- Richard Warburton and Raoul-Gabriel Urma on Java 8 and Reactive Programming (36:36)
- Paul Bakker and Sander Mak on Java 9 modularity (29:31)
- Luciano Ramalho on Python’s features and libraries (20:40)
- Sam Newman on building microservices (29:23)
- Wendy Wise on developing for virtual reality and augmented reality (21:07)
- Katharine Jarmul on using Python for data analysis (26:17)
- Nathaniel Schutta on succeeding as a software architect (29:52)
- Matt Stine on cloud-native architecture (42:45)
- Michael Nygard on architecture without an end state (28:31)
- Jim Blandy and Jason Orendorff on Rust (29:24)
- Ken Kousen on Java, Spring, and Groovy (26:41)
- Adam Scott on ethical web development (19:49)
- Mike Roberts on serverless architectures (34:05)
- Eric Freeman and Elisabeth Robson on design patterns (33:51)
- Aaron Maxwell on the power of Python (33:52)
- Sam Newman on moving from monolith systems to microservices (29:04)
- Paris Buttfield-Addison on what’s new in Swift programming (21:38)
- Neal Ford on evolutionary architecture (26:03)