Sometimes our Python Pandas code feels slow, and sometimes we can't fit enough data into RAM. Drawing on recent updates to the 2nd edition of Ian's book High-Performance Python and on his public training classes, this talk shows you how to get more data into RAM (reducing your need for other technologies like Spark), how to compile your code quickly for significant speedups, how to run in parallel, and which libraries you may be missing that unlock further performance benefits. You'll leave with new techniques for making your DataFrames smaller and many ideas for processing your data faster.
This talk is inspired by Ian's work updating his O'Reilly book High-Performance Python for its 2nd edition in 2020. Over more than 10 years of development, the Pandas library and its DataFrame have gained a huge amount of functionality, and they are used by millions of Pythonistas. However, the most obvious way to solve a task isn't always the fastest or the most RAM-efficient. This talk will help any Pandas user, beginner or beyond, process more data faster and become more effective at their job.
Overview and Author Bio
Principal Data Scientist, Co-founder | PyData London