The cool thing about Dask is that operations like renaming columns happen lazily, so you can do them without loading all of the data into memory.
The above are just a few examples of working with Dask's dataframe construct.
Remember, we built a new dataframe using pandas-style filters without loading the entire original data set into memory.
In a recent post titled Working with Large CSV files in Python, I shared an approach I use when I have very large CSV files (and other file types) that are too large to load into memory.
While the approach I previously highlighted works well, it can be tedious to first load data into SQLite (or any other database) and then access that database to analyze the data. While looking around the web to learn about some parallel processing capabilities, I ran across a Python module named Dask, which describes itself as a flexible library for parallel computing in Python. When I saw that description, I was intrigued.
There’s a lot that can be done with a library like that, and I’ve got plans to introduce Dask into my various tool sets for data analytics.