
I'm not including parsing time; both the pandas and polars versions started from an in-memory data structure parsed from two XML files (low-GB range). This is on my workstation with a single Xeon 4210 (10 cores, 20 threads @ 2.20-3.20 GHz).

Perhaps I can focus on a subset of this processing and write it up, since there seems to be at least some interest in real examples. As pointed out in a reply to a sibling comment, I don't guarantee that my starting code is the best that pandas can do -- to be honest, the runtime of the original code did not line up with my intuition of how long these operations should take. Maybe someone will school me, but either way, switching to polars was a relatively easy win that came with other benefits and feels right to me in a way that pandas never did.




Is polars not parallelizing some ops on the GPU?


It has zero GPU support for now.


Important point.

Nowadays, we write a pure pandas version, and when the data needs to be 100X bigger and faster, we change almost nothing and run it on the GPU via cudf, a GPU runtime that closely follows the pandas API. Most recently, we ported GFQL (Cypher graph queries on dataframes) to GPU execution over the holiday weekend, and it already beats most Cypher implementations: think billions of edges traversed per second on a cheap five-year-old GPU.
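The "change almost nothing" workflow looks roughly like this sketch. The edge-list data is hypothetical; the script itself is plain pandas, and on a CUDA-capable machine you can run the same code on GPU either by swapping the import for cudf or by launching it under RAPIDS' cudf.pandas accelerator (`python -m cudf.pandas script.py`):

```python
import pandas as pd  # on a GPU box: `import cudf as pd`, or run via `python -m cudf.pandas`

# Hypothetical toy edge list; cudf implements the same DataFrame/groupby API.
df = pd.DataFrame({
    "src":    [0, 0, 1],
    "dst":    [1, 2, 2],
    "weight": [1.0, 2.0, 0.5],
})

# Weighted out-degree per source node -- unchanged between CPU and GPU runs.
out_degree = df.groupby("src", as_index=False)["weight"].sum()
```

The CPU version is what you develop and test against; the GPU swap is a deployment decision, not a rewrite.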

We're planning the bigger-than-memory and multi-node versions next, for both CPU and GPU. While cudf leans towards dask_cudf, plans are still TBD; Polars, Ray, and Dask all have sweet spots here.



