
How does its parallelization feature work with lazy generators? Or does its seq function not return generators?



Seq doesn't return a generator. At its core is the concept of lineage, borrowed from Apache Spark. Essentially, when you do something like seq(data).map(func), it appends to a list of operations to execute and holds a copy of the base data. Only when you ask for a value does it compute the result.
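To make the lineage idea concrete, here is a rough sketch of how it could work. This is not the library's actual code: `LazySeq`, its `to_list` method, and the `double` helper are hypothetical names made up for illustration, and the only assumption is the behavior described above (record operations, keep a copy of the base data, compute on demand).

```python
class LazySeq:
    """Hypothetical stand-in for a lineage-backed sequence (not the real class)."""

    def __init__(self, base, ops=()):
        self._base = list(base)  # copy of the base data
        self._ops = tuple(ops)   # lineage: the recorded operations, in order

    def map(self, func):
        # Record the operation; nothing is computed here.
        return LazySeq(self._base, self._ops + (("map", func),))

    def filter(self, pred):
        return LazySeq(self._base, self._ops + (("filter", pred),))

    def to_list(self):
        # Replay the lineage over the base data only when a value is requested.
        data = self._base
        for kind, func in self._ops:
            if kind == "map":
                data = [func(x) for x in data]
            else:
                data = [x for x in data if func(x)]
        return data


calls = []  # records every invocation of double, to show when work happens


def double(x):
    calls.append(x)
    return x * 2


pipeline = LazySeq([1, 2, 3]).map(double).filter(lambda x: x > 2)
calls_before = list(calls)    # still empty: building the pipeline ran nothing
result = pipeline.to_list()   # the lineage is evaluated here
```

Because each call returns a new object that shares nothing mutable with its parent, the same pipeline can be extended in different directions without one branch affecting another.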

As for parallelization: operations that are embarrassingly parallel (map, filter, etc.) are parallelized with multiprocessing. If the data is heavy and serialization is expensive it might be slow, but for operations where the bulk of the work is done in parallel it can help a lot.



