
Just install Apache Spark, they said. It will be fun, they said.

If you have the money, having a managed Spark instance with a bunch of added features can be a big win for some. There is a lot that goes into Spark maintenance.




It also apparently includes some performance optimizations because they control both the hardware and software. Delta Lake is pretty cool too, and so is the hosted MLflow integration.


Databricks built a proprietary vectorized accelerator for Spark they call Photon. It's not just that they've tuned OSS Spark especially well.


Back when I was a customer (before Photon was released, and also around its release), their tuning was very good: on the order of 2x faster for the workloads we had at the time (a very large graph computation and a "simple" filtering job).



