
Seeing as the tidyverse is pretty much your invention, it's well within your purview to define what is in and what is out. I do think, respectfully, that both the OP and the general sentiment of the article have the right of it.

Data.table and data.table-esque notation represent such an improvement over tibble/dplyr that, within my company, we're making a concerted effort to purge all tidyverse packages from general use (except ggplot). When new developers come on, if they are coming from tidyverse, their first task will be something involving pipes and data.table. Tidyverse was fine in school. It doesn't pass muster in production, at least not in our work.

Data.table syntax is simpler, easier to read, easier to teach, and orders of magnitude faster. It plays nicer with other packages than the tidyverse does (if it fits into a DF, it almost always fits into a DT, and I've never met a tibble that I didn't wish was a data.table), and since almost all of our datasets are tens to thousands of millions of lines long, the decision was really made for us.




"Tidyverse was fine in school. It doesn't pass muster in production...."

This is a bit disrespectful.

"Data.table syntax is simpler, easier to read, easier to teach..."

This is rather arbitrary, and I don't think it's the majority view of the community, whatever the advantages of data.table.

"...orders of magnitude faster"

This is an exaggeration in most real cases, even according to the benchmarks pointed to by data.table.[1]

[1] https://h2oai.github.io/db-benchmark/


If you find data.table more useful, you should by all means use it.

My greatest regret about coining the word tidyverse is that for some reason people seem to think it’s a monolith. It’s not; you’re totally free to pick and choose whatever parts of it you find useful.

It doesn’t hurt my feelings if packages that I have helped write aren’t the perfect fit for your problems. Use whatever makes you happy :)


Sure, with your amount of data you need data.table. But that is your specific use case. It has nothing to do with dplyr not being production ready; it just means dplyr is not the right tool for you. Those are separate things, but somehow programmers love to treat them as the same.



