Wow, the idea that the Tidyverse is somehow burdened with an overabundance of co...

jrumbut · on July 5, 2019

I think this is what the author is getting at: correct tidyverse usage requires greater experience and knowledge, and when it is used incorrectly (which is easy to do) it produces code that is difficult to debug and optimize.

I tend to agree. I've found tidyverse code has a write-only quality to it. Since I'm going to see more of it, I plan to dive into the inner workings of at least dplyr and purrr.

That said, it is hard to deny what Hadley Wickham has done for the R community, and I can't write off the idea that he sees a bigger picture I'm missing.

hardboiled · on July 5, 2019

Tidyverse code is not write-only; it is designed to be mostly read-only, with the abstraction set at the level of the domain. In this case that means data frames and data pipelines, akin to the corresponding constructs in relational algebra/SQL or data processing (map/reduce).

Correct usage requires learning the right abstractions, but fortunately these abstractions are shared across language communities and frameworks.

If you are doing any data processing work and you do not know about foundational functional concepts, i.e. map/reduce, then I would argue the code you write is less readable and overly focused on the idiosyncrasies of how to do something rather than on how entities and data are related to each other: their logic and structure.
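To make the contrast concrete, here is a minimal sketch in Python rather than R, purely for illustration. The `rows` data and both function names are made up for this example and do not come from the tidyverse or from the article. The first function spells out how to walk the data; the second states the transformation as a map step followed by a reduce step.

```python
from functools import reduce

# Hypothetical records, invented for this illustration only.
rows = [
    {"group": "a", "value": 1.0},
    {"group": "b", "value": 2.5},
    {"group": "a", "value": 3.0},
]


def total_by_group_loop(rows):
    """Index-driven version: the reader has to track the bookkeeping."""
    totals = {}
    for i in range(len(rows)):
        group = rows[i]["group"]
        if group not in totals:
            totals[group] = 0.0
        totals[group] = totals[group] + rows[i]["value"]
    return totals


def total_by_group_mapreduce(rows):
    """Map each record to a (group, value) pair, then fold the pairs into totals."""
    pairs = map(lambda r: (r["group"], r["value"]), rows)

    def add_pair(acc, pair):
        group, value = pair
        # Return a new dict rather than mutating the accumulator.
        return {**acc, group: acc.get(group, 0.0) + value}

    return reduce(add_pair, pairs, {})
```

Both functions return the same per-group totals. Which one reads better is exactly the judgment call this thread is arguing about.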

The main optimization is to be correct and expressive of one's intention and purpose. If you need performance use Julia.

thom · on July 5, 2019

It’s very hard to write expressive, readable code that munges some horrible tabular format into another arbitrary tabular format, to be fair. That said it’s true that most R code doesn’t seem like map/reduce, or some other pipeline of transformations. It’s usually more like someone cut and pasted a long, hard REPL session into a notebook and has no intention of ever scrolling back up.

jrumbut · on July 5, 2019

I understand that's the goal, but it hasn't been what I've seen from users' code. I think the issue, which the article describes, is that the tidyverse has too large a surface area for something aimed at very smart people who are not software engineers and don't have time to become one.

So they end up with a subset of functionality they can get stuff done with, but then they need to collaborate with someone who uses a different subset. That can be messy.

I think in an environment where everyone learns R from the same course/book/MOOC/whatever (or has done lots of different sorts of programming) and the organization can impose a style guide, the tidyverse approach would be great. But when you have people coming from all sorts of places and backgrounds, I don't think it's a good fit.

thom · on July 5, 2019

This is more a symptom of the fact that nobody has told R programmers that they are software engineers and the thing that they’re creating is software first, and not just the plot that comes at the end. No library can really help that.

jrumbut · on July 5, 2019

You're absolutely right with this diagnosis.