I really am curious why anything "R" and "Tutorial" gets massively upvoted to the Top 3 of HN like clockwork nowadays. I might have to restart my R tutorial screencasts since there appears to be a demand. :P
Idiomatic R has changed enormously with the growth of magrittr (the %>% operator), dplyr and now broom. Code by expert users is unrecognisable from that of a few years ago.
R still looks pretty much the same to me when I look at the code on kaggle, stack exchange, etc. Can you share where you have seen this expert R use and/or what use cases?
Most modern R usage I've seen, even on Kaggle, has used magrittr/dplyr since it's orders of magnitude faster/easier than the base functions. (i.e. I almost quit R in favor of Python without those two)
Thanks for the example code, it actually has me wondering whether using the pipes has any impact on performance. They are much easier to read than the nested function calls that would be used instead.
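On the performance question, here is a rough sketch in base R only. It assumes R 4.1 or later and uses the native pipe `|>` as a stand-in for magrittr's `%>%`, so it shows nothing definitive about magrittr itself. The vector `x` is made up for illustration. The native pipe is rewritten into the nested call at parse time, so for that operator at least there should be no runtime cost:

```r
# Assumption: R >= 4.1, so the native pipe |> is available (no packages needed).
x <- c(4, 9, 16, 25)  # hypothetical data

# Nested calls read inside-out.
nested <- round(mean(sqrt(x)), 1)

# Piped calls read left to right.
piped <- x |> sqrt() |> mean() |> round(1)

# Both forms compute the same value.
same_value <- identical(nested, piped)

# The parser turns the piped form into the nested call before anything runs.
same_call <- identical(quote(x |> sqrt()), quote(sqrt(x)))
```

As far as I know magrittr's `%>%` is an ordinary function, so it probably adds a small overhead per call; whether that matters is something to measure on your own workload.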
Indeed. 10-12 years ago there were probably no more than 50ish people in the world who knew R better than me. By 5 years ago the tide had heavily turned but I was still using it at least a few times a week.
If I had to use R as a daily driver now it'd be like learning the language all over again. I could write it using the old ways, and I do that on the few occasions I have to, but it's no longer idiomatic at all.
Indeed - I think it's unclear how you should begin to teach R now. There is a lot of legacy stuff that you may need to recognise if you want to understand the help documentation. But if you wanted to seriously work in R it may be better to skip straight to ggplot2, dplyr and broom without touching base R, the apply functions, and list in list in list hell.
That's pretty much what I did. I dip back down into the 'base' stuff when I need to, but in general I stick in the hadleyverse of tools. It's been great.
> However, there is a difference between the two operators. = is only allowed at the top level i.e. if the complete expression is written at the prompt. so = is not allowed in control structures. Here's an example:
> if (aa=0) {print ("test")}
Error: unexpected '=' in "if (aa="
> aa
Error: object 'aa' not found
> if (bb<-0) {print ("test")}
> bb
[1] 0
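For anyone puzzling over that transcript, here is a minimal sketch of what it appears to show. The helper `parses()` is hypothetical, written only for this illustration; it checks syntax with `parse()` and evaluates nothing. The first case fails at the parser, not at run time, which is why `aa` is never created. The second is valid syntax: it assigns 0 to `bb`, and since 0 counts as FALSE the body is skipped and nothing prints.

```r
# Hypothetical helper: TRUE if the source text is syntactically valid R.
parses <- function(src) {
  tryCatch({
    parse(text = src)
    TRUE
  }, error = function(e) FALSE)
}

eq_in_if     <- parses('if (aa = 0) print("test")')    # the quoted failing case
arrow_in_if  <- parses('if (bb <- 0) print("test")')   # the quoted working case
eq_in_parens <- parses('if ((aa = 0)) print("test")')  # extra parentheses

# Running the working case: bb is assigned, the condition is FALSE, no output.
if (bb <- 0) print("test")
bb
```

Wrapping the `=` assignment in an extra pair of parentheses makes it parse, so the restriction looks like a grammar rule about where `=` may appear rather than a difference in what the assignment does.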
What in the LOLWTF...This is a bad example because even as an experienced programmer, I have no idea what is supposed to happen, or why this pattern would even be used -- and even then, I was surprised with the result. This is an overly complicated explanation even if it is correct. I prefer Hadley Wickham's style guide:
> All entities in R are called objects. They can be arrays, numbers, strings, functions.
This may technically be the case, but any R tutorial that does not open up with what makes R different from other mainstream languages is doing the reader a major disservice. The Listing Objects chapter shows patterns that use R in ways that I haven't seen used in other R examples (beginner and advanced). Even if it's correct, what's the point in showing esoteric examples unless this tutorial is meant to teach R to someone interested in the design of languages?
Again, Wickham's Advanced R book handles this topic well (in fact, Advanced R is probably the best book you can read if you already know how to program -- it is incredibly accessible) in his early chapter on Data structures:
http://adv-r.had.co.nz/Data-structures.html
> R’s base data structures can be organised by their dimensionality (1d, 2d, or nd) and whether they’re homogeneous (all contents must be of the same type) or heterogeneous (the contents can be of different types). This gives rise to the five data types most often used in data analysis...Almost all other objects are built upon these foundations. In the OO field guide you’ll see how more complicated objects are built of these simple pieces.
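A quick sketch of those five structures in base R, in case it helps to see them side by side. The variable names and values are made up for illustration:

```r
v  <- c(1, 2, 3)                            # 1d, homogeneous: atomic vector
l  <- list(1, "a", TRUE)                    # 1d, heterogeneous: list
m  <- matrix(1:6, nrow = 2)                 # 2d, homogeneous: matrix
df <- data.frame(x = 1:2, y = c("a", "b"))  # 2d, heterogeneous: data frame
a  <- array(1:24, dim = c(2, 3, 4))         # nd, homogeneous: array

# str() gives a compact summary of any of them.
str(df)
```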
And this next sentence is the one thing I wished someone had printed out and stapled to my forehead before I started to learn R:
> Note that R has no 0-dimensional, or scalar types. Individual numbers or strings, which you might think would be scalars, are actually vectors of length one.
Maybe that's self-evident to other programmers, but even as someone who once programmed in MATLAB, I was stunningly ignorant of how every return value I interacted with was a vector, even a single simple string. In retrospect, the interactive shell alludes to this...but I didn't even bother looking up the details about the shell:
> 2 + 2
[1] 4
> 'a'
[1] "a"
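The `[1]` in that output is the index of the first element on the printed line. A small base R sketch of the same point, with made-up variable names:

```r
n <- 2 + 2
s <- "a"

length(n)     # a "single number" is a vector of length one
length(s)     # a "single string" too; nchar(s) counts its characters
is.vector(n)  # TRUE
n[1]          # indexing works on what looks like a scalar

# Because it is already a vector, it can be extended like any other.
longer <- c(n, 10)
length(longer)
```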
R is wonderfully easy to start up with and produce visualizations with. But skipping over the language's fundamentals was incredibly painful for me. A few minutes skimming "Advanced R" would have easily saved me hours of confusion.
wrt assignments: = does assignments too but not eagerly. It's used for late bindings. This isn't a matter of style as your reference to adv-r would suggest but a different use case.
I wish people would spend more time reading the official language reference instead of all these half-baked tutorials.
Also, if you are using models and algorithms on the backend, R is painful compared to Python.
If you are just doing interactive work, RStudio and Jupyter with well-stocked support libraries are about equivalent to me.
If you want to do something not all that popular (e.g. conjoint analysis) or something cutting edge (duplicating recent research results), R is pretty much the only way to go.
Why? Shouldn't you be able to do that in Python (or MATLAB if that's your thing)?
You sure can! Except you will likely have to roll your own, rather than use the plug-and-play modules that seem to be released for R first (used to be MATLAB, but R seems to be the pretty new girl in class, getting all the attention).
Is that because with R you get access to a myriad of statistical packages, or because the functionality of numpy etc. is already included? IMHO the only real shortcoming is the lack of 64 bit integers, which is of course ridiculous.
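To make the integer point concrete, a minimal base R sketch. The variable names are made up, and the warning is suppressed only to keep the output quiet:

```r
# R's built-in integer type is 32-bit.
max_int <- .Machine$integer.max  # 2^31 - 1

# Going past it does not wrap around; R returns NA (with a warning).
overflow <- suppressWarnings(max_int + 1L)
is.na(overflow)

# Doubles are the usual fallback, but they are only exact up to 2^53.
big <- 2^53
big == big + 1  # the two values are indistinguishable as doubles
```

The bit64 package adds an integer64 class as a workaround, but it is an add-on rather than part of the base language.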