The RStudio guys have really made R a pleasure to use. Thank you guys!
The core language is still a confusing mess (I'm still never sure when to use a matrix, a dataframe, a list..), but if you use their tools you can ignore it for the most part.
In under 10 lines you can massage data and generate fantastic graphics.
A little off topic, but does anyone know what their business model is? Are they going to run out of money and burn out in a year or two?
And no, we're not planning on burning out. We currently sell three things:
* RStudio Server Pro. A commercial version of the open-source server that provides the things corporate IT wants (e.g. monitoring, more auth options, ...)
* Shiny Server Pro. A more flexible version of the open-source shiny server that offers more configurability (e.g. number of R processes per app), and again other stuff that corporate IT wants.
* The right to use the RStudio desktop IDE, for companies who don't want to use AGPL software
Here's how I think of it, which has been working for me:
matrix - If you have data that would make sense to be in a spreadsheet-type format and all your data are numbers.
dataframe - If you have data that would make sense to be in a spreadsheet-type format and some columns are numbers but other columns are something else (character strings, dates, TRUE/FALSE); but each column is only one thing. That is, you have one column that's all dates, another column that's all numbers, yet another column that's all character strings, etc.
list - If you need to mix data types within a single entity (a vector or column of data).
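A minimal base-R sketch of the three structures described above (the values are made up for illustration):

```r
# matrix: spreadsheet-shaped, and everything is a number
m <- matrix(1:6, nrow = 2)

# data frame: spreadsheet-shaped, but each column has its own type
df <- data.frame(
  date  = as.Date(c("2014-11-24", "2014-11-25")),
  count = c(10, 20),
  label = c("a", "b"),
  stringsAsFactors = FALSE
)

# list: mixed types within one object
l <- list(name = "a", values = 1:3, ok = TRUE)
```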
To piggyback on what hadley said a bit, I find thinking of a data frame as a "collection of records", and a matrix as "two dimensional data" to be a bit better.
One useful heuristic worth asking is "Does it make sense to sort this data by something". In that case, you have a data frame. Whereas if you want to perform matrix math on something (inverting it, multiplying it by another matrix, reducing it, etc.), you have a matrix. Things that I use a matrix for can generally also be expressed as a data frame with columns rowId, colId, and value. If it doesn't make sense in that format, a matrix is generally not the appropriate structure.
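The same data expressed both ways, as a quick sketch of that heuristic (the values and column names are illustrative):

```r
# A matrix, for when you actually want matrix math
m <- matrix(c(1, 2, 3, 4), nrow = 2)
m_inv <- solve(m)          # invert it
round(m %*% m_inv)         # multiplying back gives the identity

# The equivalent "long" data frame with rowId, colId, value
long <- data.frame(
  rowId = c(1, 2, 1, 2),
  colId = c(1, 1, 2, 2),
  value = c(1, 2, 3, 4)
)
# ...which, unlike the matrix, makes sense to sort or filter
long[order(long$value, decreasing = TRUE), ]
```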
I'd amend that a little: use a matrix when you're actually calculating statistics (internally to the function). Clean your data so it always fits in a data frame when you load it. Lists are for representing things like data scraped from html before converting it to a data frame.
The community and ecosystem around R are rapidly changing and adapting. R has a long and storied history as a niche language for statistics and analysis. Much as those disciplines have entered the mainstream of modern technology-enabled businesses, so follows the R ecosystem. Previously laborious tasks are being revamped with elegant new APIs, such as rvest for scraping, dplyr for data manipulation, lubridate for date handling, and so on.
Performance has also historically been an R bugaboo, but with changes to R's copy-on-write semantics and other optimizations in the base language, current benchmarks show it performing on par with Python and other dynamic languages (if not slightly better with tools such as dplyr and data.table).
The magrittr package's implementation of a pipe operator (often considered the only truly successful implementation of a 'component architecture') and the adoption of that model by tools such as rvest are really allowing the functional, vectorized nature of R to shine through. These are really darned exciting times to be a part of this community!
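A minimal sketch of the magrittr pipe being described, assuming the magrittr package is installed (`%>%` passes the left-hand value as the first argument of the next call, so a nested expression reads left to right instead):

```r
library(magrittr)

# Nested form:
#   round(mean(sqrt(c(1, 4, 9))), 1)

# Piped form:
c(1, 4, 9) %>%
  sqrt() %>%
  mean() %>%
  round(1)
# → 2
```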
Performance isn't currently a huge focus for dplyr. In my opinion dplyr is fast enough that the bottleneck becomes mostly cognitive - you spend more time thinking about what you want to do than actually doing it.
In the R Help Desk 2004 (http://www.r-project.org/doc/Rnews/Rnews_2004-1.pdf), Gabor Grothendieck recommends chron over POSIXct classes on account of the time zone conversions which occur when the tz attribute of the latter object is not "GMT". Will this not be a problem with lubridate? Thanks in advance.
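The conversion being asked about is easy to see in base R: a POSIXct value stores an instant in time, and the tz attribute only controls how that instant is displayed (the date below is arbitrary):

```r
# Same instant, two different clock readings
t <- as.POSIXct("2014-11-24 12:00:00", tz = "GMT")
format(t, tz = "America/New_York")
# "2014-11-24 07:00:00" (GMT-5 in late November)
```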
I guess I really do skip over who the submitter is when checking out HN links...if this submission had been titled, "Rvest: Easy web scraping R library by Hadley Wickham", I would've immediately been non-skeptical.
It looks like rvest intends to be the equivalent of Mechanize, with stateful navigation in the works. Is there an R equivalent to just Beautiful Soup or Nokogiri?
What are you looking for? rvest should support all the navigation tools from beautiful soup/nokogiri (unless I've missed something), but currently doesn't have any support for modifying the document (in which case I think your only option is the XML package).
One of the reasons I learned Python for data scraping was that R in general does not play nice with https (RCurl requires a certificate and even then it's pretty fussy)