
Live from Crow's Coffee: Neal Beck: R for Undergraduates: "First, I got no first-hand horror stories...

...(though I did get some second-hand horror stories). This does not mean that such stories are untrue, but I am more inclined to believe they are an urban myth (though the lack of first-hand reports does not move my beliefs very much). Also, it was pointed out that it is not as though UGs love learning Stata; a move from Stata to R is not as drastic as a move from the Daily Show (sigh!) to the PBS Newshour. We will always have UGs who hate whatever choice we make.

Second, anyone using R wants a nice front end; for me and most respondents, this is RStudio. So R means R/RStudio. Others mentioned other GUIs or interfaces such as R Commander or Zelig, but RStudio is the clear no-brainer.

Third, instructors made UGs happy by stressing two things about R right away (making it clear that they did not simply choose R because they are sadists):

a) R is FREE. So students can work at home, are not dependent on a university computer lab, etc. Students like FREE.

b) R can help get you a job after graduation. R provides many transferable skills and is widely used in a variety of non-academic settings. (I was first moved to think about R for UGs when I heard Amanda Cox talk about using R at the New York Times on the Data Stories podcast.) Obviously people in the non-academic world will be asked to use many programs, many of which do not even exist yet, but R is a good gateway drug to them. And it is a good start if one has to move on to something like Python. Students like things that will help them get a job.

People stressed how critical it is that students understand we are not just sadists.

Fourth, R needs good support, both at the TA level and at the university level. RStudio makes things like setting the right working directory or installing a package easy, but easy is not enough; you need the resources to make these things very easy. Precise written instructions also help, but the students who need the most help often seem not to find the instructions. (Many happy R instructors have used R in small courses with a self-selected group of students. Be careful in assuming this generalizes.) At a minimum, your TAs need to be very comfortable with R. And as someone who can never remember whether I need to type FALSE or "FALSE" or na.rm=TRUE, I have some sympathy with undergraduates. At the university level, some remarked that different labs have different versions of R installed. Assume that Murphy was an optimist.

Fifth, your colleagues need to buy in. Obviously you need TAs conversant in R. But if your colleagues want UG RAs who are good in Stata, there is a problem. Similarly if the person teaching public opinion gives data assignments in Stata. (Along the same lines, if a student can get a degree without doing anything quantitative, they might rightly ask why they had to learn R (or Stata).)

Sixth, simplify. Lots of things, like regression, are already really straightforward in R. But you might go out of your way to give students data sets with no missing data, so they need not remember na.rm=TRUE, or provide a .R file that sets easy defaults the way you want.
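A minimal sketch of the kind of thing such a setup file might contain, assuming base R only. It also shows the distinction that trips people up: most summary functions take the argument na.rm = TRUE, while na.omit() is a separate function that strips missing values from a vector or drops incomplete rows from a data frame. The helper course_mean and the scores data are hypothetical, made up for illustration.

```r
# Toy vector with one missing value (made-up numbers for illustration).
x <- c(4, 8, NA, 12)

mean(x)                # NA: any missing value makes the mean missing
mean(x, na.rm = TRUE)  # 8: the NA is dropped before averaging 4, 8, 12

# na.omit() is a function, not an argument: it returns the data minus the NAs.
x_clean <- na.omit(x)
length(x_clean)        # 3

# Hypothetical helper a course setup file might define, so students
# never have to type na.rm = TRUE themselves.
course_mean <- function(v) mean(v, na.rm = TRUE)
course_mean(x)         # 8

# For data frames, na.omit() keeps only complete rows: one way to hand
# students a data set with no missing data.
scores <- data.frame(
  student = c("A", "B", "C", "D"),
  hours   = c(2, 5, NA, 7),
  grade   = c(70, 85, 90, NA)
)
complete_scores <- na.omit(scores)
nrow(complete_scores)  # 2
```

An instructor could save the helper definitions in a file and have students run source() on it at the start of each session; cleaning the data with na.omit() before distributing it avoids the issue entirely.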

Seventh, serve some dessert up front. Data visualization is great for this, and can be done on day one. So don't start by just defining R objects or looking at lists. Hold off on cbind for a while. Visualization is fun, or did I already say that?
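As a sketch of what a day-one exercise might look like, assuming base R graphics and the built-in mtcars data set (our choice of example, not one from the original): there are no files to load and no working directory to set, and one extra line adds a fitted regression line to the plot.

```r
# Built-in data: nothing to download, no working directory to worry about.
data(mtcars)

# Scatterplot of fuel economy against car weight.
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy and weight (mtcars)")

# Fit a simple regression and draw the fitted line on the same plot.
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red")
```

Students get a picture immediately, and the lm() call previews the regression material they will meet later.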

So there are a bunch of things one needs to do. It can be done, but not all of it can be done without group consensus.

Turning to the downside (and there is one), R takes a bit more of the UG brain than does Stata. Given that something has to give, at least early on R classes will probably give students less time to think about data. Some have noted that being able to look under the hood can, in the end, teach students more about how to think about data, but learning to look under the hood is not free. This to me is the big issue.

Related is that getting data into R is not as trivial as it is in Stata. It is easy to cut a table from a web page and paste it into a new Stata data set. It is not quite so easy in R, though not that hard: paste into Excel, save as CSV, and type read.csv("file"), hoping students remember the quotes and have gotten the file into their working directory; or, on a Mac, remember how to type df <- read.delim(pipe("pbpaste")) without putting a j after the pb.
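A minimal sketch of that workflow in base R, assuming the student has already saved a CSV from Excel. Here a temporary file stands in for that saved file so the example runs anywhere, and the column names id and value are made up. Checking file.exists() before calling read.csv() turns the usual cryptic error into a message that names the folder R is actually looking in. The clipboard line is commented out because pbpaste exists only on macOS and needs something on the clipboard.

```r
# Stand-in for a file a student saved from Excel: write a tiny CSV to a
# temporary location so the example runs on any machine.
path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10", "2,20"), path)

# read.csv() wants the file name as a character string (note the quotes
# when typing a name directly, e.g. read.csv("mydata.csv")).
# Checking first gives a clearer message than the raw error.
if (!file.exists(path)) {
  stop("File not found. R is looking in: ", getwd())
}
df <- read.csv(path)
str(df)

# macOS only: read a tab-separated table copied to the clipboard,
# e.g. from a spreadsheet. Uncomment to try it.
# df_clip <- read.delim(pipe("pbpaste"))
```

With a real file given as a relative name such as "mydata.csv", the getwd() in the error message is what tells a student whether the file and R's working directory actually match.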

In the long run R is fabulous for bringing all sorts of very cool but complicated data into an analysis, but there are short-run concerns. It is also probably not quite as easy to play with annoying data in R as it would be in Stata. Obviously one compromise is to start out with pretty, well-behaved data sets. (Another, suggested by a wise colleague, is to start UGs early in Excel (in the big intro classes) and then tell them that a data frame is just like an Excel spreadsheet. If students can play a bit with Excel, which they likely learned long before college, they can make their data a bit prettier in Excel before calling read.csv.) But if read.csv(file) returning "object 'file' not found" is going to be a killer, think twice. If you have good support, this may not be an issue.
