The Future of R-Project!

Published on Saturday, 22 May 2010

It has already been 10 years since I stumbled upon the R-Project while doing my PhD work at the University of Amsterdam. Back in those days we worked with SPSS and, as botanical ecologists, with CANOCO (by Cajo ter Braak) and the R Package (developed by Philippe Casgrain and Pierre Legendre at the University of Montreal). Soon after, both commercial packages started to move towards an “upgrade” of their user interface. Essentially, this meant that they moved from a job-based (scripted) environment to a fully menu-based graphical user interface.

I am one of those who consider menu GUIs to be a downgrade when it comes to analysis software. Either you know what you are doing and want to do it efficiently, or you don’t, in which case you will feel the urge to click around and try. This is a no-no in statistics. In actual fact, each “try” reduces the number of degrees of freedom of any test you end up presenting in your research paper. But I have yet to see the first paper that acknowledges this reduction in the power of its analysis.
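To put a rough number on the cost of clicking around, here is a minimal sketch in Python. It frames the problem as multiple comparisons rather than as a literal count of degrees of freedom, and it assumes that every “try” is an independent test run at the same significance level, which is a simplification. The function names are made up for this illustration.

```python
def familywise_error_rate(alpha: float, tries: int) -> float:
    """Chance of at least one false positive in `tries` independent tests.

    Assumes every test is run at the same level `alpha` and that the
    tests are independent (a simplifying assumption).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    if tries < 0:
        raise ValueError("tries must be non-negative")
    # P(no false positive in any test) = (1 - alpha) ** tries
    return 1.0 - (1.0 - alpha) ** tries


def bonferroni_alpha(alpha: float, tries: int) -> float:
    """Per-test level that keeps the overall error rate at or below `alpha`."""
    if tries < 1:
        raise ValueError("tries must be at least 1")
    return alpha / tries


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        rate = familywise_error_rate(0.05, k)
        corrected = bonferroni_alpha(0.05, k)
        print(f"{k:2d} tries: error rate {rate:.3f}, corrected alpha {corrected:.4f}")
```

Under these assumptions, ten unreported tries at the usual 5% level already give roughly a 40% chance of at least one spurious “significant” result. The Bonferroni correction shown here is only one conservative way of accounting for the extra tries.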

The R-Project offered instant freedom from all this. In a well-designed and structured language with a long history (R is based on S), most of the analyses that I needed were available through one of the 2000 modules that have been written by experts in the field. There is a good peer-review system in place to check for accuracy, and I have found the documentation to be excellent and consistent in format and depth, regardless of the authors.

At the same time, I have been wondering for ten years why the main site of the R-Project looks the way it does. Either there is no one who wants to spend the time to give the project’s entry point a nice look and feel, or they feel that since most visitors are only interested in the goods anyway, there is no point.

Part of the answer has been given in the December issue of the R-Journal, in a paper by John Fox. It’s a fascinating inside view of the origins of a collaborating group of people, starting in 1990, who have spent thousands of hours creating something because they could, and because they found that others found it useful. The strength of the article is that it highlights the differences in motivation within the core team of the R-Project, ranging from the technical challenges to outright altruism, and then places this in the context of the future of R.

Successful open source projects have survived either through a strict set of rules and a code of conduct (e.g. Debian), through a philanthropist financing the organizational structure (e.g. Ubuntu), or through financing by a commercial organization (e.g. OpenOffice.org). John Fox highlights the risks that success and further expansion pose for R under the current organizational structure, but does not indicate whether there is any discussion within the core team about the strategic choice they need to make to ensure viability and success in the next ten years.

The R-Project deserves to have a long future, as it offers top-class analysis tools to anybody, at any university or institute, where the only investment required is to learn more about statistics.