My posts on Analytic bridge regarding regression,cross validation, and predictive power

 

http://www.analyticbridge.com/profiles/blogs/cross-validation-in-r-a-do-it-yourself-and-a-black-box-approach

http://www.analyticbridge.com/profiles/blogs/use-press-not-r-squared-to-judge-predictive-power-of-regression

Leave-one-out cross validation in R and computation of the predicted residual sum of squares(PRESS) statistic

We will use the gala dataset in the faraway package to demonstrate leave-one-out cross-validation. In this type of validation, one case of the data set is left out and used as the testing set and the remaining data are used as the training set for the regression. This process is repeated until each case in  the data set has served as the testing set.

The key concept in creating the iterative leave-one-out process in R is creating a column vector c1 and attaching it to gala as shown below, This allows us to uniquely identify the row that is to be left out in each iteration.

> library(faraway)

> gala[1:3,]

          Species Endemics  Area Elevation Nearest Scruz Adjacent

Baltra         58       23 25.09       346     0.6   0.6     1.84

Bartolome      31       21  1.24       109     0.6  26.3   572.33

Caldwell        3        3  0.21       114     2.8  58.7     0.78

>c1<-c(1:30)

> gala2<-cbind(gala,c1)

> gala2[1:3,]

          Species Endemics  Area Elevation Nearest Scruz Adjacent c1

Baltra         58       23      25.09       346     0.6        0.6        1.84       1

Bartolome   31       21     1.24       109       0.6      26.3     572.33      2

> diff1<-numeric(30)

> for(i in 1:30){model1<-lm(Species~Endemics+Area+Elevation,subset=(c1!=i),data=gala2)

+ specpr<-predict(model1,list(Endemics=gala2[i,2],Area=gala2[i,3],Elevation=gala2[i,4]),data=gala2)

+ diff1[i]<-gala2[i,1]-specpr }

 

>summ1<-numeric(1)

>summ1=0

> for(i in 1:30){summ1<-summ1+diff1[i]^2}

> summ1

[1] 259520.5

The variable summ1 holds the value of  the PRESS statistic.