Prediction using the R SuperLearner package

In this post, the R SuperLearner package is used to predict lpsa for the cases in the test-set portion of the prostate dataset.
In the SuperLearner approach, the predictions of several candidate learners are combined into a weighted average; by default the weights are estimated by non-negative least squares on cross-validated predictions. As the code below shows, the mean squared prediction error on the test set is 0.319. For comparison, in an earlier post (3-way variable selection in R regression), the mean squared errors of linear regression and lasso regression were 0.516 and 0.493, respectively.
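To make the weighting idea concrete, here is a minimal sketch restricted to two learners. It assumes `z1` and `z2` are cross-validated predictions of the same outcome `y` (hypothetical inputs), and `convex_weight` is a hypothetical helper, not part of the SuperLearner package. It finds the convex weight that minimises the squared error of the blended prediction; SuperLearner itself solves the general many-learner version of this problem, so treat this only as an illustration of the principle.

```r
# Hypothetical helper: simplified two-learner version of the weighting step.
# Assumption: z1 and z2 are cross-validated predictions of the outcome y.
# SuperLearner generalises this to many learners (by default via NNLS).
convex_weight <- function(y, z1, z2) {
  d <- z1 - z2
  # If both learners predict identically, any weight is optimal; split evenly
  if (sum(d^2) == 0) return(0.5)
  # Unconstrained minimiser of sum((y - (a * z1 + (1 - a) * z2))^2)
  a <- sum((y - z2) * d) / sum(d^2)
  # The objective is a convex quadratic in a, so clipping to [0, 1]
  # gives the best weight that keeps the blend a convex combination
  min(max(a, 0), 1)
}
```

With `a <- convex_weight(y, z1, z2)`, the ensemble prediction for new data is `a * pred1 + (1 - a) * pred2`. For example, if one learner overpredicts every case by 1 and the other underpredicts by 1, the weight is 0.5 and the errors cancel.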

Here is the SuperLearner code:
> library(SuperLearner)
> library(ElemStatLearn)
> data(prostate)
#Separate the training and test sets.
> head(prostate)
lcavol lweight age lbph svi lcp gleason pgg45 lpsa train
1 -0.5798185 2.769459 50 -1.386294 0 -1.386294 6 0 -0.4307829 TRUE
2 -0.9942523 3.319626 58 -1.386294 0 -1.386294 6 0 -0.1625189 TRUE
3 -0.5108256 2.691243 74 -1.386294 0 -1.386294 7 20 -0.1625189 TRUE
4 -1.2039728 3.282789 58 -1.386294 0 -1.386294 6 0 -0.1625189 TRUE
5 0.7514161 3.432373 62 -1.386294 0 -1.386294 6 0 0.3715636 TRUE
6 -1.0498221 3.228826 50 -1.386294 0 -1.386294 6 0 0.7654678 TRUE
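#The variable to be predicted is lpsa. The logical variable train indicates whether a case belongs to the training set (TRUE) or the test set (FALSE).
> trainset<-prostate[prostate$train, ]
> testset<-prostate[!prostate$train, ]
#Drop column 10 (train), then column 9 (lpsa), so that only the predictors remain.
> testset1<-testset[,-10]
> testset2<-testset1[,-9]
> trainset1<-trainset[,-10]
> trainset2<-trainset1[,-9]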
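#Specify the learners that will be used by SuperLearner.
> mylibrary<-c(...)  # character vector of SL.* learner wrappers; listWrappers() lists those available
> X<-trainset2
> newX<-testset2
#The outcome is lpsa, column 9 of trainset1.
> ay<-trainset1[,9]
> out<-SuperLearner(Y=ay, X=X, newX=newX, SL.library=mylibrary)
> out$SL.predict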
[,1]
7 1.814375
9 1.110389
10 1.237715
15 1.871654
22 2.699901
25 1.943528
26 1.977818
28 1.965107
32 1.988913
34 1.227339
36 2.875185
42 2.231019
44 2.346486
48 2.783155
49 2.419382
50 2.120854
53 2.388720
54 3.046706
55 3.001810
57 1.612384
62 3.444458
64 3.635944
65 2.350286
66 2.748382
73 2.853494
74 3.341276
80 3.117150
84 3.186377
95 3.241076
97 3.657895
#Now compute the mean squared error between the predicted values and the actual test-set values of lpsa.
> sum=0
> tt<-nrow(testset)
> for(i in 1:tt) {
+ sum<-sum+(out$SL.predict[i]-testset$lpsa[i])^2
+ }
> sumg<-sum/tt
> sumg
[1] 0.3191994
>
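The explicit loop above can also be written as a single vectorised expression. The sketch below wraps it in a small hypothetical helper, `mse`, which is not part of SuperLearner. Because it computes the same sum of squared differences divided by the number of test cases, `mse(out$SL.predict, testset$lpsa)` should return the same value as the loop.

```r
# Hypothetical helper: mean squared error between predictions and observations.
# Vectorised equivalent of accumulating squared errors in a for-loop.
mse <- function(predicted, observed) {
  # out$SL.predict is a one-column matrix; flatten it to a plain vector
  predicted <- as.numeric(predicted)
  stopifnot(length(predicted) == length(observed))
  mean((predicted - observed)^2)
}
```

Avoiding the loop also avoids reusing `sum` as a variable name, which masks the base R function of the same name for the rest of the session.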

Published by


alitheia15

Data Mining-Analytics Software Consultant
