Correlation of the columns of a dataframe in R

1. Columns contain numerical values.

Here we can use the base R function cor(), which by default computes Pearson correlations between every pair of columns.

Example:

> library(MASS)
> cor(trees)
           Girth    Height    Volume
Girth  1.0000000 0.5192801 0.9671194
Height 0.5192801 1.0000000 0.5982497
Volume 0.9671194 0.5982497 1.0000000
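To make clear what cor() computes for each pair of numeric columns, here is a minimal sketch that recomputes Pearson's r from its definition (the sum of centred cross-products divided by the square root of the product of the centred sums of squares) and compares it with cor(trees). The helper pearson_r() is hypothetical and exists only for illustration; trees ships with base R's datasets package.

```r
# Pearson's r from its definition: centred cross-products divided by the
# square root of the product of the centred sums of squares.
pearson_r <- function(x, y) {  # hypothetical helper, for illustration only
  xc <- x - mean(x)
  yc <- y - mean(y)
  sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))
}

# One pair of columns from the built-in trees data
r_manual <- pearson_r(trees$Girth, trees$Volume)
r_cor    <- cor(trees)["Girth", "Volume"]
print(c(manual = r_manual, cor = r_cor))

# Rebuild the whole matrix by applying pearson_r() to every pair of columns
vars <- names(trees)
r_matrix <- outer(seq_along(vars), seq_along(vars),
                  Vectorize(function(i, j) pearson_r(trees[[i]], trees[[j]])))
dimnames(r_matrix) <- list(vars, vars)
print(all.equal(r_matrix, cor(trees)))
```

The point of the comparison is that cor() on a data frame is simply this pairwise formula applied to every combination of columns, which is why it only makes sense when all columns are numeric.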

2. Some columns contain numerical and some contain ordinal values.

Here we can use the function hetcor() in the R package polycor. This function computes Pearson correlations between numeric columns, polyserial correlations between numeric and ordinal columns, and polychoric correlations between ordinal columns. We will try it on the quine dataset, which ships with the MASS package loaded in the first example: its columns Eth, Sex, Age and Lrn are factors, and Days is the number of days absent from school.
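Polychoric and polyserial correlations treat an ordinal column as ordered categories, so the category order matters. A reasonable assumption (worth checking in the polycor documentation) is that hetcor() takes that order from the factor levels. Since factor() sorts levels alphabetically by default, an ordinal column is safer encoded as an ordered factor with explicit levels. The small ratings data frame below is made up for illustration.

```r
# Made-up data for illustration: one numeric column, one ordinal column
ratings <- data.frame(
  score = c(3.1, 4.5, 2.2, 5.0, 3.8, 1.9),
  grade = c("low", "high", "low", "high", "medium", "low")
)

# Default factor(): levels are sorted alphabetically, not by rank
default_levels <- levels(factor(ratings$grade))
print(default_levels)

# Explicit levels and ordered = TRUE encode the intended ranking low < medium < high
ratings$grade <- factor(ratings$grade,
                        levels = c("low", "medium", "high"),
                        ordered = TRUE)
print(levels(ratings$grade))
print(ratings$grade[1] < ratings$grade[2])  # compares "low" with "high"
```

With the default encoding, "medium" would sit after "low" and "high", which would silently distort any correlation that relies on the order of the categories.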

Example:

> library(faraway)
> library(polycor)
> quine[1:4,]
  Eth Sex Age Lrn Days
1   A   M  F0  SL    2
2   A   M  F0  SL   11
3   A   M  F0  SL   14
4   A   M  F0  AL    5
> hetcor(quine)

Two-Step Estimates

Correlations/Type of Correlation:
            Eth        Sex        Age        Lrn       Days
Eth           1 Polychoric Polychoric Polychoric Polyserial
Sex    0.008333          1 Polychoric Polychoric Polyserial
Age    -0.02581   -0.08348          1 Polychoric Polyserial
Lrn     0.03389    -0.2393    -0.3187          1 Polyserial
Days    -0.3504     0.1048     0.1773    0.05657          1

Standard Errors:
         Eth    Sex     Age    Lrn
Eth
Sex   0.1305
Age   0.1109 0.1103
Lrn   0.1307 0.1259  0.1026
Days 0.09246 0.1025 0.08507 0.1039

n = 146

P-values for Tests of Bivariate Normality:
          Eth      Sex       Age      Lrn
Eth
Sex      <NA>
Age    0.8355  0.01814
Lrn      <NA>     <NA> 7.895e-11
Days 0.008092 0.009787  0.007213 0.005257
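The upper triangle of the "Correlations/Type of Correlation" table follows directly from the rule described above: numeric with numeric gives Pearson, numeric with ordinal gives polyserial, ordinal with ordinal gives polychoric. The sketch below reproduces those labels from the column classes alone. cor_type() is a hypothetical helper that computes no correlations, and it assumes every non-numeric column is a factor to be treated as ordinal.

```r
library(MASS)  # provides the quine data frame

# Hypothetical helper: label each pair of columns with the correlation type
# the rule above would use. The diagonal is shown as "1", as in hetcor's print-out.
cor_type <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  p <- ncol(df)
  types <- matrix("", p, p, dimnames = list(names(df), names(df)))
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      types[i, j] <- if (i == j) {
        "1"
      } else if (is_num[i] && is_num[j]) {
        "Pearson"
      } else if (is_num[i] || is_num[j]) {
        "Polyserial"
      } else {
        "Polychoric"
      }
    }
  }
  types
}

quine_types <- cor_type(quine)
print(quine_types)
```

For quine, the four factor columns pair with each other as polychoric and with Days as polyserial, matching the table above; an all-numeric data frame such as trees would get Pearson everywhere, which is what cor() computes.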

Published by

alitheia15

Data Mining-Analytics Software Consultant
