The R language was created by Ross Ihaka and Robert Gentleman in Auckland, New Zealand - and since recently moving to Auckland - I thought I would give it its dues.
From Wikipedia:
R is an open source programming language and software environment for statistical computing and graphics that is supported
by the R Foundation for Statistical Computing.
'R' can be executed via a script (Rscript) or via an IDE such as RStudio.
Data files can be found from the general site:
http://www.kgs.ku.edu/Mathgeo/Books/Stat
For the exercise below, the brine.txt file was used.
The R language is used to perform statistical problem solving - and thus part of the machine learning envelope of software.
In the book - "Practical Machine Learning Cookbook", the R script can be found to use the brine.txt data file (found on the data site mentioned above)"
Using a text editor, type the following file into the editor and save as 'demo1.R'
You can then execute the file from the command line using 'Rscript demo1.R' - if you have R loaded on your MacOS or Linux workstation. This program can also be executed via 'RStudio' if you are using Windows/MacOS.
From Wikipedia:
R is an open source programming language and software environment for statistical computing and graphics that is supported
by the R Foundation for Statistical Computing.
'R' can be executed via a script (Rscript) or via an IDE such as RStudio.
Data files can be found from the general site:
http://www.kgs.ku.edu/Mathgeo/Books/Stat
For the exercise below, the brine.txt file was used.
The R language is used to perform statistical problem solving - and thus part of the machine learning envelope of software.
In the book - "Practical Machine Learning Cookbook", the R script can be found to use the brine.txt data file (found on the data site mentioned above)"
Using a text editor, type the following file into the editor and save as 'demo1.R'
You can then execute the file from the command line using 'Rscript demo1.R' - if you have R loaded on your MacOS or Linux workstation. This program can also be executed via 'RStudio' if you are using Windows/MacOS.
library(MASS)
# Load the brine.txt dataset
brine <- read.table("Data/brine.txt", header=TRUE, sep=",", row.names=1)
# Show the first few lines of the data file
head(brine)
pairs(brine[ ,1:6])
brine.log <- brine
brine.log[ ,1:6] <- log(brine[ ,1:6]+1)
pairs(brine.log[ ,1:6])
brine.log.lda <- lda(GROUP ~ HCO3 + SO4 + Cl + Ca + Mg + Na, data=brine.log)
brine.log.lda
brine.log.hat <- predict(brine.log.lda)
brine.log.hat
apply(brine.log.hat$posterior, MARGIN=1, FUN=max)
plot(brine.log.lda)
plot(brine.log.lda, dimen=1, type="both")
tab <- table(brine.log$GROUP, brine.log.hat$class)
tab
sum(tab[row(tab) == col(tab)]) / sum(tab)
brine.log.lda <- lda(GROUP ~ HCO3 + SO4 + Cl + Ca + Mg + Na, data=brine.log, CV=TRUE)
tab <- table(brine.log$GROUP, brine.log.lda$class)
tab
sum(tab[row(tab) == col(tab)]) / sum(tab)
# Load the brine.txt dataset
brine <- read.table("Data/brine.txt", header=TRUE, sep=",", row.names=1)
# Show the first few lines of the data file
head(brine)
pairs(brine[ ,1:6])
brine.log <- brine
brine.log[ ,1:6] <- log(brine[ ,1:6]+1)
pairs(brine.log[ ,1:6])
brine.log.lda <- lda(GROUP ~ HCO3 + SO4 + Cl + Ca + Mg + Na, data=brine.log)
brine.log.lda
brine.log.hat <- predict(brine.log.lda)
brine.log.hat
apply(brine.log.hat$posterior, MARGIN=1, FUN=max)
plot(brine.log.lda)
plot(brine.log.lda, dimen=1, type="both")
tab <- table(brine.log$GROUP, brine.log.hat$class)
tab
sum(tab[row(tab) == col(tab)]) / sum(tab)
brine.log.lda <- lda(GROUP ~ HCO3 + SO4 + Cl + Ca + Mg + Na, data=brine.log, CV=TRUE)
tab <- table(brine.log$GROUP, brine.log.lda$class)
tab
sum(tab[row(tab) == col(tab)]) / sum(tab)
rscript demo1.R
HCO3 SO4 Cl Ca Mg Na GROUP
1 10.4 30.0 967.1 95.9 53.7 857.7 1
2 6.2 29.6 1174.9 111.7 43.9 1054.7 1
3 2.1 11.4 2387.1 348.3 119.3 1932.4 1
4 8.5 22.5 2186.1 339.6 73.6 1803.4 1
5 6.7 32.8 2015.5 287.6 75.1 1691.8 1
6 3.8 18.9 2175.8 340.4 63.8 1793.9 1
Call:
lda(GROUP ~ HCO3 + SO4 + Cl + Ca + Mg + Na, data = brine.log)
Prior probabilities of groups:
1 2 3
0.3684211 0.3157895 0.3157895
Group means:
HCO3 SO4 Cl Ca Mg Na
1 1.759502 3.129009 7.496891 5.500942 4.283490 7.320686
2 2.736481 3.815399 6.829565 4.302573 4.007725 6.765017
3 1.374438 2.378965 6.510211 4.641049 3.923851 6.289692
Coefficients of linear discriminants:
LD1 LD2
HCO3 -1.67799521 0.64415802
SO4 0.07983656 0.02903096
Cl 22.27520614 -0.31427770
Ca -1.26859368 2.54458682
Mg -1.88732009 -2.89413332
Na -20.86566883 1.29368129
Proportion of trace:
LD1 LD2
0.7435 0.2565
$class
[1] 2 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3
Levels: 1 2 3
$posterior
1 2 3
1 2.312733e-01 7.627845e-01 5.942270e-03
2 9.488842e-01 3.257237e-02 1.854347e-02
3 8.453057e-01 9.482540e-04 1.537461e-01
4 9.990242e-01 8.794725e-04 9.632578e-05
5 9.965920e-01 2.849903e-03 5.581176e-04
6 9.984987e-01 1.845534e-05 1.482872e-03
7 8.676660e-01 7.666611e-06 1.323263e-01
8 4.938019e-03 9.949035e-01 1.584755e-04
9 4.356152e-03 9.956351e-01 8.770078e-06
10 2.545287e-05 9.999439e-01 3.066264e-05
11 2.081510e-02 9.791728e-01 1.210748e-05
12 1.097540e-03 9.989023e-01 1.455693e-07
13 1.440307e-02 9.854613e-01 1.356671e-04
14 4.359641e-01 2.367602e-03 5.616683e-01
15 6.169265e-02 1.540353e-04 9.381533e-01
16 7.500357e-04 4.706701e-09 9.992500e-01
17 1.430433e-03 1.095281e-06 9.985685e-01
18 2.549733e-04 3.225658e-07 9.997447e-01
19 6.433759e-02 8.576694e-03 9.270857e-01
$x
LD1 LD2
1 -1.1576284 -0.1998499
2 -0.1846803 0.6655823
3 1.0179998 0.6827867
4 -0.3939366 2.6798084
5 -0.3167164 2.0188002
6 1.0061340 2.6434491
7 2.0725443 1.5714400
8 -2.0387449 -0.9731745
9 -2.6054261 -0.2774844
10 -2.5191350 -2.8304663
11 -2.4915044 0.3194247
12 -3.4448401 0.1869864
13 -2.0343204 -0.4674925
14 1.0441237 -0.0991014
15 1.6987023 -0.6036252
16 3.9138884 -0.7211078
17 2.7083649 -1.3896956
18 2.9310268 -1.9243611
19 0.7941483 -1.2819190
1 2 3 4 5 6 7 8
0.7627845 0.9488842 0.8453057 0.9990242 0.9965920 0.9984987 0.8676660 0.9949035
9 10 11 12 13 14 15 16
0.9956351 0.9999439 0.9791728 0.9989023 0.9854613 0.5616683 0.9381533 0.9992500
17 18 19
0.9985685 0.9997447 0.9270857
1 2 3
1 6 1 0
2 0 6 0
3 0 0 6
[1] 0.9473684
1 2 3
1 6 1 0
2 1 4 1
3 1 0 5
[1] 0.7894737
HCO3 SO4 Cl Ca Mg Na GROUP
1 10.4 30.0 967.1 95.9 53.7 857.7 1
2 6.2 29.6 1174.9 111.7 43.9 1054.7 1
3 2.1 11.4 2387.1 348.3 119.3 1932.4 1
4 8.5 22.5 2186.1 339.6 73.6 1803.4 1
5 6.7 32.8 2015.5 287.6 75.1 1691.8 1
6 3.8 18.9 2175.8 340.4 63.8 1793.9 1
Call:
lda(GROUP ~ HCO3 + SO4 + Cl + Ca + Mg + Na, data = brine.log)
Prior probabilities of groups:
1 2 3
0.3684211 0.3157895 0.3157895
Group means:
HCO3 SO4 Cl Ca Mg Na
1 1.759502 3.129009 7.496891 5.500942 4.283490 7.320686
2 2.736481 3.815399 6.829565 4.302573 4.007725 6.765017
3 1.374438 2.378965 6.510211 4.641049 3.923851 6.289692
Coefficients of linear discriminants:
LD1 LD2
HCO3 -1.67799521 0.64415802
SO4 0.07983656 0.02903096
Cl 22.27520614 -0.31427770
Ca -1.26859368 2.54458682
Mg -1.88732009 -2.89413332
Na -20.86566883 1.29368129
Proportion of trace:
LD1 LD2
0.7435 0.2565
$class
[1] 2 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3
Levels: 1 2 3
$posterior
1 2 3
1 2.312733e-01 7.627845e-01 5.942270e-03
2 9.488842e-01 3.257237e-02 1.854347e-02
3 8.453057e-01 9.482540e-04 1.537461e-01
4 9.990242e-01 8.794725e-04 9.632578e-05
5 9.965920e-01 2.849903e-03 5.581176e-04
6 9.984987e-01 1.845534e-05 1.482872e-03
7 8.676660e-01 7.666611e-06 1.323263e-01
8 4.938019e-03 9.949035e-01 1.584755e-04
9 4.356152e-03 9.956351e-01 8.770078e-06
10 2.545287e-05 9.999439e-01 3.066264e-05
11 2.081510e-02 9.791728e-01 1.210748e-05
12 1.097540e-03 9.989023e-01 1.455693e-07
13 1.440307e-02 9.854613e-01 1.356671e-04
14 4.359641e-01 2.367602e-03 5.616683e-01
15 6.169265e-02 1.540353e-04 9.381533e-01
16 7.500357e-04 4.706701e-09 9.992500e-01
17 1.430433e-03 1.095281e-06 9.985685e-01
18 2.549733e-04 3.225658e-07 9.997447e-01
19 6.433759e-02 8.576694e-03 9.270857e-01
$x
LD1 LD2
1 -1.1576284 -0.1998499
2 -0.1846803 0.6655823
3 1.0179998 0.6827867
4 -0.3939366 2.6798084
5 -0.3167164 2.0188002
6 1.0061340 2.6434491
7 2.0725443 1.5714400
8 -2.0387449 -0.9731745
9 -2.6054261 -0.2774844
10 -2.5191350 -2.8304663
11 -2.4915044 0.3194247
12 -3.4448401 0.1869864
13 -2.0343204 -0.4674925
14 1.0441237 -0.0991014
15 1.6987023 -0.6036252
16 3.9138884 -0.7211078
17 2.7083649 -1.3896956
18 2.9310268 -1.9243611
19 0.7941483 -1.2819190
1 2 3 4 5 6 7 8
0.7627845 0.9488842 0.8453057 0.9990242 0.9965920 0.9984987 0.8676660 0.9949035
9 10 11 12 13 14 15 16
0.9956351 0.9999439 0.9791728 0.9989023 0.9854613 0.5616683 0.9381533 0.9992500
17 18 19
0.9985685 0.9997447 0.9270857
1 2 3
1 6 1 0
2 0 6 0
3 0 0 6
[1] 0.9473684
1 2 3
1 6 1 0
2 1 4 1
3 1 0 5
[1] 0.7894737