library(ggplot2)
The following is a simulated data set of the metabolic rates of crickets under two temperature treatments (High, Low). For simplicity, there are only two generic temperature treatments. Metabolic rate is measured in mL O2 per gram per hour. These simulated data will be analyzed with an ANOVA.
Please refer to Homework 2 on the main portfolio.
Metabolic rate is expected to increase at higher temperatures. There will likely be a corresponding increase in standard deviation as temperatures increase.
Mean Low: 1.5 SD: 1
Mean High: 2.25 SD: 2 # Higher standard deviation with temperature
N: 300
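# Simulate metabolic rates for the two treatments (nGroup is not used in the body).
myFunction <- function(nGroup = 2, nName1 = c("Low", "High"),
                       nSize1 = c(50,50), nMean1 = c(1.5,2)){
  # One ID per individual; a fixed 1:100 would recycle rows if nSize1 changes
  ID <- seq_len(sum(nSize1))
  resVar <- c(rnorm(n=nSize1[1], mean=nMean1[1], sd=.5),
              rnorm(n=nSize1[2], mean=nMean1[2], sd=.75))
  TGroup <- rep(nName1, nSize1)
  ANOdata <- data.frame(ID, TGroup, resVar)
  return(ANOdata)
}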
Our desired data frame is created by assigning the output of myFunction() to myDF in the global environment
myDF <- myFunction()
# Fit a one-way ANOVA of metabolic rate on treatment, using the data frame passed in.
AnoFunction <- function(data=myDF){
  ANOmodel <- aov(resVar ~ TGroup, data = data)
  ANOSummary <- summary(ANOmodel)
  return(ANOSummary)
}
AnoFunction()
##             Df Sum Sq Mean Sq F value  Pr(>F)
## TGroup       1   4.63   4.634    9.37 0.00285 **
## Residuals   98  48.46   0.495
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
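Because the two treatments are simulated with different standard deviations (0.5 and 0.75), the equal-variance assumption behind a classical ANOVA does not strictly hold. As a hedged aside, the sketch below runs Welch's version of the one-way test with oneway.test() from base R on a freshly simulated data set. The names welchDF and welchResult, and the seed, are illustrative assumptions and not part of the original analysis.

```r
# Sketch: Welch's one-way test, which does not assume equal variances.
# welchDF, welchResult and the seed are illustrative assumptions.
set.seed(42)
welchDF <- data.frame(
  TGroup = rep(c("Low", "High"), c(50, 50)),
  resVar = c(rnorm(50, mean = 1.5, sd = 0.5),
             rnorm(50, mean = 2, sd = 0.75))
)
# var.equal = FALSE requests the Welch correction
welchResult <- oneway.test(resVar ~ TGroup, data = welchDF, var.equal = FALSE)
welchResult$p.value
```

If the Welch p-value and the aov() p-value agree closely, the unequal standard deviations are probably not driving the conclusion.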
The function generates the following data:
myFunction()
Using ggplot2
# Refer to columns by bare name inside aes(); ggplot2 looks them up in myDF.
ANOplot <- ggplot(data = myDF, aes(x = TGroup, y = resVar, fill = TGroup)) +
  geom_boxplot(colour = "Black") +
  scale_y_continuous(name = "Metabolic Rate") +
  scale_x_discrete(name = "Temperature Treatment") +
  ggtitle("Metabolic Rates in High vs Low Temperature") +
  theme_bw()
ANOplot
Repeating the function with the same parameters yields the following p-values:
# 1.) 0.0837 2.) 0.00721 3.) 0.000635 4.) 0.000124 5.) 0.0019 6.) 0.0267 7.) 0.0011
# 8.) 0.000374 9.) 0.00112 10.) 0.0386 11.) 0.00052 12.) 0.00093 13.) 0.049
Nearly all of the 13 recorded runs (12 of 13) are significant at the 0.05 level. Here I will decrease the mean of the “High” metabolic rate from 2 to 1.80. I will also slightly adjust the corresponding standard deviation, from 0.75 to 0.6.
myFunction <- function(nGroup = 2, nName1 = c("Low", "High"),
                       nSize1 = c(50,50), nMean1 = c(1.5,1.80)){
  # One ID per individual; a fixed 1:100 would recycle rows if nSize1 changes
  ID <- seq_len(sum(nSize1))
  resVar <- c(rnorm(n=nSize1[1], mean=nMean1[1], sd=.5),
              rnorm(n=nSize1[2], mean=nMean1[2], sd=.6))
  TGroup <- rep(nName1, nSize1)
  ANOdata <- data.frame(ID, TGroup, resVar)
  return(ANOdata)
}
myDF <- myFunction()
AnoFunction <- function(data=myDF){
  # Use the data frame passed in rather than the global myDF
  ANOmodel <- aov(resVar ~ TGroup, data = data)
  ANOSummary <- summary(ANOmodel)
  return(ANOSummary)
}
AnoFunction()
##             Df Sum Sq Mean Sq F value Pr(>F)
## TGroup       1   0.43  0.4300   1.367  0.245
## Residuals   98  30.83  0.3146
Of the 21 runs recorded below, 12 (57%) are significant at the 0.05 level and 9 (43%) are not. Decreasing the mean by 0.20 and the standard deviation by 0.15 therefore makes for fewer significant results than the original parameters.
# 1.) 0.078 2.) 0.0171 3.) 0.0008 4.) 0.00741 5.) 0.0541 6.) 0.00471 7.) 0.0214 8.) 0.066 9.) 0.0105
# 10.) 0.172 11.) 0.793 12.) 0.53 13.) 0.00106 14.) 0.145 15.) 0.001 16.) 0.138 17.) 0.00374 18.) 0.0019
# 19.) 0.00268 20.) 0.0178 21.) 0.106
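Rather than rerunning the chunk by hand and copying each p-value, the tally can be automated. The following is a minimal sketch, not the code used for the numbers above: simOnce() and propSignificant() are hypothetical helper names, and the standard deviations are passed as an argument (nSD1) instead of being edited inside the function body.

```r
# Sketch: estimate how often the ANOVA is significant for one parameter set.
# simOnce(), propSignificant() and nSD1 are hypothetical names for illustration.
simOnce <- function(nSize1 = c(50, 50), nMean1 = c(1.5, 1.8), nSD1 = c(0.5, 0.6)) {
  simDF <- data.frame(
    TGroup = rep(c("Low", "High"), nSize1),
    resVar = c(rnorm(nSize1[1], nMean1[1], nSD1[1]),
               rnorm(nSize1[2], nMean1[2], nSD1[2]))
  )
  # Extract the p-value for the treatment term from the ANOVA table
  summary(aov(resVar ~ TGroup, data = simDF))[[1]][["Pr(>F)"]][1]
}

propSignificant <- function(nRuns = 1000, alpha = 0.05, ...) {
  pVals <- replicate(nRuns, simOnce(...))
  mean(pVals < alpha)  # proportion of runs with p < alpha
}

set.seed(1)  # arbitrary seed so the sketch is repeatable
propSignificant(nRuns = 1000, nMean1 = c(1.5, 1.8), nSD1 = c(0.5, 0.6))
```

With many more runs than could be recorded by hand, the proportion gives a steadier estimate of how often this parameter set produces a significant result.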
If we decrease the mean metabolic rate for the high temperature by another 0.20, the results become even less significant
myFunction <- function(nGroup = 2, nName1 = c("Low", "High"),
                       nSize1 = c(50,50), nMean1 = c(1.5,1.60)){
  # One ID per individual; a fixed 1:100 would recycle rows if nSize1 changes
  ID <- seq_len(sum(nSize1))
  resVar <- c(rnorm(n=nSize1[1], mean=nMean1[1], sd=.5),
              rnorm(n=nSize1[2], mean=nMean1[2], sd=.6))
  TGroup <- rep(nName1, nSize1)
  ANOdata <- data.frame(ID, TGroup, resVar)
  return(ANOdata)
}
myDF <- myFunction()
AnoFunction <- function(data=myDF){
  # Use the data frame passed in rather than the global myDF
  ANOmodel <- aov(resVar ~ TGroup, data = data)
  ANOSummary <- summary(ANOmodel)
  return(ANOSummary)
}
AnoFunction()
##             Df Sum Sq Mean Sq F value Pr(>F)
## TGroup       1   0.03  0.0334   0.098  0.755
## Residuals   98  33.54  0.3422
After running the function with means = c(1.5, 1.6) and the same standard deviations, 79% of the results (15 of 19) are NOT significant.
# 1.) 0.273 2.) 0.834 3.) 0.729 4.) 0.00165 5.) 0.0972 6.) 0.401 7.) 0.504 8.) 0.0221 9.) 0.823 10.) 0.37
# 11.) 0.385 12.) 0.661 13.) 0.00144 14.) 0.709 15.) 0.218 16.) 0.296 17.) 0.022 18.) 0.878 19.) 0.409
Reverting back to the original means and standard deviations, I decreased the sample size from 50 to 10 per group
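myFunction <- function(nGroup = 2, nName1 = c("Low", "High"),
                       nSize1 = c(10,10), nMean1 = c(1.5,2)){
  # One ID per individual, so the data frame has sum(nSize1) rows
  ID <- seq_len(sum(nSize1))
  resVar <- c(rnorm(n=nSize1[1], mean=nMean1[1], sd=.5),
              rnorm(n=nSize1[2], mean=nMean1[2], sd=.75))
  TGroup <- rep(nName1, nSize1)
  ANOdata <- data.frame(ID, TGroup, resVar)
  return(ANOdata)
}
myDF <- myFunction()
AnoFunction <- function(data=myDF){
  # Use the data frame passed in rather than the global myDF
  ANOmodel <- aov(resVar ~ TGroup, data = data)
  ANOSummary <- summary(ANOmodel)
  return(ANOSummary)
}
AnoFunction()
##             Df Sum Sq Mean Sq F value Pr(>F)
## TGroup       1   0.09  0.0853   0.223  0.638
## Residuals   98  37.41  0.3818
Decreasing the sample size to 10 yields data that is significant only 47% of the time (8 of 17 runs). Note that the table above shows 98 residual degrees of freedom rather than the 18 expected with 10 crickets per group: it comes from a run in which ID was fixed at 1:100, so data.frame() recycled the 20 simulated values to fill 100 rows. The p-values listed below were presumably recorded under the same conditions and are likely more optimistic than a true n = 10 design would give.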
# 1.) 0.00337 2.) 0.0012 3.) 0.0452 4.) 0.388 5.) 0.00708 6.) 0.991 7.) 0.113 8.) 0.201 9.) 0.377
# 10.) 0.000162 11.) 0.00292 12.) 0.073 13.) 0.413 14.) 0.021 15.) 0.146 16.) 0.0087 17.) 0.0512
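myFunction <- function(nGroup = 2, nName1 = c("Low", "High"),
                       nSize1 = c(5,5), nMean1 = c(1.5,2)){
  # One ID per individual, so the data frame has sum(nSize1) rows
  ID <- seq_len(sum(nSize1))
  resVar <- c(rnorm(n=nSize1[1], mean=nMean1[1], sd=.5),
              rnorm(n=nSize1[2], mean=nMean1[2], sd=.75))
  TGroup <- rep(nName1, nSize1)
  ANOdata <- data.frame(ID, TGroup, resVar)
  return(ANOdata)
}
myDF <- myFunction()
AnoFunction <- function(data=myDF){
  # Use the data frame passed in rather than the global myDF
  ANOmodel <- aov(resVar ~ TGroup, data = data)
  ANOSummary <- summary(ANOmodel)
  return(ANOSummary)
}
AnoFunction()
##             Df Sum Sq Mean Sq F value Pr(>F)
## TGroup       1  54.16   54.16   447.3 <2e-16 ***
## Residuals   98  11.87    0.12
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Decreasing the sample size again by half, to 5 per group, yields data that is significant only 40% of the time (4 of 10 runs). The table above again shows 98 residual degrees of freedom instead of the 8 expected with 5 crickets per group: with ID fixed at 1:100, data.frame() recycled the 10 simulated values ten times, which explains the very large F value in this run.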
# 1.) 0.698 2.) 0.00253 3.) 0.011 4.) 0.00491 5.) 0.0565 6.) 0.106 7.) 0.748 8.)0.365 9.)0.034 10.) 0.0709
# The parameters originally set were fairly generous. It would be fairly difficult to obtain metabolic rates for 50 individuals per treatment. Additionally, the initial standard deviation may be relatively high. When the mean for "High" is decreased toward the mean for "Low", fewer runs are significant. Conversely, decreasing the standard deviation for "High" would be expected to make a given difference easier to detect.
# Interestingly, the sample size can be reduced to 5 individuals per group and still yield significant data 40% of the time. This may be indicative of a flaw in the simulation: the ANOVA tables for the reduced sample sizes still report 98 residual degrees of freedom, which suggests the data were recycled to 100 rows rather than truly reduced.
# This model follows the expected pattern that when the means approach the same value, the differences are less often significant. Decreasing the sample size decreases the power we have to detect the difference between the "High" and "Low" temperature treatments.
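For two groups, a one-way ANOVA is equivalent to a pooled two-sample t test, so the simulated proportions can be compared with an analytical power calculation. The sketch below uses power.t.test() from base R. Because that function assumes a common standard deviation, it substitutes the pooled value of 0.5 and 0.75, which is an approximation introduced here rather than part of the original analysis. The names pooledSD, sampleSizes and powerByN are illustrative.

```r
# Sketch: analytical power for the original means (1.5 vs 2) at several sample sizes.
# Assumption: a common SD equal to the pooled value of 0.5 and 0.75.
pooledSD <- sqrt((0.5^2 + 0.75^2) / 2)
sampleSizes <- c(5, 10, 50)  # individuals per treatment group
powerByN <- sapply(sampleSizes, function(n) {
  power.t.test(n = n, delta = 2 - 1.5, sd = pooledSD, sig.level = 0.05)$power
})
names(powerByN) <- sampleSizes
round(powerByN, 2)
```

Comparing these values with the proportion of significant runs at each sample size is one way to check whether the simulation behaves as intended.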