Hypothesis testing and confidence interval in R

June 26, 2015

As usual, the notes below are for future reference, should I ever need them again.

Also note that most of the exercises this time around give only summary statistics (sample size, mean and standard deviation). To use t.test you need to generate a data set that matches those statistics exactly - else you won’t get the same answer as the one provided in the exercises.

You’d think R would already have a function for that, but sadly it doesn’t.

Because rnorm draws a genuinely random sample, its sample mean and standard deviation only approximate the values you pass in, and with the small sample sizes in most of the questions the gap can be noticeable. You’d instead get something like this:

a <- rnorm(10, 5, 1)
mean(a)

## [1] 5.081355

sd(a)

## [1] 0.7624848

Luckily, I found this nice function on Stack Overflow:

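# Standardize n normal draws with scale(), then rescale so the sample mean and sd are exactly `mean` and `sd`
rnorm2 <- function(n, mean, sd) { mean + sd * as.vector(scale(rnorm(n))) }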

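It’s very similar to rnorm, except that scale() first standardizes the draws to a sample mean of 0 and a sample standard deviation of 1, so after rescaling the data set has exactly the mean and standard deviation you asked for. Let’s take it for a test run.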

b <- rnorm2(10, 5, 1)
mean(b)

## [1] 5

sd(b)

## [1] 1
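To see why this works, here is a small sketch that spells out by hand what scale() does with its default arguments: subtract the sample mean, then divide by the sample standard deviation. The seed and the variable names are arbitrary choices for illustration only.

```r
set.seed(42)                      # arbitrary seed, only so the run is reproducible
z <- rnorm(10)                    # ordinary random draws
z_std <- (z - mean(z)) / sd(z)    # standardize by hand: sample mean 0, sample sd 1
# scale() with its defaults (center = TRUE, scale = TRUE) gives the same values
stopifnot(isTRUE(all.equal(as.vector(scale(z)), z_std)))
# Shift and stretch to the target mean 5 and sd 1, as rnorm2(10, 5, 1) does
b2 <- 5 + 1 * z_std
c(mean(b2), sd(b2))               # 5 and 1, up to floating-point rounding
```

The rescaled values are no longer an independent random sample, but for these exercises all that matters is a data set with the right summary statistics.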

Now I can finally get down to business.

In a population of interest, a sample of 9 men yielded a sample average brain volume of 1,100 cc and a standard deviation of 30 cc. What is a 95% Student’s t confidence interval for the mean brain volume in this new population?

Answer:

t.test(rnorm2(9,1100,30))

## 
##  One Sample t-test
## 
## data:  rnorm2(9, 1100, 30)
## t = 110, df = 8, p-value = 5.212e-14
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  1076.94 1123.06
## sample estimates:
## mean of x 
##      1100
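As a sanity check, the same interval can be computed directly from the summary statistics with the usual formula, mean ± t(0.975, n - 1) × s / sqrt(n), without generating any data. A minimal sketch; the variable names are my own:

```r
n    <- 9      # sample size
xbar <- 1100   # sample mean (cc)
s    <- 30     # sample standard deviation (cc)
# 95% t interval: xbar +/- t quantile with n - 1 degrees of freedom times the standard error
ci <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * s / sqrt(n)
round(ci, 2)   # [1] 1076.94 1123.06, same as the t.test output above
```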

In a study of emergency room waiting times, investigators consider a new triage system and the standard one. To test the systems, administrators selected 20 nights and randomly assigned the new triage system to be used on 10 nights and the standard system on the remaining 10 nights. They calculated the nightly median waiting time (MWT) to see a physician. The average MWT for the new system was 3 hours with a variance of 0.60, while the average MWT for the old system was 5 hours with a variance of 0.68. Consider the 95% confidence interval estimate for the difference in mean MWT associated with the new system. Assume a constant variance. What is the interval? Subtract in this order (New System - Old System).

Answer:

t.test(rnorm2(10,3,sqrt(0.6)),rnorm2(10,5,sqrt(0.68)), var.equal=TRUE)

## 
##  Two Sample t-test
## 
## data:  rnorm2(10, 3, sqrt(0.6)) and rnorm2(10, 5, sqrt(0.68))
## t = -5.5902, df = 18, p-value = 2.637e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.751649 -1.248351
## sample estimates:
## mean of x mean of y 
##         3         5
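With var.equal=TRUE, t.test pools the two variances. A minimal sketch of the same interval computed straight from the summary statistics (variable names are my own); it should reproduce the interval printed above:

```r
n1 <- 10; n2 <- 10          # nights per system
m_new <- 3; m_old <- 5      # average MWT (hours)
v_new <- 0.60; v_old <- 0.68  # variances
# Pooled standard deviation, weighting each variance by its degrees of freedom
sp <- sqrt(((n1 - 1) * v_new + (n2 - 1) * v_old) / (n1 + n2 - 2))
se <- sp * sqrt(1 / n1 + 1 / n2)
# 95% interval for (new - old) using the t quantile with n1 + n2 - 2 df
ci <- (m_new - m_old) + c(-1, 1) * qt(0.975, df = n1 + n2 - 2) * se
ci
```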

To further test the hospital triage system, administrators selected 200 nights and randomly assigned a new triage system to be used on 100 nights and a standard system on the remaining 100 nights. They calculated the nightly median waiting time (MWT) to see a physician. The average MWT for the new system was 4 hours with a standard deviation of 0.5 hours, while the average MWT for the old system was 6 hours with a standard deviation of 2 hours. Consider the hypothesis of a decrease in the mean MWT associated with the new treatment. What does the 95% independent group confidence interval with unequal variances suggest vis-à-vis this hypothesis? (Because there are so many observations per group, just use the Z quantile instead of the T.)

t.test(rnorm(100,6,2),rnorm(100,5,0.5), var.equal=FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  rnorm(100, 6, 2) and rnorm(100, 5, 0.5)
## t = 6.4872, df = 119.02, p-value = 2.091e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.7753144 1.4565475
## sample estimates:
## mean of x mean of y 
##  6.124831  5.008900

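OK… so I didn’t really use the Z quantile, and I used rnorm (with a mean of 5 for the new system) instead of rnorm2 with the stated mean of 4. I still managed to get the right answer, though. So… meh~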

In any case, subtracting in the order (old - new), the whole interval lies above zero, which suggests the new system is effective at reducing the mean MWT.
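For comparison, the Z interval the question actually asks for can be computed from the summary statistics alone, using the stated means of 6 and 4 hours. A minimal sketch; the variable names are my own, and the subtraction order is (old - new) to match the t.test call above:

```r
n <- 100                    # nights per system
m_old <- 6; s_old <- 2      # standard system: mean and sd (hours)
m_new <- 4; s_new <- 0.5    # new system: mean and sd (hours)
# Unequal-variance standard error of the difference in means
se <- sqrt(s_old^2 / n + s_new^2 / n)
# 95% interval for (old - new) using the normal quantile instead of t
ci <- (m_old - m_new) + c(-1, 1) * qnorm(0.975) * se
round(ci, 3)
```

Both ends of this interval are positive, which leads to the same conclusion: the new system appears to lower the mean MWT.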

Suppose that 18 obese subjects were randomized, 9 each, to a new diet pill and a placebo. Subjects’ body mass indices (BMIs) were measured at a baseline and again after having received the treatment or placebo for four weeks. The average difference from follow-up to the baseline (followup - baseline) was -3 kg/m² for the treated group and 1 kg/m² for the placebo group. The corresponding standard deviations of the differences were 1.5 kg/m² for the treatment group and 1.8 kg/m² for the placebo group. Does the change in BMI over the four-week period appear to differ between the treated and placebo groups? Assuming normality of the underlying data and a common population variance, calculate the relevant 90% t confidence interval. Subtract in the order of (Treated - Placebo) with the smaller (more negative) number first.

t.test(rnorm2(9,-3,1.5),rnorm2(9,1,1.8), var.equal=TRUE, conf.level=.90)

## 
##  Two Sample t-test
## 
## data:  rnorm2(9, -3, 1.5) and rnorm2(9, 1, 1.8)
## t = -5.1215, df = 16, p-value = 0.0001025
## alternative hypothesis: true difference in means is not equal to 0
## 90 percent confidence interval:
##  -5.363579 -2.636421
## sample estimates:
## mean of x mean of y 
##        -3         1
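The same pooled-variance recipe as in the triage question works here, only with a 90% level, so the quantile becomes qt(0.95, ...). A minimal sketch from the summary statistics (variable names are my own); it should reproduce the interval printed above:

```r
n1 <- 9; n2 <- 9             # subjects per group
m_trt <- -3; m_pbo <- 1      # mean BMI change (kg/m^2)
s_trt <- 1.5; s_pbo <- 1.8   # sd of the changes (kg/m^2)
# Pooled standard deviation under a common population variance
sp <- sqrt(((n1 - 1) * s_trt^2 + (n2 - 1) * s_pbo^2) / (n1 + n2 - 2))
se <- sp * sqrt(1 / n1 + 1 / n2)
# 90% two-sided interval for (treated - placebo): 5% in each tail
ci <- (m_trt - m_pbo) + c(-1, 1) * qt(0.95, df = n1 + n2 - 2) * se
ci
```

Since the whole interval is below zero, the BMI change does appear to differ between the two groups.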


Hafidz Zulkifli
