STATS: HYPOTHESIS TESTING

Question: A manufacturing company produces steel housings for electrical equipment. The main component of the housing is a steel trough made from a 14-gauge steel coil. It is produced using a 250-ton progressive punch press with a wipe-down operation that puts two 90-degree forms in the flat steel to make the trough. The distance from one side of the form to the other is critical because of weatherproofing in outdoor applications. The company requires that the width of the trough be between 8.31 inches and 8.61 inches. The file “Trough” contains the widths of the troughs, in inches, for a sample of n = 49.

  1. Compute the mean, median, first quartile, and third quartile.
  2. Compute the range, interquartile range, variance, standard deviation, and coefficient of variation.
  3. Interpret the measures of central tendency and variation within the context of this problem. Why should the company be concerned about the central tendency and variation?
  4. At the 0.10, 0.05, and 0.01 levels of significance, is there evidence that the mean width of the troughs is different from 8.46 inches?
  5. What assumption about the population distribution is needed in order to conduct the hypothesis test in part 4?
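
The calculations in parts 1, 2, and 4 can be sketched in Python using only the standard library. This is a minimal sketch, not a solution: the `widths` list is a hypothetical seven-value sample (the "Trough" file is not reproduced here), and the names `summarize` and `t_statistic` are illustrative. Quartile conventions differ between textbooks and software; `statistics.quantiles` uses its default "exclusive" method here.

```python
import math
import statistics

# Hypothetical widths in inches; NOT the contents of the "Trough" file.
widths = [8.35, 8.40, 8.42, 8.44, 8.46, 8.48, 8.53]

def summarize(data):
    """Descriptive statistics requested in parts 1 and 2."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return {
        "mean": mean,
        "median": statistics.median(data),
        "q1": q1,
        "q3": q3,
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "variance": statistics.variance(data),
        "sd": sd,
        "cv_percent": 100 * sd / mean,
    }

def t_statistic(data, mu0):
    """One-sample t statistic for H0: mu = mu0."""
    n = len(data)
    return (statistics.mean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))

summary = summarize(widths)
t_stat = t_statistic(widths, 8.46)

# Two-sided critical values for df = 6 (the hypothetical n = 7), from a
# standard t table. For the real sample, n = 49 gives df = 48, where the
# values are approximately 1.677, 2.011 and 2.682.
crit = {0.10: 1.943, 0.05: 2.447, 0.01: 3.707}
reject = {alpha: abs(t_stat) > c for alpha, c in crit.items()}

print(summary)
print(round(t_stat, 3), reject)
```

With the real data, H0 is rejected at a given level only when |t| exceeds the df = 48 critical value for that level.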

1   Newcomb-Michelson Velocity of Light Experiments

Simon Newcomb of the U.S. Nautical Almanac Office (NAO) published the velocity of light [Newcomb, 1883] [4] based on a series of experiments he conducted with Albert Michelson until 1882. The dataset ‘NewcombLight.txt’ contains 66 samples (time in seconds taken for light to travel 7442 meters at sea level) that Newcomb collected in 1882. Conduct the t-test and the bootstrap-based one-sample test, and estimate the population mean velocity of light (in m/s) at a confidence level of your choice. Do the interval estimates include the widely accepted speed of light given HERE [5]? Do the estimates from the t-test and the bootstrap show any systematic difference? If so, provide possible reasons based on the sampling distributions used by the two approaches.
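
The two interval estimates can be sketched side by side as follows. This is a sketch under stated assumptions: `times_s` holds ten hypothetical travel times (not the values in ‘NewcombLight.txt’), each time is converted to a velocity before averaging (one possible choice), the bootstrap interval is the simple percentile interval, and the t critical value is passed in from a table (2.262 for df = 9 here; roughly 1.997 for the real df = 65).

```python
import math
import random
import statistics

DISTANCE_M = 7442.0  # path length stated in the problem

# Hypothetical travel times in seconds; NOT the values in NewcombLight.txt.
times_s = [2.4826e-5, 2.4828e-5, 2.4824e-5, 2.4830e-5, 2.4827e-5,
           2.4825e-5, 2.4829e-5, 2.4826e-5, 2.4831e-5, 2.4823e-5]
velocities = [DISTANCE_M / t for t in times_s]  # m/s

def t_interval(sample, t_crit):
    """Confidence interval for the mean based on Student's t."""
    mean = statistics.mean(sample)
    half = t_crit * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - half, mean + half

def bootstrap_interval(sample, conf=0.95, n_boot=10_000, seed=0):
    """Percentile bootstrap interval for the mean."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(sample, k=n))
                   for _ in range(n_boot))
    lo = means[int((1 - conf) / 2 * n_boot)]
    hi = means[int((1 + conf) / 2 * n_boot) - 1]
    return lo, hi

t_lo, t_hi = t_interval(velocities, t_crit=2.262)  # df = 9, 95% two-sided
b_lo, b_hi = bootstrap_interval(velocities)

print(f"t interval:         ({t_lo:.0f}, {t_hi:.0f}) m/s")
print(f"bootstrap interval: ({b_lo:.0f}, {b_hi:.0f}) m/s")
```

The t interval is symmetric about the sample mean by construction, while the percentile bootstrap interval follows the shape of the resampled means, so outliers or skew in the real data can make the two disagree.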

2    Space Shuttle O-Ring Failures

On 27 January 1986, the night before the space shuttle Challenger exploded, engineers at the company that built the shuttle warned NASA scientists that the shuttle should not be launched because of predicted cold weather. Fuel seal problems, which had been encountered in earlier flights, were suspected of being associated with low temperatures. It was argued, however, that the evidence was inconclusive. The decision was made to launch, even though the temperature at launch time was 29 °F (-1.67 °C).

The dataset ‘O Ring Data.XLS’ summarizes the number of O-ring incidents on 24 space shuttle flights prior to the Challenger disaster. Launch temperature was below 65 °F for data labeled ‘COOL’ and above 65 °F for data labeled ‘WARM’. Conduct a permutation test of whether the number of O-ring incidents was associated with temperature, at the 99% confidence level, with your choice of a one-sided or two-sided test. Use 10,000 permutations to draw your conclusion. Justify your choice and show your null distribution as a histogram with the test statistic marked on it. Make your final recommendation about launching the space shuttle on the day of the accident, based on the quantitative evidence that supports it.
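
A one-sided permutation test of this kind can be sketched as below. The counts in `cool` and `warm` are hypothetical placeholders, not the values in ‘O Ring Data.XLS’, and the choice of the difference in mean incidents (COOL minus WARM) as the test statistic is an assumption made for illustration.

```python
import random

# Hypothetical incident counts per flight; NOT the values in 'O Ring Data.XLS'.
cool = [2, 1, 3, 1]
warm = [0, 0, 1, 0, 0, 0, 2, 0, 0, 1]

def mean_diff(a, b):
    """Test statistic: mean incidents in group a minus mean in group b."""
    return sum(a) / len(a) - sum(b) / len(b)

def permutation_test(a, b, n_perm=10_000, seed=0):
    """One-sided test of H1: group a has more incidents on average."""
    rng = random.Random(seed)
    observed = mean_diff(a, b)
    pooled = list(a) + list(b)
    null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of flights as COOL/WARM
        null.append(mean_diff(pooled[:len(a)], pooled[len(a):]))
    # Share of relabelings at least as extreme as the observed statistic.
    p_value = sum(d >= observed - 1e-12 for d in null) / n_perm
    return observed, null, p_value

observed, null, p_value = permutation_test(cool, warm)
reject_at_99 = p_value < 0.01

print(f"observed difference = {observed:.2f}, p-value = {p_value:.4f}")
```

The `null` list is the simulated null distribution; it can be passed to any plotting tool to draw the histogram, with `observed` marked as a vertical line.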

[4] http://vigo.ime.unicamp.br/~fismat/newcomb.pdf

[5] https://en.wikipedia.org/wiki/Speed_of_light

3    Atmospheric CO2 Concentration during Global Forced Confinement by COVID-19 (12 marks)

Global forced lockdowns caused by the fast-spreading COVID-19 [6] since late January 2020 reportedly reduced global CO2 emissions. A recent report [7] estimates the reduction in CO2 emissions to be as high as 17%. In this section, we examine whether the atmospheric CO2 concentration during the peak forced confinement period, April and May 2020, was lower than the level it would have reached without COVID-19.

[6] https://ourworldindata.org/grapher/covid-stringency-index?year=2020-08-24

[7] https://www.nature.com/articles/s41558-020-0797-x

To examine the atmospheric CO2 in April and May 2020, we use the monthly CO2 data [8] maintained by the Scripps Institution of Oceanography at the University of California, San Diego in the US. The Mauna Loa CO2 monitoring station in Hawaii provides the longest continuous record of atmospheric CO2 concentration, dating back to 1958, and is ideally located to measure globally representative CO2 values. Download the monthly CO2 data HERE [9] and use the unadjusted values in the 5th column.

1. The monthly time series of atmospheric CO2 shows a steady increase from the beginning of monitoring, with strong seasonal fluctuations. The anomaly of the concentration in 2020 should therefore be examined after removing the background trend and the cyclic fluctuations. The seasonal fluctuation can be removed by sampling only April values to test concentrations in April and only May values to test concentrations in May. Construct separate batches of CO2 values in April and May for 1958-2020. To remove the long-term trend from the time series, fit a quadratic function (2nd-order polynomial) to the CO2 concentration values in 1958-2019, for April and May separately. Once the trend is removed, your samples can be viewed as residuals (deviations) from the expected long-term trend of the atmospheric CO2. Conduct a residual analysis of the detrended April and May samples and provide your assessment of the residuals following the steps suggested in the Linear Regression section.

2. Now conduct a hypothesis test using confidence interval(s) and Student’s t distribution. Provide the null and alternative hypotheses for the test. If you choose the t statistic as the test statistic and a 95% confidence level to decide between acceptance and rejection, what are the critical values for the null distributions in April and May? What is the result of your test?

3. What is your assessment of the atmospheric CO2 concentration in April and May of 2020 based on the test statistics in the previous question? Are they within the range of your intuitive expectations? There exist a large number of academic and general articles about the expectations and interpretations around the observed atmospheric CO2 published online this year. Provide your assessment and interpretation of the test statistic supported by your choice of relevant articles.
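
The detrending step in item 1 can be sketched as follows. This is a minimal sketch under stated assumptions: the `april` series is synthetic (an exact quadratic with an artificial 1.0 ppm dip in 2020), not the Scripps record; years are centered on 1989 purely for numerical conditioning; and `fit_quadratic` solves the least-squares normal equations by hand so that the example needs only the standard library. In practice a routine such as `numpy.polyfit` would normally be used for the fit.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit y ~ a + b*x + c*x**2; returns [a, b, c]."""
    s = [sum(x ** k for x in xs) for k in range(5)]           # sums of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x4 matrix of the normal equations.
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - rest) / m[i][i]
    return coef

# Synthetic April values in ppm; NOT the Mauna Loa record.
years = list(range(1958, 2021))
april = {y: 315.0 + 0.8 * (y - 1958) + 0.012 * (y - 1958) ** 2 for y in years}
april[2020] -= 1.0  # artificial dip so the 2020 anomaly is visible

CENTER = 1989  # centering the years keeps the normal equations well conditioned
fit_years = [y for y in years if y <= 2019]  # 2020 is excluded from the fit
coef = fit_quadratic([y - CENTER for y in fit_years],
                     [april[y] for y in fit_years])

def trend(year):
    """Fitted long-term trend evaluated at a given year."""
    x = year - CENTER
    return coef[0] + coef[1] * x + coef[2] * x * x

residuals = {y: april[y] - trend(y) for y in years}
anomaly_2020 = residuals[2020]
print(f"2020 anomaly relative to the 1958-2019 trend: {anomaly_2020:.3f} ppm")
```

With real data the 1958-2019 residuals will not be zero; their spread is what the t-based test in item 2 compares the 2020 anomaly against. The same steps are repeated for the May batch.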

[8] https://scrippsco2.ucsd.edu/data/atmospheric_co2/primary_mlo_co2_record.html

[9] https://scrippsco2.ucsd.edu/assets/data/atmospheric/stations/in_situ_co2/monthly/monthly_in_situ_co2_mlo.csv

A random sample of 50 binomial trials resulted in 20 successes. Test the claim that the population proportion of successes does not equal 0.50. Use a level of significance of 0.05.

(a)

Can a normal distribution be used for the p̂ distribution? Explain.
  No, n·q is greater than 5, but n·p is less than 5.
  Yes, n·p and n·q are both less than 5.
  No, n·p and n·q are both less than 5.
  Yes, n·p and n·q are both greater than 5.
  No, n·p is greater than 5, but n·q is less than 5.

(b)

State the hypotheses.
  H0: p = 0.5; H1: p < 0.5
  H0: p < 0.5; H1: p = 0.5
  H0: p = 0.5; H1: p > 0.5
  H0: p = 0.5; H1: p ≠ 0.5

(c)

Compute p̂. (Enter a number.)
p̂ =
Compute the corresponding standardized sample test statistic. (Enter a number. Round your answer to two decimal places.)


(d)

Find the P-value of the test statistic. (Enter a number. Round your answer to four decimal places.)


(e)

Do you reject or fail to reject H0? Explain.
  At the α = 0.05 level, we reject the null hypothesis and conclude the data are statistically significant.
  At the α = 0.05 level, we reject the null hypothesis and conclude the data are not statistically significant.
  At the α = 0.05 level, we fail to reject the null hypothesis and conclude the data are statistically significant.
  At the α = 0.05 level, we fail to reject the null hypothesis and conclude the data are not statistically significant.
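
The arithmetic behind parts (a) through (e) can be checked with a short sketch. It assumes the two-sided alternative stated in the claim ("does not equal 0.50") and uses the normal approximation with the null value p0 in the standard error; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, successes, p0, alpha = 50, 20, 0.50, 0.05

# (a) Normal approximation is reasonable when n*p and n*q both exceed 5.
normal_ok = n * p0 > 5 and n * (1 - p0) > 5

# (c) Sample proportion and standardized test statistic.
p_hat = successes / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# (d) Two-sided P-value from the standard normal distribution.
p_value = 2 * NormalDist().cdf(-abs(z))

# (e) Decision at the chosen significance level.
reject = p_value <= alpha

print(normal_ok, p_hat, round(z, 2), round(p_value, 4), reject)
```

Because the P-value exceeds α = 0.05, the sketch fails to reject H0 under these assumptions.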

Question 1: In a survey of 2,307 adults, 700 say they believe in UFOs. Construct a 90% confidence interval for the population proportion of adults who believe in UFOs.

Question 2: A random sample of 144 soybean plants from our field has a sample mean of 38.8 pods/plant with a sample standard deviation of 3.6. Construct a 95% confidence interval for our yield.

Question 3: For a confidence level of 90% with a sample size of 17, find the critical t value.
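
The three interval calculations can be sketched together. The standard library has no Student's t quantile, so the helpers `t_cdf` and `t_crit` below are illustrative stand-ins that integrate the t density numerically and invert it by bisection; in practice a library routine such as `scipy.stats.t.ppf` would normally be used. Question 1 uses the normal-approximation interval for a proportion, which is an assumption about the intended method.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=2000):
    """CDF of Student's t at x >= 0, via Simpson's rule on the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_crit(conf, df):
    """Two-sided critical value t with P(|T| <= t) = conf, by bisection."""
    target = 0.5 + conf / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Question 1: 90% interval for a proportion (normal approximation).
n1, x1 = 2307, 700
p_hat = x1 / n1
z90 = NormalDist().inv_cdf(0.95)
half1 = z90 * math.sqrt(p_hat * (1 - p_hat) / n1)
ci_prop = (p_hat - half1, p_hat + half1)

# Question 2: 95% t interval for a mean, df = n - 1 = 143.
n2, mean2, sd2 = 144, 38.8, 3.6
half2 = t_crit(0.95, n2 - 1) * sd2 / math.sqrt(n2)
ci_mean = (mean2 - half2, mean2 + half2)

# Question 3: critical t for 90% confidence with n = 17, so df = 16.
t_q3 = t_crit(0.90, 16)

print(ci_prop, ci_mean, round(t_q3, 3))
```

Question 3 is read here as a two-sided confidence level, so the critical value leaves 5% in each tail.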