**Executive Summary**

The purpose of this report is to analyze the data and help a team of sports scientists understand more about the health effects of physical exercise. The data were collected by students in a statistics class, covering five class groups between 1993 and 1998. Each student first recorded their own pulse rate and then flipped a coin. If the coin came up heads, they ran for a minute; otherwise, they sat for a minute. At the end, they took their pulse again. Different methods were used to collect the data in different years, so in some years equal numbers of students were pre-assigned to running and non-running. Moreover, the data came from returned forms, so the experiment could not be entirely controlled.

Given these limitations, we followed this process: understanding the problem, collecting and reviewing the data received, developing assumptions, considering where the data could be wrong, correcting the data, calculating and analyzing, and then drawing the final conclusions.

This report provides a detailed analysis of the paired comparison between the two pulse measurements and of the accuracy of the experiment. It also presents the relationship between pulse and lifestyle and physiological measurements. The conclusions drawn from the data are that the pulse change depended heavily on whether the student ran, and that there is no evidence that some students did not run even though their coin toss came up heads. The pulse also varies with lifestyle, physiological measurements, and year.

**Data Analysis**

**The paired comparison**

The paired comparison between Pulse 1 and Pulse 2 depends heavily on Run. A paired comparison is a suitable model for this experiment because the data come from a matched-pairs, before-after design: two pulses were recorded for each student, Pulse 1 (first pulse measurement) and Pulse 2 (second pulse measurement).

This paired comparison was carried out to investigate whether the change in pulse depended heavily on running. The null hypothesis tested is that the mean pulse change is zero. A two-sided test is used because the pulse can either increase or decrease. In addition, a “differences in differences” analysis is used, meaning that the before-after pulse change is analyzed separately for the two independent samples, RUN and SAT, and the changes are then compared.

For both groups of students, the null hypothesis needs to be tested because the test shows whether the pulse changed or not.
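As a minimal sketch of the calculation behind a paired t-test, the following Python computes the mean difference, the t statistic, and the degrees of freedom using only the standard library. The function name `paired_t` and the pulse readings are hypothetical and chosen for illustration; they are not the class data, and this is not necessarily the exact procedure behind the appendices. Differences are taken as Pulse 1 minus Pulse 2, which appears to be the sign convention of the confidence intervals quoted in this report.

```python
import math
import statistics


def paired_t(before, after):
    """Return (mean difference, t statistic, degrees of freedom) for paired data.

    Differences are computed as before - after.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)            # sample standard deviation of the differences
    t_stat = mean_d / (sd_d / math.sqrt(n))   # test statistic for H0: mean difference = 0
    return mean_d, t_stat, n - 1


# Hypothetical pulse readings for five students (illustration only).
pulse1 = [70, 76, 82, 68, 74]
pulse2 = [120, 130, 125, 118, 132]

mean_d, t_stat, df = paired_t(pulse1, pulse2)
print(f"mean difference = {mean_d:.1f}, t = {t_stat:.2f}, df = {df}")
```

A large negative t statistic here would correspond to Pulse 2 being clearly higher than Pulse 1. The P-value would then come from a t distribution with `df` degrees of freedom, which a statistics package supplies.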

**RUN Student Analysis**

The pulse rate of the RUN group changed. Based on Appendix A, the paired t-test and CI for Pulse1 and Pulse2 of the RUN students gave a 95% confidence interval for the difference of -57.74 to -44.93, so the hypothesized value of 0 under Ho lies outside the interval. The corresponding P-value is reported as 0 (about 0%), that is, smaller than the displayed precision. This represents very strong evidence to reject Ho (no change in pulse). Appendix A also shows that Pulse 2 increased: the mean of Pulse 1 was 76 and the mean of Pulse 2 was 127, an increase of around 67%. Thus, the pulse increased for the RUN group.

**SAT Student Analysis**

The pulse rate of the SAT group also changed, but only slightly. Based on Appendix B, the paired t-test and CI for Pulse1 and Pulse2 of the SAT students gave a 95% confidence interval for the difference of 0.012 to 1.988, so the hypothesized value of 0 under Ho lies just outside the interval. The corresponding P-value is 0.047 (about 4.7%). This represents some evidence to reject Ho (no change in pulse) at the 5% level. Appendix B also shows that Pulse 2 decreased slightly: the mean of Pulse 1 was 76 and the mean of Pulse 2 was 75, a decrease of around 1.3%. Thus, sitting had little effect on the pulse of the SAT group.
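The percentage changes quoted for the two groups follow directly from the reported group means. The short check below uses only the means stated in this report (76 and 127 for RUN, 76 and 75 for SAT). The helper name `percent_change` is introduced here for illustration.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


run_change = percent_change(76, 127)  # RUN group means reported from Appendix A
sat_change = percent_change(76, 75)   # SAT group means reported from Appendix B

print(f"RUN: {run_change:+.1f}%  SAT: {sat_change:+.1f}%")
# prints "RUN: +67.1%  SAT: -1.3%"
```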

**RUN & SAT Analysis (2-Sample Test)**

Based on the evidence above, Pulse 1 and Pulse 2 differ for both the RUN and SAT groups. Further analysis shows that the pulse change depends heavily on RUN.

In this analysis, a 2-sample test was used to compare the difference between Pulse 1 and Pulse 2 in the RUN and SAT groups. The two groups are independent samples, and Appendix C (2 Variance test) indicates that their variances differ. The data fit this model because the central limit theorem states “a sample size of 10 or 20 is large enough for the theorem to work” (1). In this experiment, the data contain 110 independent samples. Although they are split into RUN students (N=45) and SAT students (N=64), each group is still large enough for the model (N>30).

Therefore, the results in Appendix D can be summarized in the following two graphs (Graph 1 and Graph 2). Both graphs show that the pulse changed between Pulse 1 and Pulse 2 in both groups, but the change depends heavily on RUN: the RUN group shows a 67% increase from Pulse 1 to Pulse 2.
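One way to compare the pulse changes of two independent groups with unequal variances is Welch's two-sample t-test. The sketch below assumes that form of the test; the report does not state exactly which variant was run. The function name `welch_t` and the per-student changes are hypothetical and are not the class data.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch two-sample t statistic and approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a) / na   # squared standard error of mean A
    vb = statistics.variance(sample_b) / nb   # squared standard error of mean B
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t_stat = diff / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t_stat, df


# Hypothetical per-student changes, Pulse 2 - Pulse 1 (illustration only).
run_changes = [50, 54, 43, 50, 58, 47]
sat_changes = [-1, 0, -2, 1, -3, 0, -1]

t_stat, df = welch_t(run_changes, sat_changes)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

Working on the per-student changes rather than on the raw pulses keeps the before-after pairing inside each group while still treating RUN and SAT as independent samples.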

**The accuracy of the data**

There is no evidence that some students did not run even though their coin toss came up heads. The data contain RUN and SAT groups for each year. However, different years used different methods to collect the data, so the accuracy needs to be tested. The purpose of the test is to check whether some students did not run even though their coin toss came up heads, and it should be carried out for each year separately. Therefore, a one-proportion test was used for each year. The hypothesized proportion was 0.5 because a coin toss has two outcomes (heads and tails), each with probability 0.5 for a fair coin.

Appendix E (Test and CI for One Proportion 93-98) shows no evidence that students who tossed heads did not run, because every exact P-value is greater than 0.05 (5%). However, it should be noted that the sample in each year is small and some years' data are missing, so this analysis of the accuracy of the data may be biased.
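An exact one-proportion test against a hypothesized proportion of 0.5 can be sketched with the binomial distribution. The code below uses one common two-sided convention, which sums the probability of every outcome no more likely than the observed one. Statistical packages may use a different convention, so this may not reproduce Appendix E exactly. The function name `exact_binomial_p` and the counts are hypothetical.

```python
from math import comb


def exact_binomial_p(successes, n, p0=0.5):
    """Two-sided exact binomial p-value for H0: p = p0.

    Sums the probability of every outcome that is no more likely
    than the observed count under H0.
    """
    def pmf(k):
        return comb(n, k) * p0 ** k * (1 - p0) ** (n - k)

    observed = pmf(successes)
    # The small relative tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical count: 9 of 20 students in one year reported running.
p_value = exact_binomial_p(9, 20)
print(f"p = {p_value:.3f}")
```

A P-value above 0.05 from such a test would be consistent with the proportion of runners matching a fair coin toss. With small yearly samples, however, the test has little power to detect students who skipped the run.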

**Relationship between year, lifestyle and physiological measurements**

**Pulse 1 Analysis**

**Height, Weight, and Age**

Graph 3 presents the relationship between Pulse 1 and Height, Weight, and Age. Height and weight show no obvious relationship with Pulse 1, only slight differences. However, Graph 4 shows that older people have a lower pulse.
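The direction and strength of a relationship such as age versus pulse can be summarized with a Pearson correlation coefficient, in addition to reading it off a graph. The sketch below is only an illustration: the function name `pearson_r` and the ages and pulses are hypothetical, not the class data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ages and resting pulses (illustration only).
ages = [18, 19, 20, 22, 25, 30]
pulse1 = [80, 78, 77, 74, 72, 68]

r = pearson_r(ages, pulse1)
print(f"r = {r:.2f}")
```

A negative value of `r` corresponds to pulse falling as age rises, and a value near zero corresponds to no obvious linear relationship, as described for height and weight.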

**Gender**

Graph 5 shows that males have a higher average Pulse 1 than females. Although there is one outlier in each sample, these outliers do not greatly affect the overall data.

**Lifestyle**

Graph 6 shows that exercise affects the pulse: the more exercise students do, the lower their pulse. Graph 7 shows that smoking and drinking make little difference. There are many outliers in the non-smoking and non-drinking columns. Therefore, a firmer conclusion needs a larger sample to reduce the bias and error in the data.

**Pulse 2 Analysis**

**Height, Weight, and Age**

Graph 8 presents the relationship between Pulse 2 and Height, Weight, and Age. Height and weight show no obvious relationship with Pulse 2, only slight differences. However, Graph 9 shows that older people have a lower pulse.

**Gender**

Graph 10 shows that females have a higher average Pulse 2 than males.

**Lifestyle**

Graph 11 shows that exercise affects the pulse: the more exercise students do, the lower their pulse. Graph 12 shows that smoking makes little difference but alcohol does: students who drink alcohol have a higher pulse than those who do not.

**Year Analysis**

Based on the data from different years, Graphs 13 and 14 show the analysis for Pulse 1 and Pulse 2. The average Pulse 1 is almost the same across years. However, Pulse 2 in 98 was higher than average, which may be due to an outlier in the data.

**Conclusion**

All in all, this experiment shows that running affects the pulse a lot. However, the data contain some extreme outliers that can affect the accuracy of the gender and lifestyle results. Although the data are not perfectly normal or fully accurate, the analysis is robust enough to provide some evidence on the health effects of physical exercise. The data show some important findings about pulse, lifestyle, and physiological measurements. Males had a higher average Pulse 1 than females but a lower average Pulse 2. Age and pulse have a negative relationship, meaning that older people have a lower pulse. Alcohol and pulse have a positive relationship, meaning that students who drink alcohol have a higher pulse. Smoking does not affect the pulse much. However, it should be noted that the analysis is not accurate enough; a more reliable conclusion would need a larger sample or replacement of the extreme outliers.