Lesson 2: Summarizing Data
Exercise Answers
Exercise 2.1
- C
- A
- D
- A
- D
Exercise 2.2
Previous Years | Frequency |
---|---|
0 | 2 |
1 | 5 |
2 | 4 |
3 | 3 |
4 | 1 |
5 | 1 |
6 | 1 |
7 | 0 |
8 | 1 |
9 | 0 |
10 | 0 |
11 | 0 |
12 | 1 |
Total | 19 |
Exercise 2.3
- Create frequency distribution (done in Exercise 2.2, above)
- Identify the value that occurs most often.
Most common value is 1, so mode is 1 previous vaccination.
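The tally and the mode can be cross-checked with a short script. This is a minimal sketch, assuming the 19 observations are the ones listed in Exercise 2.5; the variable names are illustrative only.

```python
from collections import Counter

# Previous vaccinations for the 19 observations, in the order listed in Exercise 2.5.
previous_vaccinations = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]

# Tally how often each value occurs (the frequency distribution).
frequency = Counter(previous_vaccinations)
for value in range(max(previous_vaccinations) + 1):
    print(value, frequency[value])

# The mode is the value with the highest frequency.
mode_value, mode_count = frequency.most_common(1)[0]
print("mode:", mode_value, "occurring", mode_count, "times")
```

Printing every value from 0 to 12, including those with a count of zero, mirrors the layout of the table in Exercise 2.2.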
Exercise 2.4
- Arrange the observations in increasing order.
0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 6, 8, 12
- Find the middle position of the distribution with 19 observations.
Middle position = (19 + 1) ⁄ 2 = 10
- Identify the value at the middle position.
0, 0, 1, 1, 1, 1, 1, 2, 2, *2*, 2, 3, 3, 3, 4, 5, 6, 8, 12
Counting from the left or right to the 10th position, the value is 2. So the median = 2 previous vaccinations.
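The same three steps can be sketched in code. This assumes an odd number of observations, so that (n + 1) ⁄ 2 is a whole position; the names are illustrative, and the standard library's `statistics.median` is used only as a cross-check.

```python
import statistics

previous_vaccinations = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]

# Step 1: arrange the observations in increasing order.
ordered = sorted(previous_vaccinations)

# Step 2: find the middle position (1-based); n is odd here, so this is exact.
n = len(ordered)
middle_position = (n + 1) // 2

# Step 3: read off the value at that position (Python lists are 0-based).
median_value = ordered[middle_position - 1]

print("middle position:", middle_position)
print("median:", median_value)
print("statistics.median agrees:", statistics.median(previous_vaccinations) == median_value)
```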
Exercise 2.5
- Add all of the observed values in the distribution.
2 + 0 + 3 + 1 + 0 + 1 + 2 + 2 + 4 + 8 + 1 + 3 + 3 + 12 + 1 + 6 + 2 + 5 + 1 = 57
- Divide the sum by the number of observations.
57 ⁄ 19 = 3.0
So the mean is 3.0 previous vaccinations.
Exercise 2.6
Using Method A:
- Take the log (in this case, to base 2) of each value.
ID # | Convalescent | Log base 2 |
---|---|---|
1 | 1:512 | 9 |
2 | 1:512 | 9 |
3 | 1:128 | 7 |
4 | 1:512 | 9 |
5 | 1:1024 | 10 |
6 | 1:1024 | 10 |
7 | 1:2048 | 11 |
8 | 1:128 | 7 |
9 | 1:4096 | 12 |
10 | 1:1024 | 10 |
- Calculate the mean of the log values by summing and dividing by the number of observations (10).
Mean of log2(xi) = (9 + 9 + 7 + 9 + 10 + 10 + 11 + 7 + 12 + 10) ⁄ 10 = 94 ⁄ 10 = 9.4
- Take the antilog of the mean of the log values to get the geometric mean.
Antilog2(9.4) = 2^9.4 = 675.59. Therefore, the geometric mean dilution titer is 1:675.6.
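Method A can be sketched in a few lines. The list below holds the reciprocal titers (the denominators of the dilutions); the variable names are illustrative assumptions rather than part of the exercise.

```python
import math

# Reciprocal convalescent titers for IDs 1 through 10 (1:512 is stored as 512).
titers = [512, 512, 128, 512, 1024, 1024, 2048, 128, 4096, 1024]

# Step 1: take the log (base 2) of each value.
log2_values = [math.log2(t) for t in titers]

# Step 2: calculate the mean of the log values.
mean_log2 = sum(log2_values) / len(log2_values)

# Step 3: take the antilog (2 raised to the mean) to get the geometric mean.
geometric_mean = 2 ** mean_log2

print("mean of log2 values:", mean_log2)
print("geometric mean titer: 1:%.1f" % geometric_mean)
```

Base 2 is convenient because every titer is a power of 2, so the logs are whole numbers; any other base would give the same geometric mean.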
Exercise 2.7
- E or A; equal number of patients in 1999 and 1998.
- C or B; mean and median are very close, so either would be acceptable.
- E or A; for a nominal variable, the most frequent category is the mode.
- D
- B; mean is skewed, so median is better choice.
- B; mean is skewed, so median is better choice.
Exercise 2.8
- Arrange the observations in increasing order.
0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 6, 8, 12
- Find the position of the 1st and 3rd quartiles. Note that the distribution has 19 observations.
Position of Q1 = (n + 1) ⁄ 4 = (19 + 1) ⁄ 4 = 5
Position of Q3 = 3(n + 1) ⁄ 4 = 3(19 + 1) ⁄ 4 = 15
- Identify the value of the 1st and 3rd quartiles.
Value at Q1 (position 5) = 1
Value at Q3 (position 15) = 4
- Calculate the interquartile range as Q3 minus Q1.
Interquartile range = 4 − 1 = 3
- The median (at position 10) is 2. Note that the distance between Q1 and the median is 2 − 1 = 1. The distance between Q3 and the median is 4 − 2 = 2. This indicates that the vaccination data are skewed slightly to the right (tail points to greater number of previous vaccinations).
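The quartile positions and the interquartile range can be reproduced as follows. This is a sketch of the (n + 1) ⁄ 4 position rule used in this answer, and it assumes the positions come out as whole numbers, as they do for n = 19; other software may use different quartile conventions and give slightly different values. The helper `value_at` is hypothetical.

```python
previous_vaccinations = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]

ordered = sorted(previous_vaccinations)
n = len(ordered)


def value_at(position):
    """Return the observation at a 1-based position in the ordered data."""
    if position != int(position):
        # A fractional position would need interpolation, which is not covered here.
        raise ValueError("position is not a whole number")
    return ordered[int(position) - 1]


q1_position = (n + 1) / 4
q3_position = 3 * (n + 1) / 4

q1 = value_at(q1_position)
q3 = value_at(q3_position)
median = value_at((n + 1) / 2)
interquartile_range = q3 - q1

print("Q1:", q1, "median:", median, "Q3:", q3)
print("interquartile range:", interquartile_range)
# A longer distance from the median to Q3 than to Q1 suggests skew to the right.
print("median - Q1:", median - q1, "Q3 - median:", q3 - median)
```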
Exercise 2.9
- Calculate the arithmetic mean.
Mean = (2 + 0 + 3 + 1 + 0 + 1 + 2 + 2 + 4 + 8 + 1 + 3 + 3 + 12 + 1 + 6 + 2 + 5 + 1) ⁄ 19
= 57 ⁄ 19
= 3.0
- Subtract the mean from each observation. Square the difference.
- Sum the squared differences.
Value Minus Mean | Difference | Difference Squared |
---|---|---|
2 − 3.0 | −1.0 | 1.0 |
0 − 3.0 | −3.0 | 9.0 |
3 − 3.0 | 0.0 | 0.0 |
1 − 3.0 | −2.0 | 4.0 |
0 − 3.0 | −3.0 | 9.0 |
1 − 3.0 | −2.0 | 4.0 |
2 − 3.0 | −1.0 | 1.0 |
2 − 3.0 | −1.0 | 1.0 |
4 − 3.0 | 1.0 | 1.0 |
8 − 3.0 | 5.0 | 25.0 |
1 − 3.0 | −2.0 | 4.0 |
3 − 3.0 | 0.0 | 0.0 |
3 − 3.0 | 0.0 | 0.0 |
12 − 3.0 | 9.0 | 81.0 |
1 − 3.0 | −2.0 | 4.0 |
6 − 3.0 | 3.0 | 9.0 |
2 − 3.0 | −1.0 | 1.0 |
5 − 3.0 | 2.0 | 4.0 |
1 − 3.0 | −2.0 | 4.0 |
57 − 57.0 = 0 | 0.0 | 162.0 |
- Divide the sum of the squared differences by n − 1.
Variance = 162 ⁄ (19 − 1) = 162 ⁄ 18 = 9.0 previous vaccinations squared
- Take the square root of the variance. This is the standard deviation.
Standard deviation = √9.0 = 3.0 previous vaccinations
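The full calculation can be checked step by step. This is a minimal sketch of the sample variance (divisor n − 1) as described in the steps above; the variable names are illustrative.

```python
import math

previous_vaccinations = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]
n = len(previous_vaccinations)

# Step 1: arithmetic mean.
mean = sum(previous_vaccinations) / n

# Steps 2 and 3: subtract the mean from each observation, square, and sum.
differences = [x - mean for x in previous_vaccinations]
sum_of_squares = sum(d ** 2 for d in differences)

# Step 4: divide by n - 1 to get the variance.
variance = sum_of_squares / (n - 1)

# Step 5: the standard deviation is the square root of the variance.
standard_deviation = math.sqrt(variance)

print("mean:", mean)
print("sum of differences:", sum(differences))
print("sum of squared differences:", sum_of_squares)
print("variance:", variance)
print("standard deviation:", standard_deviation)
```

The differences summing to zero is a quick check that the mean was computed correctly.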
Exercise 2.10
Standard error of the mean = 42 divided by the square root of 4,462 = 0.629
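The arithmetic can be confirmed with a one-line calculation. In this sketch, 42 is taken to be the standard deviation and 4,462 the number of observations, as the formula for the standard error of the mean implies; the variable names are illustrative.

```python
import math

standard_deviation = 42
number_of_observations = 4462

# Standard error of the mean = standard deviation / square root of n.
standard_error = standard_deviation / math.sqrt(number_of_observations)

print("standard error of the mean: %.3f" % standard_error)
```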
Exercise 2.11
- Summarize the blood level data with a frequency distribution.
Table 2.14 Frequency Distribution (1 mcg/dL Intervals) of Blood Lead Levels, Rural Village, 1996 (Intervals with No Observations Not Shown)
Blood Lead Level (mcg/dL) | Frequency |
---|---|
17 | 1 |
26 | 2 |
35 | 1 |
38 | 1 |
39 | 1 |
44 | 1 |
45 | 1 |
46 | 1 |
49 | 1 |
50 | 1 |
54 | 1 |
56 | 1 |
57 | 2 |
58 | 3 |
61 | 1 |
63 | 1 |
64 | 1 |
67 | 1 |
68 | 1 |
69 | 1 |
72 | 1 |
73 | 1 |
74 | 1 |
76 | 2 |
78 | 3 |
79 | 1 |
84 | 1 |
86 | 1 |
103 | 1 |
104 | 1 |
Unknown | 48 |
To summarize the data further you could use intervals of 5, 10, or perhaps even 20 mcg/dL. Table 2.15 below uses 10 mcg/dL intervals.
Table 2.15 Frequency Distribution (10 mcg/dL Intervals) of Blood Lead Levels, Rural Village, 1996
Blood Lead Level (mcg/dL) | Frequency |
---|---|
0–9 | 0 |
10–19 | 1 |
20–29 | 2 |
30–39 | 3 |
40–49 | 6 |
50–59 | 8 |
60–69 | 6 |
70–79 | 9 |
80–89 | 2 |
90–99 | 0 |
100–110 | 2 |
Total | 39 |
- Calculate the arithmetic mean.
Arithmetic mean = sum ⁄ n = 2,363 ⁄ 39 = 60.6 mcg/dL
- Identify the median and interquartile range.
Median at (39 + 1) ⁄ 2 = 20th position. Median = value at 20th position = 58
Q1 at (39 + 1) ⁄ 4 = 10th position. Q1 = value at 10th position = 48
Q3 at 3 × Q1 position = 30th position. Q3 = value at 30th position = 76
- Calculate the variance and standard deviation.
Subtract the arithmetic mean (step 2) from each of the 39 observed blood lead levels.
Square each of these differences (“deviations”).
Sum the squared deviations = 14,577.59
Divide the sum of the squared deviations by n − 1 to find the variance.
14,577.59 ⁄ (39 − 1) = 14,577.59 ⁄ 38 = 383.62
Take the square root of the variance to find the standard deviation.
√383.62 = 19.6 mcg/dL
- Calculate the geometric mean using the log lead levels provided.
Geometric mean = 10^(68.45 ⁄ 39) = 10^(1.7551) = 56.9 mcg/dL
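The summary figures for this exercise can be re-derived from the reported totals alone. This sketch assumes the totals given in the answer (sum of 2,363, sum of squared deviations of 14,577.59, and sum of base-10 logs of 68.45, with n = 39) and does not use the individual blood lead levels; the variable names are illustrative.

```python
import math

n = 39
sum_of_values = 2363               # sum of the 39 blood lead levels, mcg/dL
sum_squared_deviations = 14577.59  # sum of squared deviations from the mean
sum_of_log10_values = 68.45        # sum of the base-10 logs of the lead levels

# Arithmetic mean.
arithmetic_mean = sum_of_values / n

# Positions (1-based) of the median and quartiles in the ordered data.
median_position = (n + 1) / 2
q1_position = (n + 1) / 4
q3_position = 3 * (n + 1) / 4

# Variance uses the divisor n - 1 = 38; the standard deviation is its square root.
variance = sum_squared_deviations / (n - 1)
standard_deviation = math.sqrt(variance)

# Geometric mean: antilog (base 10) of the mean of the log values.
geometric_mean = 10 ** (sum_of_log10_values / n)

print("arithmetic mean: %.1f mcg/dL" % arithmetic_mean)
print("positions:", median_position, q1_position, q3_position)
print("variance: %.2f" % variance)
print("standard deviation: %.1f mcg/dL" % standard_deviation)
print("geometric mean: %.1f mcg/dL" % geometric_mean)
```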