# Lesson 2: Summarizing Data

## Exercise Answers

### Exercise 2.1

- C
- A
- D
- A
- D

### Exercise 2.2

| Previous Years | Frequency |
|---|---|
| 0 | 2 |
| 1 | 5 |
| 2 | 4 |
| 3 | 3 |
| 4 | 1 |
| 5 | 1 |
| 6 | 1 |
| 7 | 0 |
| 8 | 1 |
| 9 | 0 |
| 10 | 0 |
| 11 | 0 |
| 12 | 1 |
| Total | 19 |
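
The tally above can be reproduced with a short script. This is a minimal sketch, assuming the 19 observations are the values listed in Exercise 2.5; the variable names are illustrative only.

```python
from collections import Counter

# The 19 observed numbers of previous vaccinations (as listed in Exercise 2.5).
previous_years = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]

counts = Counter(previous_years)

# Fill in zero frequencies so every value from the minimum to the maximum appears.
frequency = {
    value: counts.get(value, 0)
    for value in range(min(previous_years), max(previous_years) + 1)
}

for value, freq in frequency.items():
    print(f"{value:>5} | {freq}")
print(f"Total | {sum(frequency.values())}")
```

Including the zero-frequency values (7, 9, 10, and 11) keeps the gaps in the distribution visible rather than hiding them.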

### Exercise 2.3

- Create frequency distribution (done in Exercise 2.2, above)
- Identify the value that occurs most often.

Most common value is 1, so mode is 1 previous vaccination.
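
As a quick check, the mode can be read off programmatically. The sketch below uses `statistics.multimode` from the Python standard library (Python 3.8 or later), which returns every value tied for the highest frequency, so it would also reveal a bimodal distribution.

```python
from statistics import multimode

# The 19 observed numbers of previous vaccinations (as listed in Exercise 2.5).
previous_years = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]

# multimode returns a list of all most-frequent values.
modes = multimode(previous_years)
print(modes)
```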

### Exercise 2.4

- Arrange the observations in increasing order.

0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 6, 8, 12

- Find the middle position of the distribution with 19 observations.

Middle position = (19 + 1) ⁄ 2 = 10

- Identify the value at the middle position.

0, 0, 1, 1, 1, 1, 1, 2, 2, *2*, 2, 3, 3, 3, 4, 5, 6, 8, 12

Counting from the left or right to the 10th position, the value is 2. So the median = 2 previous vaccinations.
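
The same steps (sort, find the middle position, read the value) can be sketched in code. `median_by_position` is a hypothetical helper written for this illustration; it also covers the even-count case, which this exercise does not need, by averaging the two middle values.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2          # 1-based middle position
        return ordered[position - 1]     # convert to a 0-based index
    # Even n: the middle falls between two observations, so average them.
    upper = n // 2
    return (ordered[upper - 1] + ordered[upper]) / 2


previous_years = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]
median = median_by_position(previous_years)
print(median)
```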

### Exercise 2.5

- Add all of the observed values in the distribution.

2 + 0 + 3 + 1 + 0 + 1 + 2 + 2 + 4 + 8 + 1 + 3 + 3 + 12 + 1 + 6 + 2 + 5 + 1 = 57

- Divide the sum by the number of observations.

57 ⁄ 19 = 3.0

So the mean is 3.0 previous vaccinations.
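
For completeness, the two steps of the mean calculation translate directly into code. This is only an illustrative sketch with made-up variable names.

```python
# The 19 observed numbers of previous vaccinations, in the order given above.
previous_years = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]

total = sum(previous_years)   # step 1: add all observed values
n = len(previous_years)       # number of observations
mean = total / n              # step 2: divide the sum by n

print(total, n, mean)
```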

### Exercise 2.6

Using Method A:

- Take the log (in this case, to base 2) of each value.

| ID # | Convalescent | Log base 2 |
|---|---|---|
| 1 | 1:512 | 9 |
| 2 | 1:512 | 9 |
| 3 | 1:128 | 7 |
| 4 | 1:512 | 9 |
| 5 | 1:1024 | 10 |
| 6 | 1:1024 | 10 |
| 7 | 1:2048 | 11 |
| 8 | 1:128 | 7 |
| 9 | 1:4096 | 12 |
| 10 | 1:1024 | 10 |

- Calculate the mean of the log values by summing and dividing by the number of observations (10).

Mean of $\log_2(x_i)$ = (9 + 9 + 7 + 9 + 10 + 10 + 11 + 7 + 12 + 10) ⁄ 10 = 94 ⁄ 10 = 9.4

- Take the antilog of the mean of the log values to get the geometric mean.

$\text{antilog}_2(9.4) = 2^{9.4} = 675.59$. Therefore, the geometric mean dilution titer is 1:675.6.
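
The log, mean, antilog sequence can be sketched as follows. The code assumes each titer 1:x is represented by its denominator x; the variable names are illustrative.

```python
import math

# Denominators of the ten convalescent titers (1:512 is stored as 512, etc.).
titers = [512, 512, 128, 512, 1024, 1024, 2048, 128, 4096, 1024]

# Step 1: take the base-2 log of each value.
logs = [math.log2(t) for t in titers]

# Step 2: mean of the log values.
mean_log = sum(logs) / len(logs)

# Step 3: antilog (2 raised to the mean log) gives the geometric mean.
geometric_mean = 2 ** mean_log

print(f"mean log2 = {mean_log:.1f}, geometric mean titer = 1:{geometric_mean:.1f}")
```

Base 2 is convenient here because serial twofold dilutions make every log a whole number; any other base would give the same geometric mean.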

### Exercise 2.7

- E or A; equal number of patients in 1999 and 1998.
- C or B; mean and median are very close, so either would be acceptable.
- E or A; for a nominal variable, the most frequent category is the mode.
- D
- B; mean is skewed, so median is better choice.
- B; mean is skewed, so median is better choice.

### Exercise 2.8

- Arrange the observations in increasing order.

0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 6, 8, 12

- Find the positions of the 1st and 3rd quartiles. Note that the distribution has 19 observations.

Position of $Q_1$ = (n + 1) ⁄ 4 = (19 + 1) ⁄ 4 = 5

Position of $Q_3$ = 3(n + 1) ⁄ 4 = 3(19 + 1) ⁄ 4 = 15

- Identify the values of the 1st and 3rd quartiles.

Value at $Q_1$ (position 5) = 1

Value at $Q_3$ (position 15) = 4

- Calculate the interquartile range as $Q_3$ minus $Q_1$.

Interquartile range = 4 − 1 = 3

- The median (at position 10) is 2. Note that the distance between $Q_1$ and the median is 2 − 1 = 1. The distance between $Q_3$ and the median is 4 − 2 = 2. This indicates that the vaccination data are skewed slightly to the right (the tail points toward a greater number of previous vaccinations).
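
The quartile positions used above, (n + 1) ⁄ 4 and 3(n + 1) ⁄ 4, can be sketched in code. `value_at_position` is a hypothetical helper; its linear interpolation for fractional positions is an assumption added for generality, since this exercise only produces whole-number positions. Library routines that use other quartile conventions may return slightly different values.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in sorted data.

    Fractional positions are interpolated linearly between neighbours
    (an assumption; not needed for the 19-observation example).
    Valid for 1 <= position <= len(ordered).
    """
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


previous_years = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]
ordered = sorted(previous_years)
n = len(ordered)

q1 = value_at_position(ordered, (n + 1) / 4)       # position 5
q3 = value_at_position(ordered, 3 * (n + 1) / 4)   # position 15
iqr = q3 - q1

print(q1, q3, iqr)
```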

### Exercise 2.9

- Calculate the arithmetic mean.

Mean = (2 + 0 + 3 + 1 + 0 + 1 + 2 + 2 + 4 + 8 + 1 + 3 + 3 + 12 + 1 + 6 + 2 + 5 + 1) ⁄ 19

= 57 ⁄ 19

= 3.0

- Subtract the mean from each observation. Square the difference.
- Sum the squared differences.

| Value Minus Mean | Difference | Difference Squared |
|---|---|---|
| 2 − 3.0 | −1.0 | 1.0 |
| 0 − 3.0 | −3.0 | 9.0 |
| 3 − 3.0 | 0.0 | 0.0 |
| 1 − 3.0 | −2.0 | 4.0 |
| 0 − 3.0 | −3.0 | 9.0 |
| 1 − 3.0 | −2.0 | 4.0 |
| 2 − 3.0 | −1.0 | 1.0 |
| 2 − 3.0 | −1.0 | 1.0 |
| 4 − 3.0 | 1.0 | 1.0 |
| 8 − 3.0 | 5.0 | 25.0 |
| 1 − 3.0 | −2.0 | 4.0 |
| 3 − 3.0 | 0.0 | 0.0 |
| 3 − 3.0 | 0.0 | 0.0 |
| 12 − 3.0 | 9.0 | 81.0 |
| 1 − 3.0 | −2.0 | 4.0 |
| 6 − 3.0 | 3.0 | 9.0 |
| 2 − 3.0 | −1.0 | 1.0 |
| 5 − 3.0 | 2.0 | 4.0 |
| 1 − 3.0 | −2.0 | 4.0 |
| 57 − 57.0 = 0 | 0.0 | 162.0 |

- Divide the sum of the squared differences by n − 1.

Variance = 162 ⁄ (19 − 1) = 162 ⁄ 18 = 9.0 previous vaccinations squared

- Take the square root of the variance. This is the standard deviation.

Standard deviation = √9.0 = 3.0 previous vaccinations
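
The variance and standard deviation steps can be verified with a few lines of code. This is a minimal sketch with illustrative variable names; it divides by n − 1, as the answer does.

```python
import math

previous_years = [2, 0, 3, 1, 0, 1, 2, 2, 4, 8, 1, 3, 3, 12, 1, 6, 2, 5, 1]

n = len(previous_years)
mean = sum(previous_years) / n

# Subtract the mean from each observation and square the difference.
squared_differences = [(x - mean) ** 2 for x in previous_years]

# Sum the squared differences, then divide by n - 1 for the sample variance.
sum_of_squares = sum(squared_differences)
variance = sum_of_squares / (n - 1)

# The standard deviation is the square root of the variance.
standard_deviation = math.sqrt(variance)

print(sum_of_squares, variance, standard_deviation)
```

The standard library's `statistics.variance` and `statistics.stdev` use the same n − 1 divisor and can serve as a cross-check.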

### Exercise 2.10

Standard error of the mean = 42 ⁄ √4,462 = 42 ⁄ 66.80 = 0.629
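
The calculation can be checked in code. The sketch assumes, as the formula implies, that 42 is the standard deviation and 4,462 is the number of observations; the variable names are illustrative.

```python
import math

standard_deviation = 42   # assumed to be the sample standard deviation
n = 4462                  # assumed to be the number of observations

# Standard error of the mean = standard deviation / square root of n.
standard_error = standard_deviation / math.sqrt(n)

print(round(standard_error, 3))
```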

### Exercise 2.11

- Summarize the blood lead level data with a frequency distribution.

**Table 2.14 Frequency Distribution (1 mcg/dL Intervals) of Blood Lead Levels, Rural Village, 1996 (Intervals with No Observations Not Shown)**

| Blood Lead Level (mcg/dL) | Frequency |
|---|---|
| 17 | 1 |
| 26 | 2 |
| 35 | 1 |
| 38 | 1 |
| 39 | 1 |
| 44 | 1 |
| 45 | 1 |
| 46 | 1 |
| 49 | 1 |
| 50 | 1 |
| 54 | 1 |
| 56 | 1 |
| 57 | 2 |
| 58 | 3 |
| 61 | 1 |
| 63 | 1 |
| 64 | 1 |
| 67 | 1 |
| 68 | 1 |
| 69 | 1 |
| 72 | 1 |
| 73 | 1 |
| 74 | 1 |
| 76 | 2 |
| 78 | 3 |
| 79 | 1 |
| 84 | 1 |
| 86 | 1 |
| 103 | 1 |
| 104 | 1 |
| Unknown | 48 |

To summarize the data further, you could use intervals of 5, 10, or perhaps even 20 mcg/dL. Table 2.15 below uses 10 mcg/dL intervals.

**Table 2.15 Frequency Distribution (10 mcg/dL Intervals) of Blood Lead Levels, Rural Village, 1996**

| Blood Lead Level (mcg/dL) | Frequency |
|---|---|
| 0–9 | 0 |
| 10–19 | 1 |
| 20–29 | 2 |
| 30–39 | 3 |
| 40–49 | 6 |
| 50–59 | 8 |
| 60–69 | 6 |
| 70–79 | 9 |
| 80–89 | 2 |
| 90–99 | 0 |
| 100–110 | 2 |
| Total | 39 |

- Calculate the arithmetic mean.

Arithmetic mean = sum ⁄ n = 2,363 ⁄ 39 = 60.6 mcg/dL

- Identify the median and interquartile range.

Median at (39 + 1) ⁄ 2 = 20th position. Median = value at 20th position = 58 mcg/dL

$Q_1$ at (39 + 1) ⁄ 4 = 10th position. $Q_1$ = value at 10th position = 48 mcg/dL

$Q_3$ at 3 × $Q_1$ position = 30th position. $Q_3$ = value at 30th position = 76 mcg/dL

Interquartile range = 76 − 48 = 28 mcg/dL

- Subtract the arithmetic mean (question 2) from each of the 39 observed blood lead levels.

Square each of these differences ("deviations").

Sum the squared deviations = 14,577.59

Divide the sum of the squared deviations by n − 1 to find the variance.

14,577.59 ⁄ (39 − 1) = 14,577.59 ⁄ 38 = 383.62

Take the square root of the variance to find the standard deviation.

√383.62 = 19.6 mcg/dL

- Calculate the geometric mean using the log lead levels provided.

Geometric mean = $10^{(68.45 ⁄ 39)}$ = $10^{1.7551}$ = 56.9 mcg/dL
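
The summary figures in this answer can be cross-checked from the totals it quotes. The sketch below is hedged accordingly: it starts from the stated sums (2,363; 14,577.59; 68.45) rather than from the individual observations, and the variable names are illustrative.

```python
import math

n = 39                               # children with a known blood lead level
sum_of_values = 2363                 # sum of the 39 observed levels (mcg/dL)
sum_of_squared_deviations = 14577.59
sum_of_log10_values = 68.45          # sum of the base-10 logs of the levels

mean = sum_of_values / n
variance = sum_of_squared_deviations / (n - 1)   # divisor is n - 1 = 38
standard_deviation = math.sqrt(variance)
geometric_mean = 10 ** (sum_of_log10_values / n)

print(f"mean = {mean:.1f} mcg/dL")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {standard_deviation:.1f} mcg/dL")
print(f"geometric mean = {geometric_mean:.1f} mcg/dL")
```

Dividing by n − 1 = 38 is what reproduces the stated variance of 383.62; dividing by 39 would give a smaller value.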