Lesson 4: Displaying Public Health Data

Section 2: Tables

A table is a set of data arranged in rows and columns. Almost any quantitative information can be organized into a table. Tables are useful for demonstrating patterns, exceptions, differences, and other relationships. In addition, tables usually serve as the basis for preparing additional visual displays of data, such as graphs and charts, in which some of the details may be lost.

Tables designed to present data to others should be as simple as possible.(1) Two or three small tables, each focusing on a different aspect of the data, are easier to understand than a single large table that contains many details or variables.

A table in a printed publication should be self-explanatory. If a table is taken out of its original context, it should still convey all the information necessary for the reader to understand the data. To create a table that is self-explanatory, follow the guidelines below.

More About Constructing Tables

  • Use a clear and concise title that describes person, place and time — what, where, and when — of the data in the table. Precede the title with a table number.
  • Label each row and each column and include the units of measurement for the data (for example, years, mm Hg, mg/dl, rate per 100,000).
  • Show totals for rows and columns, where appropriate. If you show percentages (%), also give their total (always 100).
  • Identify missing or unknown data either within the table (for example, Table 4.11) or in a footnote below the table.
  • Explain any codes, abbreviations, or symbols in a footnote (for example, Syphilis P&S = primary and secondary syphilis).
  • Note exclusions in a footnote (e.g., 1 case and 2 controls with unknown family history were excluded from this analysis).
  • Note the source of the data below the table or in a footnote if the data are not original.

One-variable tables

In descriptive epidemiology, the most basic table is a simple frequency distribution with only one variable, such as Table 4.1a, which displays the number of reported syphilis cases in the United States in 2002 by age group.(2) (Frequency distributions are discussed in Lesson 2.) In this type of frequency distribution table, the first column shows the values or categories of the variable represented by the data, such as age or sex. The second column shows the number of persons or events that fall into each category. In constructing any table, the choice of columns depends on the interpretation to be made. In Table 4.1a, the point the analyst wishes to make is the role of age as a risk factor for syphilis. Thus, age group is chosen as column 1 and case count as column 2.

Often, an additional column lists the percentage of persons or events in each category (see Table 4.1b). The percentages shown in Table 4.1b actually add up to 99.9% rather than 100.0% due to rounding to one decimal place. Rounding that results in totals of 99.9% or 100.1% is common in tables that show percentages. Nonetheless, the total percentage should be displayed as 100.0%, and a footnote explaining that the difference is due to rounding should be included.

The addition of percent to a table shows the relative burden of illness; for example, in Table 4.1b, we see that the largest contribution to illness for any single age category is from 35–39 year olds. The subsequent addition of cumulative percent (e.g., Table 4.1c) allows the public health analyst to illustrate the impact of a targeted intervention. Here, any intervention effective at preventing syphilis among young people and young adults (under age 35) would prevent almost half of the cases in this population.

The one-variable table can be further modified to show cumulative frequency and/or cumulative percentage, as in Table 4.1c. From this table, you can see at a glance that 46.7% of the primary and secondary syphilis cases occurred in persons younger than age 35 years, meaning that over half of the syphilis cases occurred in persons age 35 years or older. Note that the choice of age-groupings will affect the interpretation of your data.(3)

Table 4.1a Reported Cases of Primary and Secondary Syphilis by Age — United States, 2002

Age Group (years) Number of Cases
≤14 21
15–19 351
20–24 842
25–29 895
30–34 1,097
35–39 1,367
40–44 1,023
45–54 982
≥55 284
Total 6,862

Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department of Health and Human Services; 2003.

Table 4.1b Reported Cases of Primary and Secondary Syphilis by Age — United States, 2002

CASES
Age Group (years) Number Percent
Total 6,862 100.0*
≤14 21 0.3
15–19 351 5.1
20–24 842 12.3
25–29 895 13.0
30–34 1,097 16.0
35–39 1,367 19.9
40–44 1,023 14.9
45–54 982 14.3
≥55 284 4.1

* Actual total of percentages for this table is 99.9% and does not add to 100.0% due to rounding error.

Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department of Health and Human Services; 2003.

Table 4.1c Reported Cases of Primary and Secondary Syphilis by Age — United States, 2002

CASES
Age Group (years) Number Percent Cumulative Percent
Total 6,862 100.0* 100.0*
≤14 21 0.3 0.3
15–19 351 5.1 5.4
20–24 842 12.3 17.7
25–29 895 13.0 30.7
30–34 1,097 16.0 46.7
35–39 1,367 19.9 66.6
40–44 1,023 14.9 81.6
45–54 982 14.3 95.9
≥55 284 4.1 100.0

* Percentages do not add to 100.0% due to rounding error.

Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department of Health and Human Services; 2003.
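
The percent and cumulative percent columns of Tables 4.1b and 4.1c can be reproduced from the counts in Table 4.1a. The Python sketch below is one way to do so; the variable names are illustrative and are not part of the original lesson. It assumes that each cumulative percentage is computed from the running count rather than by adding already-rounded percentages. That assumption is consistent with the 81.6 shown for ages 40–44, since adding the rounded percentages would give 81.5.

```python
from itertools import accumulate

# Counts from Table 4.1a (age group, number of reported cases)
cases = [
    ("≤14", 21), ("15–19", 351), ("20–24", 842), ("25–29", 895),
    ("30–34", 1097), ("35–39", 1367), ("40–44", 1023),
    ("45–54", 982), ("≥55", 284),
]

total = sum(n for _, n in cases)

# Percent of all cases in each age group, rounded to one decimal place
percents = [round(100 * n / total, 1) for _, n in cases]

# Cumulative percent from running counts (assumption: not from rounded percents)
running = list(accumulate(n for _, n in cases))
cum_percents = [round(100 * r / total, 1) for r in running]

# The rounded percentages need not sum to exactly 100.0
rounded_sum = round(sum(percents), 1)

for (group, n), p, c in zip(cases, percents, cum_percents):
    print(f"{group:>6} {n:>6,} {p:>6.1f} {c:>6.1f}")
print(f"{'Total':>6} {total:>6,} (rounded percents sum to {rounded_sum})")
```

Keeping the rounded sum as a separate value makes it easy to decide whether the table needs a rounding footnote.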

Two- and three-variable tables

Tables 4.1a, 4.1b, and 4.1c show case counts (frequency) by a single variable, e.g., age. Data can also be cross-tabulated to show counts by an additional variable. Table 4.2 shows the number of syphilis cases cross-classified by both age group and sex of the patient.

Table 4.2 Reported Cases of Primary and Secondary Syphilis by Age and Sex — United States, 2002

NUMBER OF CASES
Age Group (years) Male Female Total
Total 5,268 1,594 6,862
≤14 9 12 21
15–19 135 216 351
20–24 533 309 842
25–29 668 227 895
30–34 877 220 1,097
35–39 1,121 246 1,367
40–44 845 178 1,023
45–54 825 157 982
≥55 255 29 284

Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department of Health and Human Services; 2003.
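
The row and column totals of a cross-tabulation such as Table 4.2 can be checked mechanically before the table is published. The following is a minimal Python sketch with illustrative variable names; it recomputes the margins from the cell counts and confirms that the grand total is the same whether summed by rows or by columns.

```python
# Cell counts from Table 4.2: age group -> (male, female)
cells = {
    "≤14": (9, 12), "15–19": (135, 216), "20–24": (533, 309),
    "25–29": (668, 227), "30–34": (877, 220), "35–39": (1121, 246),
    "40–44": (845, 178), "45–54": (825, 157), "≥55": (255, 29),
}

# Row totals: cases in each age group, both sexes combined
row_totals = {age: m + f for age, (m, f) in cells.items()}

# Column totals: cases in each sex, all ages combined
male_total = sum(m for m, _ in cells.values())
female_total = sum(f for _, f in cells.values())

# The grand total must agree whether summed by rows or by columns
grand_total = sum(row_totals.values())
assert grand_total == male_total + female_total

print(male_total, female_total, grand_total)
```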

A two-variable table with data categorized jointly by those two variables is known as a contingency table. Table 4.3 is an example of a special type of contingency table, in which each of the two variables has two categories. This type of table is called a two-by-two table and is a favorite among epidemiologists. Two-by-two tables are convenient for comparing persons with and without the exposure and those with and without the disease. From these data, epidemiologists can assess the relationship, if any, between the exposure and the disease. Table 4.3 is a two-by-two table that shows one of the key findings from an investigation of carbon monoxide poisoning following an ice storm and prolonged power failure in Maine.(4) In the table, the exposure variable, location of power generator, has two categories: inside or outside the home. Similarly, the outcome variable, carbon monoxide poisoning, has two categories: cases (number of persons who became ill) and controls (number of persons who did not become ill).

Table 4.3 Generator Location and Risk of Carbon Monoxide Poisoning After an Ice Storm — Maine, 1998

NUMBER OF
Generator Location Cases Controls Total
Total 27 162 189
Inside home or attached structure 23 23 46
Outside home 4 139 143

Data Source: Daley RW, Smith A, Paz-Argandona E, Mallilay J, McGeehin M. An outbreak of carbon monoxide poisoning after a major ice storm in Maine. J Emerg Med 2000;18:87–93.

Table 4.4 illustrates a generic format and standard notation for a two-by-two table. Disease status (e.g., ill versus well, sometimes denoted cases vs. controls if a case-control study) is usually designated along the top of the table, and exposure status (e.g., exposed versus not exposed) is designated along the side. The letters a, b, c, and d within the 4 cells of the two-by-two table refer to the number of persons with the disease status indicated above and the exposure status indicated to its left. For example, in Table 4.4, “c” represents the number of persons in the study who are ill but who did not have the exposure being studied. Note that the “Hi” represents horizontal totals; H1 and H0 represent the total number of exposed and unexposed persons, respectively. The “Vi” represents vertical totals; V1 and V0 represent the total number of ill and well persons (or cases and controls), respectively. The total number of subjects included in the two-by-two table is represented by the letter T (or N).

Table 4.4 General Format and Notation for a Two-by-Two Table

Ill Well Total Attack Rate (Risk)
Total a + c = V1 b + d = V0 T V1 ⁄ T
Exposed a b a + b = H1 a ⁄ (a + b)
Unexposed c d c + d = H0 c ⁄ (c + d)
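
The notation of Table 4.4 maps directly onto a small data structure. The sketch below is illustrative only: the class name `TwoByTwo` and its lowercase property names are assumptions made for this example, not part of the lesson. It derives the horizontal totals (H1, H0), the vertical totals (V1, V0), the overall total T, and the attack rates from the four cells a, b, c, and d.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cells of a two-by-two table, using the notation of Table 4.4."""
    a: int  # exposed, ill
    b: int  # exposed, well
    c: int  # unexposed, ill
    d: int  # unexposed, well

    @property
    def h1(self) -> int:  # horizontal total: exposed
        return self.a + self.b

    @property
    def h0(self) -> int:  # horizontal total: unexposed
        return self.c + self.d

    @property
    def v1(self) -> int:  # vertical total: ill (or cases)
        return self.a + self.c

    @property
    def v0(self) -> int:  # vertical total: well (or controls)
        return self.b + self.d

    @property
    def t(self) -> int:  # all subjects
        return self.h1 + self.h0

    def attack_rates(self) -> tuple[float, float, float]:
        """Return (overall, exposed, unexposed) risk: V1/T, a/H1, c/H0."""
        return self.v1 / self.t, self.a / self.h1, self.c / self.h0


# Cell counts from Table 4.3 (generator inside vs. outside the home)
maine = TwoByTwo(a=23, b=23, c=4, d=139)
print(maine.h1, maine.h0, maine.v1, maine.v0, maine.t)
```

Only the margins are printed for Table 4.3 because it compares cases with controls. The attack rate column of Table 4.4 is generally interpretable as risk only when the row totals represent complete exposed and unexposed groups.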

When producing a table for display in print or by projection, it is generally best to limit the number of variables to one or two. One exception to this rule occurs when a third variable modifies the effect (technically, produces an interaction) of the first two. Table 4.5 is intended to convey the way in which race/ethnicity may modify the effect of age and sex on incidence of syphilis. Because three-way tables are often hard to understand, they should be used only when ample explanation and discussion are possible.

Table 4.5 Number of Reported Cases of Primary and Secondary Syphilis, by Race/Ethnicity, Age, and Sex — United States, 2002

Race/ethnicity Age Group (years) Male Female Total
American Indian/Alaskan Native ≤14 1 0 1
15–19 0 1 1
20–24 5 3 8
25–29 3 1 4
30–34 1 2 3
35–39 3 5 8
40–44 4 3 7
45–54 8 8 16
≥55 2 1 3
Total 27 24 51
Asian/Pacific Islander ≤14 1 1 2
15–19 0 2 2
20–24 9 4 13
25–29 16 1 17
30–34 21 1 22
35–39 14 1 15
40–44 14 1 15
45–54 8 0 8
≥55 0 0 0
Total 83 11 94
Black, Non-Hispanic ≤14 3 9 12
15–19 89 164 253
20–24 313 233 546
25–29 322 163 485
30–34 310 166 476
35–39 385 183 568
40–44 305 142 447
45–54 370 112 482
≥55 129 23 152
Total 2,226 1,195 3,421
Hispanic ≤14 1 1 2
15–19 37 25 62
20–24 117 29 146
25–29 139 26 165
30–34 172 20 192
35–39 178 22 200
40–44 93 9 102
45–54 69 14 83
≥55 18 1 19
Total 824 147 971
White, Non-Hispanic ≤14 3 1 4
15–19 9 24 33
20–24 89 40 129
25–29 188 36 224
30–34 373 31 404
35–39 541 35 576
40–44 429 23 452
45–54 370 23 393
≥55 106 4 110
Total 2,108 217 2,325

Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department of Health and Human Services; 2003. p. 118.

Exercise 4.1

The data in Table 4.6 describe characteristics of the 38 persons who ate food at or from a church supper in Texas in August 2001. Fifteen of these persons later developed botulism.(5)

  1. Construct a table of the illness (botulism) by age group. Use botulism status (yes/no) as the column labels and age groups as the row labels.
  2. Construct a two-by-two table of the illness (botulism) by exposure to chicken.
  3. Construct a two-by-two table of the illness (botulism) by exposure to chili.
  4. Construct a three-way table of illness (botulism) by exposure to chili and chili leftovers.

Table 4.6 Line Listing for Exercise 4.1

ID Age Attended Supper Case Date of Onset Case Status Ate Any Food Ate Chili Ate Chicken Ate Chili Leftovers
1 1 Y N Y Y Y N
2 3 Y Y 8/27 Lab-confirmed Y Y N N
3 7 Y Y 8/31 Lab-confirmed Y Y N N
4 7 Y N Y Y Y N
5 10 Y N Y Y N Y
6 17 Y Y 8/28 Lab-confirmed Y Y Y N
7 21 Y N N N N N
8 23 Y N Y Y N N
9 25 Y Y 8/26 Epi-linked Y Y N N
10 29 N Y 8/28 Lab-confirmed Y Unk Unk Y
11 38 Y N N N N N
12 39 Y N N N N N
13 41 Y N Y Y Y N
14 41 Y N N N N N
15 42 Y Y 8/26 Lab-confirmed Y Y Unk N
16 45 Y Y 8/26 Lab-confirmed Y Y Y Y
17 45 Y Y 8/27 Epi-linked Y Y Y N
18 46 Y N Y N Y N
19 47 Y N Y N Y N
20 48 Y Y 9/1 Lab-confirmed Y Y Unk N
21 50 Y Y 8/29 Epi-linked Y Y N N
22 50 Y N Y N Y N
23 50 Y N Y N N Y
24 52 Y Y 8/28 Lab-confirmed Y Y Y N
25 52 Y N N N N N
26 53 Y Y 8/27 Epi-linked Y Y Y N
27 53 Y N Y Y Y N
28 62 Y Y 8/27 Epi-linked Y Y Y N
29 62 Y N Y N Y N
30 63 Y N N N N N
31 67 Y N N N N N
32 68 Y N N N N N
33 69 Y N Y Y Y N
34 71 Y N Y N Y N
35 72 Y Y 8/27 Lab-confirmed Y Y Y N
36 74 Y N Y Y N N
37 74 Y N Y N Y N
38 78 Y Y 8/25 Epi-linked Y Y Y N

Data Source: Kalluri P, Crowe C, Reller M, Gaul L, Hayslett J, Barth S, Eliasberg S, Ferreira J, Holt K, Bengston S, Hendricks K, Sobel J. An outbreak of foodborne botulism associated with food sold at a salvage store in Texas. Clin Infect Dis 2003;37:1490–5.
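
A line listing such as Table 4.6 can be turned into a two-variable table by counting the persons in each combination of categories. The Python sketch below illustrates the idea with only five rows of the listing (IDs 1, 2, 7, 10, and 18), reduced to case status and chili exposure, so the exercise itself is left to the reader. The tuple layout and variable names are assumptions made for this example.

```python
from collections import Counter

# A few rows of Table 4.6, reduced to (ID, case, ate chili); "Unk" = unknown
rows = [
    (1, "N", "Y"),
    (2, "Y", "Y"),
    (7, "N", "N"),
    (10, "Y", "Unk"),
    (18, "N", "N"),
]

# Count persons in each (exposure, illness) combination
counts = Counter((chili, case) for _, case, chili in rows)

# Keep unknown exposure as its own row rather than dropping it silently
for exposure in ("Y", "N", "Unk"):
    ill = counts[(exposure, "Y")]
    well = counts[(exposure, "N")]
    print(f"Ate chili = {exposure:<3}  ill: {ill}  well: {well}  total: {ill + well}")
```

Reporting the unknown category separately follows the guideline above to identify missing or unknown data within the table or in a footnote.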

Tables of statistical measures other than frequency

Tables 4.1–4.5 show case counts (frequency). The cells of a table could also display averages, rates, relative risks, or other epidemiological measures. As with any table, the title and/or headings must clearly identify what data are presented. For example, the title of Table 4.7 indicates that the data for reported cases of primary and secondary syphilis are rates rather than numbers.

Table 4.7 Rate per 100,000 Population for Reported Cases of Primary and Secondary Syphilis, by Age and Race — United States, 2002

Age Group (years) Am. Indian/Alaska Native Asian/Pacific Is. Black, Non-Hispanic Hispanic White, Non-Hispanic Total
10–14 0.0 0.1 0.3 0.1 0.0 0.1
15–19 0.5 0.2 8.6 1.9 0.3 1.7
20–24 5.0 1.5 20.7 4.3 1.1 4.4
25–29 2.7 1.6 19.1 4.9 1.8 4.6
30–34 2.0 2.2 18.2 6.1 3.0 5.4
35–39 4.8 1.6 20.1 7.1 3.6 6.0
40–44 4.5 1.6 16.6 4.4 2.8 4.6
45–54 6.1 0.6 11.8 2.7 1.4 2.6
55–64 1.4 0.0 4.6 0.6 0.5 0.9
65+ 0.8 0.0 1.5 0.5 0.1 0.2
Totals 2.4 0.9 9.8 2.7 1.2 2.4

Data Source: Centers for Disease Control and Prevention. Sexually Transmitted Disease Surveillance 2002. Atlanta: U.S. Department of Health and Human Services; 2003.
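
Each cell of a rate table such as Table 4.7 is a case count divided by the corresponding population and scaled to a common base. The sketch below shows that arithmetic in Python. The function name is illustrative, and the population figure is a round number assumed for this example only; it is not taken from the data source.

```python
def rate_per_100k(cases: int, population: int) -> float:
    """Cases per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


# 6,862 cases is the 2002 total from Table 4.1a.
# ASSUMPTION: 288,000,000 is a round illustrative population, not a source figure.
overall = rate_per_100k(6862, 288_000_000)
print(f"{overall:.1f} per 100,000")
```

Stating the base (per 100,000) in the table title or column heading, as Table 4.7 does, tells the reader how to read each cell.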

Composite tables

To conserve space in a report or manuscript, several tables are sometimes combined into one. For example, epidemiologists often create simple frequency distributions by age, sex, and other demographic variables as separate tables, but editors may combine them into one large composite table for publication. Table 4.8 is an example of a composite table from the investigation of carbon monoxide poisoning following the power failure in Maine.(4)

It is important to realize that this type of table should not be interpreted as a three-way table. The data in Table 4.8 have not been arrayed to indicate the interrelationship of sex, age, smoking, and disposition from medical care. Rather, several one-variable tables (each independently showing the number of cases by one of these variables) have been concatenated to save space. This table would therefore not help in assessing, for example, how smoking modifies the risk of illness by age. This difference also explains why portraying total values would be inappropriate and meaningless for Table 4.8.

Table 4.8 Number and Percentage of Confirmed Cases of Carbon Monoxide Poisoning Identified from Four Hospitals, by Selected Characteristics — Maine, January 1998

CASES
Characteristic Number Percent
Total cases 100 100
Sex (female) 59 59
Age (years)
0–3 5 5
4–12 17 17
13–18 9 9
19–64 52 52
≥65 17 17
Smokers 20 20
Disposition