Chapter 12 – Genome-wide association studies, field synopses, and the development of the knowledge base on genetic variation and human diseases: Tables

Human Genome Epidemiology (2nd ed.): Building the evidence for using genetic information to improve health and prevent disease

“The findings and conclusions in this book are those of the author(s) and do not
necessarily represent the views of the funding agency.”

These chapters were published with modifications by Oxford University Press (2010)

Muin J. Khoury, Lars Bertram, Paolo Boffetta, Adam S. Butterworth, Stephen J. Chanock, Siobhan M. Dolan, Isabel Fortier, Montserrat Garcia-Closas, Marta Gwinn, Julian P. T. Higgins, A. Cecile J. W. Janssens, James M. Ostell, Ryan P. Owen, Roberta A. Pagon, Timothy R. Rebbeck, Nathaniel Rothman, Jonine L. Bernstein, Paul R. Burton, Harry Campbell, Anand P. Chokkalingam, Helena Furberg, Julian Little, Thomas R. O’Brien, Daniela Seminara, Paolo Vineis, Deborah M. Winn, Wei Yu, and John P. A. Ioannidis

Table 12-1
Trends in numbers of published articles on human genome epidemiology, meta-analyses, and genome-wide association studies and numbers of genes studied, by year, 2001–2008*
Year	No. of Genes^†	No. of Diseases	No. of Articles Published: Total	No. of Articles Published: GWAS	No. of Articles Published: Meta-Analyses^‡
2001	633	690	2,492	0	34
2002	794	855	3,196	0	45
2003	832	880	3,476	3	65
2004	1,124	1,021	4,280	0	86
2005	1,308	1,077	5,029	5	113
2006	1,502	1,109	5,364	12	155
2007	2,142	1,292	7,222	104	208
2008	3,336	1,203	7,659	134	236

*HuGE Navigator query.
^†Genes column does not include the numbers of studied variants per gene (difficult to obtain).
^‡Meta-analyses also include HuGE reviews.
GWAS: Genome-wide association studies (individual genes not counted in genes column, unless featured in the paper).

Table 12-2
Key characteristics of pilot field synopses of genetic associations
	No. of Meta-Analyses	No. of Data Sets^*	Threshold^† for Meta-Analysis	No. of Statistically Significant Associations^‡	Strong^§ (Grade A)	World Wide Web Address
Alzheimer disease^||	228	1,072	4 data sets	53	NA	www.alzgene.org
Schizophrenia^#	118	1,179	4 data sets	24	4	http://www.schizophreniaforum.org/
DNA repair genes and various cancers	241	1,087	2 independent teams	31	3	www.episat.org/episat/index.php
Bladder cancer	36	356	3 data sets	7	1	Not yet online
Coronary heart disease	48	1,039	–	4	0	www.chdgene.com
Preterm birth	17	87	3 data sets	2	0	www.prebic.net
Major depression	22	131	3 data sets	6	2	Not yet online

^*Total number of data sets included in the meta-analyses (not including data sets that did not undergo meta-analysis).
^†Authors’ prerequisite condition for conducting a meta-analysis.
^‡Statistically significant (P < 0.05) by random-effects calculations on the default (per allele) analysis (for coronary heart disease, results are based on a meta-regression model and correspond to effects in the largest studies, while for DNA repair genes, both recessive and dominant models were investigated).
^§Grade AAA with regard to all three Venice criteria (18).
^||Current on February 27, 2008.
^#Current on April 30, 2008.
Data sets: the sum of data sets included in the meta-analyses (not including data sets that did not undergo meta-analysis); threshold: authors’ prerequisite condition for conducting meta-analysis; significant: P < 0.05 by random-effects calculations on the default (per allele) analysis (for coronary heart disease, results are based on a meta-regression model and correspond to effects in the largest studies, while for DNA repair genes both recessive and dominant models were investigated); strong (grade A): grade AAA in all three Venice criteria; online address: Web site for deposited data sets.
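The table notes refer to "random-effects calculations" without naming an estimator. As an illustration only, the sketch below assumes the widely used DerSimonian-Laird method applied to per-study log odds ratios and their standard errors; the function name, the returned keys, and the example numbers are hypothetical and are not taken from any of the synopses.

```python
import math

def random_effects_pool(log_ors, ses):
    """DerSimonian-Laird random-effects pooling (assumed estimator).

    log_ors: per-study log odds ratios (e.g., per-allele).
    ses:     matching standard errors of the log odds ratios.
    """
    k = len(log_ors)
    if k < 2 or k != len(ses):
        raise ValueError("need at least two studies with matching standard errors")

    # Fixed-effect (inverse-variance) step, needed for Cochran's Q.
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))

    # Method-of-moments estimate of the between-study variance tau^2.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_ors)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))

    # Two-sided p-value from the normal approximation.
    z = pooled / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2.0))

    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0

    return {
        "pooled_or": math.exp(pooled),
        "tau2": tau2,
        "p_value": p_value,
        "i2_percent": i2,
    }

# Hypothetical data: four data sets, matching the "4 data sets" threshold style above.
example = random_effects_pool([0.18, 0.25, 0.10, 0.22], [0.08, 0.12, 0.10, 0.09])
print(example)
```

An association would count as "statistically significant" in the sense of the footnote when `p_value` is below 0.05.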

Table 12-3
Venice interim guidelines for assessing the credibility of cumulative evidence on genetic associations (Ioannidis et al., reference 22)
Criteria and Categories	Proposed Operationalization
Amount of evidence. A: Large-scale evidence; B: Moderate amount of evidence; C: Little evidence	Thresholds may be defined on the basis of sample size, power, or false-discovery rate considerations. The frequency of the genetic variant of interest should be accounted for. As a simple rule, we suggest that category A require a sample size of more than 1,000 (total number in cases and controls, assuming a 1:1 ratio) evaluated in the least common genetic group of interest; that B correspond to a sample size of 100–1,000 evaluated in this group; and that C correspond to a sample size of less than 100 evaluated in this group (see “Discussion” section in the text and Table 12-2 for further elaboration).
Replication. A: Extensive replication including at least 1 well-conducted meta-analysis with little between-study inconsistency; B: Well-conducted meta-analysis with some methodological limitations or moderate between-study inconsistency; C: No association; no independent replication; failed replication; scattered studies; flawed meta-analysis or large inconsistency	Between-study inconsistency entails statistical considerations (e.g., defined by metrics such as I^2, where values of 50% and above are considered large and values of 25–50% are considered moderate inconsistency) and also epidemiologic considerations for the similarity/standardization or at least harmonization of phenotyping, genotyping, and analytical models across studies. See “Discussion” section in the text for the threshold (statistical or other) required for claiming replication under different circumstances (e.g., with or without inclusion of the discovery data in situations with massive testing of polymorphisms).
Protection from bias. A: Bias, if at all present, could affect the magnitude but probably not the presence of the association; B: No obvious bias that may affect the presence of the association, but there is considerable missing information on the generation of evidence; C: Considerable potential for or demonstrable bias that can affect even the presence or absence of the association	A prerequisite for A is that the bias due to phenotype measurement, genotype measurement, confounding (population stratification), and selective reporting (for meta-analyses) can be appraised as not being high (as shown in detail in Table 12-4) and that there is no other demonstrable bias in any other aspect of the design, analysis, or accumulation of the evidence that could invalidate the presence of the proposed association. In category B, although no strong biases are visible, there is no such assurance that major sources of bias have been minimized or accounted for, because information is missing on how phenotyping, genotyping, and confounding have been handled. Given that occult bias can never be ruled out completely, note that even in category A, we use the qualifier “probably.”
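The numeric rules of thumb in Table 12-3 can be written down as a small sketch. The function names are hypothetical, and two points are assumptions rather than statements from the table: I^2 values below 25% are labeled "little" here (the table names only the moderate and large bands), and the I^2 label is just one input to the replication grade, which also depends on the epidemiologic considerations described above.

```python
def amount_of_evidence(n_least_common_group: int) -> str:
    """Grade the amount of evidence from the sample size (cases plus controls)
    evaluated in the least common genetic group of interest."""
    if n_least_common_group < 0:
        raise ValueError("sample size cannot be negative")
    if n_least_common_group > 1000:   # "more than 1,000"
        return "A"
    if n_least_common_group >= 100:   # "100-1,000"
        return "B"
    return "C"                        # "less than 100"


def inconsistency_level(i2_percent: float) -> str:
    """Label between-study inconsistency from I^2 (in percent)."""
    if not 0.0 <= i2_percent <= 100.0:
        raise ValueError("I^2 must lie between 0 and 100")
    if i2_percent >= 50.0:            # "50% and above are considered large"
        return "large"
    if i2_percent >= 25.0:            # "25-50% are considered moderate"
        return "moderate"
    return "little"                   # assumption: band below 25% is unnamed in the table


def is_strong(amount: str, replication: str, protection: str) -> bool:
    """Strong credibility means grade A on all three Venice criteria (AAA)."""
    return (amount, replication, protection) == ("A", "A", "A")


# Hypothetical association: 2,400 subjects in the least common genetic group,
# with replication and protection from bias both judged as A by a reviewer.
grades = (amount_of_evidence(2400), "A", "A")
print(grades, is_strong(*grades))
```

The replication and bias grades are passed in as judgments because the table defines them qualitatively; only the sample-size and I^2 thresholds lend themselves to direct computation.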

Table 12-4
Some checks for retrospective meta-analyses in field synopses of genetic associations
General checks for the occurrence of or susceptibility to potential problems^*
  Small effect size (e.g., odds ratio <1.15-fold from the null value)
  Association lost with exclusion of first study
  Association lost with exclusion of HWE-violating studies or with adjustment for HWE
  Evidence for small-study effect in an asymmetry regression test with proper type I error (Stat Med. 2006;25:3443–3457)
  Evidence for excess of single studies with formally statistically significant results (Clin Trials. 2007;4:245–253)
Topic- or subject-specific checks: Consider whether they are problems
  Unclear/misclassified phenotypes with possible differential misclassification against genotyping
  Differential misclassification of genotyping against phenotypes
  Major concerns for population stratification (need to justify for affecting odds ratio greater than 1.15-fold, not invoked to date)
  Any other reason (case-by-case basis) that would render the evidence for association highly questionable

^*All general checks are likely to have only modest, imperfect sensitivity and specificity for detecting problems. In particular for effect size, a small effect size may very well reflect a true association, since many genetic associations have small effect sizes. However, if this effect has been documented in a retrospective meta-analysis that is susceptible to publication and other reporting biases, it also needs to be replicated in a prospective setting where such biases cannot operate before high credibility can be attributed to it.
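Two of the general checks in Table 12-4 are simple enough to sketch in code: the small-effect-size check and the loss of the association when the first study is excluded. The function names are hypothetical. For brevity the default significance test below uses fixed-effect inverse-variance pooling, which is a simplifying assumption; a random-effects p-value function can be passed in instead.

```python
import math

def is_small_effect(odds_ratio: float, fold: float = 1.15) -> bool:
    """True when the odds ratio lies less than `fold`-fold from the null value of 1,
    in either direction (e.g., 0.90 and 1.10 both count as small)."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return max(odds_ratio, 1.0 / odds_ratio) < fold


def fixed_effect_p(log_ors, ses):
    """Two-sided p-value of the inverse-variance pooled log odds ratio
    (fixed-effect model, used here only as a simple default)."""
    w = [1.0 / se ** 2 for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    z = pooled * math.sqrt(sum(w))
    return math.erfc(abs(z) / math.sqrt(2.0))


def lost_without_first_study(log_ors, ses, alpha=0.05, p_value=fixed_effect_p):
    """True when the pooled association is significant with all studies but not
    after dropping the first one. Studies must be listed in order of publication."""
    if len(log_ors) < 2 or len(log_ors) != len(ses):
        raise ValueError("need at least two studies with matching standard errors")
    significant_all = p_value(log_ors, ses) < alpha
    significant_rest = p_value(log_ors[1:], ses[1:]) < alpha
    return significant_all and not significant_rest


# Hypothetical data: a large effect in the first study, near-null effects afterwards.
print(is_small_effect(1.10))
print(lost_without_first_study([0.9, 0.05, 0.02], [0.2, 0.15, 0.15]))
```

As the footnote cautions, a flag raised by either check indicates susceptibility to a problem and does not by itself show that the association is false.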
