Part IV: DEVELOPING, IMPLEMENTING, AND EVALUATING POPULATION INTERVENTIONS Chapter 18
Genetics and Public Health in the 21st Century
Genetics and Prevention Effectiveness
Scott D. Grosse1 and Steven M. Teutsch2
1Office of Program Evaluation and Legislation, National Center for Environmental Health, Centers for Disease Control and Prevention, 4770 Buford Hwy, MS F29, Atlanta, GA 30341
2Outcomes Research & Management, Merck & Co., Inc., West Point, PA
INTRODUCTION
Advances in human genetics require systematic assessment for their rational translation into public health policy and practice. Prevention effectiveness research is the part of the policy assessment process that addresses tradeoffs among harms, benefits, and costs of disease-prevention strategies (1). If one strategy is both more effective and costs less than other strategies, and in addition poses no risk of harms, a decision is usually simple. More commonly, though, a strategy that is superior on one or more criteria (i.e., is more effective or less costly) ranks poorly on another. In such cases, the tradeoffs need to be calculated with quantitative prevention effectiveness models. The results can then be used in developing guidelines and making resource allocation decisions.
Prevention effectiveness includes quantitative and qualitative methods of policy analysis. Qualitative issues include the ethical, legal, and social consequences of public choices, such as the differential effects of an intervention on population subgroups. Salient issues related to genetics include informed consent to genetic testing, stigmatization of individuals and groups, discrimination in employment, and access to insurance. These have implications for public policy as well as for individuals (2). For example, a program of genetic screening acceptable in a society with universal health insurance might pose unacceptable risk to individuals without guaranteed access to health care. These issues are addressed in section V of this book. Quantitative prevention effectiveness research integrates methods from economics, health services research, and technology assessment to analyze the cost of illness and the effectiveness, benefits, and costs of public health policies and programs. The most common analytic methods are decision analysis and economic analysis.
This chapter is intended to help the reader critically evaluate quantitative prevention effectiveness studies in genetics and to understand their uses and limitations. No prior knowledge of prevention effectiveness methods is assumed. Therefore, the first part of the chapter consists of an overview of the major types of analysis, definitions, underlying concepts, and rules for carrying out prevention effectiveness analyses. The second half of the chapter applies these rules to case studies of recent economic evaluations of genetic screening, genetic testing, and genetic-test-specific therapeutic interventions.
Decision analysis and expected values
Decision analysis is used to calculate the expected values of health outcomes resulting from different strategies. An expected value is the average value of an outcome if a choice were repeated numerous times. It is defined as the sum of the products of the values of each event that could occur and the probabilities of each event occurring. For example, the expected value of a gamble that has a one-tenth probability of yielding $100 and a nine-tenths probability of yielding nothing is $10 [(0.1 × 100) + (0.9 × 0)]. This differs from the typical (most likely) yield, which is $0. Whether the value of a gamble is the same as the mathematical expectation depends on one’s risk preferences. A risk-averse individual, one who prefers a sure thing to an uncertain outcome with the same mathematical expectation, by definition would place a lower value on an uncertain outcome (3).
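The definition of an expected value can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic in the gamble example; the function name `expected_value` and the list-of-pairs input are illustrative choices, not part of any standard method.

```python
def expected_value(outcomes):
    """Sum of value * probability over mutually exclusive events.

    `outcomes` is a list of (value, probability) pairs whose
    probabilities are expected to sum to 1.
    """
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(value * p for value, p in outcomes)


# The gamble from the text: 1/10 chance of $100, 9/10 chance of nothing.
gamble = [(100.0, 0.1), (0.0, 0.9)]
gamble_ev = expected_value(gamble)

# The single most likely payoff, which differs from the expected value.
most_likely_payoff = max(gamble, key=lambda pair: pair[1])[0]
```

Here `gamble_ev` is about 10 dollars while `most_likely_payoff` is 0, which is the distinction the paragraph draws between the expectation and the typical yield.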
Life expectancy is an example of expected value. The future number of years lived by a cohort is calculated by multiplying age-specific survival rates by the number of individuals projected to be alive at the beginning of each age interval to calculate the number alive at the beginning of the next age interval. Life expectancy is calculated as the ratio of the projected number of years lived by all members of the cohort to the number of members of the cohort. Life expectancy is not necessarily the same as the life span of a typical individual. For example, a cohort of 100 people aged 60 years would have an additional life expectancy of 5 years if 20 are expected to survive 20 years and 80 are expected to each live for 1 year and 3 months. Use of life expectancies in decision analysis presumes risk neutrality (1). If analysts report the distribution of expected outcomes as well as the expected value, individuals who are risk averse can make their own assessments of the tradeoffs.
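The cohort example can be reproduced directly. The sketch below simply restates the numbers given in the text (20 people surviving 20 years, 80 people surviving 1.25 years); the variable names are illustrative.

```python
# (number of people, additional years lived) for the cohort in the text
cohort = [(20, 20.0), (80, 1.25)]

total_years = sum(n * years for n, years in cohort)
cohort_size = sum(n for n, _ in cohort)
life_expectancy = total_years / cohort_size

# Survival of the largest group, i.e. the "typical" individual.
typical_survival = max(cohort, key=lambda group: group[0])[1]
```

The cohort lives 500 person-years in total, so `life_expectancy` is 5 years even though the typical member survives only 1.25 years.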
Building a decision model
In setting up a decision analysis, the analyst first creates a decision tree in which each strategy (intervention or no intervention) is assigned a branch (1). The expected value of each branch is calculated by multiplying the value of each outcome (e.g., years of life expectancy) by the probabilities of those outcomes. The probabilities vary depending upon factors such as choices made, biological factors, test characteristics, and behaviors. Under each strategy, separate branches are specified for each possible event, for example, becoming diseased or remaining healthy. In a disease branch, the value of each possible outcome (e.g., death, disability, recovery) is multiplied by its probability, and the products are summed to give the expected value of the branch. The expected value of an intervention branch is the weighted average of the expected values of the disease and non-disease branches. Ultimately, the strategy with the highest expected value is considered the preferred choice.
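The averaging described above can be sketched as a small recursive calculation. All probabilities and life-expectancy values below are hypothetical and chosen only for illustration; the nested-list representation and the `rollback` function name are assumptions of this sketch, not a prescribed format.

```python
def rollback(node):
    """Expected value of a node.

    A node is either a number (the value of a terminal outcome) or a
    list of (probability, child_node) branches.
    """
    if isinstance(node, (int, float)):
        return float(node)
    return sum(p * rollback(child) for p, child in node)


# Hypothetical terminal values: years of life expectancy.
disease = [(0.2, 5.0), (0.3, 15.0), (0.5, 30.0)]  # death, disability, recovery
healthy = 40.0

# Hypothetical strategies: the intervention lowers the probability of disease.
strategies = {
    "no intervention": [(0.10, disease), (0.90, healthy)],
    "intervention": [(0.04, disease), (0.96, healthy)],
}

expected = {name: rollback(tree) for name, tree in strategies.items()}
preferred = max(expected, key=expected.get)
```

With these made-up inputs the disease branch is worth 20.5 years, and the intervention strategy has the higher expected value (39.22 versus 38.05 years), so it would be the preferred choice.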
A standard decision tree model involves discrete time periods and is nonrecursive. In contrast, a Markov model allows probabilities to vary by small increments (e.g., annual incidence rates over a period of several decades), and individuals can cycle, that is, repeat states (4). While Markov models may more accurately reflect real-world situations, because of their complexity, they are difficult to document in journal articles. Spreadsheets can be used to approximate the results of a Markov model, as discussed below.
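A cohort version of a Markov model can be sketched in a few lines, much as one would lay it out row by row in a spreadsheet. The three states and the annual transition probabilities below are hypothetical, and the convention of crediting one life year to everyone alive at the start of a cycle is a simplifying assumption of this sketch.

```python
STATES = ["well", "sick", "dead"]

# Hypothetical annual transition probabilities; each row sums to 1.
# People can move from "sick" back to "well", i.e. states can repeat.
TRANSITIONS = {
    "well": {"well": 0.90, "sick": 0.08, "dead": 0.02},
    "sick": {"well": 0.10, "sick": 0.70, "dead": 0.20},
    "dead": {"well": 0.00, "sick": 0.00, "dead": 1.00},
}


def run_cohort(cycles, start=None):
    """Advance a cohort through annual cycles.

    Returns the final distribution across states and the expected
    life years accumulated per person.
    """
    dist = start or {"well": 1.0, "sick": 0.0, "dead": 0.0}
    life_years = 0.0
    for _ in range(cycles):
        # Credit one year to everyone alive at the start of the cycle.
        life_years += dist["well"] + dist["sick"]
        dist = {
            target: sum(dist[source] * TRANSITIONS[source][target] for source in STATES)
            for target in STATES
        }
    return dist, life_years


final_dist, expected_life_years = run_cohort(cycles=40)
```

Making the transition probabilities depend on the cycle number (for example, age-specific incidence) would be a natural extension, at the cost of more inputs to document.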
Utility assessment and QALYs
Life expectancy is only one potential outcome measure in decision or cost-effectiveness analyses. Analyses that consider only life years as an outcome measure may tend to favor interventions that extend the probability of life at the risk of serious side effects over interventions that are more beneficial in terms of perceived quality of life. For this reason, it is preferable to use an outcome measure that incorporates the potential harms and benefits in terms of health or quality of life among surviving individuals as a result of an intervention. To do this, one needs a common metric integrating morbidity and mortality outcomes. The most commonly used metric of this kind is an index known as quality-adjusted life years or QALYs (5).
The calculation of quality-adjusted life years is based on the use of expected utilities to value health outcomes. Utility refers to people’s values or preferences for different states. If people have stable preferences, it is possible to compute indices that combine the expected utility of living in a state of impaired health with the utility of being alive in good health (3). To calculate quality-adjusted life years, one first multiplies the utility for each health state (Ui) by the expected duration of time spent in that state (ti). The sum of these products is a QALY index. In mathematical terms,
QALYs = Σ Ui ti
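As a minimal sketch of this sum, the following Python computes QALYs for a hypothetical health profile. The utilities and durations are invented for illustration, and no discounting is applied.

```python
def qalys(health_states):
    """Sum of utility * duration over the health states lived through.

    `health_states` is a list of (utility, years) pairs, with utility
    on a scale where 1.0 is good health.
    """
    return sum(utility * years for utility, years in health_states)


# Hypothetical profile: 10 years in good health, 5 years at utility 0.7,
# then 2 years at utility 0.4.
profile = [(1.0, 10.0), (0.7, 5.0), (0.4, 2.0)]
profile_qalys = qalys(profile)
profile_life_years = sum(years for _, years in profile)
```

The profile covers 17 life years but about 14.3 QALYs, which shows how the index discounts time spent in impaired health.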
One way to derive utility weights for QALYs is to use standard population-based multiattribute scales such as the Quality of Well-being Scale (QWB) or the Health Utilities Index (HUI) (6). If this approach is taken, the next step is to determine the symptoms or characteristics of a condition that correspond most closely to the items on the scale. This approach can work well for analyses of well-defined conditions. It is especially suitable for analyses performed from a societal viewpoint, since the scales are often based on community preferences.
The other common way of deriving utility weights is from primary survey data. This approach is particularly well suited for clinical decision analyses. It allows the impact of variation in individual responses on conclusions to be assessed. Several methods can be used to directly elicit preferences about multiple health states (3,6). In the standard gamble, individuals are asked to choose between remaining in a state of ill health and undergoing a procedure that will either return them to perfect health or kill them with a defined probability. The probability is varied until the individuals indicate they have no preference between the two choices. In the time-tradeoff method, individuals are asked how much time in a state of full health they would be willing to trade in return for a longer time alive in less than perfect health.
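The text does not give scoring formulas for these two methods. The sketch below uses the conventional scoring, which is an assumption here: utilities are anchored at 0 for death and 1 for full health, the standard gamble utility equals the probability of a successful outcome at the point of indifference, and the time-tradeoff utility equals the ratio of the shorter time in full health to the longer time in impaired health that the respondent considers equivalent. Function and parameter names are illustrative.

```python
def standard_gamble_utility(p_death_at_indifference):
    """Utility of the impaired state implied by a standard gamble.

    Assumes death = 0 and full health = 1, so the expected utility of
    the gamble at indifference is (1 - p_death) * 1 + p_death * 0.
    """
    if not 0.0 <= p_death_at_indifference <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return 1.0 - p_death_at_indifference


def time_tradeoff_utility(years_full_health, years_impaired):
    """Utility implied when `years_full_health` in full health is judged
    equivalent to a longer `years_impaired` in the impaired state."""
    if years_impaired <= 0 or not 0 <= years_full_health <= years_impaired:
        raise ValueError("need 0 <= years_full_health <= years_impaired, years_impaired > 0")
    return years_full_health / years_impaired


# Hypothetical responses.
sg_utility = standard_gamble_utility(0.25)    # indifferent at a 25% risk of death
tto_utility = time_tradeoff_utility(8.0, 10.0)  # 8 healthy years ~ 10 impaired years
```

The two methods need not agree for the same respondent and health state, which is one reason to report how utility weights were obtained.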
Economic analyses
Economic analyses of public health programs are generally identified with calculations of monetary costs. This is a rather narrow perspective, since economic theory is framed in terms of utility maximization, not simply in terms of dollars. One major limitation of many economic analyses is that they do not include the costs of pain and suffering. These are real costs that influence decision-making but are left out of economic analyses that focus on accounting costs. These costs are at least partially included in the expected utilities used to calculate QALYs, which constitutes another argument in favor of using QALYs as outcome measures for economic evaluations of health interventions.
Economic evaluations of health programs can be classified as partial, intermediate, or full evaluations (7). A full economic evaluation incorporates all aspects of the costs and benefits of an intervention. A partial economic evaluation consists of one component of a full evaluation. These include decision analyses, which model the effectiveness but not the cost of an intervention. Partial evaluations also include two types of cost accounting analyses: cost-of-illness and cost identification studies. Cost-of-illness studies are used to calculate the cost burden of a condition or illness, which sets an upper limit on the economic benefit of a preventive intervention that prevents all new cases of the condition. Cost identification studies are used to assess the costs of delivering interventions.
Two primary methods exist for conducting a full economic evaluation of a health intervention, the most common of which is cost-effectiveness analysis (CEA). This type of evaluation gives results in terms of the ratio of cost per unit of improvement in health outcomes achieved. If the net cost of an intervention is negative, the intervention is said to be cost saving. In that case, a cost-effectiveness ratio is not meaningful. The other major type of economic evaluation, cost-benefit analysis (CBA), converts health outcomes into monetary values. This method is less commonly used in public health because of disagreement over the suitability and validity of methods of putting monetary valuations on health states and life (5).
The health outcome used in the denominator of a cost-effectiveness ratio can be either in the form of physical units, such as years of life saved, or utility indices such as QALYs. Many people refer to an analysis that uses QALYs as the denominator in the outcome ratio as a cost-utility analysis (CUA). However, a recent expert panel convened by the U.S. Public Health Service recommends that all cost-effectiveness analyses use QALYs whenever possible (6). Use of a standard outcome measure such as QALYs allows analysts to use a common denominator to compare the cost-effectiveness of interventions.
The calculation of a cost-effectiveness ratio can be represented in simplified form as follows. First, assume that an intervention, A, is being compared with a baseline of no intervention, 0. The cost-effectiveness ratio of the intervention is the total cost of the intervention plus the cost of illness if the intervention is implemented (Cost of illness_A) minus the cost of illness if one does nothing (Cost of illness_0), all divided by the difference between net outcomes under the intervention (Health outcomes_A) and under the baseline (Health outcomes_0).
$$\text{CE ratio} = \frac{\text{Intervention cost} + (\text{Cost of illness}_A - \text{Cost of illness}_0)}{\text{Health outcomes}_A - \text{Health outcomes}_0}$$
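The ratio can be computed as follows. This is a sketch under stated assumptions: the function name, the choice to return `None` when the intervention is cost saving (since the text notes the ratio is not meaningful in that case), and all dollar and QALY figures are illustrative.

```python
def ce_ratio(intervention_cost, cost_of_illness_a, cost_of_illness_0,
             health_outcomes_a, health_outcomes_0):
    """Cost per unit of health outcome gained, relative to no intervention.

    Returns None when net cost is negative (cost saving).
    """
    net_cost = intervention_cost + (cost_of_illness_a - cost_of_illness_0)
    net_outcomes = health_outcomes_a - health_outcomes_0
    if net_outcomes <= 0:
        raise ValueError("intervention yields no health gain over baseline")
    if net_cost < 0:
        return None  # cost saving: report net savings instead of a ratio
    return net_cost / net_outcomes


# Hypothetical program: costs $500,000, reduces cost of illness from
# $400,000 to $200,000, and raises outcomes from 1,000 to 1,030 QALYs.
ratio = ce_ratio(500_000.0, 200_000.0, 400_000.0, 1_030.0, 1_000.0)

# Hypothetical cost-saving program: averted illness costs exceed program cost.
saving = ce_ratio(100_000.0, 200_000.0, 400_000.0, 1_030.0, 1_000.0)
```

In the first case the net cost is $300,000 for 30 QALYs gained, or $10,000 per QALY; in the second the net cost is negative, so no ratio is returned.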
If more than two alternatives are modeled, each intervention when compared to the baseline yields an average cost-effectiveness ratio. In addition, it is important to calculate incremental cost-effectiveness ratios that compare the costs and health outcomes of pairs of interventions (1). Interventions that are both more expensive and less effective than other interventions are said to be dominated, and are excluded from the calculation of incremental cost-effectiveness ratios. Average and incremental cost-effectiveness ratios may yield different conclusions. For example, universal screening of newborns for sickle-cell disease in a population with a low prevalence of the mutation has been reported to have a low (favorable) cost-effectiveness ratio when compared with no screening but a high (less favorable) cost-effectiveness ratio when compared with racially-targeted screening (10).
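The difference between average and incremental ratios can be illustrated with made-up numbers. The sketch below is an assumption-laden simplification: it drops any strategy that costs at least as much and is no more effective than another (slightly broader than the strict definition in the text), it does not handle extended dominance, and the costs and outcomes are hypothetical rather than taken from the sickle-cell study cited above.

```python
def incremental_ratios(strategies):
    """Return (strategy, comparator, incremental ratio) for each step
    between non-dominated strategies ordered by cost.

    `strategies` maps a name to a (cost, health_outcome) pair.
    """
    def is_dominated(name):
        cost, effect = strategies[name]
        return any(
            other_cost <= cost and other_effect >= effect
            and (other_cost, other_effect) != (cost, effect)
            for other, (other_cost, other_effect) in strategies.items()
            if other != name
        )

    kept = sorted(
        (name for name in strategies if not is_dominated(name)),
        key=lambda name: strategies[name][0],
    )
    ratios = []
    for comparator, strategy in zip(kept, kept[1:]):
        extra_cost = strategies[strategy][0] - strategies[comparator][0]
        extra_effect = strategies[strategy][1] - strategies[comparator][1]
        if extra_effect == 0:  # identical strategies: no ratio to report
            continue
        ratios.append((strategy, comparator, extra_cost / extra_effect))
    return ratios


# Hypothetical costs (dollars) and outcomes (QALYs gained), not from any study.
options = {
    "no screening": (0.0, 0.0),
    "targeted screening": (100_000.0, 10.0),
    "universal screening": (300_000.0, 12.0),
    "costly alternative": (350_000.0, 9.0),  # dominated by universal screening
}
steps = incremental_ratios(options)
average_universal = options["universal screening"][0] / options["universal screening"][1]
```

With these inputs, universal screening looks favorable against no screening ($25,000 per QALY on average) but much less so against targeted screening ($100,000 per additional QALY), which mirrors the pattern described in the paragraph.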
Cost-effectiveness analysis is often confused with intermediate economic analyses that examine only short-term outcomes. Ratios such as the cost of screening per person screened or per case identified are often referred to as estimates of cost-effectiveness, but this is not correct. A cost-effectiveness estimate requires calculation of all costs, benefits, and harms flowing from the identification of affected individuals. Although one strategy may have a lower cost per person tested or cost per case identified than the other, this is not sufficient to favor one strategy over the other if the two strategies result in different numbers of individuals identified. The higher-cost strategy may be preferred if it results in more cases identified and if the incremental benefit of case identification exceeds the incremental cost. This limitation is not true of cost-minimization analyses, which compare interventions such as screening protocols that have approximately the same outcomes to determine which one costs the least to operate (6).
Single numerical estimates of net benefits or cost-effectiveness ratios may provide a misleading sense of precision. Often, not much is known about key parameters such as the magnitude of a protective effect or the costs of an intervention. To clarify the degree of uncertainty and the degree to which policy decisions may have varying effects depending upon the projected benefit and cost of an intervention, prevention effectiveness studies typically report sensitivity analyses. Sensitivity analyses indicate how cost-effectiveness estimates vary under ranges of assumptions about key parameters. Recently, methods used to calculate confidence intervals around cost-effectiveness estimates have also become available. If a specific intervention is favored under a wide range of assumptions, the result is said to be robust.
Evaluating a cost-effectiveness study
Readers need to be critical in evaluating prevention effectiveness analyses, especially economic evaluations (7-9). A number of excellent texts on preparing economic evaluations of health interventions have been published (1,5,67). However, the peer-review process does not guarantee that recommended guidelines have been followed. The BMJ Economic Evaluation Working Party has published a checklist for reviewers and editors to use in assessing economic analyses (9). A paraphrased version of the BMJ checklist below condenses and rearranges the original 35 items into a set of 12 questions (Table 1). The questions refer to a full economic analysis; some of the questions are not relevant in evaluating a partial economic evaluation. Definitions of concepts such as discounting are explained in the text.
Table 1. Checklist for assessing economic analyses of health interventions
- Is the research question stated and its importance justified?
- Are the alternatives compared clearly described and the rationale for their choice presented? Is an incremental analysis reported for alternative interventions?
- Is the viewpoint (perspective) of the analysis clearly stated and justified?
- Are costs clearly and appropriately defined and sources reported?
- Is the time horizon of costs and benefits stated?
- Is the discount rate stated and justified?
- Are costs reported for specific years, along with details of adjustments for inflation?
- Is the source of effectiveness estimates stated and details of how the estimates were derived presented?
- Are the outcome measures of the analysis and their methods of calculation stated?
- Is a sensitivity analysis reported and the choice of variables and ranges justified?
- Do conclusions address the study question and follow from the data reported?
- Are the conclusions accompanied by appropriate caveats?
Framing the study
When framing a prevention effectiveness study, investigators must address a series of issues. First, they should choose an appropriate study question, a hypothesis that is both testable and likely to contribute to a policy decision if answered. Second, investigators should ensure that all viable, policy-relevant alternatives are included. An evaluation that does not consider all viable intervention strategies may yield misleading results, because an excluded option may be more attractive than any of the strategies modeled.
The viewpoint or perspective from which the analysis is conducted must correspond to the study question and the intended audience (8). Public health impact is best addressed by analyses that use the societal perspective, which integrates costs experienced by all relevant groups, including the health care system and individuals or families. In the health care system perspective, only medical costs are included. The consensus of experts is that this should never be the only perspective used, although it is useful as a complement to the societal perspective (6). If the audience of a study consists of health care plan managers, a payer perspective is appropriate. However, switching of individuals among health plans, which reduces the benefits of prevention to the payer, should be taken into account. Finally, many decision analyses address harms and benefits faced by individuals, without considering costs to the health care system or society. This type of clinical decision analysis is valuable for case management but is less useful for drawing inferences for decisions by insurers or public policy makers.
Definitions of costs
The costs of disease include direct costs, productivity losses (often referred to as indirect costs), and intangible costs. Direct costs include medical care, services such as physical therapy or special education, and the time and travel costs of families. Direct costs are calculated after subtracting the usual medical care or education costs incurred by individuals in order to not overstate the benefits of preventing disease. Productivity losses consist of the useful work lost because of disease. These are excluded from analyses that use QALYs as outcome measures in order to avoid double counting (1). Psychosocial or intangible costs are excluded from cost-effectiveness analyses but may be included in cost-benefit analyses. Intervention costs can also be broken down into direct costs and overhead or indirect costs.
The choice of costs to include should be determined by the analytic perspective. Analyses from the health system perspective include the direct medical costs of disease and the costs of medical interventions but not patient or nonmedical costs. This type of analysis is less demanding of data, but the perspective should derive from the study question, not convenience. The societal perspective should encompass all costs associated with an intervention. For example, the costs of screening include all costs that follow upon a positive result, including diagnosis, follow-up, and treatment, as well as the costs of organizing and promoting screening. Published cost analyses often exclude many relevant costs and hence may offer misleadingly low estimates of screening costs.
Charges or list prices are often used to approximate the costs of purchased inputs. This presumes the existence of competitive markets. In non-competitive markets where there are barriers to entry and relatively few buyers or sellers, costs and prices may diverge. For example, list prices for commercial genetic tests may be multiples of actual resource costs (11). Actual amounts paid by purchasers of health services as recorded in medical claims databases may also be used in place of charges. However, these may understate the costs of providing services, since service providers can shift costs to other consumers.
In economic analyses conducted from the societal perspective, costs are estimated on the basis of resources consumed, which are not necessarily the same as payments. Resources are valued at their opportunity cost, which is the greatest value that resources could yield if employed elsewhere. For example, the value of time is measured by how much individuals could earn in another activity. Programs that use volunteers or donated or depreciated equipment may have low accounting costs but still incur substantial resource costs. The opportunity cost of time spent being screened or treated or providing care to family members is a major component of patient and family costs.
The most direct source of data on resource costs is micro-costing of quantities and unit costs of personnel, supplies, equipment, etc. (5). Micro-costing is laborious and may lead to underestimates of some types of costs for which detailed data are not available. A less-demanding approach is gross-costing, in which accounting data are used. One approach is to multiply stated charges for hospital and physician services by available cost-to-charge ratios to approximate resource costs (6). As already mentioned, payments from medical claims databases may also be used. In general, gross-costing approaches are most commonly used for cost-of-illness estimates, while micro-costing is the preferred option for valuing the costs of delivering interventions.
In costing interventions, investigators should clearly state the method they use to allocate shared or overhead costs. They may follow accounting principles in assigning overhead charges to each activity (5). However, in economic analyses, only costs that vary with an intervention are included. Marginal costs are costs that vary with the scale of the intervention and may include some administrative costs. If the question is whether an intervention is adopted at all, incremental costs are relevant (8). Incremental cost is the difference in cost associated with one program or set of interventions and the costs of running another program. The incremental cost of an intervention that expands an existing program, for example, is the additional cost of running the expanded program after subtracting the cost of running the existing program.
Time
The time horizon of a study has two elements, the time frame and the analytic horizon (1). The time frame is the period over which intervention costs are measured, typically one year. The analytic horizon is the period over which costs and benefits associated with health outcomes resulting from early diagnosis and treatment are calculated. The analytic horizon may be a defined period (e.g., 10 or 20 years) or the lifetimes of the individuals receiving the intervention. The latter approach is in general preferred.
If the analytic horizon is longer than one year, costs are discounted to account for differential timing. The rationale for discounting includes time preference (people prefer to have benefits sooner) and the opportunity cost of resources (i.e., expected return on investments). As an example of discounting, suppose that an intervention yields $2 million in benefits 20 years in the future. At a discount rate of 5% per year, the benefits are worth $753,779 today, calculated according to the financial formula for the present value of a future sum. If the intervention cost $1 million, the net present value of the intervention would be negative if one used a 5% discount rate but would be positive if one did not employ discounting. In cost-effectiveness analysis, future health outcomes are discounted at the same rate as future costs or monetary benefits in part to avoid favoring interventions that yield health benefits far in the future (1).
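The present-value figure in this example can be reproduced with the standard formula PV = FV / (1 + r)^t. The sketch below uses only the numbers from the text; the function and variable names are illustrative.

```python
def present_value(future_amount, annual_rate, years):
    """Present value of a single sum received `years` from now."""
    return future_amount / (1.0 + annual_rate) ** years


benefit = 2_000_000.0   # received 20 years in the future
cost_today = 1_000_000.0

pv_benefit = present_value(benefit, 0.05, 20)
npv_discounted = pv_benefit - cost_today
npv_undiscounted = present_value(benefit, 0.0, 20) - cost_today
```

At 5% the benefit is worth about $753,779 today, so the net present value is negative; without discounting it is a positive $1 million.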
Cost-effectiveness studies commonly use discount rates of 3% or 5%. The Panel on Cost-effectiveness in Health and Medicine notes that estimates of societal time preference and returns to capital are consistent with both numbers and recommends a 3% discount rate in particular (6). Since a higher discount rate makes benefits occurring far in the future less attractive, it is important to compare studies using the same discount rate when evaluating interventions with long analytic horizons over which benefits are assessed. Studies conducted from the perspective of specific sectors should use discount rates reflecting the cost of capital and time preference of the group whose perspective is being modeled. For example, an analysis conducted from a payer perspective might use a discount rate that reflects the payer’s opportunity cost of capital, which may be much higher than 3 to 5%.
Inflation adjustment is needed in order to make cost data from different years equivalent. Suppose that one has data on intervention costs from 1994, earnings data from 1990, and budget data from 1998. In order to make the dollar figures equivalent, one would need to translate each set of data into the same year’s dollars. Earnings data are adjusted on the basis of changes in hourly compensation, whereas medical costs are adjusted on the basis of changes in the medical component of the consumer price index. This method of inflation adjustment can overstate the costs of items for which costs either decrease or rise relatively slowly owing to increased technical efficiency in production.
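Converting a cost to another year's dollars amounts to scaling by the ratio of two index values. In the sketch below the index values are placeholders, not actual consumer price index or compensation figures, and the function name is illustrative.

```python
def to_target_year_dollars(amount, index_source_year, index_target_year):
    """Restate `amount` from source-year dollars in target-year dollars."""
    if index_source_year <= 0:
        raise ValueError("index values must be positive")
    return amount * index_target_year / index_source_year


# Placeholder index values (NOT real data), keyed by year.
medical_price_index = {1994: 200.0, 1998: 240.0}
hourly_compensation_index = {1990: 100.0, 1998: 130.0}

# A $1,000 intervention cost measured in 1994, restated in 1998 dollars.
cost_1998 = to_target_year_dollars(
    1_000.0, medical_price_index[1994], medical_price_index[1998]
)

# $20,000 of 1990 earnings, restated in 1998 dollars.
earnings_1998 = to_target_year_dollars(
    20_000.0, hourly_compensation_index[1990], hourly_compensation_index[1998]
)
```

Using a separate index for each cost category follows the distinction the paragraph makes between earnings and medical costs.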
Effectiveness and outcomes
The source of data on the effectiveness of an intervention is critical in assessing the validity of an analysis (9). Estimates taken from a randomized controlled trial are more reliable than estimates from observational data. The quality of data from observational studies is highly variable. The number of cases, completeness of follow-up, and representativeness of the data should be considered. “Expert opinion” is a less reliable source of estimates of effectiveness. For screening, the accuracy or validity of the screening tests and the effectiveness of the interventions that follow a positive diagnosis need to be considered. The two important test characteristics are sensitivity, the fraction of true cases that are detected, and specificity, the fraction of unaffected individuals that test negative. Optimistic assumptions about sensitivity and specificity can make a screening intervention appear unrealistically beneficial.
$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad \text{Specificity} = \frac{TN}{TN + FP}$$
where
TP = True positives (affected individuals who test positive)
FP = False positives (unaffected individuals who test positive)
TN = True negatives (unaffected individuals who test negative)
FN = False negatives (affected individuals who test negative)
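Following the definitions in the text (sensitivity is the fraction of true cases detected; specificity is the fraction of unaffected individuals who test negative), the two measures can be computed from the four counts. The screening counts below are hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of affected individuals who test positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of unaffected individuals who test negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical screen of 10,000 people, 100 of whom are affected.
tp, fn = 90, 10
tn, fp = 9_405, 495

test_sensitivity = sensitivity(tp, fn)
test_specificity = specificity(tn, fp)

# Share of positive results that are true cases in this population.
share_of_positives_affected = tp / (tp + fp)
```

Even with 90% sensitivity and 95% specificity, only about 15% of positive results in this hypothetical low-prevalence population are true cases, which is why follow-up costs after a positive screen matter.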
Efficacy, the benefit of an intervention conducted under ideal conditions, differs from effectiveness, the expected benefit in routine practice (12). The major cause of divergence between efficacy and effectiveness is incomplete adherence, including uptake of screening and adherence to prescribed interventions. Models that assume that everyone offered screening will accept it or come back for follow-up visits and comply with prescribed treatments can greatly overstate the benefits of the intervention.
Investigators should clearly define and justify the outcome measure they use to assess interventions. Expected years of life gained is a commonly used outcome; however, this measure may be problematic for two reasons. First, if there are significant non-fatal outcomes, results may be misleading. For example, focusing on mortality as an outcome overvalues interventions that prevent premature mortality but cause adverse health effects. This problem can be overcome by using quality-adjusted life years, which allow for the incorporation of harms and benefits into a single outcome measure. Because methods for calculating QALYs have not been standardized, presentation of information on how they are computed is critical to an intelligent interpretation of the study. The second problem with use of expected life years as an outcome measure, which applies to QALYs as well, is that expected values do not reflect individual risk preferences. For individual decision analysis models, it is desirable to report the distribution of expected outcomes, not just mean values.
Sensitivity analyses
Accounting for uncertainty is a critical facet of a prevention effectiveness analysis (1). Estimates of costs and effectiveness are usually imprecise and uncertain. Sensitivity analyses are used to quantify the impact of this uncertainty and model assumptions on a finding that a particular intervention is or is not cost saving or cost-effective. This can be done in one of two ways. One way is to vary parameters within ranges of plausible values to see if results change qualitatively. If not, the results are considered robust. The second way is a threshold analysis that calculates the value of a parameter that results in a qualitatively different outcome from the base case analysis. Instead of conducting a sensitivity analysis, it is also possible to calculate confidence intervals for cost-effectiveness ratios based on the distributions of values for each estimate of cost and effectiveness (13).
Most sensitivity analyses report the results of changing just one or two variables at a time, but a broader approach may be better (5). In a one-way sensitivity analysis, researchers vary one parameter at a time within a range of plausible values in order to see how the outcome of the model changes as the parameter changes. For example, if the estimate of efficacy is a 50% reduction in risk, a range from 30% to 70% could be tested in a sensitivity analysis. In two-way sensitivity analyses, two parameters are simultaneously varied. While sensitivity analyses indicate which parameters have the greatest influence on results, they may lead to overconfidence in the robustness of results. Often a result is qualitatively unaffected by changes in one or two parameters, but a set of plausible parameter values may reverse the conclusion. In order to avoid this hazard, researchers may also conduct “worst case” analyses (5).
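A one-way sensitivity analysis and a threshold analysis can be sketched on a deliberately simple model. Everything here is an assumption made for illustration: the net-cost model, the program cost, the number of baseline cases, and the cost per case. Only the base-case efficacy of 50% and the tested range of 30% to 70% come from the example in the text.

```python
def net_cost(efficacy, program_cost=1_000_000.0,
             baseline_cases=100, cost_per_case=25_000.0):
    """Program cost minus the cost of illness averted.

    A negative result means the hypothetical program is cost saving.
    """
    return program_cost - efficacy * baseline_cases * cost_per_case


def threshold_efficacy(program_cost=1_000_000.0,
                       baseline_cases=100, cost_per_case=25_000.0):
    """Efficacy at which the program exactly breaks even."""
    return program_cost / (baseline_cases * cost_per_case)


# One-way analysis: vary efficacy over the plausible range, holding
# all other parameters at their base-case values.
one_way = {e: net_cost(e) for e in (0.3, 0.4, 0.5, 0.6, 0.7)}

base_case_saves_money = net_cost(0.5) < 0
robust = all(value < 0 for value in one_way.values())
break_even = threshold_efficacy()
```

In this made-up case the program is cost saving at the base-case efficacy but not at the low end of the range, and the break-even efficacy is 40%, so the cost-saving conclusion would not be called robust.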
Drawing conclusions
Conclusions should squarely address the study question posed at the beginning of an article and not overgeneralize. For example, an analysis from an individual perspective may not be generalizable to policy or reimbursement questions. External validity of results is dependent upon the representativeness of the data used. For example, the cost-effectiveness of screening depends upon the prevalence of the condition being screened for, and the results may be valid only for populations with similar prevalences.
Conclusions should also be accompanied by caveats, including recognition of potential harms (9). The conclusion should highlight results, including worst-case analyses that may have policy implications. The conclusion should indicate the degree to which assumptions about parameters are conservative in the sense of making a proposed intervention look less favorable than would other values that could have been chosen. If an intervention is cost-effective even under relatively unfavorable assumptions, greater confidence can be placed in the findings. Finally, in analyses that use QALYs as outcome measures, researchers should discuss how different methods of measuring or weighting preferences could affect the results.
CASE STUDIES OF PREVENTION EFFECTIVENESS AND GENETICS
The remainder of the chapter consists of a critical review of the economic and decision analysis literature on certain genetic disorders and interventions. The case studies selected for review are recently published prevention effectiveness analyses of population-based screening for genetic disorders, genetic testing of family members, and prophylactic surgeries contingent on genetic tests. The Task Force on Genetic Testing has defined genetic tests to include not only molecular tests that analyze human DNA, RNA, and chromosomes, but also biochemical tests of proteins and metabolites that can identify diseases caused by variants in single genes (14). The relative advantage of these two types of genetic tests (biochemical vs. molecular or DNA) is an important issue in population screening for genetic conditions. Single-gene conditions, such as PKU or sickle cell disease (SCD), that are already routinely identified through population-based biochemical tests of newborn infants are not explicitly addressed here.
The first case studies address population screening for two autosomal recessive single-gene conditions: cystic fibrosis, a disease that manifests in early childhood, and hereditary hemochromatosis, a disease of adults. In autosomal recessive disorders, individuals with two mutated alleles, whether homozygotes (two copies of the same variant allele) or compound heterozygotes (copies of two different variant alleles) are likely to become diseased while heterozygotes may be phenotypically normal. The remaining case studies address susceptibility genotypes for colorectal, breast, and ovarian cancers. In the case of autosomal dominant cancer syndromes, the carrier of a single mutated allele is at elevated risk of disease.
The penetrance, or risk of disease associated with a particular genotype, may be an important predictor of the clinical utility and prevention effectiveness of a test or intervention for a genetic disorder. If penetrance is very high, almost all individuals with the affected genotype will eventually become diseased. If penetrance is modest, for each case of disease there may be multiple individuals who do not experience disease. Since any harms resulting from identification and/or intervention are borne by all individuals with an affected genotype, yet benefits occur only for the fraction of cases resulting in phenotypic disease, the ratio of benefit to harm is lower if penetrance is lower.
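The relationship between penetrance and the ratio of benefit to harm can be sketched numerically. The function name and the benefit and harm magnitudes below are hypothetical and expressed in arbitrary units; the only point illustrated is that harms are borne by everyone identified while benefits scale with penetrance.

```python
def benefit_harm_ratio(penetrance, benefit_per_case_prevented, harm_per_person_identified):
    """Ratio of expected benefit to expected harm per person identified.

    Harms fall on everyone identified with the genotype; benefits accrue only
    to the fraction (penetrance) who would otherwise have developed disease.
    """
    expected_benefit = penetrance * benefit_per_case_prevented
    return expected_benefit / harm_per_person_identified


# Hypothetical magnitudes: 10 units of benefit per case prevented,
# 1 unit of harm per person identified.
ratios = {p: benefit_harm_ratio(p, 10.0, 1.0) for p in (0.9, 0.5, 0.2)}
for penetrance, ratio in ratios.items():
    print(f"penetrance {penetrance:.0%}: benefit/harm ratio {ratio:.1f}")
```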
Cystic fibrosis
Cystic fibrosis (CF) is a disorder of chloride transport across membranes that causes accumulation of mucus in the lungs and pancreas. Repeated infections, poor nutritional status, and lung dysfunction and destruction result from this condition. CF occurs in 1 in 3,000 white Americans, 1 in 15,000 African-Americans, and 1 in 30,000 Asian-Americans. Over 500 known mutations on the cystic fibrosis transmembrane conductance regulator (CFTR) gene have been identified. The most common of these is the ΔF508 allele, which comprises two thirds of CFTR mutant alleles among Americans of European ancestry with CF, although less than half of those among individuals of non-European backgrounds (15).
Identifying infants with cystic fibrosis allows for early initiation of therapy that may ameliorate the progression of disease but does not prevent the development of symptoms. Studies comparing cohorts that were or were not screened at birth indicate that early identification yields benefit in nutritional status and lung function (16). A randomized controlled trial, conducted in Wisconsin with enrollment during 1988-94, reports significantly greater heights for children screened at birth than for affected children who were not identified at birth (17). However, the published results are inconclusive owing to potential selection bias (18). Until the cohort was unblinded at age 4 years, measurements were taken on all affected children identified at birth, including those without symptoms, while children from the other arm of the study were measured only after identification on the basis of symptoms.
Newborn screening for CF has been conducted for a number of years in several countries, as well as by state newborn screening programs in Colorado and Wisconsin, and is being introduced by additional states. Introducing CF screening in state newborn screening programs remains controversial. An expert group convened by CDC in January 1997 recommended additional pilot CF newborn screening programs (19), while an NIH Consensus Development Conference in April 1997 recommended against newborn CF screening (20).
A full economic evaluation of newborn CF screening has not yet been published. Such a study would require data that are not yet available on costs, benefits, and harms of screening compared to diagnosis on the basis of clinical symptoms. Costs of not screening include the medical tests and procedures used to rule out other causes before a CF diagnosis is established, as well as the parental anxiety and time involved in this process. Costs of screening include additional medical services provided following an earlier CF diagnosis. A systematic assessment of these costs has not yet been published.
Partial economic evaluations of newborn CF screening have compared two types of screening strategies. The first approach uses elevated immunoreactive trypsinogen (IRT) measures on newborn dried blood spots to identify children needing repeat IRT tests, followed by diagnosis on the basis of a sweat test for those with a second positive IRT test. Since the identification in 1989 of mutations on the CFTR gene, many programs have instituted a two-tier screening program in which samples with an initial elevated IRT are immediately subjected to a mutation analysis (21,22). This second approach eliminates the need for additional blood samples and requires fewer children to be referred for sweat tests. One disadvantage of this approach is that individuals without the CFTR mutations being tested for will be missed. Another is that carrier status detected by mutation analysis may be unwanted information, and carriers may be subject to stigmatization or discrimination.
Several analyses compare the costs of biochemical (IRT) and biochemical-molecular (IRT/DNA) methods of screening newborns for CF. For example, the incremental cost of adding a CF screening test to a newborn screening panel has been calculated to be $1.09 in Australia (21) and $1.60 in Wisconsin (22), both in U.S. dollars. There is inconsistency across studies in the costs that are included; administrative costs, sweat tests, and genetic counseling are often excluded. One analysis uses cost data from Wisconsin in conjunction with program data from two IRT/DNA programs in Wisconsin and South Australia, and two IRT programs in Colorado and northeastern Italy (23). The analysis follows a health system perspective and uses average costs, including part of the cost of specimen collection. The authors report that the IRT/DNA screening strategy employed in South Australia costs the least per case identified. This is due to a higher prevalence of CF in Australia; the IRT/DNA strategy is calculated to cost more per child screened. Specifically, the standardized cost per newborn tested for CF was $5.54 in Colorado, $5.68 in Italy, $5.80 in Australia, and $5.96 in Wisconsin, in 1994 U.S. dollars.
An analysis of CF screening costs in New Zealand calculates incremental costs, which is appropriate for a newborn screening program that already collects dried blood spots (24). During a 6-month period in 1995, an IRT/IRT protocol and an IRT/DNA protocol were simultaneously followed. All infants referred for diagnostic testing received genetic counseling and testing, as well as a sweat test, in order to provide conclusive confirmation of CF status. Because identical cases were reported, this is a cost-minimization analysis that compares the costs of two strategies with identical outcomes. When costs are analyzed from the laboratory perspective, IRT/DNA screening is found to be more expensive, $0.88 vs. $0.71 per newborn for IRT/IRT, in 1995 U.S. dollars.
The New Zealand study also employs a societal perspective by including parental time costs and provider costs. When all costs associated with CF screening are included, the IRT/DNA method used in New Zealand is found to be less costly, $1.85 vs. $3.07 per newborn for IRT/IRT. The lower costs result from fewer infants being recalled for blood draws and sweat tests. This conclusion cannot be validly generalized to other screening and diagnostic protocols, though (24). For example, the provider and parent time costs of the IRT/IRT protocol could be reduced if the second dried blood spot specimen were collected as part of a routine well-baby visit. The costs would also be reduced if only infants with positive sweat test results received genetic counseling and testing. On the other hand, screening costs are lower in New Zealand because of the use of in-house IRT assays. Because screening and diagnostic protocols are rarely standardized, it may be misleading to generalize on the basis of cost data from a single program.
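The dependence of the cost-minimization result on analytic perspective can be shown with the per-newborn figures reported above (24). The sketch below only restates those four numbers; the dictionary layout and the helper `cheapest` are illustrative assumptions, not part of the study.

```python
# Cost per newborn screened, 1995 U.S. dollars, as reported in the text (24).
costs = {
    "laboratory": {"IRT/IRT": 0.71, "IRT/DNA": 0.88},
    "societal": {"IRT/IRT": 3.07, "IRT/DNA": 1.85},
}


def cheapest(perspective):
    """Return (strategy, cost) with the lowest cost under the given perspective."""
    by_strategy = costs[perspective]
    strategy = min(by_strategy, key=by_strategy.get)
    return strategy, by_strategy[strategy]


for perspective in costs:
    strategy, cost = cheapest(perspective)
    difference = max(costs[perspective].values()) - cost
    print(f"{perspective}: {strategy} is cheaper by ${difference:.2f} per newborn")
```

From the laboratory perspective IRT/IRT is cheaper by $0.17 per newborn, whereas from the societal perspective IRT/DNA is cheaper by $1.22, so the preferred strategy reverses once parental time and provider costs are counted.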
Hereditary hemochromatosis
The most prevalent genetic disorder in the United States is hereditary hemochromatosis (HH), an inborn error of iron metabolism that causes excess absorption of dietary iron and can lead to iron overload. Hemochromatosis has many clinical manifestations, including liver cirrhosis, hepatocellular carcinoma, diabetes mellitus, and cardiomyopathy. The number of individuals affected by clinical hemochromatosis is unknown, because of widespread underdiagnosis. Most individuals who have hereditary hemochromatosis do not display symptoms, and there is disagreement over how best to determine which individuals have the condition. Estimates of prevalence of HH vary depending upon definitions and tests used, as well as the ethnic composition of the population. The most commonly cited estimates of prevalence in the United States are in the range of 2 to 5 per 1000 population (25).
Hemochromatosis is an attractive candidate for population-based screening because of the relatively high prevalence of the condition and the ability to prevent clinical disease with early identification. Treatment consists of phlebotomy to normalize iron stores, followed by regular phlebotomies three or four times per year for life. Treatment appears to prevent morbidity and normalize life expectancy in individuals detected prior to the development of cirrhosis or diabetes (26). On the other hand, an expert panel recently concluded that not enough is yet known about the natural history of hemochromatosis to recommend population-based screening (27). In particular, it is not known how many individuals identified through screening would develop clinical disease in the absence of screening.
Unlike in the case of CF, a number of full economic evaluations of screening for hereditary hemochromatosis have already been published. During 1994-95, six economic analyses of screening for HH using biochemical measures were published; all conclude that HH screening is either cost saving or highly cost-effective (28-33). One of these studies considers testing family members of affected individuals (28). The other five studies address population-based screening of adults for HH, including testing of family members of individuals identified through screening. Two of these studies (29,30) have previously been evaluated with regard to criteria for an economic evaluation (8). Three of the five studies are cost-effectiveness analyses that relate total direct costs (intervention costs and averted medical care costs) to changes in life expectancy (29,31,32). In the other two, cost per case identified is the main outcome measure (30,33). These are cost-identification studies, not cost-benefit or cost-effectiveness analyses according to conventional criteria (5).
Relatively few published full economic evaluations meet all of the criteria summarized in Table 1. In the case of the HH studies reviewed here, the study question is not framed adequately in terms of modeling a realistic screening intervention. Each of the studies is based on an idealized testing and intervention protocol that assumes that individuals comply fully with recommendations. None of the models allows for the fact that individuals may drop out at various stages of the screening, diagnostic, and treatment process. With attrition or incomplete adherence, the numbers of cases of disease prevented would be smaller than assumed, with adverse implications for calculated cost-effectiveness.
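The effect of attrition on the number of cases prevented can be sketched as a product of stage-specific adherence probabilities. The function and the adherence values below are hypothetical; the prevalence of 3 per 1000 lies within the range cited earlier, and the 50% penetrance matches the figure assumed in two of the studies. The sketch also assumes, for simplicity, that treatment is fully effective in those who complete every stage.

```python
def cases_prevented(cohort_size, prevalence, penetrance, stage_adherence):
    """Expected cases of disease prevented after attrition at each stage.

    stage_adherence: probabilities of completing each successive stage
    (for example screening test, confirmatory diagnosis, long-term treatment).
    """
    retained = 1.0
    for p in stage_adherence:
        retained *= p
    return cohort_size * prevalence * penetrance * retained


# Idealized protocol: everyone completes every stage.
ideal = cases_prevented(100_000, 0.003, 0.5, (1.0, 1.0, 1.0))
# Hypothetical attrition: 80%, 80%, and 70% completion at successive stages.
realistic = cases_prevented(100_000, 0.003, 0.5, (0.8, 0.8, 0.7))
print(f"idealized: {ideal:.1f} cases prevented; with attrition: {realistic:.1f}")
```

Under these assumed adherence rates fewer than half of the cases counted in the idealized model would actually be prevented, while most screening costs would still be incurred.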
With regard to the second criterion, none of the studies considered a broad range of alternatives. Each compares a single screening strategy with the alternative of no screening. Hence, there is no incremental analysis of the costs and benefits of various screening strategies, and it is not possible to compare their relative cost-effectiveness. Incremental cost-effectiveness, together with other policy considerations, is particularly important in evaluating targeted versus universal screening. Two studies evaluate screening targeted to males (31) or white males (32) yet do not consider the costs, benefits, and harms accruing to individuals excluded from screening. A potential rationale for racial targeting is that hemochromatosis is much less common among non-whites; for example, one U.S. study reports that the prevalence of hemochromatosis is 3.5 per 1000 among whites and 0.5 per 1000 among non-whites (33). One study (29) reports separate cost-effectiveness ratios for males and females, allowing policy-makers to assess the harms and benefits of targeting screening to males.
The analytic perspective is either not stated or is not employed consistently. A public health policy analysis should employ the societal perspective, including costs to individuals, payers, providers, and governments. None of the studies includes costs to individuals, even though one of the studies states that a societal perspective was employed (32). Each study restricts itself to direct medical costs, which would be consistent with a health care system or payer perspective. However, coverage of direct medical costs was incomplete; for example, one study leaves out the costs of hospitalization for complications following liver biopsy (29).
The published hemochromatosis cost-effectiveness analyses all assume that the fraction of persons who test positive for HH is the same as the population prevalence. This assumption is valid for a test administered to individuals who have not previously been tested. Since the yield on repeat tests of individuals who have already tested negative is presumably extremely low, repeat testing must somehow be ruled out. Two studies propose screening either individuals attending physician offices (31) or blood donors (29). The results for these two studies may apply to a one-time intervention but cannot be extrapolated to routine screening as a public health intervention. The other study proposes screening cohorts of men when they turn 30 years of age (32). This strategy could be implemented on a routine basis without reduction in yield in subsequent waves.
Two of the CEA studies use elevated transferrin saturation (TS) as the first screening test, but with different cutoff points (31-32). One study assumes a test, unsaturated iron-binding capacity (UIBC), whose test validity has not been established and for which a price was not available (29). Two of the studies address complications from liver biopsy and allow for people to refuse a biopsy (29,32). Two of the studies discount future costs at a 3% rate (29,32), while the other study eschews discounting (31).
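The practical effect of discounting at 3% versus not discounting can be sketched with a constant annual cost stream. The $1,000 annual cost and 20-year horizon are hypothetical and do not come from any of the three studies; the function assumes costs are incurred at the end of each year.

```python
def present_value(annual_cost, years, rate):
    """Present value of a constant annual cost incurred at the end of each year."""
    return sum(annual_cost / (1.0 + rate) ** t for t in range(1, years + 1))


# Hypothetical stream: $1,000 per year of averted treatment cost for 20 years.
undiscounted = present_value(1000.0, 20, 0.0)
discounted = present_value(1000.0, 20, 0.03)
print(f"undiscounted: ${undiscounted:,.0f}; discounted at 3%: ${discounted:,.0f}")
```

Because screening costs are incurred up front while averted treatment costs arise years later, an analysis that does not discount will tend to make screening look more favorable than one that does.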
The biggest unknown is the direct medical costs associated with hemochromatosis. This is a function of two things: the numbers of individuals with untreated HH who develop clinical symptoms and the cost of treating symptomatic individuals. The studies are in close agreement on the former but differ widely on the latter. One study uses a single cost of $4,000 per year for treating individuals with clinical disease (31), while the other two report treatment costs for specific symptoms. There is extremely wide variation in the treatment cost data. One reports the costs of treating liver cirrhosis as C$1,000 per year in outpatient costs and C$50,000 in the last year of life for hospitalization (29). The other assumes that the cost of treating cirrhosis is $250,000 for a liver transplant in 1% of cases (32). The cost of treating hepatocellular carcinoma is given as C$50,000 in the first study and $1,000 in the second study. Hospitalization cost for congestive heart failure is represented as C$10,416 in the first study and $45,000 in the second study. The costs of treatment are multiplied by the disease penetrance or fraction of individuals assumed to eventually develop disease manifestations (50% in two of the studies, 28-43% in the other study). If clinical data overstate penetrance among individuals identified through population screening (22), treatment costs and the benefits of prevention will also be overstated.
Uncertainty regarding parameter estimates can be addressed through sensitivity analysis. Each of the hemochromatosis studies reports the results of one-way or two-way sensitivity analyses. One of the biggest issues is penetrance, the proportion of individuals identified through screening who would eventually develop clinical symptoms if undiagnosed. One analysis reports that screening men for HH is cost saving if at least 40% develop symptomatic disease, while if the fraction is 20% the cost per life-year saved is just below $10,000 (32). Another reports a threshold of 52.7% for the same variable (31). The other study reports that HH screening is cost saving in males if the probability of disease symptoms exceeds 0.30 (29). Only one of the studies reports a multivariate sensitivity analysis, restricted to a set of four test validity parameters (31). This study reports that the cost per life year saved could be as high as $39,410 in the worst-case scenario. If other variables had been included in the worst-case scenario, the cost per life year saved could have been higher.
The results of the published sensitivity analyses are consistent with HH screening appearing to be cost-effective under a range of assumptions. A number of limitations need to be considered. First, none of the studies calculate the threshold for disease penetrance below which HH screening would be considered not cost-effective. For this reason, the question of whether HH screening is cost-effective at low levels of penetrance cannot be answered, even if all other assumptions are accurate. Second, sensitivity analyses only apply to parameters included in the models. The effects of excluding parameters such as patient time costs and adherence to referrals for testing or phlebotomy regimens are not addressed. It is not known whether taking these and other relevant variables into account might reverse conclusions about cost-effectiveness from a societal perspective.
The identification of the HFE candidate gene in 1996 has allowed for molecular tests to be used in screening for or diagnosing hereditary hemochromatosis. Between 60% and 85% of HH cases in the U.S. are homozygous for the C282Y missense mutation on the HFE gene (25). Smaller numbers of individuals with HH are homozygotes for the H63D missense mutation or compound heterozygotes for the C282Y and H63D mutations. The frequency of the C282Y mutation varies according to ancestral origins and is highest in populations of northwestern European ancestry. A compilation of findings from smaller studies reports that the frequency of C282Y homozygosity among unselected individuals is approximately 5 per 1000 in northern Europeans, less than 1 per 1000 among southern and eastern Europeans (e.g., Italians, Greeks), and extremely rare among non-Europeans (34).
The cost of a DNA test for population-based HH screening of blood donors has been assessed in one study, published in abstract form (35). The authors conclude that genetic testing, based on detection of C282Y homozygotes, costs less per case identified than phenotypic testing using transferrin saturation and serum ferritin tests. The analysis fails to identify the analytic perspective and assumes an unrealistically low cost of the molecular test, $10, less than one tenth of a typical charge (36). Also, it is unlikely that such a screening strategy would be approved for ethical reasons. Non-C282Y-homozygotes, comprising at least 15% of HH cases, would be missed by such a test. To avoid this problem, a genetic testing strategy for HH could include tests for H63D homozygosity and compound heterozygosity (37). However, because many more individuals would test positive and require treatment while only a small fraction of additional cases would be identified, cost-effectiveness of such a testing strategy would presumably be less favorable than for a C282Y homozygosity test. In either case, genetic testing of the population for HFE mutations could result in stigmatization and unnecessary treatment (27).
Mutation analysis on the HFE gene as a diagnostic substitute for liver biopsy has been addressed in a partial economic analysis from Australia that compares the costs of HH screening strategies in which a C282Y mutation test is or is not included in the diagnostic phase (36). At a cost of $120 for an HFE mutation analysis, compared to $900 for liver biopsy, it is reported that diagnostic costs are lower if mutation analysis is substituted for liver biopsy. This assumes that disease penetrance among homozygotes with repeated elevated TS measures is the same as among individuals with elevated iron stores determined by liver biopsy. The analysis does not take into account the availability of other diagnostic methods, including quantitative phlebotomy, for at least some cases. It unrealistically assumes 100% sensitivity of all tests, both biochemical and molecular, and 100% adherence with screening and treatment. Finally, only medical costs are included in the analysis. The likely effect of the exclusion of patient time costs is to understate the advantage of a molecular test, since inclusion of patient time costs can be expected to increase any cost advantage of a protocol that reduces the number of visits.
Inherited colorectal cancer syndromes
At least 10% of cases of colorectal cancer (CRC) are due to Mendelian-inherited genetic disorders (38). The two major forms of inherited CRC syndromes are familial adenomatous polyposis (FAP), associated with hundreds of mutations on the APC gene, and hereditary nonpolyposis colorectal carcinoma (HNPCC), most commonly associated with mismatch repair gene mutations on the hMSH2 and hMLH1 genes. Both are autosomal dominant disorders, so individuals who carry a single variant allele (heterozygotes) are at risk of developing disease, unlike recessive disorders where mutation carriers are generally phenotypically normal. Both FAP and HNPCC are associated with early onset of colon cancer, especially FAP. APC mutations associated with FAP are thought to have a penetrance of close to 100% by age 50. Prophylactic removal of the large bowel, with or without the rectum, is the standard clinical recommendation for individuals found to have multiple adenomas (39). Individuals identified on the basis of mutation analysis as being susceptible to FAP are recommended to undergo surveillance until the development of adenomas, and then have prophylactic surgery to prevent the emergence of cancer. Owing to heterogeneity, recommendations for surgery should be based on estimates of individual risk (40).
One cost analysis of two different strategies for preventing cancer in FAP pedigrees has been published (41). This analysis, conducted from a payer perspective, compares the costs of surveillance of family members with the cost of molecular tests for family members followed by surveillance of individuals identified as carrying a mutation. In each case, surveillance is by flexible sigmoidoscopy until the age of 50 or the emergence of adenomas. The conclusion is that for a test cost below $833, genetic testing is less expensive than conventional surveillance. Because health outcomes were not measured, this is not a cost-effectiveness analysis.
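The logic of a break-even test cost of this kind can be sketched with a deliberately simplified comparison. This is not a reconstruction of the published model (41): the function names and the $2,000 cumulative surveillance cost are hypothetical, discounting is ignored, and the 50% carrier fraction is the expected share for first-degree relatives of a carrier of an autosomal dominant mutation.

```python
def cost_surveillance_only(surveillance_cost):
    """Cost per at-risk relative when every relative undergoes surveillance."""
    return surveillance_cost


def cost_test_then_surveil(test_cost, surveillance_cost, carrier_fraction):
    """Cost per relative when all are tested and only carriers are surveilled."""
    return test_cost + carrier_fraction * surveillance_cost


def breakeven_test_cost(surveillance_cost, carrier_fraction):
    """Test cost at which the two strategies cost the same."""
    return (1.0 - carrier_fraction) * surveillance_cost


# Hypothetical inputs: $2,000 cumulative surveillance cost per relative,
# and a 50% chance that a first-degree relative carries the mutation.
threshold = breakeven_test_cost(2000.0, 0.5)
print(f"testing is cheaper when the test costs less than ${threshold:,.0f}")
```

In this simplified form the break-even test cost equals the surveillance cost avoided among non-carriers, which is why the threshold is sensitive to the assumed surveillance cost.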
The analysis of costs of preventing colon cancer in FAP family members has limitations. First, it assumes 100% adherence. This is unlikely for either genetic testing or colonoscopy. Second, the results are sensitive to assumptions about cost. The baseline estimate for the cost of a test for mutations on the APC gene is $750, which appears low. The analysis does not include the cost of genetic counseling. Alternative strategies for testing family members are not addressed, such as haplotype analysis in place of DNA analysis (42). Hence, the study may not offer sufficient information for a full policy analysis of genetic testing of FAP family members.
Several prevention effectiveness studies address HNPCC, much more common than FAP as a cause of colorectal cancer. Missense mutations on four DNA repair genes have been found in at least 70% of HNPCC-affected families, most of which are mutations on the hMSH2 and hMLH1 genes. Penetrance for colorectal cancer among individuals with HNPCC genotypes is between 80 and 90%, based on data from highly affected HNPCC kindreds, and the age of onset appears to be an average of 45 years. HNPCC is not limited to colorectal cancer; endometrial, ovarian, urinary tract, and stomach cancers are also common (43-44). Expert opinion is that members of HNPCC-affected families should receive routine colonoscopy every 1 to 3 years from age 25, along with endometrial cancer screening in women, unless they are known to not be mutation carriers (45). Prophylactic surgery is also an option, but is not generally recommended.
Two decision analyses of case management of HNPCC mutation carriers have been published (46-47). One, by Vasen et al., is a cost-effectiveness analysis of colonoscopic surveillance every 3 years beginning at age 25 compared to no surveillance (46). The health outcome is life expectancy. No analytic perspective is specified, but the stated uses include influencing legislation and health benefits packages. Only medical costs are included, with no indication of the methods by which costs were computed or derived. The analysis concludes that surveillance of carriers is cost saving. Since colonoscopic surveillance is already the standard of care for HNPCC carriers, it is not clear that this is a policy-relevant study question. A more sophisticated decision analysis by Syngal et al. evaluates a dozen strategies for a hypothetical 25-year-old female mutation carrier, including surgery, colonoscopic surveillance every 3 years, and no surveillance (47). Health outcomes include both life expectancy and quality-adjusted life years. The intended audience consists of patients and health care providers, which is appropriate given the study design, which excludes costs and addresses only individual harms and benefits.
The assumptions and results of the two decision analyses are summarized in Table 3 below. The two analyses conclude that colonoscopic surveillance can raise life expectancy by 7 years (46) and 13.5 years (47). The difference in projected gains in life expectancy from colonoscopy reflects differences in three assumptions, for each of which the second study makes assumptions that are more favorable to screening. The lifetime risk of developing colorectal cancer (CRC) is assumed to be 80% in one study and 88% in the other. The higher the cumulative risk, the greater the benefit of prevention. Similarly, the worse the prognosis is following diagnosis of cancer, the greater is the benefit of prevention. Syngal et al. assume a poor prognosis for HNPCC carriers with CRC, the same as for non-carriers. However, HNPCC carriers who develop CRC are more likely to have localized disease, with a relatively favorable prognosis and have higher age- and stage-specific survival rates than other CRC patients (48). Finally, the greater is the efficacy of colonoscopy, the greater is the benefit of surveillance. Both studies cite the same article (49) as the source of their divergent estimates. A sensitivity analysis in Vasen et al. indicates that assuming a 62% efficacy, as in Syngal et al., adds less than an additional year of life to the projected gain from surveillance, indicating that this accounts for little of the difference in results.
Both studies exclude from consideration extracolonic malignancies. The lifetime risks for HNPCC mutation carriers are reported to be 43% for endometrial cancer (in women) and 9% to 19% for gastric, biliary tract, urinary tract, and ovarian cancers (43). Since carriers are at elevated risk of other cancers, it is incorrect to assume that preventing CRC leads to normal life expectancy. The assumption is particularly problematic for the study by Syngal et al. (47), owing to the high rate of endometrial cancer in female HNPCC mutation carriers. Indeed, in a population-based study, the risk of endometrial cancer in female HNPCC carriers was found to exceed that of colorectal cancer (44). Syngal et al. acknowledge the high mortality risk from other cancers and the impact that excluding this factor from their model has on their results.
The second study models the harms and benefits of preventive options (47). Quality of life weights are calculated from a panel of ten physicians. Two different prophylactic surgeries are modeled, proctocolectomy with an assumed 100% efficacy, and subtotal colectomy with an assumed 80% efficacy. Besides immediate surgery for a 25-year-old mutation carrier, various delays in surgery are modeled, but the results indicate little gain in life expectancy. As shown in Table 4, if life expectancy is the outcome, immediate prophylactic surgery is preferred to surveillance. In contrast, QALYs are higher for the surveillance option. The authors conclude that providers should be very cautious in recommending prophylactic surgery for high-risk patients, and that surveillance may be a more attractive option.
Strategy | Life years | QALYs |
---|---|---|
Immediate proctocolectomy | 2.1 | -3.1 |
Immediate subtotal colectomy | 1.8 | -0. |
The study by Syngal et al. does a very good job of reporting and interpreting the sensitivity of their results to variation in utility weights. Of ten individual physicians who provided weights, QALYs would be maximized by the surveillance strategy for only five. Total colectomy would maximize expected utility for almost as many, four, and only one would maximize expected utility by choosing subtotal colectomy. The results are also sensitive to small variations in epidemiologic parameters whose magnitudes are not well established. For example, if colonoscopy is less than 57% effective (compared to a baseline estimate of 62% effectiveness), surveillance no longer maximizes QALYs at mean utility weights. The main conclusion of the study is that no global generalization can be made as to which option is optimal. An individualized assessment is needed that incorporates personal preferences.
Finally, one analysis has evaluated the potential cost-effectiveness of population-based carrier screening for HNPCC (50). One attractive feature of this analysis is its candid approach to dealing with uncertainty. Rather than specify a best estimate of the prevalence of the condition, the authors use a very wide range of prevalences, from 1 to 50 per 10,000. To favor screening for the sake of argument, it was assumed that surveillance would prevent all CRC mortality, that survival of HNPCC carriers with CRC is no better than for other CRC patients, and that no carriers die from extracolonic malignancies. Also, the assumed penetrance for CRC of 80% is based on classical HNPCC kindreds, and CRC penetrance among mutation carriers in the general population is lower, reportedly 56% (44). The conclusion is that screening is unlikely to be cost-effective unless there is a very high prevalence of the mutation in the population (above 23 per 10,000). The authors also point out that if realistic assumptions had been made, the case for population-based screening would appear even less promising.
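The dependence of cost-effectiveness on prevalence can be illustrated with a deliberately simplified calculation. This is a sketch under stated assumptions, not the model used in the cited analysis: the formula, the function name, and every dollar and life-year figure are hypothetical. Only the prevalence range of 1 to 50 per 10,000 is taken from the text.

```python
def cost_per_life_year(prevalence, test_cost, followup_cost_per_carrier,
                       life_years_gained_per_carrier):
    """Cost per life-year gained, per person screened.

    Everyone screened incurs the test cost; only carriers incur
    follow-up costs and only carriers gain life-years.
    """
    cost = test_cost + prevalence * followup_cost_per_carrier
    benefit = prevalence * life_years_gained_per_carrier
    return cost / benefit


# Hypothetical inputs, for illustration only.
TEST_COST = 200.0          # cost of one screening test
FOLLOWUP_COST = 10_000.0   # lifetime surveillance cost per carrier
LIFE_YEARS_GAINED = 5.0    # life-years gained per carrier identified

for per_10k in (1, 5, 10, 23, 50):
    prevalence = per_10k / 10_000
    ratio = cost_per_life_year(prevalence, TEST_COST, FOLLOWUP_COST,
                               LIFE_YEARS_GAINED)
    print(f"{per_10k:>2} per 10,000: ${ratio:,.0f} per life-year gained")
```

Because the testing cost is spread over all persons screened while the benefit accrues only to carriers, the ratio falls steeply as prevalence rises, which is why a prevalence threshold emerges in analyses of this kind.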
Inherited breast and ovarian cancer syndrome
Germline mutations on the BRCA1 and BRCA2 genes are autosomal dominant mutations that lead to elevated risks of breast and ovarian cancer. The prevalence of BRCA1 and BRCA2 mutation carriers is 1-2 per 1000 in the general population. Perhaps 5% of breast cancers and up to 10% of ovarian cancers are attributable to these mutations (51). The exact penetrance of BRCA1 and BRCA2 mutations is unclear. In high-risk families, the probability of developing breast cancer by age 70 is 84-85% (52-53). The risk of ovarian cancer is more variable. The best estimate for BRCA1 carriers in high-risk families is 44%, allowing for variation among alleles (52). If no allowance is made for heterogeneity, the estimated risk is 63%, but this estimate is biased because of the selection criteria for recruitment of families. The risk for BRCA2 mutation carriers is lower, 27% on average (53). Penetrance may be lower for mutation carriers in the general population. Two studies conducted among individuals not recruited from high-risk families report cumulative risks of 56% and 68% for breast cancer and 16% and 21% for ovarian cancer, respectively (54-55). A BRCA2 mutation common in Iceland is reported to pose a cumulative risk for breast cancer to age 70 of only 37% (56).
Prevention effectiveness methods applied to BRCA1 and BRCA2 mutations address two questions. Should an individual seek genetic testing? If a susceptibility genotype is found, is preventive action beneficial? Since the benefits of genetic testing depend on an affirmative response to the second question, we first address the efficacy of prophylactic treatment.
Four published decision analyses model the benefits of prophylactic surgeries for BRCA1 and BRCA2 mutation carriers (57-60). The studies calculate results for a range of penetrance estimates. Estimates of penetrance among members of high-risk families are used to define “high” risk. A study of Ashkenazi Jewish volunteers is used to define a “medium” or “average” risk in three of the studies (57-59). In the first two of these studies “low” risk is defined on the basis of the lower bounds of the 95% confidence intervals. The choice of labels and risk groups is arbitrary; it may be that the “medium” risk parameter reflects relatively low risk.
Two of the studies share an author and epidemiologic assumptions but differ by focusing respectively on “average”-risk (59) or “high”-risk women (60). Both also model the decision to be tested. They find that the benefit of being tested depends on the probability of being a mutation carrier and on the probabilities of accepting prophylactic surgery with and without the test results.
The published decision models share several features. Each uses Markov models and age-specific cancer incidence rates. Each assumes that prophylactic oophorectomy is accompanied by hormone replacement therapy (HRT), at least until age 50. Finally, the analyses assume that BRCA2 carriers have the same risks for breast and ovarian cancer as BRCA1 carriers. The assumption that ovarian cancer risk is the same for BRCA1 and BRCA2 mutation carriers likely overstates the benefits of prophylactic removal of the ovaries.
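For readers unfamiliar with the approach, a Markov cohort model with age-specific incidence can be sketched in a few lines. The three-state structure (well, cancer, dead), the function names, and all rates below are hypothetical simplifications chosen for illustration; they are not the structure or parameters of any of the published models.

```python
def life_expectancy(start_age, incidence, other_mortality, cancer_mortality,
                    risk_reduction=0.0, max_age=100):
    """Expected years lived from start_age in a yearly-cycle cohort model.

    incidence and other_mortality are functions of age returning annual
    probabilities. No half-cycle correction is applied, so each person
    alive at the start of a cycle is credited with the full year.
    """
    well, cancer = 1.0, 0.0
    years_lived = 0.0
    for age in range(start_age, max_age):
        years_lived += well + cancer
        p_die = other_mortality(age)
        p_cancer = incidence(age) * (1.0 - risk_reduction)
        survivors_well = well * (1.0 - p_die)
        survivors_cancer = cancer * (1.0 - p_die) * (1.0 - cancer_mortality)
        well = survivors_well * (1.0 - p_cancer)
        cancer = survivors_cancer + survivors_well * p_cancer
    return years_lived


# Hypothetical age-specific rates, for illustration only.
def hypothetical_incidence(age):
    return 0.005 if age < 40 else 0.02


def hypothetical_other_mortality(age):
    return min(1.0, 0.001 * 1.09 ** (age - 30))


no_surgery = life_expectancy(30, hypothetical_incidence,
                             hypothetical_other_mortality,
                             cancer_mortality=0.10)
with_surgery = life_expectancy(30, hypothetical_incidence,
                               hypothetical_other_mortality,
                               cancer_mortality=0.10, risk_reduction=0.5)
print(f"Gain in life expectancy: {with_surgery - no_surgery:.2f} years")
```

The published models are more detailed (separate breast and ovarian cancer states, stage-specific survival, HRT effects), but the gain in life expectancy is obtained the same way: by running the cohort with and without the intervention and taking the difference.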
Other assumptions differ among the models, associated with marked differences in outcomes. Table 5 lays out the differences in key assumptions, for scenarios involving a hypothetical 30-year-old female mutation carrier. One key parameter is the risk of ovarian cancer in high-risk mutation carriers. Schrag et al. choose to use the 44% estimate from Easton et al. (52) in place of the 63% estimate. The latter estimate ignores variability in risk across mutations and may overstate the benefits of oophorectomy. Another key parameter is the prognosis of ovarian cancer in mutation carriers. Grann et al. assume the same grim prognosis as for non-mutation carriers, while the other three studies all base their models on a finding from one study that the 5-year survival rate from ovarian cancer is several times higher for BRCA1 mutation carriers (61). The better the prognosis, the smaller the potential benefit from prophylactic oophorectomy. One study has reported that the efficacy of prophylactic oophorectomy in preventing ovarian or peritoneal cancer is 50% (62), and two of the decision analyses use this number (57-58). The other studies rely on an expert panel to come up with a 78% efficacy estimate (59-60). Finally, these last two studies assume that oophorectomy reduces the risk of breast cancer, and HRT reduces but does not eliminate this benefit.
In line with varying assumptions, the studies yield varying results, reported variously in life years and QALYs in Table 6. For high-risk women, it is only possible to make pairwise comparisons. Table 6 gives results for the options of immediate prophylactic surgery for a hypothetical 30-year-old high-risk female mutation carrier.
The greater projected gain in life expectancy from oophorectomy in Grann et al. compared with Schrag et al. is due to Grann et al.’s assumptions of higher penetrance and worse prognosis from cancer. The same difference in assumptions reduces the expected benefit of prophylactic mastectomy in Grann et al. relative to Schrag et al., because mastectomy without oophorectomy is of lesser benefit if a mutation carrier has a very high risk of dying from ovarian cancer. For this reason, the projected gain in life expectancy with the combination of the two surgeries differs by much less across the studies than the relative gains from the individual surgeries. Using the same QALY metric as Grann et al., Berry and Parmigiani project much greater benefit from oophorectomy, attributable to a much higher assumed efficacy of prophylactic oophorectomy in preventing both ovarian and breast cancer.
The question of whether a 30-year-old mutation carrier should delay ovarian surgery has been addressed in two studies, with opposing results. Schrag et al. report that delaying prophylactic oophorectomy by 10 years would have almost no effect on life expectancy (-0.4 years in high-risk carriers) (57). In contrast, Tengs et al. conclude that delaying oophorectomy “is never optimal,” but they model only a 20-year delay (59). Schrag et al. assume that virtually no cases of ovarian cancer in high-risk carriers occur before age 40. This assumption is based on data for “average”-risk women (54) and is inconsistent with data on age of onset of ovarian cancer in BRCA1 carriers (61) as well as expert opinion recommending that women from high-risk families consider surgery by age 35 (63).
The impact of surgery on quality of life has been modeled in three studies. Grann et al. calculate QALYs based on a time-tradeoff survey of 54 women and mean expected utilities (58). There was a wide range of responses, and one quarter of the women surveyed considered there to be no loss of quality of life from prophylactic surgery. Using mean values of the responses, the authors report that the negative effects of surgery offset most of the gain in expected years of life for high-risk carriers and result in negative net effects on QALYs for medium-risk carriers. The other two studies report QALYs calculated using arbitrary utility weights selected for expository purposes (59-60). The authors assume that oophorectomy would lower quality of life by only 1%, compared to an average reduction of 9% in the Grann study.
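The influence of the assumed utility decrement can be seen in a back-of-the-envelope calculation. The sketch below assumes a constant multiplicative utility weight applied to all remaining years after surgery, no discounting, and full health without surgery. The life expectancies are hypothetical; only the 1% and 9% decrements are taken from the studies discussed above.

```python
def net_qaly_gain(le_without_surgery, le_with_surgery, utility_decrement):
    """Net QALYs from surgery when surgery lowers quality of life by a
    constant fraction for all remaining years (a simplifying assumption)."""
    qalys_with = le_with_surgery * (1.0 - utility_decrement)
    qalys_without = le_without_surgery  # utility of 1.0 assumed
    return qalys_with - qalys_without


def break_even_decrement(le_without_surgery, le_with_surgery):
    """Utility decrement at which surgery yields no net QALY gain."""
    return 1.0 - le_without_surgery / le_with_surgery


# Hypothetical remaining life expectancies (years), for illustration only.
LE_WITHOUT, LE_WITH = 40.0, 42.0

for decrement in (0.01, 0.09):
    gain = net_qaly_gain(LE_WITHOUT, LE_WITH, decrement)
    print(f"decrement {decrement:.0%}: net gain {gain:+.2f} QALYs")

print(f"break-even decrement: {break_even_decrement(LE_WITHOUT, LE_WITH):.1%}")
```

Under these illustrative inputs a 2-year gain in life expectancy is wiped out by a utility decrement of less than 5%, which shows why the choice between a 1% and a 9% decrement can reverse the sign of the result.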
Grann et al. also address cost-effectiveness (58). The analytic perspective is not stated, although one audience mentioned is health insurance companies. In the base analysis, health care costs are based on HCFA payments, which are lower than costs faced by private insurers. Non-medical costs are excluded, which means that the societal perspective, essential for drawing policy conclusions about cost-effectiveness, is not followed. The discount rate is 3%, appropriate for a health system perspective but less so from a payer’s perspective. In the base analysis, prophylactic oophorectomy and the combination of oophorectomy and mastectomy are reported to be cost saving, and on this basis the authors recommend that insurance companies cover these procedures in BRCA1 and BRCA2 mutation carriers. This conclusion is unwarranted for at least three reasons. First, the parameters are not defined from the perspective of health insurers. Second, with QALYs, net health effects are negative for medium-risk carriers. This disqualifies the interventions as cost saving for that group of patients. Third, the model assumes that all women receive surveillance recommended for high-risk female patients, including ultrasound every 6 months for ovarian cancer surveillance. If all mutation carriers are not already receiving this expensive procedure, any cost savings to payers may not be realizable.
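Two mechanics underlie this critique: discounting future costs to present value, and checking both the cost and the health side before calling an intervention cost saving. A minimal sketch follows; the function names, the category labels, and the example cost streams are hypothetical, and only the 3% discount rate comes from the text.

```python
def present_value(stream, rate=0.03):
    """Discount a yearly stream of values (year 0 first) to present value."""
    return sum(x / (1.0 + rate) ** t for t, x in enumerate(stream))


def classify(delta_cost, delta_qalys):
    """Classify an intervention relative to its comparator."""
    if delta_cost < 0 and delta_qalys >= 0:
        return "cost saving"
    if delta_cost < 0:
        return "saves money but loses health"
    if delta_qalys > 0:
        return "costs more and improves health: report cost per QALY"
    return "dominated"


# Hypothetical yearly cost streams, for illustration only.
surgery_costs = [8_000.0] + [500.0] * 9    # up-front surgery, then follow-up
surveillance_costs = [1_500.0] * 10        # ongoing surveillance

delta_cost = present_value(surgery_costs) - present_value(surveillance_costs)

print(classify(delta_cost, delta_qalys=0.5))   # health gain assumed
print(classify(delta_cost, delta_qalys=-1.8))  # health loss assumed
```

The second call mirrors the point made above: an option that lowers costs but also lowers QALYs involves a tradeoff and cannot simply be labeled cost saving.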
CONCLUSIONS
The application of prevention effectiveness methods to genetic-related diseases or conditions has received increasing attention. The opportunity of identifying individuals in a pre-clinical state and initiating preventive therapy promises benefits in averting disease treatment costs and suffering. However, results from quantitative prevention effectiveness models to date are difficult to apply to public policy decisions regarding genetic testing or prevention strategies. This is due only in part to limitations in the application of prevention effectiveness methods. Typically, the epidemiologic data are too incomplete to reach firm conclusions about the long-term health benefits of screening or testing.
Even if the epidemiologic data were stronger, published standards for conducting economic evaluations of health interventions (1,5-9) have not generally been followed. For example, few cost-effectiveness studies use the societal perspective, as is recommended for any study addressing public health policies. Most analyses only include medical costs. Few studies report multivariate sensitivity analysis (5). Relatively few studies have used population-based data on health preferences to analyze quality of life considerations in assessing the harms and benefits of interventions.
Reliable cost data are generally lacking. For most of the conditions reviewed here, no empirical data on the cost of screening are available. For cystic fibrosis, although cost estimates are available from operational screening programs, data on the costs of providing health services to screened and unscreened infants are lacking. Studies of genetic testing for autosomal recessive disorders may fail to take into account genetic counseling and other costs associated with identifying unaffected mutation carriers (10). Finally, even if cost data and assumptions are unassailable, cost-effectiveness is not by itself a sufficient basis for policy decisions. A full policy analysis must address potential ethical and social harms, distributional issues, and broader societal and political ramifications.
The most important contribution of prevention effectiveness research to public health genetics to date is in providing a framework for understanding complex policy decisions. Clear-cut issues do not require mathematical models to resolve. When there are major harms or uncertainty about benefits, a decision analysis can shed new light. Prevention effectiveness studies help to identify data needs, facilitate understanding of uncertainty, and focus discussion on critical issues. For example, adverse effects of prophylactic surgery have been shown to be influential in deciding on optimal strategies for carriers of autosomal dominant cancer-susceptibility genes. Use of the societal perspective favors screening strategies that minimize the number of follow-up visits and invasive tests, since costs to individuals and families are important costs to consider when formulating public health policies.
Another issue identified as important in cost-effectiveness studies of genetic testing is the cost of DNA mutation analyses. As technology evolves, the costs of tests for multiple mutations are expected to decline and the cost-effectiveness of screening strategies that incorporate DNA tests should become more favorable relative to other approaches. For this reason, it is important to retain flexibility in modeling as new data emerge rather than to regard a particular set of results as definitive.
REFERENCES:
- Haddix AH, Teutsch SM, Shaffer PA, Dunet DO, eds. Prevention effectiveness: A guide to decision analysis and economic evaluation. New York: Oxford University Press, 1996.
- Khoury MJ, Genetics Working Group. From genes to public health: applications of genetics in disease prevention. Am J Pub Health 1996;86:1717-1722.
- Zweifel P, Breyer F. Health economics. New York: Oxford University Press, 1997.
- Sonnenberg FA, Beck JR. Markov models in medical decision making: a practical guide. Med Decis Making 1993;13:322-338.
- Drummond MF, O’Brien B, Stoddart GL, Torrance GW. Methods for the economic evaluation of health care programmes. Second ed. Oxford: Oxford University Press, 1997.
- Gold MR, Siegel JE, Russell LB, Weinstein MC, eds. Cost-effectiveness in health and medicine. New York: Oxford University Press, 1996.
- Drummond MF, Richardson WS, O’Brien BJ, et al. User’s guides to the medical literature. XIII. How to use an article on economic analysis of clinical practice. A. Are the results of the study valid? JAMA 1997;277:1552-1557.
- Provenzale D, Lipscomb J. A reader’s guide to economic analysis in the GI literature. Am J Gastroenterology 1996;91:2461-2470.
- Drummond MF, Jefferson TO. Guidelines for authors and peer reviewers of economic submissions to the BMJ. The BMJ Economic Evaluation Working Party. BMJ 1996;313:275-283.
- Gessner BD, Teutsch SM, Shaffer P. A cost-effectiveness evaluation of newborn hemoglobinopathy screening from the perspective of state health care systems. Early Hum Dev 1996;45:257-75.
- Lieu T, Watson S, Washington AE. Cost effectiveness of prenatal carrier screening for cystic fibrosis. Presentation to NIH Consensus Development Conference on Genetic Testing for Cystic Fibrosis, April 14-16, 1997.
- Teutsch SM. A framework for assessing the effectiveness of disease and injury prevention. MMWR 1992;41(No. RR-3).
- Barber JA, Thompson G. Analysis and interpretation of cost data in randomized controlled trials: review of published studies. BMJ 1998;317:1195-1200.
- Holtzman NA, Watson MS. Promoting safe and effective genetic testing in the United States: final report of the Task Force on Genetic Testing. Baltimore: Johns Hopkins University Press, 1998.
- Cutting GR. Genetic epidemiology and genotype/phenotype correlations. In National Institutes of Health. Program and abstracts. NIH Consensus Development Conference on Genetic Testing for Cystic Fibrosis, April 14-16, 1997.
- Dankert-Roelse JE, te Meerman GJ. Long term prognosis of patients with cystic fibrosis in relation to early detection by neonatal screening and treatment in a cystic fibrosis centre. Thorax 1995;50:712-718.
- Farrell PM, Kosorok MR, Laxova A, et al. Nutritional benefits of neonatal screening for cystic fibrosis. N Engl J Med 1997;337:963-9.
- Wald NJ, Morris JK. Neonatal screening for cystic fibrosis: No evidence yet of any benefit. BMJ 1998;316:404-405.
- Centers for Disease Control and Prevention. Newborn screening for cystic fibrosis: a paradigm for public health genetics policy development-proceedings of a 1997 workshop. MMWR 1997;46(No. RR-16).
- National Institutes of Health. Program and abstracts. NIH Consensus Development Conference on Genetic Testing for Cystic Fibrosis, April 14-16, 1997.
- Wilcken B, Wiley V, Sherry G, Bayliss U. Neonatal screening for cystic fibrosis: a comparison of two strategies for case detection in 1.2 million babies. J Pediatr 1995;127:965-970.
- Gregg RG, Wilfond BS, Farrell PM, et al. Application of DNA analysis in a population-screening program for neonatal diagnosis of cystic fibrosis (CF): comparison of screening protocols. Am J Hum Gen 1993;52:616-626.
- Qualls NL, Cono J, Kelly AE, Khoury MJ. The economic impact of population-based newborn screening for cystic fibrosis. MMWR 1997;46(RR-16):14-15.
- Grosse SD, Webster D, Hannon WH. Cost comparison of IRT and IRT/DNA screening of newborns for cystic fibrosis in New Zealand. Presentation to Thirteenth National Neonatal Screening Symposium, San Diego, March 2, 1998.
- Burke W, Press N, McDonnell SM, et al. Hemochromatosis: genetics helps to define a multifactorial disease. Clin Genet 1998;54:1-9.
- Niederau C, Fischer R, Purschel A, et al. Long-term survival in patients with hereditary hemochromatosis. Gastroenterology 1996;110:1107-1119.
- Burke W, Thomson E, Khoury MJ, et al. Hereditary hemochromatosis: Gene discovery and its implications for population-based screening. JAMA 1998;280:172-178.
- Adams PC, Kertesz AE, Valberg LS. Screening for hemochromatosis in children of homozygotes: prevalence and cost-effectiveness. Hepatology 1995;22:1720-1727.
- Adams PC, Gregor JC, Kertesz AE, Valberg LS. Screening blood donors for hereditary hemochromatosis: decision analysis model based on a 30-year database. Gastroenterology 1995;109:177-188.
- Balan V, Baldus W, Fairbanks V, et al. Screening for hemochromatosis: a cost-effectiveness study based on 12,258 patients. Gastroenterology 1994;107:453-459.
- Buffone GJ, Beck JR. Cost-effectiveness analysis for evaluation of screening programs: hereditary hemochromatosis. Clin Chem 1994;40:1631-1636.
- Phatak PD, Guzman G, Woll JE, et al. Cost-effectiveness of screening for hereditary hemochromatosis. Arch Intern Med 1994;154:769-776.
- Baer DM, Simons JL, Staples RL, et al. Hemochromatosis screening in asymptomatic ambulatory men 30 years of age and older. Am J Med 1995;98:464-468.
- Merryweather-Clarke AT, Pointon JJ, Shearman JD, Robson KJ. Global prevalence of putative haemochromatosis mutations. J Med Genet 1997;34:275-278.
- Adams PC, Valberg LS. Screening blood donors for hereditary hemochromatosis: decision analysis model comparing genotyping to phenotyping. Am J Med Genet 1997;17:A1207.
- Bassett ML, Leggett BA, Halliday JW. Analysis of the cost of population screening for haemochromatosis using biochemical and genetic markers. J Hepatol 1997;27:517-524.
- McDonnell SM, Witte DL, Cogswell ME, et al. Strategies to increase detection of hemochromatosis. Ann Intern Med 1998;129:980-986.
- Lynch HT, Lynch JF. Genetics of colonic cancer. Digestion 1998;59:481-492.
- Ambroze WL Jr, Orangio GR, Lucas G. Surgical options for familial adenomatous polyposis. Semin Surg Oncol 1995;11:423-7.
- Lynch HT, Smyrk TC. Classification of familial adenomatous polyposis: a diagnostic nightmare. Am J Hum Genet 1998;62:1288-9.
- Cromwell DM, Moore RD, Brensinger JD, et al. Cost analysis of alternative approaches to colorectal screening in familial adenomatous polyposis. Gastroenterology 1998;114:893-901.
- Gazzoli I, De Andreis C, Sirchia SM, et al. Molecular screening of families affected by familial adenomatous polyposis (FAP). J Med Screen 1996;3:195-199.
- Aarnio M, Mecklin JP, Aaltonen LA, et al. Life-time risk of different cancers in hereditary non-polyposis colorectal cancer (HNPCC) syndrome. Int J Cancer 1995;64:430-433.
- Dunlop MG, Farrington SM, Carothers AD, et al. Cancer risk associated with germline DNA mismatch repair gene mutations. Hum Mol Genet 1997;6:105-110.
- Burke W, Petersen G, Lynch P, et al. Recommendations for follow-up care of individuals with an inherited predisposition to cancer. I. Hereditary nonpolyposis colon cancer. Cancer Genetics Studies Consortium. JAMA 1997;277:915-919.
- Vasen HF, van Ballegooijen M, Buskens E, et al. A cost-effectiveness analysis of colorectal screening of hereditary nonpolyposis colorectal carcinoma gene carriers. Cancer 1998;82:1632-1637.
- Syngal S, Weeks JC, Schrag D, et al. Benefits of colonoscopic surveillance and prophylactic colectomy in mutation carriers for hereditary nonpolyposis colorectal cancer. Ann Intern Med 1998;129:787-796.
- Watson P, Lin KM, Rodriguez-Bigas MA, et al. Colorectal cancer survival among hereditary nonpolyposis colorectal cancer family members. Cancer 1998;83:259-266.
- Jarvinen HJ, Mecklin JP, Sistonen P. Screening reduces colorectal cancer rate in families with hereditary nonpolyposis colorectal cancer. Gastroenterology 1995;108:1405-1411.
- Brown ML, Kessler LG. The use of gene tests to detect hereditary predisposition to cancer: economic considerations. J Natl Cancer Inst 1995;87:1131-1136.
- Parmigiani G, Berry D, Aguilar O. Determining carrier probabilities for breast cancer-susceptibility genes BRCA1 and BRCA2. Am J Hum Genet 1998;62:145-158.
- Easton DF, Ford D, Bishop DT. Breast and ovarian cancer incidence in BRCA1-mutation carriers. Breast Cancer Linkage Consortium. Am J Hum Genet 1995;56:265-271.
- Ford D, Easton DF, Stratton M, et al. Genetic heterogeneity and penetrance analysis of the BRCA1 and BRCA2 genes in breast cancer families. The Breast Cancer Linkage Consortium. Am J Hum Genet 1998;62:676-689.
- Struewing JP, Hartge P, Wacholder S, et al. The risk of cancer associated with specific mutations of BRCA1 and BRCA2 among Ashkenazi Jews. N Engl J Med 1997;336:1401-1408.
- Whittemore AS, Gong G, Itnyre J. Prevalence and contribution of BRCA1 mutations in breast cancer and ovarian cancer: results from three U.S. population-based case-control studies of ovarian cancer. Am J Hum Genet 1997;60:496-504.
- Thorlacius S, Struewing JP, Hartge P, et al. Population-based study of risk of breast cancer in carriers of BRCA2 mutation. Lancet 1998;352:1337-1339.
- Schrag D, Kuntz KM, Garber JE, Weeks JC. Decision analysis–effects of prophylactic mastectomy and oophorectomy on life expectancy among women with BRCA1 or BRCA2 mutations. N Engl J Med 1997;336:1465-1471.
- Grann VR, Panageas KS, Whang W, et al. Decision analysis of prophylactic mastectomy and oophorectomy in BRCA1-positive or BRCA2-positive patients. J Clin Oncol 1998;16:979-985.
- Tengs TO, Winer EP, Paddock S, et al. Testing for the BRCA1 and BRCA2 breast-ovarian cancer susceptibility genes: a decision analysis. Med Decis Making 1998;18:365-375.
- Berry DA, Parmigiani G. Assessing the benefits of testing for breast cancer susceptibility genes: a decision analysis. Breast Dis 1998;10:115-125.
- Rubin SC, Benjamin I, Behbakht K, et al. Clinical and pathological features of ovarian cancer in women with germ-line mutations of BRCA1. N Engl J Med 1996;335:1413-1416.
- Struewing JP, Watson P, Easton DF, et al. Prophylactic oophorectomy in inherited breast/ovarian cancer families. J Natl Cancer Inst Monogr 1995;(17):33-35.
- Burke W, Daly M, Garber J, et al. Recommendations for follow-up care of individuals with an inherited predisposition to cancer. II. BRCA1 and BRCA2. Cancer Genetics Studies Consortium. JAMA 1997;277:997-1003.