Flawed Study Design—the Other “Pre-Analytical Variable” in Biomarker Research

Posted by Amy Brankin on Jul 27, 2016 11:27:53 AM

Why are we failing to find biomarkers for early detection of disease? Throughout our blog we write a great deal about biospecimen integrity and emphasize the control of pre-analytical variability in biospecimens. However, the highest quality biospecimens are wasted in the search for biomarkers if the study design is flawed. And the literature on biomarker discovery studies is filled with examples of poor study design, creating an overabundance of false positives that cannot be validated.

A lack of internal validity, in the form of inappropriately matched case and control biospecimens, constitutes a form of pre-analytical variability, in that it is present before the laboratory investigator performs the first assay. According to Rundle et al.1, nearly all published biomarker studies use either hospital-based designs, where control selection is difficult, or convenience samples as controls. Some discovery studies include matching but do not statistically adjust for the correlations that matching introduces.

Multiple analyses of this issue have been published over the past 15 years, including Pepe et al.2, Diamandis3, Ransohoff and Gourlay4, Rundle et al.1, Schully et al.5, and others.

Rundle et al.1 proposed the nested case-control design as the optimal tool for eliminating bias and pre-analytical variables from sample selection. Also called the PRoBE approach, the nested case-control study is a sub-type of the “classic” case-control study in which all controls are selected from the same longitudinal study cohort as the cases, which helps ensure uniformity of data collection, sample handling, and other variables such as age. They further proposed a new case-control study design, dubbed “anti-matching,” to improve biomarker discovery studies. By matching controls to cases in a way that counters known risk factors, “anti-matching” aims to increase the “signal-to-noise ratio” (higher specificity) for biomarkers that actually reflect the underlying disease process, making it easier for bioinformatics tools to yield a valid biomarker using group-wise comparisons.
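To make the nested case-control idea concrete, here is a minimal Python sketch of how controls might be drawn from the same cohort as the cases. It is an illustration only, not the procedure from Rundle et al.: the `Participant` fields, the `select_nested_controls` function, the five-year age window, and the matching on collection site are all assumptions made for this example. The sketch also assumes that a participant who is diagnosed later can still serve as a control for an earlier case, as long as they were disease-free and under follow-up at that time.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    pid: int
    age: int                          # age at enrollment (hypothetical field)
    site: str                         # collection site, a proxy for sample handling
    followup_years: float             # total time under observation
    dx_years: Optional[float] = None  # years from enrollment to diagnosis; None if never diagnosed


def select_nested_controls(cohort, controls_per_case=2, age_window=5, seed=0):
    """For each case, sample controls from cohort members still at risk.

    Returns a dict mapping each case's pid to a sorted list of control pids.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    matches = {}
    for case in cohort:
        if case.dx_years is None:
            continue  # not a case
        # Risk set: same cohort, still under follow-up and disease-free at the
        # time the case was diagnosed, same site, similar age (assumed criteria).
        risk_set = [
            p for p in cohort
            if p.pid != case.pid
            and p.followup_years >= case.dx_years
            and (p.dx_years is None or p.dx_years > case.dx_years)
            and p.site == case.site
            and abs(p.age - case.age) <= age_window
        ]
        k = min(controls_per_case, len(risk_set))
        matches[case.pid] = sorted(p.pid for p in rng.sample(risk_set, k))
    return matches
```

Because every control comes from the same cohort and the same collection site as its case, specimens in each matched set share the same collection and handling history, which is the property the nested design is meant to guarantee.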

Ransohoff and Gourlay4 suggest that study design and case-control selection be separated from the laboratory assay-related part of biomarker research: clinical researchers, epidemiologists, and biostatisticians would focus on research design, including biospecimen collection, to ensure high-quality or strongly unbiased biospecimens. The clinical research group would then “hand off” the biospecimens to the laboratory group for analysis.

Better study design would reduce the number of false positives and increase the number of biomarkers in the clinical development pipeline. As with biomarker discovery, the development of biomarkers for clinical use also depends on well-designed case-control studies. Pepe et al.2 proposed a phased approach to biomarker development, similar to that of the randomized clinical trial. Three of the five steps depend critically on appropriate matching of case and control samples.

The first step, of course, is pre-clinical studies, and as already discussed, relies on good study design as well as high quality specimens for discovery of potential biomarkers.

The second step involves development and validation of an assay for the biomarkers under consideration. Assay data will be used to estimate the sensitivity and specificity of the biomarker for the disease condition, and to plot a receiver operating characteristic (ROC) curve to determine whether higher values indicate a greater probability of disease. The appropriate matching of cases and controls is as critical to this step as it is to step one.
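The quantities named in this step can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not a validated analysis pipeline: the function names are hypothetical, a sample is called “positive” when its assay value is at or above the threshold, and the area under the ROC curve is computed with the trapezoidal rule.

```python
def sensitivity_specificity(case_values, control_values, threshold):
    """Return (sensitivity, specificity) when values >= threshold count as positive."""
    true_pos = sum(v >= threshold for v in case_values)
    true_neg = sum(v < threshold for v in control_values)
    return true_pos / len(case_values), true_neg / len(control_values)


def roc_points(case_values, control_values):
    """Return ROC points as (false positive rate, sensitivity), from (0, 0) to (1, 1)."""
    # Sweep thresholds from the highest observed value down to the lowest.
    thresholds = sorted(set(case_values) | set(control_values), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        sens, spec = sensitivity_specificity(case_values, control_values, t)
        points.append((1.0 - spec, sens))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

An area near 0.5 means the biomarker does no better than chance at separating cases from controls, while an area near 1.0 means higher values track closely with disease. If the cases and controls are poorly matched, this number can reflect differences in age, collection site, or sample handling rather than the disease itself.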

The third step is to perform retrospective longitudinal studies to “look back” and determine whether the candidate biomarker appears well before diagnosis (if not, the biomarker is not likely to be useful for early detection). Here again, correct matching of cases and controls is critical, as are sample size and a number of other factors.

The final two steps—prospective screening studies (to “look ahead” to determine if the biomarker can predict early disease) and population studies (to determine if the biomarker, used as a screening tool, can reduce the burden of disease in the population)—are beyond biospecimen-based research, except perhaps that the lack of biomarkers in such studies is further evidence of the excess of false positives emerging from initial discovery.

The NCI held a workshop in August of 2013 to improve the validity of biomarker studies5 and break the “ongoing cycle of promise and disappointment in studies of early detection cancer biomarkers...” The workshop participants particularly called for a change in approach to study design, and for collaboration among basic biologists, laboratory scientists, epidemiologists, and biostatisticians. Good study design, like control of pre-analytical variability in biospecimens, will improve biomarker discovery and make better use of resources as well.

The DO-HEALTH clinical trial is creating a biobank of processed samples for downstream research, greatly enhancing the potential value of the data collected during the interventions and potentially enabling the development of other preventive strategies. To learn more about this clinical trial and the biobank for biomarkers, download our eBook European DO-HEALTH Clinical Trial Aims at Simple, Affordable Interventions to Improve Senior Health.




  1. Rundle, A.; Ahsan, H. & Vineis, P. (2012). Better cancer biomarker discovery through better study design. European Journal of Clinical Investigation, 42, 1–18. doi:10.1111/j.1365-2362.2012.02727.x
  2. Pepe, M. S.; Etzioni, R.; Feng, Z.; Potter, J. D.; Thompson, M. L. et al. (2001). Phases of biomarker development for early detection of cancer. Journal of the National Cancer Institute, 93, 1054–1061.
  3. Diamandis, E. P. (2010). Cancer biomarkers: Can we turn recent failures into success? Journal of the National Cancer Institute, 102, 1462–1467. doi:10.1093/jnci/djq306
  4. Ransohoff, D. F. & Gourlay, M. L. (2010). Sources of bias in specimens for research about molecular markers for cancer. Journal of Clinical Oncology, 28(4), 698–704. doi:10.1200/JCO.2009.25.6065
  5. Schully, S. D.; Carrick, D. M.; Mechanic, L. E.; Srivastava, S.; Anderson, G. et al. (2015). Leveraging biospecimen resources for discovery or validation of markers for early cancer detection. Journal of the National Cancer Institute, 107, 1–7. doi:10.1093/jnci/djv012