Researchers Propose a Validated Phenotyping Algorithm for Genetic Association Studies in Age-related Macular Degeneration
Age-related macular degeneration (AMD) is a leading cause of vision loss among people aged 50 and older. This multifactorial neurodegenerative disease damages the macula, the part of the retina responsible for sharp central vision. Rapid advancement of DNA sequencing technologies has allowed researchers to identify many AMD-associated genetic polymorphisms. However, phenotyping and patient recruitment remain the most time-consuming parts of this work, and patient identification and DNA collection are often the rate-limiting steps in genetic association studies.
Facilitated by the rapid growth of electronic medical record (EMR)-linked DNA biorepositories, patient selection algorithms can improve efficiency in genetic association studies.
In this blog, we'll discuss a new study, published in Nature, where researchers show that using stepwise validation of such an algorithm could provide reliable cohort selection outcomes and, when networked with an EMR-linked DNA biorepository, replicate previously published AMD-associated single nucleotide polymorphisms (SNPs).
More on AMD and SNPs
Several AMD-associated SNPs play an important role in the progression of AMD through the specific disease stages. Combining genetic, clinical and demographic data, scientists in independent patient cohorts have validated several AMD progression risk prediction models. Including relevant genetic markers improves the accuracy of modeling the risk for progression from early stage AMD to the advanced stages of either geographic atrophy (GA) or choroidal neovascularization (CNV) as compared to phenotype-only models. Furthermore, incorporating a higher number of AMD-associated SNPs within the CFH gene improves accuracy over models that use only one or two AMD-associated SNPs. This can give researchers greater insight into the conferred risk of haplotype combinations.
High-throughput clinical phenotyping (HTCP) leverages the machine-processable content from EMR to make subject selection practical and scalable. Expanded use of EMR-linked DNA biorepositories improves HTCP algorithms for cohort selection, making process automation and clinically linked data sharing across research fields more appealing. HTCP algorithms are particularly important for chronic, complex conditions like AMD that are associated with numerous environmental and genetic risk factors.
HTCP algorithms will require a multi-step validation method to achieve the high phenotyping accuracy, in particular the high positive predictive value (PPV), needed to identify the genetic variants associated with multifactorial diseases like AMD. The researchers in this study sought to develop an automated algorithm to help identify neovascular (wet) AMD, non-neovascular (dry) AMD and control subjects.
The researchers assigned an AMD case status to 61 of 11,075 subjects (0.55 percent) enrolled in the EMR-linked DNA biorepository. They calculated PPV at 91.7 percent and negative predictive value (NPV) at 97.5 percent using expert chart review. The scientists then applied the algorithm to an EMR-linked DNA biorepository, using case/control status determined by the algorithm to investigate previously identified SNPs associated with AMD. The researchers found the risk alleles of three SNPs, rs1410996 (CFH), rs1061170 (CFH), and rs10490924 (ARMS2) to be strongly associated with the AMD case/control status.
The algorithm correctly classified 94 percent of the patients as either AMD or control. Five of the 60 patients identified as having AMD were misclassified, including one case of proliferative diabetic retinopathy, one patient with posterior vitreous detachment with recurrent vitreous hemorrhage, one instance of pattern dystrophy, a case of atypical angioid streaks and a case of ruptured macroaneurysm.
Of the 40 patients identified as controls, one was misclassified. That patient actually had a large macular scar and a history of polypoidal choroidal vasculopathy. Overall, the PPV was 91.67 percent (55/60), NPV was 97.50 percent (39/40), and FNR was 1.79 percent (1/56).
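The percentages above follow directly from the chart-review counts. As a minimal sketch of the arithmetic (the variable names are chosen here for illustration and do not come from the study):

```python
# Chart-review counts as reported in the summary above.
true_pos = 55   # flagged as AMD by the algorithm, confirmed AMD on review
false_pos = 5   # flagged as AMD, different diagnosis on review
true_neg = 39   # flagged as control, confirmed control on review
false_neg = 1   # flagged as control, actually AMD on review

# Positive predictive value: share of algorithm-flagged cases that are real cases.
ppv = true_pos / (true_pos + false_pos)

# Negative predictive value: share of algorithm-flagged controls that are real controls.
npv = true_neg / (true_neg + false_neg)

# False negative rate: share of real cases that the algorithm missed.
fnr = false_neg / (false_neg + true_pos)

# Overall accuracy across all 100 reviewed charts.
accuracy = (true_pos + true_neg) / (true_pos + false_pos + true_neg + false_neg)

print(f"PPV: {ppv:.2%}")            # PPV: 91.67%
print(f"NPV: {npv:.2%}")            # NPV: 97.50%
print(f"FNR: {fnr:.2%}")            # FNR: 1.79%
print(f"Accuracy: {accuracy:.2%}")  # Accuracy: 94.00%
```

PPV matters most for genetic association work because every false positive dilutes the case group with patients who do not have the disease.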
The scientists assert that the PPV and NPV achieved by this algorithm are high enough to identify AMD cases and controls properly. For the individual subtypes, however, the algorithm's PPV fell below 90 percent, at 86.7 percent for wet AMD and 73.3 percent for dry AMD. The subtype false negative rates (FNRs) were also significantly higher than the FNR for overall AMD case determination.
The algorithm's relative weakness in discriminating AMD subtypes is likely multifactorial. The spectrum of AMD retinal pathology could complicate classification. Adding searchable keywords within the EMR free text and additional ICD-9/CPT codes could improve the algorithm. Applying the algorithm to a larger AMD cohort could help evaluate its reliability in differentiating wet and dry AMD.
The scientists developed an algorithm that identified all AMD cases using AMD ICD-9 codes. For classification as wet AMD, they required a CPT code (J2778 for ranibizumab injection; J9035, J3490 or J3590 for bevacizumab injection) or an order or prescription for ranibizumab, bevacizumab, or aflibercept. The researchers tested the initial algorithm on 10 cases of dry and 10 cases of wet AMD. They found that using CPT codes alone was insufficient, so they revised the algorithm to require subjects to be 60 years of age or older at the time of diagnosis. The researchers also required an ICD-9 code starting with 362.5 for inclusion in the wet AMD group.
These additions greatly improved the accuracy of the algorithm. The initial algorithm correctly classified only 45 percent of the 20 suspected AMD patient charts as either dry or wet AMD. Adding diagnosis dates, age, and the requirement of an AMD diagnosis improved on that result. Of the 55 correctly classified AMD cases, the algorithm identified 26 as dry and 29 as wet.
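The revised rules described above can be sketched as a small rule-based classifier. This is an illustration only, not the researchers' implementation: the `PatientRecord` structure, the `classify` function and the output labels are hypothetical. It also assumes that the age requirement applies to every AMD case and that an AMD diagnosis without evidence of anti-VEGF treatment counts as dry AMD. The article does not describe how controls were defined, so the sketch does not attempt that step.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Codes and drug names listed in the article for anti-VEGF treatment.
ANTI_VEGF_CODES = {"J2778", "J9035", "J3490", "J3590"}
ANTI_VEGF_DRUGS = {"ranibizumab", "bevacizumab", "aflibercept"}
AMD_ICD9_PREFIX = "362.5"
MIN_AGE_AT_DIAGNOSIS = 60


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EMR data."""
    age_at_diagnosis: Optional[int] = None
    icd9_codes: Set[str] = field(default_factory=set)
    procedure_codes: Set[str] = field(default_factory=set)
    medications: Set[str] = field(default_factory=set)  # lower-case drug names


def classify(record: PatientRecord) -> str:
    """Return 'wet_amd', 'dry_amd' or 'not_amd' under the assumed rules."""
    # Rule 1: an AMD diagnosis code (ICD-9 starting with 362.5) is required.
    has_amd_code = any(c.startswith(AMD_ICD9_PREFIX) for c in record.icd9_codes)
    if not has_amd_code:
        return "not_amd"

    # Rule 2: the patient must be 60 or older at the time of diagnosis.
    age = record.age_at_diagnosis
    if age is None or age < MIN_AGE_AT_DIAGNOSIS:
        return "not_amd"

    # Rule 3: an injection code or an anti-VEGF order/prescription marks wet AMD.
    treated = bool(record.procedure_codes & ANTI_VEGF_CODES) or bool(
        record.medications & ANTI_VEGF_DRUGS
    )
    # Assumption: AMD without anti-VEGF evidence is treated as dry AMD.
    return "wet_amd" if treated else "dry_amd"


example = PatientRecord(
    age_at_diagnosis=72,
    icd9_codes={"362.52"},
    procedure_codes={"J2778"},
)
print(classify(example))  # wet_amd
```

Requiring the diagnosis code and the treatment evidence together is what keeps patients who receive the same injections for other retinal conditions out of the wet AMD group.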
The Challenges of HTCP Algorithm Development
The lack of standardization across EMRs presents a major obstacle to the widespread use of high-throughput clinical phenotyping (HTCP) algorithms. Accuracy will become more challenging as researchers apply algorithms across multiple EMR systems, especially when the scientists try to include treatment response and other complex criteria.
Although EMR systems now share ICD-9 and CPT codes, validating HTCP algorithms on external EMR systems is essential for assessing their performance beyond the system in which they were developed.
Increasing compatibility between EMR systems will likely allow for widespread application of HTCP algorithms and will boost potential sample size. Utilization of EMR organization tools and shared billing codes continues to improve, as do efforts to develop an external informatics infrastructure that can normalize EMR data.
Scientists have already applied phenotyping algorithms to clinical EMR platforms and accurately identified specific patient cohorts for the purpose of addressing Meaningful Use standards. Use of HTCP algorithms can also reduce the number of clinical data annotations required to create a precise classification model.
Cutting-edge technology is now being used to manage and integrate recruitment, scheduling, sample tracking, and other participant data. In our eBook, Next Generation Cohort Studies and Biobanking: How Cloud Technology is Accelerating Translational Research, we explore how mobile devices and cloud-based technology were used to cut time and cost in a prospective epidemiology cohort study called the California Teacher's Study (CTS). To learn more, we highly recommend downloading our eBook below!