It’s difficult to read the news today and not see some reference to artificial intelligence (AI). The rapid emergence of AI tools and methods such as machine learning (ML) and large language models (LLMs) has already touched many aspects of daily life, and the implications are profound.
A ChatGPT app on an iPhone can already generate written content well enough to pose new challenges for educators, and recent tests indicate that advanced AI tools can accelerate processes and complement human expertise in many complex tasks, including clinical diagnostics and legal analyses. AI is also becoming increasingly important in another challenging field: biomedical research.
To put it simply, researchers are generating more data than they can analyze, creating bottlenecks. They also face daunting hurdles when trying to combine data sets because of differing data formats and incomplete standardization across the research community. AI can help, and researchers at The Jackson Laboratory (JAX) are at the forefront of developing and implementing AI-based technologies for research applications. The use of ML algorithms to analyze extensive video data has already proven beneficial in many research areas, and recent funding is supporting an expanding program of AI-related work at JAX.
Integrative analysis across experiments and species
Yi Li, Ph.D., Associate Director of Computational Science, and Professor Gregory Carter, Ph.D., were awarded a National Institute on Aging R21/R33 grant, entitled “An explainable unified AI strategy for efficient and robust integrative analysis of multi-omics data from highly heterogeneous multiple studies.” Their project focuses on combining human and mouse model data to accelerate biological interpretation of signatures of exceptional longevity. With the funding, they will pursue transformative AI/ML-based strategies to identify determinants of exceptional health and lifespan and to tackle several challenges encountered when studying human longevity. Current data analysis methods are hamstrung by the understandable scarcity of data on the oldest humans (100+ years old for females and 96+ for males), the difficulty of integrating data across studies that generate different kinds of omics data (e.g., genomics, transcriptomics, metabolomics, proteomics), and the challenge of integrating longevity effects across species, including those with far shorter life spans than humans.
The grant encompasses two phases, with the second (R33) dependent upon the success of the first (R21). In the first phase, the team has two years to develop explainable, robust AI software with a backbone of graph neural networks. The software will need to be applicable to a wide range of research data across multiple platforms, including the integrated analysis of multiple complex disease studies. The second phase will use the software to identify exceptional longevity (EL)-associated pathways and biomarkers in human longevity data, as well as across 100 species with diverse life spans. Once developed, the software will also be publicly accessible for use by outside researchers investigating other topics with varied data sources.
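The grant describes the software only at the level of its graph neural network backbone, but the core computation such networks share is straightforward to sketch. In the minimal, hypothetical example below, nodes stand for molecular features (genes, proteins, metabolites) measured across studies, edges encode known biological relationships between them, and a single graph-convolution layer lets each feature's representation absorb information from its neighbors. Every name, shape, and data value here is an illustrative placeholder, not part of the funded project.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: one graph-convolution layer of the kind that
# underlies graph neural networks. Nodes might represent genes, proteins,
# or metabolites; edges might encode known pathway or interaction
# relationships. All names and shapes are hypothetical.

class GraphConvLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Symmetrically normalize the adjacency matrix (with self-loops)
        # so each node averages over itself and its neighbors.
        a_hat = adj + torch.eye(adj.size(0))
        deg = a_hat.sum(dim=1)                       # >= 1, so safe to invert
        d_inv_sqrt = torch.diag(deg.pow(-0.5))
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
        # Propagate features along edges, then transform.
        return torch.relu(self.linear(a_norm @ x))

# Hypothetical example: 500 molecular features (nodes), each with a
# 3-dimensional profile (say, one measurement per omics platform).
x = torch.randn(500, 3)                      # node features
adj = (torch.rand(500, 500) > 0.99).float()  # toy interaction network
adj = ((adj + adj.T) > 0).float()            # make it undirected
layer = GraphConvLayer(in_dim=3, out_dim=16)
embeddings = layer(x, adj)                   # (500, 16) node embeddings
```

Edges grounded in known biology are also one plausible route to the explainability the grant emphasizes, since a prediction can then be traced back through the specific interactions that influenced it.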
Analysis across the human genome
Associate Professor Sheng Li, Ph.D., and Senior Computational Scientist Brian White, Ph.D., are joining the existing Molecular Phenotypes of Null Alleles in Cells (MorPhiC) Consortium by establishing a Data Analysis and Validation Center (DAV). MorPhiC is a large, multi-institution effort established by the National Human Genome Research Institute to use CRISPR-based perturbation strategies to knock out and assign function to every human gene. Supported by a U01 grant entitled “Multi-omic phenotyping of human transcriptional regulators,” Li and White will join colleagues at the existing JAX MorPhiC Data Production Center (DPC; PIs, Professors Paul Robson, Ph.D., and Bill Skarnes, Ph.D.).
Li and White’s efforts within the JAX MorPhiC DAV will focus on defining the impacts of transcription factor (TF) perturbation, a project of critical importance given the role of TFs in determining cell state. Deep learning approaches (a type of ML that uses multilayered artificial neural networks loosely inspired by the brain) will be used to prioritize TF targets for perturbation across the ~1,600 TFs in the human genome and the multitude of cell line models deployed in the consortium. In silico (computer-based) perturbations of individual TF genes within the resulting system will then be used to predict their impact on downstream targets and cell state transitions. Ultimately, the work will bolster the ability of researchers to decipher the regulatory function of TF genes within the human genome.
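The consortium's models are not specified in detail here, but the logic of an in silico knockout can be sketched in a few lines. The example below assumes, hypothetically, that a deep model has already been trained to predict downstream target expression from a TF expression profile; knocking out a TF then amounts to zeroing its value and comparing the model's predictions before and after. All sizes, indices, and the untrained stand-in model are placeholders.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: the logic of an in silico knockout, assuming a
# model has been trained to map a TF expression profile to predicted
# downstream target expression. Architecture, data, and gene indices
# below are hypothetical placeholders.

N_TFS, N_TARGETS = 1600, 200

model = nn.Sequential(               # stand-in for a trained deep model
    nn.Linear(N_TFS, 256),
    nn.ReLU(),
    nn.Linear(256, N_TARGETS),
)

def in_silico_knockout(model, tf_profile: torch.Tensor, tf_index: int):
    """Zero out one TF's expression and measure the predicted shift
    in downstream target expression relative to baseline."""
    with torch.no_grad():
        baseline = model(tf_profile)
        perturbed_profile = tf_profile.clone()
        perturbed_profile[tf_index] = 0.0    # simulate the knockout
        perturbed = model(perturbed_profile)
    return perturbed - baseline              # per-target predicted effect

# Hypothetical usage: knock out TF #42 in one cell's expression profile.
profile = torch.rand(N_TFS)
effect = in_silico_knockout(model, profile, tf_index=42)
top_targets = effect.abs().topk(10).indices  # most-affected targets
```

Ranking targets by the magnitude of such predicted shifts, aggregated across TFs and cell line models, is one plausible way predictions like these could help prioritize which knockouts to validate experimentally.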