New Genetic Insights into Systemic Sclerosis Through Exome Sequencing and Machine Learning
Researchers at Baylor College of Medicine have made significant strides in understanding systemic sclerosis (SSc), a complex autoimmune disease, through a new study that combines integrative exome sequencing with machine learning techniques. By identifying new genetic factors associated with SSc, this research paves the way for potential targeted treatments.
Systemic sclerosis is characterized by fibrosis and vascular abnormalities, yet its intricate genetic roots have remained largely elusive. Although some genetic contributors are known, identifying additional genes is crucial for advancing therapeutic strategies. The study, published in the Annals of the Rheumatic Diseases, employed a combination of exome sequencing and machine learning to uncover protein alterations and their underlying mechanisms related to SSc.
Previous genome-wide association studies (GWAS) have identified significant genetic variants primarily located within the human leukocyte antigen (HLA) region on chromosome 6. In this latest research, the team, led by Dr. Shamika Ketkar, conducted a GWAS using exome sequencing data from a cohort of 2,559 SSc patients and 893 healthy controls sourced from the Scleroderma Family Registry and DNA Repository at the University of Texas Health Science Center at Houston. The primary objective was to discover novel genes and rare variants that contribute to the risk of developing SSc.
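For readers unfamiliar with case-control association testing, the basic comparison behind a GWAS can be sketched for a single variant as an allelic odds ratio with a confidence interval. The cohort sizes below come from the article; the allele counts and the function name `odds_ratio_ci` are hypothetical, chosen only for illustration, and this is not the study's actual pipeline.

```python
import math

def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Allelic odds ratio with a Woolf (log-based) confidence interval."""
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) from the four cell counts
    se = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Cohort sizes reported in the article; each person carries two alleles.
N_CASES, N_CONTROLS = 2559, 893
case_alleles = 2 * N_CASES
ctrl_alleles = 2 * N_CONTROLS

# HYPOTHETICAL alternate-allele counts, invented for this example.
case_alt = 614
ctrl_alt = 143

odds_ratio, lower, upper = odds_ratio_ci(
    case_alt, case_alleles - case_alt,
    ctrl_alt, ctrl_alleles - ctrl_alt,
)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An odds ratio above 1 with an interval that excludes 1 would suggest the allele is enriched in patients. A real analysis would also correct for ancestry, relatedness, and multiple testing.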
One of the most exciting findings from this research was the identification of the MICB gene, which, while situated within the HLA region, appears to function independently of traditional HLA genes. This gene had not previously been associated with systemic sclerosis, marking it as a novel genetic contributor and a potential target for future therapies. Further validation was achieved through collaborative efforts with researchers in Spain, who replicated these findings using a dataset of nearly 10,000 cases from prior European GWAS.
In addition to MICB, the study also highlighted other significant genes, such as NOTCH4, as well as rare missense variants in genes linked to interferon signaling pathways, including IFI44L and IFIT5. The use of the evolutionary action machine learning (EAML) framework by Dr. Olivier Lichtarge's lab at Baylor enabled the researchers to analyze the exome sequencing data effectively, prioritizing genes with high-impact variants that are predictive of SSc.
"With our machine learning framework, we are not only identifying whether a variant occurs frequently, but also assessing the likelihood that the variant disrupts protein function, which ultimately affects patient outcomes," Lichtarge explained. The method had previously been applied to diseases with larger genomic datasets, such as Alzheimer's and heart disease; its use here suggests it can also tackle complex diseases with smaller sample sizes.
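The distinction Lichtarge draws, between how often a variant occurs and how likely it is to disrupt the protein, can be illustrated with a toy example. The sketch below is not the EAML framework: the impact scores, the combination rule in `gene_burden`, and the `auc` separation measure are all assumptions made for illustration.

```python
def gene_burden(variant_impacts):
    """Combine per-variant impact scores (each in 0..1) into one gene score.

    Uses 1 - prod(1 - s), which grows with both the number and the
    severity of variants a person carries in the gene.
    """
    intact = 1.0
    for score in variant_impacts:
        intact *= 1.0 - score
    return 1.0 - intact

def auc(case_scores, control_scores):
    """Fraction of case/control pairs where the case scores higher (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# HYPOTHETICAL impact scores for variants in one gene, one list per person.
cases = [[0.9], [0.8, 0.1], [], [0.7]]
controls = [[0.1], [], [0.2], [0.05]]

# Carrier frequency alone: the same number of carriers in each group.
case_carriers = sum(1 for v in cases if v)
control_carriers = sum(1 for v in controls if v)

# Impact-weighted scores: cases carry the more disruptive variants.
case_scores = [gene_burden(v) for v in cases]
control_scores = [gene_burden(v) for v in controls]
separation = auc(case_scores, control_scores)

print(case_carriers, control_carriers, round(separation, 3))
```

In this invented data set, counting carriers cannot tell the two groups apart, while weighting by predicted impact separates them. That is the intuition behind prioritizing genes by high-impact variants.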
To further elucidate the functional implications of the identified genetic variants, the research team integrated publicly available single-cell RNA sequencing data from skin biopsies of SSc patients, allowing for the analysis of cell type-specific expression patterns of the risk genes. Additionally, expression quantitative trait locus (eQTL) analysis was conducted using whole blood datasets to establish regulatory connections between disease-associated variants and transcriptomic alterations.
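At its simplest, an eQTL test asks whether expression of a gene changes with the number of copies of a variant allele a person carries. A minimal sketch, assuming a plain least-squares fit and invented data (the function name `eqtl_slope` and all values are hypothetical; real eQTL pipelines add covariates and multiple-testing control):

```python
import math

def eqtl_slope(dosages, expression):
    """Least-squares slope and Pearson r of expression on allele dosage (0, 1, 2)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx                # expression change per extra allele copy
    r = sxy / math.sqrt(sxx * syy)   # strength of the linear relationship
    return slope, r

# HYPOTHETICAL data: allele dosage and normalized expression for six people.
dosages = [0, 0, 1, 1, 2, 2]
expression = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]

slope, r = eqtl_slope(dosages, expression)
print(f"slope = {slope:.2f} per allele, r = {r:.3f}")
```

A clearly nonzero slope would link the variant to a change in transcript levels, which is the kind of regulatory connection the eQTL analysis aims to establish.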
The research confirmed that MICB and NOTCH4 are expressed in fibroblasts and endothelial cells, which play crucial roles in the fibrosis and vascular complications characteristic of SSc. Dr. Brendan Lee, the corresponding author of the study, emphasized the necessity of combining diverse methodologies and machine learning to analyze extensive datasets of DNA, RNA, and proteins, in order to uncover hidden therapeutic targets for complex diseases like SSc.