Into the Unknown — How Artificial Intelligence Can Help Biotech Companies Chart the Dark Genome

by Louise von Stechow (contributor)

Disclaimer: All opinions expressed by Contributors are their own and do not represent those of their employers or of BiopharmaTrend.com.
Contributors are fully responsible for ensuring they own any required copyright for any content they submit to BiopharmaTrend.com. This website and its owners shall not be liable for either the information and content submitted for publication by Contributors or its accuracy.

With DeepMind’s release of AlphaGenome, a deep learning model that can predict the effects of variants in both coding and non-coding DNA, the so-called “dark genome” has taken center stage in biotech news. AlphaGenome is an “all-in-one” genome exploration model: a single, unified model that predicts multimodal molecular properties and variant effects from up to 1 million base pairs of DNA sequence. In the preprint first published on June 25, 2025, the authors showed that AlphaGenome outperforms current state-of-the-art models on the majority of the tasks evaluated and holds great potential to improve the interpretation of non-coding variants.

But what exactly lies within these dark, or non-coding, regions that make up 98% of our 3.1 billion–letter genome?

For several years, a number of biotech companies and their pharma partners have been exploring the parts of the genome that lie beyond the roughly 20,000 protein-coding human genes mapped by the Human Genome Project in the early 2000s. Their quest to uncover new disease mechanisms and drug targets within this vast non-coding space has been enabled by two advances: sequencing technologies that make the dark genome accessible, and artificial intelligence algorithms that help assign function to these previously uncharted regions.

Driving discovery: How technology helps transform disease understanding

Genetic variation is linked to disease in both subtle and more direct ways. The link is especially clear for rare diseases: around 80% of the more than 6,000 known rare diseases are thought to have a genetic cause. Estimates suggest that around 90% of disease-relevant genetic variation lies in the dark, or non-coding, areas of the genome.

Conventional short-read next-generation sequencing (NGS) relies on complex algorithms to reconstruct sequences from short reads (50–300 base pairs). It works well for unique regions but struggles with the repetitive, ambiguous regions typical of the dark genome, because a short read that falls entirely within a repeat matches several places in the genome equally well. Long-read sequencing, by contrast, generates much longer reads, often several kilobases or more, that can span repeats and anchor in unique flanking sequence. This greater continuity enables accurate mapping of challenging, previously inaccessible genomic regions.
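
Why read length matters can be illustrated with a toy Python sketch. It assumes a random synthetic genome containing one 300-bp repeat inserted twice, error-free reads, and placement by exact matching only; real aligners, sequencing errors, and real repeats (full-length LINE-1 copies are about 6 kb) are far more complex. All function names here are illustrative and not taken from any aligner.

```python
import random
from collections import Counter


def random_dna(rng: random.Random, length: int) -> str:
    """Uniformly random DNA, standing in for unique (non-repetitive) sequence."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def make_toy_genome(seed: int = 0, flank_len: int = 1000, repeat_len: int = 300) -> str:
    """Toy genome: three unique flanks separated by two identical repeat copies."""
    rng = random.Random(seed)
    repeat = random_dna(rng, repeat_len)  # e.g. two identical transposon copies
    return (
        random_dna(rng, flank_len) + repeat
        + random_dna(rng, flank_len) + repeat
        + random_dna(rng, flank_len)
    )


def uniquely_placeable_fraction(genome: str, read_len: int) -> float:
    """Fraction of read start positions whose error-free read occurs exactly once.

    A read that occurs more than once in the genome cannot be placed
    unambiguously by exact matching.
    """
    n = len(genome) - read_len + 1
    reads = [genome[i:i + read_len] for i in range(n)]
    counts = Counter(reads)
    return sum(counts[r] == 1 for r in reads) / n


genome = make_toy_genome()
for read_len in (100, 150, 250, 400):
    print(read_len, round(uniquely_placeable_fraction(genome, read_len), 3))
```

Reads shorter than the repeat that fall entirely inside either copy occur twice and cannot be placed, so the placeable fraction stays below 1. Once reads are longer than the repeat, every read includes unique flanking sequence and can be placed, which is, at a much smaller scale, the reason multi-kilobase long reads resolve repeats that short reads cannot.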

Research initiatives such as the telomere-to-telomere (T2T) sequencing project, completed in 2022, used long-read sequencing to chart previously inaccessible regions and achieved near-complete sequences of all human chromosomes. With accuracy improving in the latest long-read platforms, long-read sequencing is moving closer to becoming part of routine genetic testing. Including long-read data in studies of disease relevance could substantially expand the detection of disease-relevant genetic variants, especially in rare diseases.

From dark matter to drug targets: How biotechs explore the dark genome

The exploration of the dark genome has the potential to unveil a treasure trove of therapeutic opportunities across various disease areas, which has attracted a number of biotech companies and pharma partnerships to the space. Pathogenic non-coding variants are often linked to disturbances in gene regulatory elements such as promoters, enhancers, silencers, and a variety of regulatory RNA species.

Swiss HAYA Therapeutics and Boston-based NextRNA Therapeutics are pursuing long non-coding RNAs (lncRNAs) as therapeutic targets in cardiovascular disease and oncology, respectively, and big pharma is taking notice. HAYA, which raised $65 million in Series A funding in May 2025, has a collaboration with Eli Lilly worth up to $1 billion to discover lncRNA-based drug targets for obesity and other metabolic diseases. NextRNA has entered a collaboration with Bayer, worth up to $547 million, to discover and develop small molecules targeting lncRNAs in cancer. As the first clinical trials of drugs targeting lncRNAs begin, we will soon learn more about the validity of this new target class.

Transposable elements (TEs), which make up almost half of our genome, have also sparked the interest of multiple biotechs. TEs are current or former mobile genetic elements: DNA sequences that can change their position, or “jump”, within the genome. Activation of virus-like TEs, such as long interspersed nuclear elements (LINEs) and human endogenous retroviruses (HERVs), has been implicated in neurodegenerative disorders such as Alzheimer’s disease and amyotrophic lateral sclerosis (ALS).

US companies Transposon Therapeutics and Rome Therapeutics are targeting the biology of LINE-1, a prominent transposon that alone accounts for about 17% of the genome. Transposon’s TPN-101, originally conceived as an antiretroviral drug for HIV-1, inhibits LINE-1 reverse transcriptase (RT). TPN-101 showed encouraging initial results in a Phase 2 trial in progressive supranuclear palsy and received Fast Track designation from the FDA. Rome Therapeutics focuses on the potential inflammatory role of LINE-1. The company is testing LINE-1 RT inhibition as a non-immunosuppressive therapeutic strategy in autoimmune diseases such as type I interferonopathies, including systemic lupus erythematosus (SLE) and cutaneous lupus erythematosus (CLE), and further aims to explore LINE-1 biology in cancer and neurodegenerative diseases.

Danish biotech HERVolution aims to harness the therapeutic potential of human endogenous retroviruses (HERVs) for cancer and aging-related neurodegenerative diseases. The company uses rationally designed HERV antigens to overcome the immune system’s “self-tolerance” and induce potent, durable anti-HERV immune responses against cancer, metabolic, and other age-related diseases. Not every HERV-targeting approach has succeeded, however: in a Phase 2 study by Swiss biotech GeNeuro in HERV-W ENV-positive patients with post-COVID-19 neuropsychiatric syndromes, the HERV-targeting monoclonal antibody temelimab showed no meaningful improvement over placebo.

The dark genome also offers opportunities for the development of cancer immunotherapies. Evaxion targets HERVs as a source of neoantigens, using its AI‑Immunology™ platform to identify endogenous retroviruses (ERVs) reactivated in cancer cells and to design precision cancer vaccines. Similarly, Enara Bio uses its proprietary EDAPT® platform to discover novel “Dark Antigens” (peptide–HLA targets derived from non-coding genomic regions in tumors) to develop TCR-directed immunotherapies and therapeutic vaccines.
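
The basic idea of mining non-coding regions for antigens can be sketched as follows: translate a (hypothetical) tumor-expressed non-coding transcript in all three forward reading frames and enumerate the stop-free 9-mers, the most common length of peptides presented on HLA class I. This is a simplified illustration, not Evaxion’s AI-Immunology or Enara Bio’s EDAPT platform; the helper names are hypothetical, and a real pipeline would go on to predict HLA binding and filter candidates against healthy tissue.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str, frame: int) -> str:
    """Translate one forward frame; '*' marks stop codons, 'X' unknown codons."""
    return "".join(
        CODON_TABLE.get(seq[p:p + 3], "X") for p in range(frame, len(seq) - 2, 3)
    )


def candidate_peptides(transcript: str, length: int = 9) -> set[str]:
    """Hypothetical helper: stop-free peptides of a given length, three forward frames."""
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA input
    peptides: set[str] = set()
    for frame in range(3):
        # Splitting at stop codons ensures no peptide spans a stop.
        for segment in translate(seq, frame).split("*"):
            for i in range(len(segment) - length + 1):
                peptide = segment[i:i + length]
                if "X" not in peptide:  # skip windows with ambiguous bases (e.g. N)
                    peptides.add(peptide)
    return peptides


print(sorted(candidate_peptides("AUG" + "GCU" * 10)))
```

Translating all frames reflects the fact that non-coding transcripts have no annotated reading frame; any of them could, in principle, give rise to presented peptides.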

| Biotech | Therapeutic area focus | Dark genome element targeted | Recent funding events | Pharma partnerships / collaborations |
|---|---|---|---|---|
| Enara Bio (Oxford, UK) | Cancer immunotherapy (TCR-based) | “Dark antigens” from non-coding/transcribed regions | Series B $32.5M (Oct 2024), backed by Pfizer and Merck KGaA | Boehringer Ingelheim; GWU collaboration |
| ROME Therapeutics (Boston, MA, USA) | Autoimmune, cancer, neurodegeneration | Repetitive non-coding elements (LINEs, SINEs, HERVs) | Series B extension $72M (Sep 2023), backed by J&J and BMS | - |
| Transposon Therapeutics (San Diego, CA, USA) | Neurodegenerative diseases, aging | LINE-1 reverse transcriptase (transposons/repeat elements) | $4.68M | |
| HERVolution Therapeutics (Copenhagen, Denmark) | Cancer, metabolic & aging therapeutics | HERV (human endogenous retrovirus) antigens | Series A €11.7M (Dec 2024), backed by Serum Institute of India and the European Innovation Council (EIC) Fund | - |
| Evaxion (Hørsholm, Denmark) | Cancer & infectious disease vaccines | Endogenous retroviruses (ERVs), repetitive elements | Public offering $10.8M (Jan 2025) | MSD/Merck: EVX-B2/B3 options; clinical trial collaboration on EVX-01 with Keytruda; Afrigen/WHO; Gates Foundation |
| GeNeuro (Geneva, Switzerland) | MS, ALS, long COVID (autoimmune/neurology) | HERV-derived envelope proteins | Debt-restructuring moratorium (May 2025) | NINDS/NIH collaboration |
| Nucleome Therapeutics (Oxford, UK) | Autoimmune, rare, precision medicine | Non-coding regulatory variants in 3D genome context | Series A £37.5M (~$47M) (Oct 2022), backed by Pfizer, Merck KGaA and J&J | Strategic partnership with J&J |
| Lucid Genomics (Berlin, Germany) | Rare disease diagnostics | Non-coding regulatory DNA variants | Pre-seed €1.3M (Sep 2022) | |
| NextRNA Therapeutics (Boston, MA, USA) | Multi-indication via lncRNA therapies | Long non-coding RNAs (lncRNAs) | Series A $46.8M (Mar 2022) | Collaboration with Bayer (up to $547M) |
| HAYA Therapeutics (Lausanne, Switzerland) | Heart failure, fibrosis, obesity | Long non-coding RNAs (lncRNAs) | Series A $65M (May 2025), backed by Sofinnova Partners and Eli Lilly | Eli Lilly partnership (up to $1B) |
| Amaroq Therapeutics (Auckland, New Zealand) | Cancer | Long non-coding RNAs (lncRNAs) in tumors | Seed $14M (Oct 2023) | - |
| Flamingo Therapeutics (Leuven, Belgium & San Diego, USA) | Oncology | lncRNAs (e.g., MALAT1) | €1.7M grant (2023) | Alliance with Ionis Pharmaceuticals |

Illuminating the dark genome: How AI/ML can aid interpretation of non-coding DNA

While our understanding of the dark genome and its links to disease is growing, interpreting genetic variation within these unmapped territories remains challenging, owing to the sheer amount of non-coding DNA and the often subtler effects of variation in regulatory regions compared with protein-coding regions.

Various AI models for genome exploration have been developed, most recently DeepMind’s AlphaGenome, which predicts the multimodal molecular effects of single-nucleotide variants, including those in the non-coding parts of the genome. A number of biotech companies, such as Nucleome, Lucid Genomics, HAYA and Evaxion, also use artificial intelligence and machine learning (AI/ML) algorithms to better explore and exploit genetic elements within the dark genome.
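
Sequence-based variant-effect prediction rests on a simple comparison: run a model on the reference sequence and on the same sequence carrying the alternative allele, and take the difference. The sketch below shows only that reference-versus-alternative logic. The “model” is a hypothetical position weight matrix for an invented 4-bp motif, chosen so the example runs anywhere; it is not AlphaGenome or its API, whose own variant scoring works on sequences of up to 1 million base pairs and many output tracks at once.

```python
import math

BACKGROUND = 0.25  # assumption: uniform base composition

# Hypothetical 4-bp motif (consensus "TGAC"): per-position base probabilities.
MOTIF_PROBS = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
# Log-odds scores relative to background, i.e. a position weight matrix.
PWM = [{b: math.log2(p / BACKGROUND) for b, p in col.items()} for col in MOTIF_PROBS]


def predict(seq: str) -> float:
    """Toy stand-in for a model: best motif log-odds score over all windows."""
    width = len(PWM)
    return max(
        sum(PWM[j][seq[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    )


def variant_effect(ref_seq: str, pos: int, ref: str, alt: str) -> float:
    """Score a single-nucleotide variant as prediction(alt) minus prediction(ref)."""
    if ref_seq[pos] != ref:
        raise ValueError(f"reference allele {ref!r} does not match sequence at {pos}")
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return predict(alt_seq) - predict(ref_seq)


seq = "GGGTGACGGG"  # contains the toy motif TGAC at positions 3-6
print(variant_effect(seq, 4, "G", "T"))  # variant inside the motif
print(variant_effect(seq, 0, "G", "A"))  # variant outside the motif
```

A negative delta means the alternative allele weakens the predicted signal; a delta of zero means the toy model sees no effect. Deep models replace the weight matrix with learned functions, but the delta comparison stays the same.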

For example, Nucleome’s platform interprets single-nucleotide polymorphisms (SNPs) in a cell-type-specific manner to discover new targets and biomarkers, with an initial focus on autoimmune diseases. Nucleome is exploring this area further through a strategic partnership with J&J that began in October 2024.
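
A common first step in cell-type-specific variant interpretation is to ask in which cell types a SNP falls inside active regulatory DNA, such as regions of open chromatin. The sketch below performs only that overlap lookup, assuming invented peak coordinates on a single chromosome and hypothetical cell-type labels. It is an illustration of the general idea, not a description of Nucleome’s platform; linking a variant to the gene it regulates would require additional data.

```python
from bisect import bisect_right

# Hypothetical accessible-chromatin peaks (e.g. from ATAC-seq) on one chromosome,
# per cell type, as half-open [start, end) intervals. Values are invented.
PEAKS = {
    "CD4_T_cell": [(1_000, 1_500), (8_000, 8_600)],
    "B_cell": [(1_200, 1_400), (20_000, 20_300)],
    "monocyte": [(5_000, 5_900)],
}


def build_index(peaks: dict[str, list[tuple[int, int]]]):
    """Sort each cell type's peaks once; assumes peaks within a cell type don't overlap."""
    index = {}
    for cell_type, intervals in peaks.items():
        ordered = sorted(intervals)
        index[cell_type] = ([start for start, _ in ordered], ordered)
    return index


def cell_types_for_snp(index, pos: int) -> list[str]:
    """Cell types in which the SNP position falls inside an accessible region."""
    hits = []
    for cell_type, (starts, ordered) in index.items():
        i = bisect_right(starts, pos) - 1  # last peak starting at or before pos
        if i >= 0 and pos < ordered[i][1]:
            hits.append(cell_type)
    return sorted(hits)


index = build_index(PEAKS)
for snp_pos in (1_300, 5_000, 1_500):
    print(snp_pos, cell_types_for_snp(index, snp_pos))
```

A SNP that lands in T-cell and B-cell open chromatin but not in monocyte peaks would point to lymphocytes as the cell types to prioritize for follow-up.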

Lucid Genomics’ TADA (TAD annotation) algorithm prioritizes SNPs as well as structural variants. Structural variants (typically ≥1 kb in size), such as deletions, duplications, or translocations, are prevalent in the dark genome and have been linked to many diseases. Lucid’s algorithm annotates structural variants in the context of their functional environment by using the boundaries of topologically associating domains (TADs), the self-interacting genomic neighborhoods that make up the long-range regulatory architecture of genes. Combining this approach with a disease-specific expert decision system, the company, which raised €1.3 million in pre-seed funding in September 2022, aims to analyze whole genomes to improve clinical genomics for better diagnosis and to aid drug discovery and development in rare diseases and beyond.
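
The core idea of TAD-aware annotation can be sketched in a few lines: treat the TAD (or TADs) a structural variant touches as its regulatory neighborhood, report the genes in that neighborhood, and flag variants that span a TAD boundary. The coordinates and gene names below are invented, and the sketch is a simplified illustration rather than the TADA algorithm, which also draws on functional annotations to prioritize variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on a single chromosome."""
    start: int
    end: int

    def overlaps(self, other: "Interval") -> bool:
        return self.start < other.end and other.start < self.end


def annotate_sv(sv: Interval, tads: list[Interval], genes: dict[str, Interval]):
    """Return (genes sharing a TAD with the SV, whether the SV spans a TAD boundary)."""
    hit_tads = [tad for tad in tads if tad.overlaps(sv)]
    affected = sorted(
        name for name, gene in genes.items() if any(gene.overlaps(t) for t in hit_tads)
    )
    crosses_boundary = len(hit_tads) > 1  # touches 2+ adjacent TADs
    return affected, crosses_boundary


# Hypothetical TADs and genes on one chromosome.
TADS = [Interval(0, 100_000), Interval(100_000, 250_000), Interval(250_000, 400_000)]
GENES = {
    "GENE_A": Interval(20_000, 30_000),
    "GENE_B": Interval(120_000, 140_000),
    "GENE_C": Interval(300_000, 320_000),
}

print(annotate_sv(Interval(90_000, 110_000), TADS, GENES))   # spans a boundary
print(annotate_sv(Interval(310_000, 315_000), TADS, GENES))  # within one TAD
```

A deletion spanning a boundary can merge two regulatory neighborhoods and expose genes to enhancers they normally never contact, which is why boundary-crossing variants are flagged separately from variants confined to a single TAD.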

In a 2024 study, investigators at Johns Hopkins University used a machine learning approach they termed ARTEMIS (Analysis of RepeaT EleMents in dISease) to decipher the role of repetitive DNA elements in cancer. In a large-scale effort, the team analyzed tumor DNA and cell-free plasma DNA samples from 1,975 cancer patients, discovering 820 previously unknown repeat elements altered in human cancer. Notably, repeat elements were enriched fifteenfold on average within 736 genes known to drive cancer, findings that could help researchers better understand the complex genomes of cancer cells and find new therapeutic avenues.
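
One generic way to turn sequencing reads into repeat-element features for a classifier is to count k-mers that are specific to each repeat family and normalize by the total number of k-mers observed. The sketch below assumes tiny invented repeat sequences and k = 4 for readability (genome-scale analyses use much longer k-mers so that each is family-specific). It is not the ARTEMIS pipeline, and the resulting profiles would still need proper normalization and a trained model before supporting any biological conclusion.

```python
from collections import Counter


def build_kmer_index(family_seqs: dict[str, str], k: int) -> dict[str, str]:
    """Map each k-mer to its repeat family, dropping k-mers shared between families."""
    owner: dict[str, str] = {}
    shared: set[str] = set()
    for family, seq in family_seqs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # setdefault returns the existing owner if the k-mer was seen before.
            if owner.setdefault(kmer, family) != family:
                shared.add(kmer)
    return {kmer: fam for kmer, fam in owner.items() if kmer not in shared}


def repeat_profile(reads: list[str], index: dict[str, str], k: int) -> dict[str, float]:
    """Fraction of all read k-mers assigned to each repeat family (a feature vector)."""
    counts = Counter()
    total = 0
    for read in reads:
        for i in range(len(read) - k + 1):
            total += 1
            family = index.get(read[i:i + k])
            if family is not None:
                counts[family] += 1
    families = sorted(set(index.values()))
    return {fam: (counts[fam] / total if total else 0.0) for fam in families}


# Hypothetical repeat families and reads.
FAMILIES = {"FAM_A": "ACGTACGT", "FAM_B": "TTTTGGGG"}
index = build_kmer_index(FAMILIES, k=4)
print(repeat_profile(["ACGTAA", "CCCCCC"], index, k=4))
```

Comparing such profiles between tumor and normal samples, or feeding them to a classifier, is the kind of step where machine learning comes in.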

Further exploration of the dark genome may help improve diagnostic rates in rare diseases and contribute to target discovery and the development of new therapeutic modalities. Personalized diagnoses could also enable individualized gene-editing therapies, an approach recently used for the first time to treat an infant with a rare, previously incurable disease at Children’s Hospital of Philadelphia.
