
The Estonian Biobank’s Journey from Biobanking to Personalized Medicine
Nature Communications, 16, 3270 (2025)
DOI: 10.1038/s41467-025-58465-3
Authors
Lili Milani, Maris Alver, Sven Laur, Sulev Reisberg, Toomas Haller, Oliver Aasmets, Erik Abner, Helene Alavere, Annely Allik, Tarmo Annilo, Krista Fischer, Robin Hofmeister, Georgi Hudjashov, Maarja Jõeloo, Mart Kals, Liis Karo-Astover, Silva Kasela, Anastassia Kolde, Kristi Krebs, Kertu Liis Krigul, Jaanika Kronberg, Karoliina Kruusmaa, Viktorija Kukuškina, Kadri Kõiv, Kelli Lehto, Liis Leitsalu, Sirje Lind, Laura Birgit Luitva, Kristi Läll, Kreete Lüll, Kristjan Metsalu, Mait Metspalu, René Mõttus, Mari Nelis, Tiit Nikopensius, Miriam Nurm, Margit Nõukas, Marek Oja, Elin Org, Marili Palover, Priit Palta, Vasili Pankratov, Kateryna Pantiukh, Natalia Pervjakova, Natàlia Pujol-Gualdo, Anu Reigo, Ene Reimann, Steven Smit, Diana Rogozina, Dage Särg, Nele Taba, Harry-Anton Talvik, Maris Teder-Laving, Neeme Tõnisson, Mariliis Vaht, Uku Vainik, Urmo Võsa, Burak Yelmen, Tõnu Esko, Raivo Kolde, Reedik Mägi, Jaak Vilo, Triin Laisk & Andres Metspalu
Abstract
Large biobanks have set a new standard for research and innovation in human genomics and implementation of personalized medicine. The Estonian Biobank was founded a quarter of a century ago, and its biological specimens, clinical, health, omics, and lifestyle data have been included in over 800 publications to date. What makes the biobank unique internationally is its translational focus, with active efforts to conduct clinical studies based on genetic findings, and to explore the effects of return of results on participants. In this review, we provide an overview of the Estonian Biobank, highlight its strengths for studying the effects of genetic variation and quantitative phenotypes on health-related traits, development of methods and frameworks for bringing genomics into the clinic, and its role as a driving force for implementing personalized medicine on a national level and beyond.
Pathway level metabolomics analysis identifies carbon metabolism as a key factor of incident hypertension in the Estonian Biobank
Scientific Reports 15, 8470 (2025)
DOI: 10.1038/s41598-025-92840-w
Authors
Liis Hiie, Anastassia Kolde, Natalia Pervjakova, Anu Reigo, Estonian Biobank Research Team, Erik Abner, Urmo Võsa, Tõnu Esko, Krista Fischer, Priit Palta & Jaanika Kronberg
Abstract
The purpose of this study was to find metabolic changes associated with incident hypertension in the volunteer-based Estonian Biobank. We used a subcohort of the Estonian Biobank in which metabolite levels had been measured by mass spectrometry (LC-MS, Metabolon platform). We divided the annotated metabolites of 989 individuals into KEGG pathways and then performed principal component analysis on the metabolites of each pathway, resulting in a dataset of 91 pathway components. Next, we defined incident hypertension cases and controls based on electronic health records, resulting in a dataset of 101 incident hypertension cases and 450 controls. We used Cox proportional hazards models and replicated the results in a separate cohort of the Estonian Biobank, comprising 582 individuals assayed by LC-MS on the Broad platform. Our results show that body mass index and a component of the carbon metabolism KEGG pathway are associated with incident hypertension in both the discovery and replication cohorts. We demonstrate that a high-dimensional dataset can be meaningfully reduced into informative pathway components that can subsequently be analysed in an interpretable way and replicated in a metabolomics dataset from a different platform.
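The pathway-level reduction described above (grouping metabolites into KEGG pathways, then running PCA within each pathway) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a complete, already-normalized samples-by-metabolites matrix and a hypothetical `pathways` dictionary mapping pathway IDs to column indices, and it omits imputation, batch correction and the choice of how many components to keep.

```python
import numpy as np

def pathway_components(X, pathways, n_components=1):
    """Per-pathway principal component scores (hypothetical helper).

    X: (n_samples, n_metabolites) array, assumed complete and normalized.
    pathways: hypothetical dict {pathway_id: [column indices]}; a metabolite
              may belong to several pathways, as in KEGG.
    Returns {(pathway_id, k): score vector of length n_samples}.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each metabolite (assumes no constant columns).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    scores = {}
    for pw, cols in pathways.items():
        block = Z[:, cols]
        # SVD of the centred block; U[:, k] * S[k] are the k-th PC scores.
        U, S, _ = np.linalg.svd(block, full_matrices=False)
        for k in range(min(n_components, S.size)):
            scores[(pw, k + 1)] = U[:, k] * S[k]
    return scores
```

Each score vector can then enter a Cox model as a covariate in place of the individual metabolites of that pathway.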
Exploiting uniqueness: seed-chain-extend alignment on elastic founder graphs
Bioinformatics, 41 (2025)
DOI: 10.1093/bioinformatics/btaf225
Authors
Nicola Rizzo, Manuel Cáceres & Veli Mäkinen
Abstract
Sequence-to-graph alignment is a central challenge of computational pangenomics. To overcome the theoretical hardness of the problem, state-of-the-art tools use seed-and-extend or seed-chain-extend heuristics for alignment. We implement a complete seed-chain-extend alignment workflow based on indexable elastic founder graphs (iEFGs), which, unlike general graphs, support linear-time exact searches. We show how to construct iEFGs, find high-quality seeds, chain, and extend them at the scale of a telomere-to-telomere assembled human chromosome.
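The chaining step of a seed-chain-extend workflow can be illustrated on a linear reference. The sketch below is a simplified stand-in, not the iEFG-based chaining from the paper: it assumes seeds are exact-match anchors given as hypothetical `(query_start, ref_start, length)` tuples and uses a quadratic-time dynamic program to find the highest-scoring set of anchors that appear in the same order in both sequences without overlapping.

```python
from typing import List, Tuple

Anchor = Tuple[int, int, int]  # hypothetical (query_start, ref_start, length)

def chain_anchors(anchors: List[Anchor]) -> Tuple[int, List[Anchor]]:
    """Max total-length chain of colinear, non-overlapping anchors (O(n^2) DP)."""
    if not anchors:
        return 0, []
    # Sorting by reference start guarantees every valid predecessor comes first.
    anchors = sorted(anchors, key=lambda a: (a[1], a[0]))
    n = len(anchors)
    best = [a[2] for a in anchors]  # best chain score ending at each anchor
    prev = [-1] * n                 # back-pointer for traceback
    for j, (qj, rj, lj) in enumerate(anchors):
        for i in range(j):
            qi, ri, li = anchors[i]
            # Anchor i must end before anchor j starts in both sequences.
            if qi + li <= qj and ri + li <= rj and best[i] + lj > best[j]:
                best[j] = best[i] + lj
                prev[j] = i
    j = max(range(n), key=lambda k: best[k])
    score, chain = best[j], []
    while j != -1:
        chain.append(anchors[j])
        j = prev[j]
    return score, chain[::-1]
```

In the extend step, the gaps between consecutive chained anchors would then be filled by base-level alignment.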
PLOS One, 20 (10) (2025)
DOI: 10.1371/journal.pone.0335064
Authors
Tuuli Puusepp, Ave Põld, Lili Milani, Aet Elken, Estonian Biobank Research Team, Mikk Jürisson & Krista Fischer
Abstract
Aim: To develop a model for cardiovascular disease (CVD) risk that combines a polygenic risk score (PRS) with traditional risk factors, and to assess the added value of the PRS in two cohorts of biobank participants.
Methods: Data from 128 209 Estonian Biobank participants without prevalent cardiovascular disease, recruited in 2002–2017 and 2018–2022, were included. Hazard ratios (HR) for polygenic risk versus conventional risk factors were estimated with Cox proportional hazards models, and cumulative incidence was assessed with Aalen-Johansen curves. Predictive performance was tested using a split-sample approach and competing risk modelling. Age at CVD event served as the outcome, and the impact of the PRS was evaluated by age group (25–59 vs. 60+), sex, and recruitment period, using HRs, Harrell’s C-index, and net reclassification indices (NRI).
Results: The estimated HR per one standard deviation (SD) of PRS ranged from 1.1 (95% CI 1.06–1.15; age 60+, earlier cohort) to 1.36 (95% CI 1.24–1.49; men aged 25–59, later cohort). Adding PRS to the conventional risk factors in the age group 25–59 increased the C-statistic by 0.028 (p < 0.0001) for men. In the age group 60+, the increase was 0.016 (p = 0.0002) across all participants. In the independent validation set, the continuous NRI was 19.1% (95% CI 13.3%–24.9%) in the 25–59 group and 13.9% (95% CI 8.1%–19.6%) in the 60+ group.
Conclusions: In a high-risk population, PRS is a strong independent risk factor for CVD and should be considered in routine risk assessment, starting at a relatively young age.
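Harrell's C-index, used above to quantify the gain from adding PRS, measures how often a model ranks pairs of participants correctly by risk. The following is a minimal sketch for right-censored data only: it ignores the left truncation and competing-risk modelling used in the study, skips pairs with tied follow-up times, and the function name `harrell_c` is hypothetical.

```python
def harrell_c(time, event, risk):
    """Harrell's C for right-censored data (hypothetical helper).

    A pair is comparable when the participant with the shorter follow-up had
    an event; it is concordant when that participant also has the higher
    predicted risk. Ties in risk count as half-concordant; pairs with tied
    times are skipped.
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored participant cannot be the earlier member of a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Computing `harrell_c` on the same held-out participants once with risk scores from a conventional-factor model and once from a model that also includes PRS gives a C-statistic difference of the kind reported above.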
Analysis of follow-up data in large biobank cohorts: a review of methodology
Frontiers in Genetics, 16, 1534726 (2025)
DOI: 10.3389/fgene.2025.1534726
Authors
Anastassia Kolde, Merli Koitmäe, Meelis Käärik, Märt Möls & Krista Fischer
Abstract
This study focuses on key methodological challenges in genome-wide association studies (GWAS) of biobank data with time-to-event outcomes, analyzed using the Cox proportional hazards (CPH) model. We address four primary issues: left-truncation of the data, computational inefficiency of standard model-fitting algorithms, relatedness among individuals, and model misspecification. To manage left-truncation, the common practice is to use age as the timescale, with individuals entering the risk set at their age of recruitment. We assess how this choice of timescale influences bias and statistical power under realistic GWAS conditions of varying effect sizes and censoring rates. In addition, to alleviate the computational burden typical of large-scale data, we propose and evaluate a two-step martingale residual (MR) approach for high-dimensional CPH modeling. Our results show that the timescale choice has minimal effect on accuracy for small hazard ratios, though using time since birth as the timescale – ignoring recruitment age – yields the highest power for association detection. We find that relatedness, when ignored, does not substantially bias effect size estimates, while omitting key covariates introduces significant bias. The two-step MR approach proves to be computationally efficient, retaining power for detecting small effect sizes, making it suitable for large-scale association studies. However, when precise effect size estimates are critical, particularly for moderate or larger effect sizes, we recommend recalculating these estimates using the conventional CPH model, with careful attention to left-truncation and relatedness. These conclusions are drawn from simulations and illustrated with data from the Estonian Biobank cohort.
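The two-step martingale residual (MR) idea can be sketched as: fit a null survival model once, take each participant's martingale residual (observed events minus model-expected events), then test every variant against those residuals with a cheap linear regression. This sketch assumes a null model with no covariates (a Nelson–Aalen baseline on the age timescale, with participants entering the risk set at recruitment age) and plain least-squares regression per SNP; the function names are hypothetical, and the paper's implementation may differ, for example by including covariates in the first step.

```python
import numpy as np

def nelson_aalen(entry, exit_, event):
    """Nelson-Aalen cumulative hazard with delayed entry (left truncation).

    A participant is at risk at time t if entry < t <= exit, so on the age
    timescale they join the risk set only after their recruitment age.
    Returns a vectorized function H(t).
    """
    entry, exit_ = np.asarray(entry, float), np.asarray(exit_, float)
    event = np.asarray(event, int)
    times = np.unique(exit_[event == 1])
    incr = np.array([
        np.sum((exit_ == t) & (event == 1)) / np.sum((entry < t) & (exit_ >= t))
        for t in times
    ])
    cum = np.concatenate([[0.0], np.cumsum(incr)])

    def H(t):
        # The number of event times <= t indexes into the cumulative sum.
        return cum[np.searchsorted(times, t, side="right")]
    return H

def martingale_residuals(entry, exit_, event):
    """Step 1: observed events minus expected events under the null model."""
    H = nelson_aalen(entry, exit_, event)
    entry, exit_ = np.asarray(entry, float), np.asarray(exit_, float)
    return np.asarray(event, float) - (H(exit_) - H(entry))

def snp_scan(residuals, genotypes):
    """Step 2: simple linear regression of residuals on each SNP column.

    genotypes: (n_samples, n_snps) dosages. Returns (slopes, t-statistics).
    """
    G = np.asarray(genotypes, float)
    r = residuals - residuals.mean()
    Gc = G - G.mean(axis=0)
    sxx = (Gc ** 2).sum(axis=0)
    beta = Gc.T @ r / sxx
    rss = ((r[:, None] - Gc * beta) ** 2).sum(axis=0)
    se = np.sqrt(rss / (len(r) - 2) / sxx)
    return beta, beta / se
```

Step 1 runs once per phenotype, and step 2 is a single matrix product per batch of variants, which is where the computational saving over refitting a CPH model per variant comes from.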
Characterization of prevalent genetic variants in the Estonian Biobank body-mass index GWAS
Nature Communications, 16, 8956 (2025)
DOI: 10.1038/s41467-025-64006-9
Authors
Erik Abner, Kanwal Batool, Nele Taba, Tiit Nikopensius, Kristi Läll, Anastasiia Alekseienko, Anders Eriksson, Joel Rämö, Hele Haapaniemi, Hanna Maria Kariis, Liis Haljasmägi, Urmo Võsa, Taavi Tillmann, Uku Vainik, Kelli Lehto, Hanna M. Ollila, Kai Kisand, Estonian Biobank Research Team & Tõnu Esko
Abstract
Population-specific genome-wide association studies can reveal high-impact genomic variants that influence traits like body-mass index (BMI). Using the Estonian Biobank BMI dataset (n = 204,747 participants), we identified 214 genome-wide significant loci. Among these hits was a common noncoding variant within the newly associated ADGRL3 gene (−0.18 kg/m²; P = 3.21 × 10⁻⁹). Moreover, the rare missense variant PTPRT:p.Arg1384His was associated with lower BMI (−0.44 kg/m²; P = 2.51 × 10⁻¹⁰), while the protein-truncating variant POMC:p.Glu206* was associated with considerably higher BMI (+0.81 kg/m²; P = 1.48 × 10⁻¹²), both likely affecting the functioning of the leptin-melanocortin pathway. POMC:p.Glu206* was observed in different North-European populations, suggesting a broader, yet elusive, distribution of this damaging variant. These observations indicate previously unrecognized roles of the ADGRL3 and PTPRT genes in body weight regulation and suggest an increased prevalence of the POMC:p.Glu206* variant in European populations, offering avenues for developing interventions in obesity management.
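The per-allele effects above (kg/m² per copy of the effect allele) come from additive association models. A minimal ordinary-least-squares sketch, assuming genotype dosages coded 0 to 2 and a hypothetical covariate matrix (e.g., age and sex); real biobank GWAS typically also adjust for ancestry principal components and relatedness, for example with mixed models, which this sketch does not do.

```python
import numpy as np
from math import erfc, sqrt

def additive_effect(bmi, dosage, covariates):
    """OLS of BMI on allele dosage (0-2) plus covariates (hypothetical helper).

    Returns (beta in kg/m^2 per effect allele, standard error, two-sided P
    from a normal approximation).
    """
    y = np.asarray(bmi, float)
    X = np.column_stack([np.ones_like(y),
                         np.asarray(dosage, float),
                         np.asarray(covariates, float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    beta, se = coef[1], np.sqrt(cov[1, 1])  # column 1 holds the dosage
    p = erfc(abs(beta / se) / sqrt(2))      # two-sided normal P-value
    return beta, se, p
```

A negative `beta`, as for PTPRT:p.Arg1384His, means each copy of the effect allele is associated with lower BMI by that many kg/m², holding the covariates fixed.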