Cost-Effective Assembly of Over 1,000 Human Genomes: Novel Technique Drives Medical Advances

(SeaPRwire) - HANGZHOU, China, April 3, 2026 - A research group led by Jian Yang, Zhen-Xing Endowed Professor at Westlake University’s School of Life Sciences, together with collaborating researchers, reported its latest findings in Nature on April 1. The study introduces a pangenome-informed genome assembly (PIGA) technique. Using a cost-effective hybrid sequencing approach that combines long and short reads, the team constructed a pangenome encompassing more than a thousand individuals. This overcomes a limitation of earlier pangenomes, which were built from smaller samples, and establishes foundational infrastructure for both medical and population genetics research.

Ever since the completion of the Human Genome Project, single linear reference genomes (like GRCh38) have served as the cornerstone for biomedical investigations. However, the genetic makeup of human individuals varies considerably, and a singular reference genome cannot fully capture the extensive genetic diversity present across populations. This often results in complex genetic variations, such as structural variants (SVs) and tandem repeats (TRs), being overlooked in conventional analyses. To tackle this issue, scientists proposed the concept of a pangenome—a comprehensive collection of genome sequences that represents the genetic diversity within a population.

While advances in long-read sequencing have facilitated the assembly of high-quality diploid genomes, the substantial costs associated with sequencing have restricted the sample sizes of previous pangenomes to merely a few dozen individuals. Such limited sample sizes are inadequate for accurately estimating the frequency of genetic variants in populations or for resolving low-frequency variants and highly complex genomic regions. Consequently, the development of an economical pangenome construction strategy for large-scale populations has become an urgent necessity for understanding the functional impact of complex variants and improving clinical diagnostics.
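The effect of sample size on frequency estimation can be illustrated with a little arithmetic. The sketch below is not taken from the study; it assumes haplotypes are sampled independently, so that an allele count follows a binomial distribution over 2N haplotypes. The cohort size of 50 is a hypothetical stand-in for "a few dozen individuals", and 1,116 is the number of diploid genomes reported for this study's pangenome.

```python
import math


def min_detectable_frequency(n_individuals: int) -> float:
    """Smallest nonzero allele frequency observable in a diploid cohort:
    a single copy among 2N haplotypes."""
    return 1.0 / (2 * n_individuals)


def frequency_standard_error(p: float, n_individuals: int) -> float:
    """Binomial standard error of an allele-frequency estimate,
    assuming 2N independently sampled haplotypes (an assumption)."""
    return math.sqrt(p * (1.0 - p) / (2 * n_individuals))


if __name__ == "__main__":
    true_frequency = 0.01  # hypothetical low-frequency variant
    for n in (50, 1116):   # 50 is illustrative; 1,116 is from the study
        floor = min_detectable_frequency(n)
        se = frequency_standard_error(true_frequency, n)
        print(f"N={n}: frequency floor={floor:.5f}, "
              f"SE at p={true_frequency}: {se:.5f}")
```

Under these assumptions, a 1% variant in a 50-person cohort has a standard error about as large as the frequency itself, whereas a cohort of more than a thousand narrows it to roughly a fifth of the frequency.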

Professor Yang’s team has a long-standing commitment to methodological research in statistical genetics, genomics, and the analysis of large-scale data for human complex traits. Through the development of efficient computational methods, the team has consistently addressed key challenges in processing vast genomic datasets. Analytical tools created by the team, including GCTA-GREML, SMR, and gsMap, have gained widespread adoption globally.

To overcome the hurdle of constructing large-scale pangenomes, the research team devised the pangenome-informed genome assembly (PIGA) workflow (Fig. 1). Unlike de novo assembly methods, which depend on sequencing data from individual samples, PIGA utilizes a pangenome-guided framework to integrate sequence information across an entire cohort. It fully leverages a cost-efficient hybrid sequencing strategy based on modest-coverage Illumina short-read and PacBio long-read whole-genome sequencing (WGS) data. This approach significantly lowers sequencing expenses while enabling genome assembly from data with moderate coverage, thereby offering a practical new technical avenue for future population-scale hybrid sequencing studies.

Utilizing this method, the research team successfully built the world’s largest human pangenome to date, comprising 1,116 diploid genomes with an average quality value (QV) of 46. This pangenome identified 405.3 million base pairs (Mb) of non-reference sequences that are absent from current reference genomes (GRCh38 and CHM13). Notably, the team annotated 26.2 Mb of these sequences as functional genic and predicted regulatory elements, significantly enhancing our comprehension of non-reference sequences within the human genome.
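For readers unfamiliar with the quality value, the sketch below converts a QV into an approximate per-base error rate. It assumes the conventional Phred-scaled definition, QV = -10 * log10(error rate); the release does not state how QV was computed, so treat this as an illustration and not as the study's exact metric. The same block also works out what share of the non-reference sequence was annotated, using only the two figures quoted above.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error rate implied by a Phred-scaled quality value
    (assumed definition: QV = -10 * log10(error_rate))."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Inverse conversion: Phred-scaled QV for a given per-base error rate."""
    return -10 * math.log10(error_rate)


def annotated_fraction(annotated_mb: float, total_mb: float) -> float:
    """Share of non-reference sequence carrying a functional annotation."""
    return annotated_mb / total_mb


if __name__ == "__main__":
    rate = qv_to_error_rate(46)  # average QV reported in the release
    print(f"QV 46 ~ {rate:.2e} errors per base "
          f"(about one error per {1 / rate:,.0f} bases)")

    share = annotated_fraction(26.2, 405.3)  # Mb figures from the release
    print(f"Annotated share of non-reference sequence: {share:.1%}")
```

Under that definition, an average QV of 46 corresponds to roughly one error in every 40,000 bases, and the annotated 26.2 Mb is about 6.5% of the 405.3 Mb of non-reference sequence.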

Fig. 1. The pangenome-informed genome assembly (PIGA) workflow.

Drawing upon the extensive assembly dataset, the researchers compiled a comprehensive catalog of genetic variation. In addition to 35.4 million small variants, the catalog encompassed a broad spectrum of complex variants, including 110,530 SVs, 485,575 TRs, and 0.86 million nested variants embedded within non-reference sequences.

Using this catalog, the team characterized medically relevant variations across multiple scales (Fig. 2), such as gene-altering SVs, pathogenic TR expansions, gene cluster variations, and HLA gene haplotypes. These discoveries suggest that the 1KCP variant catalog serves as a crucial reference for the clinical screening of pathogenic mutations.

By integrating gene expression data, the team performed pan-variant expression quantitative trait loci (eQTL) mapping. They identified 3,256 eQTLs involving complex variants (SVs, TRs, and nested variants), thereby clarifying the regulatory intricacies of these diverse variant types.
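At its simplest, an eQTL test asks whether expression of a gene changes with the number of copies of a variant allele a person carries. The sketch below shows that idea as an ordinary least-squares regression of expression on allele dosage (0, 1, or 2). It is not the study's pipeline: the function name and the six data points are hypothetical toy values, and a real analysis would add covariates, multiple-testing correction, and, for tandem repeats, would likely use repeat length instead of a 0/1/2 dosage.

```python
from typing import Sequence, Tuple


def ols_slope_and_r2(dosages: Sequence[float],
                     expression: Sequence[float]) -> Tuple[float, float]:
    """Regress expression on allele dosage with simple least squares.

    Returns the slope (expression change per extra allele copy) and
    R^2 (fraction of expression variance explained by the variant).
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(dosages, expression))
    slope = sxy / sxx
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, r_squared


if __name__ == "__main__":
    # Synthetic toy data: six individuals, dosage of a hypothetical SV allele
    # and normalized expression of a hypothetical nearby gene.
    toy_dosages = [0, 0, 1, 1, 2, 2]
    toy_expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

    slope, r_squared = ols_slope_and_r2(toy_dosages, toy_expression)
    print(f"effect per allele copy: {slope:.2f}, R^2: {r_squared:.2f}")
```

The slope is the quantity usually reported as the eQTL effect size; a variant is called an eQTL only when that effect remains statistically significant after correcting for the many variant-gene pairs tested.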

Collectively, this study significantly advances our understanding of complex genetic variants and their functional implications, establishing a new framework for human health research and pangenome studies in other species.

Ph.D. student Yifei Wang and Research Assistant Professor Zhongqu Duan are recognized as the co-first authors of the study, with Professor Jian Yang serving as the last author. This research received support from the National Natural Science Foundation of China, the National Key R&D Program, the Zhejiang “Pioneer & Leading Goose” Program, and the New Cornerstone Science Foundation. Computational resources were provided by Westlake University’s High-Performance Computing Center.

Professor Jian Yang’s research group is dedicated to developing statistical genetics and bioinformatics methodologies. By conducting in-depth analyses of genomic and multi-omic data from large-scale population cohorts, they aim to uncover the genetic architecture and molecular mechanisms underlying complex diseases, translating these findings into novel strategies for disease diagnosis, drug target identification, and precision medicine.

Related links:
Paper link: https://www.nature.com/articles/s41586-026-10315-y
Jian Yang lab website: https://yanglab.westlake.edu.cn/

Media contact:
Chi Zhang
media@westlake.edu.cn
+86-15659837873

SOURCE Westlake University
