Science Cast

FastDedup - A fast and memory-efficient tool for read deduplication

Posted by librarian on May 4, 2026, 4:56 pm

Connected to paper: This paper is a preprint and has not been certified by peer review.

FastDedup - A fast and memory-efficient tool for read deduplication

bioRxiv, May 4, 2026

Authors

Ribes, R.; Mandier, C.; Baniel, A.

Abstract

PCR duplicate removal is a critical first step in high-throughput sequencing pipelines, yet existing tools struggle with speed, memory use, or correctness at modern dataset scales. We present FastDedup, a Rust-based FASTX deduplicator that reduces each read or read pair to a compact xxh3 hash fingerprint. This drastically lowers memory usage and leaves most of the execution time bound by disk I/O. Benchmarked against six competing tools on synthetic human WGS datasets of up to 300 million reads, FastDedup consistently leads on paired-end data, running more than 10 times faster than fastp. It also outperforms all other tools on uncompressed single-end data, deduplicating one million reads in one second. We additionally report correctness failures in prinseq++ and clumpify. FastDedup is available under the MIT License via GitHub, Bioconda, and Cargo.
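The core idea described in the abstract, remembering a short hash per read or read pair instead of the read itself, can be sketched as follows. This is a minimal illustration, not FastDedup's code. It assumes: Python instead of Rust; BLAKE2b truncated to 64 bits as a stand-in for xxh3 (xxh3 needs the third-party xxhash package, while hashlib ships with Python); only the sequences are hashed; and only exact sequence matches count as duplicates. The names fingerprint and deduplicate are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator, Optional, Set, Tuple

# One read or read pair: (R1 sequence, R2 sequence or None for single-end).
ReadPair = Tuple[str, Optional[str]]


def fingerprint(r1: str, r2: Optional[str] = None) -> int:
    """Return a 64-bit fingerprint of a read or read pair.

    Assumption: BLAKE2b truncated to 8 bytes stands in for xxh3 here.
    """
    h = hashlib.blake2b(digest_size=8)
    h.update(r1.encode("ascii"))
    if r2 is not None:
        # A separator byte that never occurs in a sequence keeps the pair
        # ("AC", "GT") distinct from ("A", "CGT").
        h.update(b"\x00")
        h.update(r2.encode("ascii"))
    return int.from_bytes(h.digest(), "little")


def deduplicate(reads: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield the first occurrence of each read (pair) and drop exact repeats.

    Only the 8-byte fingerprints are remembered, never the sequences.
    """
    seen: Set[int] = set()
    for r1, r2 in reads:
        fp = fingerprint(r1, r2)
        if fp in seen:
            continue  # same fingerprint already seen: treat as a duplicate
        seen.add(fp)
        yield r1, r2


if __name__ == "__main__":
    pairs = [
        ("ACGTACGT", "TTGCA"),
        ("ACGTACGT", "TTGCA"),  # exact duplicate pair: dropped
        ("ACGTACGT", "TTGCC"),  # same R1, different R2: kept
    ]
    print(len(list(deduplicate(pairs))))  # 2
```

On the design choice: with a well-distributed 64-bit fingerprint, the birthday approximation puts the expected number of collisions at about n²/2⁶⁵, roughly 0.0024 for 300 million reads, so a false merge is unlikely but not impossible. In a compact native hash set, 300 million 8-byte fingerprints take about 2.4 GB before table overhead; a Python set of ints costs several times more, so this sketch shows the logic rather than the memory profile.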
