GigaScience

Papers
(The median citation count of GigaScience is 4. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-11-01 to 2024-11-01.)
Article	Citations
Twelve years of SAMtools and BCFtools	6415
Significantly improving the quality of genome assemblies through curation	967
SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data	735
HTSlib: C library for reading/writing high-throughput sequencing data	233
A chromosome-level genome of the spider Trichonephila antipodiana reveals the genetic basis of its polyphagy and evidence of an ancient whole-genome duplication event	206
BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters	114
GALLO: An R package for genomic annotation and integration of multiple data sources in livestock for positional candidate loci	109
A chromosome-level genome assembly for the Pacific oyster Crassostrea gigas	104
Comparison of the two up-to-date sequencing technologies for genome assembly: HiFi reads of Pacific Biosciences Sequel II system and ultralong reads of Oxford Nanopore	102
Long-read assembly of the Brassica napus reference genome Darmor-bzh	76
Comparison of long-read methods for sequencing and assembly of a plant genome	67
Inferring microbiota functions from taxonomic genes: a review	64
Parliament2: Accurate structural variant calling at scale	57
CNVpytor: a tool for copy number variation detection and analysis from read depth and allele imbalance in whole-genome sequencing	52
Chromosome-level genome assembly of the hard-shelled mussel Mytilus coruscus, a widely distributed species from the temperate areas of East Asia	49
Preventing dataset shift from breaking machine-learning biomarkers	48
Dadasnake, a Snakemake implementation of DADA2 to process amplicon sequencing data for microbial ecology	48
NuCLS: A scalable crowdsourcing approach and dataset for nucleus classification and segmentation in breast cancer	44
Accelerated deciphering of the genetic architecture of agricultural economic traits in pigs using a low-coverage whole-genome sequencing strategy	44
Chromosome-level draft genome of a diploid plum (Prunus salicina)	43
BiSulfite Bolt: A bisulfite sequencing analysis platform	42
An improved ovine reference genome assembly to facilitate in-depth functional annotation of the sheep genome	41
DeePhage: distinguishing virulent and temperate phage-derived sequences in metavirome data with a deep learning approach	41
long-read-tools.org: an interactive catalogue of analysis methods for long-read sequencing data	39
Long-read and chromosome-scale assembly of the hexaploid wheat genome achieves high resolution for research and breeding	38
A new duck genome reveals conserved and convergently evolved chromosome architectures of birds and mammals	37
Understanding the impact of preprocessing pipelines on neuroimaging cortical surface analyses	36
Chromosome-level reference genome of the European wasp spider Argiope bruennichi: a resource for studies on range expansion and evolutionary adaptation	34
The genome of the venomous snail Lautoconus ventricosus sheds light on the origin of conotoxin diversity	33
How to remove or control confounds in predictive models, with applications to brain biomarkers	33
Genome size evolution in the diverse insect order Trichoptera	33
The haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar reveal novel pan-genome and allele-specific transcriptome features	33
Streamlining data-intensive biology with workflow systems	31
Multi-stage malaria parasite recognition by deep learning	31
The Gene Expression Deconvolution Interactive Tool (GEDIT): accurate cell type quantification from gene expression data	31
A chromosome-level genome assembly of the oriental river prawn, Macrobrachium nipponense	30
Torix Rickettsia are widespread in arthropods and reflect a neglected symbiosis	30
Fractional ridge regression: a fast, interpretable reparameterization of ridge regression	30
The germline mutational process in rhesus macaque and its implications for phylogenetic dating	29
Efficient DNA sequence compression with neural networks	29
Population modeling with machine learning can enhance measures of mental health	29
U-Limb: A multi-modal, multi-center database on arm motion control in healthy and post-stroke conditions	28
A microbial gene catalog of anaerobic digestion from full-scale biogas plants	28
Loop detection using Hi-C data with HiCExplorer	27
De novo genome assemblies of butterflies	26
Association mapping across a multitude of traits collected in diverse environments in maize	25
Localized effect of treated wastewater effluent on the resistome of an urban watershed	25
Building the mega single-cell transcriptome ocular meta-atlas	25
The rise of genomics in snake venom research: recent advances and future perspectives	24
Comparative analysis of 7 short-read sequencing platforms using the Korean Reference Genome: MGI and Illumina sequencing benchmark for whole-genome sequencing	24
Trajectories, bifurcations, and pseudo-time in large clinical datasets: applications to myocardial infarction and diabetes data	24
Genetic demultiplexing of pooled single-cell RNA-sequencing samples in cancer facilitates effective experimental design	24
Label3DMaize: toolkit for 3D point cloud data annotation of maize shoots	24
SYNPRED: prediction of drug combination effects in cancer using different synergy metrics and ensemble learning	23
A chromosome-level reference genome of Ensete glaucum gives insight into diversity and chromosomal and repetitive sequence evolution in the Musaceae	23
Two high-quality de novo genomes from single ethanol-preserved specimens of tiny metazoans (Collembola)	23
Genome sequence and genetic diversity analysis of an under-domesticated orphan crop, white fonio (Digitaria exilis)	23
Chromosomal genome of Triplophysa bleekeri provides insights into its evolution and environmental adaptation	22
Mantis: flexible and consensus-driven genome annotation	22
Future-proofing and maximizing the utility of metadata: The PHA4GE SARS-CoV-2 contextual data specification package	22
A new mass spectral library for high-coverage and reproducible analysis of the Plasmodium falciparum–infected red blood cell proteome	22
Desiderata for the development of next-generation electronic health record phenotype libraries	21
ISA API: An open platform for interoperable life science experimental metadata	21
High-throughput proteomics and in vitro functional characterization of the 26 medically most important elapids and vipers from sub-Saharan Africa	21
Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes	20
Message in a Bottle—Metabarcoding enables biodiversity comparisons across ecoregions	20
Benchmarking ultra-high molecular weight DNA preservation methods for long-read and long-range sequencing	20
0s and 1s in marine molecular research: a regional HPC perspective	20
MB-GAN: Microbiome Simulation via Generative Adversarial Network	19
Accurate assembly of the olive baboon (Papio anubis) genome using long-read and Hi-C data	19
iGenomics: Comprehensive DNA sequence analysis on your Smartphone	18
M2aia—Interactive, fast, and memory-efficient analysis of 2D and 3D multi-modal mass spectrometry imaging data	18
An in vitro whole-cell electrophysiology dataset of human cortical neurons	18
Improved microbial genomes and gene catalog of the chicken gut from metagenomic sequencing of high-fidelity long reads	18
Comparative analysis of common alignment tools for single-cell RNA sequencing	18
Antibiotic resistance genes are differentially mobilized according to resistance mechanism	17
Adaptive venom evolution and toxicity in octopods is driven by extensive novel gene formation, expansion, and loss	17
Efficient real-time selective genome sequencing on resource-constrained devices	17
Correcting for experiment-specific variability in expression compendia can remove underlying signals	17
Spacemake: processing and analysis of large-scale spatial transcriptomics data	16
Centering inclusivity in the design of online conferences—An OHBM–Open Science perspective	16
Agricultural plant cataloging and establishment of a data framework from UAV-based crop images by computer vision	16
What the Phage: a scalable workflow for the identification and analysis of phage sequences	16
Toward global integration of biodiversity big data: a harmonized metabarcode data generation module for terrestrial arthropods	16
DENTIST—using long reads for closing assembly gaps at high accuracy	16
Fungal and ciliate protozoa are the main rumen microbes associated with methane emissions in dairy cattle	15
AXIOME3: Automation, eXtension, and Integration Of Microbial Ecology	15
SSNOMBACTER: A collection of scattering-type scanning near-field optical microscopy and atomic force microscopy images of bacterial cells	15
Multi-modal data collection for measuring health, behavior, and living environment of large-scale participant cohorts	15
Evolution of complex genome architecture in gymnosperms	15
Myth-busting the provider-user relationship for digital sequence information	15
A curated human cellular microRNAome based on 196 primary cell types	15
Chromosome-level genome assemblies of the malaria vectors Anopheles coluzzii and Anopheles arabiensis	15
Benchmarking missing-values approaches for predictive models on health databases	14
Vulcan: Improved long-read mapping and structural variant calling via dual-mode alignment	14
Triku: a feature selection method based on nearest neighbors for single-cell data	14
Linking big biomedical datasets to modular analysis with Portable Encapsulated Projects	14
Genome diversity in Ukraine	14
A chromosome-level reference genome of the hazelnut, Corylus heterophylla Fisch	14
MesKit: a tool kit for dissecting cancer evolution of multi-region tumor biopsies through somatic alterations	14
The Manchurian Walnut Genome: Insights into Juglone and Lipid Biosynthesis	14
Synonymous variants that disrupt messenger RNA structure are significantly constrained in the human population	13
Identification of a differentiation stall in epithelial mesenchymal transition in histone H3–mutant diffuse midline glioma	13
Clonality, inbreeding, and hybridization in two extremotolerant black yeasts	13
BIGwas: Single-command quality control and association testing for multi-cohort and biobank-scale GWAS/PheWAS data	13
Quantifying research interests in 7,521 mammalian species with h-index: a case study	13
Democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the Galaxy framework	13
Analysis of SARS-CoV-2 known and novel subgenomic mRNAs in cell culture, animal model, and clinical samples using LeTRS, a bioinformatic tool to identify unique sequence identifiers	13
RNAProt: an efficient and feature-rich RNA binding protein binding site predictor	13
Open and reusable annotated mass spectrometry dataset of a chemodiverse collection of 1,600 plant extracts	13
Lilikoi V2.0: a deep learning–enabled, personalized pathway-based R package for diagnosis and prognosis predictions using metabolomics data	12
High-quality chromosome-level genome assembly and full-length transcriptome analysis of the pharaoh ant Monomorium pharaonis	12
A high-quality assembly reveals genomic characteristics, phylogenetic status, and causal genes for leucism plumage of Indian peafowl	12
An analysis of security vulnerabilities in container images for scientific data analysis	12
Internet of Samples (iSamples): Toward an interdisciplinary cyberinfrastructure for material samples	12
Maternal plasma lipids are involved in the pathogenesis of preterm birth	12
Chromosome-length genome assembly and linkage map of a critically endangered Australian bird: the helmeted honeyeater	12
Fluorescence microscopy datasets for training deep neural networks	12
Genomic view of the diversity and functional role of archaea and bacteria in the skeleton of the reef-building corals Porites lutea and Isopora palifera	12
A chromosome-level genome assembly and annotation of the desert horned lizard, Phrynosoma platyrhinos, provides insight into chromosomal rearrangements among reptiles	12
A high-throughput multiplexing and selection strategy to complete bacterial genomes	12
Chromosome-level genome assembly of the black widow spider Latrodectus elegans illuminates composition and evolution of venom and silk proteins	12
SurvBenchmark: comprehensive benchmarking study of survival analysis methods using both omics data and clinical data	11
ChronoRoot: High-throughput phenotyping by deep segmentation networks reveals novel temporal parameters of plant root system architecture	11
AMR-meta: a k-mer and metafeature approach to classify antimicrobial resistance from high-throughput short-read metagenomics data	11
3D-Beacons: decreasing the gap between protein sequences and structures through a federated network of protein structure data resources	11
ViReMa: a virus recombination mapper of next-generation sequencing data characterizes diverse recombinant viral nucleic acids	11
A proteomic approach reveals possible molecular mechanisms and roles for endosymbiotic bacteria in begomovirus transmission by whiteflies	11
DeePVP: Identification and classification of phage virion proteins using deep learning	11
High-quality genome assemblies from key Hawaiian coral species	11
Multi-dimensional leaf phenotypes reflect root system genotype in grafted grapevine over the growing season	11
A large-scale metagenomic survey dataset of the post-weaning piglet gut lumen	11
Chromosome-level assembly and annotation of the blue catfish Ictalurus furcatus, an aquaculture species for hybrid catfish reproduction, epigenetics, and heterosis studies	10
Galaxy CLIP-Explorer: a web server for CLIP-Seq data analysis	10
Tourmaline: A containerized workflow for rapid and iterable amplicon sequence analysis using QIIME 2 and Snakemake	10
Monash DaCRA fPET-fMRI: A dataset for comparison of radiotracer administration for high temporal resolution functional FDG-PET	10
Metabarcoding versus mapping unassembled shotgun reads for identification of prey consumed by arthropod epigeal predators	10
DrivAER: Identification of driving transcriptional programs in single-cell RNA sequencing data	10
Tool recommender system in Galaxy using deep learning	10
Association of female reproductive tract microbiota with egg production in layer chickens	10
High-throughput microscopy reveals the impact of multifactorial environmental perturbations on colorectal cancer cell growth	10
CAT: a computational anatomy toolbox for the analysis of structural MRI data	10
Alignstein: Optimal transport for improved LC-MS retention time alignment	10
Data standardization of plant–pollinator interactions	10
Transcriptome annotation in the cloud: complexity, best practices, and cost	10
xAtlas: scalable small variant calling across heterogeneous next-generation sequencing experiments	10
Best genome sequencing strategies for annotation of complex immune gene families in wildlife	10
Statistical quantification of confounding bias in machine learning models	10
Genome sequencing of deep-sea hydrothermal vent snails reveals adaptions to extreme environments	10
GuideMaker: Software to design CRISPR-Cas guide RNA pools in non-model genomes	9
Near-chromosomal de novo assembly of Bengal tiger genome reveals genetic hallmarks of apex predation	9
Profiling the baseline performance and limits of machine learning models for adaptive immune receptor repertoire classification	9
Fully resolved assembly of Cryptosporidium parvum	9
Evaluating short-term forecasting of COVID-19 cases among different epidemiological models under a Bayesian framework	9
MBGC: Multiple Bacteria Genome Compressor	9
Fusion transcripts and their genomic breakpoints in polyadenylated and ribosomal RNA–minus RNA sequencing data	9
Chromosome-level genome assemblies of Channa argus and Channa maculata and comparative analysis of their temperature adaptability	9
Chromosome-level genome assembly, annotation, and phylogenomics of the gooseneck barnacle Pollicipes pollicipes	9
Chromosome-level genome of the globe skimmer dragonfly (Pantala flavescens)	9
X-ray microtomography–based atlas of mouse cranial development	8
Genomic analyses provide insights into the evolution and salinity adaptation of halophyte Tamarix chinensis	8
Making experimental data tables in the life sciences more FAIR: a pragmatic approach	8
Driftage: a multi-agent system framework for concept drift detection	8
Citation needed? Wikipedia bibliometrics during the first wave of the COVID-19 pandemic	8
Lessons learned about the biology and genomics of Diaphorina citri infection with “Candidatus Liberibacter asiaticus” by integrating new and archived organ-specific transcriptome data	8
Qiber3D—an open-source software package for the quantitative analysis of networks from 3D image stacks	8
Genomes and demographic histories of the endangered Bretschneidera sinensis (Akaniaceae)	8
Disease classification for whole-blood DNA methylation: Meta-analysis, missing values imputation, and XAI	8
GigaSOM.jl: High-performance clustering and visualization of huge cytometry datasets	8
Chromosome-level genome and the identification of sex chromosomes in Uloborus diversus	8
Characterization and simulation of metagenomic nanopore sequencing data with Meta-NanoSim	7
File-based localization of numerical perturbations in data analysis pipelines	7
A high-quality assembled genome and its comparative analysis decode the adaptive molecular mechanism of the number one Chinese cotton variety CRI-12	7
Unifying package managers, workflow engines, and containers: Computational reproducibility with BioNix	7
Improved chromosome-level genome assembly of the Glanville fritillary butterfly (Melitaea cinxia) integrating Pacific Biosciences long reads and a high-density linkage map	7
A scalable software solution for anonymizing high-dimensional biomedical data	7
ChiRA: an integrated framework for chimeric read analysis from RNA-RNA interactome and RNA structurome data	7
A chromosome-level genome assembly and intestinal transcriptome of Trypoxylus dichotomus (Coleoptera: Scarabaeidae) to understand its lignocellulose digestion ability	7
Confound-leakage: confound removal in machine learning leads to leakage	7
The complexity landscape of viral genomes	7
A machine learning framework for discovery and enrichment of metagenomics metadata from open access publications	7
biotoolsSchema: a formalized schema for bioinformatics software description	7
An overview of the National COVID-19 Chest Imaging Database: data quality and cohort analysis	7
A high-quality genome and comparison of short- versus long-read transcriptome of the palaearctic duck Aythya fuligula (tufted duck)	7
Delineating regions of interest for mass spectrometry imaging by multimodally corroborated spatial segmentation	7
Genome assembly of Musa beccarii shows extensive chromosomal rearrangements and genome expansion during evolution of Musaceae genomes	6
The Pioneer Advantage: Filling the blank spots on the map of genome diversity in Europe	6
The founding charter of the Omic Biodiversity Observation Network (Omic BON)	6
Making Common Fund data more findable: catalyzing a data ecosystem	6
KOREF_S1: phased, parental trio-binned Korean reference genome using long reads and Hi-C sequencing methods	6
Chromosome-level genome assembly for the Aldabra giant tortoise enables insights into the genetic health of a threatened population	6
Construction and analysis of the chromosome-level haplotype-resolved genomes of two Crassostrea oyster congeners: Crassostrea angulata and Crassostrea gigas	6
EraSOR: a software tool to eliminate inflation caused by sample overlap in polygenic score analyses	6
Chromosome-level genome assembly of Plazaster borealis sheds light on the morphogenesis of multiarmed starfish and its regenerative capacity	6
A Chromosome-level assembly of the Japanese eel genome, insights into gene duplication and chromosomal reorganization	6
Open-source benchmarking of IBD segment detection methods for biobank-scale cohorts	6
The state of Medusozoa genomics: current evidence and future challenges	6
Parvovirus dark matter in the cloaca of wild birds	6
SimFFPE and FilterFFPE: improving structural variant calling in FFPE samples	6
Chromatin conformation capture (Hi-C) sequencing of patient-derived xenografts: analysis guidelines	6
CoVEffect: interactive system for mining the effects of SARS-CoV-2 mutations and variants based on deep learning	6
Interpretable network propagation with application to expanding the repertoire of human proteins that interact with SARS-CoV-2	6
Identifying the effect of vancomycin on health care–associated methicillin-resistant Staphylococcus aureus strains using bacteriological and physiological media	6
NETMAGE: A human disease phenotype map generator for the network-based visualization of phenome-wide association study results	5
learnMSA: learning and aligning large protein families	5
Interpretable network-guided epistasis detection	5
The new COST Action European Venom Network (EUVEN)—synergy and future perspectives of modern venomics	5
The Capparis spinosa var. herbacea genome provides the first genomic instrument for a diversity and evolution study of the Capparaceae family	5
NPSV: A simulation-driven approach to genotyping structural variants in whole-genome sequencing data	5
Chromosome-level genome assembly of goose provides insight into the adaptation and growth of local goose breeds	5
Generation and application of pseudo–long reads for metagenome assembly	5
The Nencki-Symfonia electroencephalography/event-related potential dataset: Multiple cognitive tasks and resting-state data collected in a sample of healthy adults	5
Computational prediction of human deep intronic variation	5
Stardust: improving spatial transcriptomics data analysis through space-aware modularity optimization-based clustering	5
Bias-invariant RNA-sequencing metadata annotation	5
Construction of a new chromosome-scale, long-read reference genome assembly for the Syrian hamster, Mesocricetus auratus	5
A methodological approach to correlate tumor heterogeneity with drug distribution profile in mass spectrometry imaging data	5
Metaphor—A workflow for streamlined assembly and binning of metagenomes	5
Toward a scalable framework for reproducible processing of volumetric, nanoscale neuroimaging datasets	5
A novel ground truth multispectral image dataset with weight, anthocyanins, and Brix index measures of grape berries tested for its utility in machine learning pipelines	5
Fast and accurate estimation of multidimensional site frequency spectra from low-coverage high-throughput sequencing data	5
A new haplotype-resolved turkey genome to enable turkey genetics and genomics research	5
A high-quality pseudo-phased genome for Melaleuca quinquenervia shows allelic diversity of NLR-type resistance genes	5
Identification of transcriptional regulatory variants in pig duodenum, liver, and muscle tissues	4
FriendlyClearMap: an optimized toolkit for mouse brain mapping and analysis	4
Publishing data to support the fight against human vector-borne diseases	4
The GEN-ERA toolbox: unified and reproducible workflows for research in microbial genomics	4
Studying mutation rate evolution in primates—the effects of computational pipelines and parameter choices	4
ricu: R’s interface to intensive care data	4
A chromosome-level genome of the booklouse, Liposcelis brunnea, provides insight into louse evolution and environmental stress adaptation	4
High temporal resolution Nanopore sequencing dataset of SARS-CoV-2 and host cell RNAs	4
Honey bee (Apis mellifera) wing images: a tool for identification and conservation	4
Accurate gene consensus at low nanopore coverage	4
Computational reproducibility of Jupyter notebooks from biomedical publications	4
FAIR data station for lightweight metadata management and validation of omics studies	4
CopyDetective: Detection threshold–aware copy number variant calling in whole-exome sequencing data	4
MLcps: machine learning cumulative performance score for classification problems	4
MuLan-Methyl—multiple transformer-based language models for accurate DNA methylation prediction	4
ChemChaste: Simulating spatially inhomogeneous biochemical reaction–diffusion systems for modeling cell–environment feedbacks	4
Chromosome-level genome of the poultry shaft louse Menopon gallinae provides insight into the host-switching and adaptive evolution of parasitic lice	4
Chromosome-level genome assembly of the shuttles hoppfish, Periophthalmus modestus	4
A decade of GigaScience: A perspective on conservation genetics	4
The Global Atlas of Bamboo and Rattan (GABR) Phase II: new resources for sustainable development	4