Scientific Data

Papers
(The H4-Index of Scientific Data is 72. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. Only papers published in the past four years, i.e., from 2021-09-01 to 2025-09-01, are counted.)
Article | Citations
A database of seed plants on taxonomy, geography and ecology in the Qinling-Daba Mountains and adjacent areas | 1351
Ultra-deep sequencing data from a liquid biopsy proficiency study demonstrating analytic validity | 600
CreelCat, a Catalog of United States Inland Creel and Angler Survey Data | 569
Directional wave buoy data measured near Campbell Island, New Zealand | 449
RNA-seq of peripheral blood mononuclear cells of congenital generalized lipodystrophy type 2 patients | 360
Author Correction: Mobility networks in Greater Mexico City | 328
Author Correction: The Plegma dataset: Domestic appliance-level and aggregate electricity demand with metadata from Greece | 320
Reinterpretation of prostate cancer pathology by Appl1, Sortilin and Syndecan-1 biomarkers | 317
A Synthetic Dataset for Semantic Segmentation of Waterbodies in Out-of-Distribution Situations | 308
Linking Research Data with Physically Preserved Research Materials in Chemistry | 296
Empowering open data sharing for social good: a privacy-aware approach | 295
A daily high-resolution (1 km) human thermal index collection over the North China Plain from 2003 to 2020 | 257
The first high-quality chromosome-level genome of Parupeneus biaculeatus using HiFi and Hi-C data | 248
Chromosome-level genome assembly of the Rhizoctonia solani | 217
Author Correction: Whales from space dataset, an annotated satellite image dataset of whales for training machine learning models | 200
The R package for DICOM to brain imaging data structure conversion | 167
Chromosome-level genome assembly of Oriental chestnut gall wasp (Dryocosmus kuriphilus) | 163
Multi-proteomics and interactome dataset of tick-borne encephalitis virus infected host cells | 152
EEG Dataset for the Recognition of Different Emotions Induced in Voice-User Interaction | 150
A global dataset of fossil fungi records from the Cenozoic | 144
The interplay between brain and behavior during development: A multisite effort to generate and share simulated datasets | 142
A dataset of the daily edge of each polynya in the Antarctic | 132
Molecular landscape of respiratory infection: A large-scale, multi-centre blood transcriptome dataset | 128
Students’ performance dataset for using machine learning technique in physics education research | 125
Near-complete reference genome assembly of Hoya carnosa | 123
Exploring the electrophysiology of Parkinson’s disease with magnetoencephalography and deep brain recordings | 115
A thermosurvey dataset: Older adults’ experiences and adaptation to urban heat and climate change | 113
Generating FAIR research data in experimental tribology | 110
NeuMa - the absolute Neuromarketing dataset en route to an holistic understanding of consumer behaviour | 109
An open-access database of nature-based carbon offset project boundaries | 107
A Frontal Ablation Dataset for 49 Tidewater Glaciers in Greenland | 107
Chromosome-level assemblies of cultivated water chestnut Trapa bicornis and its wild relative Trapa incisa | 104
A global 1 km resolution daily surface longwave radiation product from MODIS satellite data from 2000–2023 | 104
Head model dataset for mixed reality navigation in neurosurgical interventions for intracranial lesions | 101
Machine learning-ready remote sensing data for Maya archaeology | 100
The Superfund Research Program Analytics Portal: linking environmental chemical exposure to biological phenotypes | 99
Global Ocean Particulate Organic Phosphorus, Carbon, Oxygen for Respiration, and Nitrogen (GO-POPCORN) | 98
Canopy height model and NAIP imagery pairs across CONUS | 97
A neuroimaging dataset during sequential color qualia similarity judgments with and without reports | 97
Author Correction: Open-access quantitative MRI data of the spinal cord and reproducibility across participants, sites and manufacturers | 97
ML-extendable framework for multiphysics-multiscale simulation workflow and data management using Kadi4Mat | 96
PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery | 95
Sea ice records over more than a century at an observatory facing the Okhotsk coast of Hokkaido, Japan | 93
SDUST2023GRA_MSS: the new global marine gravity anomaly model determined from mean sea surface model | 91
Unveiling the Spatiotemporal Dynamics of Global Brain Circulation: A Comprehensive Corpus (2000–2024) | 91
Whole genome and exome sequencing reference datasets from a multi-center and cross-platform benchmark study | 90
A focus groups study on data sharing and research data management | 90
Chromosome-level haplotype-resolved genome assembly of bread wheat’s wild relative Aegilops mutica | 90
A semantic approach to mapping the Provenance Ontology to Basic Formal Ontology | 89
A Field-Level Asset Mapping Dataset for England’s Agricultural Sector | 88
Optimizing drug combination and mechanism analysis based on risk pathway crosstalk in pan cancer | 88
A large-scale dataset of patient summaries for retrieval-based clinical decision support systems | 88
An 8-model ensemble of CMIP6-derived ocean surface wave climate | 88
PAVC: The foundation for a Pan-Arctic Vegetation Cover database | 86
Enrichment of lung cancer computed tomography collections with AI-derived annotations | 85
Identifying Cocoa Flower Visitors: A Deep Learning Dataset | 84
Chinese environmentally extended input-output database for 2017 and 2018 | 84
MarNemaFunDiv: a first comprehensive dataset of functional traits for marine nematodes | 83
District-scale surface temperatures generated from high-resolution longitudinal thermal infrared images | 82
A dataset of scientific dates from archaeological sites in eastern Africa spanning 5000 BCE to 1800 CE | 82
Multi-Domain Indoor Dataset for Visual Place Recognition and Anomaly Detection by Mobile Robots | 78
Chromosome-level genome assembly of rock carp (Procypris rabaudi) | 77
Making Mathematical Research Data FAIR: Pathways to Improved Data Sharing | 77
Bioclimatic atlas of the terrestrial Arctic | 76
A haplotype-resolved chromosomal-level genome assembly of Oxalis articulata | 76
GARD-LENS: A downscaled large ensemble dataset for understanding future climate and its uncertainties | 76
Mediterranean marine sediment cores database: unlocking paleoclimatic signals for the last 20,000 years | 75
What’s the TEE: Metrics of Temperature Extremes in Europe NUTS Regions (1980-2024) | 73
Spatial and temporal data to study residential heat decarbonisation pathways in England and Wales | 72
A century-long eddy-resolving simulation of global oceanic large- and mesoscale state | 72
Dataset on the effects of psychological care on depression and suicide ideation in underrepresented children | 72
A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset | 72
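
The H4-Index threshold quoted at the top of this list can be reproduced from citation counts alone. The sketch below is illustrative only: it assumes the H4-Index is the standard h-index (the largest h such that h papers each have at least h citations) applied to papers from the four-year window, and the function name `h_index` and the sample counts are hypothetical, not taken from the source.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    # Rank papers from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the paper at this rank still meets the threshold
        else:
            break  # every later paper has fewer citations than its rank
    return h


# Hypothetical sample: five papers with these citation counts.
sample = [10, 8, 5, 4, 3]
print(h_index(sample))  # prints 4
```

Applied to the citation counts of every paper the journal published in the window, the same function would be expected to return the threshold reported above, provided the assumption about the definition holds.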