Scientific Data

Papers
(The median citation count of Scientific Data is 2. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-05-01 to 2025-05-01.)
Article | Citations
Chromosome-level haplotype-resolved genome assembly of bread wheat’s wild relative Aegilops mutica | 955
Quantum computing dataset of maximum independent set problem on king lattice of over hundred Rydberg atoms | 487
SDUST2023GRA_MSS: the new global marine gravity anomaly model determined from mean sea surface model | 430
Shotgun metagenomes from productive lakes in an urban region of Sweden | 349
A database of seed plants on taxonomy, geography and ecology in the Qinling-Daba Mountains and adjacent areas | 310
Ultra-deep sequencing data from a liquid biopsy proficiency study demonstrating analytic validity | 286
Slovak database of speech affected by neurodegenerative diseases | 248
Author Correction: Open-access quantitative MRI data of the spinal cord and reproducibility across participants, sites and manufacturers | 245
CreelCat, a Catalog of United States Inland Creel and Angler Survey Data | 240
Monitoring non-pharmaceutical public health interventions during the COVID-19 pandemic | 230
A thermosurvey dataset: Older adults’ experiences and adaptation to urban heat and climate change | 205
Unified access to up-to-date residue-level annotations from UniProtKB and other biological databases for PDB data | 199
Directional wave buoy data measured near Campbell Island, New Zealand | 196
RNA-seq of peripheral blood mononuclear cells of congenital generalized lipodystrophy type 2 patients | 193
Author Correction: Mobility networks in Greater Mexico City | 188
The Carbon Catalogue, carbon footprints of 866 commercial products from 8 industry sectors and 5 continents | 187
Occurrence of human infection with Salmonella Typhi in sub-Saharan Africa | 180
Author Correction: The Plegma dataset: Domestic appliance-level and aggregate electricity demand with metadata from Greece | 161
The Superfund Research Program Analytics Portal: linking environmental chemical exposure to biological phenotypes | 160
District-scale surface temperatures generated from high-resolution longitudinal thermal infrared images | 140
Reinterpretation of prostate cancer pathology by Appl1, Sortilin and Syndecan-1 biomarkers | 136
A Synthetic Dataset for Semantic Segmentation of Waterbodies in Out-of-Distribution Situations | 124
Bioclimatic atlas of the terrestrial Arctic | 124
Canopy height model and NAIP imagery pairs across CONUS | 114
Linking Research Data with Physically Preserved Research Materials in Chemistry | 112
Enrichment of lung cancer computed tomography collections with AI-derived annotations | 112
Empowering open data sharing for social good: a privacy-aware approach | 110
BUS-UCLM: Breast ultrasound lesion segmentation dataset | 103
A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset | 100
A semantic approach to mapping the Provenance Ontology to Basic Formal Ontology | 99
Statistical performance indicators and index—a new tool to measure country statistical capacity | 98
A neuroimaging dataset during sequential color qualia similarity judgments with and without reports | 94
Making Mathematical Research Data FAIR: Pathways to Improved Data Sharing | 93
Spatial and temporal data to study residential heat decarbonisation pathways in England and Wales | 91
A century-long eddy-resolving simulation of global oceanic large- and mesoscale state | 91
Author Correction: Whales from space dataset, an annotated satellite image dataset of whales for training machine learning models | 88
In toto light sheet fluorescence microscopy live imaging datasets of Ceratitis capitata embryonic development | 88
The R package for DICOM to brain imaging data structure conversion | 87
Chromosome-level genome assembly of Oriental chestnut gall wasp (Dryocosmus kuriphilus) | 87
EEG Dataset for the Recognition of Different Emotions Induced in Voice-User Interaction | 86
Multi-proteomics and interactome dataset of tick-borne encephalitis virus infected host cells | 86
A large-scale dataset of patient summaries for retrieval-based clinical decision support systems | 86
A global dataset of fossil fungi records from the Cenozoic | 84
Dynamic urban morphology mapping in Chinese cities based on local climate zone approach | 82
The interplay between brain and behavior during development: A multisite effort to generate and share simulated datasets | 82
Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection | 81
PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery | 80
An agenda for addressing bias in conflict data | 80
A Frontal Ablation Dataset for 49 Tidewater Glaciers in Greenland | 78
Globe-LFMC 2.0, an enhanced and updated dataset for live fuel moisture content research | 78
NeuMa - the absolute Neuromarketing dataset en route to an holistic understanding of consumer behaviour | 78
Chromosome-level genome assembly of the traditional medicinal plant Lindera aggregata | 76
The landscape of abiotic and biotic stress-responsive splice variants with deep RNA-seq datasets in hot pepper | 75
FIGARO-E3: a high-resolution extended multi-regional input-output database consistent with official statistics | 73
Whole genome and exome sequencing reference datasets from a multi-center and cross-platform benchmark study | 73
A dataset of the daily edge of each polynya in the Antarctic | 73
A multilayered urban tree dataset of point clouds, quantitative structure and graph models | 73
A construction waste landfill dataset of two districts in Beijing, China from high resolution satellite images | 72
Ensemble of CMIP6 derived reference and potential evapotranspiration with radiative and advective components | 71
Head model dataset for mixed reality navigation in neurosurgical interventions for intracranial lesions | 71
A comprehensive genomic and transcriptomic dataset of triple-negative breast cancers | 71
A western United States snow reanalysis dataset over the Landsat era from water years 1985 to 2021 | 70
Home monitoring with connected mobile devices for asthma attack prediction with machine learning | 70
Dataset on the effects of psychological care on depression and suicide ideation in underrepresented children | 69
Global Ocean Particulate Organic Phosphorus, Carbon, Oxygen for Respiration, and Nitrogen (GO-POPCORN) | 69
Enhancing radiomics and Deep Learning systems through the standardization of medical imaging workflows | 68
Generating FAIR research data in experimental tribology | 68
Analysis of AlphaMissense data in different protein groups and structural context | 68
An 8-model ensemble of CMIP6-derived ocean surface wave climate | 67
A focus groups study on data sharing and research data management | 67
Chromosome-level assemblies of cultivated water chestnut Trapa bicornis and its wild relative Trapa incisa | 67
T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus | 67
OPERAnet, a multimodal activity recognition dataset acquired from radio frequency and vision-based sensors | 67
Scenarios of future Indian electricity demand accounting for space cooling and electric vehicle adoption | 67
A Global Database of Soil Plant Available Phosphorus | 67
A large EEG dataset for studying cross-session variability in motor imagery brain-computer interface | 66
A daily high-resolution (1 km) human thermal index collection over the North China Plain from 2003 to 2020 | 64
A large-scale multi-label 12-lead electrocardiogram database with standardized diagnostic statements | 63
Dataset on heavy metal pollution assessment in freshwater ecosystems | 63
An open-access database of nature-based carbon offset project boundaries | 63
Very High Resolution Projections over Italy under different CMIP5 IPCC scenarios | 62
China’s provincial process CO2 emissions from cement production during 1993–2019 | 61
An open dataset for oracle bone character recognition and decipherment | 61
A dataset for deep learning based detection of printed circuit board surface defect | 61
QMugs, quantum mechanical properties of drug-like molecules | 60
An automatic multi-tissue human fetal brain segmentation benchmark using the Fetal Tissue Annotation Dataset | 60
RailFOD23: A dataset for foreign object detection on railroad transmission lines | 59
Scaling up SoccerNet with multi-view spatial localization and re-identification | 58
GARD-LENS: A downscaled large ensemble dataset for understanding future climate and its uncertainties | 58
Exploring the electrophysiology of Parkinson’s disease with magnetoencephalography and deep brain recordings | 58
Optimizing drug combination and mechanism analysis based on risk pathway crosstalk in pan cancer | 58
A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland | 57
Machine learning-ready remote sensing data for Maya archaeology | 57
Hydrological model-based streamflow reconstruction for Indian sub-continental river basins, 1951–2021 | 56
Multimodal Data for the Detection of Freezing of Gait in Parkinson’s Disease | 56
Global monthly gridded atmospheric carbon dioxide concentrations under the historical and future scenarios | 56
Chinese environmentally extended input-output database for 2017 and 2018 | 55
Metagenomic sequencing and reconstruction of 82 microbial genomes from barley seed communities | 54
Constructing a global human epidemic database using open-source digital biosurveillance | 54
3D motion analysis dataset of healthy young adult volunteers walking and running on overground and treadmill | 54
Measurement of ship-generated waves in German coastal waterways from 1998–2022 | 54
LungHist700: A dataset of histological images for deep learning in pulmonary pathology | 53
VME: A Satellite Imagery Dataset and Benchmark for Detecting Vehicles in the Middle East and Beyond | 53
REAL-Colon: A dataset for developing real-world AI applications in colonoscopy | 52
Mobility of Erasmus+ students in Europe: Geolocated individual and aggregate mobility flows from 2014 to 2022 | 52
High-resolution ethograms, accelerometer recordings, and behavioral time series of Japanese quail | 52
A database of in situ water temperatures for large inland lakes across the coterminous United States | 52
Contextualized race and ethnicity annotations for clinical text from MIMIC-III | 51
Chromosome-scale genome assembly and annotation of Xenocypris argentea | 51
Accumulation-depuration data collection in support of toxicokinetic modelling | 51
An Observation-Based Dataset of Global Sub-Daily Precipitation Indices (GSDR-I) | 50
Multiorder hydrologic Position for Europe — a Set of Features for Machine Learning and Analysis in Hydrology | 50
Global photovoltaic solar panel dataset from 2019 to 2022 | 50
A soil database from Queretaro, Mexico for assessment of crop and irrigation water requirements | 49
A Chinese Face Dataset with Dynamic Expressions and Diverse Ages Synthesized by Deep Learning | 49
Combining citizen science data and literature to build a traits dataset of Taiwan’s birds | 48
Global inventory of species categorized by known underwater sonifery | 48
Gap-free 16-year (2005–2020) sub-diurnal surface meteorological observations across Florida | 48
A two-year dataset of energy, environment, and system operations for an ultra-low energy office building | 48
Endoscapes, a critical view of safety and surgical scene segmentation dataset for laparoscopic cholecystectomy | 48
NuInsSeg: A fully annotated dataset for nuclei instance segmentation in H&E-stained histological images | 48
Cognitive tasks, anatomical MRI, and functional MRI data evaluating the construct of self-regulation | 48
MSPB: a longitudinal multi-sensor dataset with phenotypic trait measurements from honey bees | 48
A near-complete chromosome-level genome assembly of looseleaf lettuce (Lactuca sativa var. crispa) | 47
Mapping Road Surface Type of Kenya Using OpenStreetMap and High-resolution Google Satellite Imagery | 47
Haplotype-resolved chromosome-level genome assembly of Ehretia macrophylla | 47
An Experimental and Clinical Physiological Signal Dataset for Automated Pain Recognition | 46
A multi-site, multi-modal travelling-heads resource for brain MRI harmonisation | 46
Observing the Central Arctic Atmosphere and Surface with University of Colorado uncrewed aircraft systems | 46
Dual Radar: A Multi-modal Dataset with Dual 4D Radar for Autononous Driving | 46
Dataset of the suitability of major food crops in Africa under climate change | 46
Molecular structural dataset of lignin macromolecule elucidating experimental structural compositions | 46
Dataset on child vaccination in Brazil from 1996 to 2021 | 46
A geospatial database of close-to-reality travel times to obstetric emergency care in 15 Nigerian conurbations | 45
Inventory of shallow landslides triggered by extreme precipitation in July 2023 in Beijing, China | 45
Georectified polygon database of ground-mounted large-scale solar photovoltaic sites in the United States | 45
A pseudoproxy emulation of the PAGES 2k database using a hierarchy of proxy system models | 45
GriddingMachine, a database and software for Earth system modeling at global and regional scales | 43
Ontology for the Avida digital evolution platform | 43
PEARL-Neuro Database: EEG, fMRI, health and lifestyle data of middle-aged people at risk of dementia | 43
Improved high quality sand fly assemblies enabled by ultra low input long read sequencing | 43
A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations | 43
Analysis of metabolic dynamics during drought stress in Arabidopsis plants | 43
ROBIN: Reference observatory of basins for international hydrological climate change detection | 43
Daily station-level records of air temperature, snow depth, and ground temperature in the Northern Hemisphere | 42
Global nature run data with realistic high-resolution carbon weather for the year of the Paris Agreement | 42
A global dataset on mungbean for managing seed yield and quality | 42
A new high-resolution global topographic factor dataset calculated based on SRTM | 42
A chromosome-scale reference genome of grasspea (Lathyrus sativus) | 42
Chromosome-level genome assembly of Odontothrips loti Haliday (Thysanoptera: Thripidae) | 42
The W2024 database of the water isotopologue $\mathrm{H}_2^{16}\mathrm{O}$ | 42
A biologging database of juvenile white sharks from the northeast Pacific | 41
Confocal imaging dataset to assess endothelial cell orientation during extreme glucose conditions | 41
Borrelia PeptideAtlas: A proteome resource of common Borrelia burgdorferi isolates for Lyme research | 41
Bimodal electroencephalography-functional magnetic resonance imaging dataset for inner-speech recognition | 41
Historical dataset details the distribution, extent and form of lost Ostrea edulis reef ecosystems | 40
Publisher Correction: Chromosome-level genome assembly and annotation of xerophyte secretohalophyte Reaumuria soongarica | 40
Assessing temporal dynamics of nitrogen surplus in Indian agriculture: district scale data from 1966 to 2017 | 40
A Multidisciplinary Multimodal Aligned Dataset for Academic Data Processing | 40
24-hour average PM2.5 concentration caused by aircraft in Chinese airports from Jan. 2006 to Dec. 2023 | 40
Sm-Nd Isotope Data Compilation from Geoscientific Literature Using an Automated Tabular Extraction Method | 40
Manually annotated and curated Dataset of diverse Weed Species in Maize and Sorghum for Computer Vision | 40
An East Antarctic, sub-annual resolution water isotope record from the Mount Brown South Ice core | 39
The HAInich: A multidisciplinary vision data-set for a better understanding of the forest ecosystem | 39
Author Correction: Geographical characterisation of British urban form and function using the spatial signatures framework | 39
BIRAFFE2, a multimodal dataset for emotion-based personalization in rich affective game environments | 39
The genome assembly and annotation of the cricket Gryllus longicercus | 39
An integrated multi-source dataset of elasmobranchs in the Red Sea following the Red Sea Decade Expedition | 39
SignEEG v1.0: Multimodal Dataset with Electroencephalography and Hand-written Signature for Biometric Systems | 39
A database with frailty, functional and inertial gait metrics for the research of fall causes in older adults | 39
1.5 million materials narratives generated by chatbots | 39
A global dataset on species occurrences and functional traits of Schizothoracinae fish | 38
A benchmark for domain adaptation and generalization in smartphone-based human activity recognition | 38
Distribution of soil macrofauna across different habitats in the Eastern European Alps | 38
A combined microbial and biogeochemical dataset from high-latitude ecosystems with respect to methane cycle | 38
A Biomechanical Dataset of 1,798 Healthy and Injured Subjects During Treadmill Walking and Running | 37
Hong Kong Corpus of Chinese Sentence and Passage Reading | 37
Chromosome-level genome assembly and annotation of the Yunling cattle with PacBio and Hi-C sequencing data | 37
A tree-based corpus annotated with Cyber-Syndrome, symptoms, and acupoints | 37
Innovative molecular networking analysis of steroids and characterisation of the urinary steroidome | 37
A speech corpus of Quechua Collao for automatic dimensional emotion recognition | 36
Acting Emotions: a comprehensive dataset of elicited emotions | 36
Coral community data Heron Island Great Barrier Reef 1962–2016 | 36
Sharkipedia: a curated open access database of shark and ray life history traits and abundance time-series | 36
ValLAI_Crop, a validation dataset for coarse-resolution satellite LAI products over Chinese cropland | 36
A point-of-use drinking water quality dataset from fieldwork in Detroit, Michigan | 35
Datasets for characterizing extreme events relevant to hydrologic design over the conterminous United States | 35
Harmonized Database of Western U.S. Water Rights (HarDWR) v.1 | 35
Analysis-ready optical underwater images of Manganese-nodule covered seafloor of the Clarion-Clipperton Zone | 35
Data scheme and data format for transferable force fields for molecular simulation | 35
Detection of differential bait proteoforms through immunoprecipitation-mass spectrometry data analysis | 35
Surrounding road density of child care centers in Australia | 35
Daily precipitation dataset at 0.1° for the Yarlung Zangbo River basin from 2001 to 2015 | 34
Dataset of soil hydraulic parameters in the Yellow River Basin based on in situ deep sampling | 34
Chromosome-level genome assemblies of sunflower oilseed and confectionery cultivars | 34
PTB-XL+, a comprehensive electrocardiographic feature dataset | 34
A multi-year campus-level smart meter database | 34
A panel sequencing dataset of peripheral blood gene variations in pan-cancer | 34
A chromosome-level genome assembly of skipjack tuna, Katsuwonus pelamis (Perciformes: Scombridae) | 34
An RNA-seq time series of the medaka pituitary gland during sexual maturation | 34
A dataset of riverine nitrogen yield across watersheds in the Conterminous United States | 34
Spatial transcriptome profiling of normal human liver | 33
A benchmark GaoFen-7 dataset for building extraction from satellite images | 33
COFACTOR Drammen dataset - 4 years of hourly energy use data from 45 public buildings in Drammen, Norway | 33
High resolution climate change observations and projections for the evaluation of heat-related extremes | 33
3DSC - a dataset of superconductors including crystal structures | 33
An ageing study of twenty 18650 lithium-ion Graphite/LFP cells in first and second life use | 33
An intra-annual 30-m dataset of small lakes of the Qilian Mountains for the period 1987–2020 | 33
Seven years of time-tracking data capturing collaboration and failure dynamics: the Gryzzly dataset | 33
Quantum chemical calculations of lithium-ion battery electrolyte and interphase species | 32
ISARIC-COVID-19 dataset: A Prospective, Standardized, Global Dataset of Patients Hospitalized with COVID-19 | 32
AVDOS-VR: Affective Video Database with Physiological Signals and Continuous Ratings Collected Remotely in VR | 32
Sentinel-3 Altimetry Thematic Products for Hydrology, Sea Ice and Land Ice | 32
An EEG Dataset of Neural Signatures in a Competitive Two-Player Game Encouraging Deceptive Behavior | 32
Three-dimensional chromatin architecture datasets for aging and Alzheimer’s disease | 32
ResOpsUS, a dataset of historical reservoir operations in the contiguous United States | 32
Characterization of hormone-producing cell types in the teleost pituitary gland using single-cell RNA-seq | 32
Discrete typing units of Trypanosoma cruzi: Geographical and biological distribution in the Americas | 32
Perovskite- and Dye-Sensitized Solar-Cell Device Databases Auto-generated Using ChemDataExtractor | 32
Global hydro-environmental lake characteristics at high spatial resolution | 31
Brightfield vs Fluorescent Staining Dataset–A Test Bed Image Set for Machine Learning based Virtual Staining | 31
A database of mapped global fishing activity 1950–2017 | 31
Chromosome-level genome assembly of the sap beetle Glischrochilus (Librodor) japonius (Coleoptera: Nitidulidae) | 31
Big data collection in pharmaceutical manufacturing and its use for product quality predictions | 31
ReaLSAT, a global dataset of reservoir and lake surface area variations | 31
A taxonomic, genetic and ecological data resource for the vascular plants of Britain and Ireland | 31
A hierarchical inventory of the world’s mountains for global comparative mountain science | 30
Bias-corrected NESM3 global dataset for dynamical downscaling under 1.5 °C and 2 °C global warming scenarios | 30
A dataset of eye gaze images for calibration-free eye tracking augmented reality headset | 30
NEON-SD: A 30-m Structural Diversity Product Derived from the NEON Discrete-Return LiDAR Point Cloud | 30
A clinical microscopy dataset to develop a deep learning diagnostic test for urinary tract infection | 30
A global dataset of terrestrial evapotranspiration and soil moisture dynamics from 1982 to 2020 | 30
Abrasivity database of different genetic rocks based on CERCHAR Abrasivity Test | 30
Elaboration of a new framework for fine-grained epidemiological annotation | 30
A high-quality genome assembly of the waterlily aphid Rhopalosiphum nymphaeae | 30
Global 1 km × 1 km gridded revised real gross domestic product and electricity consumption during 1992–2019 based on calibrated nighttime light data | 30
A dataset of hourly sea surface temperature from drifting buoys | 30
Bioclimatic indicators dataset for the orographically complex Canary Islands archipelago | 30
Library of rough hailstone backscattering coefficients at 2.8 GHz | 30
Environmental risks to humans, the first database of valence and arousal ratings for images of natural hazards | 29
Reconstructed SPECT images of 177Lu homogeneous cylindrical phantom used for calibration and texture analysis | 29
Measuring the presence and incidence of cholera in Hindustan: New data from primary sources for the colonial era | 29
Publisher Correction: Chromosome-scale assembly and high-density genetic map of the yellow drum, Nibea albiflora | 29
A multi-omics dataset of the response to early plant polysaccharide ingestion in rabbits | 29
Unfolding the downloads of datasets: A multifaceted exploration of influencing factors | 29
A global ocean dissolved organic phosphorus concentration database (DOPv2021) | 29
Interaction networks of Escherichia coli replication proteins under different bacterial growth conditions | 29
MeadoWatch: a long-term community-science database of wildflower phenology in Mount Rainier National Park | 29
A 14-year time series of marine megafauna bycatch in the Italian midwater pair trawl fishery | 29
Paired field and water measurements from drainage management practices in row-crop agriculture | 29