Scientific Data

Papers
(The median citation count of Scientific Data is 3. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-01-01 to 2026-01-01.)
Article	Citations
Author Correction: The Plegma dataset: Domestic appliance-level and aggregate electricity demand with metadata from Greece	1794
Author Correction: Mobility networks in Greater Mexico City	711
A database of seed plants on taxonomy, geography and ecology in the Qinling-Daba Mountains and adjacent areas	706
Tsunami Runup Survey Data From The Taan Fjord Landslide Event	623
Multi-proteomics and interactome dataset of tick-borne encephalitis virus infected host cells	451
Linking Research Data with Physically Preserved Research Materials in Chemistry	447
Chromosome-level genome assembly of the Rhizoctonia solani	436
GARD-LENS: A downscaled large ensemble dataset for understanding future climate and its uncertainties	426
Shotgun metagenomes from productive lakes in an urban region of Sweden	375
Author Correction: GERDA: The German Election Database	373
The interplay between brain and behavior during development: A multisite effort to generate and share simulated datasets	313
Bioclimatic atlas of the terrestrial Arctic	308
Occurrence of human infection with Salmonella Typhi in sub-Saharan Africa	257
CreelCat, a Catalog of United States Inland Creel and Angler Survey Data	251
In toto light sheet fluorescence microscopy live imaging datasets of Ceratitis capitata embryonic development	250
A dataset of scientific dates from archaeological sites in eastern Africa spanning 5000 BCE to 1800 CE	238
What’s the TEE: Metrics of Temperature Extremes in Europe NUTS Regions (1980-2024)	201
Mediterranean marine sediment cores database: unlocking paleoclimatic signals for the last 20,000 years	183
Near-complete reference genome assembly of Hoya carnosa	178
Dataset on the effects of psychological care on depression and suicide ideation in underrepresented children	178
A Field-Level Asset Mapping Dataset for England’s Agricultural Sector	165
A Simulated Comprehensive Photon Flux Shielding Spectra Dataset for Advanced Radiation Safety Assessment	157
Empowering open data sharing for social good: a privacy-aware approach	156
Chromosome-level assemblies of cultivated water chestnut Trapa bicornis and its wild relative Trapa incisa	151
Enrichment of lung cancer computed tomography collections with AI-derived annotations	151
The first high-quality chromosome-level genome of Parupeneus biaculeatus using HiFi and Hi-C data	149
A chromosome-scale assembly of Ormosia boluoensis (Fabaceae)	133
Author Correction: Database covering the prayer movements which were not available previously	128
A thermosurvey dataset: Older adults’ experiences and adaptation to urban heat and climate change	128
Generating FAIR research data in experimental tribology	126
A curated dataset of great ape genome diversity	126
Global Ocean Particulate Organic Phosphorus, Carbon, Oxygen for Respiration, and Nitrogen (GO-POPCORN)	125
An open dataset for oracle bone character recognition and decipherment	124
Molecular landscape of respiratory infection: A large-scale, multi-centre blood transcriptome dataset	123
Chromosome-level genome assembly of rock carp (Procypris rabaudi)	122
A Frontal Ablation Dataset for 49 Tidewater Glaciers in Greenland	119
PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery	119
ML-extendable framework for multiphysics-multiscale simulation workflow and data management using Kadi4Mat	117
Students’ performance dataset for using machine learning technique in physics education research	117
Multi-Domain Indoor Dataset for Visual Place Recognition and Anomaly Detection by Mobile Robots	116
An open-access database of nature-based carbon offset project boundaries	114
Author Correction: Whales from space dataset, an annotated satellite image dataset of whales for training machine learning models	111
Statistical performance indicators and index—a new tool to measure country statistical capacity	111
SDUST2023GRA_MSS: the new global marine gravity anomaly model determined from mean sea surface model	107
Canopy height model and NAIP imagery pairs across CONUS	106
A longitudinal cross-country dataset on agricultural productivity and welfare in Sub-Saharan Africa	106
A global dataset of fossil fungi records from the Cenozoic	105
The Latin American Legislators Dataset	105
A database of steric and electronic properties of heteroaryl substituents	104
Spatio-temporal dataset (2009–2012) of Culicoides spp., vectors of livestock viruses, in France	101
One-year high-frequency environmental and behavioral data from ALAN experience in a French coastal area	101
The Carbon Catalogue, carbon footprints of 866 commercial products from 8 industry sectors and 5 continents	101
Pennsieve: A Collaborative Platform for Translational Neuroscience and Beyond	100
A haplotype-resolved chromosomal-level genome assembly of Oxalis articulata	100
F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems	97
Home monitoring with connected mobile devices for asthma attack prediction with machine learning	94
Slovak database of speech affected by neurodegenerative diseases	93
Multimodal Data for the Detection of Freezing of Gait in Parkinson’s Disease	93
A focus groups study on data sharing and research data management	92
A semantic approach to mapping the Provenance Ontology to Basic Formal Ontology	89
The Superfund Research Program Analytics Portal: linking environmental chemical exposure to biological phenotypes	89
An 8-model ensemble of CMIP6-derived ocean surface wave climate	87
A dataset of the daily edge of each polynya in the Antarctic	86
Ultra-deep sequencing data from a liquid biopsy proficiency study demonstrating analytic validity	85
Optimizing drug combination and mechanism analysis based on risk pathway crosstalk in pan cancer	84
EEG Dataset for the Recognition of Different Emotions Induced in Voice-User Interaction	82
A century-long eddy-resolving simulation of global oceanic large- and mesoscale state	82
A VibV Dataset Integrating Vibration and Vision for Enhanced Safety in Self-Driving Tasks	82
T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus	81
Quantum computing dataset of maximum independent set problem on king lattice of over hundred Rydberg atoms	81
NeuMa - the absolute Neuromarketing dataset en route to an holistic understanding of consumer behaviour	81
Machine learning-ready remote sensing data for Maya archaeology	81
Reinterpretation of prostate cancer pathology by Appl1, Sortilin and Syndecan-1 biomarkers	81
A daily high-resolution (1 km) human thermal index collection over the North China Plain from 2003 to 2020	81
An Enhanced Phenology Dataset for Global Drylands from 2001 to 2019	81
M3OT: A Multi-Drone Multi-Modality dataset for Multi-Object Tracking	80
Reconstructing high-quality ground-level ozone records from 1980 to 2012 in central and eastern China	79
FIGARO-E3: a high-resolution extended multi-regional input-output database consistent with official statistics	78
A Synthetic Dataset for Semantic Segmentation of Waterbodies in Out-of-Distribution Situations	78
MarNemaFunDiv: a first comprehensive dataset of functional traits for marine nematodes	78
Spatial and temporal data to study residential heat decarbonisation pathways in England and Wales	76
OPERAnet, a multimodal activity recognition dataset acquired from radio frequency and vision-based sensors	76
A western United States snow reanalysis dataset over the Landsat era from water years 1985 to 2021	76
Chromosome-level haplotype-resolved genome assembly of bread wheat’s wild relative Aegilops mutica	75
Near complete T2T genome assembly of the banded goonch (Bagarius rutilus)	75
Sea ice records over more than a century at an observatory facing the Okhotsk coast of Hokkaido, Japan	75
Identifying Cocoa Flower Visitors: A Deep Learning Dataset	74
Dataset on heavy metal pollution assessment in freshwater ecosystems	74
The R package for DICOM to brain imaging data structure conversion	72
Coswara: A respiratory sounds and symptoms dataset for remote screening of SARS-CoV-2 infection	72
PAVC: The foundation for a Pan-Arctic Vegetation Cover database	72
Dynamic urban morphology mapping in Chinese cities based on local climate zone approach	71
Ensemble of CMIP6 derived reference and potential evapotranspiration with radiative and advective components	71
A large-scale dataset of patient summaries for retrieval-based clinical decision support systems	71
Unveiling the Spatiotemporal Dynamics of Global Brain Circulation: A Comprehensive Corpus (2000–2024)	70
A global 1 km resolution daily surface longwave radiation product from MODIS satellite data from 2000–2023	70
Hydrological model-based streamflow reconstruction for Indian sub-continental river basins, 1951–2021	70
Globe-LFMC 2.0, an enhanced and updated dataset for live fuel moisture content research	70
A multilayered urban tree dataset of point clouds, quantitative structure and graph models	70
Chromosome-level genome assembly of the traditional medicinal plant Lindera aggregata	70
A comprehensive genomic and transcriptomic dataset of triple-negative breast cancers	69
Head model dataset for mixed reality navigation in neurosurgical interventions for intracranial lesions	68
An agenda for addressing bias in conflict data	67
A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland	67
Paired magnetic susceptibility and geochemistry of young volcanism in Iceland and Tengchong, China	65
District-scale surface temperatures generated from high-resolution longitudinal thermal infrared images	65
China’s provincial process CO2 emissions from cement production during 1993–2019	65
Chromosome-level genome assembly of Oriental chestnut gall wasp (Dryocosmus kuriphilus)	64
BUS-UCLM: Breast ultrasound lesion segmentation dataset	63
The landscape of abiotic and biotic stress-responsive splice variants with deep RNA-seq datasets in hot pepper	62
A large EEG dataset for studying cross-session variability in motor imagery brain-computer interface	61
Full Field Digital Mammography Dataset from a Population Screening Program	61
Enhancing radiomics and Deep Learning systems through the standardization of medical imaging workflows	61
A Global Database of Soil Plant Available Phosphorus	60
RailFOD23: A dataset for foreign object detection on railroad transmission lines	60
A comprehensive dataset of riverine levee overtopping events for advancing risk assessment	59
QMugs, quantum mechanical properties of drug-like molecules	59
A dataset for deep learning based detection of printed circuit board surface defect	58
A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset	58
A construction waste landfill dataset of two districts in Beijing, China from high resolution satellite images	58
Very High Resolution Projections over Italy under different CMIP5 IPCC scenarios	58
Unified access to up-to-date residue-level annotations from UniProtKB and other biological databases for PDB data	58
A large-scale dataset of pre- and postsurgical MRI data from patients with chronic trigeminal neuralgia	58
Making Mathematical Research Data FAIR: Pathways to Improved Data Sharing	57
Global monthly gridded atmospheric carbon dioxide concentrations under the historical and future scenarios	57
Scaling up SoccerNet with multi-view spatial localization and re-identification	57
A large-scale multi-label 12-lead electrocardiogram database with standardized diagnostic statements	57
Analysis of AlphaMissense data in different protein groups and structural context	56
Measurement of ship-generated waves in German coastal waterways from 1998–2022	56
Exploring the electrophysiology of Parkinson’s disease with magnetoencephalography and deep brain recordings	56
A neuroimaging dataset during sequential color qualia similarity judgments with and without reports	56
Constructing a global human epidemic database using open-source digital biosurveillance	56
Borrelia PeptideAtlas: A proteome resource of common Borrelia burgdorferi isolates for Lyme research	55
Author Correction: Geographical characterisation of British urban form and function using the spatial signatures framework	55
Confocal imaging dataset to assess endothelial cell orientation during extreme glucose conditions	55
The HAInich: A multidisciplinary vision data-set for a better understanding of the forest ecosystem	55
24-hour average PM2.5 concentration caused by aircraft in Chinese airports from Jan. 2006 to Dec. 2023	55
A tree-based corpus annotated with Cyber-Syndrome, symptoms, and acupoints	55
RecyBat24: a dataset for detecting lithium-ion batteries in electronic waste disposal	54
Seven years of time-tracking data capturing collaboration and failure dynamics: the Gryzzly dataset	54
A chromosomal-level genome assembly of Odontolabis cuvera Hope, 1842 (Coleoptera: Lucanidae)	54
A corpus and a modular infrastructure for the empirical study of (an)notated music	54
A curated bacterial and archaeal 16S rRNA Gene Oral Sequences dataset	54
The draft genome sequences of the cosmopolitan centric diatom, the genus Skeletonema	54
A database of in situ water temperatures for large inland lakes across the coterminous United States	53
Innovative molecular networking analysis of steroids and characterisation of the urinary steroidome	53
Haplotype-resolved chromosome-level genome assembly of Ehretia macrophylla	53
Coral community data Heron Island Great Barrier Reef 1962–2016	53
A chromosome-scale reference genome of grasspea (Lathyrus sativus)	53
Machine learning training data: over 500,000 images of butterflies and moths (Lepidoptera) with species labels	53
Publisher Correction: Chromosome-level genome assembly and annotation of xerophyte secretohalophyte Reaumuria soongarica	52
DiTEC-WDN: A Large-Scale Dataset of Hydraulic Scenarios across Multiple Water Distribution Networks	52
Hong Kong Corpus of Chinese Sentence and Passage Reading	52
Chromosome-scale genome assembly and annotation of Xenocypris argentea	52
A new high-resolution global topographic factor dataset calculated based on SRTM	52
A panel sequencing dataset of peripheral blood gene variations in pan-cancer	51
De novo transcriptome analysis of Perna perna L. (Bivalve) with functional and metabolic pathway analysis	51
Global nature run data with realistic high-resolution carbon weather for the year of the Paris Agreement	51
A soil database from Queretaro, Mexico for assessment of crop and irrigation water requirements	51
A global dataset on mungbean for managing seed yield and quality	51
A Multidisciplinary Multimodal Aligned Dataset for Academic Data Processing	50
Harmonized Database of Western U.S. Water Rights (HarDWR) v.1	50
Renji endoscopic submucosal dissection video data set for colorectal neoplastic lesions	50
A dataset of riverine nitrogen yield across watersheds in the Conterminous United States	50
A chromosome-level genome assembly of skipjack tuna, Katsuwonus pelamis (Perciformes: Scombridae)	50
Whole genome sequencing and structural variations provide insights into the body size traits of Hu sheep	50
OSMlanduse a dataset of European Union land use at 10 m resolution derived from OpenStreetMap and Sentinel-2	50
Detection of differential bait proteoforms through immunoprecipitation-mass spectrometry data analysis	50
A biologging database of juvenile white sharks from the northeast Pacific	50
Data scheme and data format for transferable force fields for molecular simulation	50
Transcriptome profiling of mRNA and lncRNA involved in wax biosynthesis in cauliflower	49
Multiorder hydrologic Position for Europe — a Set of Features for Machine Learning and Analysis in Hydrology	49
Curated CYP450 Interaction Dataset: Covering the Majority of Phase I Drug Metabolism	49
Haplotype-resolved T2T genome assembly of the pear cultivar ‘Danxiahong’	49
A benchmark database of ten years of prospective next-day earthquake forecasts in California from the Collaboratory for the Study of Earthquake Predictability	49
Soil carbon stock densities in mangrove and forested wetland ecosystems of Panama	49
An EEG Dataset of Neural Signatures in a Competitive Two-Player Game Encouraging Deceptive Behavior	48
Psilocybin’s acute and persistent brain effects: a precision imaging drug trial	48
3DSC - a dataset of superconductors including crystal structures	48
An Experimental and Clinical Physiological Signal Dataset for Automated Pain Recognition	48
The RESILIENT Dataset: Multimodal Monitoring of Ageing-Related Comorbidities and Cognitive Decline	48
Impact factors for quantifying country-level terrestrial biodiversity intactness footprints (IBIF)	48
GriddingMachine, a database and software for Earth system modeling at global and regional scales	48
A near-complete chromosome-level genome assembly of looseleaf lettuce (Lactuca sativa var. crispa)	47
A global dataset on species occurrences and functional traits of Schizothoracinae fish	47
A multimodal dataset for coronary microvascular disease biomarker discovery	47
Acting Emotions: a comprehensive dataset of elicited emotions	47
A global dataset for steel aluminum and cement in-use stocks at 500 m gridded level 2000-2019	46
High-resolution ethograms, accelerometer recordings, and behavioral time series of Japanese quail	46
Chromosome-level genome assembly of the Tyrrhenian tree frog (Hyla sarda)	45
Genome Skimming Reveals Genetic Diversity in 220 Papaver Individuals from China	45
Assessing temporal dynamics of nitrogen surplus in Indian agriculture: district scale data from 1966 to 2017	45
Improved high quality sand fly assemblies enabled by ultra low input long read sequencing	45
Chromosome-level genome assembly and annotation of the Yunling cattle with PacBio and Hi-C sequencing data	45
A Chinese Face Dataset with Dynamic Expressions and Diverse Ages Synthesized by Deep Learning	45
Metagenomic sequencing and reconstruction of 82 microbial genomes from barley seed communities	45
Three-dimensional chromatin architecture datasets for aging and Alzheimer’s disease	45
Non-coding RNA profiling in BRAFV600E-mutant cutaneous melanoma before and after Spry1 depletion	45
AHAD: African major crops harvested area dataset for the years of 2000, 2010, and 2020	45
The genome assembly and annotation of the cricket Gryllus longicercus	45
A pseudoproxy emulation of the PAGES 2k database using a hierarchy of proxy system models	45
Gap-free 16-year (2005–2020) sub-diurnal surface meteorological observations across Florida	44
A database of mapped global fishing activity 1950–2017	44
MCV-Intention: A Multimodalities and Cross-View Dataset for Human Assembly Intention Recognition	44
Surrounding road density of child care centers in Australia	44
Brightfield vs Fluorescent Staining Dataset–A Test Bed Image Set for Machine Learning based Virtual Staining	44
An integrated multi-source dataset of elasmobranchs in the Red Sea following the Red Sea Decade Expedition	44
Observing the Central Arctic Atmosphere and Surface with University of Colorado uncrewed aircraft systems	44
Global Bias-Corrected CORDEX Datasets at Half Degree Resolution	44
A multi-model based dataset of global atmospheric moisture source-sink relationships and atmospheric basins	43
An East Antarctic, sub-annual resolution water isotope record from the Mount Brown South Ice core	43
Dataset on child vaccination in Brazil from 1996 to 2021	43
Accumulation-depuration data collection in support of toxicokinetic modelling	43
Cognitive tasks, anatomical MRI, and functional MRI data evaluating the construct of self-regulation	43
A Multi-Omics Dataset of Prostate Cancer Response to Oncolytic Virus OH2 Treatment	43
A haplotype-resolved genome assembly of Anoectochilus roxburghii	42
MiTra: A Drone-Based Trajectory Data for an All-Traffic-State Inclusive Freeway with Ramps	42
VME: A Satellite Imagery Dataset and Benchmark for Detecting Vehicles in the Middle East and Beyond	42
Big data collection in pharmaceutical manufacturing and its use for product quality predictions	42
Global inventory of species categorized by known underwater sonifery	42
Chromosome-level genome assembly of Odontothrips loti Haliday (Thysanoptera: Thripidae)	42
Ontology for the Avida digital evolution platform	42
Mapping Road Surface Type of Kenya Using OpenStreetMap and High-resolution Google Satellite Imagery	42
SignEEG v1.0: Multimodal Dataset with Electroencephalography and Hand-written Signature for Biometric Systems	42
Bias-corrected NESM3 global dataset for dynamical downscaling under 1.5 °C and 2 °C global warming scenarios	42
A geospatial database of close-to-reality travel times to obstetric emergency care in 15 Nigerian conurbations	41
An RNA-seq time series of the medaka pituitary gland during sexual maturation	41
Sm-Nd Isotope Data Compilation from Geoscientific Literature Using an Automated Tabular Extraction Method	41
Comprehensive curation and validation of genomic datasets for chestnut	41
A compendium of temperature and salinity profiles and discrete nutrients from selected NOAA programs in Alaska	41
An Observation-Based Dataset of Global Sub-Daily Precipitation Indices (GSDR-I)	41
Distribution of soil macrofauna across different habitats in the Eastern European Alps	40
PTB-XL+, a comprehensive electrocardiographic feature dataset	40
BIRAFFE2, a multimodal dataset for emotion-based personalization in rich affective game environments	40
STInt: An integrated dataset covering science, technology and industry information in the pharmaceutical field	40
Datasets for characterizing extreme events relevant to hydrologic design over the conterminous United States	40
ROBIN: Reference observatory of basins for international hydrological climate change detection	40
Analysis of metabolic dynamics during drought stress in Arabidopsis plants	40
Wheel-Mounted Inertial Datasets	39
A benchmark GaoFen-7 dataset for building extraction from satellite images	39
A Biomechanical Dataset of 1,798 Healthy and Injured Subjects During Treadmill Walking and Running	39
Spatial transcriptome profiling of normal human liver	39
The W2024 database of the water isotopologue $\mathrm{H}_{2}^{16}\mathrm{O}$	39
Endoscapes, a critical view of safety and surgical scene segmentation dataset for laparoscopic cholecystectomy	39
3D motion analysis dataset of healthy young adult volunteers walking and running on overground and treadmill	39
Proteomic Dataset of Sparganum proliferum and Spirometra mansoni to Understand Asexual Proliferation in Hosts	39
Underground well water level observation grid dataset from 2005 to 2022	39
A multi-site, multi-modal travelling-heads resource for brain MRI harmonisation	39
Molecular structural dataset of lignin macromolecule elucidating experimental structural compositions	39
Georectified polygon database of ground-mounted large-scale solar photovoltaic sites in the United States	38