Scientific Data

Papers
(The H4-Index of Scientific Data is 73. The table below lists the papers at or above that threshold, based on CrossRef citation counts (maximum 250 papers). It covers papers published in the past four years, i.e., from 2021-11-01 to 2025-11-01.)
Article | Citations
Author Correction: The Plegma dataset: Domestic appliance-level and aggregate electricity demand with metadata from Greece | 1567
Author Correction: Mobility networks in Greater Mexico City | 650
A Synthetic Dataset for Semantic Segmentation of Waterbodies in Out-of-Distribution Situations | 634
A database of seed plants on taxonomy, geography and ecology in the Qinling-Daba Mountains and adjacent areas | 553
Identifying Cocoa Flower Visitors: A Deep Learning Dataset | 399
Tsunami Runup Survey Data From The Taan Fjord Landslide Event | 382
Multi-proteomics and interactome dataset of tick-borne encephalitis virus infected host cells | 373
Chromosome-level genome assembly of Oriental chestnut gall wasp (Dryocosmus kuriphilus) | 373
An agenda for addressing bias in conflict data | 346
Linking Research Data with Physically Preserved Research Materials in Chemistry | 345
Chromosome-level genome assembly of the Rhizoctonia solani | 281
A database of steric and electronic properties of heteroaryl substituents | 279
SDUST2023GRA_MSS: the new global marine gravity anomaly model determined from mean sea surface model | 240
Full Field Digital Mammography Dataset from a Population Screening Program | 222
Occurrence of human infection with Salmonella Typhi in sub-Saharan Africa | 202
A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset | 187
T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus | 178
Author Correction: Whales from space dataset, an annotated satellite image dataset of whales for training machine learning models | 175
EEG Dataset for the Recognition of Different Emotions Induced in Voice-User Interaction | 170
Quantum computing dataset of maximum independent set problem on king lattice of over hundred Rydberg atoms | 169
CreelCat, a Catalog of United States Inland Creel and Angler Survey Data | 154
Making Mathematical Research Data FAIR: Pathways to Improved Data Sharing | 149
In toto light sheet fluorescence microscopy live imaging datasets of Ceratitis capitata embryonic development | 136
OPERAnet, a multimodal activity recognition dataset acquired from radio frequency and vision-based sensors | 135
A dataset of scientific dates from archaeological sites in eastern Africa spanning 5000 BCE to 1800 CE | 131
A dataset of the daily edge of each polynya in the Antarctic | 131
Enhancing radiomics and Deep Learning systems through the standardization of medical imaging workflows | 130
Home monitoring with connected mobile devices for asthma attack prediction with machine learning | 119
Hydrological model-based streamflow reconstruction for Indian sub-continental river basins, 1951–2021 | 116
Generating FAIR research data in experimental tribology | 115
China’s provincial process CO2 emissions from cement production during 1993–2019 | 113
Exploring the electrophysiology of Parkinson’s disease with magnetoencephalography and deep brain recordings | 113
An open dataset for oracle bone character recognition and decipherment | 113
A large-scale multi-label 12-lead electrocardiogram database with standardized diagnostic statements | 113
A large EEG dataset for studying cross-session variability in motor imagery brain-computer interface | 111
RailFOD23: A dataset for foreign object detection on railroad transmission lines | 111
Ensemble of CMIP6 derived reference and potential evapotranspiration with radiative and advective components | 110
What’s the TEE: Metrics of Temperature Extremes in Europe NUTS Regions (1980-2024) | 108
Mediterranean marine sediment cores database: unlocking paleoclimatic signals for the last 20,000 years | 107
Dataset on the effects of psychological care on depression and suicide ideation in underrepresented children | 106
A haplotype-resolved chromosomal-level genome assembly of Oxalis articulata | 105
Near-complete reference genome assembly of Hoya carnosa | 104
A semantic approach to mapping the Provenance Ontology to Basic Formal Ontology | 104
A Simulated Comprehensive Photon Flux Shielding Spectra Dataset for Advanced Radiation Safety Assessment | 101
A Field-Level Asset Mapping Dataset for England’s Agricultural Sector | 101
F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems | 100
A construction waste landfill dataset of two districts in Beijing, China from high resolution satellite images | 98
Chromosome-level genome assembly of rock carp (Procypris rabaudi) | 95
Empowering open data sharing for social good: a privacy-aware approach | 95
Chromosome-level genome assembly of the traditional medicinal plant Lindera aggregata | 94
Enrichment of lung cancer computed tomography collections with AI-derived annotations | 94
FIGARO-E3: a high-resolution extended multi-regional input-output database consistent with official statistics | 93
Chromosome-level assemblies of cultivated water chestnut Trapa bicornis and its wild relative Trapa incisa | 93
A focus groups study on data sharing and research data management | 92
PAVC: The foundation for a Pan-Arctic Vegetation Cover database | 92
The first high-quality chromosome-level genome of Parupeneus biaculeatus using HiFi and Hi-C data | 90
A chromosome-scale assembly of Ormosia boluoensis (Fabaceae) | 89
Very High Resolution Projections over Italy under different CMIP5 IPCC scenarios | 85
Unveiling the Spatiotemporal Dynamics of Global Brain Circulation: A Comprehensive Corpus (2000–2024) | 85
Sea ice records over more than a century at an observatory facing the Okhotsk coast of Hokkaido, Japan | 84
Students’ performance dataset for using machine learning technique in physics education research | 81
A global 1 km resolution daily surface longwave radiation product from MODIS satellite data from 2000–2023 | 81
A Frontal Ablation Dataset for 49 Tidewater Glaciers in Greenland | 81
MarNemaFunDiv: a first comprehensive dataset of functional traits for marine nematodes | 79
Slovak database of speech affected by neurodegenerative diseases | 78
The Superfund Research Program Analytics Portal: linking environmental chemical exposure to biological phenotypes | 77
Global Ocean Particulate Organic Phosphorus, Carbon, Oxygen for Respiration, and Nitrogen (GO-POPCORN) | 77
ML-extendable framework for multiphysics-multiscale simulation workflow and data management using Kadi4Mat | 77
Canopy height model and NAIP imagery pairs across CONUS | 75
Chromosome-level haplotype-resolved genome assembly of bread wheat’s wild relative Aegilops mutica | 74
A thermosurvey dataset: Older adults’ experiences and adaptation to urban heat and climate change | 74
Molecular landscape of respiratory infection: A large-scale, multi-centre blood transcriptome dataset | 74
Multi-Domain Indoor Dataset for Visual Place Recognition and Anomaly Detection by Mobile Robots | 74
An open-access database of nature-based carbon offset project boundaries | 73
Whole genome and exome sequencing reference datasets from a multi-center and cross-platform benchmark study | 73
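The threshold used for this list can be reproduced with a short sketch, assuming the H4-Index is the standard h-index computed over papers published in the four-year window: the largest h such that h papers each have at least h citations. The function names `h_index` and `papers_at_threshold` are hypothetical helpers for illustration, not part of any tool behind this page.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each.

    Assumes the H4-Index is the standard h-index over a four-year window.
    """
    # Sort counts in descending order so position `rank` holds the rank-th most cited paper.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            # Counts only decrease from here while rank increases, so stop.
            break
    return h


def papers_at_threshold(papers: dict[str, int], h: int) -> list[tuple[str, int]]:
    """Return (title, citations) pairs with at least h citations, most cited first."""
    selected = [(title, n) for title, n in papers.items() if n >= h]
    return sorted(selected, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy counts, not taken from the table above.
    toy = {"Paper A": 10, "Paper B": 8, "Paper C": 5, "Paper D": 4, "Paper E": 3}
    h = h_index(list(toy.values()))
    print(h)  # 4
    print(papers_at_threshold(toy, h))
```

Fed the 75 citation counts from the table, `h_index` returns 73, matching the stated H4-Index; adding papers with fewer than 73 citations would not change that result, since each would sit at a rank above 75 with a count below its rank.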