Scientific Data

Papers
(The TQCC of Scientific Data is 8. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-11-01 to 2024-11-01.)
Article | Citations
County-level CO2 emissions and sequestration in China during 1997–2017 | 523
MIMIC-IV, a freely accessible electronic health record dataset | 447
Dynamic World, Near real-time global 10 m land use land cover mapping | 336
Version 3 of the Global Aridity Index and Potential Evapotranspiration Database | 256
The World Checklist of Vascular Plants, a continuously updated resource for exploring global plant diversity | 241
A patient-centric dataset of images and metadata for identifying melanomas using clinical context | 202
MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification | 193
Highly accurate long-read HiFi sequencing data for five complex genomes | 185
The 10-m crop type maps in Northeast China during 2017–2019 | 177
Systematic phenotyping and characterization of the 5xFAD mouse model of Alzheimer’s disease | 174
NASA Global Daily Downscaled Projections, CMIP6 | 172
The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms | 168
The human O-GlcNAcome database and meta-analysis | 161
Multiscale dynamic human mobility flow dataset in the U.S. during the COVID-19 epidemic | 154
A global record of annual terrestrial Human Footprint dataset from 2000 to 2018 | 150
Global 1 km × 1 km gridded revised real gross domestic product and electricity consumption during 1992–2019 based on calibrated nighttime light data | 150
Data sharing practices and data availability upon request differ across scientific disciplines | 143
Operationalizing the CARE and FAIR Principles for Indigenous data futures | 140
Carbon Monitor, a near-real-time daily dataset of global CO2 emission from fossil fuel and cement production | 137
Introducing the FAIR Principles for research software | 136
National contributions to climate change due to historical emissions of carbon dioxide, methane, and nitrous oxide since 1850 | 128
COVID-CT-MD, COVID-19 computed tomography scan dataset applicable in machine learning and deep learning | 122
VinDr-CXR: An open dataset of chest X-rays with radiologist’s annotations | 120
Kvasir-Capsule, a video capsule endoscopy dataset | 113
Building a knowledge graph to enable precision medicine | 105
Gridded daily weather data for North America with comprehensive uncertainty quantification | 102
Bias-corrected CMIP6 global dataset for dynamical downscaling of the historical and future climate (1979–2100) | 99
AusTraits, a curated plant trait database for the Australian flora | 90
Chinese provincial multi-regional input-output database for 2012, 2015, and 2017 | 90
PERSIANN-CCS-CDR, a 3-hourly 0.04° global precipitation climate data record for heavy precipitation studies | 88
GEOM, energy-annotated molecular conformations for property prediction and molecular generation | 83
The Amsterdam Open MRI Collection, a set of multimodal MRI datasets for individual difference analyses | 82
Hourly potential evapotranspiration at 0.1° resolution for the global land surface from 1981-present | 82
A whole-body FDG-PET/CT Dataset with manually annotated Tumor Lesions | 79
An automatic multi-tissue human fetal brain segmentation benchmark using the Fetal Tissue Annotation Dataset | 76
COVIDiSTRESS Global Survey dataset on psychological and behavioural consequences of the COVID-19 outbreak | 75
CT-ORG, a new dataset for multiple organ segmentation in computed tomography | 73
Refractiveindex.info database of optical constants | 73
Global monthly gridded atmospheric carbon dioxide concentrations under the historical and future scenarios | 72
DISPERSE, a trait database to assess the dispersal potential of European aquatic macroinvertebrates | 72
Global trends and forecasts of breast cancer incidence and deaths | 70
GlobSnow v3.0 Northern Hemisphere snow water equivalent dataset | 70
Expanded dataset of mechanical properties and observed phases of multi-principal element alloys | 68
Global daily 1 km land surface precipitation based on cloud cover-informed downscaling | 67
FIVES: A Fundus Image Dataset for Artificial Intelligence based Vessel Segmentation | 66
Global soil moisture data derived through machine learning trained with in-situ measurements | 65
QM7-X, a comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules | 64
High-resolution (1 km) Köppen-Geiger maps for 1901–2099 based on constrained CMIP6 projections | 64
A SARS-CoV-2 cytopathicity dataset generated by high-content screening of a large drug repurposing collection | 64
Downscaling GRACE total water storage change using partial least squares regression | 62
Probabilistic atlas for the language network based on precision fMRI data from >800 individuals | 62
ClimateEU, scale-free climate normals, historical time series, and future projections for Europe | 62
Local sea level trends, accelerations and uncertainties over 1993–2019 | 61
Author Correction: The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data | 61
Gridded fossil CO2 emissions and related O2 combustion consistent with national inventories 1959–2018 | 60
A multi-site, multi-disorder resting-state magnetic resonance image database | 59
LCVP, The Leipzig catalogue of vascular plants, a new taxonomic reference list for all known vascular plants | 58
A long term global daily soil moisture dataset derived from AMSR-E and AMSR2 (2002–2019) | 57
An update on global mining land use | 57
Creation and validation of a chest X-ray dataset with eye-tracking and report dictation for AI development | 57
Vectorized rooftop area data for 90 cities in China | 57
A curated diverse molecular database of blood-brain barrier permeability with chemical descriptors | 56
Global gridded GDP data set consistent with the shared socioeconomic pathways | 56
A multi-modal open dataset for mental-disorder analysis | 56
Comprehensive dataset of shotgun metagenomes from oxygen stratified freshwater lakes and ponds | 55
OPTIMADE, an API for exchanging materials data | 55
Lower-limb kinematics and kinetics during continuously varying human locomotion | 55
A Global Building Occupant Behavior Database | 55
An open tool for creating battery-electric vehicle time series from empirical data, emobpy | 54
The “Narratives” fMRI dataset for evaluating models of naturalistic language comprehension | 54
VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients | 54
QMugs, quantum mechanical properties of drug-like molecules | 54
A band-gap database for semiconducting inorganic materials calculated with hybrid functional | 52
Global land projection based on plant functional types with a 1-km resolution under socio-climatic scenarios | 52
Electrochemical metrics for corrosion resistant alloys | 51
Global spatiotemporally continuous MODIS land surface temperature dataset | 51
The OceanDNA MAG catalog contains over 50,000 prokaryotic genomes originated from various marine environments | 50
LoDoPaB-CT, a benchmark dataset for low-dose computed tomography reconstruction | 50
Benchmarking second and third-generation sequencing platforms for microbial metagenomics | 50
Worldwide continuous gap-filled MODIS land surface temperature dataset | 50
Long-term and large-scale multispecies dataset tracking population changes of common European breeding birds | 49
Caravan - A global community dataset for large-sample hydrology | 49
Greenhouse gas emissions from municipal wastewater treatment facilities in China from 2006 to 2019 | 48
The United States COVID-19 Forecast Hub dataset | 48
MOFSimplify, machine learning models with extracted stability data of three thousand metal–organic frameworks | 48
Heidelberg colorectal data set for surgical data science in the sensor operating room | 48
Response2covid19, a dataset of governments’ responses to COVID-19 all around the world | 48
The Allen Ancient DNA Resource (AADR) a curated compendium of ancient human genomes | 47
Global offshore wind turbine dataset | 47
A new global dataset of bioclimatic indicators | 47
A global occurrence database of the Atlantic blue crab Callinectes sapidus | 47
China’s greenhouse gas emissions for cropping systems from 1978–2016 | 47
A global dataset for the projected impacts of climate change on four major crops | 47
A large, curated, open-source stroke neuroimaging dataset to improve lesion segmentation algorithms | 46
The University of Pennsylvania glioblastoma (UPenn-GBM) cohort: advanced MRI, clinical, genomics, & radiomics | 46
An integrated landscape of protein expression in human cancer | 46
Projecting 1 km-grid population distributions from 2020 to 2100 globally under shared socioeconomic pathways | 46
Quantum chemical benchmark databases of gold-standard dimer interaction energies | 46
A standardized catalogue of spectral indices to advance the use of remote sensing in Earth system research | 46
Global data on fertilizer use by crop and by country | 45
Segmentation of vestibular schwannoma from MRI, an open annotated dataset and baseline algorithm | 45
Global forest management data for 2015 at a 100 m resolution | 45
The normalised Sentinel-1 Global Backscatter Model, mapping Earth’s land surface with C-band microwaves | 45
Electronic healthcare records and external outcome data for hospitalized patients with heart failure | 44
An Indo-Pacific coral spawning database | 44
Human and economic impacts of natural disasters: can we trust the global data? | 44
A new vector-based global river network dataset accounting for variable drainage density | 43
Genetic variation among 481 diverse soybean accessions, inferred from genomic re-sequencing | 43
A taxonomic, genetic and ecological data resource for the vascular plants of Britain and Ireland | 43
GDIS, a global dataset of geocoded disaster locations | 42
16 years of topographic surveys of rip-channelled high-energy meso-macrotidal sandy beach | 42
In vivo human whole-brain Connectom diffusion MRI dataset at 760 µm isotropic resolution | 42
A multilevel carbon and water footprint dataset of food commodities | 42
Validation and refinement of cropland data layer using a spatial-temporal decision tree algorithm | 41
Global gridded crop harvested area, production, yield, and monthly physical area data circa 2015 | 41
p3k14c, a synthetic global database of archaeological radiocarbon dates | 41
A high-resolution climate simulation dataset for the past 540 million years | 41
A multimodal sensor dataset for continuous stress detection of nurses in a hospital | 41
A global 0.05° dataset for gross primary production of sunlit and shaded vegetation canopies from 1992 to 2020 | 40
High-resolution Digital Surface Model of the 2021 eruption deposit of Cumbre Vieja volcano, La Palma, Spain | 40
GloSEM: High-resolution global estimates of present and future soil displacement in croplands by water erosion | 40
A large-scale study on research code quality and execution | 40
PAPILA: Dataset with fundus images and clinical data of both eyes of the same patient for glaucoma assessment | 40
An improved daily standardized precipitation index dataset for mainland China from 1961 to 2018 | 40
The International Bathymetric Chart of the Southern Ocean Version 2 | 40
Emognition dataset: emotion recognition with self-reports, facial expressions, and physiology using wearables | 40
Quality control and removal of technical variation of NMR metabolic biomarker data in ~120,000 UK Biobank participants | 40
A gene expression atlas for different kinds of stress in the mouse brain | 39
Crop production and nitrogen use in European cropland and grassland 1961–2019 | 39
Maps of cropping patterns in China during 2015–2021 | 39
Lower limb kinematic, kinetic, and EMG data from young healthy humans during walking at controlled speeds | 39
Design and evaluation of a data anonymization pipeline to promote Open Science on COVID-19 | 38
Implementation of FAIR principles in the IPCC: the WGI AR6 Atlas repository | 37
SAVI, in silico generation of billions of easily synthesizable compounds through expert-system type rules | 37
SMAP-HydroBlocks, a 30-m satellite-based soil moisture dataset for the conterminous US | 37
An Open MRI Dataset For Multiscale Neuroscience | 36
SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials | 36
A curated dataset for data-driven turbulence modelling | 36
ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset | 36
A dataset of remote-sensed Forel-Ule Index for global inland waters during 2000–2018 | 36
Materials informatics platform with three dimensional structures, workflow and thermoelectric applications | 36
A map of the extent and year of detection of oil palm plantations in Indonesia, Malaysia and Thailand | 36
The Swiss data cube, analysis ready data archive using earth observations of Switzerland | 36
ISARIC-COVID-19 dataset: A Prospective, Standardized, Global Dataset of Patients Hospitalized with COVID-19 | 36
A high-spatial-resolution dataset of human thermal stress indices over South and East Asia | 36
A global dataset for crop production under conventional tillage and no tillage systems | 35
Global 1-km present and future hourly anthropogenic heat flux | 35
Monthly direct and indirect greenhouse gases emissions from household consumption in the major Japanese cities | 35
A computed tomography vertebral segmentation dataset with anatomical variations and multi-vendor scanner data | 35
Reef Cover, a coral reef classification for global habitat mapping from remote sensing | 35
Inflation of test accuracy due to data leakage in deep learning-based classification of OCT images | 35
A completely annotated whole slide image dataset of canine breast cancer to aid human breast cancer research | 34
Discrete typing units of Trypanosoma cruzi: Geographical and biological distribution in the Americas | 34
GLORIA - A globally representative hyperspectral in situ dataset for optical sensing of water quality | 34
NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature | 34
Open-access quantitative MRI data of the spinal cord and reproducibility across participants, sites and manufacturers | 34
GLOBathy, the global lakes bathymetry dataset | 34
A three-year dataset supporting research on building energy management and occupancy analytics | 34
Evaluating explainability for graph neural networks | 34
A multi-centre polyp detection and segmentation dataset for generalisability assessment | 33
Thinking out loud, an open-access EEG-based BCI dataset for inner speech recognition | 33
The Cuban Human Brain Mapping Project, a young and middle age population-based EEG, MRI, and cognition dataset | 33
Mapping 20 years of irrigated croplands in China using MODIS and statistics and existing irrigation products | 33
A hierarchical inventory of the world’s mountains for global comparative mountain science | 33
A harmonised, high-coverage, open dataset of solar photovoltaic installations in the UK | 33
A multimodal psychological, physiological and behavioural dataset for human emotions in driving tasks | 33
Peeking into a black box, the fairness and generalizability of a MIMIC-III benchmarking model | 33
A database framework for rapid screening of structure-function relationships in PFAS chemistry | 33
A global map of planting years of plantations | 33
VinDr-Mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography | 32
1 km land use/land cover change of China under comprehensive socioeconomic and climate scenarios for 2020–2100 | 32
A compilation of experimental data on the mechanical properties and microstructural features of Ti-alloys | 32
Solar and wind power data from the Chinese State Grid Renewable Energy Generation Forecasting Competition | 32
City- and county-level spatio-temporal energy consumption and efficiency datasets for China from 1997 to 2017 | 32
Accelerometer data collected with a minimum set of wearable sensors from subjects with Parkinson’s disease | 32
UWB-gestures, a public dataset of dynamic hand gestures acquired using impulse radar sensors | 32
Cric searchable image database as a public platform for conventional pap smear cytology data | 32
The global spectrum of plant form and function: enhanced species-level trait dataset | 32
The IDEAL household energy dataset, electricity, gas, contextual sensor data and survey data for 255 UK homes | 31
Dataset on electrical single-family house and heat pump load profiles in Germany | 31
A 21-year dataset (2000–2020) of gap-free global daily surface soil moisture at 1-km grid resolution | 31
A building height dataset across China in 2017 estimated by the spatially-informed approach | 31
Global data on earthworm abundance, biomass, diversity and corresponding environmental properties | 31
Chest imaging representing a COVID-19 positive rural U.S. population | 31
GazeBase, a large-scale, multi-stimulus, longitudinal eye movement dataset | 31
European primary forest database v2.0 | 31
A comprehensive spectral assay library to quantify the Escherichia coli proteome by DIA/SWATH-MS | 31
Energy audit and carbon footprint in trawl fisheries | 30
EU-Trees4F, a dataset on the future distribution of European tree species | 30
The near-global ocean mesoscale eddy atmospheric-oceanic-biological interaction observational dataset | 30
Human viral nucleic acids concentrations in wastewater solids from Central and Coastal California USA | 30
China’s environmental policy intensity for 1978–2019 | 30
CloudSEN12, a global dataset for semantic understanding of cloud and cloud shadow in Sentinel-2 | 30
A comprehensive database of active and potentially-active continental faults in Chile at 1:25,000 scale | 30
Wet-Bulb Globe Temperature, Universal Thermal Climate Index, and Other Heat Metrics for US Counties, 2000–2020 | 30
A global coral-bleaching database, 1980–2020 | 30
Annual dynamic dataset of global cropping intensity from 2001 to 2019 | 30
A statistics-based reconstruction of high-resolution global terrestrial climate for the last 800,000 years | 29
Updating global urbanization projections under the Shared Socioeconomic Pathways | 29
LepTraits 1.0 A globally comprehensive dataset of butterfly traits | 29
The short-term mortality fluctuation data series, monitoring mortality shocks across time and space | 29
A long-term reconstructed TROPOMI solar-induced fluorescence dataset using machine learning algorithms | 29
A 120,000-year long climate record from a NW-Greenland deep ice core at ultra-high resolution | 29
China’s provincial process CO2 emissions from cement production during 1993–2019 | 29
EUBUCCO v0.1: European building stock characteristics in a common and open database for 200+ million individual buildings | 29
Global seasonal Sentinel-1 interferometric coherence and backscatter data set | 29
Global hydro-environmental lake characteristics at high spatial resolution | 29
The two decades brainclinics research archive for insights in neurophysiology (TDBRAIN) database | 29
Dataset of solution-based inorganic materials synthesis procedures extracted from the scientific literature | 29
A spatially-explicit harmonized global dataset of critical infrastructure | 28
A global dataset for prevalence of Salmonella Gallinarum between 1945 and 2021 | 28
Perovskite- and Dye-Sensitized Solar-Cell Device Databases Auto-generated Using ChemDataExtractor | 28
Text-mined dataset of gold nanoparticle synthesis procedures, morphologies, and size entities | 28
A news-based climate policy uncertainty index for China | 28
Development of a Flame Retardant and an Organohalogen Flame Retardant Chemical Inventory | 28
A Global Database of Soil Plant Available Phosphorus | 28
HIT-UAV: A high-altitude infrared thermal dataset for Unmanned Aerial Vehicle-based object detection | 28
Dataset on SARS-CoV-2 non-pharmaceutical interventions in Brazilian municipalities | 28
Carbon Monitor Cities near-real-time daily estimates of CO2 emissions from 1500 cities worldwide | 28
Movement-related artefacts (MR-ART) dataset of matched motion-corrupted and clean structural MRI brain scans | 27
Cydrasil 3, a curated 16S rRNA gene reference package and web app for cyanobacterial phylogenetic placement | 27
Genome-wide association analysis of type 2 diabetes in the EPIC-InterAct study | 27
High accuracy barrier heights, enthalpies, and rate coefficients for chemical reactions | 27
Community-curated and standardised metadata of published ancient metagenomic samples with AncientMetagenomeDir | 27
High-resolution surface faulting from the 1983 Idaho Lost River Fault Mw 6.9 earthquake and previous events | 27
Nelumbo genome database, an integrative resource for gene expression and variants of Nelumbo nucifera | 27
Fault2SHA Central Apennines database and structuring active fault data for seismic hazard assessment | 27
A comprehensive LFQ benchmark dataset on modern day acquisition strategies in proteomics | 27
EEG Dataset for RSVP and P300 Speller Brain-Computer Interfaces | 27
Transition1x - a dataset for building generalizable reactive machine learning potentials | 27
An fMRI dataset in response to “The Grand Budapest Hotel”, a socially-rich, naturalistic movie | 27
EORNA, a barley gene and transcript abundance database | 26
Database of ab initio L-edge X-ray absorption near edge structure | 26
Synthetic skull bone defects for automatic patient-specific craniofacial implant design | 26
Time series of useful energy consumption patterns for energy system modeling | 26
The SUSTech-SYSU dataset for automated exudate detection and diabetic retinopathy grading | 26
An improved global vegetation health index dataset in detecting vegetation drought | 26
Human EEG recordings for 1,854 concepts presented in rapid serial visual presentation streams | 26
A guide to sharing open healthcare data under the General Data Protection Regulation | 26
Global Dam Tracker: A database of more than 35,000 dams with location, catchment, and attribute information | 26
AnimalTraits - a curated animal trait database for body mass, metabolic rate and brain size | 26
In vivo high-resolution structural MRI-based atlas of human thalamic nuclei | 26
QUaternary fault strain INdicators database - QUIN 1.0 - first release from the Apennines of central Italy | 26
A global dataset of seaweed net primary productivity | 26
Comprehensive ultrahigh resolution whole brain in vivo MRI dataset as a human phantom | 25
The FAIR Cookbook - the essential resource for and by FAIR doers | 25
A multi-scale time-series dataset with benchmark for machine learning in decarbonized energy grids | 25
Developing a large-scale dataset of flood fatalities for territories in the Euro-Mediterranean region, FFEM-DB | 25
Author Correction: MIMIC-IV, a freely accessible electronic health record dataset | 25
A Large-scale Synthetic Pathological Dataset for Deep Learning-enabled Segmentation of Breast Cancer | 25