Scientific Data

Papers
(The TQCC of Scientific Data is 9. The table below lists the papers whose CrossRef citation counts are above that threshold, sorted by citation count, up to a maximum of 250 papers. It covers papers published in the past four years, i.e., from 2020-02-01 to 2024-02-01.)
Article	Citations
Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset	1890
The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data	585
China CO2 emission accounts 2016–2017	500
County-level CO2 emissions and sequestration in China during 1997–2017	399
A cross-country database of COVID-19 testing	323
PTB-XL, a large publicly available electrocardiography dataset	309
High resolution temporal profiles in the Emissions Database for Global Atmospheric Research	281
Epidemiological data from the COVID-19 outbreak, real-time case information	246
A harmonized global nighttime light dataset 1992–2018	220
Provincial and gridded population projection for China under shared socioeconomic pathways from 2010 to 2100	187
Dynamic World, Near real-time global 10 m land use land cover mapping	187
HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy	186
COVID-19 outbreak response, a dataset to assess mobility changes in Italy following national lockdown	184
Materials Cloud, a platform for open computational science	174
Holocene global mean surface temperature, a multi-method reconstruction approach	171
A 12-lead electrocardiogram database for arrhythmia research covering more than 10,000 patients	158
The World Checklist of Vascular Plants, a continuously updated resource for exploring global plant diversity	157
InvaCost, a public database of the economic costs of biological invasions worldwide	154
MIMIC-IV, a freely accessible electronic health record dataset	150
Harmonized global maps of above and belowground biomass carbon density in the year 2010	148
A patient-centric dataset of images and metadata for identifying melanomas using clinical context	148
Bias-corrected climate projections for South Asia from Coupled Model Intercomparison Project-6	147
The TRUST Principles for digital repositories	145
The COUGHVID crowdsourcing dataset, a corpus for the study of large-scale cough analysis algorithms	135
Highly accurate long-read HiFi sequencing data for five complex genomes	134
AiiDA 1.0, a scalable computational infrastructure for automated reproducible workflows and data provenance	132
The 10-m crop type maps in Northeast China during 2017–2019	131
Outlining where humans live, the World Settlement Footprint 2015	130
The human O-GlcNAcome database and meta-analysis	129
Multiscale dynamic human mobility flow dataset in the U.S. during the COVID-19 epidemic	124
A structured open dataset of government interventions in response to COVID-19	122
Version 3 of the Global Aridity Index and Potential Evapotranspiration Database	120
Systematic phenotyping and characterization of the 5xFAD mouse model of Alzheimer’s disease	119
The International Bathymetric Chart of the Arctic Ocean Version 4.0	116
A global-scale data set of mining areas	108
A global database of Holocene paleotemperature records	103
Carbon Monitor, a near-real-time daily dataset of global CO2 emission from fossil fuel and cement production	102
COVID-CT-MD, COVID-19 computed tomography scan dataset applicable in machine learning and deep learning	99
Data sharing practices and data availability upon request differ across scientific disciplines	96
The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules	95
Operationalizing the CARE and FAIR Principles for Indigenous data futures	94
High-throughput screening platform for solid electrolytes combining hierarchical ion-transport prediction algorithms	90
NASA Global Daily Downscaled Projections, CMIP6	85
Building a PubMed knowledge graph	85
Global land use for 2015–2100 at 0.05° resolution under diverse socioeconomic and climate scenarios	80
A platinum standard pan-genome resource that represents the population structure of Asian rice	79
A global map of terrestrial habitat types	79
The Building Data Genome Project 2, energy meter data from the ASHRAE Great Energy Predictor III competition	79
A high-resolution in vivo magnetic resonance imaging atlas of the human hypothalamic region	76
Gridded daily weather data for North America with comprehensive uncertainty quantification	76
VinDr-CXR: An open dataset of chest X-rays with radiologist’s annotations	74
Kvasir-Capsule, a video capsule endoscopy dataset	74
COVID-19 Disease Map, building a computational repository of SARS-CoV-2 virus-host interaction mechanisms	73
Global 1 km × 1 km gridded revised real gross domestic product and electricity consumption during 1992–2019 based on calibrated nighttime light data	73
Global quantitative analysis of the human brain proteome and phosphoproteome in Alzheimer’s disease	71
GlobalFungi, a global database of fungal occurrences from high-throughput-sequencing metabarcoding studies	71
A global record of annual terrestrial Human Footprint dataset from 2000 to 2018	71
MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification	70
A database of battery materials auto-generated using ChemDataExtractor	70
The global dataset of historical yields for major crops 1981–2016	69
Introducing the FAIR Principles for research software	68
COVIDiSTRESS Global Survey dataset on psychological and behavioural consequences of the COVID-19 outbreak	68
Geomorpho90m, empirical evaluation and accuracy assessment of global high-resolution geomorphometric layers	66
Harmonised global datasets of wind and solar farm locations and power	66
CIDO, a community-based ontology for coronavirus disease knowledge and data integration, sharing, and analysis	66
Global terrestrial carbon fluxes of 1999–2019 estimated by upscaling eddy covariance data with a random forest	66
Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry	66
PERSIANN-CCS-CDR, a 3-hourly 0.04° global precipitation climate data record for heavy precipitation studies	65
AusTraits, a curated plant trait database for the Australian flora	64
High-resolution monthly precipitation and temperature time series from 2006 to 2100	64
A database of freshwater fish species of the Amazon Basin	63
Bias-corrected CMIP6 global dataset for dynamical downscaling of the historical and future climate (1979–2100)	63
A SARS-CoV-2 cytopathicity dataset generated by high-content screening of a large drug repurposing collection	63
Large eQTL meta-analysis reveals differing patterns between cerebral cortical and cerebellar brain regions	63
Thirty complete Streptomyces genome sequences for mining novel secondary metabolite biosynthetic gene clusters	62
ERA5-based global meteorological wildfire danger maps	62
HIT-COVID, a global database tracking public health interventions to COVID-19	62
Generation of a global synthetic tropical cyclone hazard dataset using STORM	62
The Amsterdam Open MRI Collection, a set of multimodal MRI datasets for individual difference analyses	60
DISPERSE, a trait database to assess the dispersal potential of European aquatic macroinvertebrates	59
Combining expert and crowd-sourced training data to map urban form and functions for the continental US	59
Thermodynamic and transport properties of hydrogen containing streams	59
Estimating nitrogen and phosphorus concentrations in streams and rivers, within a machine learning framework	59
K-EmoCon, a multimodal sensor dataset for continuous emotion recognition in naturalistic conversations	59
A database of human gait performance on irregular and uneven surfaces collected by wearable sensors	57
Chinese provincial multi-regional input-output database for 2012, 2015, and 2017	57
Coastal sea level anomalies and associated trends from Jason satellite altimetry over 2002–2018	56
A global ensemble of ocean wave climate projections from CMIP5-driven models	55
A synthesis of bacterial and archaeal phenotypic trait data	55
A synthetic energy dataset for non-intrusive load monitoring in households	55
An automatic multi-tissue human fetal brain segmentation benchmark using the Fetal Tissue Annotation Dataset	55
Systematic analysis of infectious disease outcomes by age shows lowest severity in school-age children	53
A chromosome-scale reference genome for Giardia intestinalis WB	53
COVID-19 pandemic reveals the peril of ignoring metadata standards	53
GlobSnow v3.0 Northern Hemisphere snow water equivalent dataset	52
Mapping twenty years of corn and soybean across the US Midwest using the Landsat archive	52
Fox Insight collects online, longitudinal patient-reported outcomes and genetic data on Parkinson’s disease	52
Downscaling GRACE total water storage change using partial least squares regression	51
Expanded dataset of mechanical properties and observed phases of multi-principal element alloys	51
A naturalistic neuroimaging database for understanding the brain using ecological stimuli	51
Global high-resolution emissions of soil NOx, sea salt aerosols, and biogenic volatile organic compounds	50
Gridded fossil CO2 emissions and related O2 combustion consistent with national inventories 1959–2018	50
Hourly potential evapotranspiration at 0.1° resolution for the global land surface from 1981-present	50
ClimateEU, scale-free climate normals, historical time series, and future projections for Europe	49
High-resolution terrestrial climate, bioclimate and vegetation for the last 120,000 years	49
CT-ORG, a new dataset for multiple organ segmentation in computed tomography	48
Atomic structures and orbital energies of 61,489 crystal-forming organic molecules	48
LCVP, The Leipzig catalogue of vascular plants, a new taxonomic reference list for all known vascular plants	48
OPTIMADE, an API for exchanging materials data	48
A voltage and current measurement dataset for plug load appliance identification in households	47
Multivariate time series dataset for space weather data analytics	46
Building fault detection data to aid diagnostic algorithm creation and performance testing	46
Local sea level trends, accelerations and uncertainties over 1993–2019	46
Comprehensive dataset of shotgun metagenomes from oxygen stratified freshwater lakes and ponds	45
A global dataset of surface water and groundwater salinity measurements from 1980–2019	45
QM7-X, a comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules	45
A global database of soil nematode abundance and functional group composition	45
Harmonised LUCAS in-situ land cover and use database for field surveys from 2006 to 2018 in the European Union	45
Author Correction: The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data	44
A multi-site, multi-disorder resting-state magnetic resonance image database	44
Publisher Correction: Present and future Köppen-Geiger climate classification maps at 1-km resolution	44
Building a knowledge graph to enable precision medicine	44
Mapping of 30-meter resolution tile-drained croplands using a geospatial modeling approach	44
CAVD, towards better characterization of void space for ionic transport analysis	44
Global karst springs hydrograph dataset for research and management of the world’s fastest-flowing groundwater	43
FIVES: A Fundus Image Dataset for Artificial Intelligence based Vessel Segmentation	43
Global daily 1 km land surface precipitation based on cloud cover-informed downscaling	42
A new comprehensive trait database of European and Maghreb butterflies, Papilionoidea	42
Quantum chemical calculations for over 200,000 organic radical species and 40,000 associated closed-shell molecules	42
Response2covid19, a dataset of governments’ responses to COVID-19 all around the world	42
An open tool for creating battery-electric vehicle time series from empirical data, emobpy	42
An annotated fluorescence image dataset for training nuclear segmentation methods	42
Global soil moisture data derived through machine learning trained with in-situ measurements	42
Air pollution emissions from Chinese power plants based on the continuous emission monitoring systems network	42
The green and blue crop water requirement WATNEEDS model and its global gridded outputs	41
Rapid flood and damage mapping using synthetic aperture radar in response to Typhoon Hagibis, Japan	41
CerebrA, registration and manual label correction of Mindboggle-101 atlas for MNI-ICBM152 template	40
BAGLS, a multihospital Benchmark for Automatic Glottis Segmentation	40
Global monthly gridded atmospheric carbon dioxide concentrations under the historical and future scenarios	40
A long term global daily soil moisture dataset derived from AMSR-E and AMSR2 (2002–2019)	40
Fetal electrocardiograms, direct and abdominal with reference heartbeat annotations	39
GEOM, energy-annotated molecular conformations for property prediction and molecular generation	39
Benchmark maps of 33 years of secondary forest age for Brazil	39
lncRNAKB, a knowledgebase of tissue-specific functional annotation and trait association of long noncoding RNA	39
Genetic variation among 481 diverse soybean accessions, inferred from genomic re-sequencing	38
A new global dataset of bioclimatic indicators	38
CU-BEMS, smart building electricity consumption and indoor environmental sensor datasets	38
LoDoPaB-CT, a benchmark dataset for low-dose computed tomography reconstruction	38
Development and validation of the CHIRTS-daily quasi-global high-resolution daily temperature data set	37
Creation and validation of a chest X-ray dataset with eye-tracking and report dictation for AI development	37
Experimental database of optical properties of organic compounds	37
A database of chlorophyll and water chemistry in freshwater lakes	37
A band-gap database for semiconducting inorganic materials calculated with hybrid functional	37
Electrochemical metrics for corrosion resistant alloys	37
Developing reliable hourly electricity demand data through screening and imputation	37
A new vector-based global river network dataset accounting for variable drainage density	37
An automatically curated first-principles database of ferroelectrics	37
An integrated landscape of protein expression in human cancer	37
China’s greenhouse gas emissions for cropping systems from 1978–2016	37
TILES-2018, a longitudinal physiologic and behavioral data set of hospital workers	36
The “Narratives” fMRI dataset for evaluating models of naturalistic language comprehension	36
National contributions to climate change due to historical emissions of carbon dioxide, methane, and nitrous oxide since 1850	36
Discharge profile of a zinc-air flow battery at various electrolyte flow rates and discharge currents	36
Electronic healthcare records and external outcome data for hospitalized patients with heart failure	36
Lower-limb kinematics and kinetics during continuously varying human locomotion	36
A curated diverse molecular database of blood-brain barrier permeability with chemical descriptors	36
A fine-tuned global distribution dataset of marine forests	35
16 years of topographic surveys of rip-channelled high-energy meso-macrotidal sandy beach	35
Heidelberg colorectal data set for surgical data science in the sensor operating room	35
A taxonomic, genetic and ecological data resource for the vascular plants of Britain and Ireland	35
A dataset of clinically recorded radar vital signs with synchronised reference sensor signals	34
An interactive database of Leishmania species distribution in the Americas	34
A global occurrence database of the Atlantic blue crab Callinectes sapidus	34
Gut microbiome diversity detected by high-coverage 16S and shotgun sequencing of paired stool and colon sample	34
QMugs, quantum mechanical properties of drug-like molecules	33
A gene expression atlas for different kinds of stress in the mouse brain	33
Global spatiotemporally continuous MODIS land surface temperature dataset	33
Database of pharmacokinetic time-series data and parameters for 144 environmental chemicals	33
Worldwide continuous gap-filled MODIS land surface temperature dataset	33
Long-term and large-scale multispecies dataset tracking population changes of common European breeding birds	33
The global lake area, climate, and population dataset	33
Validation and refinement of cropland data layer using a spatial-temporal decision tree algorithm	33
Design and evaluation of a data anonymization pipeline to promote Open Science on COVID-19	33
Quantum chemical benchmark databases of gold-standard dimer interaction energies	32
Segmentation of vestibular schwannoma from MRI, an open annotated dataset and baseline algorithm	32
Dataset of segmented nuclei in hematoxylin and eosin stained histopathology images of ten cancer types	32
High-resolution Digital Surface Model of the 2021 eruption deposit of Cumbre Vieja volcano, La Palma, Spain	32
Multidisciplinary database of permeability of fault zones and surrounding protolith rocks at world-wide sites	32
Unravelling the diversity of magnetotactic bacteria through analysis of open genomic databases	31
A rasterized building footprint dataset for the United States	31
A multilevel carbon and water footprint dataset of food commodities	31
In vivo human whole-brain Connectom diffusion MRI dataset at 760 µm isotropic resolution	31
A completely annotated whole slide image dataset of canine breast cancer to aid human breast cancer research	30
A large, curated, open-source stroke neuroimaging dataset to improve lesion segmentation algorithms	30
A comprehensive, multisource database for hydrometeorological modeling of 14,425 North American watersheds	30
Reef Cover, a coral reef classification for global habitat mapping from remote sensing	30
A high-spatial-resolution dataset of human thermal stress indices over South and East Asia	30
SAVI, in silico generation of billions of easily synthesizable compounds through expert-system type rules	29
Probabilistic atlas for the language network based on precision fMRI data from >800 individuals	29
An update on global mining land use	29
The United States COVID-19 Forecast Hub dataset	29
Publisher Correction: A global database of Holocene paleotemperature records	29
Simultaneous human intracerebral stimulation and HD-EEG, ground-truth for source localization methods	29
A global dataset for the projected impacts of climate change on four major crops	29
Genome assembly of six polyploid potato genomes	29
An Indo-Pacific coral spawning database	29
The normalised Sentinel-1 Global Backscatter Model, mapping Earth’s land surface with C-band microwaves	29
A relational database to identify differentially expressed genes in the endometrium and endometriosis lesions	28
The Swiss data cube, analysis ready data archive using earth observations of Switzerland	28
A map of the extent and year of detection of oil palm plantations in Indonesia, Malaysia and Thailand	28
Epigenomic profiling of neuroblastoma cell lines	28
A dataset of remote-sensed Forel-Ule Index for global inland waters during 2000–2018	28
Lower limb kinematic, kinetic, and EMG data from young healthy humans during walking at controlled speeds	28
A kinematic and kinetic dataset of 18 above-knee amputees walking at various speeds	28
Inflation of test accuracy due to data leakage in deep learning-based classification of OCT images	28
Global offshore wind turbine dataset	28
Genome assembly and annotation of Meloidogyne enterolobii, an emerging parthenogenetic root-knot nematode	28
Chest imaging representing a COVID-19 positive rural U.S. population	28
A real-time survey on the psychological impact of mild lockdown for COVID-19 in the Japanese population	28
MOFSimplify, machine learning models with extracted stability data of three thousand metal–organic frameworks	27
Multicenter dataset of multi-shell diffusion MRI in healthy traveling adults with identical settings	27
A Global Building Occupant Behavior Database	27
Vectorized rooftop area data for 90 cities in China	27
A high-resolution climate simulation dataset for the past 540 million years	27
p3k14c, a synthetic global database of archaeological radiocarbon dates	27
The OceanDNA MAG catalog contains over 50,000 prokaryotic genomes originated from various marine environments	27
A comprehensive database of active and potentially-active continental faults in Chile at 1:25,000 scale	27
A harmonised, high-coverage, open dataset of solar photovoltaic installations in the UK	26
A comprehensive spectral assay library to quantify the Escherichia coli proteome by DIA/SWATH-MS	26
GDIS, a global dataset of geocoded disaster locations	26
A whole-body FDG-PET/CT Dataset with manually annotated Tumor Lesions	26
A global 0.05° dataset for gross primary production of sunlit and shaded vegetation canopies from 1992 to 2020	26
Global forest management data for 2015 at a 100 m resolution	26
The short-term mortality fluctuation data series, monitoring mortality shocks across time and space	26
VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients	25
A dataset of multi-functional ecological traits of Brazilian bees	25
A compilation of experimental data on the mechanical properties and microstructural features of Ti-alloys	25
Draft genomes of two Atlantic bay scallop subspecies Argopecten irradians irradians and A. i. concentricus	25
A 3 km spatially and temporally consistent European daily soil moisture reanalysis from 2000 to 2015	25
A global dataset for crop production under conventional tillage and no tillage systems	25
Dataset on SARS-CoV-2 non-pharmaceutical interventions in Brazilian municipalities	25
An improved daily standardized precipitation index dataset for mainland China from 1961 to 2018	25
Greenhouse gas emissions from municipal wastewater treatment facilities in China from 2006 to 2019	25
A dataset of radar-recorded heart sounds and vital signs including synchronised reference sensor signals	25
Cultivar-specific transcriptome and pan-transcriptome reconstruction of tetraploid potato	25
An improved high-quality genome assembly and annotation of Tibetan hulless barley	25
Accelerometer data collected with a minimum set of wearable sensors from subjects with Parkinson’s disease	25
A multimodal sensor dataset for continuous stress detection of nurses in a hospital	24
All-hazards dataset mined from the US National Incident Management System 1999–2014	24
An fMRI dataset in response to “The Grand Budapest Hotel”, a socially-rich, naturalistic movie	24
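The selection rule stated in the note at the top of this list (CrossRef citations above the TQCC of 9, publication between 2020-02-01 and 2024-02-01, at most 250 papers, ranked by citations) can be sketched as a simple filter. This is a minimal illustration only, not the listing site's actual implementation: the `Paper` record, the `select_top_papers` function, its field names, the inclusive date bounds, the strict "greater than" reading of "above", and the tie ordering are all hypothetical assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Paper:
    """Hypothetical record type: one listed paper plus its publication date."""
    title: str
    published: date
    citations: int  # CrossRef citation count


def select_top_papers(
    papers: list[Paper],
    tqcc: int = 9,
    start: date = date(2020, 2, 1),
    end: date = date(2024, 2, 1),
    limit: int = 250,
) -> list[Paper]:
    """Return papers published in [start, end] (bounds assumed inclusive) whose
    citation count is strictly above `tqcc`, highest count first, capped at `limit`."""
    eligible = [
        p for p in papers
        if start <= p.published <= end and p.citations > tqcc
    ]
    # list.sort is stable even with reverse=True, so papers with equal
    # citation counts keep their input order (an assumed tie rule).
    eligible.sort(key=lambda p: p.citations, reverse=True)
    return eligible[:limit]


# Usage with made-up records (not taken from the table above):
sample = [
    Paper("Paper A", date(2021, 5, 1), 120),
    Paper("Paper B", date(2019, 12, 1), 300),  # published before the window
    Paper("Paper C", date(2022, 1, 1), 9),     # not above the threshold
    Paper("Paper D", date(2023, 3, 1), 45),
]
print([p.title for p in select_top_papers(sample)])  # ['Paper A', 'Paper D']
```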