VLDB Journal

Papers
(The median citation count of VLDB Journal is 1. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-02-01 to 2025-02-01.)
Article | Citations
Towards flexibility and robustness of LSM trees | 159
Incremental discovery of denial constraints | 52
Algorithms for the discovery of embedded functional dependencies | 48
Answering reachability and K-reach queries on large graphs with label constraints | 45
Accelerating multi-way joins on the GPU | 41
To share or not to share vector registers? | 37
Optimizing navigational graph queries | 35
HeteroStamp: leveraging heterogeneous social interactions for mobility prediction-enhanced cost-aware spatiotemporal crowdsensing | 33
An in-depth analysis of pre-trained embeddings for entity resolution | 32
A survey of multimodal event detection based on data fusion | 32
A graph pattern mining framework for large graphs on GPU | 30
Reconciling tuple and attribute timestamping for temporal data warehouses | 27
ProS: data series progressive k-NN similarity search and classification with probabilistic quality guarantees | 26
Efficient kNN query for moving objects on time-dependent road networks | 23
Special issue on responsible data management and data science | 23
PM-LSH: a fast and accurate in-memory framework for high-dimensional approximate NN and closest pair search | 22
Effective entity matching with transformers | 22
Tempura: a general cost-based optimizer framework for incremental data processing (Journal Version) | 19
A design space for RDF data representations | 17
When hierarchy meets 2-hop-labeling: efficient shortest distance and path queries on road networks | 16
A model and query language for temporal graph databases | 15
Maximum and top-k diversified biclique search at scale | 15
Anchored coreness: efficient reinforcement of social networks | 14
VolcanoML: speeding up end-to-end AutoML via scalable search space decomposition | 14
F-IVM: analytics over relational databases under updates | 14
Exploiting domain knowledge to address class imbalance and a heterogeneous feature space in multi-class classification | 14
Optimizing RPQs over a compact graph representation | 14
Data distribution tailoring revisited: cost-efficient integration of representative data | 13
Discovering approximate implicit domain orders through order dependencies | 13
Efficient and effective algorithms for densest subgraph discovery and maintenance | 13
Application-driven graph partitioning | 13
Efficient detection of multivariate correlations with different correlation measures | 12
RDFFrames: knowledge graph access for machine learning tools | 12
Picket: guarding against corrupted data in tabular data during learning and inference | 12
HERMES: data placement and schema optimization for enterprise knowledge bases | 12
Tabular data synthesis with generative adversarial networks: design space and optimizations | 11
Anytime bottom-up rule learning for large-scale knowledge graph completion | 10
Span-reachability querying in large temporal graphs | 10
Lero: applying learning-to-rank in query optimizer | 10
DumpyOS: A data-adaptive multi-ary index for scalable data series similarity search | 10
Local dampening: differential privacy for non-numeric queries via local sensitivity | 10
Data distribution debugging in machine learning pipelines | 9
A benchmark and comprehensive survey on knowledge graph entity alignment via representation learning | 9
Information Resilience: the nexus of responsible and agile approaches to information use | 9
Privacy and efficiency guaranteed social subgraph matching | 9
Survey of vector database management systems | 8
Correction to: TurboLift: fast accuracy lifting for historical data recovery | 8
Data dependencies for query optimization: a survey | 8
Correction to: Survey of window types for aggregation in stream processing systems | 8
Correction to: Data dependencies for query optimization: a survey | 8
Managing bias and unfairness in data for decision support: a survey of machine learning and data engineering approaches to identify and mitigate bias and unfairness within data management and analytic | 8
Generating highly customizable python code for data processing with large language models | 7
PARROT: pattern-based correlation exploitation in big partitioned data series | 7
PrefixFPM: a parallel framework for general-purpose mining of frequent and closed patterns | 7
Pivot selection algorithms in metric spaces: a survey and experimental study | 7
Practical planning and execution of groupjoin and nested aggregates | 7
MDDE: multitasking distributed differential evolution for privacy-preserving database fragmentation | 7
WavingSketch: an unbiased and generic sketch for finding top-k items in data streams | 6
A fractional memory-efficient approach for online continuous-time influence maximization | 6
Efficient distributed discovery of bidirectional order dependencies | 6
Continuous monitoring of moving skyline and top-k queries | 6
Eris: efficiently measuring discord in multidimensional sources | 6
Internal and external memory set containment join | 6
Distance labeling: on parallelism, compression, and ordering | 6
A quantitative evaluation of persistent memory hash indexes | 6
Distributed detection of sequential anomalies in univariate time series | 6
Cardinality estimation using normalizing flow | 6
Cache-efficient sweeping-based interval joins for extended Allen relation predicates | 6
General graph generators: experiments, analyses, and improvements | 6
The full story of 1000 cores | 5
Mis-categorized entities detection | 5
Optimizing LSM-based indexes for disaggregated memory | 5
Efficient and robust active learning methods for interactive database exploration | 5
CDBTune+: An efficient deep reinforcement learning-based automatic cloud database tuning system | 5
eRiskCom: an e-commerce risky community detection platform | 5
A survey on outlier explanations | 5
An analysis of one-to-one matching algorithms for entity resolution | 5
Dragoon: a hybrid and efficient big trajectory management system for offline and online analytics | 5
(p,q)-biclique counting and enumeration for large sparse bipartite graphs | 5
Reverse spatial top-k keyword queries | 5
A survey on deep learning approaches for text-to-SQL | 5
Assisted design of data science pipelines | 4
AutoML in heavily constrained applications | 4
A survey on semantic schema discovery | 4
AutoCTS++: zero-shot joint neural architecture and hyperparameter search for correlated time series forecasting | 4
Minimum motif-cut: a workload-aware RDF graph partitioning strategy | 4
A new distributional treatment for time series anomaly detection | 4
Accelerating directed densest subgraph queries with software and hardware approaches | 4
Zen+: a robust NUMA-aware OLTP engine optimized for non-volatile main memory | 4
A survey on the evolution of stream processing systems | 4
Interactively discovering and ranking desired tuples by data exploration | 4
Hypergraph motifs and their extensions beyond binary | 3
Leveraging user itinerary to improve personalized deep matching at Fliggy | 3
Have query optimizers hit the wall? | 3
Hu-Fu: efficient and secure spatial queries over data federation | 3
Efficient and effective ER with progressive blocking | 3
Deep entity matching with adversarial active learning | 3
MinJoin++: a fast algorithm for string similarity joins under edit distance | 3
Efficient cryptanalysis of an encrypted database supporting data interoperability | 3
Survey of window types for aggregation in stream processing systems | 3
MM-DIRECT | 3
Open benchmark for filtering techniques in entity resolution | 3
Formal semantics and high performance in declarative machine learning using Datalog | 3
xDBTagger: explainable natural language interface to databases using keyword mappings and schema graph | 3
ABC of order dependencies | 3
A new window Clause for SQL++ | 3
Learned sketch for subgraph counting: a holistic approach | 2
Stochastic gradient descent without full data shuffle: with applications to in-database machine learning and deep learning systems | 2
Efficient Hop-constrained s-t Simple Path Enumeration | 2
HINT: a hierarchical interval index for Allen relationships | 2
A learning-based framework for spatial join processing: estimation, optimization and tuning | 2
Efficient structural node similarity computation on billion-scale graphs | 2
Alfa: active learning for graph neural network-based semantic schema alignment | 2
Cross-chain deals and adversarial commerce | 2
Unified route representation learning for multi-modal transportation recommendation with spatiotemporal pre-training | 2
Better database cost/performance via batched I/O on programmable SSD | 2
Performant almost-latch-free data structures using epoch protection in more depth | 2
Enabling space-time efficient range queries with REncoder | 2
Morphtree: a polymorphic main-memory learned index for dynamic workloads | 2
Model averaging in distributed machine learning: a case study with Apache Spark | 2
Sliding window-based approximate triangle counting with bounded memory usage | 2
G-thinker: a general distributed framework for finding qualified subgraphs in a big graph with load balancing | 2
Special issue: modern hardware | 2
Fast subgraph query processing and subgraph matching via static and dynamic equivalences | 2
Scalable decoupling graph neural network with feature-oriented optimization | 2
A meta-level analysis of online anomaly detectors | 2
Tidy Tuples and Flying Start: fast compilation and fast execution of relational queries in Umbra | 2
Special issue on “Machine learning and databases” | 1
On efficient 3D object retrieval | 1
Time series data encoding in Apache IoTDB: comparative analysis and recommendation | 1
RNE: computing shortest paths using road network embedding | 1
Estimating simplet counts via sampling | 1
Correction to: Internal and external memory set containment join | 1
Toward maintenance of hypercores in large-scale dynamic hypergraphs | 1
Ontological databases with faceted queries | 1
An authorization model for query execution in the cloud | 1
Fast, exact, and parallel-friendly outlier detection algorithms with proximity graph in metric spaces | 1
Ingress: an automated incremental graph processing system | 1
Efficient exploratory clustering analyses in large-scale exploration processes | 1
Flexible grouping of linear segments for highly accurate lossy compression of time series data | 1
Cleaning timestamps with temporal constraints | 1
A survey of RDF stores & SPARQL engines for querying knowledge graphs | 1
Correction to: Unsupervised and scalable subsequence anomaly detection in large data series | 1
Correction to: “Refiner: a reliable and efficient incentive-driven federated learning system powered by blockchain” | 1
Adaptive algorithms for crowd-aided categorization | 1
Accelerated butterfly counting with vertex priority on bipartite graphs | 1
SQUID: subtrajectory query in trillion-scale GPS database | 1
Resource-aware adaptive indexing for in situ visual exploration and analytics | 1
Location- and keyword-based querying of geo-textual data: a survey | 1
Special issue on the best papers of DaMoN 2020 | 1
Efficient local locking for massively multithreaded in-memory hash-based operators | 1
Similarity-driven and task-driven models for diversity of opinion in crowdsourcing markets | 1
Micro-architectural analysis of in-memory OLTP: Revisited | 1
In-database query optimization on SQL with ML predicates | 1
A multi-facet analysis of BERT-based entity matching models | 1
ICS-GNN+: lightweight interactive community search via graph neural network | 1
Hyper-distance oracles in hypergraphs | 1
Comparison and evaluation of state-of-the-art LSM merge policies | 1
A systematic evaluation of machine learning on serverless infrastructure | 1
BatchHL+: batch dynamic labelling for distance queries on large-scale networks | 1
How good are machine learning clouds? Benchmarking two snapshots over 5 years | 1