Data Mining and Knowledge Discovery

Papers
(The TQCC of Data Mining and Knowledge Discovery is 5. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-02-01 to 2025-02-01.)
Article | Citations
A combinatorial multi-armed bandit approach to correlation clustering | 100
Human-in-the-loop handling of knowledge drift | 92
Weighted sparse simplex representation: a unified framework for subspace clustering, constrained clustering, and active learning | 83
Improving position encoding of transformers for multivariate time series classification | 50
Joint dynamic topic model for recognition of lead-lag relationship in two text corpora | 48
Affinity analysis for studying physicians’ prescription behavior. | 44
Counterfactual explanations as interventions in latent space | 42
Expected passes | 38
Dataset2Vec: learning dataset meta-features | 35
On regime changes in text data using hidden Markov model of contaminated vMF distribution | 34
Strengthening ties towards a highly-connected world | 34
trie-nlg: trie context augmentation to improve personalized query auto-completion for short and unseen prefixes | 33
Random walks with variable restarts for negative-example-informed label propagation | 32
Bounding the family-wise error rate in local causal discovery using Rademacher averages | 32
Sequential query prediction based on multi-armed bandits with ensemble of transformer experts and immediate feedback | 31
ArcMatch: high-performance subgraph matching for labeled graphs by exploiting edge domains | 26
Randomnet: clustering time series using untrained deep neural networks | 26
Towards effective urban region-of-interest demand modeling via graph representation learning | 25
Fast, accurate and explainable time series classification through randomization | 23
Session-based recommendation by exploiting substitutable and complementary relationships from multi-behavior data | 22
Community detection in interval-weighted networks | 22
Multiple-input neural networks for time series forecasting incorporating historical and prospective context | 20
Introducing the contrast profile: a novel time series primitive that allows real world classification | 20
Structure-aware decoupled imputation network for multivariate time series | 19
PAC-Bayesian lifelong learning for multi-armed bandits | 18
Extended missing data imputation via GANs for ranking applications | 18
Controlling hallucinations at word level in data-to-text generation | 18
Multi-label learning with missing and completely unobserved labels | 17
MSGNN: Multi-scale Spatio-temporal Graph Neural Network for epidemic forecasting | 16
Mondrian forest for data stream classification under memory constraints | 15
VFC-SMOTE: very fast continuous synthetic minority oversampling for evolving data streams | 15
An alternating nonmonotone projected Barzilai–Borwein algorithm of nonnegative factorization of big matrices | 15
MODE-Bi-GRU: orthogonal independent Bi-GRU model with multiscale feature extraction | 15
Exploring uplift modeling with high class imbalance | 15
Mint: MDL-based approach for Mining INTeresting Numerical Pattern Sets | 14
Sequential recommendation with metric models based on frequent sequences | 13
Active learning with biased non-response to label requests | 12
Unsupervised feature based algorithms for time series extrinsic regression | 12
Online concept evolution detection based on active learning | 12
The Hadamard decomposition problem | 12
Learning a Bayesian network with multiple latent variables for implicit relation representation | 12
Modeling the impact of out-of-schema questions in task-oriented dialog systems | 11
On the impact of multi-dimensional local differential privacy on fairness | 11
Fast and robust video-based exercise classification via body pose tracking and scalable multivariate time series classifiers | 11
Adversarial balancing-based representation learning for causal effect inference with observational data | 11
Link prediction in dynamic networks using random dot product graphs | 10
Efficient binary embedding of categorical data using BinSketch | 10
Who can receive the pass? A computational model for quantifying availability in soccer | 10
NICE: an algorithm for nearest instance counterfactual explanations | 10
Hypercore decomposition for non-fragile hyperedges: concepts, algorithms, observations, and applications | 10
Scalable classifier-agnostic channel selection for multivariate time series classification | 10
A two-step anomaly detection based method for PU classification in imbalanced data sets | 9
An adaptive meta-heuristic for music plagiarism detection based on text similarity and clustering | 9
Correction to: Studying bias in visual features through the lens of optimal transport | 9
A probabilistic model for API contract specification retrieval focusing on the openAPI standard | 9
BROCCOLI: overlapping and outlier-robust biclustering through proximal stochastic gradient descent | 9
Z-Time: efficient and effective interpretable multivariate time series classification | 9
Fair detection of poisoning attacks in federated learning on non-i.i.d. data | 9
End-to-end deep representation learning for time series clustering: a comparative study | 9
Smoothed dilated convolutions for improved dense prediction | 9
Improving Graph Neural Networks by combining active learning with self-training | 9
A semi-supervised interactive algorithm for change point detection | 9
MIRACLE: Malware image recognition and classification by layered extraction | 8
Tackling ordinal regression problem for heterogeneous data: sparse and deep multi-task learning approaches | 8
An eager splitting strategy for online decision trees in ensembles | 8
Reciprocity in directed hypergraphs: measures, findings, and generators | 8
Informative pseudo-labeling for graph neural networks with few labels | 8
TCMI: a non-parametric mutual-dependence estimator for multivariate continuous distributions | 8
Making clusterings fairer by post-processing: algorithms, complexity results and experiments | 8
Enforcing fairness using ensemble of diverse Pareto-optimal models | 8
Unsupervised domain adaptation with non-stochastic missing data | 8
When graph convolution meets double attention: online privacy disclosure detection with multi-label text classification | 7
Generalized density attractor clustering for incomplete data | 7
Inferring range of information diffusion based on historical frequent items | 7
Missing value replacement in strings and applications | 7
An anomaly aware network embedding framework for unsupervised anomalous link detection | 7
Hydra: competing convolutional kernels for fast and accurate time series classification | 7
Thompson sampling-based recursive block elimination for dynamic assignment under limited budget in pure-exploration | 7
The art of centering without centering for robust principal component analysis | 7
FRAPPE: fast rank approximation with explainable features for tensors | 6
ClaSP: parameter-free time series segmentation | 6
Knowledge graph embedding methods for entity alignment: experimental review | 6
Streaming changepoint detection for transition matrices | 6
Homophily outlier detection in non-IID categorical data | 6
Navigating the metric maze: a taxonomy of evaluation metrics for anomaly detection in time series | 6
MCCE: Monte Carlo sampling of valid and realistic counterfactual explanations for tabular data | 6
CrashNet: an encoder–decoder architecture to predict crash test outcomes | 6
Exploring potential biases towards blockbuster items in ranking-based recommendations | 6
Regularization-based methods for ordinal quantification | 5
SOKNL: A novel way of integrating K-nearest neighbours with adaptive random forest regression for data streams | 5
Exploiting sensor data in professional road cycling: personalized data-driven approach for frequent fitness monitoring | 5
The grammar of interactive explanatory model analysis | 5
What’s in a name? – gender classification of names with character based machine learning models | 5
Better trees: an empirical study on hyperparameter tuning of classification decision tree induction algorithms | 5
Evaluating outlier probabilities: assessing sharpness, refinement, and calibration using stratified and weighted measures | 5
Extending greedy feature selection algorithms to multiple solutions | 5
Social norm bias: residual harms of fairness-aware algorithms | 5
Shapley values for cluster importance | 5
Grouped feature importance and combined features effect plot | 5
Model-agnostic feature importance and effects with dependent features: a conditional subgroup approach | 5
GeoRF: a geospatial random forest | 5
Discord-based counterfactual explanations for time series classification | 5
Enhancing racism classification: an automatic multilingual data annotation system using self-training and CNN | 5
Interpretable linear dimensionality reduction based on bias-variance analysis | 5
Ranking with submodular functions on a budget | 5
Dynamic self-paced sampling ensemble for highly imbalanced and class-overlapped data classification | 5
Data-driven detection of counterpressing in professional football | 5