Data Mining and Knowledge Discovery

Papers
(The median citation count of Data Mining and Knowledge Discovery is 2. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers publications from the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
Proximity forest 2.0: a new effective and scalable similarity-based classifier for time series | 138
Mint: MDL-based approach for Mining INTeresting Numerical Pattern Sets | 104
Randomnet: clustering time series using untrained deep neural networks | 98
A probabilistic model for API contract specification retrieval focusing on the openAPI standard | 78
MIRACLE: Malware image recognition and classification by layered extraction | 57
NICE: an algorithm for nearest instance counterfactual explanations | 55
Active learning with biased non-response to label requests | 49
On the impact of multi-dimensional local differential privacy on fairness | 46
Modeling the impact of out-of-schema questions in task-oriented dialog systems | 42
An alternating nonmonotone projected Barzilai–Borwein algorithm of nonnegative factorization of big matrices | 41
BROCCOLI: overlapping and outlier-robust biclustering through proximal stochastic gradient descent | 40
Human-in-the-loop handling of knowledge drift | 38
Affinity analysis for studying physicians’ prescription behavior | 37
Strengthening ties towards a highly-connected world | 36
Community detection in interval-weighted networks | 35
Hypercore decomposition for non-fragile hyperedges: concepts, algorithms, observations, and applications | 34
Fast and robust video-based exercise classification via body pose tracking and scalable multivariate time series classifiers | 32
Weighted sparse simplex representation: a unified framework for subspace clustering, constrained clustering, and active learning | 31
On regime changes in text data using hidden Markov model of contaminated vMF distribution | 31
A two-step anomaly detection based method for PU classification in imbalanced data sets | 29
Expected passes | 25
Improving Graph Neural Networks by combining active learning with self-training | 23
Efficient binary embedding of categorical data using BinSketch | 21
PAC-Bayesian lifelong learning for multi-armed bandits | 20
Learning a Bayesian network with multiple latent variables for implicit relation representation | 19
Joint dynamic topic model for recognition of lead-lag relationship in two text corpora | 18
Controlling hallucinations at word level in data-to-text generation | 18
Correction to: Studying bias in visual features through the lens of optimal transport | 17
Counterfactual explanations as interventions in latent space | 17
The Hadamard decomposition problem | 17
Online concept evolution detection based on active learning | 16
Link prediction in dynamic networks using random dot product graphs | 15
Random walks with variable restarts for negative-example-informed label propagation | 15
Adversarial balancing-based representation learning for causal effect inference with observational data | 15
Unsupervised feature based algorithms for time series extrinsic regression | 15
Bounding the family-wise error rate in local causal discovery using Rademacher averages | 15
Fair detection of poisoning attacks in federated learning on non-i.i.d. data | 15
VFC-SMOTE: very fast continuous synthetic minority oversampling for evolving data streams | 14
Multiple-input neural networks for time series forecasting incorporating historical and prospective context | 14
MSGNN: Multi-scale Spatio-temporal Graph Neural Network for epidemic forecasting | 14
Exploring uplift modeling with high class imbalance | 14
Sequential query prediction based on multi-armed bandits with ensemble of transformer experts and immediate feedback | 14
Introducing the contrast profile: a novel time series primitive that allows real world classification | 14
Towards effective urban region-of-interest demand modeling via graph representation learning | 13
Z-Time: efficient and effective interpretable multivariate time series classification | 13
Session-based recommendation by exploiting substitutable and complementary relationships from multi-behavior data | 13
A semi-supervised interactive algorithm for change point detection | 12
Fast, accurate and explainable time series classification through randomization | 12
ArcMatch: high-performance subgraph matching for labeled graphs by exploiting edge domains | 12
trie-nlg: trie context augmentation to improve personalized query auto-completion for short and unseen prefixes | 12
Mondrian forest for data stream classification under memory constraints | 11
Scalable classifier-agnostic channel selection for multivariate time series classification | 11
Who can receive the pass? A computational model for quantifying availability in soccer | 11
A combinatorial multi-armed bandit approach to correlation clustering | 11
An adaptive meta-heuristic for music plagiarism detection based on text similarity and clustering | 11
MODE-Bi-GRU: orthogonal independent Bi-GRU model with multiscale feature extraction | 11
End-to-end deep representation learning for time series clustering: a comparative study | 11
Structure-aware decoupled imputation network for multivariate time series | 11
Smoothed dilated convolutions for improved dense prediction | 11
Improving position encoding of transformers for multivariate time series classification | 11
Extended missing data imputation via GANs for ranking applications | 11
Extending greedy feature selection algorithms to multiple solutions | 10
Knowledge graph embedding methods for entity alignment: experimental review | 10
TED: related party transaction guided tax evasion detection on heterogeneous graph | 10
Informative pseudo-labeling for graph neural networks with few labels | 10
Missing value replacement in strings and applications | 10
Inferring tie strength in temporal networks | 10
Evaluating outlier probabilities: assessing sharpness, refinement, and calibration using stratified and weighted measures | 9
Data-driven detection of counterpressing in professional football | 9
Streaming changepoint detection for transition matrices | 9
When graph convolution meets double attention: online privacy disclosure detection with multi-label text classification | 9
Studying bias in visual features through the lens of optimal transport | 9
Navigating the metric maze: a taxonomy of evaluation metrics for anomaly detection in time series | 9
Correction: FRAPPE: fast rank approximation with explainable features for tensors | 9
Grouped feature importance and combined features effect plot | 9
TCMI: a non-parametric mutual-dependence estimator for multivariate continuous distributions | 8
K-plex cover pooling for graph neural networks | 8
Ranking with submodular functions on a budget | 8
Unsupervised domain adaptation with non-stochastic missing data | 8
A graph convolutional fusion model for community detection in multiplex networks | 8
Homophily outlier detection in non-IID categorical data | 8
MCCE: Monte Carlo sampling of valid and realistic counterfactual explanations for tabular data | 7
The grammar of interactive explanatory model analysis | 7
The art of centering without centering for robust principal component analysis | 7
Exploring potential biases towards blockbuster items in ranking-based recommendations | 7
An eager splitting strategy for online decision trees in ensembles | 7
Synwalk: community detection via random walk modelling | 7
What’s in a name? – gender classification of names with character based machine learning models | 7
BDRI: block decomposition based on relational interaction for knowledge graph completion | 7
Hydra: competing convolutional kernels for fast and accurate time series classification | 7
Making clusterings fairer by post-processing: algorithms, complexity results and experiments | 6
An anomaly aware network embedding framework for unsupervised anomalous link detection | 6
Structural learning of simple staged trees | 6
Representing ensembles of networks for fuzzy cluster analysis: a case study | 6
Generalized density attractor clustering for incomplete data | 6
Enforcing fairness using ensemble of diverse Pareto-optimal models | 6
Better trees: an empirical study on hyperparameter tuning of classification decision tree induction algorithms | 6
CrashNet: an encoder–decoder architecture to predict crash test outcomes | 6
Interpretable linear dimensionality reduction based on bias-variance analysis | 6
Correction: Marginal effects for non-linear prediction functions | 6
Thompson sampling-based recursive block elimination for dynamic assignment under limited budget in pure-exploration | 6
Dynamic self-paced sampling ensemble for highly imbalanced and class-overlapped data classification | 5
Regularization-based methods for ordinal quantification | 5
Reciprocity in directed hypergraphs: measures, findings, and generators | 5
FRAPPE: fast rank approximation with explainable features for tensors | 5
Discord-based counterfactual explanations for time series classification | 5
Multiple hypergraph convolutional network social recommendation using dual contrastive learning | 5
Enhancing racism classification: an automatic multilingual data annotation system using self-training and CNN | 5
ClaSP: parameter-free time series segmentation | 5
SOKNL: A novel way of integrating K-nearest neighbours with adaptive random forest regression for data streams | 5
GeoRF: a geospatial random forest | 5
Exploiting sensor data in professional road cycling: personalized data-driven approach for frequent fitness monitoring | 4
Inferring range of information diffusion based on historical frequent items | 4
Social norm bias: residual harms of fairness-aware algorithms | 4
Attention based adversarially regularized learning for network embedding | 4
Correction to: Bias characterization, assessment, and mitigation in location-based recommender systems | 4
A hyperbolic approach for learning communities on graphs | 4
Model-agnostic feature importance and effects with dependent features: a conditional subgroup approach | 4
Shapley values for cluster importance | 4
Guest editorial: Special issue on mining for health | 4
Reflective-net: learning from explanations | 4
Correction to: AA-forecast: anomaly-aware forecast for extreme events | 4
MultiETSC: automated machine learning for early time series classification | 4
Bias-aware ranking from pairwise comparisons | 4
Large scale K-means clustering using GPUs | 4
Traffic forecasting on new roads using spatial contrastive pre-training (SCPT) | 4
Fairness in vulnerable attribute prediction on social media | 4
An alternative for data visualization using space-filling curve | 4
MERLIN++: parameter-free discovery of time series anomalies | 4
PETSC: pattern-based embedding for time series classification | 4
Improving the core resilience of real-world hypergraphs | 3
Differentially Private Distance Learning in Categorical Data | 3
Methods for explaining Top-N recommendations through subgroup discovery | 3
Fast computation of Katz index for efficient processing of link prediction queries | 3
Transfer how much: a fine-grained measure of the knowledge transferability of user behavior sequences in social network | 3
Neural content-aware collaborative filtering for cold-start music recommendation | 3
Model-agnostic variable importance for predictive uncertainty: an entropy-based approach | 3
Joint leaf-refinement and ensemble pruning through L1 regularization | 3
Effective interpretable learning for large-scale categorical data | 3
Temporal state change Bayesian networks for modeling of evolving multivariate state sequences: model, structure discovery and parameter estimation | 3
Negative-sample-free knowledge graph embedding | 3
Column-coherent matrix decomposition | 3
Opinion dynamics in social networks incorporating higher-order interactions | 3
Fusing structural information with knowledge enhanced text representation for knowledge graph completion | 3
Boosting house price predictions using geo-spatial network embedding | 3
Meta-path based proximity learning in heterogeneous information networks | 3
Intersectional fair ranking via subgroup divergence | 3
Counterfactual inference with latent variable and its application in mental health care | 3
Robust regression via error tolerance | 3
An attention matrix for every decision: faithfulness-based arbitration among multiple attention-based interpretations of transformers in text classification | 3
Using differential evolution for an attribute-weighted inverted specific-class distance measure for nominal attributes | 3
VEM²L: an easy but effective framework for fusing text and structure knowledge on sparse knowledge graph completion | 3
Bayesian network Motifs for reasoning over heterogeneous unlinked datasets | 3
quant: a minimalist interval method for time series classification | 3
INK: knowledge graph embeddings for node classification | 3
Isolation kernel: the X factor in efficient and effective large scale online kernel learning | 3
Making individually fair predictions with causal pathways | 3
ConvMOS: climate model output statistics with deep learning | 3
Dynamic cyber risk estimation with competitive quantile autoregression | 3
Residual projection for quantile regression in vertically partitioned big data | 3
HARPA: hierarchical attention with relation paths for knowledge graph embedding adversarial learning | 3
Intention enhanced mixed attentive model for session-based recommendation | 3
Exploring the diverse world of SAX-based methodologies | 3
Improving embedded knowledge graph multi-hop question answering by introducing relational chain reasoning | 2
The minimum description length principle for pattern mining: a survey | 2
Implicit consensus clustering from multiple graphs | 2
Improving graph-based recommendation with unraveled graph learning | 2
Preventing deception with explanation methods using focused sampling | 2
MMA: metadata supported multi-variate attention for onset detection and prediction | 2
ContE: contextualized knowledge graph embedding for circular relations | 2
Effective signal reconstruction from multiple ranked lists via convex optimization | 2
Matrix sketching for supervised classification with imbalanced classes | 2
Mining sequences with exceptional transition behaviour of varying order using quality measures based on information-theoretic scoring functions | 2
OEC: an online ensemble classifier for mining data streams with noisy labels | 2
Individualized passenger travel pattern multi-clustering based on graph regularized tensor latent dirichlet allocation | 2
A comparative study of methods for estimating model-agnostic Shapley value explanations | 2
MultiRocket: multiple pooling operators and transformations for fast and effective time series classification | 2
Detach-ROCKET: sequential feature selection for time series classification with random convolutional kernels | 2
On computing exact means of time series using the move-split-merge metric | 2
MASS: distance profile of a query over a time series | 2
Forming coordinated teams that balance task coverage and expert workload | 2
POI recommendation with queuing time and user interest awareness | 2
Regression tree-based active learning | 2
Time series clustering with random convolutional kernels | 2
Can local explanation techniques explain linear additive models? | 2
Wisdom of the contexts: active ensemble learning for contextual anomaly detection | 2
Fake review detection on online E-commerce platforms: a systematic literature review | 2
Relational Learning Analysis of Social Politics using Knowledge Graph Embedding | 2
Syntheval: a framework for detailed utility and privacy evaluation of tabular synthetic data | 2
A Lagrangian-based score for assessing the quality of pairwise constraints in semi-supervised clustering | 2
Efficient outlier detection in numerical and categorical data | 2
Forecast evaluation for data scientists: common pitfalls and best practices | 2
Widening: using parallel resources to improve model quality | 2
A comparative evaluation of clustering-based outlier detection | 2
XEM: An explainable-by-design ensemble method for multivariate time series classification | 2
Towards more sustainable and trustworthy reporting in machine learning | 2
A practical approach to novel class discovery in tabular data | 2