Statistical Analysis and Data Mining

Papers
(The TQCC of Statistical Analysis and Data Mining is 2. The table below lists the papers that are at or above that threshold based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-11-01 to 2024-11-01.)
Article | Citations
Optimal ratio for data splitting | 292
Generalized mixed‐effects random forest: A flexible approach to predict university student dropout | 28
Unsupervised random forests | 22
Supervised compression of big data | 18
Imbalanced classification: A paradigm‐based review | 18
Modal linear regression models with multiplicative distortion measurement errors | 15
Data Twinning | 13
An efficient k‐modes algorithm for clustering categorical datasets | 12
A linear time method for the detection of collective and point anomalies | 12
Fourier neural networks as function approximators and differential equation solvers | 11
The fairness‐accuracy Pareto front | 9
Exponential calibration for correlation coefficient with additive distortion measurement errors | 9
Measure inducing classification and regression trees for functional data | 9
A comparison of Gaussian processes and neural networks for computer model emulation and calibration | 8
Markov chain to analyze web usability of a university website using eye tracking data | 8
Weighted pivot coordinates for partial least squares‐based marker discovery in high‐throughput compositional data | 8
Power grid frequency prediction using spatiotemporal modeling | 7
A tutorial on generative adversarial networks with application to classification of imbalanced data | 7
Handwriting identification using random forests and score‐based likelihood ratios | 7
Visual diagnostics of an explainer model: Tools for the assessment of LIME explanations | 7
A clustering method for graphical handwriting components and statistical writership analysis | 7
Trees, forests, chickens, and eggs: when and why to prune trees in a random forest | 6
Survival trees based on heterogeneity in time‐to‐event and censoring distributions using parameter instability test | 6
A framework for stability‐based module detection in correlation graphs | 6
Specifying composites in structural equation modeling: A refinement of the Henseler–Ogasawara specification | 6
An adaptive nonparametric exponentially weighted moving average control chart with dynamic sampling intervals | 6
Feature selection for imbalanced data with deep sparse autoencoders ensemble | 5
A fast and efficient Modal EM algorithm for Gaussian mixtures | 5
Extreme ensemble of extreme learning machines | 5
Tracking clusters and anomalies in evolving data streams | 5
Traditional kriging versus modern Gaussian processes for large‐scale mining data | 4
Parallel coordinate order for high‐dimensional data | 4
A study of the impact of COVID‐19 on the Chinese stock market based on a new textual multiple ARMA model | 4
Precision aggregated local models | 4
Sample selection bias in evaluation of prediction performance of causal models | 3
Online embedding and clustering of evolving data streams | 3
Learning compact physics‐aware delayed photocurrent models using dynamic mode decomposition | 3
An approach to characterizing spatial aspects of image system blur | 3
Buckley–James estimation of generalized additive accelerated lifetime model with ultrahigh‐dimensional data | 3
A general iterative clustering algorithm | 3
Coefficient tree regression for generalized linear models | 3
Intuitively adaptable outlier detector | 3
Factor analysis of mixed data for anomaly detection | 3
Machine learning and neural network based model predictions of soybean export shares from US Gulf to China | 3
Adaptive batching for Gaussian process surrogates with application in noisy level set estimation | 3
A tree‐based gene–environment interaction analysis with rare features | 3
Ensembled sparse‐input hierarchical networks for high‐dimensional datasets | 3
Model‐based clustering of time‐dependent categorical sequences with application to the analysis of major life event patterns | 3
Simplicial depth and its median: Selected properties and limitations | 2
Comparison of merging strategies for building machine learning models on multiple independent gene expression data sets | 2
Frequentist model averaging for zero‐inflated Poisson regression models | 2
Negative binomial graphical model with excess zeros | 2
Coupled support tensor machine classification for multimodal neuroimaging data | 2
Cluster analysis via random partition distributions | 2
Evaluating causal‐based feature selection for fuel property prediction models | 2
Penalized composite likelihood for colored graphical Gaussian models | 2
Local support vector machine based dimension reduction | 2
Residuals and diagnostics for multinomial regression models | 2
A family of mixture models for biclustering | 2
Sketched Stochastic Dictionary Learning for large‐scale data and application to high‐throughput mass spectrometry | 2
Estimation of disease progression for ischemic heart disease using latent Markov with covariates | 2
Ensemble learning for score likelihood ratios under the common source problem | 2
Emulated order identification for models of big time series data | 2
Comparison of machine learning approaches used to identify the drivers of Bakken oil well productivity | 2
Next waves in veridical network embedding* | 2
Bilateral‐Weighted Online Adaptive Isolation Forest for anomaly detection in streaming data | 2
Data‐driven sparse partial least squares | 2