Statistical Analysis and Data Mining

Papers
(The median citation count of Statistical Analysis and Data Mining is 0. The table below lists those papers that are above that threshold based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-05-01 to 2025-05-01.)
Article | Citations
[untitled] | 415
Predictive models with end user preference | 26
Evaluating causal‐based feature selection for fuel property prediction models | 15
A practical extension of the recursive multi‐fidelity model for the emulation of hole closure experiments | 13
Data‐driven sparse partial least squares | 12
Sample selection bias in evaluation of prediction performance of causal models | 12
Semi‐Parametric Least‐Area Linear‐Circular Regression Through Möbius Transformation | 12
Modeling and inference for mixtures of simple symmetric exponential families of ‐dimensional distributions for vectors with binary coordinates | 12
Survival trees based on heterogeneity in time‐to‐event and censoring distributions using parameter instability test | 11
Randomized multiarm bandits: An improved adaptive data collection method | 11
CLADAG 2021 special issue: Selected papers on classification and data analysis | 9
[untitled] | 9
Kernel learning with nonconvex ramp loss | 8
Some Bayesian biclustering methods: Modeling and inference | 8
Issue Information | 7
BayesMultiomics: An R Package for Bayesian Shrinkage Models for Integration and Analysis of Multi‐Platform High‐Dimensional Genomics Data | 7
Model Averaging for Regression Kink Models | 7
Data Twinning | 7
Optimal ratio for data splitting | 6
An efficient k‐modes algorithm for clustering categorical datasets | 5
Negative binomial graphical model with excess zeros | 5
Bayesian shrinkage models for integration and analysis of multiplatform high‐dimensional genomics data | 5
[untitled] | 5
Weighted AutoEncoding recommender system | 5
Tracking clusters and anomalies in evolving data streams | 4
Multi‐node Expectation–Maximization algorithm for finite mixture models | 3
On difference‐based gradient estimation in nonparametric regression | 3
Model‐Based Recursive Partitioning for Discrete Event Times | 3
[untitled] | 3
Bayesian modeling of location, scale, and shape parameters in skew‐normal regression models | 3
Input‐response space‐filling designs incorporating response uncertainty | 3
Integrative learning of structured high‐dimensional data from multiple datasets | 3
Estimating basis functions in massive fields under the spatial mixed effects model | 3
Comparison of merging strategies for building machine learning models on multiple independent gene expression data sets | 3
Bayesian inference for nonprobability samples with nonignorable missingness | 3
An Improved D2GAN‐based oversampling algorithm for imbalanced data classification | 3
Issue Information | 3
Robust deep neural network surrogate models with uncertainty quantification via adversarial training | 3
A tree‐based gene–environment interaction analysis with rare features | 3
Multivariate contaminated normal mixture regression modeling of longitudinal data based on joint mean‐covariance model | 3
A finely tuned deep transfer learning algorithm to compare outsole images | 3
[untitled] | 2
Adversarially robust subspace learning in the spiked covariance model | 2
eRPCA: Robust Principal Component Analysis for Exponential Family Distributions | 2
Issue Information | 2
Nonparametric clustering of RNA‐sequencing data | 2
Driving mode analysis—How uncertain functional inputs propagate to an output | 2
Biclustering high‐frequency financial time series based on information theory | 2
A Novel Approach for APT Detection Based on Ensemble Learning Model | 2
Sparse Bayesian variable selection in high‐dimensional logistic regression models with correlated priors | 2
A new formulation of sparse multiple kernel k‐means clustering and its applications | 2
Local influence analysis for the sliced average third‐moment estimation | 2
Issue Information | 2
Development and validation of models for two‐week mortality of inpatients with COVID‐19 infection: A large prospective cohort study | 2
Cost‐sensitive classification with time constraint on incomplete data | 2
Issue Information | 2
Sketched Stochastic Dictionary Learning for large‐scale data and application to high‐throughput mass spectrometry | 2
The analysis of association rules: Latent class analysis | 2
The fairness‐accuracy Pareto front | 2
Interaction Tests With Covariate‐Adaptive Randomization | 2
Bayesian Hybrid Model Search and Averaging for Sparse Gaussian Process Regression | 2
Confidence bounds for threshold similarity graph in random variable network | 1
A study of the impact of COVID‐19 on the Chinese stock market based on a new textual multiple ARMA model | 1
Efficient importance sampling imputation algorithms for quantile and composite quantile regression | 1
Coupled support tensor machine classification for multimodal neuroimaging data | 1
A new parametric approach to gender gap with application to EUSILC data in Poland and Italy | 1
Convolutional Sparse Coding for Time Series Via a ℓ0 Penalty: An Efficient Algorithm With Statistical Guarantees | 1
A Conversational Assistant for Democratization of Data Visualization: A Comparative Study of Two Approaches of Interaction | 1
High‐dimensional classification based on nonparametric maximum likelihood estimation under unknown and inhomogeneous variances | 1
Data‐driven stochastic model for quantifying the interplay between amyloid‐beta and calcium levels in Alzheimer's disease | 1
Corrigendum | 1
Semiparametric detection of changepoints in location, scale, and copula | 1
Quantifying Epistemic Uncertainty in Binary Classification via Accuracy Gain | 1
Weighted validation of heteroscedastic regression models for better selection | 1
Cluster analysis via random partition distributions | 1
A family of mixture models for biclustering | 1
[untitled] | 1
An automated alignment algorithm for identification of the source of footwear impressions with common class characteristics | 1
[untitled] | 1
Semiparametric estimation of average treatment effects in observational studies | 1
Bayesian Posterior Interval Calibration to Improve the Interpretability of Observational Studies | 1
Density estimation via measure transport: Outlook for applications in the biological sciences | 1
Estimation of disease progression for ischemic heart disease using latent Markov with covariates | 1
Gaussian process selections in semiparametric multi‐kernel machine regression for multi‐pathway analysis | 1
Bayesian batch optimization for molybdenum versus tungsten inertial confinement fusion double shell target design | 1
Multi‐scale affinities with missing data: Estimation and applications | 1
[untitled] | 1
Modeling matrix variate time series via hidden Markov models with skewed emissions | 1
Issue Information | 1
[untitled] | 1
A Homogeneity Test for Ordinal Receiver Operating Characteristic Regression With Application to Facial Recognition Accuracy Assessment | 1
Simplicial depth and its median: Selected properties and limitations | 1
[untitled] | 1
Weighted pivot coordinates for partial least squares‐based marker discovery in high‐throughput compositional data | 1
Regrouped design in privacy analysis for multinomial microdata | 1
A deep learning factor analysis model based on importance‐weighted variational inference and normalizing flow priors: Evaluation within a set of multidimensional performance assessments in youth elite | 1
[untitled] | 0
Online Variable Selection and Parameter Estimation for Massive Data via Square Root Lasso | 0
Application of nonparametric quantifiers for online handwritten signature verification: A statistical learning approach | 0
Sequential metamodel‐based approaches to level‐set estimation under heteroscedasticity | 0
Modal linear regression models with multiplicative distortion measurement errors | 0
An Efficient Filtering Approach for Model Estimation in Sparse Regression | 0
A deep learning approach for the comparison of handwritten documents using latent feature vectors | 0
Semi‐supervised multi‐label learning with missing labels by exploiting feature‐label correlations | 0
Application of the Cox proportional hazards model and competing risks models to critical illness insurance data | 0
Nonparametric Expectile Regression Meets Deep Neural Networks: A Robust Nonlinear Variable Selection method | 0
Model selection with bootstrap validation | 0
Issue Information | 0
Online learning for streaming data classification in nonstationary environments | 0
Sequence Outlier Detection and Application of Gated Recurrent Unit Autoencoder Gaussian Mixture Model Based on Various Loss Optimization | 0
Individualized image region detection with total variation | 0
[untitled] | 0
On an Empirical Likelihood Based Solution to the Approximate Bayesian Computation Problem | 0
Issue Information | 0
A new logarithmic multiplicative distortion for correlation analysis | 0
Noise‐Augmented ℓ0 Regularization of Tensor Regression With Tucker Decomposition | 0
Marginal clustered multistate models for longitudinal progressive processes with informative cluster size | 0
Feature selection for imbalanced data with deep sparse autoencoders ensemble | 0
Ensemble learning for score likelihood ratios under the common source problem | 0
Stratified learning: A general‐purpose statistical method for improved learning under covariate shift | 0
Greenwood Statistic Under Distortion Measurement Errors | 0
Towards accelerating particle‐resolved direct numerical simulation with neural operators | 0
Specifying composites in structural equation modeling: A refinement of the Henseler–Ogasawara specification | 0
Smart data augmentation: One equation is all you need | 0
[untitled] | 0
Node Centrality Inference via Hypothesis Testing | 0
Factor analysis of mixed data for anomaly detection | 0
Issue Information | 0
Frequentist model averaging for zero‐inflated Poisson regression models | 0
On Algorithms and Approximations for Progressively Type‐I Censoring Schemes | 0
Issue Information | 0
A fast and efficient Modal EM algorithm for Gaussian mixtures | 0
Issue Information | 0
CLADAG 2019 Special Issue: Selected Papers on Classification and Data Analysis | 0
Ensembled sparse‐input hierarchical networks for high‐dimensional datasets | 0
Neural‐network transformation models for counting processes | 0
A modified least angle regression algorithm for interaction selection with heredity | 0
Identifying Nuclear Data Correlated Through Predicting Bias in Integral Experiments via Applying Principal Component Analysis to Random Forest | 0
Penalized composite likelihood for colored graphical Gaussian models | 0
Robustifying Marginal Linear Models for Correlated Responses Using a Constructive Multivariate Huber Distribution | 0
Issue Information | 0
Out‐of‐bag stability estimation for k‐means clustering | 0
Share density‐based clustering of income data | 0
Compositional variable selection in quantile regression for microbiome data with false discovery rate control | 0
Lq regularization for fair artificial intelligence robust to covariate shift | 0
Regression‐based Bayesian estimation and structure learning for nonparanormal graphical models | 0
Study of a bounded interval perks distribution with quantile regression analysis | 0
Issue Information | 0
Measure inducing classification and regression trees for functional data | 0
Nonparametric mean and variance adaptive classification rule for high‐dimensional data with heteroscedastic variances | 0
A random forest approach for interval selection in functional regression | 0
An Adaptive Microbiome‐Based Truncated Test | 0
Rarity updated ensemble with oversampling: An ensemble approach to classification of imbalanced data streams | 0
Imbalanced classification: A paradigm‐based review | 0
Adaptive batching for Gaussian process surrogates with application in noisy level set estimation | 0
Randomized algorithms for tensor response regression | 0
Machine learning and neural network based model predictions of soybean export shares from US Gulf to China | 0
Error‐controlled feature selection for ultrahigh‐dimensional and highly correlated feature space using deep learning | 0
[untitled] | 0
[untitled] | 0
[untitled] | 0
[untitled] | 0
Imputed quantile vector autoregressive model for multivariate spatial–temporal data | 0
Online embedding and clustering of evolving data streams | 0
Nonparametric Bayesian functional clustering with applications to racial disparities in breast cancer | 0
Local support vector machine based dimension reduction | 0
Factor analysis for high‐dimensional time series: Consistent estimation and efficient computation | 0
Issue Information | 0
Conformal Multi‐Target Hyperrectangles | 0
Categorical classifiers in multiclass classification with imbalanced datasets | 0
Robust multitask learning in high dimensions under memory constraint | 0
Using Neural Networks to Identify Mixture Components in Hyperspectral Reflectance Data | 0
Issue Information | 0
[untitled] | 0
Buckley–James estimation of generalized additive accelerated lifetime model with ultrahigh‐dimensional data | 0
Detection of Unknown Functional Departure in Generalized Functional Regression | 0
A novel Bayesian method for variable selection and estimation in binary quantile regression | 0
The Classification Algorithm Based on Functional Logistic Regression Model With Spatial Effects and Its Application in Air Quality Analysis | 0
Issue Information | 0
Handwriting identification using random forests and score‐based likelihood ratios | 0
Modeling subpopulations for hierarchically structured data | 0
Coefficient tree regression for generalized linear models | 0
Score Tests for Overdispersion in Marginalized Zero‐Inflated Poisson Regression Based on Marginalized Zero‐Inflated Generalized Poisson Model | 0
Evaluation and interpretation of driving risks: Automobile claim frequency modeling with telematics data | 0
Precision aggregated local models | 0
The finite mixture model for the tails of distribution: Monte Carlo experiment and empirical applications | 0
Transfer learning under the Cox model with interval‐censored data | 0
Considerations in Bayesian agent‐based modeling for the analysis of COVID‐19 data | 0
Revisiting Winnow: A modified online feature selection algorithm for efficient binary classification | 0
[untitled] | 0
[untitled] | 0
Issue Information | 0
Feature screening of ultrahigh dimensional longitudinal data based on the C‐statistic | 0
Doubly robust estimation for non‐probability samples with modified intertwined probabilistic factors decoupling | 0
Residual's influence index (RINFIN), bad leverage and unmasking in high dimensional L2‐regression | 0
Fourier neural networks as function approximators and differential equation solvers | 0
An auxiliary Part‐of‐Speech tagger for blog and microblog cyber‐slang | 0
Issue Information | 0
A tutorial on generative adversarial networks with application to classification of imbalanced data | 0
Distributed dimension reduction with nearly oracle rate | 0
Bilateral‐Weighted Online Adaptive Isolation Forest for anomaly detection in streaming data | 0
Markov chain to analyze web usability of a university website using eye tracking data | 0
Boosting diversity in regression ensembles | 0
Issue Information | 0
Hub‐aware random walk graph embedding methods for classification | 0
Issue Information | 0
Association rules and decision rules | 0
Prior effective sample size for exponential family distributions with multiple parameters | 0
A linear time method for the detection of collective and point anomalies | 0
[untitled] | 0
Characterizing climate pathways using feature importance on echo state networks | 0
A novel two‐step extrapolation‐insertion risk model based on the Expectile under the Pareto‐type distribution | 0
Assessment of the real‐time pattern recognition capability of machine learning algorithms | 0
Analyzing relevance vector machines using a single penalty approach | 0
Residuals and diagnostics for multinomial regression models | 0
A machine learning oracle for parameter estimation | 0
Spatially‐correlated time series clustering using location‐dependent Dirichlet process mixture model | 0
Neural interval‐censored survival regression with feature selection | 0
[untitled] | 0
Adaptive boosting for ordinal target variables using neural networks | 0
Non‐uniform active learning for Gaussian process models with applications to trajectory informed aerodynamic databases | 0
The generalized hyperbolic family and automatic model selection through the multiple‐choice LASSO | 0
Power grid frequency prediction using spatiotemporal modeling | 0
A general iterative clustering algorithm | 0
Parallel coordinate order for high‐dimensional data | 0
A neutral zone classifier for three classes with an application to text mining | 0
Subsampling under distributional constraints | 0
A treeless absolutely random forest with closed‐form estimators of expected proximities | 0
[untitled] | 0
Hierarchy‐assisted gene expression regulatory network analysis | 0
Expert‐in‐the‐loop design of integral nuclear data experiments | 0
Issue Information | 0
Issue Information | 0
BOSTONPUPA: A Bayesian Online Spatio‐Temporal Outbreak Detection Framework With Prior Updating and p‐Value Adaptation | 0
Simplicial depth: Characterization and reconstruction | 0
Multivariate Gaussian RBF‐net for smooth function estimation and variable selection | 0
Traditional kriging versus modern Gaussian processes for large‐scale mining data | 0
[untitled] | 0
Intuitively adaptable outlier detector | 0
Portability analysis of data mining models for fog events forecasting | 0
Issue Information | 0
Issue Information | 0
Triangulation‐Based Spatial Clustering for Adjacent Data With Heterogeneous Density | 0
Two‐sample testing for random graphs | 0
Bayesian relative composite quantile regression approach of ordinal latent regression model with L1/2 regularization | 0
Trees, forests, chickens, and eggs: when and why to prune trees in a random forest | 0
Persistent Classification: Understanding Adversarial Attacks by Studying Decision Boundary Dynamics | 0
Bag of little bootstraps for massive and distributed longitudinal data | 0