Statistical Analysis and Data Mining

Papers
(The median citation count of Statistical Analysis and Data Mining is 0. The table below lists those papers that are above that threshold based on CrossRef citation counts [max. 250 papers]. The publications cover those that have been published in the past four years, i.e., from 2020-11-01 to 2024-11-01.)
Article Citations
Optimal ratio for data splitting292
Generalized mixed‐effects random forest: A flexible approach to predict university student dropout28
Unsupervised random forests22
Supervised compression of big data18
Imbalanced classification: A paradigm‐based review18
Modal linear regression models with multiplicative distortion measurement errors15
Data Twinning13
An efficient k‐modes algorithm for clustering categorical datasets12
A linear time method for the detection of collective and point anomalies12
Fourier neural networks as function approximators and differential equation solvers11
The fairness‐accuracy Pareto front9
Exponential calibration for correlation coefficient with additive distortion measurement errors9
Measure inducing classification and regression trees for functional data9
A comparison of Gaussian processes and neural networks for computer model emulation and calibration8
Markov chain to analyze web usability of a university website using eye tracking data8
Weighted pivot coordinates for partial least squares‐based marker discovery in high‐throughput compositional data8
Power grid frequency prediction using spatiotemporal modeling7
A tutorial on generative adversarial networks with application to classification of imbalanced data7
Handwriting identification using random forests and score‐based likelihood ratios7
Visual diagnostics of an explainer model: Tools for the assessment of LIME explanations7
A clustering method for graphical handwriting components and statistical writership analysis7
Trees, forests, chickens, and eggs: when and why to prune trees in a random forest6
Survival trees based on heterogeneity in time‐to‐event and censoring distributions using parameter instability test6
A framework for stability‐based module detection in correlation graphs6
Specifying composites in structural equation modeling: A refinement of the Henseler–Ogasawara specification6
An adaptive nonparametric exponentially weighted moving average control chart with dynamic sampling intervals6
Feature selection for imbalanced data with deep sparse autoencoders ensemble5
A fast and efficient Modal EM algorithm for Gaussian mixtures5
Extreme ensemble of extreme learning machines5
Tracking clusters and anomalies in evolving data streams5
Traditional kriging versus modern Gaussian processes for large‐scale mining data4
Parallel coordinate order for high‐dimensional data4
A study of the impact of COVID‐19 on the Chinese stock market based on a new textual multiple ARMA model4
Precision aggregated local models4
Sample selection bias in evaluation of prediction performance of causal models3
Online embedding and clustering of evolving data streams3
Learning compact physics‐aware delayed photocurrent models using dynamic mode decomposition3
An approach to characterizing spatial aspects of image system blur3
Buckley–James estimation of generalized additive accelerated lifetime model with ultrahigh‐dimensional data3
A general iterative clustering algorithm3
Coefficient tree regression for generalized linear models3
Intuitively adaptable outlier detector3
Factor analysis of mixed data for anomaly detection3
Machine learning and neural network based model predictions of soybean export shares from US Gulf to China3
Adaptive batching for Gaussian process surrogates with application in noisy level set estimation3
A tree‐based gene–environment interaction analysis with rare features3
Ensembled sparse‐input hierarchical networks for high‐dimensional datasets3
Model‐based clustering of time‐dependent categorical sequences with application to the analysis of major life event patterns3
Simplicial depth and its median: Selected properties and limitations2
Comparison of merging strategies for building machine learning models on multiple independent gene expression data sets2
Frequentist model averaging for zero‐inflated Poisson regression models2
Negative binomial graphical model with excess zeros2
Coupled support tensor machine classification for multimodal neuroimaging data2
Cluster analysis via random partition distributions2
Evaluating causal‐based feature selection for fuel property prediction models2
Penalized composite likelihood for colored graphical Gaussian models2
Local support vector machine based dimension reduction2
Residuals and diagnostics for multinomial regression models2
A family of mixture models for biclustering2
Sketched Stochastic Dictionary Learning for large‐scale data and application to high‐throughput mass spectrometry2
Estimation of disease progression for ischemic heart disease using latent Markov with covariates2
Ensemble learning for score likelihood ratios under the common source problem2
Emulated order identification for models of big time series data2
Comparison of machine learning approaches used to identify the drivers of Bakken oil well productivity2
Next waves in veridical network embedding*2
Bilateral‐Weighted Online Adaptive Isolation Forest for anomaly detection in streaming data2
Data‐driven sparse partial least squares2
Stratified learning: A general‐purpose statistical method for improved learning under covariate shift1
A note on marginal correlation based screening1
Understanding the merits of winning data competition solutions for varied sets of objectives1
Multi‐node Expectation–Maximization algorithm for finite mixture models1
Out‐of‐bag stability estimation for k‐means clustering1
Learning network event sequences using long short‐term memory and second‐order statistic loss1
Portability analysis of data mining models for fog events forecasting1
A new parametric approach to gender gap with application to EUSILC data in Poland and Italy1
Adaptive boosting for ordinal target variables using neural networks1
Analyzing relevance vector machines using a single penalty approach1
Confidence bounds for threshold similarity graph in random variable network1
A novel Bayesian method for variable selection and estimation in binary quantile regression1
On difference‐based gradient estimation in nonparametric regression1
Semi‐supervised multi‐label learning with missing labels by exploiting feature‐label correlations1
1
Considerations in Bayesian agent‐based modeling for the analysis of COVID‐19 data1
1
Kernel learning with nonconvex ramp loss1
High‐dimensional classification based on nonparametric maximum likelihood estimation under unknown and inhomogeneous variances1
Predictive models with end user preference1
A practical extension of the recursive multi‐fidelity model for the emulation of hole closure experiments1
A random forest approach for interval selection in functional regression1
Boosting diversity in regression ensembles1
Model selection with bootstrap validation1
1
Expert‐in‐the‐loop design of integral nuclear data experiments1
Weighted AutoEncoding recommender system1
1
Development and validation of models for two‐week mortality of inpatients with COVID‐19 infection: A large prospective cohort study1
Issue Information1
Factor analysis for high‐dimensional time series: Consistent estimation and efficient computation1
Nonparametric clustering of RNA‐sequencing data1
Conformal Multi‐Target Hyperrectangles0
Randomized multiarm bandits: An improved adaptive data collection method0
Multivariate contaminated normal mixture regression modeling of longitudinal data based on joint mean‐covariance model0
A neutral zone classifier for three classes with an application to text mining0
Issue Information0
Simplicial depth: Characterization and reconstruction0
0
0
Application of the Cox proportional hazards model and competing risks models to critical illness insurance data0
Spatially‐correlated time series clustering using location‐dependent Dirichlet process mixture model0
Modeling matrix variate time series via hidden Markov models with skewed emissions0
Issue Information0
Gaussian process selections in semiparametric multi‐kernel machine regression for multi‐pathway analysis0
Robust deep neural network surrogate models with uncertainty quantification via adversarial training0
Evaluation and interpretation of driving risks: Automobile claim frequency modeling with telematics data0
Bayesian relative composite quantile regression approach of ordinal latent regression model with L1/2 regularization0
Individualized image region detection with total variation0
Neural interval‐censored survival regression with feature selection0
Issue Information0
Two‐sample testing for random graphs0
Issue Information0
Issue Information0
A finely tuned deep transfer learning algorithm to compare outsole images0
Study of a bounded interval perks distribution with quantile regression analysis0
Issue Information0
Rarity updated ensemble with oversampling: An ensemble approach to classification of imbalanced data streams0
Doubly robust estimation for non‐probability samples with modified intertwined probabilistic factors decoupling0
Estimating basis functions in massive fields under the spatial mixed effects model0
Driving mode analysis—How uncertain functional inputs propagate to an output0
Issue Information0
Issue Information0
Issue Information0
Online learning for streaming data classification in nonstationary environments0
Semiparametric estimation of average treatment effects in observational studies0
Issue Information0
Categorical classifiers in multiclass classification with imbalanced datasets0
Efficient importance sampling imputation algorithms for quantile and composite quantile regression0
Issue Information0
Bayesian inference for nonprobability samples with nonignorable missingness0
0
A modified least angle regression algorithm for interaction selection with heredity0
Association rules and decision rules0
0
Hub‐aware random walk graph embedding methods for classification0
Sequential metamodel‐based approaches to level‐set estimation under heteroscedasticity0
Residual's influence index (RINFIN), bad leverage and unmasking in high dimensional L2‐regression0
The generalized hyperbolic family and automatic model selection through the multiple‐choice LASSO0
Issue Information0
Hierarchy‐assisted gene expression regulatory network analysis0
Multivariate Gaussian RBF‐net for smooth function estimation and variable selection0
Towards accelerating particle‐resolved direct numerical simulation with neural operators0
Smart data augmentation: One equation is all you need0
An Improved D2GAN‐based oversampling algorithm for imbalanced data classification0
Biclustering high‐frequency financial time series based on information theory0
0
A new formulation of sparse multiple kernel k‐means clustering and its applications0
Data‐driven stochastic model for quantifying the interplay between amyloid‐beta and calcium levels in Alzheimer's disease0
Compositional variable selection in quantile regression for microbiome data with false discovery rate control0
Issue Information0
Corrigendum0
Issue Information0
An auxiliary Part‐of‐Speech tagger for blog and microblog cyber‐slang0
Regression‐based Bayesian estimation and structure learning for nonparanormal graphical models0
The analysis of association rules: Latent class analysis0
A machine learning oracle for parameter estimation0
Issue Information0
Lq regularization for fair artificial intelligence robust to covariate shift0
Non‐uniform active learning for Gaussian process models with applications to trajectory informed aerodynamic databases0
Regrouped design in privacy analysis for multinomial microdata0
0
0
0
Feature screening of ultrahigh dimensional longitudinal data based on the C‐statistic0
Issue Information0
An Efficient Filtering Approach for Model Estimation in Sparse Regression0
A novel two‐step extrapolation‐insertion risk model based on the Expectile under the Pareto‐type distribution0
Error‐controlled feature selection for ultrahigh‐dimensional and highly correlated feature space using deep learning0
Sparse Bayesian variable selection in high‐dimensional logistic regression models with correlated priors0
A deep learning factor analysis model based on importance‐weighted variational inference and normalizing flow priors: Evaluation within a set of multidimensional performance assessments in youth elite0
Randomized algorithms for tensor response regression0
Some Bayesian biclustering methods: Modeling and inference0
0
Prior effective sample size for exponential family distributions with multiple parameters0
0
Characterizing climate pathways using feature importance on echo state networks0
Robust multitask learning in high dimensions under memory constraint0
Bag of little bootstraps for massive and distributed longitudinal data0
Modeling subpopulations for hierarchically structured data0
0
Issue Information0
A deep learning approach for the comparison of handwritten documents using latent feature vectors0
Approximation error of Fourier neural networks0
Distributed dimension reduction with nearly oracle rate0
A network model that combines latent factors and sparse graphs0
Modeling and inference for mixtures of simple symmetric exponential families of ‐dimensional distributions for vectors with binary coordinates0
Application of nonparametric quantifiers for online handwritten signature verification: A statistical learning approach0
Nonparametric mean and variance adaptive classification rule for high‐dimensional data with heteroscedastic variances0
An initial exploration of Bayesian model calibration for estimating the composition of rocks and soils on Mars0
Bayesian modeling of location, scale, and shape parameters in skew‐normal regression models0
Marginal clustered multistate models for longitudinal progressive processes with informative cluster size0
0
Cost‐sensitive classification with time constraint on incomplete data0
0
Node Centrality Inference via Hypothesis Testing0
Erratum to “Data‐driven dimension reduction in functional principal component analysis identifying the change‐point in functional data”0
Issue Information0
Weighted validation of heteroscedastic regression models for better selection0
Input‐response space‐filling designs incorporating response uncertainty0
Bayesian shrinkage models for integration and analysis of multiplatform high‐dimensional genomics data0
CLADAG 2019 Special Issue: Selected Papers on Classification and Data Analysis0
Local influence analysis for the sliced average third‐moment estimation0
Transfer learning under the Cox model with interval‐censored data0
Adversarially robust subspace learning in the spiked covariance model0
Issue Information0
Assessment of the real‐time pattern recognition capability of machine learning algorithms0
On an Empirical Likelihood Based Solution to the Approximate Bayesian Computation Problem0
Issue Information0
Issue Information0
The finite mixture model for the tails of distribution: Monte Carlo experiment and empirical applications0
0
0
Revisiting Winnow: A modified online feature selection algorithm for efficient binary classification0
Subsampling from features in large regression to find “winning features”0
Share density‐based clustering of income data0
Subsampling under distributional constraints0
Quantifying Epistemic Uncertainty in Binary Classification via Accuracy Gain0
Semiparametric detection of changepoints in location, scale, and copula0
CLADAG 2021 special issue: Selected papers on classification and data analysis0
Imputed quantile vector autoregressive model for multivariate spatial–temporal data0
0
Density estimation via measure transport: Outlook for applications in the biological sciences0
0
eRPCA: Robust Principal Component Analysis for Exponential Family Distributions0
Bayesian batch optimization for molybdenum versus tungsten inertial confinement fusion double shell target design0
An automated alignment algorithm for identification of the source of footwear impressions with common class characteristics0
A new logarithmic multiplicative distortion for correlation analysis0
Nonparametric Bayesian functional clustering with applications to racial disparities in breast cancer0
Issue Information0
Issue Information0
Multi‐scale affinities with missing data: Estimation and applications0
0
A treeless absolutely random forest with closed‐form estimators of expected proximities0
Neural‐network transformation models for counting processes0
Integrative learning of structured high‐dimensional data from multiple datasets0
Retracted: Multi‐model penalized regression0
0
0
Issue Information0
Identifying build orientation of 3D‐printed materials using convolutional neural networks0