Statistical Analysis and Data Mining

Papers
(The median citation count of Statistical Analysis and Data Mining is 0. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-04-01 to 2024-04-01.)
Article | Citations
Optimal ratio for data splitting | 173
GRATIS: GeneRAting TIme Series with diverse and controllable characteristics | 61
Generalized mixed‐effects random forest: A flexible approach to predict university student dropout | 23
Unsupervised random forests | 17
Supervised compression of big data | 17
Multiclass machine learning classification of functional brain images for Parkinson's disease stage prediction | 16
Modal linear regression models with multiplicative distortion measurement errors | 13
Imbalanced classification: A paradigm‐based review | 13
Data Twinning | 12
Fourier neural networks as function approximators and differential equation solvers | 10
A linear time method for the detection of collective and point anomalies | 10
Two‐stage hybrid learning techniques for bankruptcy prediction | 10
Exponential calibration for correlation coefficient with additive distortion measurement errors | 9
Delaunay triangulation‐based spatial colocation pattern mining without distance thresholds | 9
An efficient k‐modes algorithm for clustering categorical datasets | 8
Multivariate Hidden Markov Models for disease progression | 8
Weighted k‐nearest neighbor based data complexity metrics for imbalanced datasets | 8
A comparison of Gaussian processes and neural networks for computer model emulation and calibration | 8
Measure inducing classification and regression trees for functional data | 7
Weighted pivot coordinates for partial least squares‐based marker discovery in high‐throughput compositional data | 7
A clustering method for graphical handwriting components and statistical writership analysis | 7
Power grid frequency prediction using spatiotemporal modeling | 7
Handwriting identification using random forests and score‐based likelihood ratios | 6
The fairness‐accuracy Pareto front | 6
A machine learning method for selection of genetic variants to increase prediction accuracy of type 2 diabetes mellitus using sequencing data | 5
A framework for stability‐based module detection in correlation graphs | 5
Visual diagnostics of an explainer model: Tools for the assessment of LIME explanations | 5
Use of data mining in a two‐step process of profiling student preferences in relation to the enhancement of English as a foreign language teaching | 5
A tutorial on generative adversarial networks with application to classification of imbalanced data | 5
Markov chain to analyze web usability of a university website using eye tracking data | 5
An analytical toast to wine: Using stacked generalization to predict wine preference | 5
An adaptive nonparametric exponentially weighted moving average control chart with dynamic sampling intervals | 5
MR plot: A big data tool for distinguishing distributions | 4
Feature selection for imbalanced data with deep sparse autoencoders ensemble | 4
The next wave: We will all be data scientists | 4
Precision aggregated local models | 4
Tracking clusters and anomalies in evolving data streams | 4
Trees, forests, chickens, and eggs: when and why to prune trees in a random forest | 4
Extreme ensemble of extreme learning machines | 4
A fast and efficient Modal EM algorithm for Gaussian mixtures | 4
Parallel coordinate order for high‐dimensional data | 3
Intuitively adaptable outlier detector | 3
Knot selection in sparse Gaussian processes with a variational objective function | 3
Survival trees based on heterogeneity in time‐to‐event and censoring distributions using parameter instability test | 3
Learning compact physics‐aware delayed photocurrent models using dynamic mode decomposition | 3
Traditional kriging versus modern Gaussian processes for large‐scale mining data | 3
Factor analysis of mixed data for anomaly detection | 3
Specifying composites in structural equation modeling: A refinement of the Henseler–Ogasawara specification | 3
Ensembled sparse‐input hierarchical networks for high‐dimensional datasets | 3
A tree‐based gene–environment interaction analysis with rare features | 3
An approach to characterizing spatial aspects of image system blur | 3
Model‐based clustering of time‐dependent categorical sequences with application to the analysis of major life event patterns | 3
Classification of high‐dimensional electroencephalography data with location selection using structured spike‐and‐slab prior | 2
Online embedding and clustering of evolving data streams | 2
Frequentist model averaging for zero‐inflated Poisson regression models | 2
Negative binomial graphical model with excess zeros | 2
Sketched Stochastic Dictionary Learning for large‐scale data and application to high‐throughput mass spectrometry | 2
Evaluating causal‐based feature selection for fuel property prediction models | 2
Penalized composite likelihood for colored graphical Gaussian models | 2
Adaptive batching for Gaussian process surrogates with application in noisy level set estimation | 2
The future of precision health is data‐driven decision support | 2
Coefficient tree regression for generalized linear models | 2
SURE estimates for high dimensional classification | 2
Cluster analysis via random partition distributions | 2
Clover plot: Versatile visualization in nonparametric classification | 2
Minimizing information loss in shared data: Hiding frequent patterns with multiple sensitive support thresholds | 2
Emulated order identification for models of big time series data | 2
Comparison of machine learning approaches used to identify the drivers of Bakken oil well productivity | 2
Next waves in veridical network embedding | 2
Sample selection bias in evaluation of prediction performance of causal models | 2
[title missing] | 1
Buckley–James estimation of generalized additive accelerated lifetime model with ultrahigh‐dimensional data | 1
Factor analysis for high‐dimensional time series: Consistent estimation and efficient computation | 1
Predictive models with end user preference | 1
Boosting diversity in regression ensembles | 1
A practical extension of the recursive multi‐fidelity model for the emulation of hole closure experiments | 1
Simplicial depth and its median: Selected properties and limitations | 1
Local support vector machine based dimension reduction | 1
Comparison of merging strategies for building machine learning models on multiple independent gene expression data sets | 1
Scalable network estimation with L0 penalty | 1
Weighted AutoEncoding recommender system | 1
Coupled support tensor machine classification for multimodal neuroimaging data | 1
Data‐driven dimension reduction in functional principal component analysis identifying the change‐point in functional data | 1
Kernel learning with nonconvex ramp loss | 1
Bilateral‐Weighted Online Adaptive Isolation Forest for anomaly detection in streaming data | 1
[title missing] | 1
Estimation of disease progression for ischemic heart disease using latent Markov with covariates | 1
Ensemble learning for score likelihood ratios under the common source problem | 1
Machine learning and neural network based model predictions of soybean export shares from US Gulf to China | 1
Complementary dimension reduction | 1
Objective identification of local spatial structure for material characterization | 1
On difference‐based gradient estimation in nonparametric regression | 1
[title missing] | 1
[title missing] | 1
A family of mixture models for biclustering | 1
A new parametric approach to gender gap with application to EUSILC data in Poland and Italy | 1
Issue Information | 1
Nonparametric clustering of RNA‐sequencing data | 1
Data‐driven sparse partial least squares | 1
Analyzing relevance vector machines using a single penalty approach | 1
A novel Bayesian method for variable selection and estimation in binary quantile regression | 1
Confidence bounds for threshold similarity graph in random variable network | 1
A general iterative clustering algorithm | 1
Understanding the merits of winning data competition solutions for varied sets of objectives | 1
Model selection with bootstrap validation | 1
Multi‐node Expectation–Maximization algorithm for finite mixture models | 1
Learning network event sequences using long short‐term memory and second‐order statistic loss | 1
A study of the impact of COVID‐19 on the Chinese stock market based on a new textual multiple ARMA model | 1
Development and validation of models for two‐week mortality of inpatients with COVID‐19 infection: A large prospective cohort study | 1
A new formulation of sparse multiple kernel k‐means clustering and its applications | 0
Semi‐supervised multi‐label learning with missing labels by exploiting feature‐label correlations | 0
Issue Information | 0
[title missing] | 0
Weighted linear programming discriminant analysis for high‐dimensional binary classification | 0
[title missing] | 0
Feature screening of ultrahigh dimensional longitudinal data based on the C‐statistic | 0
[title missing] | 0
Modeling and inference for mixtures of simple symmetric exponential families of ‐dimensional distributions for vectors with binary coordinates | 0
Issue Information | 0
Composite quantile‐based classifiers | 0
Association rules and decision rules | 0
Issue Information | 0
[title missing] | 0
Modeling matrix variate time series via hidden Markov models with skewed emissions | 0
Online learning for streaming data classification in nonstationary environments | 0
[title missing] | 0
Simplicial depth: Characterization and reconstruction | 0
Neural‐network transformation models for counting processes | 0
[title missing] | 0
Stratified learning: A general‐purpose statistical method for improved learning under covariate shift | 0
The finite mixture model for the tails of distribution: Monte Carlo experiment and empirical applications | 0
An auxiliary Part‐of‐Speech tagger for blog and microblog cyber‐slang | 0
High‐dimensional classification based on nonparametric maximum likelihood estimation under unknown and inhomogeneous variances | 0
Nonparametric Bayesian functional clustering with applications to racial disparities in breast cancer | 0
Biclustering high‐frequency financial time series based on information theory | 0
Issue Information | 0
Doubly robust estimation for non‐probability samples with modified intertwined probabilistic factors decoupling | 0
Residual's influence index (RINFIN), bad leverage and unmasking in high dimensional L2‐regression | 0
Evaluation and interpretation of driving risks: Automobile claim frequency modeling with telematics data | 0
Rarity updated ensemble with oversampling: An ensemble approach to classification of imbalanced data streams | 0
Issue Information | 0
Issue Information | 0
Driving mode analysis—How uncertain functional inputs propagate to an output | 0
Adaptive boosting for ordinal target variables using neural networks | 0
Application of nonparametric quantifiers for online handwritten signature verification: A statistical learning approach | 0
[title missing] | 0
Error‐controlled feature selection for ultrahigh‐dimensional and highly correlated feature space using deep learning | 0
Local influence analysis for the sliced average third‐moment estimation | 0
Input‐response space‐filling designs incorporating response uncertainty | 0
Issue Information | 0
[title missing] | 0
Issue Information | 0
Estimating basis functions in massive fields under the spatial mixed effects model | 0
A modified least angle regression algorithm for interaction selection with heredity | 0
[title missing] | 0
Robust deep neural network surrogate models with uncertainty quantification via adversarial training | 0
Share density‐based clustering of income data | 0
Efficient importance sampling imputation algorithms for quantile and composite quantile regression | 0
[title missing] | 0
Issue Information | 0
Bag of little bootstraps for massive and distributed longitudinal data | 0
Multi‐scale affinities with missing data: Estimation and applications | 0
Application of the Cox proportional hazards model and competing risks models to critical illness insurance data | 0
Issue Information | 0
Distributed dimension reduction with nearly oracle rate | 0
Expert‐in‐the‐loop design of integral nuclear data experiments | 0
A network model that combines latent factors and sparse graphs | 0
Study of a bounded interval perks distribution with quantile regression analysis | 0
Considerations in Bayesian agent‐based modeling for the analysis of COVID‐19 data | 0
Smart data augmentation: One equation is all you need | 0
[title missing] | 0
Categorical classifiers in multiclass classification with imbalanced datasets | 0
Issue Information | 0
eRPCA: Robust Principal Component Analysis for Exponential Family Distributions | 0
Bayesian inference for nonprobability samples with nonignorable missingness | 0
[title missing] | 0
Regrouped design in privacy analysis for multinomial microdata | 0
Hierarchy‐assisted gene expression regulatory network analysis | 0
Modeling subpopulations for hierarchically structured data | 0
Erratum to “Data‐driven dimension reduction in functional principal component analysis identifying the change‐point in functional data” | 0
Issue Information | 0
[title missing] | 0
An initial exploration of Bayesian model calibration for estimating the composition of rocks and soils on Mars | 0
Lq regularization for fair artificial intelligence robust to covariate shift | 0
Sparse Bayesian variable selection in high‐dimensional logistic regression models with correlated priors | 0
[title missing] | 0
[title missing] | 0
Issue Information | 0
Weighted validation of heteroscedastic regression models for better selection | 0
A novel two‐step extrapolation‐insertion risk model based on the Expectile under the Pareto‐type distribution | 0
Issue Information | 0
CLADAG 2021 special issue: Selected papers on classification and data analysis | 0
Issue Information | 0
Spatially‐correlated time series clustering using location‐dependent Dirichlet process mixture model | 0
Integrative learning of structured high‐dimensional data from multiple datasets | 0
[title missing] | 0
Randomized algorithms for tensor response regression | 0
Regression‐based Bayesian estimation and structure learning for nonparanormal graphical models | 0
Issue Information | 0
[title missing] | 0
Issue Information | 0
Issue Information | 0
Portability analysis of data mining models for fog events forecasting | 0
Issue Information | 0
A note on marginal correlation based screening | 0
Issue Information | 0
Subsampling from features in large regression to find “winning features” | 0
[title missing] | 0
An Improved D2GAN‐based oversampling algorithm for imbalanced data classification | 0
Compositional variable selection in quantile regression for microbiome data with false discovery rate control | 0
[title missing] | 0
Retracted: Multi‐model penalized regression | 0
Hub‐aware random walk graph embedding methods for classification | 0
Semiparametric detection of changepoints in location, scale, and copula | 0
Multivariate Gaussian RBF‐net for smooth function estimation and variable selection | 0
The generalized hyperbolic family and automatic model selection through the multiple‐choice LASSO | 0
A machine learning oracle for parameter estimation | 0
Non‐uniform active learning for Gaussian process models with applications to trajectory informed aerodynamic databases | 0
A neutral zone classifier for three classes with an application to text mining | 0
Marginal clustered multistate models for longitudinal progressive processes with informative cluster size | 0
A deep learning approach for the comparison of handwritten documents using latent feature vectors | 0
Adversarially robust subspace learning in the spiked covariance model | 0
[title missing] | 0
[title missing] | 0
Residuals and diagnostics for multinomial regression models | 0
Approximation error of Fourier neural networks | 0
Subsampling under distributional constraints | 0
A deep learning factor analysis model based on importance‐weighted variational inference and normalizing flow priors: Evaluation within a set of multidimensional performance assessments in youth elite | 0
Identifying build orientation of 3D‐printed materials using convolutional neural networks | 0
Out‐of‐bag stability estimation for k‐means clustering | 0
Imputed quantile vector autoregressive model for multivariate spatial–temporal data | 0
A finely tuned deep transfer learning algorithm to compare outsole images | 0
Issue Information | 0
An automated alignment algorithm for identification of the source of footwear impressions with common class characteristics | 0
Bayesian modeling of location, scale, and shape parameters in skew‐normal regression models | 0
Multivariate contaminated normal mixture regression modeling of longitudinal data based on joint mean‐covariance model | 0
CLADAG 2019 Special Issue: Selected Papers on Classification and Data Analysis | 0
Some Bayesian biclustering methods: Modeling and inference | 0
Corrigendum | 0
Issue Information | 0
Issue Information | 0