Approximation bounds for hierarchical clustering: average linkage, bisecting k-means, and local search
Hierarchical clustering is a data analysis method that has been used for decades. Despite its widespread use, the method has an underdeveloped analytical foundation. Having a well understood foundation would both support the currently used methods and ...
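For orientation, bisecting k-means (one of the methods named in the title) builds a hierarchy top-down by repeatedly splitting a cluster in two. The sketch below is a generic version using scikit-learn's KMeans; the splitting rule (largest cluster first) and all names are illustrative choices, not the paper's algorithms or analysis.

```python
# Generic bisecting k-means: repeatedly split the largest cluster in two.
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X, k, seed=0):
    clusters = [np.arange(len(X))]        # start with one cluster of all points
    while len(clusters) < k:
        # split the cluster with the most points (one common heuristic)
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)
        labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X[members])
        clusters.append(members[labels == 0])
        clusters.append(members[labels == 1])
    return clusters

X = np.random.default_rng(0).normal(size=(200, 5))
print([len(c) for c in bisecting_kmeans(X, 4)])
```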
The Brier score under administrative censoring: problems and a solution
The Brier score is commonly used for evaluating probability predictions. In survival analysis, with right-censored observations of the event times, this score can be weighted by the inverse probability of censoring (IPCW) to retain its original ...
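For orientation, the IPCW-weighted Brier score at a horizon t is conventionally written as below (Graf et al., 1999), where S-hat(t|x_i) is the predicted survival probability, G-hat estimates the censoring survival function, and (T~_i, delta_i) are the observed time and event indicator. The notation is ours; the paper's analysis of when this weighting breaks down under administrative censoring is not reproduced here.

```latex
\mathrm{BS}(t) \;=\; \frac{1}{n}\sum_{i=1}^{n}\left[
  \frac{\hat S(t \mid x_i)^2 \,\mathbf{1}\{\tilde T_i \le t,\ \delta_i = 1\}}{\hat G(\tilde T_i^{-})}
  \;+\;
  \frac{\bigl(1 - \hat S(t \mid x_i)\bigr)^2 \,\mathbf{1}\{\tilde T_i > t\}}{\hat G(t)}
\right]
```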
Bayesian spiked Laplacian graphs
In network analysis, it is common to work with a collection of graphs that exhibit heterogeneity. For example, neuroimaging data from patient cohorts are increasingly available. A critical analytical task is to identify communities, and graph Laplacian-...
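Graph Laplacian-based methods typically recover communities from the leading eigenvectors of the Laplacian. Below is a generic (non-Bayesian) spectral clustering sketch for a single graph, for orientation only; the paper's Bayesian spiked-Laplacian model for collections of graphs is not reproduced here.

```python
# Generic spectral clustering from the symmetric normalized graph Laplacian.
import numpy as np
from sklearn.cluster import KMeans

def spectral_communities(A, k):
    d = A.sum(axis=1)
    # L = I - D^{-1/2} A D^{-1/2}
    Dinv = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(A)) - Dinv @ A @ Dinv
    vals, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]                       # k eigenvectors of smallest eigenvalue
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
```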
Efficient structure-preserving support tensor train machine
An increasing share of collected data takes the form of high-dimensional multi-way arrays (tensors), and it is crucial for efficient learning algorithms to exploit this tensorial structure as much as possible. The ever-present curse of dimensionality for high ...
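The tensor-train format is the standard way such structure is exploited. Below is a textbook TT-SVD sketch (Oseledets, 2011) that factors a d-way array into a train of 3-way cores by successive truncated SVDs; it illustrates the format only and is not the paper's method.

```python
# Generic TT-SVD: factor a d-way array into tensor-train cores.
import numpy as np

def tt_svd(T, max_rank):
    dims = T.shape
    cores, r = [], 1
    M = T.reshape(r * dims[0], -1)
    for k in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        r_new = min(max_rank, len(s))               # rank truncation
        cores.append(U[:, :r_new].reshape(r, dims[k], r_new))
        M = (s[:r_new, None] * Vt[:r_new]).reshape(r_new * dims[k + 1], -1)
        r = r_new
    cores.append(M.reshape(r, dims[-1], 1))
    return cores

T = np.random.default_rng(0).normal(size=(4, 5, 6))
print([c.shape for c in tt_svd(T, max_rank=10)])   # (1,4,4), (4,5,6), (6,6,1)
```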
Cluster-specific predictions with multi-task Gaussian processes
A model involving Gaussian processes (GPs) is introduced to simultaneously handle multitask learning, clustering, and prediction for multiple functional data. This procedure acts as a model-based clustering method for functional data as well as a ...
AutoKeras: an AutoML library for deep learning
To use deep learning, one needs to be familiar with various software tools like TensorFlow or Keras, as well as various model architectures and optimization best practices. Despite recent progress in software usability, deep learning remains a highly ...
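A minimal usage sketch, assuming the autokeras 1.x API (details such as trial counts and epochs are arbitrary here):

```python
# AutoKeras searches over architectures and hyperparameters automatically.
import autokeras as ak
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

clf = ak.ImageClassifier(max_trials=3, overwrite=True)  # try 3 candidate models
clf.fit(x_train, y_train, epochs=5)
print(clf.evaluate(x_test, y_test))
model = clf.export_model()   # best model, exported as a regular Keras model
```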
On distance and kernel measures of conditional dependence
Measuring conditional dependence is one of the important tasks in statistical inference and is fundamental in causal discovery, feature selection, dimensionality reduction, Bayesian network learning, and others. In this work, we explore the connection ...
A relaxed inertial forward-backward-forward algorithm for solving monotone inclusions with application to GANs
We introduce a relaxed inertial forward-backward-forward (RIFBF) splitting algorithm for approaching the set of zeros of the sum of a maximally monotone operator and a single-valued monotone and Lipschitz continuous operator. This work aims to extend ...
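For orientation, one plausible form of the iteration, combining Tseng's forward-backward-forward step with inertia and relaxation (our notation; the exact parameter conditions are what the paper establishes), is:

```latex
y_k = x_k + \alpha_k\,(x_k - x_{k-1}), \qquad
p_k = J_{\gamma A}\bigl(y_k - \gamma B y_k\bigr), \qquad
x_{k+1} = (1-\rho_k)\,y_k + \rho_k\bigl[p_k + \gamma\,(B y_k - B p_k)\bigr]
```

Here J_{\gamma A} is the resolvent of A, the step size satisfies \gamma \in (0, 1/L) for L the Lipschitz constant of B, and the admissible ranges of the inertial parameters \alpha_k and relaxation parameters \rho_k are given in the paper.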
Sampling random graph homomorphisms and applications to network data analysis
A graph homomorphism is a map between two graphs that preserves adjacency relations. We consider the problem of sampling a random graph homomorphism from a graph into a large network. We propose two complementary MCMC algorithms for sampling random graph ...
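As an illustration of the sampling problem (not the paper's two algorithms), a Glauber-style chain picks a vertex of the small graph F and resamples its image among nodes of the network G that are consistent with the images of its F-neighbors:

```python
# Illustrative Glauber-style step on homomorphisms x: F -> G.
# x is a dict from F-nodes to G-nodes, assumed to start at a valid homomorphism.
import random
import networkx as nx

def glauber_step(F, G, x):
    u = random.choice(list(F.nodes))
    # candidates must be adjacent in G to the images of all F-neighbors of u
    candidates = set(G.nodes)
    for v in F.neighbors(u):
        candidates &= set(G.neighbors(x[v]))
    if candidates:                     # otherwise keep the current image
        x[u] = random.choice(sorted(candidates))
    return x
```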
A line-search descent algorithm for strict saddle functions with complexity guarantees
We describe a line-search algorithm that achieves the best-known worst-case complexity results for problems with a certain "strict saddle" property that has been observed to hold in low-rank matrix optimization problems. Our algorithm is adaptive, in ...
Optimal strategies for reject option classifiers
In classification with a reject option, the classifier is allowed to abstain from prediction in uncertain cases. The classical cost-based model of a reject option classifier requires the rejection cost to be defined explicitly. The alternative bounded-...
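The classical cost-based baseline is Chow's rule: with rejection cost c in (0, 1), predict the top class when its posterior exceeds 1 - c, otherwise abstain. A minimal sketch of that baseline (the paper studies optimal strategies beyond it):

```python
import numpy as np

def chow_rule(posteriors, c):
    """posteriors: (n, K) rows of class probabilities; returns -1 for reject."""
    top = posteriors.max(axis=1)
    pred = posteriors.argmax(axis=1)
    return np.where(top >= 1.0 - c, pred, -1)

p = np.array([[0.9, 0.1], [0.55, 0.45]])
print(chow_rule(p, 0.2))   # -> [0, -1]: the uncertain case is rejected
```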
Learning-augmented count-min sketches via Bayesian nonparametrics
The count-min sketch (CMS) is a time- and memory-efficient randomized data structure that provides estimates of token frequencies in a data stream, i.e., point queries, based on randomly hashed data. A learning-augmented version of the CMS, ...
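For orientation, the classical (non-learned) CMS keeps d hash rows of width w; an update increments one cell per row, and a point query takes the minimum over the d cells, which can only overestimate the true count. A minimal sketch:

```python
import random

class CountMinSketch:
    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]
        self.salts = [rng.getrandbits(64) for _ in range(depth)]

    def _cell(self, row, token):
        return hash((self.salts[row], token)) % self.width

    def update(self, token, count=1):
        for r in range(self.depth):
            self.table[r][self._cell(r, token)] += count

    def query(self, token):
        # never underestimates: collisions only inflate the counted cells
        return min(self.table[r][self._cell(r, token)] for r in range(self.depth))
```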
Adaptation to the range in K-armed bandits
We consider stochastic bandit problems with K arms, each associated with a distribution supported on a given finite range [m,M]. We do not assume that the range [m,M] is known and show that there is a cost for learning this range. Indeed, a new trade-off ...
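For orientation, when the range [m, M] is known, a standard UCB index simply scales its exploration bonus by the range width (M - m); this is exactly the knowledge whose absence the paper prices. A generic sketch of that known-range baseline, not the paper's adaptive algorithm:

```python
import math

def ucb_known_range(arms, horizon, m, M):
    """arms: list of no-arg callables returning rewards in [m, M]."""
    K = len(arms)
    counts, sums, total = [0] * K, [0.0] * K, 0.0
    for t in range(1, horizon + 1):
        if t <= K:
            a = t - 1                    # play each arm once to initialize
        else:
            a = max(range(K), key=lambda i: sums[i] / counts[i]
                    + (M - m) * math.sqrt(2 * math.log(t) / counts[i]))
        r = arms[a]()
        counts[a] += 1; sums[a] += r; total += r
    return total
```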
Python package for causal discovery based on LiNGAM
Causal discovery is a methodology for learning causal graphs from data, and LiNGAM is a well-known model for causal discovery. This paper describes an open-source Python package for causal discovery based on LiNGAM. The package implements various LiNGAM ...
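A minimal usage sketch, assuming the lingam package's DirectLiNGAM estimator (API as of lingam 1.x; the random data here is a placeholder):

```python
import numpy as np
import lingam

X = np.random.default_rng(0).uniform(size=(1000, 3))  # replace with real data
model = lingam.DirectLiNGAM()
model.fit(X)
print(model.causal_order_)       # estimated causal ordering of the variables
print(model.adjacency_matrix_)   # estimated weighted causal adjacency matrix
```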
Extending adversarial attacks to produce adversarial class probability distributions
Despite the remarkable performance and generalization levels of deep learning models in a wide range of artificial intelligence tasks, it has been demonstrated that these models can be easily fooled by the addition of imperceptible yet malicious ...
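The canonical example of such a perturbation is FGSM, the fast gradient sign method (Goodfellow et al., 2015), sketched below on a fixed logistic-regression model so it stays self-contained; the paper's extension to targeting whole class-probability distributions is not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    # gradient of the logistic loss w.r.t. the input x is (p - y) * w
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)   # small eps keeps the change imperceptible

w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([0.5, 0.2]), 1.0
x_adv = fgsm(x, y, w, b, eps=0.05)
print(sigmoid(x @ w + b), sigmoid(x_adv @ w + b))  # model confidence drops
```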
Globally-consistent rule-based summary-explanations for machine learning models: application to credit-risk evaluation
We develop a method for understanding specific predictions made by (global) predictive models by constructing (local) models tailored to each specific observation (these are also called "explanations" in the literature). Unlike existing work that "...
Learning mean-field games with discounted and average costs
We consider learning approximate Nash equilibria for discrete-time mean-field games with stochastic nonlinear state dynamics subject to both average and discounted costs. To this end, we introduce a mean-field equilibrium (MFE) operator, whose fixed ...
An inertial block majorization minimization framework for nonsmooth nonconvex optimization
In this paper, we introduce TITAN, a novel inerTIal block majorizaTion minimizAtioN framework for nonsmooth nonconvex optimization problems. To the best of our knowledge, TITAN is the first block-coordinate update framework that relies on the ...
Regularized joint mixture models
Regularized regression models are well studied and, under appropriate conditions, offer fast and statistically interpretable results. However, large datasets in many applications are heterogeneous in the sense of harboring distributional differences between ...
Interpolating classifiers make few mistakes
This paper provides elementary analyses of the regret and generalization of minimum-norm interpolating classifiers (MNIC). The MNIC is the function of smallest Reproducing Kernel Hilbert Space norm that perfectly interpolates a label pattern on a finite ...
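Concretely, the minimum-norm interpolant in an RKHS with kernel k is f(x) = k(x, X) alpha with K alpha = y, i.e. "ridgeless" kernel regression. A plain numpy sketch of this object (RBF kernel and all parameters are our illustrative choices):

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(30, 2)), rng.choice([-1.0, 1.0], size=30)
alpha = np.linalg.solve(rbf(X, X) + 1e-10 * np.eye(30), y)  # tiny jitter
f = lambda Xnew: rbf(Xnew, X) @ alpha
print(np.allclose(f(X), y, atol=1e-4))   # numerically interpolates the labels
```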
Graph-aided online multi-kernel learning
Multi-kernel learning (MKL) has been widely used in learning problems involving function learning tasks. Compared with the single-kernel learning approach, which relies on a preselected kernel, the advantage of MKL is its flexibility, which results from combining a ...
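To illustrate that flexibility, a generic online MKL scheme hedges over a dictionary of kernel-based predictors with multiplicative weights; the sketch below is a bare-bones version of that idea (the predictors are fixed here for brevity), not the paper's graph-aided algorithm or its guarantees.

```python
import numpy as np

def online_mkl(stream, kernel_predictors, eta=0.5):
    """stream: iterable of (x, y); kernel_predictors: list of callables x -> prediction."""
    w = np.ones(len(kernel_predictors)) / len(kernel_predictors)
    out = []
    for x, y in stream:
        preds = np.array([f(x) for f in kernel_predictors])
        out.append(w @ preds)                    # weighted combination
        w *= np.exp(-eta * (preds - y) ** 2)     # multiplicative-weights update
        w /= w.sum()
    return out
```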
Lower bounds and accelerated algorithms for bilevel optimization
Bilevel optimization has recently attracted growing interest due to its wide applications in modern machine learning problems. Although recent studies have characterized the convergence rate for several such popular algorithms, it is still unclear how ...
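For orientation, when the inner problem is strongly convex, algorithms of this kind typically minimize Phi(x) = f(x, y*(x)) with y*(x) = argmin_y g(x, y), whose gradient follows from the implicit function theorem. This standard hypergradient identity is background, not the paper's contribution:

```latex
\nabla \Phi(x) \;=\; \nabla_x f\bigl(x, y^*(x)\bigr)
\;-\; \nabla^2_{xy}\, g\bigl(x, y^*(x)\bigr)\,
\bigl[\nabla^2_{yy}\, g\bigl(x, y^*(x)\bigr)\bigr]^{-1}
\nabla_y f\bigl(x, y^*(x)\bigr)
```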
Bayesian data selection
Insights into complex, high-dimensional data can be obtained by discovering features of the data that match or do not match a model of interest. To formalize this task, we introduce the "data selection" problem: finding a lower-dimensional statistic--...
Calibrated multiple-output quantile regression with representation learning
We develop a method to generate predictive regions that cover a multivariate response variable with a user-specified probability. Our work is composed of two components. First, we use a deep generative model to learn a representation of the response that ...
Discrete variational calculus for accelerated optimization
Many of the new developments in machine learning are connected with gradient-based optimization methods. Recently, these methods have been studied using a variational perspective (Betancourt et al., 2018). This has opened up the possibility of ...
Generalization bounds for noisy iterative algorithms using properties of additive noise channels
Machine learning models trained by different optimization algorithms under different data distributions can exhibit distinct generalization behaviors. In this paper, we analyze the generalization of models trained by noisy iterative algorithms. We derive ...
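A canonical noisy iterative algorithm of the type analyzed here is gradient descent with additive Gaussian noise (SGLD-style); the sketch below shows the iteration itself, not the paper's bounds.

```python
import numpy as np

def noisy_gd(grad, w0, steps, lr=0.1, sigma=0.01, seed=0):
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        # gradient step plus an additive Gaussian "noise channel"
        w = w - lr * grad(w) + sigma * rng.normal(size=w.shape)
    return w

# example: noisy descent on the quadratic loss ||w - 1||^2 / 2
print(noisy_gd(lambda w: w - 1.0, np.zeros(3), steps=200))
```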
The SKIM-FA kernel: high-dimensional variable selection and nonlinear interaction discovery in linear time
Many scientific problems require identifying a small set of covariates that are associated with a target response and estimating their effects. Often, these effects are nonlinear and include interactions, so linear and additive methods can lead to poor ...
Impact of classification difficulty on the weight matrices spectra in deep learning and application to early-stopping
Much recent research effort has been devoted to explaining the success of deep learning. Random Matrix Theory (RMT) provides an emerging approach to this end by analyzing the spectra of large random matrices involved in a trained deep neural network (DNN), such ...
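The raw object such analyses start from is the singular value spectrum of a layer's weight matrix; a minimal sketch (the random matrix here stands in for trained weights):

```python
import numpy as np

W = np.random.default_rng(0).normal(size=(512, 256)) / np.sqrt(256)  # stand-in weights
s = np.linalg.svd(W, compute_uv=False)
eigs = s ** 2                   # eigenvalues of W^T W: the empirical spectrum
print(eigs.min(), eigs.max())   # compare the bulk edge to Marchenko-Pastur predictions
```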
HiClass: a Python library for local hierarchical classification compatible with Scikit-learn
HiClass is an open-source Python library for local hierarchical classification entirely compatible with scikit-learn. It contains implementations of the most common design patterns for hierarchical machine learning models found in the literature, that is,...
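A minimal usage sketch, assuming the hiclass package's LocalClassifierPerNode design pattern (labels are full root-to-leaf paths; data and hierarchy here are toy placeholders):

```python
from sklearn.ensemble import RandomForestClassifier
from hiclass import LocalClassifierPerNode

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# each label is the full path from the root of the hierarchy to the leaf
y = [["Animal", "Mammal"], ["Animal", "Bird"], ["Plant", "Tree"]]

clf = LocalClassifierPerNode(local_classifier=RandomForestClassifier())
clf.fit(X, y)
print(clf.predict([[2.0, 3.0]]))   # predicted hierarchy path per sample
```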
Attacks against federated learning defense systems and their mitigation
The susceptibility of federated learning (FL) to attacks from untrustworthy endpoints has led to the design of several defense systems. FL defense systems enhance the federated optimization algorithm using anomaly detection, scaling the updates from ...
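One common defense primitive in this space replaces the mean of client updates with a coordinate-wise median, bounding the influence of any single endpoint; the generic sketch below illustrates that primitive only, not the specific systems the paper attacks and hardens.

```python
import numpy as np

def coordinatewise_median(updates):
    """updates: (n_clients, n_params) array of model deltas."""
    return np.median(updates, axis=0)

honest = np.random.default_rng(0).normal(0.0, 0.1, size=(9, 4))
malicious = np.full((1, 4), 100.0)        # a crude poisoning attempt
updates = np.vstack([honest, malicious])
print(coordinatewise_median(updates))     # stays near the honest updates
```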