Thomas Swearingen∗ , Will Drevo† , Bennett Cyphers† , Alfredo Cuesta-Infante‡ , Arun Ross∗ and Kalyan Veeramachaneni†
Abstract—In this paper, we present Auto-Tuned Models, or ATM, a distributed, collaborative, scalable system for automated machine learning. Users of ATM can simply upload a dataset, choose a subset of modeling methods, and choose to use ATM's hybrid Bayesian and multi-armed bandit optimization system. The distributed system works in a load-balanced fashion to quickly deliver results in the form of ready-to-predict models, confusion matrices, cross-validation results, and training timings. By automating hyperparameter tuning and model selection, ATM returns the emphasis of the machine learning workflow to its most irreducible part: feature engineering. We demonstrate the usefulness of ATM on 420 datasets from OpenML and train over 3 million classifiers. Our initial results show ATM can beat human-generated solutions for 30% of the datasets, and can do so in 1/100th of the time.

I. INTRODUCTION

One of the most common data science problems is that of deriving a predictive model from a labeled set of raw training data. From the beginning, a data scientist working on such a problem is flooded with choices and contingencies. The data set may comprise images, text, relational data, or a mix of these types. Pre-processing it often requires a number of steps, including data cleaning, feature engineering, and feature selection, before model training can even begin. These pre-processing steps are often domain-specific, and the space of possible options for them is vast. Formulating the correct set of steps, also known as a pipeline, requires iteration, collaboration and even verification.

In most cases, the last step in the pipeline is building and learning a model, given a feature matrix and labels. This last step presents its own choices and challenges, as there are now an overwhelming number of ways to learn a model. Over the years, many methods for classification have been developed, including support vector machines (svm), neural networks (nn), bayesian networks (bn), deep neural networks (dnn), and deep belief networks (dbn). Each method requires numerous hyperparameters to be set before learning a model. A model's performance can also be judged by many possible metrics, including accuracy, precision, recall, and F1-score.

It is important to note that this last step is abstracted away from domain-specific intricacies by relying on a domain-agnostic input data format, the feature matrix. This makes it a good target for automation. The past few years have seen the development of an overwhelming number of automatic method selection and hyperparameter tuning algorithms and systems¹ [1], [2], [3], each purporting to be better than the others in its ability to search the space. Data scientists hoping to take advantage of these advances are once again lost in an enormous option space: which AutoML method should one use?

¹ These algorithms and systems now define a sub-area within the machine learning community, called AutoML.

At the same time, several pragmatic data science needs remain chronically unaddressed. These are: (a) support for parallel exploration, in which multiple data scientists may simultaneously search through the model space for the same dataset but different pipelines (or even different datasets), (b) an ability to extend the model search space, and algorithms that allow for simultaneously searching through methods and their hyperparameters, and (c) abstractions for bringing together these numerous search approaches, so data scientists can explore multiple search methods as needed.

In this paper, we present Auto-Tuned Models, or ATM, a multi-user machine learning service that can be hosted on any distributed computing infrastructure, whether a cloud or a cluster. ATM is a multi-method, multi-parameter, self-optimizing machine learning system for the automation of classification modeling and hyperparameter tuning. By enabling the simultaneous execution of multiple methods
and providing a suite of best possible models, ATM returns the emphasis to the part of the machine learning workflow [...] similar to the way that scikit-learn has demystified and democratized machine learning. Second, it will provide a trusted open-source solution for enterprises to save time and resources and bring back focus to the impactful, domain-specific aspects of the data science endeavor: formulating the precise machine learning task and transforming their data into information-rich features.

Summary of results: We provided ATM with 420 datasets downloaded from the OpenML platform and trained models for several days.

– ATM generated the largest repository of trained models: We learned a total of 3 million models across these 420 datasets, and generated a total of 4 terabytes of files consisting of models and their performance metrics. We will release this data in a structured way and provide APIs to access it. To the best of our knowledge, this will be the largest repository of trained models available to date.

– ATM is able to beat human-generated solutions for 30% of the datasets: Once a dataset is available on the OpenML platform, data scientists can download the data, try different classifiers on their local machines, and upload their results. At the time of writing this paper, we were able to extract the first 500 submissions made to 47 datasets using their APIs. Figure 1 shows the percentage of datasets for which ATM was able to beat the best among these first 500 by using simple grid search or by an intelligent search technique called GP/Bandit.

– ATM is able to perform in 1/100th of the time: For every dataset for which ATM beats the human-submitted solution, we calculated the time difference between the first and the best submission made by humans to the OpenML platform (best within the first 500). On average, this difference was 243 days². A similar calculation reveals that ATM's grid search can generate a solution that beats the best human solution within a few minutes.

² We note that there was no particular incentive for humans to submit or enhance solutions. We also acknowledge that we do not know the skill level of the humans who submitted these solutions.

[Figure 1: Percentage of the 47 datasets for which ATM beat the best of the first 500 human submissions, for the Grid and GP/Bandit configurations (bars at 31.91% and 27.66%).]

Figure 2: The average time required for ATM to find a classifier that outperforms the first 500 human-in-the-loop attempts. The Human-500 bar shows the average time needed for the best-performing classifier to be found by an OpenML user in the first 500 attempts.

II. ATM OVERVIEW

We describe our system from the point of view of an end user. We imagine ATM drawing various categories of users, as detailed below.

Data scientists using ATM can simply upload a dataset, select their desired methods, and choose hyperparameter ranges to search over³. They can decide to use either of ATM's two optimization/model search/selection methods: (1) a hybrid Bayesian and multi-armed bandit optimization system, or (2) a model recommender system that works by exploiting the modeling techniques' previous performances on a variety of datasets. Multiple users can submit multiple disparate datasets to be run simultaneously. They can also submit a different version of the same dataset – for example, one that has undergone a different set of preprocessing and transformation steps. After submitting the dataset, the user can begin getting results (models) via streaming. The system works in a load-balanced fashion to quickly deliver results in the form of ready-to-predict models, confusion matrices, cross-validation results, and training timings. We describe this in detail in Section V.

³ Scaling and dimensionality reduction techniques are available to aid in the discovery process.

AutoML experts can contribute newer search methods by following the abstractions we define for tuning hyperparameters and selecting methods. In our current library, we propose and use two different methods, and provide abstractions so that anyone can modify and propose new ones. We describe this in detail in Section IV.
Figure 3: Overview of the ATM system. A user uploads their dataset to storage and registers it with the ModelHub database. The ATM workers query the ModelHub for classifiers to run and report the results back to the ModelHub.

Machine learning enthusiasts who know more about each of the individual methods used in our library can enhance them by (a) proposing appropriate ranges for their hyperparameters and (b) proposing newer methods or better implementations for them.

The contributions of our work include:

– An asynchronous, distributed system that simultaneously generates multiple classifiers (for multiple datasets), records classification results (including performance, models, and meta-information) and logs any errors as they occur.
– An innovative organization of the complex search space of classifiers. This allows users to extend this space to include newer classification approaches in a structured way.
– A hierarchical algorithm that allows for search at the method level and tunes hyperparameters.
– Abstractions to enable the contribution and integration of disparate AutoML techniques while simultaneously testing them.
– A database repository of several hundred thousand classifiers trained on hundreds of datasets.
– An open source software release at: https://github.com/HDI-Project/ATM

III. RELATED WORK

There are a few popular existing AutoML packages, such as Auto-WEKA [1], auto-sklearn [2], and TPOT [3]. Each package has its own proprietary model search/tuning method and set of hyperparameters. In general, these current AutoML routines present a few problems. First, they are developed as single-user systems run on local machines, as opposed to multi-user systems run in a cloud-based environment. Second, they do not track the current state of runs, including previously run experiments, results of those experiments, parameters for those experiments, etc. Third, they do not organize the solution space in an intelligent, search-supporting way. Fourth, they focus tuning solely on one metric, even though the user may be more concerned with a different metric.

The preponderance of existing work focuses on the classifier part of the pipeline, which accepts as its input a set of features with labels. Thornton et al. released the first major AutoML system, Auto-WEKA [1]. They develop the Combined Algorithm Selection and Hyperparameter optimization (CASH) objective function, and use two tree-based Bayesian optimization methods (SMAC and TPE) to solve the problem. With the goal of creating an AutoML system to remove the parameter tuning, they build their system around WEKA [4]. WEKA itself is a suite of machine learning algorithms intended for a large audience, with the caveat that parameters must be tuned using a trial-and-error process, which may be difficult for people with limited knowledge of machine learning to grasp. With the addition of Auto-WEKA, the parameter estimation stage of the ML pipeline is automated, and machine learning is more accessible to non-experts.

Feurer et al. later proposed the auto-sklearn system, which makes two substantive improvements to Auto-WEKA [2]. First, they add a meta-learning step to initialize the parameters, which they find improves the final classification performance. Second, they do not discard models once trained and tested on the data, but instead store them and use them all to build an ensemble classifier at the end. They replace WEKA with sklearn, which has become the dominant machine learning software package. They also follow the CASH model developed in Auto-WEKA, but only use the SMAC Bayesian optimization method, as it performed better than TPE [1].

There have been other approaches to and extensions of AutoML. Salvador et al. propose one such extension, for time-varying data [5]. They combine Bayesian optimization with four different adaptive strategies, which leads to improved predictive performance in the majority of test datasets. Olson et al. eschew the CASH objective function and instead evolve the machine learning pipeline using genetic algorithms [3]. They demonstrate the "evolved" machine learning pipeline's potential, and show that for some datasets, it can even outperform primitive machine learning. This is especially interesting because there is no strict, human-derived pipeline in the problem formulation – instead, the pipeline is learned by the genetic algorithm.

One drawback of AutoML can be the cost of performing multiple runs of the training stage. Some works circumvent this issue by using a meta-learning approach to recommend a classifier using a process called algorithm selection [6]. There is an offline setup stage where each base-AutoML procedure is run on a set of preselected datasets. A classifier is then recommended for an unseen query dataset. They find some correlation between the query dataset and the preselected set of datasets, with performances based on a few randomly selected classifier/parameter pairs on the query dataset.
Abdulrahman et al. evaluate this type of work when there is missing meta-information [7].

There have also been many studies which target specific parts of the AutoML pipeline. Dewancker et al. improve the AutoML pipeline with more sophisticated acquisition functions [8]. Kim et al. propose candidate parameter sets with a Random Space Partitioning Optimizer [9]. Kim et al. improve the scalability of a structure discovery algorithm [10]. Li et al. discretize the entire hyperparameter space and then apply a multi-armed bandit scheme to select the hyperparameters [11].

IV. DEFINING THE SEARCH SPACE

In order to explore the space of possible model pipelines for a given problem, we must first have a way to programmatically enumerate the sets of hyperparameters which describe the models. This is difficult because hyperparameters are hierarchical, and values selected at higher levels affect which choices must be made at lower levels. At the highest level, the choice of which modeling method to use affects which hyperparameters must be specified for that model. For example, a support vector machine (SVM) is described by a different set of hyperparameters than a decision tree. Furthermore, certain hyperparameters for a given model affect which other hyperparameters must be chosen. Consider a data scientist who is training an SVM. First she must choose which kernel to use. If she chooses a polynomial kernel, she must then specify the value for a discrete-ordinal hyperparameter, degree. If she chooses a sigmoid kernel instead, she must specify values for two additional continuous hyperparameters, γ and intercept. Therefore, choosing a set of hyperparameters is not as simple as sampling from a fixed vector space, even for a given type of model. We define three levels of hyperparameter:

methods: At the highest level, a modeling method must be selected first. ATM includes support for several common modeling methods, including support vector machines, deep belief networks, random forests, and naive Bayes classifiers. Table I lists the supported methods.

method hyperparameters: For a given modeling method, several method hyperparameters must be specified no matter what. Hyperparameters may be categorical choices, such as which kernel or which transfer function to use; discrete-ordinal choices, like the number of epochs or the number of hidden layers; or continuous-valued choices, like the learning rate or soft margin.

conditional hyperparameters: Some hyperparameters must only be specified if certain choices for method-level hyperparameters are made. These are conditional hyperparameters. For example, when a Gaussian kernel is chosen for an svm, a value for σ – the standard deviation of the kernel – needs to be specified.

We define two concepts to help explain the hyperparameter search space.

Definition 1: A conditional parameter tree (CPT) expresses the combined hyperparameter space for a given modeling method as a search tree. The root node of the CPT identifies the model-building method, and each node below the root represents a hyperparameter. Certain hyperparameter nodes also act as branches, with child nodes beneath them. Branch nodes are always categorical variables; the value chosen for a branch node determines which conditional hyperparameters below it must be specified. Nodes which do not have children are leaves. Method hyperparameters always descend directly from the root, while conditional hyperparameters always descend from branch nodes.

Figure 4: Hyperparameters and hyperpartitions within support vector machines. The leftmost diagram shows the full CPT, while the subfigures in the center and on the right show hyperpartitions 1 and 2. In hyperpartition 1, degree (φ) is a tunable parameter that is conditioned on the kernel being polynomial. In hyperpartition 2, γ and intercept are tunable parameters that are conditioned on the kernel type being sigmoid. Thus, kernel and soft margin are method hyperparameters, while degree, γ, and intercept are conditional hyperparameters.

A CPT can be traversed to enumerate all possible models for a given method, essentially defining its search space. Figure 4 shows a typical CPT for a support vector machine. There are two method hyperparameters – a continuous-valued soft margin, and a categorical-valued kernel. The kernel is represented as a branch node. The choice of kernel determines which conditional hyperparameters must be specified. If kernel is polynomial, degree must be specified; if sigmoid is chosen, both gamma and intercept must be specified instead.

Definition 2: A hyperpartition is a subset of a CPT which consists of choices for all branch nodes on a path from the root to the leaves. A hyperpartition fully defines a set of tunable conditional hyperparameters.
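To make Definitions 1 and 2 concrete, the sketch below encodes the SVM CPT from Figure 4 as a small Python structure and enumerates its hyperpartitions by fixing every categorical branch node. This is illustrative only: the node layout, value ranges, and function names are our own assumptions, not ATM's actual data structures.

# Illustrative sketch (not ATM's actual data structures): a minimal CPT for the
# SVM method from Figure 4, and a routine that enumerates its hyperpartitions.
from itertools import product

svm_cpt = {
    "method": "svm",
    "hyperparameters": {
        "c": {"type": "continuous", "range": (1e-5, 1e5)},   # soft margin (leaf)
        "kernel": {"type": "categorical",                    # branch node
                   "values": {
                       "linear": {},
                       "rbf": {"gamma": {"type": "continuous", "range": (1e-5, 1e5)}},
                       "sigmoid": {"gamma": {"type": "continuous", "range": (1e-5, 1e5)},
                                   "intercept": {"type": "continuous", "range": (-1.0, 1.0)}},
                       "polynomial": {"degree": {"type": "int", "range": (2, 5)}},
                   }},
    },
}

def enumerate_hyperpartitions(cpt):
    """Fix every categorical branch node; each combination is one hyperpartition."""
    branches = {name: spec for name, spec in cpt["hyperparameters"].items()
                if spec["type"] == "categorical"}
    tunables = {name: spec for name, spec in cpt["hyperparameters"].items()
                if spec["type"] != "categorical"}
    partitions = []
    for combo in product(*[spec["values"].items() for spec in branches.values()]):
        fixed = dict(zip(branches.keys(), [value for value, _ in combo]))
        conditionals = {}
        for _, children in combo:
            conditionals.update(children)
        partitions.append({"method": cpt["method"], "fixed": fixed,
                           "tunables": {**tunables, **conditionals}})
    return partitions

partitions = enumerate_hyperpartitions(svm_cpt)
print(len(partitions))  # 4 hyperpartitions, matching the "# hyp" entry for SVM in Table I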
Table I: Table showing the list of methods, their method hyperparameters ("m-hyp"), and conditional hyperparameters ("c-hyp"). The table also shows the number of hyperpartitions per method ("# hyp"). Categorical hyperparameters have the set of possible values; all other hyperparameters are continuous. Abbreviations used in the table are expanded as follows: SVM ← support vector machine, RF ← random forest, ET ← extreme trees, DT ← decision tree, SGD ← stochastic gradient descent, PA ← passive aggressive, KNN ← k-nearest neighbors, LR ← logistic regression, GNB ← gaussian naive bayes, MNB ← multinomial naive bayes, BNB ← bernoulli naive bayes, GP ← gaussian process, MLP ← multi-layer perceptron, DBN ← deep belief network.

Method | m-hyp | c-hyp | # hyp
SVM | kernel={linear, rbf, sigmoid, polynomial}, C | gamma, coef0, degree | 4
RF | criterion={gini, entropy}, min-samples-leaf, max-features, min-samples-split, max-depth | None | 2
ET | criterion={gini, entropy}, min-samples-leaf, max-features, min-samples-split, max-depth | None | 2
DT | criterion={gini, entropy}, min-samples-leaf, max-features, min-samples-split, max-depth | None | 2
SGD | loss={hinge, modified-huber, log, squared-hinge}, learning-rate={optimal, constant}, alpha, fit-intercept={False, True}, eta0, n-iter, penalty={l1, l2, elasticnet}, l1-ratio | None | 48
PA | loss={hinge, squared-hinge}, C, n-iters | None | 2
KNN | weights={uniform, distance}, n-neighbors, algorithm={brute, kd-tree, ball-tree}, metric={euclidean, manhattan, minkowski, chebyshev} | p, leaf-size | 24
LR | penalty={l1, l2}, fit-intercept={True, False}, C, tol | None | 4
GNB | None | None | 1
MNB | alpha, fit-prior | None | 1
BNB | binarize, alpha, fit-prior | None | 1
GP | kernel={constant, rbf, matern, rational-quadratic, exp-sine-squared} | nu, length-scale, alpha, periodicity | 5
MLP | num-hidden-layers={1, 2, 3}, activation={relu, logistic, identity, tanh}, solver={lbfgs, sgd, adam}, alpha | layer-1-hidden-size, layer-2-hidden-size, layer-3-hidden-size, beta-1, beta-2, learning-rate={constant, invscaling, adaptive}, learning-rate-init | 84
DBN | num-hidden-layers={1, 2, 3}, epochs, learn-rates, output-activation-function={softmax, sigmoid, linear, tanh}, learn-rate-decays, learn-rate-pretrain | layer-1-hidden-size, layer-2-hidden-size, layer-3-hidden-size | 12
These abstractions allow us to break the process for model selection into two steps. First is hyperpartition selection, in which the CPT is traversed and a value is specified for every branch node encountered. Next is hyperparameter tuning. Once a hyperpartition has been chosen, the remaining unspecified parameters can be selected from a fixed vector space. The vector space may have continuous elements as well as discrete elements, and is likely to have some combination of both. The process of selecting hyperparameters from the space defined by a hyperpartition can be optimized with Gaussian Processes and other sampling methods, as described in Section VI. An example of a full set of hyperparameter choices is as follows:

{
  "method": "svm",
  "method-hyperparameters": {
    "c": 0.87,
    "kernel": "polynomial"
  },
  "conditional-hyperparameters": {"degree": 3}
}

Figure 4 shows two distinct hyperpartitions for a support vector machine CPT.

V. ATM: AUTO-TUNED MODELS

We present ATM, a distributed, collaborative, scalable system for hyperparameter tuning and model generation. ATM consists of a cluster of workers which train and test predictive models with different sets of hyperparameters and sync up with a master database, the ModelHub. ATM can process multiple prediction problems and multiple datasets at once. ATM's worker cluster is persistent: a data scientist can decide to upload specifications for a variation of an unfinished problem or a different problem altogether while ATM is running, and workers will adopt the task on the fly.

A. Components

ATM comprises the following components:

Worker cluster: A worker is a persistent process which runs on a compute instance (in our tests, an EC2 instance). Workers execute model training and communicate with the rest of the system by appending to a central, immutable log in the ModelHub, as described below. When activated, a worker queries the ModelHub for a task to work on. It chooses an incomplete datarun from the database based on the dataruns' priority scores, then queries for the results of
all classifiers that have already been trained on the datarun. It runs hyperpartition selection to choose a set of partition hyperparameters, then runs hyperparameter tuning to select values for the remaining unspecified hyperparameters. Once it has chosen a full set of hyperparameters, it trains and tests a classifier to produce a model and various performance metrics. It saves the model and the metrics to storage (in our test, Amazon S3) and logs the results to the ModelHub. We implemented all workers in Python 2.7, and we used the scikit-learn library for specific classifier method implementations. Algorithm 1 presents the algorithmic logic within a worker node.

ModelHub: The ModelHub is a central database that stores information about data sets, task configurations, hyperpartitions, and previously-trained classification models, including links to the models themselves and information about their performance metrics. Our implementation of ModelHub uses a MySQL database in conjunction with Amazon S3 to store this information. The ModelHub does not perform any actual computation on the classifiers; its role is to provide a single, consistent log which all of the workers can query and with which they can sync their results. The data structures it holds are described in more detail below.

The ModelHub database contains four tables, each representing a logical data structure. Together they describe all the classifiers that have been tested for as long as the ModelHub has been running. Figure 5 shows the database schema, and the tables are summarized below.

• datasets: A dataset represents a set of train/test data. The table contains its name, paths to train and test data files (if applicable), and the name of the column containing the label. All datasets must be in tabular format. Currently, our implementation only accepts data in csv format.

• dataruns: A datarun is the logical abstraction for a single run of a machine-learning problem. The table stores configuration and settings specified by the user, including the methods for hyperpartition selection (e.g. "UCB1") and hyperparameter tuning (e.g. "Gaussian Process"), and their associated parameters. Each datarun has an associated budget, which tells the worker how much work to perform. If budget_type == "time", the column budget_amount bounds the amount of wall clock time workers may spend on the datarun; if budget_type == "classifiers", it bounds the total number of classifiers which may be computed. The priority column tells the workers which runs to work on first.

• hyperpartitions: A hyperpartition is a set of hyperparameters which define a path through a CPT, as described in Section IV. The table contains the fixed values for the hyperparameters corresponding to the branch nodes in the CPT, as well as data about the unspecified conditional hyperparameters. Each hyperpartition is linked to a datarun.

• classifiers: A classifier represents a model trained on a dataset with a single, fully-qualified set of hyperparameters. It is also the smallest logical unit of work that can be assigned to a worker. Once a classifier is finished, model_location and metrics_location point to URLs in the cloud where the model and metrics objects for the classifier are stored. The status column can be set to "started", "completed", or "errored"; if it is set to the last, an error message is also stored.

To summarize, when a data scientist wants to solve a prediction problem with ATM, he or she will upload a set of data and create a corresponding entry in the datasets table. They will then create a new datarun which includes configurations for budget, priority, and hyperparameter selection and links to the dataset. The script for registering a datarun automatically registers each possible hyperpartition based on the datarun's configuration. Once the datarun has been registered, workers will automatically begin training, testing, and saving classifiers for the datarun, and will continue until the datarun's budget has been exhausted.

Algorithm 1 ATMWorker
1: procedure ATMWorker
2:   while True do
3:     D = ModelHub.GetDataruns()
4:     for d ∈ D do
5:       if IsBudgetExceeded(d) then
6:         D = D \ d
7:       end if
8:     end for
9:     if D = {∅} then
10:      Sleep()    ▷ Wait for dataruns to be added
11:      Continue
12:    end if
13:    d = SelectPriorityDatarun(D)
14:    ᾱ_p^j ← Search(d.dataset_id, ModelHub.classifiers)
15:    M_p ← LearnClassifier(ᾱ_p^j)
16:    y_p^j ← Evaluate(M_p, d.dataset_id)
17:    ModelHub.StoreResult(M_p, y_p^j, ᾱ_p^j)
18:  end while
19: end procedure

B. Outputs

ATM generates three different types of files during its operation: data, models, and metrics. We used Amazon's S3 service for storage. Workers and end users can access the S3 bucket with credentials established during the initial setup. Labeled data files must be provided by the user, but ATM performs one-time preprocessing and cleaning on the data to convert it into a consistent csv format. For each data and model pair (dataset_id, classifier_id), the following additional elements are stored:

– Metrics: To enable post-hoc analysis of the models, we store a set of several metrics generated by each round of a k-fold cross-validation on the available training data.
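As a rough illustration of the storage step described above, the snippet below pickles a trained model and uploads it together with its metrics to S3 under a hash-based key built from the method name, hyperparameters, and dataset path. The bucket name, key layout, and helper names are assumptions for illustration, not ATM's actual storage code.

# Hypothetical sketch of persisting a classifier and its metrics to S3.
import hashlib
import json
import pickle

import boto3


def model_key(method, hyperparameters, dataset_path):
    """Unique identifier from the method name, hyperparameter settings, and dataset path."""
    payload = json.dumps([method, hyperparameters, dataset_path], sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def store_results(model, metrics, method, hyperparameters, dataset_path,
                  bucket="atm-models"):
    s3 = boto3.client("s3")
    key = model_key(method, hyperparameters, dataset_path)
    s3.put_object(Bucket=bucket, Key=key + ".pkl", Body=pickle.dumps(model))
    s3.put_object(Bucket=bucket, Key=key + "-metrics.json",
                  Body=json.dumps(metrics).encode("utf-8"))
    return key   # recorded in the classifiers table as model_location / metrics_location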
Figure 5: The ModelHub database schema. The datasets table has fields dataset_id, name, train_path, test_path, and class_column. The hyperpartitions table has fields hyperpartition_id, datarun_id, method, partition_hyperparameter_values, and conditional_hyperparameters. The dataruns table has fields datarun_id, dataset_id, hyperpartition_selection_scheme, hyperparameters_tuning_scheme, ts, r_minimum, priority, start_time, end_time, budget_type, and budget_amount. The classifiers table has fields classifier_id, datarun_id, hyperpartition_id, model_location, metrics_location, cv_judgment_metric, test_judgment_metric, hyperparameters_values, start_time, end_time, status, and error_message.
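For readers who want a concrete picture of the classifiers table above, the following is a minimal SQLAlchemy sketch of it. The column types and class name are our assumptions based on the field names in Figure 5; ATM's actual MySQL schema may differ.

# Hypothetical sketch of the ModelHub "classifiers" table; types are assumed.
from sqlalchemy import Column, DateTime, Float, Integer, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Classifier(Base):
    __tablename__ = "classifiers"

    classifier_id = Column(Integer, primary_key=True)
    datarun_id = Column(Integer)            # which datarun requested this classifier
    hyperpartition_id = Column(Integer)     # fixed categorical choices used
    model_location = Column(String(300))    # URL of the pickled model on S3
    metrics_location = Column(String(300))  # URL of the stored metrics object
    cv_judgment_metric = Column(Float)      # judgement metric averaged over folds
    test_judgment_metric = Column(Float)
    hyperparameters_values = Column(Text)   # full hyperparameter set, serialized
    start_time = Column(DateTime)
    end_time = Column(DateTime)
    status = Column(String(20))             # "started", "completed", or "errored"
    error_message = Column(Text)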
Table II: Metrics stored on S3 for each classifier trained. PR ← precision-recall, ROC ← receiver operating characteristic, AUC ← area under the curve.

Problem | Judgement metric (across folds) | Curves (for each fold) | Metrics (for each fold)
Binary | F1 | PR, ROC | Accuracy, F1-score, AUC-PR, AUC-ROC
Multi (3, 4, or 5 classes) | µ(F1c) − σ(F1c) | PR/class, ROC/pair | F1/class, AUC/class, AUC/pair
Big-Multi (> 5 classes) | µ(F1c) − σ(F1c) | - | Rank-N, F1/class

This allows data scientists to assess models based on the consistency of their performance as well as on their overall performance, and by a variety of different metrics. Data sets vary in number of classes, class balance, and number of training examples, which makes it impossible to decide on one universal "best" metric. While designing ATM, we consulted several data science experts about what metrics would be best. There was no agreement about an ideal subset of metrics, so we decided to include a wide range. The metrics computed for each classifier are shown in Table II. The judgement metric is a statistic computed for each fold, and its average across folds is stored in the ModelHub.classifiers database. For binary class problems, we use the F1 score achieved for each fold and calculate its average. For multi-class problems, for each fold, we (i) calculate the F1 scores for every pairing of classes, (ii) calculate the average and standard deviation of these F1 scores, and (iii) finally subtract the standard deviation from the mean. This judgement metric is calculated for every fold, and an average of this statistic across all folds is stored in the database.

– Final trained model: After performing cross-validation testing on the dataset, the worker trains a final model using all the training data available. This model is serialized as a Python pickle (.pkl) file. The file is saved to the cloud and is uniquely identified using a hash of the method name, the hyperparameter settings, and the dataset path.
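As a concrete illustration of the multi-class judgement metric µ(F1c) − σ(F1c) described above, the snippet below computes it for a single cross-validation fold using pairwise F1 scores. It is a minimal sketch under our own assumptions about how the class pairs are scored, not ATM's exact implementation.

# Minimal sketch: judgement metric mu(F1_c) - sigma(F1_c) for one fold.
from itertools import combinations

import numpy as np
from sklearn.metrics import f1_score


def judgement_metric(y_true, y_pred, labels):
    """Mean pairwise F1 minus its standard deviation for a multi-class fold."""
    pair_scores = []
    for a, b in combinations(labels, 2):
        mask = np.isin(y_true, [a, b])          # keep only samples of the two classes
        if not mask.any():
            continue
        pair_scores.append(
            f1_score(y_true[mask] == a, y_pred[mask] == a)  # binary F1 for the pair
        )
    pair_scores = np.asarray(pair_scores)
    return pair_scores.mean() - pair_scores.std()


y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0, 1, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 0, 1, 2])
print(judgement_metric(y_true, y_pred, labels=[0, 1, 2]))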
VI. AUTOMATIC SEARCH

Combining the search spaces of multiple modeling methodologies and hyperpartitions creates a number of challenges with regard to finding the best model, either in the isolation of one methodology or from a hyperpartition. In particular, one encounters:

Discontinuity and non-differentiability: Categorical hyperparameters make the search space non-differentiable, and do not yield to simple search techniques (e.g. hill climbing) or to methods that rely on learning about the search space (e.g. Bayesian optimization approaches).

Varying dimensions of the search space: By definition, conditional hyperparameters imply that the hyperpartitions within a methodology have different dimensions. Because choosing one categorical variable over another can imply a different set of hyperparameters, the dimensionality of a hyperpartition also varies.

Non-transferability of methodology performance: Unfortunately, when conducting searches among modeling methodologies, robust heuristics are limited. Training on the dataset with an svm model gives us no information about how well a dbn might perform.
Table III: Notation table.

Symbol | Description
t = 1 ... T | index into classifiers trained so far, ordered by time
i | index into datasets
j = 1 ... J | index into hyperpartitions
Y_j = y_1^j ... y_T^j | judgement metrics for classifiers tried so far for the jth hyperpartition
ᾱ^j | a hyperparameter set for a classifier for the jth hyperpartition
ᾱ_1^j ... ᾱ_T^j | hyperparameter sets for classifiers tried so far for the jth hyperpartition
z_j | reward for hyperpartition j
f_j | a meta model learned so far for hyperpartition j
ᾱ_p^j | proposed hyperparameters for hyperpartition j

Algorithm 2 ATM – Search algorithm
1: procedure Search(Dataset ← D_i, ModelHub.classifiers)
2:   Get judgement metrics y_1^j ... y_T^j ∀ j | D_i
3:   for j ∈ J do
4:     Evaluate-Reward: z_j ← Q(y_1^j ... y_T^j)
5:   end for
6:   Select hyperpartition:

       s_j = z_j + √(2 ln n / n_j)    (1)

     where j is the hyperpartition index, z_j is the reward gained from pulling the j-th arm n_j times, and n = Σ_{j=1}^{J} n_j over all J hyperpartitions.

       Select j = argmax_j s_j    (2)

7:   For selected j:
8:     Get hyperparameters ᾱ_1...T^j corresponding to y_1...T^j
9:     if T ≥ r_minimum then
10:      f_j ← Fit((ᾱ_1^j, y_1^j) ... (ᾱ_T^j, y_T^j))
11:      ᾱ_p^j ← Propose(f_j)
12:    else
13:      ᾱ_p^j ← Random(j)
14:    end if
15:    return ᾱ_p^j
16: end procedure

With a conditional parameter space framework in place, we now design automatic meta-learning techniques that iteratively select among hyperpartitions and tune hyperparameters. In our system, we use a bandit-based method to select amongst hyperpartitions and a meta-modeling technique to tune hyperparameters for a given hyperpartition. Algorithm 2 presents the combined bandit-based selection and metamodel-based tuning algorithm.

A. Hyperpartition selection using bandit learning

We employ bandit learning strategies which model each hyperpartition as an arm in a multi-armed bandit (MAB) framework.

Concept of reward: Given that each arm, when pulled, provides a randomized reward drawn from a hidden underlying distribution, the goal of a MAB problem is to decide which arm to pull to maximize long-term reward, re-evaluating the decision after each pull's reward is observed. In the context of ATM, UCB1 treats each hyperpartition as an arm with its own distribution of rewards. As time goes on, the bandit learns more about the distribution, and balances exploration and exploitation by choosing the most promising hyperpartitions to form classifiers.

Concept of memory: Memory (moving window) strategies exist in order to change the bandit formulation when reward distributions are non-stationary. The UCB1 algorithm assumes that the underlying distribution of rewards at each arm choice is stationary. In the context of ATM, during model optimization, as the meta model (GP) continues to learn about the hyperpartitions and is able to discover parameterizations that continue to score better and better, it essentially shifts the bandit's perceived reward distribution. The distributions of classifier performance from the set of all parameterizations within that hyperpartition have not changed, but the bias with which the GP samples has changed. This could effectively lead a hyperpartition to be selected based on stale rewards. Memory strategies have a parameter t_s that determines how many previously trained classifiers to use to calculate the rewards.

Given the ModelHub database, a worker first queries and aggregates the cross-validation scores y_1^j ... y_T^j for all the classifiers trained for a hyperpartition j thus far, and the corresponding hyperparameter sets for each of the classifiers, given by ᾱ_1^j ... ᾱ_T^j. Treating each possible hyperpartition as an arm, we next calculate the reward "accumulated" for each hyperpartition.

Calculation of reward Q: First, a subset of classifiers, corresponding to the number t_s, is selected by either picking the t_s most recently trained classifiers, or the best t_s classifiers from the ones trained so far for this hyperpartition. These selected t_s scores are denoted as ys^j.

Reward based on average: This is the traditional way to define rewards. The reward is calculated by simply averaging these scores, given by:

    z_j^T = (1 / |t_s|) Σ_{i=1}^{|ys^j|} ys_i^j    (3)

with |·| being the cardinality of a set.

Reward based on velocity: Velocity strategies seek to rank hyperpartitions by their rate of improvement. This is based on the idea that a hyperpartition whose last few evaluations have made large improvements should be exploited while it continues to improve. To calculate velocity, we first sort the scores in ys^j in ascending order. Using velocity, the reward is changed to:

    z̄_j^T = (1 / |t_s|) Σ_{i=2}^{|ys^j|} Δys_i^j    (4)

for Δys_i^j = ys_i^j − ys_{i−1}^j.
Velocity strategies are inherently very powerful because they introduce a natural feedback mechanism that controls exploration and exploitation. A velocity optimization strategy will explore each hyperpartition arm until that arm begins improving less quickly than others, going back and forth between hyperpartitions and wasting no time on hyperpartitions which are not promising.

Selection of a hyperpartition: Given the rewards z_j for each of the arms, each arm's (hyperpartition's) score is calculated using

    s_j = z̄_j + √(2 ln n / n_j)    (5)

where n is the total number of classifiers trained so far across all hyperpartitions for this dataset and n_j is the total number of classifiers trained for this hyperpartition. The hyperpartition is selected based on argmax_j s_j.

B. Hyperparameter tuning using meta modeling

Within a hyperpartition, we employ a Gaussian Process (GP)-based meta-modeling technique to identify the best hyperparameters given the performance of classifiers already built for that hyperpartition [12]. To accomplish this, we propose a Fit, Propose abstraction.

Fit: In Fit, a meta model f_j is learned between the hyperparameter sets tried so far, ᾱ_1^j ... ᾱ_T^j, and their corresponding scores, y_1^j ... y_T^j.

Propose: In Propose, a new hyperparameter set ᾱ_p is proposed. This usually involves creating candidate hyperparameter sets, making predictions using the learnt model f_j, and choosing among those candidates by applying an acquisition function to their predictions. ATM supports two acquisition functions: Expected Improvement, and Expected Improvement per Time [12].
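Below is a compact sketch of the Fit/Propose loop with an Expected Improvement acquisition function, using scikit-learn's Gaussian process regressor. It is a simplified illustration under our own assumptions (random candidate generation, default kernel), not ATM's Tuner code.

# Simplified Fit/Propose sketch with Expected Improvement; not ATM's Tuner.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor


def fit(alphas, y):
    """Fit: learn a meta model f_j from hyperparameter vectors to scores."""
    gp = GaussianProcessRegressor(normalize_y=True)
    gp.fit(np.asarray(alphas), np.asarray(y))
    return gp


def propose(gp, candidates, best_y):
    """Propose: score random candidates by Expected Improvement, return the best."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)          # avoid division by zero
    z = (mu - best_y) / sigma
    ei = (mu - best_y) * norm.cdf(z) + sigma * norm.pdf(z)
    return candidates[int(np.argmax(ei))]


# Hyperparameter vectors (e.g. [C, gamma]) tried so far and their judgement metrics.
alphas = [[0.1, 0.01], [1.0, 0.10], [10.0, 0.05]]
y = [0.62, 0.71, 0.68]

gp = fit(alphas, y)
candidates = np.random.uniform([0.01, 0.001], [100.0, 1.0], size=(1000, 2))
print(propose(gp, candidates, best_y=max(y)))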
C. Abstractions in our software

One of the contributions of our work is to break the automatic machine learning process into two distinct sub-problems: hyperpartition selection (via multi-armed bandit methods) and hyperparameter tuning (via Gaussian process-based methods). While we provide several versions of both bandit-based selection and GP-based tuning mechanisms, there are many ways these approaches could be improved upon for different circumstances.

The problem of hyperpartition selection is essentially one of picking between multiple discrete choices, each of which is associated with a set of scores representing its past performance. Multi-armed bandit algorithms, used for this, usually involve first computing a set of rewards for each arm/choice, then using the set of all rewards to determine which arm to pull next. To tackle this problem, we define a Selector interface with the following methods:

• compute_rewards: Accepts a list of scores pertaining to a single choice and returns a list of rewards.
• bandit_select: Accepts a mapping of choices to lists of rewards, and returns the choice which it believes will maximize the expected reward and minimize expected regret.
• select: Accepts a mapping of choices to historical performance scores, calls compute_rewards and bandit_select in sequence, and returns a single recommended choice.

Hyperparameter tuning involves taking a set of hyperparameter vectors, ᾱ_1...T, and a set of corresponding scores, y_1...T, and generating a new hyperparameter vector which is expected to improve on the previously-achieved y-values. For this problem, we define a Tuner interface with the following methods:

• fit: Accepts historical hyperparameter performance data (ᾱ_1...T and y_1...T) and trains a model which can predict the performance of arbitrary hyperparameter vectors, e.g. a Gaussian process.
• predict: Accepts a set of proposed hyperparameter vectors and returns the predicted performance for each one.
• acquire: Accepts the predicted performances of a set of proposed hyperparameter vectors and returns the index of the vector which it believes an ATM worker should try next.
• create_candidates: Generates a list of random hyperparameter vectors, to be passed to predict and then to acquire.
• propose: Calls create_candidates, predict, and acquire in sequence and returns a single proposed set of hyperparameters.

Our system includes several implementations of both Selectors and Tuners by default. However, we also expect that AutoML experts will want to expand and improve upon our methods. An expert who wishes to contribute can do so by creating a subclass of Selector or Tuner and modifying one or more of the methods above. He or she can then use our testing functionality to automatically evaluate its efficacy.

VII. DATASETS

To demonstrate ATM, we use datasets from OpenML [13]. OpenML is a website (www.openml.org) that allows users to upload datasets for public viewing and track the results of various machine learning pipelines. OpenML has over 19,000 datasets available for download. The uploader specifies details about the dataset, such as which attribute must be predicted or which, if any, attributes should be ignored, the authors, citation information, collection date, or license information. Currently, the datasets must be uploaded in Attribute-Relation File Format.
Machine learning problems are specified in OpenML in what are called "tasks." A task defines whether the end result is classification or regression, what metrics must be returned, and what protocol to use (e.g., a classification task would typically include cross-validation). A potential solution for a task is called a "flow," which is a machine learning pipeline. An example could be: Data → PCA → SVM → Report Predictive Accuracy. A flow specifies default values for the hyperparameters of a pipeline.

OpenML tracks "runs," which are flows applied to a task – that is, a run is a fully-specified (i.e., with specific hyperparameter values) machine learning pipeline applied to a particular task. The hyperparameters may take on the default values from the flow, or they may be updated by the user to values he or she thinks are appropriate. This pipeline is executed on a user's personal computing hardware to generate results as specified in the task (e.g., predictive accuracy or F-measure). These results are uploaded to OpenML through the API. The ability to see previous performance for a dataset is especially helpful, so that other users do not have to rerun flows to determine what does or does not yield good performance. A user simply looks at the performance of previous flows to generate a potential new flow or different hyperparameter values that he or she thinks may improve performance.

A user can traverse the various runs through the API. A user can search by dataset, flow, or uploader. If we search by dataset, we can find the best performing run for a particular dataset. Since OpenML flows are designed by people, we can find a human baseline performance to compare our AutoML results against.

Selecting datasets for our experiment: We only select datasets which (1) are active, (2) have no missing values, and (3) have 2 to 4 classes. We exclude datasets which have missing values, as incomplete-data classification is not the goal of this work. After applying our filtering criteria, we downloaded 420 datasets from the OpenML website. The datasets fall within a wide range of domains, including cancer risk detection, diabetes detection, credit risk detection, car acceptability, chess, tic-tac-toe, and spam detection. Figure 6 shows additional information about the datasets.

Figure 6: Graph showing the number of instances (top), number of features (middle), and number of classes (bottom) in each dataset. The datasets were sorted by number of instances.

VIII. EXPERIMENTAL SETTINGS

We attempted to run ATM on the datasets using two selection schemes:

Grid Run: In this configuration, the hyperpartitions are selected randomly. The hyperparameter range is split into 3 values (start of range, end of range, and middle of range). One of the three values is then chosen randomly for each hyperparameter. For each dataset, we set a budget of 1,000 classifiers. Since there are 420 datasets, we attempt to learn 420,000 classifiers. Each classifier was trained using 10-fold cross-validation, thus our goal is to learn a total of 4.2 million models.

GP+Bandit Run: We run ATM with Best-t_s Velocity for hyperpartition selection and Gaussian Process-Expected Improvement for hyperparameter tuning. For each dataset, we set a budget of 100 classifiers. Since there are 420 datasets, we attempt to learn 42,000 classifiers.

IX. RESULTS AND DISCUSSION

Figure 7 shows the F1-score histograms of the best-performing classifiers for all datasets – both for the Grid run and the GP+Bandit run. In the GP+Bandit run, we attempted to learn 41,647 classifiers across all datasets, of which 39,342 were successfully learned while 2,305 logged errors. For the Grid run, we attempted to learn 286,577 classifiers across all datasets, of which 270,331 were successfully learned while 16,246 logged errors. Overall, we are able to achieve good classification performance for many datasets.

Figure 7: Histogram of the best F1-scores for each dataset using the Grid configuration (left) and GP+Bandit configuration (right).

Comparison to human baselines: Regardless of whether a Grid or GP+Bandit strategy was used, the process is automatic, with no human involvement. We next compare this with a "human-in-the-loop" baseline. These baseline metrics come from the OpenML site itself. Because datasets hosted on this site are publicly available, data scientists can download the data, try different methods, and upload the best classifier and the performance metrics they achieved. Using their API, we collected the best possible F1-score submitted for a dataset by anyone. Figure 8 shows a histogram of the best ATM performance minus the best human-discovered performance for 47 datasets (as reported on OpenML). Positive values occur when ATM finds a better-performing classifier than the best "human-in-the-loop" classifier uploaded to OpenML. Negative values occur when ATM does not find a better-performing classifier than the best "human-in-the-loop" classifier uploaded to OpenML.

Figure 8: Histogram of the difference between the best F1-score for each dataset and the best "human-in-the-loop" F1-score from OpenML for 47 datasets. The results for the Grid configuration are on the left while the results for the GP+Bandit configuration are on the right.

Comparing Grid and GP+Bandit: Our next question targets whether an intelligent selection strategy that uses GP+Bandit achieves better results within a given budget of classifiers. To compare both strategies across all data sets, we first calculated a best-so-far F1-score. Figure 9 compares the two selection methods across all datasets using this metric (the line represents the average across all datasets). Both selection strategies have a budget of 100 classifiers. We see that GP+Bandit has a slight advantage over the random strategy, in that it takes slightly fewer classifiers for it to get to the same result. Figure 10 shows the same information at specific iterations.

Figure 9: Comparing Grid and GP+Bandit strategies. The y axis is the average of the best-so-far F1-score achieved after x iterations for each problem.

Figure 10: Comparing Grid and GP+Bandit at specific iterations. The y axis is the best-so-far F1-score after x iterations.

X. CONCLUSION

In this paper, we presented ATM, a distributed AutoML system that can support multiple users, and search through multiple machine learning methods. The system is decentralized in that its many worker nodes work independently and asynchronously, only interacting through a shared database. ATM can learn hundreds of classifiers each for hundreds of datasets at a time. The system can assess performance on a variety of metrics, store metrics and models for future reference or analysis, and present the user with a final, optimized model to use for prediction. Users can configure ATM to search the vast hyperparameter space using several different strategies, and even to try multiple strategies at once in parallel.

Our main contribution was the design, implementation, and testing of the end-to-end system. In addition, we presented a novel method for organizing the hierarchical search space of machine learning methods. We defined the conditional parameter tree and hyperpartition and demonstrated how these abstractions can be used to better traverse complex hyperparameter spaces.

We designed several different search strategies and gave access to our end user through a simple API. To demonstrate our system, we ran the largest-ever training experiment, where we trained an aggregate of several million models across 420 data sets.

XI. FUTURE WORK: MODEL RECOMMENDER SYSTEM

Our current work provides a recommender approach, in which past ATM results are used to recommend models for a new dataset. We briefly describe this below, where we will attempt to generate models for a dataset using the results from the remaining 419 datasets.

1) Organize results of the Grid Run into a matrix M of size 420 × 159,838, where 159,838 is the number of every possible classifier definition (method and specific hyperparameter values) and 420 represents the number of datasets on which classifiers have been trained in ATM. M will be a very sparse matrix.
2) M will be split into a row vector p and a matrix G. The probe row (p) is of size 1 × 159,838, while the gallery matrix G is of size 419 × 159,838.
3) Sample the probe row p down so it only has 5 values and store the new row vector in p_s.
4) Record the best-so-far performance µ = max(p_s), where max(·) reports the maximum value among all the defined values and ignores the undefined values.
5) Create a recommender matrix R = [G; p_s] (the gallery matrix with the probe row appended as its last row).
6) Use Soft Imputation to complete the matrix and estimate values for the undefined entries (R′ = SoftImpute(R)).
7) Find the top 5 values from the last row in R′ (the probe row).
8) Run the classifiers corresponding to the top 5 values in Step 7.
9) Record the classifier performances in their corresponding columns in p_s.
10) Update µ = max(p_s).
11) Repeat steps 5-10 20 times.
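The sketch below mirrors steps 5-11 as a simple loop. The soft_impute helper is a placeholder for any soft-thresholded matrix-completion routine (for example, the SoftImpute implementation in the fancyimpute package); the run_classifier function and the matrix sizes are illustrative assumptions, not the experiment's actual code.

# Illustrative sketch of the recommender loop (steps 5-11); names are hypothetical.
import numpy as np


def soft_impute(R):
    """Placeholder for a soft-thresholded SVD matrix-completion routine."""
    return np.where(np.isnan(R), np.nanmean(R), R)   # crude stand-in


def run_classifier(column_index):
    """Placeholder: train the classifier definition at this column, return its score."""
    return np.random.rand()


def recommend(G, p_s, iterations=20, top_k=5):
    mu = np.nanmax(p_s)                                  # step 4: best-so-far score
    for _ in range(iterations):
        R = np.vstack([G, p_s])                          # step 5: gallery + probe row
        R_complete = soft_impute(R)                      # step 6: fill undefined entries
        probe_estimates = R_complete[-1]
        candidates = np.argsort(probe_estimates)[::-1][:top_k]   # step 7
        for c in candidates:                             # steps 8-9: run and record
            if np.isnan(p_s[c]):
                p_s[c] = run_classifier(c)
        mu = np.nanmax(p_s)                              # step 10
    return mu


G = np.where(np.random.rand(419, 1000) < 0.99, np.nan, np.random.rand(419, 1000))
p_s = np.full(1000, np.nan)
p_s[np.random.choice(1000, 5, replace=False)] = np.random.rand(5)
print(recommend(G, p_s))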
REFERENCES

[1] C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown, "Auto-WEKA," Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining – KDD '13, p. 847, 2013.

[2] M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter, "Efficient and Robust Automated Machine Learning," Advances in Neural Information Processing Systems 28, pp. 2944–2952, 2015.

[3] R. S. Olson and J. H. Moore, "TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning," Proceedings of the Workshop on Automatic Machine Learning, vol. 64, pp. 66–74, 2016.

[4] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software," ACM SIGKDD Explorations, vol. 11, no. 1, pp. 10–18, 2009.

[5] M. Martin Salvador, M. Budka, and B. Gabrys, "Adapting Multicomponent Predictive Systems using Hybrid Adaptation Strategies with Auto-WEKA in Process Industry," AutoML at ICML 2016, no. 2011, pp. 1–8, 2016.

[6] P. Brazdil, C. Giraud-Carrier, C. Soares, and R. Vilalta, Metalearning: Applications to Data Mining. Springer Science & Business Media, 2008.

[7] S. M. Abdulrahman and P. Brazdil, "Effect of Incomplete Meta-dataset on Average Ranking Method," Proceedings of the Workshop on Automatic Machine Learning, vol. 64, pp. 1–10, 2016.