As shown above, Angel's core API classes, ordered by when (in general) they are called during model training, include:
- MLRunner
  - MLRunner creates AngelClient with a factory class based on conf, and calls AngelClient's interfaces in order according to the standard `train` process
- AngelClient
  - Starts PSServer
  - Initializes PSServer and loads an empty model
  - After training, saves the model from multiple PSServers to HDFS
- TrainTask
  - Starts the `train` process when called by AngelClient
- DataBlock
  - TrainTask calls the `parse` and `preProcess` methods to read data from HDFS and assemble it into a DataBlock that contains multiple LabeledData
  - TrainTask calls the `train` method to create the MLLearner object and pass the DataBlock to it
- MLLearner
  - MLLearner calls its own `learn` method, reads the DataBlock, computes the model delta, and pushes to / pulls from PSServer through the PSModel inside MLModel, eventually obtaining a complete MLModel
- MLModel
  - Creates and holds multiple PSModels, according to the algorithm's needs
- PSModel
  - Encapsulates all the interfaces in AngelClient that communicate with PSServer, making them easy for MLLearner to call
Understanding these core classes and how they call one another is helpful for implementing high-performance machine-learning algorithms that run on Angel.
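
The call order described above can be illustrated with a small in-memory simulation. This is only a sketch and does not use Angel's real API, which is written in Java/Scala. The class names mirror the list, but every constructor, every method name other than `parse`, `train` and `learn` (for example `start_ps_server`, `run_task`, `push`, `pull`), the comma-separated input format, and the toy linear-regression update are assumptions made for illustration. `pre_process` stands in for `preProcess`.

```python
from typing import Dict, List, Tuple

# NOTE: hypothetical stand-ins that only mimic the roles of Angel's classes.
LabeledData = Tuple[List[float], float]  # (features, label)
DataBlock = List[LabeledData]


class PSServer:
    """In-memory stand-in for a parameter server holding named vectors."""
    def __init__(self) -> None:
        self.params: Dict[str, List[float]] = {}

    def init_model(self, name: str, dim: int) -> None:
        self.params[name] = [0.0] * dim  # the "empty model"


class PSModel:
    """Wraps the push/pull calls for one named parameter vector."""
    def __init__(self, server: PSServer, name: str) -> None:
        self.server, self.name = server, name

    def pull(self) -> List[float]:
        return list(self.server.params[self.name])

    def push(self, delta: List[float]) -> None:
        row = self.server.params[self.name]
        for i, d in enumerate(delta):
            row[i] += d


class MLModel:
    """Holds the PSModels an algorithm needs (here a single weight vector)."""
    def __init__(self, server: PSServer) -> None:
        self.weight = PSModel(server, "weight")


class MLLearner:
    """Reads a DataBlock, computes a model delta, and pushes it via PSModel."""
    def __init__(self, model: MLModel, lr: float = 0.1, epochs: int = 50) -> None:
        self.model, self.lr, self.epochs = model, lr, epochs

    def learn(self, block: DataBlock) -> MLModel:
        for _ in range(self.epochs):
            w = self.model.weight.pull()
            delta = [0.0] * len(w)
            for x, y in block:  # gradient step for squared error
                err = sum(wi * xi for wi, xi in zip(w, x)) - y
                for i, xi in enumerate(x):
                    delta[i] -= self.lr * err * xi / len(block)
            self.model.weight.push(delta)
        return self.model


class TrainTask:
    """Parses raw lines into a DataBlock and hands it to an MLLearner."""
    def __init__(self, server: PSServer) -> None:
        self.server = server

    def parse(self, line: str) -> LabeledData:
        label, *features = (float(v) for v in line.split(","))
        return features, label

    def pre_process(self, lines: List[str]) -> DataBlock:
        return [self.parse(line) for line in lines]

    def train(self, block: DataBlock) -> MLModel:
        return MLLearner(MLModel(self.server)).learn(block)


class AngelClient:
    """Starts the server, loads the empty model, runs the task, saves the result."""
    def __init__(self) -> None:
        self.server = PSServer()  # stands in for "starts PSServer"
        self.saved: Dict[str, List[float]] = {}

    def load_model(self, dim: int) -> None:
        self.server.init_model("weight", dim)

    def run_task(self, lines: List[str]) -> None:
        task = TrainTask(self.server)
        task.train(task.pre_process(lines))

    def save_model(self) -> None:
        # A dict copy stands in for writing the model to HDFS.
        self.saved = {k: list(v) for k, v in self.server.params.items()}


def run_training(lines: List[str], dim: int) -> Dict[str, List[float]]:
    """Plays the MLRunner role: calls the client's interfaces in order."""
    client = AngelClient()
    client.load_model(dim)
    client.run_task(lines)
    client.save_model()
    return client.saved


if __name__ == "__main__":
    # Each line is "label,feature"; the labels follow y = 2x.
    print(run_training(["2.0,1.0", "4.0,2.0", "6.0,3.0"], dim=1))
```

In this sketch the learner never touches the server directly: it only sees the `PSModel` held by `MLModel`, which is the separation of roles the list describes.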