Goals
- Create a library of modular recipes (parameterized DevOps pipeline templates) that can be composed into custom end-to-end CI/CD pipelines for machine learning
- Learn and teach the fundamentals
Why care? To sustain the business benefits of machine learning across an organization, we need discipline, automation, and best practices. Enter MLOps.
Approach
- Minimalistic: The focus is on clean, understandable pipelines and code
- Modular: Atomic recipes that can be referenced and reused (e.g. recipe: Deploy to production after approval)
Status: Project board
Technologies: Azure Machine Learning & Azure DevOps
Technical Aspects
It is fine if you do not understand this yet - there will be discussions in the workshop (todo: add detailed notes)
- Fully YAML-based multistage CI/CD pipeline (does not use classic release pipelines in Azure DevOps)
- YAML-based variables template (no need to configure variable groups through the UI)
- Gated releases (manual approvals)
- CLI-based MLOps: the Azure ML CLI is called from the DevOps pipelines to interact with the ML platform. Simple and clean.
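The points above can be illustrated with a minimal sketch of a multistage Azure Pipelines file. It is not the pipeline from this repository: the template file name `variables.yml`, the variables `resourceGroup` and `workspaceName`, the service connection name, the stage names, and the environment name `test` are all hypothetical placeholders. It also assumes that the Azure ML CLI extension is already available on the build agent and that the manual approval is set up as a check on the target environment.

```yaml
# Sketch only: file names, stage names, variable names and the service
# connection name are hypothetical placeholders, not taken from this repo.
trigger: none              # no automatic builds

variables:
- template: variables.yml  # hypothetical file defining resourceGroup, workspaceName

stages:
- stage: Train
  jobs:
  - job: TrainModel
    pool:
      vmImage: ubuntu-latest
    steps:
    - task: AzureCLI@2
      inputs:
        azureSubscription: my-service-connection   # hypothetical name
        scriptType: bash
        scriptLocation: inlineScript
        inlineScript: |
          # Assumes the Azure ML CLI extension is already installed on the agent.
          az ml model list -g $(resourceGroup) -w $(workspaceName)

- stage: DeployTest
  dependsOn: Train
  jobs:
  - deployment: DeployToTest
    pool:
      vmImage: ubuntu-latest
    # Manual approval is assumed to be configured as a check on this environment.
    environment: test
    strategy:
      runOnce:
        deploy:
          steps:
          - script: echo "deploy the packaged model to the test environment"
```

Keeping the variables in a template file means they are versioned with the code, and using a deployment job bound to an environment is one way to have a stage wait for approval before it runs.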
Get Started
- Understand what we are trying to do (the section below + workshop discussion)
- Set up the environment
- Run an end to end MLOps pipeline
Note: Automated builds based on code/asset changes have been disabled by setting trigger: none in the pipelines. This avoids triggering accidental builds during your learning phase.
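As a rough illustration of the note above, the snippet below shows the disabled trigger next to a commented-out alternative that could be enabled later. In Azure Pipelines YAML the key is `trigger`. The branch name and the path filters are assumptions chosen to match the directory names used in this project, not settings taken from the actual pipelines.

```yaml
# Current behaviour: the pipeline runs only when started manually.
trigger: none

# Hypothetical alternative for later: run automatically when training code
# or data changes on the main development branch.
# trigger:
#   branches:
#     include:
#     - master
#   paths:
#     include:
#     - code
#     - dataset
```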
The above diagram illustrates a possible end-to-end MLOps scenario. Our current Build-Release pipeline covers a subset of it: Training ➡️ Approval ➡️ Model Registration ➡️ Package ➡️ Deploy in test ➡️ Approval ➡️ Deploy to Production
Notes on our Base scenario:
- Directory Structure
  - mlops_pipelines contains the DevOps pipelines
    - The EnvCreatePipeline.yml is a DevOps pipeline that will provision all the components in the cloud
    - The BasicBuildRelease.yml is a DevOps pipeline that performs the subset of steps mentioned above (Training to Deployment in Test)
  - code directory has the source code for training and scoring. This will be used by Azure ML to create Docker images to perform training & scoring.
  - dataset directory contains the German credit card dataset
- Training: For training we use a simple LogisticRegression model on the German credit card dataset. We build an sklearn pipeline that does the feature engineering and export the whole pipeline as the model binary (pkl file).
- We use the Azure ML CLI as the mechanism for interacting with Azure ML, for simplicity.
More documentation will follow.
Acknowledgements
- The MLOpsPython repo was one of the inspirations for this project; thanks to its contributors
- German Creditcard Dataset
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.