Mohammad Al-Fares, Virginia Beauregard, Kevin Grant, Angus Griffith, Jahangir Hasan, Chen Huang, Quan Leng, Jiayao Li, and Alexander Lin, Google; Zhuotao Liu, Tsinghua University; Ahmed Mansy, Google; Bill Martinusen, Formerly at Google; Nikil Mehta, Jeffrey C. Mogul, Andrew Narver, and Anshul Nigham, Google; Melanie Obenberger, Formerly at Google; Sean Smith, Databricks; Kurt Steinkraus, Sheng Sun, Edward Thiele, and Amin Vahdat, Google
Automated management of a physical network's lifecycle is critical for large networks. At Google, we manage network design, construction, evolution, and management via multiple automated systems. In our experience, one of the primary challenges is to reliably and efficiently manage change in this domain -- additions of new hardware and connectivity, planning and sequencing of topology mutations, introduction of new architectures, new software systems and fixes to old ones, etc.
We especially have learned the importance of supporting multiple kinds of change in parallel without conflicts or mistakes (which cause outages) while also maintaining parallelism between different teams and between different processes. We now know that this requires automated support.
This paper describes some of our network lifecycle goals, the automation we have developed to meet those goals, and the change-management challenges we encountered. We then discuss in detail our approaches to several specific kinds of change management: (1) managing conflicts between multiple operations on the same network; (2) managing conflicts between operations spanning the boundaries between networks; (3) managing representational changes in the models that drive our automated systems. These approaches combine both novel software systems and software-engineering practices.
@inproceedings{alfares2023change,
author = {Mohammad Al-Fares and Virginia Beauregard and Kevin Grant and Angus Griffith and Jahangir Hasan and Chen Huang and Quan Leng and Jiayao Li and Alexander Lin and Zhuotao Liu and Ahmed Mansy and Bill Martinusen and Nikil Mehta and Jeffrey C. Mogul and Andrew Narver and Anshul Nigham and Melanie Obenberger and Sean Smith and Kurt Steinkraus and Sheng Sun and Edward Thiele and Amin Vahdat},
title = {Change Management in Physical Network Lifecycle Automation},
booktitle = {2023 USENIX Annual Technical Conference (USENIX ATC 23)},
year = {2023},
isbn = {978-1-939133-35-9},
address = {Boston, MA},
pages = {635--653},
url = {https://www.usenix.org/conference/atc23/presentation/al-fares},
publisher = {USENIX Association},
month = jul
}