The GRP-Automation program was created to modernize the GlobalNOC’s core work and then to go beyond it, taking a leadership position in innovations related to operating networks. Our goal is to deliver benefits to our managed networks and partners in security, efficiency, MTTR, and related areas.
The program initially has a two-year charge. The Year One goal of the GRP Automation project is to automate 80% of the changes to the core L2/3 equipment for networks the GlobalNOC staff actively configure. Over the past several weeks we’ve talked to a large number of internal staff and external customers to understand their concerns and challenges, and we’ve researched the changes being configured on the in-scope devices. This work has brought us to where we are today: a rough plan for the work we’ll perform over the next twelve months to reach our goal. As with all visions of the future, it’s clearer for the next few months than it is six months out, and we’re leaving ourselves open to the possibility that further talks with customers and users will change our direction.
The Y1 plan is divided into eight projects. Each project will consist of an uplift in our underlying base capabilities, followed by a use case deployment on each in-scope network that showcases how the uplift can be used. Throughout each project the GRP Automation team will work closely with each network team, helping them take advantage of the uplift and shouldering part of the implementation workload. As the GRP Automation team moves on to the next project, it will remain available to the individual network teams as a resource for taking further advantage of the uplift.
Project 1 – “Set System”, beginning Monday, May 20
The router configs can be thought of as divided into two parts: the parts of the config that almost never change, and the parts that change frequently and/or are related to customer/peer turnups. The “system” stanza on Juniper routers is a good example of the static portion of the config; it seldom changes and it’s generally the same on every router on the network. There may be a few simple “variables”, such as the RADIUS key or the syslog server, but otherwise it’s the same everywhere. Large portions of the config are essentially static, yet our research showed that it’s not uncommon for part of that static config to change: the loopback-in filter is updated, or a new feature like uRPF is enabled and a chunk of static config is inserted.
Our first project is related to these mostly static configuration sections, with some allowance for light variable replacement such as “this is the syslog server to use for this set of devices”. The “set system” stanzas will be moved into GIT; future changes to those sections will be made in GIT and pushed out with AWX/Ansible. This will make these portions of the configuration more consistent and will make it easier to deploy changes to them. We’ll also be documenting, and working with the networks on, standards related to change windows, role codes, release phasing, annotating changes, commit standards, and where the source of truth lives for each section of the config: GIT, the database, or the router proper. Training will be part of every project. The GRP Automation team will work with the networking staff to move the system stanzas, and will help in other areas where the staff would like assistance. The GRP Automation team is also leveraging previous work in this area by GlobalNOC staff members Molly Balas and Jason Iannone.
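As a rough illustration of “mostly static config with light variable replacement”, the Python sketch below renders a small Junos system stanza from a template plus per-group variables. It is not the team’s implementation: the actual pipeline is described as GIT plus AWX/Ansible, and the template text, group name, variable names, and addresses here are hypothetical (the addresses come from documentation ranges).

```python
from string import Template

# Hypothetical "set system" template as it might be stored in GIT. Most of the
# stanza is static; only the $-prefixed values vary per group of devices.
SYSTEM_TEMPLATE = Template(
    "set system syslog host $syslog_server any notice\n"
    "set system radius-server $radius_server secret \"$radius_key\"\n"
    "set system ntp server $ntp_server\n"
)

# Hypothetical per-group variables ("this is the syslog server to use for
# this set of devices"). Addresses are documentation ranges, not real hosts.
GROUP_VARS = {
    "example-backbone": {
        "syslog_server": "192.0.2.10",
        "radius_server": "192.0.2.20",
        "radius_key": "example-secret",
        "ntp_server": "192.0.2.30",
    },
}


def render_system_stanza(group: str) -> str:
    """Render the static stanza for one device group.

    Template.substitute() raises KeyError when a variable is missing, so an
    incomplete group definition fails before anything is pushed to a router.
    """
    return SYSTEM_TEMPLATE.substitute(GROUP_VARS[group])


if __name__ == "__main__":
    print(render_system_stanza("example-backbone"), end="")
```

Because substitute() refuses to render a template with a missing variable, a typo in a group definition surfaces as an error at render time rather than as a half-configured router. In Ansible the same idea would normally be expressed with a Jinja2 template.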
Project 2 – Global Prefix lists, beginning Monday, June 17
Prefix lists are a big part of our lives when we work on backbone routers. The “global” lists include those that manage access to the router, such as an ssh-in list or the allowed SNMP hosts, as well as lists like the BOGON prefixes to reject when received via BGP. This GRP Automation project will make the GlobalNOC database the source of truth for these prefixes. Prefixes will be entered and tagged in much the same way that they are entered to manage IP address space in the database. From there the global prefix lists can be pushed out to the routers automatically, ensuring that our prefix lists are exactly what we intend on every node and freeing us from the dropped lines, accidental insertions, and other trouble that naturally comes from manually editing important ACLs. This work leverages previous work in this area by GlobalNOC engineer Tom Johnson.
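To make “the database as the source of truth” concrete, here is a minimal Python sketch, assuming each prefix is stored once with a set of tags. It builds each global list from the tagged rows, renders Junos commands that replace the list wholesale, and reports drift against a list read back from a router. The row format, tag names, list names, and the drift() helper are hypothetical; the GlobalNOC database schema is not described here.

```python
import ipaddress

# Hypothetical database rows: each prefix is entered once and tagged with the
# global lists it belongs to. Prefixes are documentation/private examples.
DB_PREFIXES = [
    ("10.0.0.0/8", {"bogons"}),
    ("192.0.2.0/24", {"bogons"}),
    ("198.51.100.0/24", {"ssh-in", "snmp-hosts"}),
    ("203.0.113.5/32", {"ssh-in"}),
    ("2001:db8::/32", {"bogons"}),
]


def _sorted(nets):
    # IPv4 before IPv6: comparing networks of different versions raises TypeError.
    return sorted(nets, key=lambda n: (n.version, n))


def intended_prefixes(tag):
    """Every prefix tagged `tag` in the database, validated and de-duplicated."""
    return _sorted({ipaddress.ip_network(p) for p, tags in DB_PREFIXES if tag in tags})


def render_prefix_list(name, tag):
    """Junos 'set' commands that replace the whole list with the database contents.

    Deleting and re-adding the list in one commit means lines dropped or
    inserted by hand on the router do not survive the next push.
    """
    lines = [f"delete policy-options prefix-list {name}"]
    lines += [f"set policy-options prefix-list {name} {net}" for net in intended_prefixes(tag)]
    return lines


def drift(tag, running):
    """Compare the database with prefixes read from a router: (missing, extra)."""
    want = set(intended_prefixes(tag))
    have = {ipaddress.ip_network(p) for p in running}
    return _sorted(want - have), _sorted(have - want)


if __name__ == "__main__":
    print("\n".join(render_prefix_list("SSH-IN", "ssh-in")))
```

Replacing the entire list on each push, rather than editing individual lines, is what keeps every node identical to the database; the drift report is useful for reviewing what a push would change before it is committed.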
Project 3 – Per-Peer Prefix lists, beginning Monday, July 29
A large part of the normal work on a router is modifying the prefix lists for individual peers and customers. Whether for an interface ACL or an allowed prefix list, there is in general a significant number of routine changes in this area. This project will allow those prefix list changes to be managed from the GlobalNOC database, just as in the previous “global” project. There is non-trivial data-modeling work associated with this project, to ensure that the database understands which list is associated with which BGP session and which peer. Integration with other tools is also possible, such as automated notifications of changes to a business office or integration with ticketing systems. This is the first of the projects that drive “per peer” or “per customer” changes from the database, the second half of our “mostly the same on every router” vs. “different on every router” config work. This work leverages previous work in this area by GlobalNOC engineer Tom Johnson.
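The data-modeling point above can be sketched as follows: a peer owns its allowed prefixes, each BGP session points at a peer, and a change to one peer in the database yields the prefix list updates for every router where that peer has a session. This is a hypothetical Python model for illustration only; the class names, fields, and the AS<number>-IN list naming convention are assumptions, not the GlobalNOC schema.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Peer:
    """A customer or peer; `prefixes` is what it may announce to us (hypothetical field)."""
    name: str
    asn: int
    prefixes: list = field(default_factory=list)


@dataclass
class BgpSession:
    """One BGP session on one router, linked to the peer it belongs to."""
    router: str
    neighbor: str
    peer: Peer

    @property
    def prefix_list_name(self) -> str:
        # Hypothetical naming convention: one import list per peer ASN.
        return f"AS{self.peer.asn}-IN"


def render_session_list(session: BgpSession) -> list:
    """Junos 'set' commands for the prefix list used by this session."""
    nets = sorted({ipaddress.ip_network(p) for p in session.peer.prefixes},
                  key=lambda n: (n.version, n))
    name = session.prefix_list_name
    return [f"delete policy-options prefix-list {name}"] + [
        f"set policy-options prefix-list {name} {net}" for net in nets
    ]


def changes_for_peer(sessions, peer_name):
    """When a peer's prefixes change in the database, find every router to update.

    Sessions to the same peer on one router share a list, so there is one
    entry per router.
    """
    return {
        s.router: render_session_list(s)
        for s in sessions
        if s.peer.name == peer_name
    }
```

Because the routers to update are derived from the session records, the same lookup could also drive the notifications or ticket updates mentioned above.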
Our Y1 work plan was formed quickly. The first three projects, representing about five months of work, are fairly well defined. Further into the future, our crystal ball is hazier. We need further discussions, which will take place during these first five months, to solidify the later projects. But this is, in general, where we think we want to go:
Project 4 – Campus, beginning Monday, September 9
It’s easy to get into the habit of thinking that the GlobalNOC manages only backbones and related infrastructure. In fact, a significant number of GlobalNOC users manage hundreds, if not thousands, of devices in an enterprise or campus environment, including the Indiana University campus proper. This project will focus on deliverables for those campus/enterprise environments, with integrations to other systems and tools to smooth their operational changes.
Project 5 – Interface Backbone, beginning Monday, October 21
Our research indicates that a substantial amount of our work is related to interfaces: turnups, decommissions, address changes, and so on. A large share of that work involves backbone circuits, and this project will focus on that area. It requires database backend changes, which can be worked on in parallel during our first four projects. By the time we reach Project 5 those changes should be in place, and we will be in a position to begin.
Project 6 – Interface Customer/Peer, beginning Monday, December 9
This project applies the same approach to customer/peer interface and peering work. We should have our backend changes in place by this point and can begin implementing these sorts of changes from our database source of truth. This is the last major change needed to have all of the components in place for …
Project 7 – Service Provisioning, beginning Monday, February 3, 2020
When most folks think of automation, they think of service turnups/provisioning. While this is a component, our research into config changes shows provisioning to be a bursty activity on most networks, with the exception of a few outlier networks. By this point we should be able to have a system in place that allows customer/peer turnups based on a form or ticket.
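A form- or ticket-driven turnup could, in very rough outline, look like the Python sketch below: validate the submitted fields, then render the interface and BGP neighbor configuration. The request fields, the BGP group name, and the convention of using the VLAN ID as the unit number are hypothetical, and the sketch assumes VLAN tagging is already enabled on the physical interface.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class TurnupRequest:
    """Fields a hypothetical turnup form or ticket might collect."""
    interface: str      # physical interface, e.g. "xe-0/0/0"
    vlan: int
    local_address: str  # our side with prefix length, e.g. "192.0.2.1/30"
    neighbor: str       # customer/peer side, e.g. "192.0.2.2"
    peer_as: int
    description: str


def validate(req: TurnupRequest) -> None:
    """Reject obviously bad input before any config is generated."""
    if not 1 <= req.vlan <= 4094:
        raise ValueError(f"VLAN {req.vlan} out of range")
    if not 1 <= req.peer_as <= 4294967295:
        raise ValueError(f"ASN {req.peer_as} out of range")
    local = ipaddress.ip_interface(req.local_address)
    neighbor = ipaddress.ip_address(req.neighbor)
    # The neighbor must share our subnet (and address family) and not be our own address.
    if neighbor not in local.network or neighbor == local.ip:
        raise ValueError(f"{req.neighbor} is not a usable neighbor on {local.network}")


def render_turnup(req: TurnupRequest, bgp_group: str = "CUSTOMERS") -> list:
    """Junos 'set' commands for the interface unit and the BGP neighbor."""
    validate(req)
    family = "inet" if ipaddress.ip_interface(req.local_address).version == 4 else "inet6"
    # Hypothetical convention: the logical unit number equals the VLAN ID.
    unit = f"interfaces {req.interface} unit {req.vlan}"
    nbr = f"protocols bgp group {bgp_group} neighbor {req.neighbor}"
    return [
        f'set {unit} description "{req.description}"',
        f"set {unit} vlan-id {req.vlan}",
        f"set {unit} family {family} address {req.local_address}",
        f"set {nbr} peer-as {req.peer_as}",
        f'set {nbr} description "{req.description}"',
    ]


if __name__ == "__main__":
    req = TurnupRequest("xe-0/0/0", 100, "192.0.2.1/30", "192.0.2.2", 64500, "Example Customer")
    print("\n".join(render_turnup(req)))
```

Validating before rendering keeps a malformed form submission from ever producing config; in practice the rendered commands would go through the same review and push path as the database-driven changes in the earlier projects.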
Project 8 – Campus, beginning Friday, March 6, 2020
We’re trying to ensure that all of our in-scope networks derive value from each project, but some projects certainly benefit some networks more than others. This project will once again target work that is of more value to campus/enterprise environments than our previous projects.
That’s our Year One plan. Year Two is under discussion, but at this point it’s likely that our service desk, break/fix, auto-remediation, and predictive failure analysis will be focus areas. Everyone, internal or external, should expect frequent two-way communication throughout the next two years. We need to ensure that the GRP-Automation program is addressing the real needs and concerns of the staff and customers, and to adjust our plans accordingly.
Check out our video for more details on our year one plan: https://www.youtube.com/watch?v=Ad12ApepGHk