Bringing a Network Out of the Stone Age and Laying the Groundwork for Software-Defined Access
With an organisation as old as the University of Cambridge, there are many misconceptions. One of the most common is that it is a unified organisation with centralised staff and devices that all operate on a centralised network. The truth is that, unlike most other UK universities, the University of Cambridge is extremely fragmented, with different Colleges, Departments, Schools, Faculties and other affiliated Institutions operating autonomously. Each institution meets the needs of its academics, students and staff in its own way, and although they all connect to the backbone of the University network, many operate their own internal networks.
I've worked for the University since 1995 and have been with the University Information Services (UIS) department since 2015. UIS is responsible for maintaining many network systems, including the main internet connection for the entire University, functioning as an underlying current of connectivity across the University. Even though one department is responsible for University-wide IT services, the fact that many of the departments manage their own networks creates a set of expected challenges, as well as some that are far more difficult to anticipate. Furthermore, Cambridge does not have a campus; the whole of the city is our campus. Fortunately, we have an amazing resource in our own fibre optic network, which spans much of the city and allows us to interconnect most of our buildings and customers.
A Technologically Ancient Setup in a Historic Space
Given the size of the University, our primary network is enormous. The smaller splinter networks connect to the backbone of the network through a point of presence (PoP) switch. That's where some of the institutions take over network management: whatever happens on their network is their business, although we frequently get pulled into troubleshooting and resolving issues for our customers.
The lack of visibility and the lack of automation make everything difficult to manage. On top of that, our team of eight people had traditionally taken a largely manual approach to management, although in recent years we've automated some aspects of the network. Nevertheless, we frequently have to log into many individual devices to manage and configure everything, a tedious and time-consuming effort that prevents the team from completing projects that would be far more impactful. It was like reinventing the wheel every time we started the update process, and there were no shortcuts.
In our eyes, the solution was to leverage a technology with intelligence baked into the platform. We needed something that could detect and identify every device on the network and provide the right security posture. This would allow the UIS Network team to scale up and take on more of the network.
But it wasn’t just a matter of selecting the right technology. We had to jump over hurdles within our existing environment to move ahead.
The Hurdles to Upgrading
The largest hurdle was a physical one. Although our team's remit didn't cover the passive network infrastructure, it obviously goes hand in hand with the active network. The University of Cambridge is more than 800 years old, and one of the oldest parts of campus is the Old Schools, which houses more than 200 of the University's administrative employees. The network at the Old Schools was large and entirely outdated, composed of a hodgepodge of physical cabling and old switches. Some components were easy to identify, others were questionable, and several no longer served any identifiable purpose.
The building itself was another challenge. It contains materials like asbestos, which makes it a difficult environment to work in because it's hard to alter the interior structure. Critical administrative tasks were also carried out in the building during typical office hours. We couldn't risk downtime, nor were we allowed any, so we would have to make any changes outside of regular business hours.
Ever since I arrived at UIS, there had been discussions about updating the network at the Old Schools. It was long overdue for an overhaul, but for one reason or another, the project never got off the ground.
An Unlikely Catalyst for Change
With so much standing in the way, it took a once-in-a-century event to give us the push we needed to finally update the network. When the pandemic hit in 2020, we got an unexpected opportunity to make enormous strides with the project. Suddenly, we had an empty building to work with, and we didn't have to worry about downtime. The University recognised that a significant hurdle was no longer a barrier and gave the green light for the project to start. There was one snag: the project started in September 2020, but we had a deadline of January 2021. We knew this would be exceptionally tight if we were to implement an untested platform.
Having already done lots of research, we knew that we needed to purchase Catalyst 9000 series switches, so we initiated a conversation with Cisco and our technology partner, CAE Technology Services. Our team was set on kicking the tyres on Cisco Software-Defined Access (SDA). We knew we could leverage other technologies within that solution, like Cisco Identity Services Engine (ISE) and Cisco DNA Center (DNA-C), both of which would improve and enhance our automation and management capabilities.
Working closely with my colleagues Mantas Kandzezauskas and Bob Franklin, we made an initial plan to buy switches that could eventually be run in SDA fabric mode. We were fully aware of how complex a fabric network deployment could be, especially for a team accustomed to much more traditional methods of network implementation and management. Getting SDA-ready now would make the task easier whenever we were ready to automate the network and take the technology leap to fabric networks.
There’s No Such Thing as Too Much Planning
Given our timeline, we weren’t sure that we could do everything at once, so we were content to move in phases to make sure we had enough time for the initial upgrade. We planned to implement a traditional network to meet the deadline and then upgrade to a fabric at a later time.
We held a series of technical discussions with the Cisco CX team, led by the ever-helpful Roger Milnes, and Cisco's tech genius Steve Kirk, who knows the Cisco product range and technology in great detail. Over the course of many weeks and many Webex conference calls, the experts changed our minds about our plans, and I'm extremely grateful they did.
Roger and Steve gave us a thorough explanation of how the technology fits together to create a functional platform, and how it is most effective when everything fits together from the start. During the hours-long sessions, they provided a level of support that gave us the confidence we needed to move forward on such an aggressive schedule.
We engaged the professional services of CAE for the initial design work. Even though we had a basic idea for the network design, they helped with some of the logistics in the immediate run-up to implementation. After they shared their design expertise, we got to work.
Planning was paramount to our success, and through the series of discussions about the scope of the project and the design, we carefully laid the groundwork. Just incorporating the ISE and DNA-C controllers onto our network to support the underlying SDA platform was no mean achievement. One of our biggest challenges was how to bring these services into our IP address space and plan for future SDA fabric sites and their logical addressing. After all, we plan to implement many more SDA sites, so getting the IP addressing scheme correct now was important.
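To give a flavour of that exercise, here's a minimal sketch of how per-site address pools might be carved out of a larger block using Python's standard ipaddress module. The 10.128.0.0/12 supernet, the pool sizes and the site names are all hypothetical placeholders for illustration, not our actual addressing plan.

```python
import ipaddress

# Hypothetical supernet reserved for all future SDA fabric sites.
SDA_SUPERNET = ipaddress.ip_network("10.128.0.0/12")

# Each fabric site gets a /16 slice, subdivided into per-role pools.
SITE_BLOCKS = SDA_SUPERNET.subnets(new_prefix=16)

def plan_site(name: str, block: ipaddress.IPv4Network) -> dict:
    """Carve one site's /16 into the kinds of pools an SDA site typically needs."""
    pools = block.subnets(new_prefix=20)  # sixteen /20s to allocate from
    return {
        "site": name,
        "clients": str(next(pools)),   # wired/wireless client pool
        "iot": str(next(pools)),       # extended node / IoT devices
        "aps": str(next(pools)),       # access point onboarding pool
        "underlay": str(next(pools)),  # underlay and border handoff links
    }

# Example: allocate the first few site blocks in a stable, repeatable order.
for site, block in zip(["old-schools", "site-2", "site-3"], SITE_BLOCKS):
    print(plan_site(site, block))
```

Deriving every site from one supernet like this keeps summarisation clean at the network borders and guarantees that new sites never overlap existing ones, which is exactly why getting the scheme right up front mattered so much to us.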
The hardware was delivered around November 2020, which enabled us to get the controller platform installed and built, but due to the aggressive onsite schedule, the switches couldn't be installed until after the New Year. I remember worrying about the project over Christmas, thinking that if something didn't work as expected, we wouldn't have an awful lot of time to troubleshoot and get it sorted by our deadline. Had we been able to install the switches before Christmas, we would've had a lot more confidence. That said, I needn't have worried at all.
When we returned to work in January, the planning and design phase of the SDA site (which DNA-C uses to automate the network configuration) had already been completed, but there was still a lot of work left to do. Once everything was physically connected, we were ready to press a few buttons and let SDA do its magic. It was amazing to see everything get configured, and I was impressed with just how smoothly everything went in such a phenomenally short period of time.
Going from the technological Stone Age at the beginning of September to laying the foundation for a much more robust, automated network ended up being a lot easier than any of us had expected. Now that we have the SDA platform installed in our environment, we can roll out new SDA fabric sites much more easily. We expect this will allow us to scale out our services while retaining a small headcount.
Just the Beginning
Managing our network is considerably easier now that we have SDA. Thanks to the new user interface, we can identify potential network issues in a lot less time. A good example is the Assurance dashboard shown below.
It pinpoints the root cause for problematic hosts much more quickly than traditional methods. Troubleshooting on a traditional network would likely involve finding the MAC address of the host, locating it on the network, inspecting the switch logs and DHCP binding table, and probably the DHCP server logs, too. There is an element of comfort that comes with having such an easy-to-read dashboard, and the Assurance element of the platform tracks many components through multiple telemetry points throughout the network, from the client right the way through the fabric switches.
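For contrast, here's a rough sketch of what that traditional, switch-by-switch hunt might look like if scripted with the Netmiko library. The device details, credentials and MAC address are placeholders, and the show commands are the sort of standard IOS queries you would otherwise run by hand; this is illustrative, not an excerpt from our actual tooling.

```python
from netmiko import ConnectHandler

# Placeholder access-switch details; in practice you would repeat this
# hop by hop until you found the edge port the host sits on.
switch = {
    "device_type": "cisco_ios",
    "host": "access-sw1.example.cam.ac.uk",
    "username": "netadmin",
    "password": "REDACTED",
}

host_mac = "aaaa.bbbb.cccc"  # hypothetical problem host

with ConnectHandler(**switch) as conn:
    # Which port (if any) has learned this MAC?
    print(conn.send_command(f"show mac address-table address {host_mac}"))

    # Does the DHCP snooping binding table show a valid lease?
    print(conn.send_command("show ip dhcp snooping binding"))

    # Any log entries mentioning the host?
    print(conn.send_command(f"show logging | include {host_mac}"))
```

Even scripted, this interrogates one device at a time; Assurance correlates the same sort of telemetry across the whole fabric and surfaces the likely root cause directly.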
We still have a long way to go to get to where we want to be. As of spring 2021, our campus hasn't fully reopened, and because many of our staff, faculty, and students are still remote, our network hasn't been put to the test. We have some small tweaks to make, but our main objective is to eventually start using Cisco ISE to leverage some of the dynamic qualities this platform offers, such as dynamic host onboarding and automated security posturing.
Eventually, I hope UIS will come to manage many more of the networks across the University, and we will realise even more benefits of SDA. As we push to manage an increasingly larger percentage of campus networks, we won’t significantly increase our workload. In fact, we know we will decrease our workload in the long run because we will be working smarter instead of harder.