Creating Resilient IT Infrastructure with Workload Optimization
For any company, the IT team is like the guards walking the rampart, keeping a watch over the castle. When IT does their job well, all is quiet—customers say nothing. When something goes wrong, that’s when customers complain and IT needs to jump into action.
As the person responsible for the cloud in our data centers at IP Telecom, that moment when IT has to spring into action is one I know well. Maybe too well. IP Telecom is a service provider under the umbrella of the public company IP (Infraestruturas de Portugal), which manages road and railway infrastructure in Portugal. IP Telecom’s main business is managing the fiber optics along railway lines. In my role with the data center business, we work in the cloud environment, selling infrastructure as a service.
Just a few years ago, if you asked me how these systems were operating, I would have said they were fine, but obviously there was always room for improvement. What I see—only in hindsight—is how much room for improvement there was with issues we treated as normal.
The big challenge in our data centers was managing the machines’ workload, their memory, and processing. We were seeing what I now realize were massive peaks in workload, and this ultimately meant lower performance for our clients.
Another challenge was licensing costs. Because we sell infrastructure as a service, our clients can install whatever machines or operating systems they want. We want to keep it simple for the client. But to stay compliant with, say, Microsoft licensing, our Windows VMs can run only on specific hosts. On the virtual environment side, increased memory needs translate into higher licensing costs. The costs of an inefficient data center ultimately get passed on to the client.
In other words, we were offering our clients less for more, the exact opposite of a good business proposition. And, like guards keeping watch over the castle, you can bet our IT team heard from customers about issues.
A 24/7 IT team is good to have, in theory, because you want your business to offer around-the-clock support. At the same time, you know in practice it should not be necessary to have your IT team constantly running all day, every day solving issues. We were doing the latter, and it cost us as a company, not just in unhappy customers, but in limiting our future growth.
When your IT team is putting out fires 24/7, you don’t have the crucial human resources to innovate and push forward as a company. We had to address the deeper issue of the inefficiencies in our systems.
Overcoming Skepticism in the Search for a Solution
When we began looking for a solution, we had to consider our needs. So what were our objectives? Our optimal solution had to reduce the processing and memory resources our workloads required. It had to be more intelligent than VMware itself: intelligent enough to pull information from VMware and figure out which hosts are the best fit for the VMs in each of the data centers.
We already used a significant amount of Cisco equipment in the data center network, so it made sense to look for a Cisco solution as well. Through the helpful collaboration and advice of the local Cisco Gold Partner, Cilnet, we came upon a solution: Cisco Workload Optimization Manager (CWOM). We began with a proof of concept (PoC) in 2018 to prove that it would help us reduce costs and management time.
I should pause here to note that not everyone at IP Telecom was convinced that we should look for a solution to these issues. The operations team at the time, in particular, had what I will call a healthy skepticism about the value of adding more tools to solve the problems. This is why we went with a PoC: to prove it could achieve what we needed.
Two days into the PoC phase, CWOM identified problems with our systems and proposed a set of solutions. In a PoC, the solutions don’t apply themselves automatically, but we could see from the sequence of tasks CWOM suggested that this was the solution we were looking for. It was very good. That sequence of tasks erased any skepticism the operations team had about CWOM’s value.
Among Immediate Benefits, a Breather for IT
With the success of the PoC, there was no reason to wait on implementation. We immediately saw results. While I can’t get into the specifics of our individual clients, overall we saw a 30% reduction in resources required for Microsoft systems, and a 15% reduction for VMware. Right there was one of our objectives achieved, because you’ll remember that one of our challenges was the mounting cost of licenses.
We also saw improved performance. With the machine workloads, we no longer see those mountain peaks. They’re more like molehills. We’ve seen CWOM identify overworked machines and redistribute their workloads, which improves performance and overall application speed.
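For readers who want to picture what “identify overworked machines and redistribute” means in practice, here is a minimal sketch of threshold-based rebalancing. It is only an illustration under my own assumptions, not a description of how CWOM works internally: the `VM` and `Host` classes, the `cpu_pct` metric, and the 80% threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    cpu_pct: float  # hypothetical metric: share of one host's CPU this VM uses


@dataclass
class Host:
    name: str
    vms: list = field(default_factory=list)

    @property
    def load(self) -> float:
        # Total load is simply the sum of the VMs placed on this host.
        return sum(vm.cpu_pct for vm in self.vms)


def rebalance(hosts, threshold=80.0):
    """Greedily move VMs off hosts above `threshold` onto the least-loaded host.

    Returns the suggested moves as (vm_name, source_host, target_host) tuples.
    """
    moves = []
    # Visit the busiest hosts first (ordering is based on the initial loads).
    for source in sorted(hosts, key=lambda h: h.load, reverse=True):
        others = [h for h in hosts if h is not source]
        if not others:
            break  # nowhere to move anything
        # Try the smallest VMs first so each move disturbs as little as possible.
        for vm in sorted(source.vms, key=lambda v: v.cpu_pct):
            if source.load <= threshold:
                break
            target = min(others, key=lambda h: h.load)
            # Skip the move if it would push the target over the threshold.
            if target.load + vm.cpu_pct > threshold:
                continue
            source.vms.remove(vm)
            target.vms.append(vm)
            moves.append((vm.name, source.name, target.name))
    return moves


hosts = [
    Host("host-a", [VM("vm-a", 50), VM("vm-b", 30), VM("vm-c", 20)]),
    Host("host-b", [VM("vm-d", 10)]),
    Host("host-c", [VM("vm-e", 40)]),
]
print(rebalance(hosts))  # [('vm-c', 'host-a', 'host-b')]
```

In this toy example, one small move is enough to bring the overloaded host back to the threshold. A real platform weighs far more than a single CPU figure, including memory and placement constraints such as the licensing rules that tie our Windows VMs to specific hosts.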
Because of how intelligent the system is, CWOM also saves our teams a lot of time, especially our IT people. Automation means our “24/7” team doesn’t have to work 24/7. Now that they aren’t constantly putting out fires, we can put that talent to work thinking about the future.
Thinking of the Future
When you work automation into your systems, it frees your people from their focus on operations. Now, they can think about innovation. From a business development perspective, tasks like upgrading our infrastructure are where I want to see our IT team apply themselves.
The changes we’ve seen so far equate to lower costs for our clients. We’re also expanding CWOM into our internal systems and will soon use it to generate client-facing automatic reports. I predict those reports will provide a big benefit because clients value transparency. It inspires confidence in us as a company. And in this business, the client’s confidence is everything. Along with a better price for our services, this is what will keep us competitive.
When I look back on this process, the only thing I wish we had done differently is to be more ambitious in our goals. That’s the advice I would give to IT managers considering a similar solution: Cast a wide net. I wish we had looked at more metrics during the PoC, because it didn’t take long to see positive results, and we’re now finding we can do so much more with CWOM than we originally thought.
In IT, silence is golden. Since implementing CWOM, we’ve seen a 10–15% reduction in tickets opened for customer questions and complaints. That tells me we’re on the right track to providing better service for our customers.