Staying Cool Under Pressure: How CoolSys Migrated Its Data Center to Nutanix in Four Months
Nutanix
Experience is the best teacher. Working through challenges forces you to be resourceful and think critically. Finding the right solutions by going through the trial and error process can cause a lot of headaches, so the lessons you learn are hard won. You’ll take those lessons with you, and learn to not make the same mistakes again.
When you’re under pressure with a lot at stake, experience helps you make great decisions quickly because you’re more certain of the outcome. Staying cool under pressure isn’t just a behavior, it’s a mindset. And it’s what we do at CoolSys—both literally and figuratively.
CoolSys is the parent company of a portfolio of 16 brands that handle commercial refrigeration and HVAC solutions. The core of CoolSys consists of three divisions: the commercial and industrial division serving large retailers; the engineering division, where we design the cooling and heating systems for commercial and industrial settings; and the light commercial division, which handles kiosks and retail outlets for brands like Starbucks.
Our clients include some of the world’s largest retailers. We have about 3,200 employees with about 2,000 field service personnel keeping our clients’ systems running smoothly.
Navigating Data Center Difficulties
As successful as our company has been, a few years ago CoolSys noticed some issues with our data center. Our peak business season runs from May through October, but our data center wasn’t able to meet the increased demand during that time. It had outdated hardware, and we were experiencing difficulties with capacity and performance, as well as heating and cooling issues. This was a big challenge for us because of our business model. Working with big box stores like Target, Walmart, Costco, and others, we are contractually obligated to guarantee our refrigeration solutions.
A large part of our business is handled by our field service personnel. The only way to contact our refrigeration engineers when they’re in the field is via their mobile devices. All work orders are processed through those devices, so it’s our only way to let the engineers know when and where they go next. If our systems cannot sync to their phones and vice versa, all that information is lost and the jobs don’t get assigned. This leads to revenue loss and property damage—both unhappy outcomes.
Not only that, but when engineers complete work and are unable to upload that information, they have to capture it manually. It takes a lot of time and results in a lot of lost productivity. That’s why any data center downtime is simply not sustainable, affordable, or acceptable for our business.
Sourcing a New Hyperconverged Data Center
I’m an IT veteran with about 25 years in the industry. I started with CoolSys in the summer of 2018 and prior to that, spent nearly 15 years as a director of IT infrastructure and operations in a similar industry with a similar set of challenges.
I joined CoolSys because I was excited by the company’s growth story and I wanted to be a part of it. We needed a high-quality, cost-effective solution in place in order to deliver on our brand promise. I knew I could help the company scale because I had experience building the type of IT infrastructure required to meet the growing demands of the company and its clients. My mandate was to establish a stable environment. I had to establish connectivity and migrate our data to a more stable setting—all in just four months. There was no question: the data center had to move to a new colocation facility, and there was no time to waste.
How I Knew Nutanix Was the Right Decision
Over the years, I’ve seen compute infrastructure evolve from a rack full of blade servers and subsequent consolidation to a storage area network (SAN), followed by the next generation of unified computing servers (UCS) and on to hyperconverged platforms. Around that time, I was introduced to Nutanix.
I had heard good things about Nutanix through networking with my peers and other IT directors. I was very impressed with the platform, and I knew exactly where Nutanix offered advantages compared to the data center infrastructure I had used in the past.
For example, if you have UCS and you use a bare-metal restore (BMR) hypervisor in tandem with storage resource management (SRM) for your site recovery manager, it’s cumbersome to deal with so many different technologies at multiple sites. In order to orchestrate disaster recovery (DR), you need a lot of expertise in UCS, networking, virtual machine replication (VMR), backup and recovery, and so on. That’s a lot of specific skill sets and knowledge to apply in an overly complicated data center environment. With the Nutanix platform, all of that is simplified and contained in one box, so you can essentially handle those complexities with a simple integration.
Nutanix is also a leader in the hyperconvergence space, and we needed to choose a proven solution. Our timeline was compressed, and the appetite for risk was very minimal. We had to be certain that the platform we chose would be solid and could support our growth.
When you’re migrating from one data center to another, you have to be able to do it with zero downtime so the business can continue to run smoothly. I understood all of the Nutanix capabilities and orchestrated the failover process. Nutanix gave us a loaner platform where we were able to duplicate the data as we were setting up the new data center.
Once the data from the current systems was on the Nutanix platform, we replicated it from the loaner platform into the colocation facility. By Thanksgiving weekend, everything was in place. The whole migration took just one reboot, and there was essentially zero downtime for the business.
Lessons from a Four-Month Data Migration
Our story at CoolSys is ultimately a success story, but the deployment wasn’t perfect. Despite my years of experience, there’s always more to learn.
If you are anticipating growth, you have to invest in that upfront and factor it in as part of your planning and your budget. Otherwise, you will get some surprises. Everyone was excited about the new colo, but we didn’t plan appropriately for our current level of growth and very quickly realized that we needed to add capacity.
Performance is everything. With our 2,000 field technicians uploading high-resolution photos to the server and expecting very fast response times, we needed faster compute. So we worked with Nutanix to address that, and we are in the process of creating a high-performance cluster dedicated to mission-critical database applications. That way, we can grow both horizontally and vertically on the database tier without sacrificing performance.
When we moved the workload from the physical servers to the virtualized platform on Nutanix, we initially had to do some fine-tuning to get performance back to what the teams were used to. It didn't take long. We worked with the Nutanix team to get the sizing specifications right. Once they had collected and analyzed the data, they provided the parameters and configurations for fine-tuning, and performance was back where the teams expected it.
Through this experience, I’ve learned that it’s important to always do your homework. Make sure you understand your workload, performance expectations, and data growth pattern. Be sure to talk to experts about the specific workloads that currently exist for your company, and how those workloads are expected to change. Never assume you know the answers.
If you have years of experience in your field, it’s easy to assume that what you experienced in a previous environment will apply to your current one. This is where you can really go wrong. General principles should certainly be applied, but you still have to do your due diligence. Validate your assumptions with people familiar with the new platform so you have a better chance of success.
Addressing Today’s Challenges with Solutions That Scale for Tomorrow’s Growth
In the time since we implemented Nutanix, we’ve experienced solid platform growth. We’ve been able to run our compute infrastructure with very few resources to manage it. We don’t have a dedicated server specialist, networking specialist, or hypervisor specialist. Instead, we have a common pool of engineers who manage the full stack. That makes everything easier.
As we continue to grow our portfolio of acquired companies, we plan to seamlessly integrate all their IT infrastructure needs, especially for the technicians working in the field. After we complete all of the Nutanix improvements, we can start bringing these entities onto the platform without worrying about meeting our SLAs for response time or running into capacity limits as we scale.
With a high-availability configuration, our platform is more stable than ever before. We have invested in a remote site where we duplicate the data from the primary site, which protects us against risk. If our primary site is compromised, we can access the data residing in the remote site, and those kinds of protection capabilities are much easier to manage now.
For remote offices with local compute needs, we’ve been able to create a remote branch office setup with specific configurations for those needs. They can run their local operations through that hardware while replicating that data to the central site for protection. So, they also get the added benefit of data protection through Nutanix.
It isn’t always easy to find a solution that’s agile enough to meet today’s challenges and can also scale for tomorrow’s growth. We found that in Nutanix. This experience taught me that if I encounter a similar opportunity in the future, choosing Nutanix again would be a good call.