Improving the utilization of zombie servers in the data center may prove more difficult than first meets the eye. But breathing fresh life into otherwise comatose IT equipment isn’t impossible.
——————————————————————————————————————————————————————————
Buried deep in the recesses of data centers worldwide lurks a silent threat. Off the beaten track of well-managed hardware lifecycles, it quietly drains energy, compute, and money. We’re talking, of course, about zombie servers.
What is a zombie server? A zombie—also known as a comatose server—is commonly understood to be an unintentionally idle device with no external communications and no visibility, all while guzzling electricity.
Sounds straightforward enough, except zombiehood can be a matter of degree—and thus hard to detect. Sometimes a server contributes some compute or storage, but isn’t utilized fully enough to justify the power expenditure.
The good news is that by identifying zombie servers, you can improve your company in several ways at once. Eliminating zombies can reduce costs, green your center, and improve security.
Zombie Servers: The Extent of the Threat
So, zombies exist, and that can mean double trouble for a data center. But how many zombies could there possibly be?
A lot, apparently. In fact, according to a 2015 joint study by the Anthesis Group and then-Stanford research fellow Jonathan Koomey, there were approximately 3.6 million zombie servers in operation worldwide at that time. There’s no evidence that zombies have become less common in the years since. What’s more, 3.6 million was just an estimate: the real number may have been as high as 10 million.
The study, which relied on anonymized data collected by analytics company TSO Logic (purchased by AWS in 2019), showed that a stunning 30% of servers were comatose. A server was “comatose”, for the purposes of the Anthesis study, if it delivered no information or compute for six months or more.
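That working definition translates into a simple filter. The sketch below is only an illustration, assuming each server record carries a timestamp of its last useful activity. The inventory, the server names, and the 183-day approximation of "six months" are our assumptions, not details from the study.

```python
from datetime import datetime, timedelta

# Assumption: "six months" is approximated as 183 days.
COMATOSE_AFTER = timedelta(days=183)


def is_comatose(last_activity: datetime, now: datetime) -> bool:
    """Return True if the server has delivered nothing for six months or more."""
    return now - last_activity >= COMATOSE_AFTER


# Hypothetical inventory: server name -> time of last useful activity.
inventory = {
    "web-01": datetime(2024, 5, 30),
    "batch-07": datetime(2023, 9, 1),
    "db-02": datetime(2023, 11, 15),
}

now = datetime(2024, 6, 1)
comatose = sorted(name for name, seen in inventory.items() if is_comatose(seen, now))
print(comatose)
```

In practice the hard part is deciding what counts as "activity", which is why the later steps combine several data sources rather than relying on one timestamp.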
VMware has also found ample evidence of the prevalence of zombies. During a 2017 host inventory, the firm discovered that 43% of its host machines were zombies, and that removing them would save $640k per year.
Zombies aren’t a one-off problem. They’re everywhere!
How Much Energy Do Zombie Servers Use?
When it comes to energy, zombies are doubly greedy. Not only do they consume power to keep running, but they lead companies to spend more on cooling than they otherwise would have.
The energy used is substantial, as are the potential savings if zombies are driven out. For its part, Anthesis assumed a cost of $3,000 per server, abstracting away infrastructure capital and operating costs. Based on that value, at least $10 billion, and possibly as much as $30 billion, worth of servers were sitting idle globally at the time of its study. And that’s long before the current tech crunch drove up server prices.
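The arithmetic behind that range is easy to check. A quick sketch using only the figures quoted above (the variable names are ours):

```python
COST_PER_SERVER_USD = 3_000   # per-server cost assumed by Anthesis
ZOMBIES_ESTIMATE = 3_600_000  # the study's estimate of zombie servers
ZOMBIES_UPPER = 10_000_000    # the possible upper figure cited above

low = COST_PER_SERVER_USD * ZOMBIES_ESTIMATE
high = COST_PER_SERVER_USD * ZOMBIES_UPPER

print(f"${low / 1e9:.1f} billion to ${high / 1e9:.1f} billion")
# prints "$10.8 billion to $30.0 billion"
```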
In a separate study, the Uptime Institute identified approximately 20,000 comatose servers. Shutting them off resulted in a 5-megawatt reduction in IT load and a 4-megawatt drop in cooling and infrastructure load.
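To put those numbers in per-server terms, here is a rough back-of-the-envelope calculation. It assumes, purely for illustration, that the saved load would otherwise have run continuously all year; the Uptime Institute figures themselves are the ones quoted above.

```python
SERVERS_REMOVED = 20_000
IT_LOAD_SAVED_MW = 5
COOLING_LOAD_SAVED_MW = 4
HOURS_PER_YEAR = 8_760  # 365 days x 24 hours; assumes constant, year-round load

total_mw = IT_LOAD_SAVED_MW + COOLING_LOAD_SAVED_MW

# Average draw attributable to each removed server, cooling included.
watts_per_server = total_mw * 1_000_000 / SERVERS_REMOVED

# Energy no longer consumed over a full year, in megawatt-hours.
annual_mwh = total_mw * HOURS_PER_YEAR

print(f"{watts_per_server:.0f} W per server, {annual_mwh:,} MWh per year")
# prints "450 W per server, 78,840 MWh per year"
```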
In another stunning reduction, when AOL purged its zombie servers, the firm saved $4.3 million, while also reducing its carbon footprint by 35%.
“Removing idle servers would result in gigawatt-scale reductions in global IT load, the displaced power use from which could then support new IT loads that actually deliver business value,” remarked Dr. Jonathan Koomey. “That’s a result that everyone should cheer.”
Why Hunting Zombie Servers is a Challenge
The data is clear: zombie servers are as ravenous as they are numerous. As if that weren’t enough, they’re also sneaky: zombies are surprisingly hard to detect.
In the late 90s and early 00s, when each server was dedicated to a specific function, idle devices were easier to single out: just find the servers whose power consumption was constant.
Data centers today are more complex. Compute is usually organized by an “orchestration layer”: software that assigns and tracks workloads across multiple servers. This software requires a pool of idle servers standing by, which can be rapidly assigned to new workloads as the need arises.
The upshot is that identifying zombies is no longer as simple as finding servers which consume power at a constant rate. Nowadays, the orchestration layer needs servers in a no-load or low-load state, which can be activated on the fly when compute is required.
Just because it looks like a zombie and behaves like a zombie doesn’t mean that it actually is one.
The difficulty in detecting zombie servers is just one obstacle to utilizing servers which are necessary and reselling those which aren’t.
Beware of Virtual Zombies
Of course, compute and storage tasks are far from strictly delineated by the bounds of physical servers. A lot of the allocation of compute and storage happens virtually: a single virtual server, used for a specific task, might have its operations spread across multiple devices.
This abstract, virtualized conception of a server brings with it yet another complication: virtual servers can be zombies too! In a 2017 update to the original Koomey / Anthesis study, the researchers reported a reduction in the incidence of comatose servers, down to 25%. However, three in every ten of the virtual servers they studied remained zombified.
During a 2019 migration, VMware found that 47% of its VMs were comatose, and that eliminating them saved $3 million annually in data center costs.
Here’s the problem. With an underutilized physical server, one can simply unplug it or reassign it to some useful task. Virtual zombies require more care, though, because they spread the energy waste across multiple devices.
(Another reason virtual zombies are so vexing is that they waste licensing fees, too. In a 2016 analysis of the original Koomey / Anthesis data, The Uptime Institute identified significant potential savings in licensing costs associated with zombies.)
How to Deal with Comatose Servers
No one said zombie hunting would be easy. But, as everyone knows, zombies don’t do well in sunlight. By increasing operational visibility and making use of illuminating analytics, you can expose and purge comatose servers.
Step 1: Gather data
Since zombie servers are hard to detect, the first thing to do is confidently locate them. By using the appropriate tools, you can gather analytics which will help identify, and eventually eliminate, the zombie menace.
Thankfully, some of the most helpful zombie hunting tools are some of the most common. Data Center Infrastructure Management (DCIM) software is key, as it lets you visualize trends, forecast power consumption, and test for “failover capacity”, which is how well your system can handle drive failure without shutting down.
Power Distribution Units (PDUs) are another key source of data. While spotting zombies is no longer as simple as looking for devices that consume power at a constant rate, one can still learn a lot by examining patterns of consumption. Monitor power by server and by month in order to detect spikes and dips in drive activity.
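One simple way to examine those patterns is to measure how much each server’s draw varies over time. The sketch below uses the coefficient of variation (standard deviation divided by mean) on monthly PDU readings. The readings, server names, and the 2% threshold are hypothetical, and a flat profile only marks a candidate for closer inspection, not a confirmed zombie.

```python
from statistics import mean, pstdev


def power_variation(readings_watts: list[float]) -> float:
    """Coefficient of variation: standard deviation divided by mean."""
    avg = mean(readings_watts)
    if avg == 0:
        return 0.0
    return pstdev(readings_watts) / avg


# Hypothetical monthly average draw per server, in watts.
pdu_readings = {
    "app-03": [210, 340, 295, 410, 260, 380],
    "legacy-11": [182, 181, 183, 182, 182, 181],
}

FLAT_THRESHOLD = 0.02  # assumption: under 2% variation counts as "flat"

flat = [name for name, watts in pdu_readings.items()
        if power_variation(watts) < FLAT_THRESHOLD]
print(flat)  # prints ['legacy-11']
```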
Finally, you can extract server utilization data from the orchestration layer, which is the software responsible for allocating servers to specific tasks.
Step 2: Analyze
After you’ve gathered the relevant data, it’s time to analyze it. Thankfully, much of this step can be automated if one has the appropriate software on hand.
Send the data gathered in step 1 into a database or DCIM tool, and run some analytics to determine which servers are running at low power. A data center operator can compare this to what’s happening in the orchestration layer. This will determine whether a given server is comatose or just underutilized, and whether it’s safe to shut it down.
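The comparison described above can be pictured as a small decision rule. This is a minimal sketch rather than how any particular DCIM product works: the `ServerStats` fields, the category names, and the six-month workload window are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    name: str
    low_power: bool          # flagged as low or flat power by the analytics
    in_standby_pool: bool    # orchestration layer is holding it in reserve
    workloads_last_6mo: int  # workloads the orchestration layer assigned


def classify(server: ServerStats) -> str:
    """Cross-check the power signal against orchestration data."""
    if not server.low_power:
        return "active"
    if server.in_standby_pool:
        return "standby"        # idle on purpose, so leave it alone
    if server.workloads_last_6mo == 0:
        return "comatose"       # idle and unaccounted for
    return "underutilized"      # doing some work, but not much


# Hypothetical fleet.
fleet = [
    ServerStats("web-01", low_power=False, in_standby_pool=False, workloads_last_6mo=42),
    ServerStats("pool-04", low_power=True, in_standby_pool=True, workloads_last_6mo=3),
    ServerStats("legacy-11", low_power=True, in_standby_pool=False, workloads_last_6mo=0),
    ServerStats("report-02", low_power=True, in_standby_pool=False, workloads_last_6mo=2),
]

for server in fleet:
    print(server.name, classify(server))
```

Checking the standby pool before the workload count is deliberate: it keeps reserve capacity from being mistaken for a zombie.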
If the DCIM tools don’t take you far enough, next generation software offers more advanced zombie hunting capabilities.
Zombies vs. machines? At IBM, researchers are exploring how to use A.I. to spot zombie servers. Underutilized servers are hard to detect, and can easily slip by many of the most obvious tests, such as a CPU usage threshold. Here’s where machine learning shines: it can detect complex patterns which point to a comatose server.
Step 3: Diagnose
What makes a good server turn into a zombie? Data centers are complex places, so it’s sometimes impossible to pinpoint just where in the server shuffle a particular asset became stranded. But if your data center staff can crack the mystery, it can prevent future zombies. That saves a lot of time and money.
What year was the zombified asset purchased? What was its original intended use? Was the asset renamed at some point? Was it used by someone who then left the company?
The difficulty of this sort of investigation depends on the size and complexity of the data center in question. But if you can spot zombie triggers and adjust your hardware lifecycle accordingly, it’s possible to minimize the accidental creation of zombies.
Step 4: Adjust Your Hardware Lifecycle
Once the comatose servers have been identified, it’s time to decide what to do with them.
One way forward is to give the formerly idle servers more tasks, so you’re making good use of the extra compute. Alternatively, you can repurpose the capacity for use elsewhere in the company. If you decommission the servers instead, remember to have their drives securely wiped and then resold to a third party or recycled.
(Given how common zombie servers are, you might find yourself repurposing drives at scale. In order to reuse them in a secure, green, and efficient manner, be rigorous when selecting an ITAD firm to support you.)
To prevent future zombies, it’s important to track physical and virtual assets over their whole lifetime. Specifically, you should track the purpose, status, and activity of the asset, as well as any relevant information about who deployed and who uses the asset. This visibility encourages accountability.
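What such a tracking record might contain can be sketched in a few lines. The field names and the review rule below are hypothetical; a real asset database or DCIM tool will have its own schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class AssetRecord:
    asset_id: str
    kind: str                     # "physical" or "virtual"
    purpose: str                  # why the asset was deployed
    status: str                   # e.g. "active", "standby", "retired"
    deployed_by: str
    used_by: List[str] = field(default_factory=list)
    last_activity: Optional[date] = None


def needs_review(asset: AssetRecord) -> bool:
    """Flag live assets that have no remaining users or no recorded purpose."""
    if asset.status == "retired":
        return False
    return not asset.used_by or not asset.purpose.strip()


# Hypothetical records.
assets = [
    AssetRecord("srv-001", "physical", "billing database", "active", "alice", ["finance"]),
    AssetRecord("vm-417", "virtual", "", "active", "bob"),
]

print([a.asset_id for a in assets if needs_review(a)])  # prints ['vm-417']
```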
If you don’t have one already, a good ITAD firm will help you draw up a lifecycle management plan. Such plans help ensure that active servers reach the end of their lifecycle in an orderly manner, rather than joining the costly purgatory of comatose machines.
Attack Those Zombie Servers
Zombie servers are everywhere, costing companies money, contributing to their carbon footprint, and posing a security risk.
It’s not all negative, however. Every idle server is an opportunity for your company to save money and increase its bottom line. Banishing zombie servers is also an excellent way to green your data center.
While zombie hunting can be tricky, it’s well worth the cost and effort. With the right diagnostic tools and a bit of planning, your whole data center can run more efficiently.
Find out how Horizon can support your data center asset recovery programs and strengthen your server lifecycle management.