We just found 30 servers that can’t be accounted for. Thirty Internet-facing servers with no malware protection and patchy histories. I need to take a deep breath and figure out just how bad this is and what we can do to stop this sort of thing from happening again.
This came to light because I collect metrics that I can present to our CIO during his quarterly business review. Among them is our number of unmanaged resources. That’s a number that we always want to see decreasing. Total elimination of unmanaged resources is probably beyond our reach, but I at least want to contain them to our development environment and keep them out of the DMZ, that portion of a network that exposes applications and infrastructure to the world. Our production environment is behind a firewall that protects it from the R&D network, which I call the “Wild, Wild West.”
We’ve tried to get the folks in the R&D organization to manage their resources better, but they have so many isolated requests that they can’t keep on top of things. Rather than fight a battle we can never win, we just put those R&D resources behind their own firewall and impose rules that restrict what those resources can do and where they can go. To compensate for that, and since I believe that you’re only as strong as your weakest link, I strongly emphasize configuration management of our production network, with a 100% compliance goal for our Internet-facing resources.
We produce the metric on unmanaged resources by running Nessus scans and comparing the hosts they discover against the list of systems our operations folks tell us they are managing. Any host that shows up in a scan but not on that list counts as an unmanaged resource. Naturally, I was stunned when a Nessus scan turned up 30 Internet-facing servers that didn’t appear on our corporate systems management console. Once I picked my jaw up off the floor, we reviewed the servers manually. Besides the malware and patching lapses (no updates in more than six months), we found that some of these unmanaged resources were Linux servers with source-code compilers on them. Some of them had default services running that are risky at best, such as Telnet and FTP.
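The comparison behind that metric amounts to a set difference. Here is a minimal sketch, assuming the scan results and the managed inventory have each been exported as a plain list of IP addresses. The function name and the sample addresses (taken from a documentation-only range) are made up for illustration and are not our actual tooling or data.

```python
def unmanaged_hosts(scanned, managed):
    """Return hosts found by the scan that are missing from the managed inventory."""
    # Set difference: everything the scanner saw minus everything operations claims.
    return sorted(set(scanned) - set(managed))


# Hypothetical sample data standing in for a scan export and an inventory export.
scanned = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]
managed = ["203.0.113.10"]

missing = unmanaged_hosts(scanned, managed)
print(f"{len(missing)} unmanaged host(s): {missing}")
```

In practice the two lists rarely line up this neatly, since hosts can have several addresses or change them, so a real comparison would likely need to match on hostname or asset ID as well.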
So who is running these servers? An email to everyone in IT asking that question got no response. OK, then, let’s deactivate the servers’ switch ports and see who comes running. It took more than three days, but finally someone from one of the business units called IT operations. It turns out that the business unit had provisioned the servers to run a proof of concept for a customer. The unit was able to do this because one of its admins used to be a member of the IT department, and he still had access to Lab Manager, the centralized administration server used to spin up virtual machines. The admin said he thought Lab Manager placed servers only on the R&D network and not in the DMZ.
So there’s no bad guy in this story, but we clearly have some process shortcomings. The password for Lab Manager should have been changed when the admin left the IT department, according to our policy. We had undocumented servers with customer data on them, which is against our policy. Why was there no email alert or other notification from Lab Manager that servers had been provisioned? I also want to find out why the provisioned servers weren’t installed with our predefined baseline image, which would have installed our systems management software, patches and antivirus software, and hardened the operating system.
One other question comes to mind: Why didn’t our security information and event management system alert us that there were new IP addresses in our DMZ? I’ll definitely look into that one.