CNME Editor Mark Forker spoke to Sascha Giese, Head Geek at SolarWinds, to find out how the company’s Hybrid Cloud Observability platform is equipping IT leaders with the tools they need to solve and mitigate security issues in their IT environments.
Sascha Giese is one of the most respected IT thought leaders, and in a candid interview with CNME he talks about the company’s Secure by Design concept, the key findings from its IT Trends Report, the unique capabilities of its Hybrid Cloud Observability platform, and what’s next on the strategic roadmap for SolarWinds.
Can you tell us about Secure by Design and how SolarWinds hopes to achieve it?
Secure by Design is a response to the SUNBURST incident in December 2020. We used this incident to turn over every stone in our environment and put a great deal of thought into the software design process. Until then, we followed what was considered industry best practice, like every other software company. But the incident showed such best practices could no longer deal with the changing threat landscape and state-sponsored attacks. We significantly improved the security of our software design process in many respects.
To give you an idea, we no longer use static resources. For instance, in IT, we are quick to spin up a VM for a test, but once the trial finishes, we tend to keep the VM running. We forget about it and move on. There is an attack point here. Attackers who gain access to such an unmonitored VM have all the time in the world to explore the environment. To counter this, we moved away from static solutions and switched to a highly dynamic Kubernetes-based system. Each time a developer finishes a task or stops working at the end of the day, the resources automatically destroy themselves.
Also, previously we had one build pipeline: developers shipped code to a compiler that converted the hand-written code into machine-readable executables, and the next step was forwarding the executables to download platforms for end users. Now we have multiple independent development pipelines. The first team creates code and documents every single step, the second team rebuilds the code from the documentation, and the third team checks the integrity and authenticity of both code versions. That way, we can avoid many security problems. These changes improved our security, and we also shared them with other software vendors because what happened at SolarWinds happened elsewhere a few months later. We want to help the software community prevent such situations from happening again.
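The integrity check performed by the third team can be pictured with a small sketch. The interview does not say how SolarWinds actually compares the two builds, so this is only an illustration. It assumes both pipelines produce byte-identical (reproducible) output, and the function names and sample artifacts are hypothetical.

```python
import hashlib


def digest(artifact: bytes) -> str:
    """Return the SHA-256 hex digest of a build artifact."""
    return hashlib.sha256(artifact).hexdigest()


def builds_match(primary: bytes, rebuilt: bytes) -> bool:
    """True only if both pipelines produced byte-identical output."""
    return digest(primary) == digest(rebuilt)


# Hypothetical artifacts standing in for real build outputs.
pipeline_a = b"release-build-contents"
pipeline_b = b"release-build-contents"
tampered = b"release-build-contents+implant"

print(builds_match(pipeline_a, pipeline_b))  # identical builds
print(builds_match(pipeline_a, tampered))    # one build was modified
```

The point of the comparison is that code injected into one pipeline changes that pipeline's output, so the mismatch is caught before anything is shipped.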
SolarWinds has said that it has been on a journey towards becoming Secure by Design, but how much has this goal been accelerated post the pandemic?
Everything happened during the pandemic or before. I don’t think there were any significant changes post the pandemic. For us internally, for the engineers, it wasn’t easy initially because so many things changed. But eventually, everyone got used to it. So, it’s working exactly the way we expected; we’re very proud of the system.
It’s an industry first to come up with measures to improve the security of software design. It’s something exceptional. And it is not something that’s finished and done. We can’t say everything planned is sorted and set in stone because it is a dynamic process.
Whenever we discover something that can be improved, we improve it. And we use minor incidents to test our strategies. For example, imagine a user doesn’t log out of a workstation. It’s usually not a big deal, but we use instances like these to test our security policies. Even if it’s a tiny incident, we go the whole way and evaluate our processes with notifications, alerts, etc. It’s like a fire drill that helps us prepare and keeps us on our toes.
Let’s come to hybrid cloud. The SolarWinds IT Trends report 2022 indicates that the shift to hybrid cloud has only resulted in increasing IT management complexity which has created doubt and a lack of confidence among tech professionals as to how best to manage their IT environments. What is your take on this?
Unfortunately, there’s no easy solution that works for all situations. Increased complexity is a huge problem, and if you look for the root cause of the complexity, you have to look back – it’s a chain that starts with the human attention span.
We, as humans, no longer want to wait. So, businesses must adapt and change their requirements, leading to significant IT changes. Today, IT is no longer just supporting the company; it is running it, and correctly implementing new technology is a great way to gain a competitive advantage. This is how increased complexity starts. You could probably say it’s a homemade problem, but businesses must evolve to stay alive and competitive.
How do you fight complexity? You need expertise, and there are various ways to gain it. You can give your IT teams enough time to learn and develop new skills, or you can increase headcount and hire additional IT professionals with the necessary expertise. But that’s not always possible, given the global shortage of IT professionals.
Another way would be to get third parties into the business for a while, maybe a contractor who sets up a multi-cloud environment, for instance. There are solutions to every complexity, but there is no one-size-fits-all solution.
In this context, can you tell us more about the SolarWinds Hybrid Cloud Observability platform?
The Hybrid Cloud Observability platform is an evolution of what was previously known as the Orion Platform. The Orion Platform has been in the market for 15 years. It’s a modular system that grew with customers’ demands. However, in the last couple of years, we noticed that the market is changing, so we came up with the new platform, Hybrid Cloud Observability, which is easier to understand for the user. It is easier to understand the licensing, and the deployment is more straightforward. Customers get more features for the same price. The software can be deployed in any scenario, whatever the customer needs, on-prem, in a private cloud, public cloud, or hybrid. It can manage/observe any IT environment. So, the platform allows users to get different layers of information into one system, enabling them to perform a faster root cause analysis. Keep in mind, when something breaks in IT, the first question is not how to fix it but who has to fix it.
With Hybrid Cloud Observability, we allow different groups, such as the network team, server admins, and cloud architects, to use the same tool and access the same data. The solution has some features that help users identify the root cause of problems. That, in a nutshell, is the new platform.
Today, the focus is on business resilience, continuity, and growth. How does multi-cloud observability help organisations with all of these?
Multi-cloud isn’t new, but it’s still a complex construct. There are many variables and moving parts, and it’s crucial to understand workloads, application delivery, and connectivity. However, the connectivity between different clouds isn’t that basic and can be pretty complicated. What is not understood can’t be observed, and what is not observed can’t be managed. Multi-cloud environments are complex, and a platform like Hybrid Cloud Observability is more than helpful for understanding how things work. Most customers globally mix AWS and Azure, and those are supported out of the box by our product. It’s a question of attaching a security token, and then we retrieve all the information straight from the cloud provider. It’s pretty easy to use.
With security being a huge concern and priority today, where does the security aspect come into play in this observability scenario?
Our annual IT Trends Report identified complexity as the biggest problem, and it’s the same for security. I’d probably say that security and IT operations have one common denominator: lack of visibility.
If you don’t see or understand a performance problem, you can’t fix it. And if you don’t see or understand a risk, you can’t mitigate it.
Security teams and security professionals use different tools than operational teams. Still, it would be beneficial for them to understand data flow and how applications talk to each other, and observability allows them to gain complete insight into the environment. So yes, it is helpful for security teams, too.
Do you believe that the Hybrid Cloud Observability platform could remedy many of the issues facing IT teams today as the shift to hybrid IT continues to accelerate?
The short answer is yes. I briefly touched on this example: in IT, multiple teams, such as networking and applications, work in silos and use their own toolsets. If you have a unified platform that brings the groups together, brings humans together, that’s a huge advantage. It’s probably something that an individual in IT doesn’t see as a crucial topic. But the IT Director or the CIO, someone with the big picture in mind, will instantly understand the benefits for the whole IT department and what it means to the business. But sometimes, it’s also about the simple fact that such a tool gives people more time during their workday.
An IT professional spends more than half of the day fixing broken things. We call this firefighting, and it’s usually a waste of time because that is time that can’t be spent on improving IT, can’t be spent on gaining a competitive advantage, and can’t be spent on learning how to deal with new technology. So, firefighting is a waste of time, but unfortunately, as we know, things break, and things misbehave in IT. The need for firefighting remains, but if a tool could automate steps and even work autonomously in the background, it would be highly beneficial for each organisation. It doesn’t have to be a global player. Even smaller businesses see how easy Hybrid Cloud Observability is and understand that it can help with consolidating tools and lowering stress on individuals, reducing costs significantly.
What’s next for SolarWinds?
There are a few things in the background that we are working on. For quite some time, we’ve been focusing on Artificial Intelligence. And we’ve built our own; we didn’t go the way of purchasing an already existing framework. Our data has been training the system for the last eight months. We also reached out to a few customers who were okay with providing us with insights. Our AI will make it into the product and lower IT professionals’ workload.
There’s a multi-layered approach. The first thing we want to do is reduce unnecessary alerts; that’s important because if we receive text messages or emails all day about stuff that’s not relevant, we tend to ignore it. And when something serious happens, we don’t respond because we missed it. We use AI to look into a situation, discover whatever caused an alert, and relate it to all previously collected information.
Let’s say, for instance, we manage a hypervisor like VMware, and the hypervisor runs with 90% CPU or 90% memory. Traditional systems will probably see 90% as a lot and instantly send an alert. However, if this condition is active for a longer timeframe and everything else is working fine, there’s no real reason to alert. A notification will do.
Now, let’s say CPU utilisation increases from 90 to 95%. That’s an anomaly, so the AI will look into the reasons. And if the system sees that, for example, 20 virtual machines were running previously and now it’s 22, that’s a valid reason the CPU would go up. We wouldn’t send an alert but instead deliver a notification to an ITSM solution for change management.
If, however, we see that the increase in the CPU is coming from a single machine, maybe a database, we’ll look deep into the database, collect all information and send it to the resolver group only, which would be the DBAs in this example.
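The hypervisor example above can be written out as a small decision routine. This is a rule-based stand-in for the logic Giese describes, not the SolarWinds AI itself. The 3-point anomaly threshold, the field names, the resolver-group mapping and the returned action strings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from workload type to the team that resolves its issues.
RESOLVER_GROUPS = {"database": "DBA"}


@dataclass
class Observation:
    cpu_before: float                    # earlier CPU utilisation, in percent
    cpu_now: float                       # current CPU utilisation, in percent
    vms_before: int                      # virtual machines running earlier
    vms_now: int                         # virtual machines running now
    single_source: Optional[str] = None  # workload type, if one machine drives the rise


def triage(obs: Observation, jump: float = 3.0) -> str:
    """Decide how to handle a CPU reading; `jump` is an assumed anomaly threshold."""
    if obs.cpu_now - obs.cpu_before < jump:
        return "notify"                  # steady high load, no alert needed
    if obs.vms_now > obs.vms_before:
        return "notify:itsm-change"      # more VMs explain the rise
    if obs.single_source is not None:
        group = RESOLVER_GROUPS.get(obs.single_source, "operations")
        return f"alert:{group}"          # route only to the resolver group
    return "alert:operations"            # unexplained rise


print(triage(Observation(90, 90, 20, 20)))
print(triage(Observation(90, 95, 20, 22)))
print(triage(Observation(90, 95, 20, 20, "database")))
```

The three calls correspond to the three cases in the interview: sustained 90% load, a rise explained by two extra virtual machines, and a rise traced to a single database.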
This is the first step. There are a couple of other things in the pipeline, but they’re further ahead. We have many plans here at SolarWinds; one could say we’re on a mission!