The rapid Digital Transformation of the past few years, arguably accelerated further by the COVID-19 pandemic, has led to increased adoption of cloud-native technologies such as microservices and Kubernetes in enterprises across the region. We speak to Gregg Ostrowski, Executive CTO at Cisco AppDynamics, about the importance of using a modernised cloud-native observability platform for driving exceptional digital experiences.
Modern application architectures offer huge benefits for organisations in terms of improved speed to innovation, greater flexibility and improved reliability.
But IT teams in organisations across the UAE, and the world for that matter, are now finding themselves under immense pressure as they attempt to monitor and manage availability and performance across hugely complex cloud-native application architectures. In particular, they’re struggling to get visibility into applications and underlying infrastructure for large, managed Kubernetes environments running on public clouds.
There is no doubt that staying on top of availability and performance is a far greater challenge in a software-defined cloud environment, where everything is constantly changing in real time. But with Digital Transformation projects and innovation initiatives continuing to run at breakneck speed, the heat is on for technologists to adapt and get the visibility and insight they need across these modern environments.
An issue of scalability
Traditional approaches to application availability and performance were often based on physical infrastructure. For example, 10 years ago, IT departments operated a fixed number of servers and network wires; they were dealing with constants and fixed dashboards for every layer of the IT stack. The advent of cloud computing added a new level of complexity, and organisations found themselves continually scaling their use of IT up and down, based on real-time business needs.
While monitoring solutions have adapted to accommodate rising deployments of cloud alongside traditional on-premises environments, the reality is that most were not designed to efficiently handle the dynamic and highly volatile cloud-native environments that we increasingly see today.
The fundamental question, therefore, is one of scale: these highly distributed systems rely on thousands of containers and spawn a massive volume of metrics, events, logs and traces (MELT) telemetry every second. And currently, most technologists simply don’t have a way to cut through this crippling data volume and noise when troubleshooting application availability and performance problems caused by infrastructure-related issues that span hybrid environments.
The case for cloud-native observability
As such, it is essential for technologists to implement a cloud-native observability solution that provides visibility into highly dynamic and complex cloud-native applications and the entire technology stack.
To thoroughly understand how their applications are behaving and where issues might lie, technologists need visibility across the application level, into the supporting digital services (such as Kubernetes) and into the underlying infrastructure as code (IaC) services (such as compute, server, database and network) they leverage from their cloud providers.
But before technologists rush to implement a solution to this growing challenge, there are some important factors that must be considered when thinking about observability into cloud environments.
For one, technologists should be looking to implement a purpose-built solution, one that can observe distributed and dynamic cloud-native applications. Traditional monitoring solutions continue to play a vital role, and will do so for years to come, but problems arise when cloud functionality is bolted on to existing monitoring and APM solutions. Too often, when new use cases are added to existing solutions, data remains disconnected and siloed, forcing users to jump from tab to tab to try to identify the root causes of performance issues. Very few of these solutions provide complete visibility, for example insight into business metrics or security performance, and many are naturally biased towards a particular layer of the IT stack depending on their legacy, whether that is the application or core infrastructure.
A new approach for new teams
Cloud-native applications are built in completely different ways, and they’re managed by new teams, such as Site Reliability Engineers (SREs), DevOps and CloudOps, that have new and different skill sets, mindsets and ways of working compared to other functions within IT. As such, they require a completely different kind of technology to track and analyse availability and performance data. They need a solution that is truly customised to the needs of the cloud-native technology stack, one that can decipher the short-lived interactions between microservices, which may be long gone by the time anyone comes to troubleshoot them.
SRE and DevOps teams need a solution that embraces open standards, most notably OpenTelemetry, and gives a full-stack, correlated view of all telemetry data across the technology stack. Technologists need to be able to collect all telemetry across the stack and domains and then analyse all that telemetry data at once, since it is interconnected and interdependent. A standards-driven solution is essential to future-proof organisations for the next decade and beyond.
Technologists also need a solution that allows them to monitor the health of key business transactions that are distributed across their technology landscape. If an issue is detected, they need to follow the thread of the business transaction’s telemetry data so they can quickly determine the root cause, isolate the fault domain and triage the issue to the correct teams for expedited resolution.
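To make the idea of following a transaction's thread more concrete, here is a minimal sketch in Python. It assumes each piece of trace telemetry carries a trace ID, a span ID and a parent span ID, as distributed tracing data generally does. The `Span` class, the `fault_domains` function and the service names are hypothetical and invented for illustration; they are not part of any Cisco AppDynamics or OpenTelemetry API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """One unit of work in a distributed transaction (hypothetical shape)."""
    trace_id: str             # shared by every span in one business transaction
    span_id: str
    parent_id: Optional[str]  # None for the entry point of the transaction
    service: str
    error: bool


def fault_domains(spans, trace_id):
    """Return the services that failed without any failing child span.

    A failing span whose child also failed is treated as merely propagating
    the error, so only the deepest failures are reported.
    """
    in_trace = [s for s in spans if s.trace_id == trace_id]
    # Span IDs that are the parent of at least one failing span.
    parents_of_failures = {
        s.parent_id for s in in_trace if s.error and s.parent_id
    }
    return sorted(
        {
            s.service
            for s in in_trace
            if s.error and s.span_id not in parents_of_failures
        }
    )


# Invented sample data: one failed checkout (t1) and one healthy one (t2).
SPANS = [
    Span("t1", "a", None, "checkout-web", True),
    Span("t1", "b", "a", "payment-api", True),
    Span("t1", "c", "b", "payments-db", True),
    Span("t1", "d", "a", "inventory-api", False),
    Span("t2", "e", None, "checkout-web", False),
]

print(fault_domains(SPANS, "t1"))
```

In this invented example the error surfaces in three services, but only the database span has no failing child, so that is where triage would start. A production platform would correlate far more signals than a single error flag.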
Finally, technologists should be looking for a solution that combines observability with advanced AIOps functionality. They need to leverage the power of AIOps and business intelligence to prioritise actions for their cloud environments. In the future, organisations will utilise AI-assisted issue detection and diagnosis, with insights for faster troubleshooting. Ultimately, this allows technologists to focus more quickly on what really matters: where an issue happened and why.
Over the last two years we have seen a seismic evolution in applications, and technologists need to ensure that their monitoring capabilities keep pace. From understanding how highly distributed cloud-native applications work and predicting incidents, to adopting new ways to gather vast amounts of MELT telemetry data, ITOps, DevOps, CloudOps and SRE teams need contextual insights that provide business context deep within the tech stack.
Only with the right cloud-native observability solution in place will IT teams and their organisations be able to optimise the benefits of modern applications, driving enhanced digital experiences for customers and improved business outcomes.