How can data centre leaders combat the skills shortage as further outages disrupt the industry?

Uptime Institute has announced the key findings of its 10th annual Global Data Center Survey, the largest and most comprehensive in the data centre industry. The results show a growing sector adapting to rapid change on multiple levels. In almost every area under discussion — outages, resiliency, staffing, workload placement or innovation — there is considerable variety in the strategies being employed.

Uptime Institute has reported on data centre outages for several years, surveying operators on their experiences with outages and closely tracking publicly recorded incidents. Even with the inherent difficulty of collecting and assessing this information, clear trends emerge from its research: in surveys from 2018 and 2019, and now supported by its 2020 survey, outages occur with disturbing frequency, bigger outages are becoming more damaging and expensive, and what has been gained in improved processes and engineering has been partially offset by the challenges of maintaining ever more complex systems. Avoiding unplanned downtime remains a top technical and business challenge for all owners and operators.

Findings from the report also showed that the data centre staffing crisis is getting worse. The number of managers stating they are having difficulty finding qualified candidates for open infrastructure positions is rising steadily. Women continue to be under-represented and more effort is needed to address the workforce gender imbalance and take advantage of the larger and more diverse skilled talent pool.

“Our 2020 survey results reflect a strong, growing sector facing increased change and complexity,” said Andy Lawrence, Executive Director of Research, Uptime Institute. “The growing complexity, along with the greater consequences of failure, creates the need for more vigilance and more sophisticated approaches to resiliency, performance and operations.”

Uptime Institute annually conducts its comprehensive global survey across the data centre industry. This year’s survey was conducted March-April 2020 and includes responses from nearly 850 managers at organisations that own and operate data centres in more than 50 countries. This group is the focus of Uptime Institute’s new report.

We hear from a number of experts who offer their thoughts and advice on combatting the skills shortage as further outages disrupt the industry.

Florian Malecki, International Product Marketing Senior Director at StorageCraft: “Historically, it’s safe to say there hasn’t been much glamour or glory in a career in storage. In many organisations, it’s a role that went largely unnoticed, at least until the data backup system failed. Recently though, this has changed.

“Data storage used to be a question of simply managing structured data in known, secure environments. This has changed, and with it, the skill set required from those managing it. Driven by explosive data growth, especially the growth of unstructured data, businesses need storage infrastructure that is highly scalable and flexible. In some ways, these are also the skills that are needed from storage professionals; a person that is able to quickly adapt to frequently changing demands from the business, understand what is needed, where it is needed, how the data can be accessed and protected, and propose a solution that meets all these requirements.

“Additionally, it’s essential that today’s storage professionals are security conscious. The rise of ransomware, combined with an increasing quantity of data being generated and stored, presents challenges when it comes to security. It is no longer only about capacity, but about deploying storage infrastructure that safeguards data and ensures its constant availability and recoverability. It’s also important to consider threats from other factors, such as app failure, hardware failure or human error.

“Today, as a role in storage is far more complex, strategic and critical to wider business success, it can be challenging for data centre leaders to find the right profiles.

“To overcome this challenge, one option is for professionals to choose the right storage vendor for their data centre’s requirements.

“Here’s what they should look for in a vendor:

Ease of use is crucial. With the right storage solution, such as scale-out object-based storage appliances, they can simply plug and play with the ability to ‘bring your own disks’, mix-and-match drive types (SAS, SATA) and capacity within the same bay with zero configuration (no RAID, volume or LUNs to configure). This means they spend less time on managing storage and more time on driving strategic initiatives.

High scalability. Choose a highly scalable data storage solution that can keep pace with customers’ data growth, which is often more than 100% per year. The right storage solution will allow them to cost-effectively add any number of drives, anytime and in any granularity to meet the storage requirements of the customers. And they can grow their global storage pool with zero configuration and no application downtime.

Intelligent analytics. Look for a self-organising storage solution that applies analytics and Machine Learning to the management of information. Ideally, such a storage system will use analytics to identify the data that should be backed up and the data that should not, giving you an intelligent, tiered data architecture that provides rapid access to mission-critical information.

Strong data protection. It is very important to ensure that the data storage solutions include the ability to protect business data wherever it lives and guarantee that it’s always available no matter what happens, like a successful ransomware attack. The best solutions use snapshot, replication and encryption technologies and can also recover individual files, entire systems, or fail over a whole data centre very quickly.”

Charbel Khneisser, Regional Presales Director, MENA at Riverbed: “Today, most organisations have invested sufficiently in availability and can therefore ensure that their applications and data are almost constantly available. So, when we talk about outages, we’re actually talking about performance degradation and issues that ultimately impact user experience and productivity.

“Unfortunately, as an industry, we have become accustomed to blaming any performance issues on the network. End-users typically don’t understand the complexity of application delivery so when service performance doesn’t match up to expectations, their first instinct is to assume it is a bandwidth issue. However, today, there are far more factors at play. Consider applications, for example, which are far more complex now than they were even just a decade ago. Historically, apps followed a three-tier model, but today if you look at mission-critical applications such as eCommerce platforms or core banking applications, it’s common for these to have up to 20 tiers, along with scores of dependencies and interconnections. All this introduces an unprecedented level of complexity, which in turn makes troubleshooting significantly more challenging when issues arise at the application layer.

“To combat the skills shortage, organisations need to ensure that their IT resources are able to optimise their efforts. This means urgently addressing the time that is traditionally spent in identifying and resolving performance issues. Studies show that the mean time to resolve performance degradation is typically between four hours and two days. When we analyse this resolution process, we see that it is subdivided into four key areas: time to know; time to identify; time to rectify; and time to verify. Due to the wealth of information that is today conveniently available no more than a few clicks away, time to rectify is actually the least demanding of these steps. However, because of the lack of visibility that IT teams have into their networks and applications, the majority of their effort is spent in the three other areas of the resolution process.

“If organisations are able to address this, they can free up precious time for their IT resources who can then focus on high-value tasks that drive business outcomes. Actionable insight comes from visibility across all domains – the network, applications and data. In addition to powerful Network Performance Management (NPM) solutions, organisations must also invest in Machine Learning in order to gain the ability to proactively resolve issues before end-users are impacted. Using tools in this manner also enables them to correctly baseline performance which allows any deviations to be more rapidly detected.

“Ultimately, we must recognise that data centre admins have certain core skill sets and have been hired for a specific purpose. By equipping them with the right tools, organisations can empower them to play to their strengths, rather than expending hours of effort on troubleshooting and other activities that are outside of their core competencies. This will help not only address performance issues, but will enable the industry to optimise abilities and effectively overcome the skills shortage.”
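The idea of baselining performance so that deviations are detected quickly can be illustrated with a minimal Python sketch. It is not Riverbed's method and it is not Machine Learning: it is a plain statistical check that flags any recent sample lying more than a chosen number of standard deviations from the baseline mean. The function name, the threshold of 3.0 and the sample response times are all assumptions made for illustration.

```python
from statistics import mean, stdev


def find_deviations(baseline, recent, threshold=3.0):
    """Return (index, value) pairs for recent samples far from the baseline.

    A sample is flagged when it lies more than `threshold` standard
    deviations from the baseline mean. `baseline` needs at least two values.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value counts as a deviation.
        return [(i, v) for i, v in enumerate(recent) if v != mu]
    return [(i, v) for i, v in enumerate(recent)
            if abs(v - mu) / sigma > threshold]


# Hypothetical application response times in milliseconds.
baseline_ms = [120, 118, 125, 122, 119, 121, 123, 120]
recent_ms = [121, 124, 310, 119]

print(find_deviations(baseline_ms, recent_ms))  # [(2, 310)]
```

In practice a monitoring tool would build its baseline per application and per time of day; the sketch only shows why a recorded baseline shortens the "time to know" stage.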

Taj El-Khayat, Regional Director MENA, Citrix: “The lack of available skilled talent in the IT sector has always been a recurrent issue for the industry, but the acceleration of Digital Transformation across organisations due to the pandemic has exposed this skills shortage even more clearly. This has especially been the case for data centre operators, who have been asked to provide the best and most stable service while facing a drastic and sudden increase in load.

“Based on what we’ve seen at Citrix and in our customers’ and partners’ businesses, these are the tactics that I believe can help data centre players to limit the risk of skills shortage:

Use a skills matrix: Data Centre managers should implement a skills matrix – a tool that gives a detailed overview of resources and expertise available within an organisation or a team. This not only helps to identify needed competencies, but also to make sure there is no skills gap and to be able to react on time if a new talent needs to be recruited. Such a matrix is also an excellent way to assess current skills – and for team members to improve and understand the key role they play in their department.

Be a proactive part of the recruitment process: As with any industry, recruiting skilled talent is a tough task, and the requirement for very specific technical skills and a highly demanding environment makes this task even more challenging. Working proactively with internal and external recruitment teams from the beginning of the search process gives team leaders the possibility to build a long-term talent pipeline that fits the real needs identified by the assessment.

Build partnerships: Partner and engage with universities, colleges and private training organisations to become the first stop and point of contact for data centre topics. University programmes are essential for building future capacity and especially to create incentives for a more diverse workforce – for example encouraging more female students to join the IT sector.

Bet on the people: Despite automation, Artificial Intelligence and other augmented technologies – designed to make us more efficient and productive – humans will never be replaced. However, more advanced skills will be required, so data centre operators need to invest in their talent development towards these new advanced skills, such as cloud (focused on 5G and colocation expertise), data science and data centre security, using network intelligence to enable undisturbed performance, analyse data and automate functions without any security risks.

“Facing skills shortage and finding a way to improve the situation is a marathon, not a sprint. But the combination of best practices and close, proactive collaboration with internal talent development teams, as well as external academic partnerships will without a doubt help to face the future with more serenity.”
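The skills matrix recommended above can be sketched as a simple data structure. The Python example below is an illustration only, not a Citrix tool: the team members, the skill names, the 0 to 3 proficiency scale and the rule that every skill needs at least two proficient people are all assumptions.

```python
def find_skill_gaps(matrix, min_level=2, min_people=2):
    """Return skills covered by fewer than `min_people` proficient people.

    `matrix` maps a person's name to a dict of skill -> proficiency (0-3).
    The result maps each under-covered skill to the sorted list of people
    who do meet `min_level`; an empty list means nobody covers the skill.
    """
    skills = sorted({skill for ratings in matrix.values() for skill in ratings})
    gaps = {}
    for skill in skills:
        qualified = sorted(
            name for name, ratings in matrix.items()
            if ratings.get(skill, 0) >= min_level
        )
        if len(qualified) < min_people:
            gaps[skill] = qualified
    return gaps


# Hypothetical team and ratings, for illustration only.
team = {
    "Asha": {"power": 3, "cooling": 1, "security": 1},
    "Ben": {"power": 2, "cooling": 0, "security": 0},
    "Chen": {"power": 1, "cooling": 3, "security": 0},
}

print(find_skill_gaps(team))  # {'cooling': ['Chen'], 'security': []}
```

Here cooling depends on a single person and security has no proficient cover at all, which is the kind of gap the matrix is meant to expose before a recruitment or training need becomes urgent.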

Simon Bennett, EMEA CTO, Rackspace Technology: “With more organisations shifting workloads to the cloud, several are finding that the mix of skills they need is rapidly evolving. This is primarily due to the evolving service requirements that are being deployed, used and enhanced. As such, instead of a skills shortage in the traditional sense, we’re seeing a lack of availability of the right skills at the right time.

“In order to meet these changes, a data centre leader really needs to think strategically about the skills that need to be retained to add business value to customers, as well as those which can be considered as a commodity or are only needed periodically. With this in mind, a data centre leader can design an organisation that is able to ebb and flow to meet business demand with a core fixed supply and a flexible edge.

“For example, a role that patches operating systems or checks batch operations may not require specialist skills, so can be brought in on-demand or simply automated to release people to perform more high-value tasks. Whereas functional knowledge for a customised ERP system, which may be tailored to the business, is a key skill to be retained through permanent staff.

“We’re in an era of flexibility where customers demand the ability to have options to meet their requirements at any given moment. Take Netflix, for example: it wouldn’t be the market-leading streaming entertainment service if it only offered horror or comedy. Similarly, where flexibility is desired by a customer, the relationship needs to reflect the ability to provide suitable skills on demand. The key is to be able to flex the organisation.

“At the outset, the difference between customer challenges is an important launch point. A customer looking to rebalance a steady state as opposed to one where a transformation is being undertaken will have quite different needs. In the latter case, it may well be a project as opposed to operational automation where flexible resources are to be utilised.

“For Rackspace Technology, our approach to providing skills to a customer is based around the concept of service blocks – flexible solutions that you can scale up or down depending on the needs and consumption.  

“These ‘blocks’ could include automation, be it at the IaaS level or further up the stack into PaaS for elements such as database performance management or even into cloud-native development. Where an organisation requires help to set up a DevOps practice or develop its first containerised application, it’s a people and tools solution. Each customer and organisation has different needs and there is no ‘one size fits all’ approach. The key is flexibility in provision through a blend of people and Robotic Process Automation.”

Justin Augat, Vice President of Marketing for iland: “As we have seen in 2020, it’s often the unplanned events that most challenge our preparation for change. This year, technology has played a role in helping us adapt to remote work and the ability to collaborate virtually. But just because technology has helped businesses adapt, it doesn’t mean that IT professionals haven’t had to overcome their own challenges. 

“The reality is that 2020 may also go down as the year in which the good and bad of IT have both been amplified. For example, IT organisations using cloud services before the pandemic were able to lean on their provider to support their changing business environment. However, organisations that managed their own infrastructure, and were burdened with a talent shortage prior to the lockdowns, likely saw that risk become more pronounced.

“Turning the corner into 2021, we may see pandemic-related issues subside, but we are not likely to see the IT talent shortage reverse. As such, there are a few IT strategies that companies can employ today to reduce the impact of a skilled resource leaving the organisation:

  • Map your IT needs and priorities to associated resources. The fact is, few companies have mapped out their IT needs, prioritised them and identified the resource(s) capable of managing them. This is often left to a small team or a team lead to manage (who, if they left, would be the point of failure). Mapping the skills needed to manage IT will help reduce the time to respond should any critical IT personnel leave the organisation.
  • Automate where possible. By automating more (backup, disaster recovery, etc.) and using common management tools, you can spread the ability to manage any single application across a larger population, thereby reducing the risk of a skills shortage.
  • Move to the cloud. The cloud immediately reduces single points of failure, including skill gaps, within your organisation. Whether moving applications to the cloud for production, or for backup and recovery, cloud services will manage the underlying infrastructure to keep your business moving.
  • Managed service. An obvious conclusion for organisations that are concerned with skill shortages. Managed service offerings provide comprehensive management – end-to-end – eliminating the need for a physical resource to manage IT. More importantly, managed service providers (and cloud providers) constantly work economies of scale of talent in your favour. So, if a critical employee leaves a managed service provider, the customer isn’t even aware of it.

“As organisations look to the future, technology will continue to improve how we work. Working now to reduce the risk of, and dependency on, the resources capable of managing that technology will best position organisations for the opportunity ahead!”

