Data is driving and shaping the scientific developments taking place today. Here, James McCafferty, CIO; and Pete Clapham, ISG Team Leader at the Sanger Institute, tell us how the organisation's partnership with Dell Technologies has enabled it to address and overcome some of the most difficult challenges in genomic research, whilst Richard Rawcliffe, VP & GM, UK Public Sector at Dell, discusses how it develops solutions tailored to the customer's needs.
The Wellcome Sanger Institute ('Sanger Institute') is a world-leading genomics research centre whose ambition demands science at scale and a visionary, creative approach to research. Sequencing hundreds of thousands of genomes, the Sanger Institute requires mass data storage and a partner with the technology capabilities to help its scientists understand that data. Without data, none of the Institute's ground-breaking research can happen. Without data storage and analytics, the data becomes useless.
With scale-out storage, the organisation was able to scale and upgrade its storage without sacrificing simplicity. This enabled the Institute to increase the scale of its sequencing without its scientists having to spend more time managing the growing data store.
By sequencing tens of thousands of SARS-CoV-2 genomes each week, the organisation has been at the heart of biomedical research throughout the COVID-19 pandemic. The Sanger Institute addresses some of the most difficult challenges in genomic research. Its findings, generated through its own research programmes and a leading role in international consortia, are being used to develop new diagnostics and treatments for human disease.
James McCafferty, CIO at the Sanger Institute; Pete Clapham, ISG Team Leader, Sanger Institute; and Richard Rawcliffe, VP & GM, UK Public Sector at Dell, discuss how Dell has been instrumental in helping the Sanger Institute overcome its business challenges.
What were you aiming to achieve ahead of your work with Dell Technologies?
Clapham: From a compute perspective, our main objective in working with Dell Technologies was to form a partnership. We need to work in partnership with a vendor that can deliver not just the hardware we need to perform today's compute tasks, but also help us stay abreast of the latest product developments and adaptations for the future. We need to ensure we are performing world-class scientific research at scale, and doing it better than anyone else. We want to be a world leader in this field. As such, we need fellow world leaders like Dell Technologies to help us deliver in this rapidly changing data world, in areas such as spatial informatics and other emerging fields.
How does Dell Technologies enable you to continually address some of the most difficult challenges in genomic research?
McCafferty: One of the most difficult challenges within genomic research currently is the data explosion. There are many new ways of extracting data about the world of genes, and we're also now doing much more imaging within cells, so that we can extract information such as the type of cell, what the cell is doing within the environment in which it lives, and whether it's growing, shrinking or dying. Huge amounts of data are therefore coming through in lots of different modalities. We must work out how to get value from all that data through analytics, which is an even bigger challenge. The Sanger Institute is geared up for doing science at scale, so there are huge amounts of data being generated. How we bring all this together and make sense of it for scientists is a challenge, and this is where Dell is helping us.
Similarly, we need to establish end-to-end processes to support our science: from biological samples to knowledge and insights. The process usually starts with samples coming to the institute, where we carry out much of the biochemical work and the sequencing to generate data, which we then try to make sense of. Being able to manage those pipelines at scale is a challenge.
Clapham: The data our sequencing teams provide is often just the tip of the iceberg. With the availability of ever-greater datasets, there are new opportunities for downstream research. This data explosion, together with Sanger's onsite and worldwide collaborations on projects such as the Tree of Life, means we need to engage closely with our scientific partners to ensure that we continue to deliver the high-performance computing (HPC) solutions our research teams need both now and in the future.
Rawcliffe: The partnership with the Sanger Institute is more of an ecosystem. While we design specific platforms and infrastructures with genomics in mind, ultimately we’re not the experts in the field of genomics so we need the input from partners like the Sanger Institute to help inform our product development and ensure it keeps pace with market requirements.
Clapham: We have seen real value from working with Dell's development teams behind the scenes, swapping knowledge on how to determine an effective way forward for the technology and for Sanger's developing informatics requirements. It's essential that we can work with our partners, engaging with them early to overcome barriers and assess the opportunities that we as a team see coming. Science is moving forward and changing at a rapid pace, and we are generating huge amounts of data. We need to keep pace in delivering value from it, including new discoveries that we can translate into human health benefits.
Why did you decide to work with Dell on this occasion?
Clapham: As part of our continued drive to get the best value for the Institute, we regularly issue requests for proposals to vendors. We look for the best technical fit for our needs at the time: how our benchmarks perform, what fits our infrastructure and power requirements, what the CO2 footprint will be and so on. We also look at the value-add in terms of management, maintenance, licensing and contracts, as well as the ability to adapt to and cope with unforeseen changes. As a research institute, we need to be able to change direction quickly, and we have developed an infrastructure that can manage this. We also like to know that we can speak early and often to our partners and be engaged with them continuously, so that we know what's coming down the line. Dell Technologies offers us this value-add and we've found the partnership extremely beneficial.
How do data and storage play a part in enabling the Institute to carry out ground-breaking research?
McCafferty: New and different datasets, algorithms and techniques are powering the science that we do today.
Clapham: Making sure that the data is quickly accessible is essential. Due to our scale, some areas of our storage portfolio need to be very cost-effective, while others need to be more performance-led so that large-scale analysis can be performed effectively, often against tight deadlines. As technologies and research areas develop, we need to ensure that we can continue to deliver effective storage solutions that scale up and out, remain manageable and stay the right fit for our environment.
How does Dell Technologies accelerate discovery and innovation with its HPC solutions?
McCafferty: A key objective is reducing the time to science. HPC solutions can hugely accelerate much of the computation we need to do. For example, we carry out work to identify mutations that cause disease, and part of that is isolating genes to check their roles in genetic disorders. These analyses would previously have taken many days to compute, but by adopting HPC solutions we can now produce the same information within a matter of hours. If we can reduce the time spent on these activities, we can move the science on faster and our scientists can achieve much more in the same timeframe. Adopting HPC also puts the power into scientists' hands: there are various user-friendly tools that enable our scientists to perform advanced interactive analytics, which is a huge step forward for innovation and discovery. This is all powered by high-performance computing.
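To illustrate the kind of speed-up McCafferty describes, the sketch below fans a per-gene calculation out across worker processes instead of running it serially. It is a minimal, hypothetical example: the gene list and the `score_gene` function are stand-ins for a real variant-analysis step, and on an actual HPC cluster the same pattern would run through a job scheduler across many nodes rather than one machine's process pool.

```python
# Minimal sketch: parallelising an independent per-gene analysis.
# `score_gene` is a hypothetical stand-in for a real computation
# (e.g. testing a gene's association with a disorder); the point is
# only that independent genes can be processed concurrently.
from concurrent.futures import ProcessPoolExecutor
import time


def score_gene(gene: str) -> tuple[str, float]:
    """Placeholder for an expensive per-gene calculation."""
    time.sleep(0.1)  # simulate real work
    return gene, float(len(gene))  # dummy "score"


def main() -> None:
    genes = [f"GENE{i:05d}" for i in range(200)]  # hypothetical gene list

    start = time.perf_counter()
    # Each gene is independent, so the work is embarrassingly parallel:
    # wall-clock time shrinks roughly with the number of workers.
    with ProcessPoolExecutor(max_workers=8) as pool:
        results = dict(pool.map(score_gene, genes))
    elapsed = time.perf_counter() - start

    print(f"scored {len(results)} genes in {elapsed:.1f}s")


if __name__ == "__main__":
    main()
```

Because the per-gene work shares nothing, eight workers finish roughly eight times faster than one; the same independence is what lets real pipelines turn days of serial computation into hours on a cluster.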
How has the Dell PowerScale solution enabled the Institute to scale and upgrade its storage without sacrificing simplicity, and what are the advantages of this?
Clapham: We’ve been using PowerScale heavily here at Sanger. It provides a resilient platform that we depend upon for our core cluster components, including the core installation and logging areas. If these stop, then our clusters stop. We are dependent on this enterprise storage platform.
Similarly, we use this storage platform to support our Institute's new home directories and our software storage service areas, which hold core applications for our teams, as well as our longer-term unstructured datasets. PowerScale is being used both as a high-performance frontend and for those core interactive service areas, with HSM integration to a cost-effective Dell ECS-based S3 backend for less frequently accessed datasets. With this, we can further manage our costs into the future.
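On PowerScale itself, tiering of this kind is handled transparently by the platform. Purely to illustrate the underlying idea of moving cold data to a cheaper S3-compatible tier, here is a minimal sketch using boto3 against an S3 endpoint; the endpoint URL, bucket name and 90-day threshold are assumptions for illustration only, not details of Sanger's deployment.

```python
# Illustrative sketch only: archive files untouched for N days to an
# S3-compatible object store (Dell ECS exposes an S3 API). The endpoint,
# bucket and threshold below are hypothetical placeholders.
import time
from pathlib import Path

import boto3  # standard AWS SDK; works with any S3-compatible endpoint

COLD_AFTER_DAYS = 90                      # assumed policy threshold
ENDPOINT = "https://ecs.example.org"      # hypothetical ECS endpoint
BUCKET = "cold-archive"                   # hypothetical bucket

s3 = boto3.client("s3", endpoint_url=ENDPOINT)


def archive_cold_files(root: Path) -> None:
    """Upload files not accessed within COLD_AFTER_DAYS to the S3 tier."""
    cutoff = time.time() - COLD_AFTER_DAYS * 86400
    for path in root.rglob("*"):
        if path.is_file() and path.stat().st_atime < cutoff:
            key = str(path.relative_to(root))
            s3.upload_file(str(path), BUCKET, key)
            print(f"archived {key}")


if __name__ == "__main__":
    archive_cold_files(Path("/data/unstructured"))
```

The design point is the same one Clapham makes: frequently accessed data stays on the fast tier, while rarely touched datasets move to cheaper object storage, keeping costs proportional to actual use.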
Can you outline your plans for the future of your partnership with Dell?
Clapham: We’ve written and shared with Dell a set of benchmarks based on core informatics workloads that run on our HPC systems. This ensures that both parties stay abreast of what best fits our use now, and sharing the benchmark suites lets us engage with Dell more closely. Partnerships of this type are key to our ability to adapt and shape our HPC platforms to enable research processes at scale.
We’re looking forward to continued engagement with Dell and are already seeing huge benefits from our current engagements.
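Benchmark suites of this kind are, at heart, reproducible timings of representative workloads. Sanger's actual suites are not described here, so the sketch below is a hypothetical harness showing the general shape: it times a named workload over several repetitions and reports the median, which is what makes results comparable across candidate hardware.

```python
# Minimal sketch of a benchmark harness: repeat a representative
# workload, record wall-clock times, report the median. The workload
# below is a hypothetical stand-in for a real informatics job.
import statistics
import time
from typing import Callable


def benchmark(name: str, workload: Callable[[], None], repeats: int = 5) -> float:
    """Run `workload` `repeats` times and return the median wall time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    median = statistics.median(timings)
    print(f"{name}: median {median:.3f}s over {repeats} runs")
    return median


def sample_workload() -> None:
    # Stand-in compute kernel; a real suite would run e.g. an
    # alignment or variant-calling step on a fixed input dataset.
    sum(i * i for i in range(2_000_000))


if __name__ == "__main__":
    benchmark("sample-kernel", sample_workload)
```

Reporting the median rather than the mean keeps one slow outlier run from skewing a comparison between two candidate systems.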
Rawcliffe: We’re learning so much from our customers. We take their feedback and develop solutions in line with it and with the outcomes the customer hopes to achieve. We take the experience and expertise of the Sanger Institute and work with it to develop solutions that fit these needs.