Article by: Dr. Jabe Wilson, Consulting Director of Text and Data Analytics at Elsevier
The idea of programming and data science as an art is not new. It can be traced back to the founding of computer science as a discipline. It is mentioned in the 1959 statement of purposes at the founding of the Communications of the ACM and was put forward as a positive characteristic in a famous article from 1974. The article supported the idea, ‘because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty’. This way of thinking has become increasingly important in recent years as AI has grown in influence.
In a talk I recently gave at Bio-IT World, I shared some lessons from the experience of data scientists at Elsevier: specifically, why data science is an art and how supporting our data scientists in turn enables us to better support Elsevier’s customers. The underlying message is that we must support programming and data science as they are actually experienced, rather than according to an idealised view of the practice.
Growing from experience
The thinking behind data science being seen as an art form echoes my own experiences growing up in a household of artists and programming from an early age. The art-based approach is where you understand a problem more fully over time by progressively working on alternative solutions which lead to a deeper understanding of the problem itself. Watching an artist develop a painting over several weeks and seeing the canvas change as they work towards their vision has many parallels to trying out lines of code and testing different algorithms. I studied artificial intelligence (AI) at the multidisciplinary Centre for Cognitive Sciences at the University of Sussex and taught interaction design at the Royal College of Art, so I can see the strong association between the creative arts, programming and data science.
During the presentation, I discussed why data science is described as an art, namely the need to adopt an exploratory workflow, and the significant challenges data scientists face as they work. These include:
- Clearly defining a problem that may initially be ill-defined
- Identifying and working on preparing the relevant data (described as data curation and feature engineering)
- Choosing the algorithmic approach to take
- Adjusting these elements based on your experience of running your system
Supporting data science using three key principles
While these challenges may seem overwhelming, if data scientists can follow these next three key principles, they will be able to overcome obstacles they encounter:
Good data: Cleaned and curated to remove noise, including curation and feature engineering such as scaling or reducing dimensionality. Companies should spend a great deal of effort on data curation to make the lives of data scientists easier.
Right data: Having enough of the relevant data for a hypothesis so that a predictive model can be built. Problem description is important (defining the hypothesis or model you want to explore), as is the choice of algorithmic approach (are you choosing Naïve Bayes, support vector machines, or logistic regression?).
In-time data: Avoiding waiting hours between process steps. Create a platform that brings this together, so you have the information at your fingertips when you need it.
It was recently said in Harvard Business Review that, “if your data is bad, your machine learning tools are useless.” When you see examples of data science failing, such as specialist cancer centre MD Anderson’s use of IBM Watson, the reasoning often comes down to failings across these three principles. In MD Anderson’s case, the cancer centre placed the project on hold after issues with data fed into Watson meant that it failed to expedite clinical decision-making and match patients to clinical trials.
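To make the "Good data" principle a little more concrete, the feature scaling mentioned there can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `standardise` and the two sample columns are hypothetical, and standardisation (z-scoring) is just one of several common scaling techniques, not necessarily the one used at Elsevier.

```python
from statistics import mean, pstdev


def standardise(values):
    """Rescale one feature column to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant column has no spread to rescale; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature columns recorded on very different scales.
ages_years = [30, 40, 50]
doses_mg = [100, 200, 300]

scaled_ages = standardise(ages_years)
scaled_doses = standardise(doses_mg)
```

After scaling, both columns are expressed in the same units (standard deviations from their own mean), so neither dominates an algorithm simply because its raw numbers are larger.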
How does data science work in practice?
In many industries, data science is used extensively in operations, whether in the retail sector to personalise the customer’s experience or in the life sciences to aid research into cures for conditions such as Alzheimer’s disease. Whatever its use, creative data science is delivered through a multidisciplinary workplace, which brings together domain experts, specialists in creating taxonomies and ontologies, and experts who focus on how best to apply the algorithms that make up AI approaches. The following examples show how this is done:
Rare disease treatment: Using highly curated data to predict which drugs can be repurposed to treat rare diseases.
Translational safety: Determining where animal testing can be avoided in research and development when testing drug toxicity.
Evidence selection: Applying neural networks to identify complex evidence-based statements, in order to choose the right data for building data science models.
Real-world data interpretation: Combining machine learning classification of text with taxonomies and with images to deliver learning across multimodal data sets. These approaches create opportunities to classify data sources containing unstructured text and unlabelled images from real-world data, such as patient records.
Having the right skills in place
These examples show the great promise data science holds if data is used in the right way. Data science is also fundamental to successful AI projects – which can help overcome many of the challenges we face as a society, from finding new drugs, to finding alternatives to plastic, to modelling new methods of carbon capture. However, while AI has great potential, it’s not simply a case of ‘plug and play.’ The use of AI in the likes of healthcare will necessitate purpose-built platforms that are not only technologically advanced but scientifically nuanced. For the technology to reach its potential, it will require huge volumes of accurate, varied, multidisciplinary data, along with many years of training and algorithm-building by human ‘masters.’ As industries move past the trial and error stage and work out how AI will benefit them, they must look to tools which support the data and hire professionals who can understand how data should be handled.
Machines will need the input of human creativity
Finally, it is important to reiterate that the human element of data science and AI is essential to positive outcomes. Human researchers are more efficient when they are augmented by AI, not replaced by it; together, they are more effective than advanced AI systems working without human guidance. By combining the power of human creativity with AI, we will be able to achieve more than ever before. But it needs to be remembered that the very act of creating AI systems through data science is creative in itself. Therefore, businesses must support data scientists in the development of AI by fostering creativity in data science and enabling humans to work with these systems to augment human intelligence and solve problems.