All Data is not equal

Sue Daley
Associate Director, Technology & Innovation, techUK

Different people need and work with different data. Having an understanding of the different datasets that exist across an organisation and the characteristics and quality of the data that could be used to train and run AI systems is key, not just to unlocking the insights and power from data that exists, but also to realising the full potential of the latest data-driven technologies.


A machine learning algorithm is, in essence, a piece of software code: a set of instructions for a computer or machine to follow to achieve a particular result. It is only when datasets are applied to that algorithm that AI systems can be trained to find patterns, learn from experience and unlock key insights from vast amounts of data. The more data that can be applied, the more the system can learn and the more answers it can provide. Sounds easy, right?

Yet for those looking to adopt and deploy an AI-driven solution, there are three fundamental questions that must first be answered: what is the business objective you want to achieve, is your data in a digital format and ready for AI, and do you trust the quality of the data?

Just as with any adoption of digital technologies, there must be a clear business reason for using any technology tool or solution. For example, using complex machine learning to find insights that could have been found with the tools in an Excel spreadsheet may not be a good use of resources, and could therefore reduce the business’s willingness to invest in AI innovations further down the road.

If a clear business objective is identified, the next step is understanding whether the necessary data exists in a digital form and where the datasets needed to train AI systems reside within the organisation. Knowing where the data has come from and its integrity and security during its lifecycle will also be key.

Finally, if an organisation is to trust the results generated from an AI system, the quality of the data used to train the system must also be trusted. For example, it is vital that biases in historical or legacy datasets are identified and removed before the data is used to train AI systems. An AI-driven HR system trained on historical data could reproduce past hiring patterns, producing recommendations that lead to a ‘group think’ mentality, which could have a long-term effect on an organisation’s market success.
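As a rough illustration of what identifying bias in a historical dataset can look like in practice, the minimal sketch below compares the rate of positive outcomes across groups in a set of past hiring records. The records, the field names `group` and `hired`, and the helper functions are all hypothetical and invented for this example; a real assessment would need far more care over which attributes and outcomes to examine.

```python
from collections import defaultdict


def selection_rates(records, group_key, outcome_key):
    """Return the share of positive outcomes for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[outcome_key]:
            positives[group] += 1
    return {g: positives[g] / totals[g] for g in totals}


def disparity_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0
    return min(rates.values()) / highest


# Hypothetical historical hiring records, invented for illustration only.
history = [
    {"group": "A", "hired": True},
    {"group": "A", "hired": True},
    {"group": "A", "hired": False},
    {"group": "A", "hired": True},
    {"group": "B", "hired": True},
    {"group": "B", "hired": False},
    {"group": "B", "hired": False},
    {"group": "B", "hired": False},
]

rates = selection_rates(history, "group", "hired")
ratio = disparity_ratio(rates)
print(rates)
print(round(ratio, 2))
```

A ratio well below 1.0 is a prompt to investigate the data further rather than proof of bias, since differences in outcome rates can have several causes.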

Taking the time now to consider the type and format of datasets that may be used to train AI systems, and having the right approach to identify, assess and address bias so that it becomes a case of ‘good data in, good data out’, is the only way any organisation will become truly AI ready.


Read more about our report on Data security here.
