An Oracle blog about Education and Research

Big Data - How can we succeed?

David Ebert
Director – Public Sector, Education, Healthcare - Industry Solutions (EMEA)

Data. It’s big. It’s getting bigger. In fact, it’s growing exponentially. It’s produced by more and more people. It’s created by an increasing number of things – commonly called devices. It is becoming more varied and more unstructured. About 5 years ago someone said 90% of the world’s data had been generated in the previous 2 years. That’s astonishing.
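To put that 90% figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the "90% in the previous 2 years" share holds steadily over time, which is a simplification the original claim does not make. The function name `implied_growth` is my own.

```python
import math


def implied_growth(recent_share: float, window_years: float):
    """Growth implied if `recent_share` of all data was created in the last `window_years`.

    Assumes steady exponential growth (an illustrative simplification).
    """
    # If 90% is new, the older 10% was the whole total at the start of the window,
    # so the total grew by a factor of 1 / (1 - 0.9) = 10 over the window.
    growth_over_window = 1.0 / (1.0 - recent_share)
    # Equivalent constant year-on-year multiplier.
    annual_factor = growth_over_window ** (1.0 / window_years)
    # Time needed for the total to double at that rate.
    doubling_years = window_years * math.log(2) / math.log(growth_over_window)
    return growth_over_window, annual_factor, doubling_years


growth, annual, doubling = implied_growth(recent_share=0.9, window_years=2)
print(f"Growth over the 2-year window: about {growth:.0f}x")
print(f"Implied yearly multiplier: about {annual:.2f}x")
print(f"Implied doubling time: about {doubling * 12:.1f} months")
```

Under that assumption, the world's data would be multiplying roughly tenfold every two years. Treat this as an illustration of what the claim implies, not as a measurement.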

So, how do we keep up? Can we indeed keep up? Can we make use of this data? Are we deriving insights from the data? I have lots of questions and I cannot profess to have all of the answers. Nevertheless, recent advances in technology appear to be paving the way.

Big Data in Research

Data is critical to all organisations, but to varying degrees. Think about the research industry though. The ability to efficiently collect, manage and analyse data is critical. Accelerating scientific discovery is not only dependent on researchers with brains the size of planets. There is also a clear dependency on enabling researchers to easily and efficiently access all the technology tools and infrastructure they require.

In my previous role, I worked at one of the world’s leading research-intensive universities, Imperial College London. There I had the privilege of interacting with many researchers. Before this I had some idea of the technology researchers need, but the reality was far greater. Every academic department and research group had significantly more data and more technology needs than I had envisaged. I knew that would be the case in Medicine, with techniques such as genome sequencing producing an eye-watering amount of data. However, I also saw this across all of the engineering disciplines from aeronautics to mechanical, the Natural Sciences disciplines and even the Business School.

Big Data at CERN

In the research world, big data really is big, and nowhere more so than at CERN. The volume of data produced and the scale of analysis at CERN’s Large Hadron Collider (LHC) are astounding. The LHC experiments produced 40 petabytes of data in 2017. Within the LHC, there are up to 1 billion particle collisions per second (yes, per second!), which are filtered down to approximately 100,000 and sent for digital reconstruction. More detailed algorithms whittle this down to around 100 ‘events of interest’ per second. Now that’s big data discovery and analysis in action. When people ask which customers use Oracle Big Data technology, I always cite CERN and, not surprisingly, no further examples are requested.
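To get a feel for how aggressive that filtering is, here is a small Python sketch that works through the arithmetic. It uses only the approximate per-second figures quoted above. The constant and function names are my own, and this is simple arithmetic, not a model of how CERN's systems actually work.

```python
# Approximate per-second figures quoted above for the LHC.
COLLISIONS = 1_000_000_000   # "up to 1 billion" particle collisions
RECONSTRUCTED = 100_000      # sent for digital reconstruction
EVENTS_OF_INTEREST = 100     # kept after the more detailed algorithms


def reduction(before: int, after: int) -> float:
    """Number of inputs for every one item that survives a filtering stage."""
    return before / after


first_stage = reduction(COLLISIONS, RECONSTRUCTED)
second_stage = reduction(RECONSTRUCTED, EVENTS_OF_INTEREST)
overall = reduction(COLLISIONS, EVENTS_OF_INTEREST)
kept_fraction = EVENTS_OF_INTEREST / COLLISIONS

print(f"First filter keeps about 1 in {first_stage:,.0f} collisions")
print(f"Second filter keeps about 1 in {second_stage:,.0f} reconstructed events")
print(f"Overall, about 1 in {overall:,.0f} collisions is kept")
print(f"That is roughly {kept_fraction:.5%} of all collisions")
```

In other words, on these figures only around one collision in ten million ends up as an 'event of interest'.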


Storing Data

Storing research data is a major challenge, particularly as capacity needs fluctuate considerably over the lifetime of a project, and beyond. Yes, storage costs are declining, but the volume of data is anything but. Managing a datacentre must be a headache for organisations now, but imagine the migraine in 5-10 years’ time. Cloud storage is the only way to go, surely. Yes, there will be some data that organisations will not, under any circumstances, allow to be stored off-site. Yes, there are data-residency regulations requiring that all data be stored within the country or region. However, cloud companies like Oracle are providing the flexibility to meet these requirements. For example:

Deriving Insights

Storing the data is one thing, but data is only helpful if you can effectively derive insights from it. With all this data, how do we go about making sense of it and finding out what’s useful? After all, big decisions and big discoveries are derived from data.

Humans can’t keep up, so various technologies play a key role, including machine learning (ML) and artificial intelligence (AI). These tools enable insights to be derived automatically. Data scientists fine-tune algorithms to optimise the analysis and insights derived from data. Nowadays though, ML and AI help them focus their time and expertise on specific data and insights that really accelerate discoveries.

Advances in autonomous cloud services are a giant leap forward. Oracle is applying AI and ML across its entire range of next-generation Cloud Platform services. Oracle’s Autonomous Database was introduced at OpenWorld. However, Oracle recently announced that autonomous capabilities for many other Cloud Platform services, including analytics, are scheduled to be available in the next 12 months. Self-driving, self-securing and self-repairing.

Surviving and Thriving

So, the 5 V’s of data (volume, variety, velocity, veracity and value) are on the increase, big time.

  • More and more devices are doing more and more things.

  • It’s becoming easier to create and publish content.

  • Discovery and innovation are critical in a world with so many threats and opportunities.

I could continue this list, but I’m sure you get the picture, so let me stop there. After all, I don’t want to be guilty of creating more data than is necessary!

So, data is here to stay and set to increase; what we do with it is critical. Cloud technologies for storage, compute and analysis are paramount. Furthermore, the more these technologies are embedded with machine learning and artificial intelligence, the greater our chances of surviving, and indeed thriving, in the deluge.
