CERN Big Data Exploration for FCC

Guest Author

Today's guest blog is from Johan Louwers, Oracle ACE Director in the field of Oracle Linux at Capgemini.

When we talk about big data, enterprises are commonly the first thing that comes to mind. But what about the scientific research field? It seems to come in a distant second when thinking about big data. An improbable thought when we consider that one of the biggest data creators in the world is CERN and, more specifically, the Large Hadron Collider project. The Large Hadron Collider provides a staggering amount of data (close to 100 terabytes of data a day) which needs to be analyzed by a multitude of different researchers (watch this video to learn about the success CERN has already achieved with Oracle Big Data Discovery).

As a next step, CERN is working on another project named Future Circular Collider. The Future Circular Collider (FCC) study aims to develop a conceptual design for a particle accelerator infrastructure in a global context, with an energy significantly above that of previous circular colliders (SPS, Tevatron, LHC). It will explore the feasibility of different particle collider scenarios with the aim of significantly expanding the current energy and luminosity frontiers. It also aims to complement existing technical designs for linear electron/positron colliders (ILC and CLIC). In short, this is a big project that carries with it big implications and even bigger sets of data.

For the FCC study, CERN scientists and researchers are looking at current implementation methods used in other collider projects as a basis. Part of the study is to analyze LHC data to improve the current LHC project and to use it in this FCC project. They aim to study:

  • Reliability, Availability, Maintainability and Safety (RAMS) for the Future Circular Collider (FCC)
  • How to increase the reliability and availability of the LHC
  • RAMS findings, and how to use them to assess the feasibility of the needs of the FCC
  • Data distributed across multiple sources:
      • Operations e-logbook
      • Accelerator Fault Tracking project
      • Accelerator logging service
      • Accelerator schedules
      • Cryogenics
      • And more: Vacuum, Power Converters, etc.

In addition, part of the FCC study focuses on:

  • Predictive maintenance and system optimization
  • Data extraction, transformation and loading (ETL)
  • Data Visualization and Discovery

To ensure this study is done correctly, CERN has implemented Oracle Exalytics and Oracle Big Data Discovery in combination with a Cloudera CDH 5.5.1 cluster (Cloudera has been a close partner with Oracle for several years and is the main provider of software for the Oracle Big Data Appliance). It's important for CERN to be able to analyze all the data that is collected so that its researchers have an accurate and holistic view of their systems. The deployment is shown below (in a high-level diagram) as it has been implemented at CERN. This configuration allows CERN's researchers to work with an interactive catalog of all data, assessing attribute statistics, data quality, and outliers, while also conducting quick data exploration or creating dashboards and applications on the fly. By analyzing the data collected from the control and monitoring systems used within the LHC project, CERN hopes to make these systems intelligent, predictive, and proactive, based on large-scale big data analysis.
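To make "attribute statistics, data quality, and outliers" a little more concrete, here is a minimal sketch in plain Python. It is not the Oracle Big Data Discovery API and not CERN's code. The function name, the sample readings, and the choice of the common 1.5 × IQR rule for flagging outliers are all assumptions made purely for illustration.

```python
import statistics
from typing import Optional, Sequence


def profile_attribute(values: Sequence[Optional[float]]) -> dict:
    """Summarize one numeric attribute: completeness, basic statistics, outliers.

    Hypothetical helper for illustration only; not part of any Oracle product.
    """
    present = [v for v in values if v is not None]
    if len(present) < 2:
        raise ValueError("need at least two non-missing values")

    # Quartiles from the standard library (default 'exclusive' method).
    q1, _q2, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    # Assumed outlier rule: anything beyond 1.5 * IQR from the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "count": len(values),                    # all samples, including missing
        "missing": len(values) - len(present),   # a simple data-quality signal
        "mean": statistics.fmean(present),
        "median": statistics.median(present),
        "outliers": [v for v in present if v < low or v > high],
    }


# Made-up sensor readings; None marks a missing sample.
readings = [1.9, 1.9, 2.0, 1.9, None, 1.8, 2.0, 1.9, 4.5, 1.9]
summary = profile_attribute(readings)
print(summary)
```

A data discovery tool computes summaries of this kind for every attribute in the catalog, so that a researcher can spot incomplete or suspicious signals before building a dashboard on top of them.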

[Diagram: Oracle big data, Hadoop, and Cloudera setup at CERN]

This setup shows how CERN is using the combination of a Cloudera solution and Oracle Exalytics for big data discovery. The same setup can apply to other solutions and industries that deal with large amounts of data. If you need to develop a solution for discovering information in a large set of data, the setup above would be a great starting point for evaluation.

We're excited to welcome Oracle experts to share their views and perspectives on today's trends on this blog. Johan is an Oracle ACE Director in the field of Oracle Linux and systems. Specialized in Oracle technology, infrastructure solutions and cloud computing, Johan provides active advice and support to enterprises around the globe on both an architectural and a hands-on technical level. He is currently leading the Global Oracle Architect office as the Capgemini Chief Architect. In addition, he is a strong promoter of Open Source technology for implementing disruptive and cutting-edge technology solutions within enterprises. You can follow Johan on Twitter @johanlouwers, LinkedIn, and his blog at johanlouwers.blogspot.com.

Join the discussion

Comments ( 1 )
  • srinivasa reddy Wednesday, July 13, 2016
    I want to know more about this. Which reporting tool is faster at handling big data, or terabytes of data, for analysis? What is the primary skill required for that analytics tool?