Tuesday Feb 26, 2013

Big Data - The future is sooner and stranger than you think

“The future is sooner and stranger than you think” – Reid Hoffman

We are now at the cusp of the real Digital Age, an age where the network – the Internet today – becomes an organic entity. Various forces are in play that are accelerating the growth of the network – Metcalfe’s Law (economic value), Gilder’s Law (growth in connectivity) and Moore’s Law (growth in processing power) are intersecting to drive rapid connectivity to the network. A common language (TCP/IP), convergence of communications on the Internet and a continuous drive to decomposition and distribution of state and function on the network are making this growth seamless.

Objects are joining the network every second, and each of these objects is generating a digital footprint. These objects are continuously connecting to each other, creating immense mesh graphs of relationships. These relationships are enhancing the digital footprints of the connected objects, growing them at an accelerated pace.

If we look at this network from a consumer-supplier perspective, our current thinking puts the individual consumer at the center of these relationships, the one with the “power” to create (or sever) these relationships. Thus one might view this immense graph as a mesh of star topologies, where each star is centered around an individual. A very different world from even 10 years ago, where the individual was likely a node in a lightly connected mesh.

Let’s illustrate this with a retail banking example – 10 years ago a person would likely have had a long-term relationship with a bank, where the bank would have dictated the terms for their current account, savings account and mortgage. This person might have had another mature relationship with a discount brokerage, where the brokerage would have controlled the fee structure, the margin requirements and CD rates. The suppliers were at the center of these relationships. Fast forward to today – this individual now likely has multiple transient relationships with a number of banks: a current account at one that charges no fees, savings accounts with banks that offer the highest interest, and a mortgage with a bank offering the lowest interest. The customer is now at the center of the star, with the suppliers being transient nodes.

This change in the power structure is exacerbated by new entrants into the market – institutions that are built around a complete digital footprint. These institutions have raised the expectations of the individual consumer. The consumer now expects to have real-time responses to their queries, and expects to have this discussion in public. The consumer now expects to have full transparency about the product and/or service that the suppliers are offering.

While established suppliers might view this new digital age with uncertainty, this age brings a brand new way of communicating with the consumer, finding out their preferences, offering goods and services supremely optimized to their needs. It brings about new ways of creating efficiencies in existing processes, but more importantly it brings about the means for us to do things that seemed improbable 10 years ago.

The DNA of the digital age is the Data that is created by the digital footprint of the participants. As more objects get digitized and join the network, new data is generated. Think about the requirements being mandated by the SEC and the CFTC to execute and clear swaps electronically. These are new sources of electronic data. The current nodes in the network are generating more new data continuously. Think about market data being generated at the various execution venues; it has been growing at an exponential rate since the cash equities market became electronic. Think about Twitter, Facebook, GPS and RFID enabled devices, SmugMug, Yahoo!, LinkedIn and the blogosphere; data from these sources is being added every second. Current estimates put data generation at 5 trillion bits per second.

This data is the new frontier; it has been called the new Oil. And just like oil, this data needs to be ingested, manipulated, aggregated, cleansed, analyzed and likely persisted. And just like oil, various products can be derived from this data as it goes through the various phases of its refinement. Mid- and back-office data management functions are now rapidly evolving from traditional cost centers into profit centers.

Clichés abound about the characteristics and lifecycle of this data – the 4 Vs (Volume, Velocity, Variety, Value), the 5 Ms (Monitoring, Mapping, Management, Middleware, Measurement). Everyone, however, does agree that we are moving from a world where the business would typically dictate its requirements to IT, which would then figure out which data to persist and in which format, to a world where IT now has the ability to store all the data and allow the business to ask arbitrary questions later. Advances in storage technologies have made storage nearly free. Advances in compute technologies have made cheap, powerful processing available to anyone with a credit card.
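To make the "store everything, ask later" idea concrete, here is a minimal sketch in Python. It is only an illustration: the event records, field names and the `ask` helper are all made up for this post and do not come from any product. The point is that the records are kept exactly as they arrived, and structure is applied only when a question is asked.

```python
import json

# Hypothetical raw events, stored as they arrived. No schema was agreed up front.
RAW_EVENTS = [
    '{"type": "trade", "symbol": "ORCL", "qty": 100, "venue": "NYSE"}',
    '{"type": "tweet", "user": "alice", "text": "refinancing my mortgage"}',
    '{"type": "trade", "symbol": "ORCL", "qty": 250, "venue": "BATS"}',
    '{"type": "login", "user": "alice", "channel": "mobile"}',
]


def ask(raw_events, predicate, project):
    """Parse each stored record at query time and apply an ad hoc question.

    predicate decides which records matter; project picks what to return.
    """
    results = []
    for line in raw_events:
        record = json.loads(line)
        if predicate(record):
            results.append(project(record))
    return results


# A question nobody planned for when the data was stored:
# how many ORCL shares traded across all venues?
total_orcl_qty = sum(
    ask(
        RAW_EVENTS,
        lambda r: r.get("type") == "trade" and r.get("symbol") == "ORCL",
        lambda r: r["qty"],
    )
)

# A different question over the same raw data: what kinds of events involve alice?
alice_event_types = ask(
    RAW_EVENTS,
    lambda r: r.get("user") == "alice",
    lambda r: r["type"],
)
```

The trade-off in this style is that parsing cost is paid on every question rather than once at load time, which is acceptable only because storage and compute have become so cheap.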

So what can we do with this data? We can address today’s problems better – we can report to the regulator faster, with more data supporting each report. We can mine this data to find current trends in the market. But more importantly, if we can successfully merge the various streams of data from different sources, we can attempt to predict the future! Facebook has more knowledge about a consumer than any existing supplier today. Consumers, by choice, share their feelings, changes in their life status, their likes and dislikes, their travel plans, their pictures with tags, their current location – essentially an unlimited amount of information. This, when combined with already existing data about that consumer in traditional stores, is a literal gold mine. This is a lot more than targeted advertisements; it enables a business to ask arbitrary questions about a consumer and expect a reliable answer in near realtime. Massive personalization is just the tip of this spear.

We in Oracle’s Financial Services Analytical Applications organization are looking to merge the traditional data with the new data to be able to solve problems like Fraud, Customer Analytics, Relationship Pricing, Realtime offers and Enhanced Risk Management. We have a large portfolio of existing applications that address the analytics needs of banks and insurance firms, which we are enhancing to take advantage of the new data and improvements in compute, storage and networking technologies.

“Without changing our pattern of thought, we will not be able to solve the problems we created with our current patterns of thought” – Albert Einstein

Wednesday Jul 13, 2011

"Large scale data management" for Financial Services

The problem of managing large amounts of data - structured data for this writeup - is pervasive in the financial industry. Compliance, Risk, Analytics, Pricing etc. all require ingesting, cleansing, transforming, standardizing, aggregating, persisting, analyzing and reporting on very large quantities of data. Given Oracle's pedigree in data management, I don't think it would be a surprise to you that we have a large set of technologies that help our FSI customers with their data management issues. We are also taking these technologies to our partners and helping them achieve enterprise class scale and reliability for their applications using this "large scale data management" platform.

The following diagram shows the relevant technologies that collectively we call the LSDM platform.

The baseline is Exadata or large SPARC systems. Data movement technologies, GoldenGate and ODI, are the layer above - the combination allows users to move data from one database instance to another in near realtime (change data capture only) and manipulate it at the destination. The destination instance is 11g, and the Spatial option brings Semantics into the equation. Semantic data stores are starting to become popular in FSI, since modeling complex and continuously morphing relationships is easier using semantics than relational databases. For an industry that builds products that are so intricate that most personnel - and machines - have no clue about the component parts of these products, semantics will likely be mandated by the regulators.
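A small sketch may help show why semantics suit intricate products. The Python below is not the 11g semantic API; it uses plain tuples as subject-predicate-object triples, and the product names and the `hasComponent` predicate are hypothetical. It shows that finding every component part of a product becomes a simple graph walk, and that a new kind of relationship is just another triple rather than a schema change.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples describing a structured product.
TRIPLES = [
    ("CDO-1", "hasComponent", "Tranche-A"),
    ("CDO-1", "hasComponent", "Tranche-B"),
    ("Tranche-A", "hasComponent", "MBS-7"),
    ("Tranche-B", "hasComponent", "MBS-9"),
    ("MBS-7", "hasComponent", "Mortgage-101"),
    ("MBS-7", "hasComponent", "Mortgage-102"),
    ("MBS-9", "hasComponent", "Mortgage-102"),
    # A new relationship type needs no schema change, only another triple.
    ("Mortgage-102", "guaranteedBy", "Insurer-X"),
]


def build_index(triples):
    """Index triples as subject -> predicate -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        index[subject][predicate].add(obj)
    return index


def components(index, product, predicate="hasComponent"):
    """Return every direct and indirect component of a product."""
    seen = set()
    stack = [product]
    while stack:
        node = stack.pop()
        # .get avoids creating empty entries for leaf nodes while walking.
        for child in index.get(node, {}).get(predicate, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


index = build_index(TRIPLES)
parts_of_cdo = components(index, "CDO-1")
```

In a relational model the same look-through typically needs a self-join per level of nesting or a recursive query, and each new relationship type tends to mean a new table or column.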

Above the database are the in-memory data technologies that can be used in a variety of ways – Coherence as the in-memory data grid, and TimesTen as the in-memory database. These technologies are essentially performance related technologies. Think of Coherence as a large data cache where the data is dynamically provisioned across memory resources of various servers, and compute can be shipped to these servers – moving the compute to the data, which is typically faster than the other way around. TimesTen is an in-memory SQL database, which can be linked with the compute so that the two share the same address space, again accelerating performance.
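The "ship the compute to the data" idea can be sketched in a few lines of Python. This is a toy model and not the Coherence API: `ToyGrid`, `Partition` and the trade keys are invented for illustration, and each "server" is just a dictionary in the same process. What it shows is the shape of the pattern: a function runs against each partition where the entries live, and only the small partial results travel back to be combined.

```python
class Partition:
    """One server's slice of the cache (a plain dict in this sketch)."""

    def __init__(self):
        self.entries = {}

    def run(self, func):
        # The function executes where the data lives; only its result leaves.
        return func(self.entries)


class ToyGrid:
    """A toy partitioned cache; hypothetical, not a real data grid client."""

    def __init__(self, partition_count=4):
        self.partitions = [Partition() for _ in range(partition_count)]

    def _owner(self, key):
        # Hash the key to decide which partition owns it.
        return self.partitions[hash(key) % len(self.partitions)]

    def put(self, key, value):
        self._owner(key).entries[key] = value

    def get(self, key):
        return self._owner(key).entries.get(key)

    def aggregate(self, func, combine):
        # Ship func to every partition, then combine the partial results.
        partials = [partition.run(func) for partition in self.partitions]
        return combine(partials)


grid = ToyGrid(partition_count=3)
for trade_id, notional in {"T1": 10, "T2": 20, "T3": 30}.items():
    grid.put(trade_id, notional)

# Each partition sums its own notionals; only three numbers cross the "network".
total_notional = grid.aggregate(lambda entries: sum(entries.values()), sum)
```

The alternative, pulling every entry to the caller and summing there, moves the whole data set instead of one number per partition, which is why shipping the compute is typically the faster choice.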

And the BI layer sits above all these technologies, helping with analytics and reporting.

We are positioning this stack with our customers and our ISV partners, in the reference data and risk space. Both these areas need large volumes of data to be processed, and the ISVs that we have spoken with are excited about working with us.



Ambreesh Khanna's Weblog

