By Mark Hornick on Jan 19, 2015
So far in this series on Addressing Analytic Pain Points, I’ve focused on the issues of data access, performance, scalability, application complexity, and production deployment. However, there are also fundamental needs for enterprise advanced analytics solutions that revolve around data security, backup, and recovery.
Traditional non-database analytics tools typically rely on flat files. If data originated in an RDBMS, that data must first be extracted. Once extracted, who has access to these flat files? Who is using this data and when? What operations are being performed? Security needs for data may be somewhat obvious, but what about the predictive models themselves? In some sense, these may be more valuable than the raw data since these models contain patterns and insights that help make the enterprise competitive, if not the dominant player. Are these models secure? Do we know who is using them, when, and with what operations? In short, what audit capabilities are available?
While security is a hot topic for most enterprises, it is essential to have a well-defined backup process in place. Enterprises normally have well-established database backup procedures that database administrators (DBAs) rigorously follow. If data and models are stored in flat files, perhaps in a distributed environment, one must ask what procedures exist and with what guarantees. Are the data files taxing file system backup mechanisms already in place – or not being backed up at all?
On the other hand, recovery involves using those backups to restore the database to a consistent state, reapplying any changes since the last backup. Again, enterprises normally have well-established database recovery procedures that are used by DBAs. If separate backup and recovery mechanisms are used for data, models, and scores, it may be difficult, if not impossible, to reconstruct a consistent view of an application or system that uses advanced analytics. If separate mechanisms are in place, they are likely more complex than necessary.
For Oracle Advanced Analytics (OAA), data is secured via Oracle Database, which wins security awards and is highly regarded for its ability to provide secure data for confidentiality, integrity, availability, authentication, authorization, and non-repudiation. Oracle Database logs and monitors user activity. Users can work independently or jointly in a shared environment with data access controlled by standard database privileges. The data itself can be encrypted and data redaction is supported.
OAA models are secured in one of two ways: (i) models produced in the kernel of the database are treated as first-class database objects with corresponding access privileges (create, update, delete, execute), and (ii) models produced through the R interface can be stored in the R datastore, which exists as a database table in the user's schema with its own access privileges. In either case, users must log into their Oracle Database schema/account, which provides the needed degree of confidentiality, integrity, availability, authentication, authorization, and non-repudiation.
Enterprise Oracle DBAs already follow rigorous backup and recovery procedures. The ability to reuse these procedures in conjunction with advanced analytics solutions is a major simplification and helps to ensure the integrity of data, models, and results.