Known for big surf and the occasional big earthquake, Santa Cruz, California, has also been in the news for big data: its police force has used predictive analytics to catch would-be thieves. Two women were taken into custody after they were discovered peering into cars in a downtown parking garage. On further questioning, one was found to have outstanding warrants, while the other was carrying illegal drugs.
What makes this case unusual is that the officers were directed to the parking structure by a computer program that had predicted car burglaries were especially likely there that day. The program, developed by PredPol, is based on models used to predict earthquake aftershocks, a common occurrence here in California. Its algorithms generate projections of which areas and time windows are at highest risk for future crimes.
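To give a feel for what an aftershock-style model looks like, here is a minimal Python sketch of a self-exciting event rate: a constant background rate plus a boost from each past event that fades over time. This is an illustration only, not PredPol's actual algorithm; the function name `intensity` and the parameters `mu`, `alpha`, and `beta` are hypothetical choices made for this example.

```python
import math

def intensity(t, past_event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Expected event rate at time t (illustrative, hypothetical parameters).

    mu    -- background rate that applies even with no recent events
    alpha -- size of the boost each past event adds
    beta  -- how quickly that boost decays
    """
    boost = sum(
        alpha * math.exp(-beta * (t - s))
        for s in past_event_times
        if s < t  # only events that have already happened count
    )
    return mu + boost

# The rate is elevated right after an event and relaxes back toward mu.
burglaries = [0.0, 0.5]  # event times in days (made-up values)
print(intensity(0.6, burglaries))
print(intensity(5.0, burglaries))
```

The point of the sketch is the shape of the curve: risk spikes shortly after an incident and then decays, which is the same intuition behind aftershock forecasting.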
The Innovative Hacker
Organizations struggle to mitigate threats because hackers and their methods of attack keep evolving. Since Robert Tappan Morris released the Morris worm onto the infant internet in 1988, organizations have been fighting tweakers, script kiddies, espionage, and organized crime. The problem is that every time a solution is devised, a new hack is created. It’s a never-ending cycle, and unfortunately, the turnaround time for hackers keeps getting shorter. They innovate and share their innovations with others, who in turn take advantage of them and increase the number of effective attacks.
According to the 2015 Verizon Data Breach Investigations Report, which examined over 80,000 incidents, hackers have become more inventive, thinking up new tactics to evade defenses. “I hate to admit defeat,” says Jay Jacobs, co-author of the report, “but there does seem to be an advantage to the attackers right now.” (Source: Financial Times, access for a fee.)
Learning from the Past
By analyzing years of past crime data and detecting patterns in it, the Santa Cruz police department was able to determine hot spots of potential crime. In fact, on the day the two women were arrested, the program had identified the approximately one-square-block area where the parking garage is situated as one of the highest-risk locations for car burglaries.
According to the RAND Corporation's “Predictive Policing” study, there is strong evidence to support the theory that crime is statistically predictable. That’s because criminals tend to operate in their comfort zone. They commit the types of crimes they’ve committed successfully in the past, generally at around the same time, in the same location, and with the same methods.
There is a connection between physical crime and the cybercrime organizations face today. To explain this connection further, the RAND Corporation found that prediction-led policing is not just about making predictions, “but it is a comprehensive business process, of which predictive policing is a part.” That process is summarized here to show the steps taken to analyze past information and prevent further criminal activity.
First, the police force collects and analyzes previous crime, incident, and offender data to produce predictions, which uncover hot spots. Next, data from multiple disparate sources in the community is combined, often in Big Data environments that can quickly process terabytes of data. This data helps tell police where hot spots of potential crime will emerge based on time of day, weather, recent criminal activity, and more. The predictions then inform how police respond to a potential incident. Criminals, in turn, react to the changed environment: either they are removed, or those still operating in the area change their practices or move elsewhere. Regardless of the response, the environment has been altered, the initial data is out of date, and new data must be collected for analysis.
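The first step of this cycle, turning past incident records into ranked hot spots, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method used by any police department: the function name `rank_hot_spots`, the 150-meter grid cell, and the six-hour time windows are all hypothetical choices, and the incident data is made up.

```python
from collections import Counter

def rank_hot_spots(incidents, cell_size=150.0, top_n=3):
    """Rank (grid cell, time window) pairs by number of past incidents.

    incidents -- iterable of (x_meters, y_meters, hour_of_day) tuples
    cell_size -- side length of a square grid cell (hypothetical value)
    """
    counts = Counter()
    for x, y, hour in incidents:
        cell = (int(x // cell_size), int(y // cell_size))
        window = hour // 6  # four six-hour windows per day
        counts[(cell, window)] += 1
    return counts.most_common(top_n)

# Made-up incident history: two afternoon incidents near the same block,
# one overnight incident elsewhere.
history = [(10, 20, 14), (30, 40, 15), (500, 500, 2)]
for (cell, window), n in rank_hot_spots(history):
    print(cell, window, n)
```

Because offenders react to a changed environment, a ranking like this goes stale; in practice it would be recomputed as new incident data arrives.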
The Importance of Acquiring Good, Clean Data
This entire process hinges on collecting data and on the quality of that data for making predictions.
Organizations today have the data necessary to make these types of predictions. In fact, our systems churn out this data all the time through server logs, database audits, event logs, and more. If crime is statistically predictable, and we have all the evidence right there in front of us, then we need to collect and analyze it.
Of course, the future of predictive analytics and machine learning is about much more than analyzing audit and log data and monitoring our databases. However, these two critical practices are important first steps toward a comprehensive cybersecurity program.
The recent 2015 Verizon Data Breach Investigations Report highlights that once you have the data you need, analysis is performed using inferred or computed elements of the data. In order to mitigate data breaches, they suggest looking for anomalies within the following:
- Volume or amount of content transfer, such as e-mail attachments or uploads
- Resource access patterns, such as logins or data repository touches
- Time-based activity patterns, such as daily and weekly habits
- Indications of job contribution, such as the amount of source code checked in by developers
- Time spent in activities indicative of job satisfaction or discontent
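As a rough illustration of what looking for an anomaly in one of these signals might involve, the Python sketch below compares a new daily value (for example, megabytes uploaded by one user) against that user's recent history and flags it when it falls far outside the usual range. The function name `is_anomalous`, the threshold of three standard deviations, and the sample numbers are assumptions made for this example; they do not come from the Verizon report.

```python
from statistics import mean, stdev

def is_anomalous(history, new_value, threshold=3.0):
    """Flag new_value if it is far from the baseline set by history.

    history   -- past daily values for one user or system
    threshold -- allowed distance in standard deviations (hypothetical)
    """
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return new_value != baseline
    return abs(new_value - baseline) / spread > threshold

# Made-up daily upload volumes in megabytes for one user.
uploads = [12, 15, 11, 14, 13, 12, 16]
print(is_anomalous(uploads, 14))   # a typical day
print(is_anomalous(uploads, 250))  # a sudden, very large transfer
```

The same comparison could be applied to login counts or activity times, although real deployments would likely need per-user baselines and handling for seasonality.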
Although this data is all around us, the tough part is collecting all of it effectively, efficiently, and securely, and then making sense of it to predict and prescribe future actions and prevent the next data breach.