According to Ernst and Young, $8.2 billion a year is lost to the marketing, advertising, and media industries through fraudulent impressions, infringed content, and malvertising.
The combination of fake news, trolls, bots and money laundering is skewing the value of information and could be hurting your business.
By using graph technology and the data you already have on hand, you can discover fraud through detectable patterns and stop their actions.
We collaborated with Sungpack Hong, Director of Research and Advanced Development at Oracle Labs to demonstrate five examples of real problems and how graph technology and data are being used to combat them.
But first, a refresher on graph technology.
With a graph technology, the basic premise is that you store, manage and query data in the form of a graph. Your entities become vertices (as illustrated by the red dots). Your relationships become edges (as represented by the red lines).
By analyzing these fine-grained relationships, you can use graph analysis to detect anomalies with queries and algorithms. We’ll talk about these anomalies later in the article.
The major benefit of graph databases is that they’re naturally indexed by relationships, which provides faster access to data (as compared with a relational database). You can also add data without doing a lot of modeling in advance. These features make graph technology particularly useful for anomaly detection—which is mainly what we’ll be covering in this article for our fraud detection use cases.
If you take a look at Gartner’s 5 Layers of Fraud Protection, you can see that they break the analysis to discover fraud into two categories:
It’s this second category based on connections, patterns, and behaviors that can really benefit from graph modeling and analysis.
Through connected analysis and graph technology, you would:
Now we’ll discuss examples of ways companies can apply this to solve real business problems.
In the world of social media, marketers want to see what they can discover from trends. For example:
Of course, all of this information can be incredibly valuable. At the same time, it can mean nothing if it’s all inaccurate and skewed by how much other companies are willing to pay for bots.
In this case, we worked with Oracle Marketing Cloud to ensure the information they’re delivering to advertisers is as accurate as possible. We sought to find the fake bot accounts that are distorting popularity.
As an example, there are bots that retweet certain target accounts to make them look more popular.
To determine which accounts are “real,” we created a graph between accounts with retweet counts as the edge weights to see how many times these accounts are retweeting their neighboring accounts. We found that the unnaturally popularized accounts exhibit different characteristics from naturally popular accounts.
Here is the pattern for a naturally popular account:
And here is the pattern for an unnaturally popular account:
When these accounts are all analyzed, there are certain accounts that have obviously unnatural deviation. And by using graphs and relationships, we can find even more bots by:
In this case, we used graph technology to identify sockpuppet accounts (online identity used for purposes of deception or in this case, different accounts posting the same set of messages) that were working to make certain topics or keywords look more important by making it seem as though they’re trending.
To discover the bots, we had to augment the graph from Use Case #1. Here we:
Because we found that the messages were always the same, we were able to take that and create subgraphs using those edges and apply a connected components algorithm.
As a result of all of the analysis that we ran on a small sampling, we discovered that what we thought were the most popular brands actually weren’t—our original list had been distorted by bots.
See the image below – the “new” most popular brands barely even appear on the “old” most popular brands list. But they are a much truer reflection of what’s actually popular. This is the information you need.
After one month, we revisited the identified bot accounts just to see what had happened to them. We discovered:
A common pattern in financial crimes, a circular money transfer essentially involves a criminal sending money to himself or herself—but hides it as a valid transfer between “normal” accounts. These “normal” accounts are actually fake accounts. They typically share certain information because they are generated from stolen identities (email addresses, addresses, etc.), and it’s this related information that makes graph analysis such a good fit to discover them.
For this use case, you can use graph representation by creating a graph from transitions between entities as well as entities that share some information, including the email addresses, passwords, addresses, and more. Once we create a graph out of it, all we have to do is write a simple query and run it to find all customers with accounts that have similar information, and of course who is sending money to each other.
Because Europe has so many borders with different rules about who pays tax to which country when products are crossing borders, VAT (Value Added Tax) fraud detection can get very complicated.
In most cases, the importer should pay the VAT and if the products are exported to other countries, the exporter should receive a refund. But when there are other companies in between, deliberately obfuscating the process, it can get very complicated. The importing company delays paying the tax for weeks and months. The companies in the middle are paper companies. Eventually, the importing company vanishes and that company doesn’t pay VAT but is still able to get payment from the exporting company.
This can be very difficult to decipher—but not with graph analysis. You can easily create a graph by transactions; who are the resellers and who is creating the companies?
In this real-life analysis, Oracle Practice Manager Wojciech Wcislo looked at the flow and how the flow works to identify suspicious companies. He then used an algorithm in Oracle Spatial and Graph to identify the middle man.
The graph view of VAT fraud detection:
A more complex view:
In that case, you would:
And you will discover common “Middle Man” nodes where the flows are aggregated
Conceptually, money laundering is pretty simple. Dirty money is passed around to blend it with legitimate funds and then turned into hard assets. This was the kind of process discovered in the Panama Papers analysis.
These tax evasion schemes often rely on false resellers and brokers who are able to apply for tax refunds to avoid payment.
But graphs and graph databases provide relationship models. They let you apply pattern recognition, classification, statistical analysis, and machine learning to these models, which enables more efficient analysis at scale against massive amounts of data.
In this use case, we’ll look more specifically at Case Correlation. In this case, whenever there are transactions that regulations dictate are suspicious, those transactions get a closer look from human investigators. The goal here is to avoid inspecting each individual activity separately but rather, group these suspicious activities together through pre-known connections.
To find these correlations through a graph-based approach, we implemented this flow through general graph machines, using pattern matching query (path finding) and connected component graph algorithm (with filters).
Through this method, this company didn’t have to create their own custom case correlation engine because they could use graph technology, which has improved flexibility. This flexibility is important because different countries have different rules.
In today’s world, the scammers are getting ever more inventive. But the technology is too. Graph technology is an excellent way to discover the truth in data, and it is a tool that’s rapidly becoming more popular. If you’d like to learn more, you can find white papers, software downloads, documentation and more on Oracle’s Big Data Spatial and Graph pages.
And if you're ready to get started with exploring your data now, we offer a free guided trial that enables you to build and experiment with your own data lake.