Oracle Cloud Infrastructure (OCI) allows you to create virtual networks, called Virtual Cloud Networks (VCN). VCNs are useful to isolate your services and to have fine-grained control over the communication between services.
It is useful to monitor what goes on in your VCN: a private server in the network may be accessible from the internet due to a misconfiguration; a few clients may be downloading excessive amounts of data and causing network congestion; or you may want to know how your firewall rules are affecting the traffic.
Oracle Management Cloud (OMC) Log Analytics (LA) allows you to easily ingest and analyze the network logs, to identify interesting patterns and gain insights. We will look at what Flow Logs are, what information they contain, and how to enable analysis of these logs.
Each instance in an Oracle Cloud Infrastructure (OCI) Virtual Cloud Network (VCN) has one or more Virtual Network Interface Cards (VNICs) for communication within and outside of the VCN. OCI Networking uses security lists to determine what traffic is allowed in and out of a given VNIC. A VNIC is subject to all the rules in all the security lists and network security groups associated with the VNIC's subnet.
You can enable logging to capture this information. The VCN Flow Logs record details about the traffic that has been accepted or rejected based on the security list or network security group rules.
A flow log record is a space-separated string that has the following format:
<version> <srcaddr> <dstaddr> <srcport> <dstport> <protocol> <packets> <bytes> <start_time> <end_time> <action> <status>
2 172.16.2.139 172.16.1.107 73 89 11 102 349 1557424462 1557424510 ALLOW OK
2 172.16.2.145 172.16.2.179 82 64 13 112 441 1557424462 1557424486 REJECT OK
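To make the record format concrete, here is a small Python sketch that parses one of the sample records above into named fields. This is purely illustrative and not part of Log Analytics; the field names follow the format string above.

```python
# Sketch: parse a VCN flow log record into a dict of named fields.
# Field order follows the record format described above.
FIELDS = [
    "version", "srcaddr", "dstaddr", "srcport", "dstport",
    "protocol", "packets", "bytes", "start_time", "end_time",
    "action", "status",
]

def parse_flow_record(line):
    """Split a space-separated flow log record into named fields."""
    record = dict(zip(FIELDS, line.split()))
    # Convert numeric fields for easier aggregation later.
    for key in ("srcport", "dstport", "protocol", "packets",
                "bytes", "start_time", "end_time"):
        record[key] = int(record[key])
    return record

record = parse_flow_record(
    "2 172.16.2.139 172.16.1.107 73 89 11 102 349 "
    "1557424462 1557424510 ALLOW OK")
```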
LA ships with an out-of-the-box Log Source for the VCN Flow Logs. Once you have enabled the Flow Logs, you can ingest them into LA. Follow the steps in this article on how to ingest OCI VCN Flow Logs into Log Analytics.
Once you have ingested the VCN Flow Logs, you can use the Log Analytics Explorer UI to view the log records. You can do this by switching to the Records with Histogram visualization to view the raw logs. Each record represents a transfer in the network, and contains details such as the Source and Destination IP Addresses, and the bytes transferred between the end points.
Figure 1. Flow Logs - Records View
The Records Visualization shows you the raw logs and the parsed fields. Our example data set has over 700k records. It is easier to analyze these logs if we can first aggregate the records, and reduce them to fewer aggregate values. The Link feature comes in handy to perform this aggregation and then to analyze these aggregated records using out-of-the-box applied machine learning.
Here are some of the questions we will try to answer by analyzing these 700k log records:
The Flow Logs contain the Source and Destination Port Numbers. We can map these ports to service names to improve the readability of the data. As a first step, we will list the known ports and their corresponding names.
You can then list your custom service ports and assign the names accordingly. For this article, we will use the below ports:

| Port | Service |
|---|---|
| 21 | FTP |
| 22 | SSH |
| 80 | HTTP |
| 137 | NetBIOS |
| 443 | HTTPS |
| 648 | RRP |
| 1521 | Oracle Database |
| 9006 | Tomcat |
| 9042 | MySQL |
| 9060 | WebLogic Server Administration Console |
| 9100 | Network Printer |
| 9200 | Elastic Search |
We will use this list later in the query.
If you use static IP Addresses, then you can follow the same process to map the IPs to server names. Let us use the below values for our example:

| IP Address | Server |
|---|---|
| 184.108.40.206 | Application Server 1 |
| 220.127.116.11 | Application Server 2 |
We can use the Link Visualization to get a tabular view of the flow logs. Since we want to see each distinct transfer, we will link using multiple fields. If you are new to Link, see this article for an overview of this feature.
Switch to the Link Visualization. By default, the system links using the Log Source field. Remove the Log Source field and add the Source IP, Source Port, Destination IP and Destination Port fields to Link By.
Figure 2. Adding Fields to Link By
The Link table now shows one row per unique combination of Source and Destination. Let us add the 'Content Size Out' field to Display Fields. Change the function from Average to Sum, since we want to sum up all the bytes transferred between two end points.
Figure 3. Compute Bytes Transferred between end points
Each group (i.e. row) in the table now combines all transfers between two end points. Let us break this up into transfers per day. This can be done by adding span=1day Time after the link command.
Edit the query from:
'Log Source' = 'OCI VCN Flow Logs' | link 'Source IP', 'Source Port', 'Destination IP', 'Destination Port' | stats sum('Content Size Out') as 'Bytes Transferred'
to:

'Log Source' = 'OCI VCN Flow Logs' | link span=1day Time, 'Source IP', 'Source Port', 'Destination IP', 'Destination Port' | stats sum('Content Size Out') as 'Bytes Transferred'
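Conceptually, the link command with span=1day groups the records by day and by the endpoint tuple, then sums the bytes in each group. The following Python sketch mimics that grouping on a couple of sample records; it is only a conceptual equivalent, not how the Log Analytics engine works internally.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Sketch: emulate `link span=1day Time, 'Source IP', 'Source Port',
# 'Destination IP', 'Destination Port' | stats sum('Content Size Out')`.
def link_by_day(records):
    totals = defaultdict(int)
    for r in records:
        # span=1day: bucket each record by its UTC calendar day.
        day = datetime.fromtimestamp(
            r["start_time"], tz=timezone.utc).date().isoformat()
        key = (day, r["srcaddr"], r["srcport"], r["dstaddr"], r["dstport"])
        totals[key] += r["bytes"]   # sum('Content Size Out')
    return totals

# Two transfers between the same endpoints on the same day
# collapse into a single aggregated row.
records = [
    {"start_time": 1557424462, "srcaddr": "172.16.2.139", "srcport": 73,
     "dstaddr": "172.16.1.107", "dstport": 89, "bytes": 349},
    {"start_time": 1557424470, "srcaddr": "172.16.2.139", "srcport": 73,
     "dstaddr": "172.16.1.107", "dstport": 89, "bytes": 151},
]
totals = link_by_day(records)
```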
We can now see how much data is transferred between two end points for every single day.
Figure 4. Link Table with Source, Destination and Bytes Transferred per Day
Let us use the port mapping information we have collected earlier to enrich the table. If the number of mappings are small, then we can use the eval command in query language to map the ports.
Append the following to the query:
| eval 'Traffic From' = if('Source Port' = 21, FTP,
'Source Port' = 22, SSH,
'Source Port' = 80, HTTP,
'Source Port' = 137, NetBIOS,
'Source Port' = 443, HTTPS,
'Source Port' = 648, RRP,
'Source Port' = 1521, 'Oracle Database',
'Source Port' = 9006, Tomcat,
'Source Port' = 9042, MySQL,
'Source Port' = 9060, 'WLS Admin. Console',
'Source Port' = 9100, 'Network Printer',
'Source Port' = 9200, 'Elastic Search', 'Unknown')
| eval 'Traffic To' = if('Destination Port' = 21, FTP,
'Destination Port' = 22, SSH,
'Destination Port' = 80, HTTP,
'Destination Port' = 137, NetBIOS,
'Destination Port' = 443, HTTPS,
'Destination Port' = 648, RRP,
'Destination Port' = 1521, 'Oracle Database',
'Destination Port' = 9006, Tomcat,
'Destination Port' = 9042, MySQL,
'Destination Port' = 9060, 'WLS Admin. Console',
'Destination Port' = 9100, 'Network Printer',
'Destination Port' = 9200, 'Elastic Search', 'Unknown')
The first eval statement examines the value of the Source Port field for each row to create a new field called Traffic From. If none of the conditions match, then the value 'Unknown' is assigned to Traffic From.
The second eval is similar, except the conditions are applied to the Destination Port to create the new field, Traffic To.
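The branching in the two eval statements amounts to a simple port-to-name lookup with a default. A Python sketch of that logic (the port names are taken from the list earlier in this article):

```python
# Sketch: map well-known ports to service names, mirroring the
# eval statements above. Unmapped ports fall back to 'Unknown'.
PORT_NAMES = {
    21: "FTP", 22: "SSH", 80: "HTTP", 137: "NetBIOS", 443: "HTTPS",
    648: "RRP", 1521: "Oracle Database", 9006: "Tomcat", 9042: "MySQL",
    9060: "WLS Admin. Console", 9100: "Network Printer",
    9200: "Elastic Search",
}

def service_name(port):
    """Return the service name for a port, or 'Unknown' if unmapped."""
    return PORT_NAMES.get(port, "Unknown")

traffic_from = service_name(22)      # Source Port mapping
traffic_to = service_name(12345)     # unmapped port
```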
We can also map static IP Addresses to their server names, similar to how we mapped the ports to names. Append the following eval statement to the query:
| eval Server =
if('Source IP' = '18.104.22.168' or 'Destination IP' = '22.214.171.124', 'Application Server 1',
'Source IP' = '126.96.36.199' or 'Destination IP' = '188.8.131.52', 'Application Server 2',
'Source IP' = '184.108.40.206' or 'Destination IP' = '220.127.116.11', 'Storage Server', 'Unknown')
The link table should now show the three new columns you have generated:
Figure 5. Link Table with Mappings and Bytes Transferred per Day
Since each row depicts a transfer, you can rename the titles of the tabs. Click Options -> Display Options and enter the aliases 'Transfer', 'Transfers' and 'VCN Flow Logs'.
You now have all the required data in one place.
Figure 6. Link Visualization with Custom Aliases for the Tabs
If you have a large number of ports and IPs to map, then typing in a lengthy eval statement is not practical. You can instead define the port-to-service-name mapping in a CSV file, and upload this file as a Lookup. You can create another one for the IP-to-Server mapping. See the lookup command for more details.
The Link table provides a structured view of the data, with each row summarizing the transfer between two end points for a day. But there are more than half a million rows in our table; no amount of manual sorting and filtering would give us a high level picture of the data.
This is where the Analyze feature comes in handy. Analyze can take a set of fields, reduce them to a smaller number of data points using machine learning, and then plot them in a chart for interactive analysis.
Click Analyze -> Create Chart and provide the following input:
Figure 7. Input for the Analysis
Analyze now goes through the 600k+ transfers in the table and analyzes each specified field. Numeric values are clustered together to form ranges, while string values and Time provide context for the clustering. Analyze then plots the results in a bubble chart:
Figure 8. Analyze Chart showing Network Traffic Flow
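The range-forming step can be pictured with a much simpler stand-in: bucketing numeric values into a handful of intervals. Analyze uses its own machine learning to choose the clusters; this equal-width binning sketch only conveys the idea of reducing many raw values to a few ranges.

```python
# Sketch: reduce raw numeric values to a small number of ranges.
# This naive equal-width binning is NOT the algorithm Analyze uses;
# it only illustrates the concept of clustering values into ranges.
def to_ranges(values, bins=3):
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1          # avoid zero width
    labeled = []
    for v in values:
        i = min(int((v - lo) / width), bins - 1)
        labeled.append((lo + i * width, lo + (i + 1) * width))
    return labeled

# Small transfers land in one range; the outlier lands in another.
ranges = to_ranges([100, 150, 5000, 70_000_000])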
You can hover the mouse on any bubble to get more details. As an example, you can see several details about the highlighted bubble above:
You can use the drop-down to choose other metrics to control the bubble size. Similarly, you can use another metric to control the color legend. You can click on a bubble to drill down and see the specific transfers, and even drill down to those log records, or cluster those logs.
We were able to reduce the large amount of data to few bubbles in the UI. This allowed us to visualize the data flow over time between different end points. Let us now try to identify any abnormal data transfers.
We have set Bytes Transferred as the metric to control the bubble size. The largest bubble shows the highest transfer. In case there are several such bubbles, we can also switch the legend to Bytes Transferred, and use this to identify the high data transfers.
Figure 9. Analyze Chart showing Bytes Transferred Legend
The legend now shows that Bytes Transferred can range anywhere from 180 bytes to 77.6MB. The darkest bubble is at 77.6MB. We can turn off all the other legend entries to show just this bubble.
Figure 10. Analyze Chart showing the Largest Data Transfer
We can see that the largest data transfers happened on July 26, 2020, at 10:55pm. There were just two transfers, both over HTTP. The actual data transferred was between 73MB and 81MB.
What if we also want to know if a known server caused this high transfer? What about the Source and Destination IPs?
The Correlate feature in Analyze can be used to surface interesting data for each bubble. It uses machine learning to identify what may be unique about that bubble's data, and fetches the most relevant of the specified fields.
Figure 11. Add fields for Auto-correlation
Analyze re-analyzes the data. Select the Bytes Transferred legend, de-select all entries except the high transfer bubble, and then hover the mouse over that bubble.
Figure 12. Highest Data Transfer with Auto-correlated Fields
We can now see that all the transfer was between two IPs. The server was also a known one from our list - Application Server 1. Now that we have the specific time range where the unusual data transfer happened, and the name of the server, we can look at the Access Logs or other logs of the server for further analysis.
We were able to take close to a million VCN Flow Log records and reduce them to a few data points in an interactive chart using machine learning. We could then easily identify the normal data transfer between various end points, and a specific transfer that stuck out as abnormal. We could then narrow down to the specific server and client in the network that caused it.
Log Analytics allows us to go from low level network logs to higher level server logs by making it easy to analyze large volumes of data. It is also possible to blend in external metadata for analysis, such as application port-to-name mapping and IP Address-to-Server name mapping.