
News and Views: Drive Smart Decisions with Cloud Analytics, Machine Learning and More

How Oracle Analytics Maps Cities with the Same Name

Philippe Lions
Senior Director

What should an Oracle Analytics map data visualization show when some of the city names plotted on a map exist in more than one country? 

For example, take a dataset whose city column includes entries like Barcelona and Liverpool. We would expect Barcelona to be plotted in Spain and Liverpool to be plotted in England. That is not guaranteed, though, because there is also a city named Barcelona in Venezuela and a city named Liverpool in Canada. If you are not careful, you might end up seeing these points plotted over Venezuela and Canada.


A short, simple fix is to add the country name to the Category of the Map grammar. In the example pictured below, the left visualization shows a Map grammar with only City in the Category, and the right visualization shows it with both Country and City.


This easy addition removes the ambiguity and corrects the problem. But how can a user detect whether there are any ambiguities, or even mismatches, between the data and what the map layer expects? Oracle Analytics Cloud includes a Location Match Dialog feature that answers this question directly for users of map visualizations.

On any map visualization, right-click and choose "Location Matches" to open the Location Match Dialog. It shows how well the data from the dataset has been matched to the map layer's data, as seen in the screenshot below.

 


Location Match Dialog for Cities Column

If a map visualization has multiple map layers, the dialog shows a tab for each layer, and the open tab corresponds to the layer you are currently on. The dialog also contains several columns that help you understand and improve the match quality of the data.

 

Map layer drop-down:


When you first open the Location Match Dialog, the data from the column is matched against the map layer used in your map, and the dialog's drop-down shows which layer that is. You can select a different layer from the list of existing map layers, and the dialog then shows how well the data from the dataset matches the newly selected layer.


Summary section:



The summary section shows the total number of rows in your dataset and the exact number of rows that could not be cleanly matched to the map layer's data. In the above diagram, a total of 144 rows of city data were matched against the world cities map layer, and 22 of them had issues.
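To make the summary arithmetic concrete, here is a minimal Python sketch that derives the two summary numbers from a list of per-row Match Quality labels. The function name `summarize` and the label strings are hypothetical. It assumes that every label other than "100% Confidence" counts as an issue, because the post does not document how the dialog computes its count.

```python
def summarize(match_qualities):
    """Return (total_rows, rows_with_issues) for a list of Match Quality labels.

    Assumption: any label other than "100% Confidence" is counted as an issue.
    """
    total = len(match_qualities)
    issues = sum(1 for quality in match_qualities if quality != "100% Confidence")
    return total, issues


# Hypothetical labels for five rows.
labels = ["100% Confidence", "No Match", "2 matches", "100% Confidence", "85%"]
print(summarize(labels))  # (5, 3)
```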

Your Data column:
The first column, Your Data, shows the original data from your dataset. It lists only the rows for which an ambiguity or issue was detected.

Match column:
The Match column shows the map layer data that each row was matched to. This data resides in a GeoJSON file (a JSON-based format for geographic features) for your map layer, whether it's a standard Oracle Analytics Cloud map layer or a custom map layer that you have uploaded.
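To illustrate what a map layer holds, the sketch below parses a tiny GeoJSON FeatureCollection and lists the place names that a dataset column could be matched against. The layer content, the approximate coordinates, and the `name`/`country` property keys are assumptions for this example. Real Oracle Analytics layers may use different property names. GeoJSON stores coordinates in longitude, latitude order.

```python
import json

# A tiny, hypothetical map layer. The "name" and "country" property keys are
# assumptions; real layers may use other keys. Coordinates are approximate
# and in GeoJSON's [longitude, latitude] order.
LAYER_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [2.17, 41.39]},
     "properties": {"name": "Barcelona", "country": "Spain"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-64.70, 10.13]},
     "properties": {"name": "Barcelona", "country": "Venezuela"}}
  ]
}
"""


def layer_entries(geojson_text):
    """Return (name, country) pairs for every feature in the layer."""
    layer = json.loads(geojson_text)
    return [
        (feature["properties"]["name"], feature["properties"]["country"])
        for feature in layer["features"]
    ]


print(layer_entries(LAYER_GEOJSON))
# [('Barcelona', 'Spain'), ('Barcelona', 'Venezuela')]
```

Two features share the name Barcelona. That shared name is what makes a city-only match ambiguous.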



If a row from the dataset didn't match any data in the map layer, a red warning indicator is displayed for that row. If there was a match but not a perfect one, a yellow warning indicator is displayed. This does not necessarily mean that the match is wrong, only that there were other potential matches and the system is not 100 percent sure which one to pick. Review these rows to see whether you can make a better match. Perfect matches have no indicator.
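The indicator rules above fit in a few lines. This is a hypothetical sketch, not Oracle Analytics code. `indicator`, `match_count`, and `exact` are illustrative names, and the sketch assumes each row has already been matched against the layer.

```python
def indicator(match_count, exact):
    """Map a row's match result to a warning indicator color.

    match_count: number of map layer entries the row matched (hypothetical input)
    exact: True when the row's value matched a layer entry exactly
    Returns "red", "yellow", or None (no indicator).
    """
    if match_count == 0:
        return "red"      # no match at all
    if exact and match_count == 1:
        return None       # single, exact match: no indicator
    return "yellow"       # matched, but ambiguous or inexact


print(indicator(2, True))  # yellow
```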

Match Quality column:
The Match Quality column quantifies how good the match was. When you open the Location Match Dialog, the rows are sorted from worst to best matches.
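One way to reproduce a worst-to-best ordering is a sort key that ranks the categories described in this section: no match first, then multiple matches, then single matches by increasing similarity. The row structure and the `severity` function are assumptions for illustration, and the post does not say how the dialog breaks ties.

```python
def severity(row):
    """Return a sort key where a smaller value means a worse match.

    Assumed order: no match, then multiple matches, then single matches by
    increasing similarity (1.0 = exact).
    """
    if row["matches"] == 0:
        return (0, 0.0)
    if row["matches"] > 1:
        return (1, 0.0)
    return (2, row["similarity"])


# Hypothetical rows; similarity values are illustrative only.
rows = [
    {"data": "Paris", "matches": 1, "similarity": 1.0},
    {"data": "Barcelona", "matches": 2, "similarity": 1.0},
    {"data": "Xyzzy", "matches": 0, "similarity": 0.0},
    {"data": "St. Louis", "matches": 1, "similarity": 0.9},
]
print([row["data"] for row in sorted(rows, key=severity)])
# ['Xyzzy', 'Barcelona', 'St. Louis', 'Paris']
```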

· Rows whose data matches no map layer data show the value "No Match" in the Match Quality column.

· Multiple matches are ambiguous cases. 

 

When a row in the dataset matches multiple distinct entries in the map layer data, the Match Quality column shows how many such matches were found. A common way to resolve these issues is to add more information (columns) to the Location edge so that the ambiguity can be resolved. For example, let's revisit our original use case of plotting Barcelona and Liverpool on a map visualization. By default, when the user brings these cities into the map visualization, they are plotted in Venezuela and Canada respectively, instead of Spain and England. An example of the Location Match Dialog for this case can be found below.

 

Both entries have ambiguous matches. To give a better picture of the mapping, consider the following tables:

Your Data    Map Layer Data
Barcelona    Barcelona, Venezuela
             Barcelona, Spain


The dialog found two matches for the Barcelona entry in the map layer data, so it reports the Match Quality value as two matches. Liverpool gives similar results:

Your Data    Map Layer Data
Liverpool    Liverpool, Canada
             Liverpool, England


With this kind of result, you can add more data to the Location edge to resolve the ambiguity and plot each data point exactly where it is intended to be. In this case, adding the Country column to the Location edge resolves the ambiguity.

After adding the Country column to the Location edge, the Location Match Dialog looks like this:

 

Now the “Your Data” column has the country value appended to the city name. With this extra information, the dialog resolves the ambiguities for both Liverpool and Barcelona, and each city is plotted in the intended country.


Barcelona and Liverpool plotted in Europe after adding Country column to the Location Edge
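A small lookup can sketch the effect of adding Country. The hypothetical layer below holds the four city/country pairs from the tables above. Matching on City alone returns two entries for Barcelona, while matching on City and Country returns one. The `LAYER` list and `matches` function are illustrative only and are not part of Oracle Analytics.

```python
# Hypothetical map layer entries as (city, country) pairs, from the tables above.
LAYER = [
    ("Barcelona", "Venezuela"),
    ("Barcelona", "Spain"),
    ("Liverpool", "Canada"),
    ("Liverpool", "England"),
]


def matches(city, country=None):
    """Return layer entries matching `city`, narrowed by `country` when given."""
    return [
        (layer_city, layer_country)
        for layer_city, layer_country in LAYER
        if layer_city == city and (country is None or layer_country == country)
    ]


print(len(matches("Barcelona")))        # 2: ambiguous on City alone
print(matches("Barcelona", "Spain"))    # [('Barcelona', 'Spain')]
print(matches("Liverpool", "England"))  # [('Liverpool', 'England')]
```

Each extra column in the Location edge narrows the candidate set in the same way.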

 

· The next category is partial matches, where only part of the word in your data matched the map layer data. For these rows, the Match Quality value is a percentage showing how close the two strings are.

 


· The last category is good matches, where the data matches the map layer's data exactly. For these rows, the Match Quality value is "100% Confidence".
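A short Python sketch can approximate the categories above. It assumes that string similarity is measured with Python's `difflib.SequenceMatcher.ratio()` and that any score below a hypothetical 0.6 threshold counts as "No Match". The post does not document the scoring method or threshold that Oracle Analytics actually uses, so treat the percentages as illustrative.

```python
from difflib import SequenceMatcher


def match_quality(value, layer_names, threshold=0.6):
    """Return a Match Quality-style label for one row.

    Assumptions: similarity is SequenceMatcher.ratio() on lowercased strings,
    and `threshold` (hypothetical) separates partial matches from no match.
    """
    exact = sum(1 for name in layer_names if name == value)
    if exact == 1:
        return "100% Confidence"
    if exact > 1:
        return f"{exact} matches"
    # No exact match: score the most similar layer name.
    best = max(
        (SequenceMatcher(None, value.lower(), name.lower()).ratio() for name in layer_names),
        default=0.0,
    )
    if best < threshold:
        return "No Match"
    return f"{round(best * 100)}%"


layer = ["Barcelona", "Barcelona", "Paris", "Liverpool"]
print(match_quality("Barcelon", layer))  # 94%
```

Under this scoring, a typo such as "Barcelon" scores 94% against "Barcelona". That score makes it a partial match rather than a miss.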


Finally, the Remove column lets you exclude rows of data from the visualization and set the scope of the exclusion: Project, Canvas, or Visual. Once you select the rows and set the scope, the appropriate filters are created to exclude those rows.
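The sketch below gives a rough model of what the Remove column produces. It represents an exclusion filter as a column, a set of excluded values, and one of the three scopes named above. The `Scope` enum, the `ExclusionFilter` class, and the helper functions are hypothetical, because the post does not describe how Oracle Analytics represents these filters internally. The sketch only records the scope and does not model where each scope applies.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    PROJECT = "Project"
    CANVAS = "Canvas"
    VISUAL = "Visual"


@dataclass(frozen=True)
class ExclusionFilter:
    column: str
    excluded_values: frozenset
    scope: Scope  # recorded only; this sketch does not model where a scope applies


def build_exclusion(column, selected_values, scope):
    """Create a filter that excludes the selected values of `column`."""
    return ExclusionFilter(column, frozenset(selected_values), scope)


def apply_filter(exclusion, rows):
    """Keep only rows whose value in the filtered column was not excluded."""
    return [row for row in rows if row[exclusion.column] not in exclusion.excluded_values]


rows = [{"City": "Paris"}, {"City": "Xyzzy"}]
canvas_filter = build_exclusion("City", ["Xyzzy"], Scope.CANVAS)
print(apply_filter(canvas_filter, rows))  # [{'City': 'Paris'}]
```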

Another case to consider is what the Location Match Dialog shows when a user brings in Latitude/Longitude columns instead of semantic column types such as city, state, or country.

 

In this case, the map layer being matched is the Latitude/Longitude layer. The dialog shows one fewer column, because the coordinates identify the location directly and no name lookup is needed. The Match column simply indicates whether each row's entry is valid.
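A validity check of this kind might look like the following sketch. It assumes that "valid" means both values parse as finite numbers, with latitude in [-90, 90] and longitude in [-180, 180]. The post does not state the exact rule the dialog applies, and `is_valid_coordinate` is a hypothetical name.

```python
import math


def is_valid_coordinate(lat, lon):
    """Return True if (lat, lon) parse as finite numbers within range.

    Assumed rule: latitude in [-90, 90] and longitude in [-180, 180].
    """
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return False  # missing or non-numeric value
    if not (math.isfinite(lat) and math.isfinite(lon)):
        return False  # NaN or infinity
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


print(is_valid_coordinate(41.39, 2.17))  # True
print(is_valid_coordinate(91, 0))        # False
```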

The Location Match Dialog primarily helps users understand the extent of ambiguous matches or mismatches in their data when plotting it against a map layer. It won't fix the data automatically, so you must either bring more data into the visualization to remove the ambiguity or apply a fix to the map layer. But the dialog does give a detailed list of what needs to be fixed to fully match the map layer, which is a big step toward overcoming geospatial data mismatches.

To learn how you can benefit from Oracle Analytics Cloud, visit Oracle.com/analytics.

Join the discussion

Comments ( 2 )
  • João Afonso Thursday, March 26, 2020
    Very useful. Had run into this before and did know how to solve. Thanks a lot.
  • Dan Vlamis Thursday, March 26, 2020
    Thanks, Philippe, for the detailed blog entry about how matching process works. Three questions:
    1) In this process, does the name of the column (e.g. "Country" or "DW_COUNTRY") matter?
    2) Does the order of fields matter? I would tend to put them in top-down (e.g. Country, State, City), but not sure where to put ZipCode.
    3) Does Zip code help disambiguate the location?