article by Frank Nimphius, November 2020


Chatbots should be confident when it comes to resolving intents and extracting entities. However, high confidence also means that the bot may fail to resolve information when the user makes mistakes while entering it. This article shows how you can add unsharpness to entity extraction using the Fuzzy Match option on an entity, allowing users to make spelling mistakes or provide values that don't perfectly match a known entity value or synonym.

Example

The sample below shows the intent tester with a query for car manufacturers based in Germany. The value list entity has each manufacturer's home country set as a synonym.

The next image shows the same query with German instead of Germany. Note that German is not among the defined synonyms.

As you can see in the image above, German did not resolve, so no German manufacturers were extracted from the list of values. So here, my option 1 would be needed to handle the missing information.
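To make this failure mode concrete, here is a minimal Python sketch of how exact (non-fuzzy) value-list matching behaves. The data and function names are hypothetical, abbreviated recreations of the article's entity for illustration only, not the actual Oracle Digital Assistant implementation:

```python
# Hypothetical, abbreviated recreation of the value list entity:
# each manufacturer value carries its home country/state as synonyms.
CAR_MAKERS = {
    "BMW":        ["Germany", "Bavaria"],
    "Audi":       ["Germany", "Bavaria"],
    "Volkswagen": ["Germany"],
    "Toyota":     ["Japan"],
    "Ford":       ["USA", "Michigan", "Detroit"],
}

def extract_exact(message: str) -> list[str]:
    """Exact (non-fuzzy) matching: a value is extracted only when the
    message contains the value itself or one of its synonyms verbatim."""
    words = message.lower().split()
    hits = []
    for value, synonyms in CAR_MAKERS.items():
        terms = [value] + synonyms
        if any(t.lower() in words for t in terms):
            hits.append(value)
    return hits

print(extract_exact("car manufacturers based in Germany"))
# -> ['BMW', 'Audi', 'Volkswagen']
print(extract_exact("car manufacturers based in German"))
# -> [] (German is not a defined value or synonym)
```

With exact matching, German simply isn't in the synonym list, so nothing is extracted.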

The image below shows how you can enable fuzzy matching for an entity. All you need to do is toggle the Fuzzy Match switch and then retrain the model. (For the sample I am using trainer HT, which is good for development. Make sure you use Trainer TM for production.)

For the sake of completeness, below is the entity value definition for BMW, one of the German manufacturers in the list.

So, let's re-run the sample for German car manufacturers.

This time, Volkswagen, MINI, Audi, and Porsche (and Mercedes, which is not shown in the image) are extracted. So what fuzziness did is find the German cars based on a partial match, which is German.

All great so far. So why not use Fuzzy Match all the time? Well, truth be told, Fuzzy Match can produce more results than you asked for.

If you look at the image below, in a similar search I am looking for car manufacturers in the Bavarian area of Germany (the entities contain Bavaria as the state in Germany). The previous sample used a fraction of a word (German instead of Germany), whereas now I am exceeding the known word, Bavaria, by an additional letter.

Still, the returned results are correct, though they contain duplicates.

Next, let's look at a sample in which Fuzzy Match fails. Japan is a synonym I set for cars manufactured in Japan. Japanese does not match the synonym despite Japan being contained in it. Compared to the previous German vs. Germany sample, where matching worked, the difference here is three letters. So fuzzy matching has its boundaries, which means it does not provide endless wiggle room for users who don't get an input right. In my example, I should probably add additional synonyms such as German, British, American, and Japanese to increase the general understanding.

Just to prove what I said about a two- versus three-letter offset, below is the query with Japanes as the misspelled entity synonym value. Here Fuzzy Match does work.
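One plausible way to reason about this two- versus three-letter boundary is edit distance: the minimum number of single-character insertions, deletions, or substitutions needed to turn the user's word into a known value or synonym. The sketch below is an assumption used for illustration, not Oracle's documented algorithm, but it reproduces the observations from this article:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits (insert, delete,
    substitute) needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[-1] + 1,         # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]

# The article's observations, expressed as edit distances:
print(levenshtein("german", "germany"))     # 1 edit  -> fuzzy match works
print(levenshtein("japanes", "japan"))      # 2 edits -> still works
print(levenshtein("japanese", "japan"))     # 3 edits -> no match
print(levenshtein("michigen", "michigan"))  # 1 edit  -> works
```

Under this model, anything within two edits of a known value or synonym resolves, and anything beyond that does not, which matches the German/Japanes/Japanese behavior shown above.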

Below is another example, for car manufacturers based in the US. And before you think that Germans apparently don't know how to spell Michigan, let me tell you that the misspelling in the image below is intentional, to demonstrate the resilience Fuzzy Match brings against typos. Though Michigen is not Michigan, the information could still be extracted.

Next, let's have a look at a runtime sample executed in the embedded conversation tester in Oracle Digital Assistant. The sample uses the System.CommonResponse component to render a list of values if the initial user message does not contain a specific piece of information. In the use case below, US is a synonym added to all American car manufacturers. Because the entity is defined to extract a single value, the US synonym triggers a disambiguation dialog to be shown.

The image below shows a typo in the word Detroit that still resolves correctly to the synonym associated with some companies. 

Summary

If you enable Fuzzy Match on an entity, you intentionally add some unsharpness to entity extraction. This can prove useful in cases where you provide minimal guidance to users about which values you expect them to add to a message, or where users tend to make typos. The unsharpness, however, stays within a range of about two characters added to or removed from a known entity value or synonym. As a downside, you risk that duplicate values are resolved or that values get added that are not a 100% match.
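Putting the pieces together, a fuzzy extractor with a tolerance of two edits behaves roughly as this summary describes. Again, the synonym map, threshold, and function names below are hypothetical illustrations of the idea, not the product's internals:

```python
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete
                            curr[-1] + 1,               # insert
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]

# Hypothetical synonym map: country/state synonym -> manufacturers.
SYNONYMS = {
    "Germany":  ["BMW", "Audi", "Volkswagen", "Porsche", "MINI", "Mercedes"],
    "Japan":    ["Toyota", "Honda"],
    "Michigan": ["Ford", "General Motors"],
}

def extract_fuzzy(message: str, max_edits: int = 2) -> list[str]:
    """Extract manufacturers whose synonym is within max_edits
    single-character edits of any word in the message."""
    hits = []
    for word in message.lower().split():
        for synonym, makers in SYNONYMS.items():
            if edit_distance(word, synonym.lower()) <= max_edits:
                hits.extend(m for m in makers if m not in hits)
    return hits

print(extract_fuzzy("cars made in German"))          # Germany, 1 edit away
print(extract_fuzzy("manufacturers in Michigen"))    # Michigan, 1 edit away
print(extract_fuzzy("manufacturers from Japanese"))  # 3 edits -> no match
```

Note that loosening `max_edits` trades precision for recall: a higher tolerance would let Japanese resolve to Japan, but it would also make unrelated words start matching synonyms they were never meant to match.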

My recommendation thus is to first try to get entity recognition right without Fuzzy Match. If Fuzzy Match then gives you better results, use it. Certainly, Fuzzy Match in combination with clear user guidance can add a benefit. The use case matters, and testing is required. For example, testing showed that the duplicate mention of Audi and MINI when searching for Bavarian companies in the intent tester does not show at runtime (see image below).

PS: The prompt displayed in the image above is defined as the Disambiguation Prompt on the entity. You can have a look at the sample source code after downloading the skill from the link below.

Download the sample

You can download the sample skill used in this article to play with the feature and to explore the code being used. You will certainly find some interesting aspects beyond the use of Fuzzy Match on the entity. Download the sample, import it into Oracle Digital Assistant 20.08 or later, and train it before testing.

Sample Skill (ZIP)

Related Content

TechExchange Quick-Tip: Understanding Oracle Digital Assistant Skill Entity Properties – Or, What Does "Fuzzy Match" Do?

TechExchange: Establish a Conversational Parent-Child Relationship Using Value List Entities in Oracle Digital Assistant

TechExchange: Use Entities To Build Powerful, Robust And Speech-Ready Action Menus

TechExchange Quick-Tip: Four Options You Have to Extract Information From User Messages

TechExchange: Model Driven Conversations in Oracle Digital Assistant – Build Better User Interfaces By Using Entities For Everything

TechExchange Quick-Tip: How to Intelligently Cancel Composite Bag Entity Driven User Dialog Flows


Author