Neural networks are algorithms loosely modeled on the way brains work. They are of great interest right now because they can learn to recognize patterns. A famous example is a neural network that learns to recognize whether or not an image contains a cat. In this article, I provide an introduction to neural networks. We'll explore what neural networks are, how they work, and how they're used in today's rapidly developing machine-learning world.
Before we look at different types of neural networks, we need to start with the basic building blocks, and these aren't hard. There are just five things you need to figure out: neurons, inputs, weights, biases, and outputs (activation functions).
I'll summarize these terms below, or you can take a look at this Oracle blog post on machine learning for a more detailed explanation.
Neurons are the decision makers. Each neuron has one or more inputs and a single output, whose value is set by an activation function. This output can be used as an input to one or more neurons or as an output for the network as a whole. Some inputs are more important than others and so are weighted accordingly. Neurons themselves will "fire" or change their outputs based on these weighted inputs. How readily they fire depends on their bias. Here's a simple diagram covering these five elements.
I haven't represented the weight and bias in this diagram, but you can think of them as floating point numbers, often small values that can be positive or negative. The output produced by a neuron's activation function doesn't have to be a simple on/off (though that is the first option) but can take different shapes. In some cases, such as the third and fifth examples above, the output value can go lower than zero. And that's it!
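To make the five elements concrete, here is a minimal sketch of a single neuron in Python. The function names and every number in it (inputs, weights, bias) are hypothetical values chosen for illustration, not taken from a trained network. It shows a weighted sum of the inputs, shifted by the bias and passed through an activation function.

```python
import math

def step(z):
    """Simple on/off activation: output 1.0 when the weighted input is positive."""
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    """Smooth S-shaped activation with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs, shifted by the bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

weights = [0.6, 0.9, 0.3]  # hypothetical: the second input matters most
bias = -0.5                # hypothetical: a negative bias makes the neuron harder to fire

strong_input = [1.0, 0.0, 1.0]
weak_input = [0.0, 0.0, 1.0]

print(neuron(strong_input, weights, bias, step))     # 1.0, the neuron fires
print(neuron(weak_input, weights, bias, step))       # 0.0, the neuron stays off
print(neuron(weak_input, weights, bias, sigmoid))    # between 0 and 1
print(neuron(weak_input, weights, bias, math.tanh))  # negative: tanh can go below zero
```

Swapping the activation function changes the shape of the output without changing the weighted sum that feeds it.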
Now that you have the building blocks, let's put them together to form a simple neural network. Here's a network that is used to recognize handwritten digits. I took it from this neural network site, which I'd recommend as a great resource if you want to read further about this topic.
Here you can see a simple diagram with inputs on the left. Only eight are shown but there would need to be 784 in total, one neuron mapping to each of the 784 pixels in the 28x28 pixel scanned images of handwritten digits that the network processes. On the right-hand side, you see the outputs. We would want one and only one of those neurons to fire each time a new image is processed. And in the middle, we have a hidden layer, so-called because you don't see it directly. A network like this can be trained to deliver very high accuracy recognizing scanned images of handwritten digits (like the example below, adjusted to cover 28x28 pixels).
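The layered structure described here can be sketched as a forward pass in plain Python. This is an untrained illustration only: the hidden layer size of 15, the random weights, and the random stand-in "image" are all assumptions, so the predicted digit is meaningless until the network has been trained.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_layer(n_inputs, n_neurons, rng):
    """One list of weights and one bias per neuron, randomly initialised."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases

def forward(layer, inputs):
    """Each neuron takes a weighted sum of all inputs, adds its bias, and applies sigmoid."""
    weights, biases = layer
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

rng = random.Random(0)               # fixed seed so the sketch is repeatable
hidden = make_layer(784, 15, rng)    # 784 pixel inputs -> 15 hidden neurons (assumed size)
output = make_layer(15, 10, rng)     # 15 hidden neurons -> 10 outputs, one per digit

image = [rng.random() for _ in range(784)]  # stand-in for 28x28 pixel intensities
hidden_activations = forward(hidden, image)
scores = forward(output, hidden_activations)

# The "winning" output neuron is the one with the highest activation.
predicted_digit = max(range(10), key=lambda d: scores[d])
print(len(hidden_activations), len(scores), predicted_digit)
```

Adding a second hidden layer would only mean one more `make_layer` call and one more `forward` call between the input and the output.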
But a network like the one shown above would not be considered by most to be deep learning. It's too simple, with only one hidden layer. The cutoff point is considered to be at least two hidden layers, like the one shown below:
I glossed over what the hidden layer is actually doing, so let's look at it here. Each neuron in the input layer maps to an individual pixel, while the output neurons effectively map to the whole image. The hidden layers map to components of the image. Perhaps they recognize a curve, a diagonal line, or a closed loop.
But importantly, those components in the hidden layers map to specific locations in the original image. They have to. There are hard links from the individual pixels on the left. So a network like the one above would not be able to answer a simple question on the image like the one below: how many horses do you see?
I could show images that had horses anywhere on the picture and you would have no problem determining how many there were. You'd do so by recognizing the elements that make up a horse, no matter where in the picture they occurred. And that's a very good thing, because the world we live in requires us to recognize objects that are in front of us, or off to the side, fully visible, or partially obscured. To solve problems like this, we need a different kind of network like the one you see below: a convolutional neural network.
Let's imagine we're working with images that are 28x28 pixels again, but this time we can't rely on having one image fixed in the center. Look at the logic of that first hidden layer. All of those neurons are now linked to specific, overlapping areas of the input image (in this case a 5x5 pixel area). Starting with this basic structure, and adding some additional processing, it's possible to build a neural network that can identify items in a position-independent way. Incidentally, neurons in the visual cortex of animals, including humans, work in a similar way. There are neurons that only trigger on certain parts of the field of view.
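As a rough sketch of the idea of overlapping 5x5 areas, the Python below slides one small filter across a 28x28 image. The filter (a vertical-bar detector) and the two test images are hypothetical, and the code skips the "additional processing" a real convolutional network adds. Like most neural network libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def convolve(image, kernel):
    """Slide the kernel over every position where it fits entirely inside the image."""
    k = len(kernel)
    size = len(image) - k + 1  # 28 - 5 + 1 = 24 positions per side
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(k) for j in range(k))
             for c in range(size)]
            for r in range(size)]

def blank_image(n=28):
    return [[0.0] * n for _ in range(n)]

def draw_vertical_bar(image, top, left, height=5):
    """Set a one-pixel-wide vertical bar to full intensity."""
    for r in range(top, top + height):
        image[r][left] = 1.0
    return image

# Hypothetical 5x5 filter that responds to a vertical bar in its middle column.
kernel = [[0.0, 0.0, 1.0, 0.0, 0.0] for _ in range(5)]

near_top_left = draw_vertical_bar(blank_image(), top=2, left=4)
near_bottom_right = draw_vertical_bar(blank_image(), top=20, left=22)

map_a = convolve(near_top_left, kernel)
map_b = convolve(near_bottom_right, kernel)

print(len(map_a), len(map_a[0]))       # 24 24
print(max(max(row) for row in map_a))  # 5.0
print(max(max(row) for row in map_b))  # 5.0
```

The same filter produces the same peak response for both images; only the location of the peak in the 24x24 output changes, which is what lets later layers detect a feature wherever it appears.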
Convolutional networks are the workhorses of image recognition. But when it comes to natural language processing, they are less well suited. Understanding the written or spoken word is quite different from processing independent images. Language is highly contextual, by which I mean that individual words have to be processed in the context of the words around them. (Note that I am not a linguist and apologize for any imprecise usage of terms.)
When it comes to processing a sentence, there are at least three different things you have to understand. The first two are the meaning of the individual words and the syntax or grammar of the sentence (the rules about word order, structure, and so on). If you've gotten this far, then you have those things nailed. But consider the sentence below.
I'm baking a rainbow cake and want to add different _________ to the batter.
What's the missing term? You can only figure something like that out by looking at the earlier part of the sentence. A rainbow has many different colors, so you would need to add different food dyes to the batter (which would also have to be portioned out in some way).
Working out that answer required taking earlier words in the sentence as input when predicting the next word. I'm describing a feedback loop, which is not something you saw earlier. Networks with feedback loops are called recurrent neural networks. In its simplest form, a feedback loop looks like this.
Note how the output feeds back to the inputs. If you "unroll" this diagram, you get the structure below.
You can see how this kind of structure would enable you to process a sequence of elements (like the words in a sentence) with each one providing input (context if you like) for subsequent elements.
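The unrolled structure can be sketched as a loop that carries a state from one element to the next. This is a deliberately tiny illustration with a single number as the state; the weights `w_x` and `w_h` and the input sequences are hypothetical values, not a model of real language.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, bias):
    """One recurrent step: the new state mixes the current input with the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + bias)

def run_sequence(xs, w_x=0.5, w_h=0.8, bias=0.0):
    """The unrolled loop: one step per element, passing the state forward each time."""
    h = 0.0
    states = []
    for x in xs:
        h = rnn_step(x, h, w_x, w_h, bias)
        states.append(h)
    return states

with_cue = run_sequence([1.0, 0.0, 0.0])     # a signal appears in the first element
without_cue = run_sequence([0.0, 0.0, 0.0])  # no signal at all

print(with_cue)
print(without_cue)  # [0.0, 0.0, 0.0]
```

Both sequences end with the same input (0.0), yet the final states differ, because the first sequence still carries a fading trace of its earlier element. That trace is the "context" passed along the feedback loop.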
Of course, this simple structure is not powerful enough to process language, but more complex networks with feedback loops can. And a common kind of recurrent neural network contains elements called LSTM units, which are really good at remembering things (like a key word earlier in a sentence), as well as forgetting them when needed. Below is one example of an LSTM unit.
You can see the similarity with the simple diagram above, but there's much more going on here. I'm not going to explain it all (there's a great explanation on this GitHub page) but I'll point out a couple of things.
Look inside the main rectangular box. The shaded rectangular boxes are entire layers, the symbol inside representing the shape of the activation function (output). The shaded circles with X and + represent multiplication and addition operations respectively. Look at the first combination (a layer with a sigmoid output leading to a multiplication with the output of the previous term). If that layer outputs a value of zero, then the multiplication will effectively zero out that previous term. Put another way, this first combination is the "forgetting" circuitry. For the rest, check out this blog.
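The "forgetting" combination can be sketched in a few lines of Python. This is a simplification under stated assumptions: a real LSTM layer works on vectors and learns its weights, whereas here the gate has a single hypothetical input, weight, and bias, and the function name `forget_gate` is my own.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forget_gate(prev_state, gate_input, weight, bias):
    """Multiply the previous state by a sigmoid output between 0 and 1."""
    keep_fraction = sigmoid(weight * gate_input + bias)
    return keep_fraction * prev_state

prev_state = 3.0  # hypothetical value remembered from earlier in the sequence

# A strongly negative gate input drives the sigmoid toward 0: the state is forgotten.
forgotten = forget_gate(prev_state, gate_input=-20.0, weight=1.0, bias=0.0)

# A strongly positive gate input drives the sigmoid toward 1: the state is kept.
kept = forget_gate(prev_state, gate_input=20.0, weight=1.0, bias=0.0)

print(forgotten)  # very close to 0
print(kept)       # very close to 3.0
```

Because the sigmoid output varies smoothly between 0 and 1, the gate can also partially forget, scaling the previous state down rather than erasing it outright.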
There's more to processing language than syntax and understanding individual words. In that earlier example, how did you know a rainbow has many different colors? You've seen rainbows before and know what they look like. And that general knowledge of the implications of those words is the third element of processing natural language (and the hardest for a computer). To illustrate this, see the example below from the game of cricket. Those of you who don't know the game will still be able to process the syntax of the sentences. You will still know what the individual words mean. But lacking the "common sense" of the context for those words, you will have no clue what is going on.
Coming around the wicket, the leg spinner bowled a "wrong-un." The batsman swept it firmly against the spin to deep backward point. Is the batsman left-handed, right-handed, or can't tell from this information?
This is just an overview. There are many other approaches to neural networks that have different strengths and weaknesses or are used to solve different types of problems. But the concepts here still apply. Neural networks are all built on the same basic elements: neurons (with bias), inputs (with weights), and outputs (activation functions with specific profiles). These elements are used to construct different or specialized layers and elements (like the LSTM unit above). All of these things are combined with feedback loops and other connections to form a network.
In an upcoming blog post, we will look at how neural networks learn and are trained. Until that happens, they are not ready for work. In the meantime, discover more about Oracle's data management platforms and how Oracle supports neural networks in Oracle Database.