The phrase “deep learning” probably conjures up images of sentient robots staging a hostile takeover. But in reality, deep learning is just another way to describe large neural networks, a technology you encounter every day when you browse the Internet or use your mobile phone.
Among countless other applications, deep learning generates captions for YouTube videos, serves up appealing food photos on Yelp, and answers iPhone users’ questions via Siri. And as data scientists and researchers tackle increasingly complex deep learning projects, this type of artificial intelligence will only become more entwined in our daily lives.
Neural Networks vs. Deep Learning
What’s the difference between deep learning and a regular neural network? The simple answer is that deep learning is larger in scale. Before we get into what that means, let’s talk about how a neural network functions.
To make sense of observational data (like photos or audio), neural networks pass data through interconnected layers of nodes. When information passes through a layer, each node in that layer performs simple operations on the data and selectively passes the results to other nodes. Each subsequent layer focuses on a higher-level feature than the last, until the network creates an output.
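The layer-by-layer flow described above can be sketched in a few lines of Python. This is a minimal illustration, not how any production network is built: the function names, the sigmoid activation, and the toy weights are all assumptions chosen for the example.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Run one layer: each node takes a weighted sum of the inputs,
    adds its bias, and applies the sigmoid activation."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(sigmoid(total))
    return outputs

def network_forward(inputs, layers):
    """Pass the data through every layer in order; the output of one
    layer becomes the input of the next."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical toy network: 3 inputs -> 2 hidden nodes -> 1 output node.
# The weights are made up for illustration, not learned.
toy_layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                              # output layer
]
toy_output = network_forward([1.0, 0.5, -1.0], toy_layers)
print(toy_output)
```

Each entry in `toy_layers` holds one row of weights per node, so adding a layer is just a matter of appending another pair of weights and biases to the list.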
In between the input layer and the output layer are hidden layers. And here’s where users typically differentiate between neural nets and deep learning: A basic neural network might have one or two hidden layers, while a deep learning network might have dozens or even hundreds. For example, a simple neural network with a few hidden layers can solve a common classification problem. But to identify the names of objects in a photograph, Google’s image recognition model, GoogLeNet, uses a total of 22 layers.
Why so many layers? Increasing the number of layers and nodes can potentially increase the accuracy of your network. However, more layers typically mean your model will require more parameters and computational resources, and it becomes more likely to overfit (that is, to memorize its training data instead of generalizing to new examples).
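To get a feel for how parameters add up as a network gets deeper, here is a rough sketch that counts the weights and biases in a fully connected network. The layer sizes are hypothetical, and the count only applies to plain fully connected layers; architectures such as GoogLeNet use other layer types and would be counted differently.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of nodes per layer, from input to output.
    Each pair of adjacent layers contributes n_in * n_out weights
    plus one bias per node in the receiving layer.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# Hypothetical sizes: a 784-value input (e.g. a 28x28 image), 10 output classes.
shallow = [784, 16, 10]             # one hidden layer
deep = [784] + [16] * 20 + [10]     # twenty hidden layers

print(count_parameters(shallow))
print(count_parameters(deep))
```

Every one of those parameters has to be stored and adjusted during training, which is where the extra computational cost comes from.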
Training a Deep Learning Model
Training a deep learning model requires a lot of data. In general, the more representative data you train it on, the more accurate your deep learning model will be. (In 2012, Google used 10 million digital images taken from YouTube videos to train a deep learning model to identify cats. Yes, you read that right.)
Simply put, training a deep learning model means that you’re feeding data to the model, getting an output, and then using that output to make adjustments. For example, if you train your model on a bunch of pictures of cats and then feed it new cat photos it’s never seen before, it should be able to pick out the cats in the new photos. If it doesn’t, the training process adjusts how heavily the network’s nodes weight certain characteristics of the images (the presence of whiskers and a tail, for instance). A weight, in this case, is a number that represents the importance of a characteristic. The higher the weight, the more influence that characteristic has on a node’s output.
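Here is a minimal sketch of that feed, compare, and adjust loop. It trains a single node rather than a deep network, and it uses gradient descent as the adjustment rule, which is a common choice but an assumption here since the article does not name one. The features (`has_whiskers`, `has_tail`, `has_wheels`) and the four training examples are made up for illustration.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict(features, weights, bias):
    """Return the node's estimate (0 to 1) that the example is a cat."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)

def train(examples, labels, learning_rate=0.5, epochs=1000):
    """Repeatedly feed in examples, compare the output to the label,
    and nudge each weight in the direction that reduces the error."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            error = predict(features, weights, bias) - label
            # Gradient-descent update for a sigmoid node with cross-entropy loss.
            weights = [w - learning_rate * error * f
                       for w, f in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

# Hypothetical features per photo: [has_whiskers, has_tail, has_wheels]
examples = [
    [1, 1, 0],  # cat
    [1, 0, 0],  # cat with its tail out of frame
    [0, 1, 0],  # lizard: tail but no whiskers
    [0, 0, 1],  # car
]
labels = [1, 1, 0, 0]

weights, bias = train(examples, labels)
print(weights)
print(predict([1, 1, 0], weights, bias))
```

After training, the weight on whiskers ends up larger than the weight on a tail, because in this toy data set whiskers appear only on cats while tails also appear on a non-cat.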
But how did the model even know to look for whiskers? Typically, a data scientist will engineer features for the model to consider and feed it labeled data during the training process (e.g., a series of labeled photos of cats). But one of the amazing things about deep learning is that you don’t necessarily have to complete this step. To continue with our Google example, the company’s model was trained on unlabeled images, yet it developed a detector that responded to cats. It “learned” what a cat looked like without explicitly being told. The features it learned this way were then used to recognize objects across 20,000 distinct categories. (The downside to this method is that the resulting models aren’t yet as accurate as models that receive supervised training.)
As data scientists get closer to building highly accurate deep learning models that can learn without supervision, deep learning will become faster and less labor intensive. That can only mean bigger and better things are yet to come.