Building a Neural Network with Keras
Keras is one of the most popular deep learning libraries and has contributed greatly to the commercialization of artificial intelligence. It is very simple to use and lets you build a powerful neural network in a few lines of code. In this article, you will learn how to build a neural network with Keras that predicts the sentiment of user reviews by classifying them into two categories: positive or negative. This task is called sentiment analysis, and we will use the well-known IMDB review dataset for our experiments. With only a few changes, the model we build can also be applied to other machine learning problems.
Please note that this article does not delve into the details of Keras or deep learning. It is meant to give you a blueprint for a Keras neural network and to familiarize you with its implementation.
What is Keras?
Keras is an open source Python library that lets you easily build neural networks. It can run on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, and MXNet. TensorFlow and Theano are the most widely used numerical platforms for building deep learning algorithms in Python, but they can be quite complex and difficult to use. By contrast, Keras provides a simple and convenient way to build deep learning models. Its creator, François Chollet, developed it so that people can build neural networks as quickly and easily as possible. He focused on extensibility, modularity, minimalism, and Python support. Keras can run on both GPUs and CPUs, and it supports both Python 2 and 3. Keras has made a huge contribution to the commercialization of deep learning and artificial intelligence, because it has made available powerful, modern deep learning algorithms that were previously inaccessible and impractical to use.
What is sentiment analysis?
With sentiment analysis, we want to identify the attitude (for example, the sentiment) of a speaker or writer towards a document, interaction, or event. It is therefore a natural language processing problem in which the text must be understood in order to predict the underlying intent. Sentiment is mostly divided into positive, negative, and neutral categories. Sentiment analysis is widely applied to reviews, surveys, documents, and more.
IMDB Data Set
The IMDB sentiment classification dataset consists of 50,000 movie reviews from IMDB users, labeled positive (1) or negative (0). The reviews are preprocessed, and each one is encoded as a sequence of word indexes in integer form. The words are indexed by their overall frequency in the dataset. For example, the integer "2" encodes the second most frequent word in the data. The 50,000 reviews are split into 25,000 for training and 25,000 for testing. The dataset was created by researchers at Stanford University and released in 2011, together with a reported accuracy of 88.89%.
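To make the frequency-based encoding concrete, here is a minimal sketch on a toy corpus. The helper names build_word_index and encode are hypothetical and are not part of Keras. The Keras loader also reserves a few low integers for special markers by default, which this sketch leaves out.

```python
from collections import Counter


def build_word_index(reviews):
    """Map each word to its frequency rank (1 = most frequent)."""
    counts = Counter(word for review in reviews for word in review.split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ranked, start=1)}


def encode(review, word_index):
    """Replace each word of a review with its integer rank."""
    return [word_index[word] for word in review.split()]


# Toy corpus (made-up reviews, for illustration only).
toy_reviews = ["the film was great", "the film was dull", "the plot was thin"]
word_index = build_word_index(toy_reviews)
encoded = encode("the film was great", word_index)
print(encoded)
```

Frequent words such as "the" receive small integers, so keeping only the top N words of a vocabulary amounts to keeping the indexes up to N.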
Import libraries and get data
We first import the required libraries to preprocess the data.
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from keras.utils import to_categorical
from keras import models
from keras import layers
Next we download the IMDB dataset, which is built into Keras. Since we do not want a 50/50 train/test split, we merge the data into data and targets immediately after downloading, so that we can do an 80/20 split later.
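from keras.datasets import imdb
# Keep only the 10,000 most frequent words.
(training_data, training_targets), (testing_data, testing_targets) = imdb.load_data(num_words=10000)
# Merge the predefined train and test portions so they can be re-split later.
data = np.concatenate((training_data, testing_data), axis=0)
targets = np.concatenate((training_targets, testing_targets), axis=0)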
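The merge-then-split idea can be sketched on small stand-in arrays, so it runs without downloading anything. The toy values and the names split, train_x, test_x, train_y and test_y are assumptions made for illustration. This is one simple way to get an 80/20 split, with no shuffling, and not necessarily the one used later.

```python
import numpy as np

# Toy stand-ins for the downloaded arrays (hypothetical values).
training_data = np.arange(0, 5)
testing_data = np.arange(5, 10)
training_targets = np.array([1, 0, 1, 0, 1])
testing_targets = np.array([0, 1, 0, 1, 0])

# Merge the two halves, exactly as done with the real dataset.
data = np.concatenate((training_data, testing_data), axis=0)
targets = np.concatenate((training_targets, testing_targets), axis=0)

# Take the first 80% for training and the remaining 20% for testing.
split = len(data) * 80 // 100
train_x, test_x = data[:split], data[split:]
train_y, test_y = targets[:split], targets[split:]
print(len(train_x), len(test_x))
```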
Explore Data
Now we can start exploring the dataset:
print("Categories:", np.unique(targets))
print("Number of unique words:", len(np.unique(np.hstack(data))))
Categories: [0 1]
Number of unique words: 9998
length = [len(i) for i in data]
print("Average Review length:", np.mean(length))
print("Standard deviation:", round(np.std(length)))
Average Review length: 234.75892
Standard deviation: 173.0
The output above shows that the dataset is labeled with two categories, 0 or 1, which represent the sentiment of a review. The entire dataset contains 9,998 unique words, the average review length is 234 words, and the standard deviation is 173 words.
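The same statistics can be reproduced on a tiny ragged list to see what each call does. The values in toy_data are made up for illustration. np.hstack flattens the reviews, which have different lengths, into one array of tokens, and np.unique then removes the duplicates.

```python
import numpy as np

# Three toy "reviews" of different lengths, already integer-encoded.
toy_data = [[1, 14, 22, 16], [1, 14, 9], [1, 22, 7, 7, 5]]

# Flatten all reviews into a single 1-D array of tokens.
all_tokens = np.hstack(toy_data)
# Count the distinct word indexes across all reviews.
unique_words = np.unique(all_tokens)

# Review lengths, then their mean and standard deviation.
lengths = [len(review) for review in toy_data]
print("Number of unique words:", len(unique_words))
print("Average review length:", np.mean(lengths))
print("Standard deviation:", np.std(lengths))
```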
Now let's look at a training sample:
print("Label:", targets[0])
Label: 1
print(data[0])