It is important to understand how chatbots work. A basic mechanism of a chatbot is to use a text classifier for intent recognition. Let's look at how an artificial neural network (ANN) works internally for this task.
In this tutorial, we will use a 2-layer network (a single hidden layer) and the bag-of-words method to organize our training data. There are three ways to classify text: pattern matching, traditional algorithms, and neural networks. Although the multinomial Naive Bayes (MNB) algorithm is surprisingly effective, it has three fundamental flaws:
1. The output of the MNB algorithm is a score rather than a probability. We would prefer a probability, so that we can ignore predictions below a certain threshold. This is similar to the "squelch" (noise suppression) mechanism in a VHF radio.
2. The MNB algorithm can only learn patterns from the positive samples of a class, but learning from a class's negative samples is also very important.
3. Unbalanced training data causes the MNB classifier to distort its scores, forcing us to adjust the scores according to the size of each class's data set. This is not an ideal solution.
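To make the "squelch" idea concrete, here is a minimal sketch of filtering out low-confidence predictions. The helper name and the 0.2 threshold are illustrative, not part of the pipeline built below; the point is that this only works when the classifier outputs probabilities rather than raw scores:

# minimal sketch of the "squelch" idea: keep only predictions whose
# probability clears a noise threshold (illustrative helper; a raw MNB
# score offers no principled cutoff like this)
def squelch(probabilities, classes, threshold=0.2):
    return [(c, p) for c, p in zip(classes, probabilities) if p > threshold]

print(squelch([0.05, 0.90, 0.10], ["greeting", "goodbye", "sandwich"]))
# -> [('goodbye', 0.9)]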
True to the "naive" in its name, a text classifier does not attempt to understand the meaning of a sentence; it only classifies it. It is important to understand that the so-called intelligent chatbot does not really understand human language, but that is another matter.
If you're new to artificial neural networks, then click here to see how they work.
To understand the traditional algorithms for classification, see here.
Now, let's implement a text classification neural network for intent recognition by following these steps:
1. Select the technology stack
2. Prepare the training data
3. Preprocess the data
4. Iterate: code implementation + testing + model tuning
5. Abstract the approach
For the code, we use an IPython (Jupyter) Notebook, which is an ultra-efficient way of working on data science projects. The development language is Python.
We use NLTK for natural language processing. First, you need a way to reliably split sentences into words (tokenization) and to reduce words to their stems (stemming):
# use Natural Language Toolkit
import nltk
from nltk.stem.lancaster import LancasterStemmer
import os
import json
import datetime
stemmer = LancasterStemmer()
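As a quick sanity check, here is what these two operations do to sample input (a hedged sketch; it assumes NLTK's 'punkt' tokenizer data has already been downloaded, e.g. via nltk.download('punkt')):

# tokenization splits a sentence into words and punctuation
print(nltk.word_tokenize("Have a nice day"))   # ['Have', 'a', 'nice', 'day']
# the Lancaster stemmer aggressively reduces words to a common stem
print(stemmer.stem("having"))                  # 'hav'
print(stemmer.stem("have"))                    # 'hav'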
Our training data consists of 12 sentences that fall into 3 classes of intent: greeting, goodbye, and sandwich:
# 3 classes of training data
training_data = []
training_data.append({"class": "greeting", "sentence": "how are you?"})
training_data.append({"class": "greeting", "sentence": "how is your day?"})
training_data.append({"class": "greeting", "sentence": "good day"})
training_data.append({"class": "greeting", "sentence": "how is it going today?"})

training_data.append({"class": "goodbye", "sentence": "have a nice day"})
training_data.append({"class": "goodbye", "sentence": "see you later"})
training_data.append({"class": "goodbye", "sentence": "have a nice day"})
training_data.append({"class": "goodbye", "sentence": "talk to you soon"})

training_data.append({"class": "sandwich", "sentence": "make me a sandwich"})
training_data.append({"class": "sandwich", "sentence": "can you make a sandwich?"})
training_data.append({"class": "sandwich", "sentence": "having a sandwich today"})
training_data.append({"class": "sandwich", "sentence": "what's for lunch?"})
print("%s sentences in training data" % len(training_data))
Now we are preprocessing the data:
words = []
classes = []
documents = []
ignore_words = ['?']
# loop through each sentence in our training data
for pattern in training_data:
    # tokenize each word in the sentence
    w = nltk.word_tokenize(pattern['sentence'])
    # add to our words list
    words.extend(w)
    # add to documents in our corpus
    documents.append((w, pattern['class']))
    # add to our classes list
    if pattern['class'] not in classes:
        classes.append(pattern['class'])

# stem and lower each word and remove duplicates
words = [stemmer.stem(w.lower()) for w in words if w not in ignore_words]
words = list(set(words))

# remove duplicates
classes = list(set(classes))

print(len(documents), "documents")
print(len(classes), "classes", classes)
print(len(words), "unique stemmed words", words)
Run the above code and the output is as follows:
12 documents
3 classes ['greeting', 'goodbye', 'sandwich']
26 unique stemmed words ['sandwich', 'hav', 'a', 'how', 'for', 'ar', 'good', 'mak', 'me', 'it', 'day', 'soon', 'nic', 'lat', 'going', 'you', 'today', 'can', 'lunch', 'is', "'s", 'see', 'to', 'talk', 'yo', 'what']
Note that each word is converted to lowercase and stemmed. Stemming helps the machine understand that words like "have" and "having" are the same. We also do not care about the case of words.
We convert each sentence in the training data into a bag-of-words representation:
Here is the conversion code:
# create our training data
training = []
output = []
# create an empty array for our output
output_empty = [0] * len(classes)

# training set, bag of words for each sentence
for doc in documents:
    # initialize our bag of words
    bag = []
    # list of tokenized words for the pattern
    pattern_words = doc[0]
    # stem each word
    pattern_words = [stemmer.stem(word.lower()) for word in pattern_words]
    # create our bag of words array
    for w in words:
        bag.append(1) if w in pattern_words else bag.append(0)

    training.append(bag)
    # output is a '0' for each tag and '1' for the current tag
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1
    output.append(output_row)

# sample training/output
i = 0
w = documents[i][0]
print([stemmer.stem(word.lower()) for word in w])
print(training[i])
print(output[i])
Running this code produces the following output:
['how', 'ar', 'you', '?']
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 0, 0]
The above steps are a classic part of text processing: each training sentence is transformed into an array of 0s and 1s, whose length equals the number of unique words in the corpus.
For example, take the sentence:
['How', 'are', 'you', '?']
After stemming, it becomes:
['how', 'ar', 'you', '?']
Then it is converted into the model input: because "how" is the 4th word in our dictionary, the 4th element of the input is set to 1. Note also that we decided to discard "?" (it is in ignore_words):
[0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
This input corresponds to the first of the three intent classes ("greeting"), so its output is expressed as:
[1, 0, 0]
Note that a sentence can belong to more than one class (intent), or to none at all.
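If you want to inspect a bag-of-words vector, a small sketch like this (using the words and training lists we just built) maps the 1s back to dictionary entries:

# recover which dictionary stems a training vector encodes (illustrative)
active_stems = [w for w, v in zip(words, training[0]) if v == 1]
print(active_stems)   # the stems of "how are you?", e.g. ['how', 'ar', 'you']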
Try playing with the code above and running it until you get a feel for how it works.
The first step in machine learning is to have clean data.
Next, we implement the core functions of this 2-layer neural network:
We use NumPy because it allows for fast matrix multiplication calculations.
We use a sigmoid function as the activation function of the neurons. We then iterate repeatedly, adjusting the weights, until the error rate drops to an acceptable level.
The following code also implements the bag-of-words processing that converts an input sentence into an array of 0s and 1s. This matches exactly how we transformed the training data, which is critical for getting correct results.
import numpy as np
import time

# compute sigmoid nonlinearity
def sigmoid(x):
    output = 1 / (1 + np.exp(-x))
    return output

# convert output of sigmoid function to its derivative
def sigmoid_output_to_derivative(output):
    return output * (1 - output)

def clean_up_sentence(sentence):
    # tokenize the pattern
    sentence_words = nltk.word_tokenize(sentence)
    # stem each word
    sentence_words = [stemmer.stem(word.lower()) for word in sentence_words]
    return sentence_words

# return bag of words array: 0 or 1 for each word in the bag that exists in the sentence
def bow(sentence, words, show_details=False):
    # tokenize the pattern
    sentence_words = clean_up_sentence(sentence)
    # bag of words
    bag = [0] * len(words)
    for s in sentence_words:
        for i, w in enumerate(words):
            if w == s:
                bag[i] = 1
                if show_details:
                    print("found in bag: %s" % w)
    return np.array(bag)

def think(sentence, show_details=False):
    x = bow(sentence.lower(), words, show_details)
    if show_details:
        print("sentence:", sentence, "\n bow:", x)
    # input layer is our bag of words
    l0 = x
    # matrix multiplication of input and hidden layer
    l1 = sigmoid(np.dot(l0, synapse_0))
    # output layer
    l2 = sigmoid(np.dot(l1, synapse_1))
    return l2
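As a sanity check on the matrix multiplications inside think(), here is an illustrative sketch of the shapes involved for this tutorial's data (26 unique words and 3 classes from the preprocessing output above, 20 hidden neurons as in the training call below). It can only be run once synapse_0 and synapse_1 exist, i.e. after training or loading the model:

# illustrative shape check (run after synapse_0 and synapse_1 are defined)
x = bow("how are you?", words)
print(x.shape)                       # (26,)   one slot per unique stemmed word
print(synapse_0.shape)               # (26, 20) input -> hidden weights
print(synapse_1.shape)               # (20, 3)  hidden -> output weights
print(think("how are you?").shape)   # (3,)    one score per intent class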
Now let's implement the neural network's training function, which adjusts the synaptic weights. Don't be intimidated: the math involved is mainly the matrix multiplication you learned in school:
def train(X, y, hidden_neurons=10, alpha=1, epochs=50000, dropout=False, dropout_percent=0.5):

    print("Training with %s neurons, alpha:%s, dropout:%s %s" % (hidden_neurons, str(alpha), dropout, dropout_percent if dropout else ''))
    print("Input matrix: %sx%s    Output matrix: %sx%s" % (len(X), len(X[0]), 1, len(classes)))
    np.random.seed(1)

    last_mean_error = 1
    # randomly initialize our weights with mean 0
    synapse_0 = 2 * np.random.random((len(X[0]), hidden_neurons)) - 1
    synapse_1 = 2 * np.random.random((hidden_neurons, len(classes))) - 1

    prev_synapse_0_weight_update = np.zeros_like(synapse_0)
    prev_synapse_1_weight_update = np.zeros_like(synapse_1)

    # track how often each weight's update flips direction (diagnostic only)
    synapse_0_direction_count = np.zeros_like(synapse_0)
    synapse_1_direction_count = np.zeros_like(synapse_1)

    for j in iter(range(epochs + 1)):

        # feed forward through layers 0, 1, and 2
        layer_0 = X
        layer_1 = sigmoid(np.dot(layer_0, synapse_0))

        if dropout:
            layer_1 *= np.random.binomial([np.ones((len(X), hidden_neurons))], 1 - dropout_percent)[0] * (1.0 / (1 - dropout_percent))

        layer_2 = sigmoid(np.dot(layer_1, synapse_1))

        # how much did we miss the target value?
        layer_2_error = y - layer_2

        if (j % 10000) == 0 and j > 5000:
            # if this 10k iteration's error is greater than the last iteration's, break out
            if np.mean(np.abs(layer_2_error)) < last_mean_error:
                print("delta after " + str(j) + " iterations:" + str(np.mean(np.abs(layer_2_error))))
                last_mean_error = np.mean(np.abs(layer_2_error))
            else:
                print("break:", np.mean(np.abs(layer_2_error)), ">", last_mean_error)
                break

        # in what direction is the target value?
        # were we really sure? if so, don't change too much.
        layer_2_delta = layer_2_error * sigmoid_output_to_derivative(layer_2)

        # how much did each layer_1 value contribute to the layer_2 error (according to the weights)?
        layer_1_error = layer_2_delta.dot(synapse_1.T)

        # in what direction is the target layer_1?
        # were we really sure? if so, don't change too much.
        layer_1_delta = layer_1_error * sigmoid_output_to_derivative(layer_1)

        synapse_1_weight_update = (layer_1.T.dot(layer_2_delta))
        synapse_0_weight_update = (layer_0.T.dot(layer_1_delta))

        if j > 0:
            synapse_0_direction_count += np.abs(((synapse_0_weight_update > 0) + 0) - ((prev_synapse_0_weight_update > 0) + 0))
            synapse_1_direction_count += np.abs(((synapse_1_weight_update > 0) + 0) - ((prev_synapse_1_weight_update > 0) + 0))

        synapse_1 += alpha * synapse_1_weight_update
        synapse_0 += alpha * synapse_0_weight_update

        prev_synapse_0_weight_update = synapse_0_weight_update
        prev_synapse_1_weight_update = synapse_1_weight_update

    now = datetime.datetime.now()

    # persist synapses
    synapse = {'synapse0': synapse_0.tolist(), 'synapse1': synapse_1.tolist(),
               'datetime': now.strftime("%Y-%m-%d %H:%M"),
               'words': words,
               'classes': classes
              }
    synapse_file = "synapses.json"

    with open(synapse_file, 'w') as outfile:
        json.dump(synapse, outfile, indent=4, sort_keys=True)
    print("saved synapses to:", synapse_file)
We are now ready to build the neural network model. The function saves the synaptic weights of the network to a JSON file, which is our model file.
You can try different gradient descent parameters (alpha) to see how they affect the error rate over time. This parameter helps our model reach the lowest possible error rate:
synapse_0 += alpha * synapse_0_weight_update
We use only 20 neurons in the hidden layer, which keeps tuning easy. The synaptic connection weights of these neurons will vary according to the size and values of the training data; a reasonable error rate target is below 10^-3.
X = np.array(training)
y = np.array(output)

start_time = time.time()

train(X, y, hidden_neurons=20, alpha=0.1, epochs=100000, dropout=False, dropout_percent=0.2)

elapsed_time = time.time() - start_time
print("processing time:", elapsed_time, "seconds")
The result of the above code is:
Training with 20 neurons, alpha:0.1, dropout:False
Input matrix: 12x26    Output matrix: 1x3
delta after 10000 iterations:0.0062613597435
delta after 20000 iterations:0.00428296074919
delta after 30000 iterations:0.00343930779307
delta after 40000 iterations:0.00294648034566
delta after 50000 iterations:0.00261467859609
delta after 60000 iterations:0.00237219554105
delta after 70000 iterations:0.00218521899378
delta after 80000 iterations:0.00203547284581
delta after 90000 iterations:0.00191211022401
delta after 100000 iterations:0.00180823798397
saved synapses to: synapses.json
processing time: 6.501226902008057 seconds
Now the synapses.json file contains all the synaptic weights in the network; that is our model.
Once the synaptic weights have been calculated, the classify() function below is the core of the classification: about 15 lines of code.
Note: If the training data has changed, we need to recalculate the entire model. For a very large data set, this can take a lot of time.
Now we can predict the probability that a sentence belongs to each class. Prediction is fast because it amounts to the dot product calculations in the think() function:
# probability threshold
ERROR_THRESHOLD = 0.2
# load our calculated synapse values
synapse_file = 'synapses.json'
with open(synapse_file) as data_file:
    synapse = json.load(data_file)
    synapse_0 = np.asarray(synapse['synapse0'])
    synapse_1 = np.asarray(synapse['synapse1'])

def classify(sentence, show_details=False):
    results = think(sentence, show_details)
    # keep only classes whose probability clears the threshold
    results = [[i, r] for i, r in enumerate(results) if r > ERROR_THRESHOLD]
    results.sort(key=lambda x: x[1], reverse=True)
    return_results = [[classes[r[0]], r[1]] for r in results]
    print("%s \n classification: %s" % (sentence, return_results))
    return return_results

classify("sudo make me a sandwich")
classify("how are you today?")
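As a small usage sketch (a hypothetical helper, not from the original code), you can build the "squelch" behavior into a reply function: when no class clears ERROR_THRESHOLD, classify() returns an empty list, and the bot can fall back to a default response:

# hypothetical wrapper: respond with the top intent, or a fallback reply
def respond(sentence):
    results = classify(sentence)
    if not results:
        return "Sorry, I didn't understand that."
    return "intent: %s (confidence %.2f)" % (results[0][0], results[0][1])

print(respond("make me lunch"))
print(respond("flux capacitors are cool"))   # likely below threshold -> fallback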