An implementation demo of "Learning Deep Structured Semantic Models for Web Search using Clickthrough Data" (and its follow-up papers) and "A Multi-View Deep Learning Approach to Cross Domain User Modeling in Recommendation Systems".
1. Data
For DSSM, the input data are query-document pairs: a query together with the documents shown for it. Clicked and non-clicked documents serve as positive and negative samples respectively, and different weights can be assigned according to click order; see the paper for details.
I am not permitted to release my query data, so please prepare your own.
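As an illustration only, here is a minimal sketch of turning click logs into (query, positive doc, negative docs) training triples. The log format, field names, and the NEG count below are assumptions for this sketch, not part of the original code:

# Hypothetical example: build (query, positive doc, negative docs) samples
# from a tab-separated click log of the form "query \t doc \t clicked".
import csv
from collections import defaultdict

NEG = 4  # negatives per query, assumed

def build_samples(log_path):
    clicked, not_clicked = defaultdict(list), defaultdict(list)
    with open(log_path, encoding='utf-8') as f:
        for query, doc, label in csv.reader(f, delimiter='\t'):
            (clicked if label == '1' else not_clicked)[query].append(doc)
    samples = []
    for query, pos_docs in clicked.items():
        negs = not_clicked[query][:NEG]
        if len(negs) == NEG:
            for pos in pos_docs:
                samples.append((query, pos, negs))
    return samples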
2. Word hashing
The original paper uses letter 3-grams. For Chinese I use uni-grams (single characters) instead, since individual Chinese characters already carry meaning on their own (there are also papers that work at the stroke level). Each gram is represented by a one-hot encoding, which greatly reduces the input dimension.
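As a rough sketch of this idea (not the repository's code), one can build a character vocabulary and encode each sentence as a sparse bag-of-characters vector; the function names and vocabulary construction below are assumptions:

# Illustration: character-level (uni-gram) one-hot / bag-of-characters encoding.
import numpy as np
from scipy.sparse import csr_matrix

def build_vocab(sentences):
    # Map every distinct character (uni-gram) to an index.
    chars = sorted({c for s in sentences for c in s})
    return {c: i for i, c in enumerate(chars)}

def encode(sentences, vocab):
    # Bag of characters: one-hot per character, summed over the sentence.
    rows, cols, vals = [], [], []
    for r, s in enumerate(sentences):
        for c in s:
            if c in vocab:
                rows.append(r)
                cols.append(vocab[c])
                vals.append(1.0)
    return csr_matrix((vals, (rows, cols)), shape=(len(sentences), len(vocab)))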
3. Structure
Structure diagram:
The network maps inputs to low-dimensional vectors and computes the cosine similarity between query and document vectors.
3.1 Input
TensorBoard visualization is used here, so name scopes are defined:
import time
import numpy as np
import tensorflow as tf

with tf.name_scope('input'):
    query_batch = tf.sparse_placeholder(tf.float32, shape=[None, TRIGRAM_D], name='QueryBatch')
    doc_positive_batch = tf.sparse_placeholder(tf.float32, shape=[None, TRIGRAM_D], name='DocBatch')
    doc_negative_batch = tf.sparse_placeholder(tf.float32, shape=[None, TRIGRAM_D], name='DocBatch')
    on_train = tf.placeholder(tf.bool)
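Since the inputs are sparse placeholders, each batch has to be fed as a tf.SparseTensorValue. A minimal sketch, reusing the numpy and tensorflow imports above and assuming the batch is available as a SciPy COO matrix (the variable names are illustrative):

# Illustration only: convert a SciPy sparse batch into the form expected by
# tf.sparse_placeholder. 'query_coo' is an assumed scipy.sparse.coo_matrix.
def to_sparse_tensor_value(coo):
    indices = np.stack([coo.row, coo.col], axis=1)
    return tf.SparseTensorValue(indices, coo.data.astype(np.float32), coo.shape)

# feed_dict_example = {query_batch: to_sparse_tensor_value(query_coo)}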
3.2 Fully connected layers
I use three fully connected layers. Each layer is identical apart from the number of neurons, so a single reusable function can be written.
l_n = W_n x + b_n
def add_layer(inputs, in_size, out_size, activation_function=None):
    wlimit = np.sqrt(6.0 / (in_size + out_size))
    Weights = tf.Variable(tf.random_uniform([in_size, out_size], -wlimit, wlimit))
    biases = tf.Variable(tf.random_uniform([out_size], -wlimit, wlimit))
    wx_plus_b = tf.matmul(inputs, Weights) + biases
    if activation_function is None:
        outputs = wx_plus_b
    else:
        outputs = activation_function(wx_plus_b)
    return outputs
The weights and biases are initialized as specified in the paper:

wlimit = np.sqrt(6.0 / (in_size + out_size))
Weights = tf.Variable(tf.random_uniform([in_size, out_size], -wlimit, wlimit))
biases = tf.Variable(tf.random_uniform([out_size], -wlimit, wlimit))
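This is the Xavier/Glorot uniform initialization. As a side note I am adding (not part of the original code), newer TensorFlow 1.x versions expose the same scheme for the weights directly:

# Equivalent weight initialization via the built-in Glorot uniform initializer
# (TensorFlow >= 1.4); note the snippet above also applies the same limit to
# the biases, which the built-in initializer would compute differently.
Weights = tf.get_variable('weights', shape=[in_size, out_size],
                          initializer=tf.glorot_uniform_initializer())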
Batch Normalization
def batch_normalization(x, phase_train, out_size):
    """
    Batch normalization on fully connected layers.
    Ref.: http://stackoverflow.com/questions/33949786/how-could-i-use-batch-normalization-in-tensorflow
    Args:
        x: Tensor, 2D [batch, features] input
        phase_train: boolean tf.Variable, true indicates training phase
        out_size: integer, depth of the input
    Return:
        normed: batch-normalized output
    """
    with tf.variable_scope('bn'):
        beta = tf.Variable(tf.constant(0.0, shape=[out_size]),
                           name='beta', trainable=True)
        gamma = tf.Variable(tf.constant(1.0, shape=[out_size]),
                            name='gamma', trainable=True)
        batch_mean, batch_var = tf.nn.moments(x, [0], name='moments')
        ema = tf.train.ExponentialMovingAverage(decay=0.5)

        def mean_var_with_update():
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        mean, var = tf.cond(phase_train,
                            mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))
        normed = tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-3)
    return normed
Single Layer
with tf.name_scope('FC1'):
    # The activation function is applied after BN, so it is None here.
    query_l1 = add_layer(query_batch, TRIGRAM_D, L1_N, activation_function=None)
    doc_positive_l1 = add_layer(doc_positive_batch, TRIGRAM_D, L1_N, activation_function=None)
    doc_negative_l1 = add_layer(doc_negative_batch, TRIGRAM_D, L1_N, activation_function=None)

with tf.name_scope('BN1'):
    query_l1 = batch_normalization(query_l1, on_train, L1_N)
    doc_l1 = batch_normalization(tf.concat([doc_positive_l1, doc_negative_l1], axis=0), on_train, L1_N)
    doc_positive_l1 = tf.slice(doc_l1, [0, 0], [query_BS, -1])
    doc_negative_l1 = tf.slice(doc_l1, [query_BS, 0], [-1, -1])

    query_l1_out = tf.nn.relu(query_l1)
    doc_positive_l1_out = tf.nn.relu(doc_positive_l1)
    doc_negative_l1_out = tf.nn.relu(doc_negative_l1)
······
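The remaining layers follow the same pattern. As a sketch only (the layer width L2_N and variable names are assumptions consistent with the pattern above, not the repository's exact code), the second layer would look like:

# Sketch of the second layer, following the FC1/BN1 pattern above.
# L2_N (the layer width) is an assumed hyperparameter.
with tf.name_scope('FC2'):
    query_l2 = add_layer(query_l1_out, L1_N, L2_N, activation_function=None)
    doc_positive_l2 = add_layer(doc_positive_l1_out, L1_N, L2_N, activation_function=None)
    doc_negative_l2 = add_layer(doc_negative_l1_out, L1_N, L2_N, activation_function=None)

with tf.name_scope('BN2'):
    query_l2 = batch_normalization(query_l2, on_train, L2_N)
    doc_l2 = batch_normalization(tf.concat([doc_positive_l2, doc_negative_l2], axis=0), on_train, L2_N)
    doc_positive_l2 = tf.slice(doc_l2, [0, 0], [query_BS, -1])
    doc_negative_l2 = tf.slice(doc_l2, [query_BS, 0], [-1, -1])

    query_l2_out = tf.nn.relu(query_l2)
    doc_positive_l2_out = tf.nn.relu(doc_positive_l2)
    doc_negative_l2_out = tf.nn.relu(doc_negative_l2)

# The final layer's outputs are the embeddings query_y, doc_positive_y and
# doc_negative_y used below.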
Merge Negative samples
with tf.name_scope('Merge_Negative_Doc'):
    # Merge negative samples; tile can optionally expand the negative samples.
    doc_y = tf.tile(doc_positive_y, [1, 1])
    for i in range(NEG):
        for j in range(query_BS):
            # slice(input_, begin, size) slicing API
            doc_y = tf.concat([doc_y, tf.slice(doc_negative_y, [j * NEG + i, 0], [1, -1])], 0)
    # After the loop, doc_y stacks the query_BS positive docs first, then, for
    # each of the NEG negative slots, one negative doc per query (in query
    # order), matching tf.tile(query_y, [NEG + 1, 1]) below.
3.3 Computing the cosine similarity
with tf.name_scope('Cosine_Similarity'):
    # Cosine similarity
    # query_norm = sqrt(sum(each x^2))
    query_norm = tf.tile(tf.sqrt(tf.reduce_sum(tf.square(query_y), 1, True)), [NEG + 1, 1])
    # doc_norm = sqrt(sum(each x^2))
    doc_norm = tf.sqrt(tf.reduce_sum(tf.square(doc_y), 1, True))

    prod = tf.reduce_sum(tf.multiply(tf.tile(query_y, [NEG + 1, 1]), doc_y), 1, True)
    norm_prod = tf.multiply(query_norm, doc_norm)

    # cos_sim_raw = query * doc / (||query|| * ||doc||)
    cos_sim_raw = tf.truediv(prod, norm_prod)
    # Reshape to [query_BS, NEG + 1] (one row per query: positive first, then
    # negatives) and scale by the smoothing factor gamma = 20.
    cos_sim = tf.transpose(tf.reshape(tf.transpose(cos_sim_raw), [NEG + 1, query_BS])) * 20
3.4 Defining the loss function
with tf.name_scope('Loss'):
    # Train loss
    # Convert to a softmax probability matrix.
    prob = tf.nn.softmax(cos_sim)
    # Take only the first column, i.e. the probability of the positive sample.
    hit_prob = tf.slice(prob, [0, 0], [-1, 1])
    loss = -tf.reduce_sum(tf.log(hit_prob))
    tf.summary.scalar('loss', loss)
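This matches the loss in the DSSM paper: the scaled cosine scores define a posterior over the positive document and its negatives, and the log-likelihood of the clicked documents is maximized (with smoothing factor $\gamma$, fixed to 20 above):

$$P(D^+ \mid Q) = \frac{\exp\big(\gamma \cos(Q, D^+)\big)}{\sum_{D' \in \mathbf{D}} \exp\big(\gamma \cos(Q, D')\big)}, \qquad
L = -\log \prod_{(Q, D^+)} P(D^+ \mid Q)$$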
3.5 Selecting an optimization method
with tf.name_scope('Training'):
    # Optimizer
    train_step = tf.train.AdamOptimizer(FLAGS.learning_rate).minimize(loss)
3.6 Start training
# Create a Saver object to optionally save variables or the model.
saver = tf.train.Saver()
# with tf.Session(config=config) as sess:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/train', sess.graph)
    start = time.time()
    for step in range(FLAGS.max_steps):
        batch_id = step % FLAGS.epoch_steps
        sess.run(train_step, feed_dict=feed_dict(True, True, batch_id % FLAGS.pack_size, 0.5))
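The saver created above is not used in the shortened loop shown here. A hedged sketch of how summaries and checkpoints could be written periodically inside that loop (the interval and the 'model/dssm.ckpt' path are assumptions, and the merged summaries are fetched via tf.summary.merge_all()):

# Illustration only: periodic summary logging and checkpointing inside the
# training loop above. The 100-step interval and checkpoint path are assumed.
merged = tf.summary.merge_all()

for step in range(FLAGS.max_steps):
    batch_id = step % FLAGS.epoch_steps
    _, summary = sess.run([train_step, merged],
                          feed_dict=feed_dict(True, True, batch_id % FLAGS.pack_size, 0.5))
    if step % 100 == 0:
        train_writer.add_summary(summary, step)
        saver.save(sess, 'model/dssm.ckpt', global_step=step)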
Full code on GitHub: https://github.com/InsaneLife/dssm
The Multi-View DSSM implementation is analogous; see multi_view_dssm_v3 in the same GitHub repository.
CSDN Original: http://blog.csdn.net/shine19930820/article/details/79042567