Had a bit too much fun over Christmas and almost forgot to update. I wish you all a Merry Christmas and a Happy New Year.
To continue the study, create a new folder cifar10_train under /home/your_name/tensorflow/cifar10/, which will be used to save the training logs, then continue by entering the following code in /home/your_name/tensorflow/cifar10/cifar10.py:
def train():
    # counter for the number of training steps
    global_step = tf.Variable(0, name='global_step', trainable=False)
    # cifar10 data folder
    data_dir = '/home/your_name/tensorflow/cifar10/data/cifar-10-batches-bin/'
    # folder for the training logs; create it first if it does not exist
    train_dir = '/home/your_name/tensorflow/cifar10/cifar10_train/'

    # load images and labels
    images, labels = my_cifar10_input.inputs(data_dir, batch_size)

    # compute the loss
    loss = losses(inference(images), labels)

    # optimization algorithm: SGD (stochastic gradient descent) with a
    # constant learning rate
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    # passing global_step makes minimize() increment it at every step
    train_op = optimizer.minimize(loss, global_step=global_step)

    # save operation
    saver = tf.train.Saver(tf.all_variables())
    # summary operation
    summary_op = tf.merge_all_summaries()
    # op that initializes all variables
    init = tf.initialize_all_variables()

    os.environ['CUDA_VISIBLE_DEVICES'] = str(0)
    config = tf.ConfigProto()
    # occupy 20% of the GPU memory
    config.gpu_options.per_process_gpu_memory_fraction = 0.2
    # use an InteractiveSession, which installs itself as the default session
    sess = tf.InteractiveSession(config=config)

    # run the initialization
    sess.run(init)

    # set up a multithreaded coordinator
    coord = tf.train.Coordinator()
    # start the queue runners
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    # writer that saves summaries into train_dir; note that nothing has run yet
    summary_writer = tf.train.SummaryWriter(train_dir, sess.graph)

    # start the training loop
    try:
        for step in xrange(max_step):
            if coord.should_stop():
                break
            start_time = time.time()
            # run one training step and fetch the loss
            _, loss_value = sess.run([train_op, loss])
            duration = time.time() - start_time

            # make sure training has not diverged
            assert not np.isnan(loss_value), 'Model diverged with loss = NaN'

            if step % 30 == 0:
                # this part just sets up a fancy print format; feel free to skip it
                num_examples_per_step = batch_size
                examples_per_sec = num_examples_per_step / duration
                sec_per_batch = float(duration)
                format_str = ('%s: step %d, loss = %.2f (%.1f examples/sec; %.3f '
                              'sec/batch)')
                print(format_str % (datetime.now(), step, loss_value,
                                    examples_per_sec, sec_per_batch))

            if step % 100 == 0:
                # run the summary operation and write the result
                summary_str = sess.run(summary_op)
                summary_writer.add_summary(summary_str, step)

            if step % 1000 == 0 or (step + 1) == max_step:
                # save the current model and weights into train_dir;
                # global_step records the current iteration count
                checkpoint_path = os.path.join(train_dir, 'model.ckpt')
                saver.save(sess, checkpoint_path, global_step=step)

    except Exception as e:
        coord.request_stop(e)
    finally:
        coord.request_stop()
        coord.join(threads)

    sess.close()


def evaluate():
    data_dir = '/home/your_name/tensorflow/cifar10/data/cifar-10-batches-bin/'
    train_dir = '/home/your_name/tensorflow/cifar10/cifar10_train/'

    images, labels = my_cifar10_input.inputs(data_dir, batch_size, train=False)

    logits = inference(images)
    saver = tf.train.Saver(tf.all_variables())

    os.environ['CUDA_VISIBLE_DEVICES'] = str(0)
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.2
    sess = tf.InteractiveSession(config=config)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    # load the model parameters
    print("Reading checkpoints...")
    ckpt = tf.train.get_checkpoint_state(train_dir)
    if ckpt and ckpt.model_checkpoint_path:
        ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
        global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
        saver.restore(sess, os.path.join(train_dir, ckpt_name))
        print('Loading success, global_step is %s' % global_step)

    try:
        # compare the classification results; why this function is used
        # is explained below
        top_k_op = tf.nn.in_top_k(logits, labels, 1)
        true_count = 0
        step = 0
        # 157 batches cover the 10,000 test images
        # (assuming batch_size = 64 from the earlier posts: 157 * 64 = 10048)
        while step < 157:
            if coord.should_stop():
                break
            predictions = sess.run(top_k_op)
            true_count += np.sum(predictions)
            step += 1
        precision = true_count / 10000.0
        print('%s: precision @ 1 = %.3f' % (datetime.now(), precision))
    except tf.errors.OutOfRangeError:
        coord.request_stop()
    finally:
        coord.request_stop()
        coord.join(threads)

    sess.close()


if __name__ == '__main__':
    if TRAIN:
        train()
    else:
        evaluate()
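For completeness: this continues cifar10.py from the earlier posts in this series, so it relies on the header defined there. A minimal sketch of what it assumes is below; the hyperparameter values are my guesses, not taken from this post (apart from max_step, which the 50,000-iteration discussion below implies).

from datetime import datetime
import os
import time

import numpy as np
import tensorflow as tf

import my_cifar10_input    # the input pipeline written in an earlier post
# inference() and losses() were also defined in the earlier posts.

batch_size = 64            # assumed; 157 eval batches * 64 roughly covers the 10,000 test images
learning_rate = 0.1        # assumed; the text only says the rate is constant
max_step = 50000           # 50,000 iterations, per the discussion below
TRAIN = True               # set to False to run evaluate() instead of train()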
Now let's explain what in_top_k does. From the official documentation: tf.nn.in_top_k(predictions, targets, k, name=None)
This function returns a boolean array of size batch_size. predictions is a batch_size * classes matrix, and targets is a vector of batch_size class indices. If targets[i] is among the k largest values of predictions[i][:], then array[i] = True; otherwise array[i] = False. As you can see, in the evaluate procedure above, this function is applied not to the softmax output but to the final output of inference (a fully connected layer).
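A minimal sketch of the behavior (the values here are made up for illustration and are not from the original post):

# Two samples, three classes; the rows need not be softmax probabilities.
logits = tf.constant([[0.1, 0.8, 0.1],
                      [2.0, 1.0, 0.5]])
targets = tf.constant([1, 2])                # true class indices
in_top1 = tf.nn.in_top_k(logits, targets, 1)
sess = tf.InteractiveSession()
print(sess.run(in_top1))                     # [ True False]
sess.close()

Skipping softmax is fine here: softmax is monotonic, so it does not change the ranking within each row, and in_top_k gives the same answer either way.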
After writing the code, click run, and you can see the training loss drop from about 2.31 at the start to around 0.00 at the end. During training, 12 files appear in the /home/your_name/tensorflow/cifar10/cifar10_train/ folder: 5 model.ckpt-0000 files, which are the models saved during training (the trailing number is the iteration count); 5 model.ckpt-0000.meta files, which are the metadata saved during training (their purpose is unclear for now). By default, TensorFlow keeps only the few most recent models and metadata files and deletes the older, no-longer-needed ones. There is also a checkpoint text file and an out.tfevents file, which is the summary log. If you do not want to use TensorBoard to view the network structure and the weight distributions, loss, and so on during training, you can leave the summary statements out of the program.
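The "keep only the most recent models" behavior is controlled by the Saver's max_to_keep argument, which defaults to 5 and matches the five model.ckpt files seen above. For example:

# max_to_keep (default 5) controls how many recent checkpoints are retained;
# older checkpoints and their .meta files are deleted automatically.
saver = tf.train.Saver(tf.all_variables(), max_to_keep=5)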
After training is done, we visualize it with TensorBoard (in fact, you can visualize at any point during training). Open a command-line terminal anywhere and enter:
tensorboard --logdir=/home/your_name/tensorflow/cifar10/cifar10_train/
The following prompt will appear:
Following the prompt, open a browser and go to http://127.0.1.1:6006 (some browsers may not be supported, so try a few different ones) and you will see the visualization interface, which has six tabs:
The EVENTS tab contains two plots: one is the loss curve during training, and the other is a plot of the queue. Because there are no image_summary() and audio_summary() statements, the IMAGES and AUDIO tabs are empty. The GRAPHS tab contains the flowchart of the whole model, where you can, for example, expand and move the selected namespaces. DISTRIBUTIONS and HISTOGRAMS contain the distributions and histograms of the variables summarized during training.
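For reference, the content of those tabs comes from summary statements like the following (old TF 0.x API, matching the code above); here loss, weights, and images stand for the corresponding tensors in the model, and the tags are illustrative:

tf.scalar_summary('loss', loss)                  # plotted under EVENTS
tf.histogram_summary('conv1_weights', weights)   # DISTRIBUTIONS / HISTOGRAMS
tf.image_summary('input_images', images)         # would populate the IMAGES tab
summary_op = tf.merge_all_summaries()            # merged, then written by the SummaryWriter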
After training, set TRAIN = False to run the test; you get the following results:
We can see that the test accuracy is only 76%. One possible reason the result is not higher is that the test does not go through the softmax layer but uses the output of the fully connected layer directly (doubtful?). The official code also gives the official results, as follows:
As you can see, after 100,000 iterations the official code reaches an accuracy of 83%, while we ran only 50,000 iterations and reached 76%, which is reasonable by comparison. Possible reasons the official results are better:
1. The official code uses a non-fixed (decaying) learning rate (see the sketch after this list);
2. The official code runs twice as many iterations as this code (100,000 vs. 50,000).
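For example, a decaying learning rate can be set up with tf.train.exponential_decay. This is only a sketch; the decay parameters below are illustrative assumptions, not the official tutorial's values, and loss refers to the loss tensor from the code above.

# Sketch of a non-fixed learning rate (parameter values are assumptions):
global_step = tf.Variable(0, name='global_step', trainable=False)
learning_rate = tf.train.exponential_decay(
    0.1,            # initial learning rate (assumed)
    global_step,    # incremented by minimize() at every step
    1000,           # decay period in steps (assumed)
    0.9,            # decay factor (assumed)
    staircase=True)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.minimize(loss, global_step=global_step)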
References:
1. https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10