A very good article about TensorFlow, reprinted from "TensorFlow deep learning, an article is enough".
Google is not only a leader in big data and cloud computing but also has strong practice and experience in machine learning and deep learning. At the end of 2015 it open-sourced TensorFlow, the deep learning framework it uses internally. Compared with Caffe, Theano, Torch, MXNet, and other frameworks, TensorFlow has the most forks and stars on GitHub, and it is widely applied in image classification, audio processing, recommender systems, and natural language processing. The recently popular Keras framework uses TensorFlow as a backend, the well-known Stanford CS231n course uses TensorFlow for teaching and assignments, and many TensorFlow books are in preparation or already on sale at home and abroad. DeepMind, the team behind AlphaGo, also plans to migrate its neural network applications to TensorFlow, which confirms TensorFlow's popularity in the industry. Beyond releasing the source code on GitHub, Google described the design and implementation of the system in the paper "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems", which reports tests on a training cluster of 200 nodes, a scale that other distributed deep learning frameworks cannot match. In "Wide & Deep Learning for Recommender Systems" and the YouTube video recommendation system paper, Google also presented the algorithmic models used by the Google Play app store and YouTube, and provided TensorFlow-based code examples. With TensorFlow, anyone can get close to state-of-the-art results in ImageNet or Kaggle competitions.
TensorFlow: from getting started to applications
It is no exaggeration to say that TensorFlow's popularity keeps lowering the barrier to deep learning: with basic knowledge of Python and machine learning, getting started with neural network models becomes very simple. TensorFlow supports both Python and C++, and even a complex multilayer neural network model can be implemented in Python. If the business uses another programming language, there is no need to worry: a cross-language gRPC or HTTP service can also access models trained with TensorFlow.
So how do you write TensorFlow applications in Python? How hard is it to go from getting started to building real applications?
Here we write a Hello World application that outputs a string and performs a simple operation.
This simple code shows how convenient TensorFlow is to use: it is imported like a standard Python library, with no additional services to start. On first contact with TensorFlow it may be confusing that logic Python could implement directly is written with tf.constant() and tf.Session(). In fact, TensorFlow defines models and runs training through a graph and a session, which brings great benefits for complex models and distributed training, as later sections of this article explain.
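The split between defining a graph and running it in a session can be illustrated with a toy sketch in plain Python. The classes Constant, Add, and Session below are hypothetical stand-ins written for this illustration; they are not TensorFlow's API and only mimic the idea that nothing is computed until the session runs the graph.

```python
# Toy illustration only: these classes are hypothetical and are NOT TensorFlow's API.

class Constant:
    """A graph node that holds a fixed value."""
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class Add:
    """A graph node that adds the results of two other nodes."""
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        return self.left.evaluate() + self.right.evaluate()


class Session:
    """Runs a previously defined graph; nothing is computed before run()."""
    def run(self, node):
        return node.evaluate()


# Step 1: define the graph. No arithmetic happens here.
a = Constant(10)
b = Constant(32)
total = Add(a, b)

# Step 2: execute the graph inside a session.
session = Session()
result = session.run(total)
print(result)
```

Because the graph is only a description until it is run, a framework is free to inspect it, optimize it, or place parts of it on different devices or machines before executing anything.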
The Hello World application above did not train a model, so next we introduce a linear regression problem and model. We use NumPy to construct a set of linearly related data, and with the stochastic gradient descent algorithm implemented by TensorFlow, the slope and intercept of the function are solved automatically once training has run long enough.
The complete code can be found in the tensorflow_examples project. After training, the output slope w is about 2 and the intercept b is about 10, which matches the relationship in the data we constructed. Note that the TensorFlow code does not implement a least squares algorithm and has no if-else to control the logic; it is completely data-driven and learns by gradient descent, adjusting the parameters dynamically according to the loss value. This means that when we switch to other datasets, or even to other domains such as image classification, the machine can learn automatically without changes to the code, which is where the power of neural networks and TensorFlow lies.
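The training loop described above can be sketched in plain Python, without TensorFlow or NumPy, to show what gradient descent is doing. This is an illustration under our own assumptions rather than the tensorflow_examples code: the learning rate, sample count, epoch count, and noise-free data are all chosen for the sketch.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic, noise-free data following y = 2x + 10.
xs = [random.uniform(0.0, 1.0) for _ in range(100)]
ys = [2.0 * x + 10.0 for x in xs]

w, b = 0.0, 0.0       # parameters to learn (slope and intercept)
learning_rate = 0.1   # assumed value, chosen for this sketch

for epoch in range(200):
    for x, y in zip(xs, ys):
        error = (w * x + b) - y  # prediction error for one sample
        # Gradients of the loss 0.5 * error**2 with respect to w and b.
        w -= learning_rate * error * x
        b -= learning_rate * error

print(w, b)
```

There is no closed-form solver and no branching on the data here: the parameters move toward the slope and intercept only because each update follows the gradient of the loss.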
The previous model has only two variables, w and b, and it is hard to get good results if the data follows a nonlinear relationship, so we recommend deep neural networks, the kind of deep learning model that TensorFlow's design focuses on. Google won the 2014 ImageNet competition with the Inception model, and the code for that model is based on TensorFlow, with a more complex model definition.
With the fully connected networks, convolutional neural networks, RNNs, and LSTMs that TensorFlow already packages, we can assemble a wide variety of network models, so that building a multilayer neural network such as Inception is as simple as piecing together Lego bricks. But there are more details to cover, such as choosing an optimization algorithm, generating TFRecords, exporting model files, and supporting distributed training, so next we cover the core TensorFlow usage techniques in a single article.
TensorFlow core usage tips
To introduce the various uses of TensorFlow, we will use deep_recommend_system, an open source project that implements TFRecords, QueueRunner, checkpoints, TensorBoard, inference, GPU support, distributed training, and a multilayer neural network model. It can easily be extended to implement wide and deep models, and it can be downloaded and used directly in real project development.
1. Prepare training data
A typical TensorFlow application contains the definition of a graph and the running of a session. When the amount of code is small, it can be kept in a single file, such as cancer_classifier.py. Before training, you need to prepare the training data and test data. Data files are usually space- or comma-delimited CSV files, but TensorFlow recommends the binary TFRecords format, which supports multi-threaded reading through QueueRunner and Coordinator. The batch size and epoch parameters control the size of a single batch during training and how many passes are made over the sample file. If you read the CSV file directly, your code has to keep a pointer to the next record to read, which is inconvenient when the samples cannot all be loaded into memory.
In the data directory, the project provides convert_cancer_to_tfrecords.py, a tool that converts CSV to TFRecords. Using this script as a reference, you can parse a CSV file in any format and convert it into the TFRecords format that TensorFlow supports. Whether the data is big or small, a simple script is enough to connect it to TensorFlow. The project also provides the print_cancer_tfrecords.py script, which calls the API to read the contents of a TFRecords file directly.
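To make the roles of batch size and epoch concrete, here is a small sketch in plain Python that reads CSV rows in batches over several passes. It is not the TFRecords and QueueRunner pipeline used by the project; the function name read_batches and the column layout (float features followed by an integer label) are assumptions made for this example.

```python
import csv
import io


def read_batches(csv_text, batch_size, epochs):
    """Yield lists of parsed rows, batch_size rows at a time, for `epochs` passes."""
    for _ in range(epochs):
        reader = csv.reader(io.StringIO(csv_text))
        batch = []
        for row in reader:
            # Assumed layout: the last column is the label, the rest are features.
            features = [float(value) for value in row[:-1]]
            label = int(row[-1])
            batch.append((features, label))
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # emit the final, possibly smaller, batch of this epoch
            yield batch


sample_csv = "1.0,2.0,0\n3.0,4.0,1\n5.0,6.0,0\n"
batches = list(read_batches(sample_csv, batch_size=2, epochs=2))
print(len(batches))
```

The generator keeps its own position in the data, which is the bookkeeping that reading CSV by hand requires and that a queue-based reader takes over.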
2. Accept command-line arguments
With TFRecords in place, we can write the code that trains the neural network model. But as we all know, deep learning has many hyperparameters to tune: we need to adjust the optimization algorithm, the number of model layers, and the choice of model, so command-line parameters are very convenient here.
Under the hood, TensorFlow uses the python-gflags project and wraps it in the tf.app.flags interface, which is simple and intuitive to use. Real projects typically define their command-line parameters in advance; in particular, the Cloud Machine Learning service mentioned later passes parameters this way to simplify hyperparameter tuning.
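The same idea can be sketched with argparse from the Python standard library, used here as a stand-in for tf.app.flags so that the example runs without TensorFlow. The flag names and default values are assumptions chosen for illustration, not necessarily the ones the project defines.

```python
import argparse


def parse_flags(argv=None):
    """Parse training hyperparameters from a list of command-line arguments."""
    parser = argparse.ArgumentParser(description="Training hyperparameters")
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--batch_size", type=int, default=1024)
    parser.add_argument("--epoch_number", type=int, default=10)
    parser.add_argument("--optimizer", choices=["sgd", "adagrad"], default="sgd")
    return parser.parse_args(argv)


# Equivalent to running: python train.py --learning_rate 0.1 --optimizer adagrad
flags = parse_flags(["--learning_rate", "0.1", "--optimizer", "adagrad"])
print(flags.learning_rate, flags.batch_size, flags.optimizer)
```

Flags that are not passed keep their defaults, so a tuning run only needs to name the hyperparameters it changes.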
3. Define a neural network model
Once the data and parameters are ready, the most important step is to define the network model. Defining the model parameters can be very simple, just creating several Variables, or more involved, for example using the tf.variable_scope() and tf.get_variable() interfaces. To ensure that every variable has a unique name and that the number of hidden nodes and network layers is easy to change, we recommend following the code in the project. In particular, bind variables to the CPU when defining them: TensorFlow uses the GPU by default, which may make parameter updates too slow.
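The naming and layer-count concerns above can be illustrated in plain Python. This sketch is our own and does not reproduce the project's model code or TensorFlow's variable scopes: the helper names build_parameters and forward, the "layerN/weights" naming scheme, and the layer sizes are all assumptions.

```python
import random


def build_parameters(input_units, hidden_units, output_units, seed=0):
    """Create one uniquely named weight matrix and bias vector per layer."""
    rng = random.Random(seed)
    sizes = [input_units] + list(hidden_units) + [output_units]
    params = {}
    for i in range(len(sizes) - 1):
        rows, cols = sizes[i], sizes[i + 1]
        params["layer%d/weights" % i] = [
            [rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)
        ]
        params["layer%d/biases" % i] = [0.0] * cols
    return params


def forward(params, inputs):
    """Run inputs through every layer, applying ReLU on all but the last."""
    layer_count = len(params) // 2
    activations = inputs
    for i in range(layer_count):
        weights = params["layer%d/weights" % i]
        biases = params["layer%d/biases" % i]
        outputs = [
            sum(a * weights[r][c] for r, a in enumerate(activations)) + biases[c]
            for c in range(len(biases))
        ]
        if i < layer_count - 1:
            outputs = [max(0.0, o) for o in outputs]
        activations = outputs
    return activations


# Changing the hidden_units list is enough to change the depth and width.
params = build_parameters(input_units=9, hidden_units=[8, 4], output_units=2)
logits = forward(params, [1.0] * 9)
print(sorted(params))
print(len(logits))
```

Deriving every parameter name from the layer index keeps names unique, and the network shape is controlled by a single list rather than by hand-written layers.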
The project's model code is also typical of production environments, whether for training, implementing inference, or validating model accuracy and AUC. The project also implements the wide and deep model on top of this code. That model is widely used in the Google Play app store and suits common recommender systems, combining a traditional logistic regression model with a deep learning neural network model.
4. Use a different optimization algorithm
With the network model defined, we need to decide which optimizer to use to optimize the model parameters: SGD, RMSProp, Adagrad, or FTRL? There is no fixed answer across scenarios and datasets, and the best approach is to experiment. With the command-line parameters defined earlier, we can easily train the model with different optimization algorithms.
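Selecting an optimizer by name can be sketched as follows. The update rules are minimal plain-Python versions of SGD and Adagrad written for this illustration, not TensorFlow's optimizer classes, and the objective f(x) = (x - 3)^2, the step count, and the learning rate are assumptions.

```python
import math


def sgd_step(param, grad, state, learning_rate):
    """Plain gradient descent update."""
    return param - learning_rate * grad


def adagrad_step(param, grad, state, learning_rate):
    """Adagrad: scale the step by the accumulated squared gradients."""
    state["accum"] = state.get("accum", 0.0) + grad * grad
    return param - learning_rate * grad / (math.sqrt(state["accum"]) + 1e-8)


# The optimizer is looked up by the name given on the command line.
OPTIMIZERS = {"sgd": sgd_step, "adagrad": adagrad_step}


def minimize(optimizer_name, steps=500, learning_rate=0.1):
    """Minimize f(x) = (x - 3)^2 with the optimizer chosen by name."""
    step = OPTIMIZERS[optimizer_name]
    x, state = 0.0, {}
    for _ in range(steps):
        grad = 2.0 * (x - 3.0)
        x = step(x, grad, state, learning_rate)
    return x


for name in sorted(OPTIMIZERS):
    print(name, minimize(name))
```

With the same learning rate and the same number of steps, the two rules end at different distances from the minimum at x = 3, which is a small-scale version of why the choice of optimizer is worth treating as a hyperparameter.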
In production practice, different optimization algorithms differ greatly in training results and training speed, and over-tuning the network parameters may be less effective than switching to another optimization algorithm, so choosing the right optimization algorithm is an important step in hyperparameter tuning. Adding this selection logic to TensorFlow code is straightforward.
5. Online learning and continuous learning
Many machine learning vendors claim that their products support online learning. In fact, this is just a basic capability of TensorFlow: support for continuously optimizing a model as data arrives. TensorFlow can save and restore model parameters through tf.train.Saver(). After loading the model file in Python, the application can keep accepting data from online requests, update the model parameters, and save them to a checkpoint through the Saver, for the next round of optimization or for online serving.
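The save, restore, and continue cycle can be sketched with a JSON file standing in for a checkpoint. This is a plain-Python illustration, not tf.train.Saver(): the helper names, the one-parameter model y = w * x, and the file name model.json are assumptions.

```python
import json
import os
import tempfile


def save_checkpoint(path, step, params):
    """Write the training step and parameters to disk."""
    with open(path, "w") as f:
        json.dump({"step": step, "params": params}, f)


def restore_checkpoint(path):
    """Return (step, params) from disk, or a fresh start if no file exists."""
    if not os.path.exists(path):
        return 0, {"w": 0.0}
    with open(path) as f:
        data = json.load(f)
    return data["step"], data["params"]


def train(path, new_samples, learning_rate=0.1):
    """Resume from the last checkpoint, learn from new samples, save again."""
    step, params = restore_checkpoint(path)
    for x, y in new_samples:
        error = params["w"] * x - y
        params["w"] -= learning_rate * error * x
        step += 1
    save_checkpoint(path, step, params)
    return step, params


checkpoint_path = os.path.join(tempfile.mkdtemp(), "model.json")
first = train(checkpoint_path, [(1.0, 2.0), (2.0, 4.0)])
second = train(checkpoint_path, [(3.0, 6.0)])  # picks up where `first` stopped
print(first, second)
```

The second call neither starts from step 0 nor resets the weight, so new data refines the existing model instead of replacing it.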
Continuous training means that even if training is interrupted, the model can continue to be optimized from the last training result, which TensorFlow also achieves through the Saver and checkpoint files. The deep_recommend_system project continues optimizing the model from the last training run by default, or you can specify train_from_scratch on the command line. Not only is there no need to worry about training being interrupted, but inference can also be served online while training continues.
6. Using TensorBoard to optimize parameters
TensorFlow also integrates a powerful visualization tool, TensorBoard. Generally we only need to add the training metrics we care about to the code, and TensorBoard will plot them automatically, so we can follow the state of model training visually.
7. Distributed TensorFlow applications
Finally, we must introduce TensorFlow's powerful distributed computing capability. Traditional frameworks such as Caffe do not natively support distributed training, so with large data volumes they often cannot scale out by adding machines. TensorFlow carries petabytes of data for Google's various businesses, and it was designed from the start with distributed computing in mind, implementing distributed computation of neural network models with high-performance libraries such as gRPC and Protobuf.
Implementing a distributed TensorFlow application is not difficult: the code that builds the graph is the same as in the single-machine version. We implemented a distributed cancer_classifier.py example, which can be started from the command line as a training cluster with multiple PS and multiple workers.
Before diving into the code, we need to understand several concepts in distributed TensorFlow: PS, worker, in-graph, between-graph, synchronous training, and asynchronous training. A PS is a parameter server of the training cluster and stores the model's variables. A worker is a node that computes the model's gradients, and the resulting gradient vectors are delivered to the PS to update the model. In-graph and between-graph are counterparts, and both can implement synchronous and asynchronous training. In-graph means that a single client builds the graph for the whole cluster and submits it to the cluster, while the other workers are only responsible for gradient computation tasks. Between-graph means that multiple workers in the cluster each create a graph; because the workers run the same code, the graphs are identical, and because the parameters are saved to the same PS, the same model is trained. This lets multiple workers build the graph and read training data at the same time, which suits big data scenarios. The difference between synchronous and asynchronous training is that in synchronous training every gradient update blocks until the results of all workers have arrived, while asynchronous training never blocks and is more efficient, so asynchronous training is generally used in big data and distributed scenarios.
8. Cloud Machine Learning
The sections above have introduced all the TensorFlow-related content. Careful readers may have noticed that, powerful as TensorFlow is, it is still in essence a library. Besides writing TensorFlow application code, users also need to start services on physical machines and manually specify the directories of training data and model files, so maintenance costs are high and machines cannot be shared.
Across the big data processing and resource scheduling industry, the Hadoop ecosystem has become the industry standard: data is processed through the MapReduce or Spark interfaces, and users submit tasks through an API while YARN handles unified resource allocation and scheduling. This not only makes distributed computing possible but also greatly improves server utilization through resource sharing and a unified scheduling platform. Unfortunately, TensorFlow is defined as a deep learning framework and does not include features such as cluster resource management. Soon after open-sourcing TensorFlow, however, Google released the Google Cloud ML service. We have been early users of Cloud ML since its alpha version and have come to appreciate how convenient it is to train deep learning models in the cloud. With the Google Cloud ML service, we can submit TensorFlow application code directly to the cloud, and even deploy a trained model to the cloud and access it directly through an API. Thanks to TensorFlow's good design, we implemented our own Cloud Machine Learning service based on Kubernetes and TensorFlow Serving, with an architecture and usage interfaces similar to Google Cloud ML.
TensorFlow is an excellent deep learning framework, and a technical direction worth investing in for individual developers, researchers, and enterprises alike. Cloud Machine Learning can solve users' problems with environment initialization, training task management, and the management and scheduling of online services for neural network models. Now that Google Cloud ML supports automatic hyperparameter tuning, parameter tuning will become a computing problem rather than a technical one. Even if some developers use MXNet or another framework rather than TensorFlow, we are willing to communicate with more deep learning users and platform developers to promote the development of the community.