TensorFlow Source Code Analysis: common_runtime/direct_session

Directory
    1. Core Concepts
    2. DirectSession
      1. direct_session.h
      2. direct_session.cc
1. Core Concepts

Readers of the previous article will remember that a session is an execution agent: we hand it a computation graph and inputs, and it dispatches executors to run the computation and produce results. TF provides DirectSession as the simplest such agent. Given what we have seen so far, one might expect the implementation of DirectSession to be simple and straightforward; after all, we already saw the complex machinery of the executor in that article. In practice, however, we often want to compute only part of the graph, feeding some nodes as inputs and fetching others as outputs, rather than executing the whole graph. Moreover, several such partial executions may be in flight on the same graph at the same time. To cope with this, DirectSession carries a lot of auxiliary data.

2. DirectSession

2.1 direct_session.h

The DirectSession class provides a rich set of data members and interfaces; for brevity we omit the formal parameters of some functions:

```cpp
class DirectSession : public Session {
 public:
  DirectSession(const SessionOptions& options, const DeviceMgr* device_mgr,
                DirectSessionFactory* factory);

  Status Create(const GraphDef& graph) override;
  Status Extend(const GraphDef& graph) override;
  Status Run(...) override;   // run the graph
  Status PRunSetup(...);      // prepare a partial run of the graph
  Status PRun(...);           // execute part of the graph
  // Clear the given containers in device_mgr; if `containers` itself is
  // empty, clear the default container.
  Status Reset(const std::vector<string>& containers);
  Status ListDevices(...) override;
  Status Close() override;
  Status LocalDeviceManager(const DeviceMgr** output) override;
  void ExportCostModels(...);

 private:
  // After a graph has been given, initialize the base execution state if
  // it has not been initialized yet.
  Status MaybeInitializeExecutionState(...);
  // For a given set of inputs and outputs, look up a matching set of
  // executors; create one if none exists.
  Status GetOrCreateExecutors(...);
  // Given graph_def_, the devices, and the inputs and outputs, create
  // multiple graphs; the newly created graphs share a common function
  // library flib_def.
  Status CreateGraphs(...);
  Status ExtendLocked(const GraphDef& graph);  // internal implementation of Extend
  Status ResourceHandleToInputTensor(...);
  // Feed more inputs to the executors, triggering further execution.
  Status SendPRunInputs(...);
  // Fetch more outputs from the executors; blocks until the output
  // tensors have been computed.
  Status RecvPRunOutputs(...);
  // Check whether the requested outputs can be computed from the given inputs.
  Status CheckFetch(...);
  Status WaitForNotification(...);
  Status CheckNotClosed();

  const SessionOptions options_;

  // Device-related structures.
  const std::unique_ptr<const DeviceMgr> device_mgr_;
  std::vector<Device*> devices_;
  DeviceSet device_set_;

  string session_handle_;
  bool graph_created_ GUARDED_BY(graph_def_lock_) = false;
  mutex graph_def_lock_;
  GraphDef graph_def_ GUARDED_BY(graph_def_lock_);

  // Thread pools used to run ops; the bool flags whether the pool is owned.
  std::vector<std::pair<thread::ThreadPool*, bool>> thread_pools_;
  Status init_error_;
  // If true, blocks until the device has completed all queued operations
  // within a step.
  bool sync_on_finish_ = true;
  // Schedule c on the given thread pool.
  void SchedClosure(thread::ThreadPool* pool, std::function<void()> c);

  mutex executor_lock_;  // protects executors_
  // Maps a signature to its executors; the signature consists of the
  // inputs and outputs of a partial execution, which uniquely identify
  // that partial graph.
  std::unordered_map<string, std::shared_ptr<ExecutorsAndKeys>> executors_
      GUARDED_BY(executor_lock_);
  // Maps a signature to the state of a partial run; each partial run has
  // its own structure that preserves its state.
  std::unordered_map<string, std::shared_ptr<RunState>> partial_runs_
      GUARDED_BY(executor_lock_);
  // Holds all tensors currently alive in the session.
  SessionState session_state_;

  DirectSessionFactory* const factory_;
  CancellationManager* cancellation_manager_;

  // For stateful nodes (such as params and queues), maps the node name to
  // the device the node is placed on; once placed on a device, such nodes
  // are not allowed to move again.
  std::unordered_map<string, string> stateful_placements_
      GUARDED_BY(graph_def_lock_);
  // Execution state used when placing the whole graph.
  std::unique_ptr<SimpleGraphExecutionState> execution_state_
      GUARDED_BY(graph_def_lock_);
  // The function library, prior to any rewrite or optimization; in
  // particular, CreateGraphs modifies the function library.
  std::unique_ptr<FunctionLibraryDefinition> flib_def_;

  mutex closed_lock_;
  // True if the session has been closed.
  bool closed_ GUARDED_BY(closed_lock_) = false;

  // Used to generate unique names for this session.
  std::atomic<int64> edge_name_counter_ = {0};
  std::atomic<int64> handle_name_counter_ = {0};

  // Generates unique step ids across all sessions.
  static std::atomic_int_fast64_t step_id_counter_;

  // Global timeout threshold for blocking operations.
  const int64 operation_timeout_in_ms_ = 0;

  // Cost models for all graphs executed in the current session.
  CostModelManager cost_model_manager_;
};
```

As the header shows, much of the content inside DirectSession exists to support partial execution. Since a computation graph is only a computation plan, we can perform different computations on the same graph by choosing different inputs and outputs. Different computations need different executors, and different storage structures to hold the current state of each computation. For this purpose, TF defines several dedicated structures. First, the executor wrapper for each computation:

```cpp
// Per-partition executors and function runtime library.
struct PerPartitionExecutorsAndLib {
  Graph* graph = nullptr;
  std::unique_ptr<FunctionLibraryRuntime> flib;
  std::unique_ptr<Executor> executor;
};

// Data structures provided for each computation.
struct ExecutorsAndKeys {
  std::atomic_int_fast64_t step_count;
  std::unique_ptr<Graph> graph;
  NameNodeMap name_to_node;
  std::unique_ptr<FunctionLibraryDefinition> flib_def;
  std::vector<PerPartitionExecutorsAndLib> items;
  std::unordered_map<string, size_t> input_name_to_index;
  std::unordered_map<string, string> input_name_to_rendezvous_key;
  std::unordered_map<string, size_t> output_name_to_index;
  std::unordered_map<string, string> output_name_to_rendezvous_key;
  DataTypeVector input_types;
  DataTypeVector output_types;
};
```

For a given computation graph, every computation we perform, whether over the full graph or only part of it, may span multiple devices. The graph must therefore be placed: its nodes are divided among the devices, each device receives one partition of the graph, and each partition gets its own runtime function library and executor. For each computation we thus need a vector holding the per-partition information.
In addition, as mentioned above, we also need a structure that saves the current state of each computation:

```cpp
// For each partial execution, the session keeps a RunState.
struct RunState {
  mutex mu_;
  Status status GUARDED_BY(mu_);
  IntraProcessRendezvous* rendez = nullptr;
  std::unique_ptr<StepStatsCollector> collector;
  Notification executors_done;
  // True once the corresponding input has been fed.
  std::unordered_map<string, bool> pending_inputs;
  // True once the corresponding output has been fetched.
  std::unordered_map<string, bool> pending_outputs;
  TensorStore tensor_store;
  ScopedStepContainer step_container;
  // ...
};

struct RunStateArgs {
  RunStateArgs(const DebugOptions& options) : debug_options(options) {}
  bool is_partial_run = false;
  string handle;
  std::unique_ptr<Graph> graph;
  const DebugOptions& debug_options;
};

RunState provides state-saving for each partial execution, while RunStateArgs carries parameters and configuration for debugging.

2.2 direct_session.cc

The source file defines DirectSessionFactory, which creates and manages DirectSession objects. A brief excerpt:

```cpp
class DirectSessionFactory : public SessionFactory {
 public:
  Session* NewSession(const SessionOptions& options) override;
  Status Reset(...) override;
  void Deregister(const DirectSession* session);

 private:
  mutex sessions_lock_;
  // Stores the DirectSession objects created by this factory.
  std::vector<DirectSession*> sessions_ GUARDED_BY(sessions_lock_);
};
```

In addition, a class is provided to register the factory:

```cpp
class DirectSessionRegistrar {
 public:
  DirectSessionRegistrar() {
    SessionFactory::Register("DIRECT_SESSION", new DirectSessionFactory());
  }
};
static DirectSessionRegistrar registrar;
```

Below we walk through the important functions of DirectSession in order. Since some functions contain many details, we give only the core code plus explanatory comments:

```cpp
DirectSession::DirectSession(const SessionOptions& options,
                             const DeviceMgr* device_mgr,
                             DirectSessionFactory* const factory) {
  // Prepare the thread pools according to options.
  // From device_mgr, prepare devices_, device_set_, and an OpSegment()
  // for each device.
}

Status DirectSession::Run(...) {
  // Extract the names of the inputs for this run of the current session.
  // Check whether a ready-made executor already exists for the requested
  // inputs and outputs.
  // Construct a call frame to pass inputs and outputs between the session
  // and the executors.
  // Create a run-state structure (RunState).
  // Launch parallel execution; the core code is:
  for (const auto& item : executors_and_keys->items) {
    item.executor->RunAsync(args, barrier->Get());
  }
  // Collect the outputs.
  // Save the output tensors we want to keep from this run.
  // Build and return the cost model.
  // If the relevant option is set in RunOptions, output the partitioned
  // graphs.
}

Status DirectSession::GetOrCreateExecutors(...) {
  // Fast lookup path.
  // Slow lookup path: sort the inputs and outputs so that the same
  // feed/fetch sets produce the same signature.
  // If not found, create the executors and cache them.
  // Build the execution graphs; the core code is:
  CreateGraphs(options, &graphs, &ek->flib_def, run_state_args,
               &ek->input_types, &ek->output_types);
  // Prepare the runtime environment for each subgraph.
}

Status DirectSession::CreateGraphs(...) {
  // Preprocessing.
  // Graph partitioning; the core code is:
  Partition(popts, &client_graph->graph, &partitions);
  // Check the validity of the partitioning result.
  // Run the post-partitioning optimization passes; the core code is:
  OptimizationPassRegistry::Global()->RunGrouping(
      OptimizationPassRegistry::POST_PARTITIONING, optimization_options);
  // Allow each device to rewrite its own subgraph.
}
```

As we can see, the actual execution happens inside Run, through calls to Executor::RunAsync. Before execution, we obtain the executors through GetOrCreateExecutors, which in turn uses CreateGraphs to partition the original graph and to run the graph-optimization passes over the partitions.
