C++ Serialization Methods

Source: Internet
Author: User
Tags: normalizer, python, script

  1. I am using Boost.Serialization for now. My tests basically pass, except that serializing a C++11 shared_ptr did not work, so I write the shared_ptr part of the serialization by hand: instead of serializing the pointer directly, I manage it myself, like this:

    load_(modelFile);  // direct serialization of the model

    string normalizerName = read_file(obj_name_path(_normalizer));
    if (!normalizerName.empty())
    {
        // The shared_ptr is not serialized directly (I do not know the exact
        // way to do it), so on save the normalizer's type name is written to
        // a text file, and load dispatches on that name
        _normalizer = NormalizerFactory::CreateNormalizer(normalizerName, obj_path(_normalizer));
    }

    string calibratorName = read_file(obj_name_path(_calibrator));
    if (!calibratorName.empty())
    {
        _calibrator = CalibratorFactory::CreateCalibrator(calibratorName, obj_path(_calibrator));
    }

    static NormalizerPtr CreateNormalizer(string name)
    {
        boost::to_lower(name);
        if (name == "minmax" || name == "minmaxnormalizer")
        {
            return make_shared<MinMaxNormalizer>();
        }
        if (name == "gaussian" || name == "gaussiannormalizer")
        {
            return make_shared<GaussianNormalizer>();
        }
        if (name == "bin" || name == "binnormalizer")
        {
            return make_shared<BinNormalizer>();
        }
        LOG(WARNING) << name << " is not supported now, no normalizer will be used, returning nullptr";
        return nullptr;
    }


    static NormalizerPtr CreateNormalizer(string name, string path)
    {
        NormalizerPtr normalizer = CreateNormalizer(name);
        if (normalizer != nullptr)
        {
            normalizer->load(path);  // direct serialization of the normalizer
        }
        return normalizer;
    }


    @TODO confirm whether there really is no way to serialize the shared_ptr directly.

    Alternatively, try the open-source serialization library cereal, which emulates the Boost.Serialization interface. Boost.Serialization supports only three formats: binary, text, and XML. Text serialization is not very readable; binary is the fastest; XML is more readable but slower. I usually use only the binary and XML formats. cereal additionally supports JSON output and is said to support shared_ptr.


    Serialization speed of the same model with Boost.Serialization:

              binary    text
        save  1.8       2.29
        load  1.9       2.67

  2. If XML output is required, the Boost serialization code is not written the same way as for binary output; the recommended notation is the XML-capable one (NVP macros), which is compatible with both.


    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive &ar, const unsigned int version)
    {
        /* ar & boost::serialization::base_object<Predictor>(*this);
        ar & _weights;
        ar & _bias; */  // this notation only supports binary
        ar & BOOST_SERIALIZATION_BASE_OBJECT_NVP(Predictor);
        ar & BOOST_SERIALIZATION_NVP(_weights);  // the macro is convenient if you need to change the tag name, e.g. _weights -> weights
        ar & BOOST_SERIALIZATION_NVP(_bias);
    }


  3. The code for the serialization part is generated automatically with a Python script. This is unlike C#, where types are serializable by default and you opt out explicitly when serialization is not wanted; Boost does not serialize anything by default, so every place that needs serialization has to be written out explicitly.

    predictors]$ get-lines.py LinearPredictor.h 98 99 | gen-boost-seralize-xml.py


    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive &ar, const unsigned int version)
    {
        ar & BOOST_SERIALIZATION_NVP(_weights);
        ar & BOOST_SERIALIZATION_NVP(_bias);
    }
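A generator like the one piped to above can be very small. The following is a hypothetical sketch, not the author's actual gen-boost-seralize-xml.py: it takes C++ member declarations and prints a serialize() method using the XML-capable NVP macros.

```python
import re

def gen_boost_serialize_xml(decl_lines):
    """Emit a Boost.Serialization serialize() method for the given
    C++ member declaration lines, using BOOST_SERIALIZATION_NVP."""
    # crude parse: the last identifier before ';' is the member name
    names = []
    for line in decl_lines:
        m = re.search(r'(\w+)\s*;', line)
        if m:
            names.append(m.group(1))
    out = [
        "friend class boost::serialization::access;",
        "template<class Archive>",
        "void serialize(Archive &ar, const unsigned int version)",
        "{",
    ]
    out += ["    ar & BOOST_SERIALIZATION_NVP(%s);" % n for n in names]
    out.append("}")
    return "\n".join(out)
```

Feeding it the two declaration lines from LinearPredictor.h would produce the block shown above.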

4. For a predictor, the default Save is binary. The optional SaveXML mode is supported automatically, while the optional SaveText has to be written by hand for any specific predictor subtype that needs a text output format.

The XML output is the usual Boost archive shape: a boost_serialization root element carrying signature and version attributes, with one child element per NVP-named field.

Convert to JSON:

xml2json.py model.xml > model.json
more model.json

Use a JSON pretty-printer to view the JSON file:

jpp.py model.json | more


xml2json.py uses xmltodict to convert to JSON:


import sys, os
import xmltodict, json

doc = xmltodict.parse(open(sys.argv[1]), process_namespaces=True)
print json.dumps(doc)
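If xmltodict is not available, a rough stdlib-only substitute can be sketched with xml.etree.ElementTree. This is an assumption-laden simplification: it ignores namespaces, and its handling of mixed content differs from xmltodict, but it mimics the '@'-prefixed attributes and text collapsing seen in the sessions below.

```python
import json
import xml.etree.ElementTree as ET

def etree_to_dict(elem):
    """Recursively convert an Element into a dict, roughly mimicking
    xmltodict: attributes get an '@' prefix, text-only elements
    collapse to their text, repeated tags become lists."""
    d = {}
    for k, v in elem.attrib.items():
        d["@" + k] = v
    children = list(elem)
    if not children:
        text = (elem.text or "").strip()
        return text if not d else dict(d, **{"#text": text})
    for child in children:
        val = etree_to_dict(child)
        if child.tag in d:
            if not isinstance(d[child.tag], list):
                d[child.tag] = [d[child.tag]]
            d[child.tag].append(val)
        else:
            d[child.tag] = val
    return d

def xml_to_json(xml_string):
    root = ET.fromstring(xml_string)
    return json.dumps({root.tag: etree_to_dict(root)})
```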


jpp.py

import sys, os
import json

s = open(sys.argv[1]).readline().decode('gbk')
print json.dumps(json.loads(s), sort_keys=True, indent=4, ensure_ascii=False).encode('gbk')
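The script above is Python 2 (print statement, str.decode). On Python 3 the same pretty-printing can be written as below; the file encoding is passed to open() instead of decoding and re-encoding by hand.

```python
import json

def pretty_json(text):
    """Sort keys and indent, keeping non-ASCII characters readable."""
    return json.dumps(json.loads(text), sort_keys=True, indent=4,
                      ensure_ascii=False)

# to read a GBK-encoded file as in the original script:
#     with open(path, encoding="gbk") as f:
#         print(pretty_json(f.readline()))
```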


  5. How can the model output be viewed more conveniently?

    For a small model, reading the XML text directly is fine. With more data, XML becomes inconvenient to process, and JSON works better with Python.

    But the converted JSON map is still awkward to use, because everything must be accessed by string key, with no automatic completion.


    In [6]: import json

    In [7]: m = json.loads(open('./model.json').readline())

    In [8]: m.keys()
    Out[8]: [u'boost_serialization']

    In [9]: m['boost_serialization'].keys()
    Out[9]: [u'@version', u'@signature', u'data']


    In [18]: m['boost_serialization']['data']['_trees']['item'][0].keys()
    Out[18]:
    [u'_gainpvalue',
     u'@tracking_level',
     u'@class_id',
     u'_ltechild',
     u'_gtchild',
     u'_maxoutput',
     u'_leafvalue',
     u'numleaves',
     u'_splitgain',
     u'_splitfeature',
     u'_previousleafvalue',
     u'_threshold',
     u'@version',
     u'_weight']

    In [19]: m['boost_serialization']['data']['_trees']['item'][0]['_splitgain']['item'][10]
    Out[19]: u'3.89894126598927926e+00'


    Since names beginning with an underscore are treated as private and are not autocompleted at the Python prompt by default, the macro was modified:

    #include "conf_util.h"
    #include <boost/serialization/nvp.hpp>

    #define GEZI_SERIALIZATION_NVP(name) \
        boost::serialization::make_nvp(gezi::conf_trim(#name).c_str(), name)

    With this, a field such as gainpvalue no longer begins with _.


    Using Python's introspection, the dict parsed from JSON (whose keys are strings) can be converted into a Python object for convenient access, as follows:

    def h2o(x):
        if isinstance(x, dict):
            return type('Jo', (), {k: h2o(v) for k, v in x.iteritems()})
        elif isinstance(x, list):
            return [h2o(item) for item in x]
        else:
            return x


    def h2o2(x):
        if isinstance(x, dict):
            return type('Jo', (), {k: h2o2(v) for k, v in x.iteritems()})
        elif isinstance(x, list):
            return type('Jo', (), {'i' + str(idx): h2o2(val) for idx, val in enumerate(x)})
        else:
            return x


    def xmlfile2obj(path):
        import xmltodict
        doc = xmltodict.parse(open(path), process_namespaces=True)
        return h2o(doc)

    def xmlfile2obj2(path):
        import xmltodict
        doc = xmltodict.parse(open(path), process_namespaces=True)
        return h2o2(doc)


    With these, a serialized XML file can be used directly via m = xmlfile2obj('*.xml') or m = xmlfile2obj2('*.xml').

    The first is recommended, since it is the standard conversion. The second interface exists mainly because Python offers no automatic hints for list items, which can only be inspected with dir().

    The second turns [3]-style indexing into .i3, i.e. all lists are replaced with dicts.


    m = xmlfile2obj('./model.xml')

    In [14]: m.boost_serialization.data.trees.item[0].splitgain.item[13]
    Out[14]: u'3.26213753939964946e+00'

    m = xmlfile2obj2('./model.xml')

    In [16]: m.boost_serialization.data.trees.item.i0.splitgain.item.i13
    Out[16]: u'3.26213753939964946e+00'
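For reference, dict.iteritems() is gone on Python 3; a minimal port of h2o using the same anonymous-class ('Jo') trick looks like this (assuming the input is plain JSON-style data: dicts, lists, and scalars):

```python
def h2o(x):
    """Recursively turn dicts into attribute-accessible objects.

    Each dict becomes an anonymous class whose class attributes are
    the converted values; lists stay lists; scalars pass through.
    """
    if isinstance(x, dict):
        return type('Jo', (), {k: h2o(v) for k, v in x.items()})
    elif isinstance(x, list):
        return [h2o(item) for item in x]
    else:
        return x
```

Keys that are not valid identifiers (such as '@version') are still stored, but must be read with getattr() rather than dotted access.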

