Deeplearning.ai Summary: Implementing Adam Optimization in C++

Flyfish

Build environment
Visual C++ 2017

The theory below is excerpted from the book "Deep Learning".

Adam is an adaptive learning rate optimization algorithm. The name "Adam" derives from the phrase "adaptive moments".
In the context of the earlier algorithms, it is perhaps best seen as a variant on the combination of RMSProp and momentum, with a few important distinctions.
First, in Adam, momentum is incorporated directly as an estimate of the first-order moment of the gradient (with exponential weighting). The most straightforward way to add momentum to RMSProp is to apply momentum to the rescaled gradients, but using momentum in combination with rescaling has no clear theoretical motivation.
Second, Adam applies bias corrections to the estimates of both the first-order moment (the momentum term) and the (uncentered) second-order moment, to account for their initialization at the origin.
RMSProp also uses an estimate of the (uncentered) second-order moment, but it lacks the correction factor. Thus, unlike in Adam, the RMSProp second-order moment estimate may have high bias early in training.
Adam is generally regarded as fairly robust to the choice of hyperparameters, although the learning rate sometimes needs to be changed from the suggested default.
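
Written out, the update described above takes roughly the following form. This is a sketch in the notation of the Adam paper rather than a quotation from the book: g_t is the gradient at step t, θ the parameters, α the learning rate, β1 and β2 the decay rates of the two moment estimates, ε a small constant, and both moments start at m_0 = v_0 = 0.

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
```

The first line is the momentum term, the second is the RMSProp-style second-moment estimate, and the third line is the bias correction that RMSProp lacks.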

// Paper: http://arxiv.org/abs/1412.6980
#include <cmath>
#include <cstddef>
#include <iostream>
#include <unordered_map>
#include <vector>

// Calls f(i) for every index i in [0, size).
template <typename T, typename Func>
inline void for_i(T size, Func f)
{
    for (std::size_t i = 0; i < size; ++i) { f(i); }
}

typedef std::vector<double> Tensor2D;

class Adam
{
public:
    Adam()
        : alpha(0.001), b1(0.9), b2(0.999), b1_t(0.9), b2_t(0.999), eps(1e-8) {}

    void update(const Tensor2D &dW, Tensor2D &W)
    {
        Tensor2D &mt = get<0>(W); // first-order moment estimate for W
        Tensor2D &vt = get<1>(W); // second-order moment estimate for W

        // Debug output: the gradient and the current moment estimates.
        for (auto it = dW.begin(); it != dW.end(); it++) std::cout << *it << "\t";
        std::cout << "dW\n";
        for (auto it = mt.begin(); it != mt.end(); it++) std::cout << *it << "\t";
        std::cout << "mt\n";
        for (auto it = vt.begin(); it != vt.end(); it++) std::cout << *it << "\t";
        std::cout << "vt\n";

        for (const auto &n : E_) // the address of the weight vector is the hash key
        {
            for (auto it = n.begin(); it != n.end(); it++)
            {
                std::cout << (*it).first << ":" /* << (*it).second */ << std::endl;
                for (auto s = (*it).second.begin(); s != (*it).second.end(); s++)
                {
                    std::cout << (*s);
                }
                std::cout << "complete\n";
            }
        }

        for_i(W.size(), [&](std::size_t i) {
            mt[i] = b1 * mt[i] + (1.0 - b1) * dW[i];         // momentum
            vt[i] = b2 * vt[i] + (1.0 - b2) * dW[i] * dW[i]; // RMSProp
            double mt_hat = mt[i] / (1.0 - b1_t);            // bias correction
            double vt_hat = vt[i] / (1.0 - b2_t);
            // Scale the step by the root of the second-order moment estimate.
            W[i] -= alpha * mt_hat / (std::sqrt(vt_hat) + eps);
        });
        b1_t *= b1;
        b2_t *= b2;
    }

    // Learning rate (step size), which controls how much the weights are updated (e.g. 0.001).
    // Larger values (e.g. 0.3) give faster initial learning before the rate is adapted;
    // smaller values (e.g. 1.0e-5) slow learning down but can let training converge to better performance.
    double alpha;
    // Exponential decay rate of the first-order moment estimate (e.g. 0.9).
    double b1;
    // Exponential decay rate of the second-order moment estimate (e.g. 0.999).
    // Should be set close to 1 for problems with sparse gradients, such as NLP or computer vision tasks.
    double b2;
    double b1_t; // b1 raised to the power t
    double b2_t; // b2 raised to the power t

private:
    // A very small number that prevents division by zero in the implementation (e.g. 1e-8).
    double eps;

    // Returns the moment vector stored for `key` (Index 0: first order, 1: second order),
    // creating it zero-filled on first use.
    template <int Index>
    Tensor2D &get(const Tensor2D &key)
    {
        if (E_[Index][&key].empty())
            E_[Index][&key].resize(key.size(), double());
        return E_[Index][&key];
    }

    std::unordered_map<const Tensor2D *, Tensor2D> E_[2];
};
