Deeplearning.ai Summary: Implementing Adam Optimization in C++


Last Update:2018-07-26 Source: Internet

Author: User


Flyfish

Build environment
Visual C++ 2017

The theory below is excerpted from the book "Deep Learning".

Adam is an adaptive learning rate optimization algorithm. The name "Adam" derives from the phrase "adaptive moments".
In the context of the earlier algorithms, it is perhaps best seen as a variant on the combination of RMSProp and momentum, with a few important distinctions.
First, in Adam, momentum is incorporated directly as an estimate of the first-order moment (with exponential weighting) of the gradient. The most straightforward way to add momentum to RMSProp is to apply momentum to the rescaled gradients.
Using momentum in combination with rescaling has no clear theoretical motivation. Second, Adam applies bias corrections to the estimates of both the first-order moment (the momentum term) and the (uncentered) second-order moment, to account for their initialization at the origin.
RMSProp also uses an estimate of the (uncentered) second-order moment, but it lacks the correction factor. Therefore, unlike in Adam, the RMSProp second-order moment estimate may have high bias early in training.
Adam is generally regarded as fairly robust to the choice of hyperparameters, although the learning rate sometimes needs to be changed from the suggested default.
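The description above can be written out as the usual Adam update equations. This is a sketch in notation chosen here, not taken from the excerpt: g_t stands for the gradient at step t, m_t and v_t for the first and second moment estimates (both starting at zero), and alpha, beta_1, beta_2 and epsilon for the hyperparameters that appear as alpha, b1, b2 and eps in the code that follows.

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^{\,t}} \\
\hat{v}_t &= \frac{v_t}{1 - \beta_2^{\,t}} \\
w_t &= w_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
```

The third and fourth lines are the bias corrections: because m_0 = v_0 = 0, the raw estimates are too small in early steps, and dividing by 1 - beta^t compensates for that.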

// Paper: http://arxiv.org/abs/1412.6980
#include <cmath>
#include <iostream>
#include <unordered_map>
#include <vector>

// Calls f(i) for every i in [0, size).
template <typename T, typename Func>
inline void for_i(T size, Func f) {
    for (size_t i = 0; i < size; ++i) {
        f(i);
    }
}

typedef std::vector<double> tensor2d;

class Adam {
public:
    Adam()
        : alpha(0.001), b1(0.9), b2(0.999), b1_t(0.9), b2_t(0.999), eps(1e-8) {}

    void update(const tensor2d &dW, tensor2d &W) {
        tensor2d &mt = get<0>(W);  // first moment estimate for W
        tensor2d &vt = get<1>(W);  // second moment estimate for W

        // Debug output: gradient and both moment estimates
        for (auto it = dW.begin(); it != dW.end(); it++) std::cout << *it << "\t";
        std::cout << "dW\n";
        for (auto it = mt.begin(); it != mt.end(); it++) std::cout << *it << "\t";
        std::cout << "mt\n";
        for (auto it = vt.begin(); it != vt.end(); it++) std::cout << *it << "\t";
        std::cout << "vt\n";

        // Debug output: contents of both maps (the address of W is the hash key)
        for (const auto &n : E_) {
            for (auto it = n.begin(); it != n.end(); it++) {
                std::cout << (*it).first << ":" << std::endl;
                for (auto s = (*it).second.begin(); s != (*it).second.end(); s++) {
                    std::cout << (*s);
                }
                std::cout << "complete\n";
            }
        }

        for_i(W.size(), [&](size_t i) {
            mt[i] = b1 * mt[i] + (1.0 - b1) * dW[i];          // momentum
            vt[i] = b2 * vt[i] + (1.0 - b2) * dW[i] * dW[i];  // RMSProp
            double mt_hat = mt[i] / (1.0 - b1_t);             // bias correction
            double vt_hat = vt[i] / (1.0 - b2_t);
            // L2-norm-based update rule
            W[i] -= alpha * mt_hat / (std::sqrt(vt_hat) + eps);
        });
        b1_t *= b1;
        b2_t *= b2;
    }

    // Learning rate or step size, which controls how strongly the weights are
    // updated (e.g. 0.001). Larger values (e.g. 0.3) give faster initial
    // learning before the rate is adapted; smaller values (e.g. 1.0e-5) slow
    // learning down but can converge to better performance.
    double alpha;
    // Exponential decay rate of the first-order moment estimate (e.g. 0.9).
    double b1;
    // Exponential decay rate of the second-order moment estimate (e.g. 0.999).
    // Should be set close to 1 for problems with sparse gradients, such as
    // NLP or computer vision tasks.
    double b2;
    double b1_t;  // b1 raised to the power t
    double b2_t;  // b2 raised to the power t

private:
    // A very small number that prevents division by zero (e.g. 1e-8).
    double eps;

    // Returns the moment buffer for the given weight vector, creating a
    // zero-filled buffer of matching size on first use.
    template <int index>
    tensor2d &get(const tensor2d &key) {
        if (E_[index][&key].empty()) E_[index][&key].resize(key.size(), double());
        return E_[index][&key];
    }

    std::unordered_map<const tensor2d *, tensor2d> E_[2];
};
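As a rough illustration of how the update rule in the listing above behaves, the following sketch applies the same arithmetic to a single scalar weight and minimizes f(w) = (w - 3)^2. The names ScalarAdam and minimize_quadratic, the test function, and the learning rate of 0.1 are assumptions made for this example and are not part of the original code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar version of the Adam update, for illustration only.
struct ScalarAdam {
    double alpha, b1, b2, eps;
    double m = 0.0, v = 0.0;  // first and second moment estimates
    double b1_t, b2_t;        // b1^t and b2^t, used for bias correction

    explicit ScalarAdam(double lr)
        : alpha(lr), b1(0.9), b2(0.999), eps(1e-8), b1_t(0.9), b2_t(0.999) {}

    // Applies one Adam step to w given the gradient g and returns the new w.
    double step(double w, double g) {
        m = b1 * m + (1.0 - b1) * g;
        v = b2 * v + (1.0 - b2) * g * g;
        const double m_hat = m / (1.0 - b1_t);
        const double v_hat = v / (1.0 - b2_t);
        b1_t *= b1;
        b2_t *= b2;
        return w - alpha * m_hat / (std::sqrt(v_hat) + eps);
    }
};

// Minimizes f(w) = (w - 3)^2 starting from w = 0 and returns the final w.
double minimize_quadratic(int steps, double lr) {
    assert(steps >= 0);
    ScalarAdam opt(lr);
    double w = 0.0;
    for (int i = 0; i < steps; ++i) {
        const double grad = 2.0 * (w - 3.0);  // derivative of (w - 3)^2
        w = opt.step(w, grad);
    }
    return w;
}
```

Calling minimize_quadratic(1000, 0.1) should return a value close to 3. On the very first step the bias-corrected estimates reduce to m_hat = g and v_hat = g * g, so the step has magnitude of roughly alpha regardless of the gradient's scale.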

This article is an English translation of an article originally published in Chinese on aliyun.com.
