Andrej Karpathy (AK) compares his open-source NeuralTalk and NeuralTalk2 projects to this model and concedes that "the Google release should work significantly better as a result of better CNN, some tricks, and more careful engineering." So today let's look at the NIC (Neural Image Caption) model and see what exactly makes it better.
Project code: im2txt.
Overall, the two are not very different: both are end to end, with a CNN extracting image features and an RNN generating the sentence. The differences show up in the details:
1. NIC borrows the encoder-decoder model from sequence-to-sequence machine translation: the encoder is a CNN, the decoder is an RNN. The point is that the CNN and RNN here are quite different from NeuralTalk's. NIC uses a better feature extractor (GoogLeNet in the 2015 paper, a Batch Norm network in the 2016 version), which enriches the image information obtained, and a more complex LSTM, with more layers and more cells in the 2016 version, making the decoder more powerful and the results better. A structural sketch follows.
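To make the layout concrete, here is a minimal PyTorch sketch of the encoder-decoder structure. This is my own illustration, not the official code: im2txt itself is written in TensorFlow, and I use a torchvision ResNet as a stand-in for the GoogLeNet/Inception encoder; all class and parameter names here are hypothetical.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """Encoder: a pretrained CNN mapped to a fixed-size image embedding."""
    def __init__(self, embed_size):
        super().__init__()
        cnn = models.resnet50(weights="DEFAULT")  # stand-in for GoogLeNet/Inception
        self.backbone = nn.Sequential(*list(cnn.children())[:-1])  # drop classifier
        self.proj = nn.Linear(2048, embed_size)   # project to the word-embedding size

    def forward(self, images):                    # images: (B, 3, H, W)
        feats = self.backbone(images).flatten(1)  # (B, 2048)
        return self.proj(feats)                   # (B, embed_size)

class DecoderRNN(nn.Module):
    """Decoder: an LSTM that emits the caption one word at a time."""
    def __init__(self, embed_size, hidden_size, vocab_size, num_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, image_emb, captions):       # captions: (B, T) word ids
        # The image embedding occupies the first time step; words follow.
        inputs = torch.cat([image_emb.unsqueeze(1), self.embed(captions)], dim=1)
        hiddens, _ = self.lstm(inputs)            # (B, T+1, hidden_size)
        return self.out(hiddens)                  # word logits at every step
```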
2. How the image feature is fed in. NeuralTalk feeds the extracted feature, combined with the other inputs as a bias term, directly into the first RNN cell, which feels a bit hasty. NIC instead devotes the entire first time step to the image feature and makes no prediction there, which acts as a warm-up. The paper also mentions that, in the authors' experience, feeding the image in at every time step did not work well, so it is fed in only once; both schemes are sketched below.
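A sketch of the two input schemes, reusing the decoder pieces from above. The helper functions are hypothetical, and the per-step variant is shown only to illustrate what the authors rejected:

```python
import torch

# NIC scheme: the image embedding is fed once, at t = 0 only. The output at
# that step is discarded -- it exists purely to warm up the LSTM state.
def nic_decode(lstm, embed, out, image_emb, word_ids):
    inputs = torch.cat([image_emb.unsqueeze(1), embed(word_ids)], dim=1)
    hiddens, _ = lstm(inputs)
    return out(hiddens[:, 1:])  # skip step 0: no word is predicted there

# Rejected variant: the image embedding is concatenated to the input at every
# step. This needs an LSTM with a wider input, and the authors report it
# performed worse.
def per_step_decode(lstm, embed, out, image_emb, word_ids):
    words = embed(word_ids)                                   # (B, T, E)
    img = image_emb.unsqueeze(1).expand(-1, words.size(1), -1)
    hiddens, _ = lstm(torch.cat([words, img], dim=-1))        # (B, T, 2E) input
    return out(hiddens)
```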
3. Other details:
NIC takes a pretrained CNN and keeps it fixed, training only the LSTM part. Once the model has stabilized, the whole model is fine-tuned end to end on the COCO dataset. The advantage is that, because COCO's training set contains many color-description words, the new model keeps the prediction stability of the original model while making the captions more specific and accurate. A sketch of this schedule follows.
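A minimal sketch of that two-stage schedule, assuming the EncoderCNN/DecoderRNN classes from the earlier sketch; the optimizer choice and learning rates are my own placeholders, not the paper's values:

```python
import itertools
import torch

encoder = EncoderCNN(embed_size=512)
decoder = DecoderRNN(embed_size=512, hidden_size=512, vocab_size=10000)

# Stage 1: freeze the pretrained CNN backbone; train only the new
# projection layer and the LSTM decoder.
for p in encoder.backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.SGD(
    [p for p in itertools.chain(encoder.parameters(), decoder.parameters())
     if p.requires_grad], lr=0.1)
# ... train until the loss stabilizes ...

# Stage 2: unfreeze everything and fine-tune end to end on COCO, usually
# with a much smaller learning rate so the pretrained CNN is not destroyed.
for p in encoder.backbone.parameters():
    p.requires_grad = True
optimizer = torch.optim.SGD(
    itertools.chain(encoder.parameters(), decoder.parameters()), lr=1e-3)
```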
The LSTM is randomly initialized; an ensemble of models is used for prediction; dropout further prevents overfitting and improves generalization; beam search explores more candidate captions (a toy version is sketched below); scheduled sampling is used during training; and so on. For more details, please refer to the original paper.
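Of those tricks, beam search is worth spelling out. Below is a deliberately small toy version; the `step` function, which maps the last word id and LSTM state to log-probabilities over the vocabulary and a new state, is a hypothetical stand-in for one decoder step:

```python
import heapq
import torch

def beam_search(step, init_state, start_id, end_id, beam_size=3, max_len=20):
    """Keep the beam_size most probable partial captions at every step."""
    beams = [(0.0, [start_id], init_state)]  # (log-prob, word ids, LSTM state)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq, state in beams:
            if seq[-1] == end_id:            # this caption has already ended
                finished.append((score, seq))
                continue
            log_probs, new_state = step(seq[-1], state)   # (vocab,) tensor
            top = torch.topk(log_probs, beam_size)
            for lp, wid in zip(top.values.tolist(), top.indices.tolist()):
                candidates.append((score + lp, seq + [wid], new_state))
        if not candidates:                   # every beam has finished
            break
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    finished += [(s, seq) for s, seq, _ in beams if seq[-1] != end_id]
    return max(finished, key=lambda f: f[0])[1]
```

A real implementation would usually length-normalize the scores so that short captions are not unfairly favored; this toy version omits that.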
Overall, the whole paper keeps griping about the overfitting caused by the small training set. And rightly so: a major challenge facing image captioning is that the generated captions are always ones that appeared in the training set, and the details are never quite in place. The paper says that with a larger training set, the overfitting problem would be alleviated and results would improve. But I personally think that simply relying on a larger training set is not a long-term solution; finding more general rules from smaller datasets offers more room for long-term development. We might try other means to address overfitting (e.g., batch normalization over the LSTM), or improve the robustness of the model through structural adjustments.