My favourite papers from day one of ICML 2015
07 July 2015
Aargh! How can I possibly keep all the amazing things I learnt at ICML today in my head?! Clearly I can't. This is a list of pointers to my favourite papers from today, and why I think they are cool. This is mainly for my benefit, but you might like them too!
Neural nets / deep learning
BilBOWA: Fast Bilingual Distributed Representations without Word Alignments
Stephan Gouws, Yoshua Bengio, Greg Corrado
Why this paper is cool: It simultaneously learns word vectors for words in two languages without having to learn a mapping between them.
Compressing Neural Networks with the Hashing Trick
Wenlin Chen, James Wilson, Stephen Tyree, Kilian Weinberger, Yixin Chen
Why this paper is cool: Gives a huge reduction (32x) in the amount of memory needed to store a neural network. This means you can potentially use it on low-memory devices like mobile phones!
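To make the idea concrete for myself, here is a minimal sketch of weight sharing by hashing. It is my own toy version, not the authors' code: the names `bucket` and `hashed_matvec`, the CRC32 hash and the layer sizes are all assumptions for illustration. The layer behaves as if it had a full weight matrix, but each entry is looked up in a much smaller list of stored parameters.

```python
import zlib

def bucket(i, j, n_buckets):
    """Deterministically map virtual weight position (i, j) to a shared bucket."""
    return zlib.crc32(f"{i},{j}".encode()) % n_buckets

def hashed_matvec(params, x, n_out):
    """Compute y = W x, where the virtual entry W[i][j] is params[bucket(i, j)]."""
    k = len(params)
    return [
        sum(params[bucket(i, j, k)] * x[j] for j in range(len(x)))
        for i in range(n_out)
    ]

# A virtual 32 x 64 layer (2048 weights) backed by only 64 stored values,
# i.e. the 32x reduction mentioned above. Sizes are illustrative.
n_in, n_out, compression = 64, 32, 32
params = [0.01 * (b + 1) for b in range(n_in * n_out // compression)]
y = hashed_matvec(params, [1.0] * n_in, n_out)
```

Because the hash is deterministic, nothing but `params` needs to be stored; the mapping from positions to buckets is recomputed on the fly.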
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy
Why this paper is cool: Makes deep neural network training super fast, giving a new state of the art for some datasets.
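As a reminder to myself of what the normalisation step does, here is a minimal sketch for a single feature over one mini-batch. It only covers the training-time forward pass; the function name and the default values for `gamma`, `beta` and `eps` are my own choices, not taken from the paper.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature across a mini-batch, then scale and shift it."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

activations = [10.0, 12.0, 14.0, 16.0]
normed = batch_norm(activations)
```

Whatever the scale of the incoming activations, the next layer sees values with roughly zero mean and unit variance, with `gamma` and `beta` left as learnable parameters.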
Deep Learning with Limited Numerical Precision
Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, Pritish Narayanan
Why this paper is cool: Trains neural networks with very limited fixed-point arithmetic instead of floating point. The key insight is to use randomness to do the rounding. The goal is to eventually build custom hardware to make learning much faster.
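The rounding trick is easy to try out. Below is a minimal sketch of stochastic rounding to a fixed-point grid; the function name, the grid spacing of 2^-4 and the test value are my own assumptions. A value is rounded up with probability equal to how far it sits above the lower grid point, so the rounding is correct on average.

```python
import math
import random

def stochastic_round(x, step, rng):
    """Round x to a multiple of `step`, rounding up with probability equal to
    the fractional distance of x above the lower grid point."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (1 if rng.random() < frac else 0)) * step

rng = random.Random(0)
step = 2 ** -4  # fixed-point grid with 4 fractional bits
x = 0.3         # sits between the grid points 0.25 and 0.3125
samples = [stochastic_round(x, step, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
```

Round-to-nearest would return 0.3125 every time and the error would accumulate; with stochastic rounding the average of many roundings stays close to 0.3, which is why small gradient updates are not systematically lost.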
Recommendations etc.
Fixed-point Algorithms for Learning Determinantal Point Processes
Zelda Mariet, Suvrit Sra
Why this paper is cool: If you want to recommend a set of things, rather than just an individual thing, how do you choose the best set? This will tell you.
Surrogate Functions for Maximizing Precision at the Top
Purushottam Kar, Harikrishna Narasimhan, Prateek Jain
Why this paper is cool: If you care about the top n things recommended, this technique works faster and better than other approaches.
And finally ...
Learning to Search Better than Your Teacher
Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, John Langford
Why this paper is cool: A new, general way to do structured prediction (tasks like dependency parsing or semantic parsing) which works well even when there are errors in the training set. Thanks to the authors for talking me through this one!
July 2015
Yesterday I posted on my favourite papers from the beginning of ICML (some of those papers were actually presented today, although the posters were displayed yesterday). Here's today's update, which includes some papers being presented tomorrow, because the posters were on display today ...
Neural nets
Unsupervised Domain Adaptation by Backpropagation
Yaroslav Ganin, Victor Lempitsky
Imagine you have a small amount of labelled training data and a lot of unlabelled data from a different domain. This technique allows you to build a neural network model that fits the unlabelled domain. The key idea is super cool and really simple to implement. You build a network that optimises features such that it's difficult to distinguish which domain the data came from.
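As I understand it, the trick that makes this simple is a gradient reversal step between the feature extractor and the domain classifier. Here is a toy sketch with one weight for each, written by hand rather than with a deep learning library; the names `domain_gradients` and `lam`, and the scalar model itself, are my own assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def domain_loss(w, v, x, domain):
    """Logistic loss of the domain classifier on one example."""
    p = sigmoid(v * (w * x))
    return -math.log(p) if domain == 1 else -math.log(1.0 - p)

def domain_gradients(w, v, x, domain, lam=1.0):
    """Gradients used to update the classifier weight v and feature weight w.

    feature = w * x                 (feature extractor, a single weight here)
    p       = sigmoid(v * feature)  (domain classifier)
    """
    feature = w * x
    p = sigmoid(v * feature)
    dloss_dlogit = p - domain        # derivative of the logistic loss
    grad_v = dloss_dlogit * feature  # the classifier gets the true gradient
    grad_feature = dloss_dlogit * v
    # Gradient reversal: flip the sign before it reaches the feature extractor,
    # so a descent step on w makes the domains harder to tell apart.
    grad_w = -lam * grad_feature * x
    return grad_v, grad_w

grad_v, grad_w = domain_gradients(w=0.5, v=1.0, x=2.0, domain=1)
```

The domain classifier keeps trying to tell the domains apart, while the same backward pass pushes the features in the opposite direction. In a real network the label predictor would be trained on top of the same features in the usual way.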
Weight Uncertainty in Neural Networks
Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, Daan Wierstra
Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks
José Miguel Hernández-Lobato, Ryan Adams
These papers have a very similar goal, namely making neural networks probabilistic. This is cool because it allows you to make a decision, but also know how sure you are about the decision. There are a bunch of other benefits: you don't need to worry about regularisation, hyperparameter tuning is easier, etc.
Anyway, the papers achieve this in two different ways. The first uses Gaussian scale mixtures together with a clever trick to backpropagate expectations. The second one computes the distribution after rectifying and then approximates this with a Gaussian distribution. Either way, an exciting development for neural networks.
Training Deep Convolutional Neural Networks to Play Go
Christopher Clark, Amos Storkey
Although I've never actually played the game, I have an interest in AI Go players, because it's such a hard game for computers, which still can't reach the level of human players. The current state of the art uses Monte Carlo tree search, which is a really cool technique. The authors of this paper use neural networks to play the game but don't quite achieve the same level of performance. I asked the author whether the approaches could be combined, and they think they can! Watch this space for a new state-of-the-art Go player.
Natural language processing
Phrase-based Image Captioning
Rémi Lebret, Pedro Pinheiro, Ronan Collobert
This is a new state of the art in the very interesting task of labelling images with phrases. The clever bit is in the syntactic analysis of the phrases in the training set, which often follow a similar pattern. The authors use this to their advantage: the model is trained on the individual sub-phrases that are extracted, which allows it to behave compositionally. This means that it can describe, for example, both the fact that a plate is on a table, and that there is pizza on the plate. Unlike previous approaches, the sentences that are generated are not often found in the training set, which shows that it is doing real generation and not retrieval. Exciting stuff!
Bimodal Modelling of Source Code and Natural Language
Miltos Allamanis, Daniel Tarlow, Andrew Gordon, Yi Wei
Another fun paper; this one tries to generate source code given a natural language query, quite an ambitious task! It is trained on snippets of code extracted from StackOverflow.
Optimisation
Gradient-based Hyperparameter Optimization through Reversible Learning
Dougal Maclaurin, David Duvenaud, Ryan Adams
Hyperparameter optimisation is important when training neural networks because there are so many of the things floating around. How do you know what to set them to? Normally you have to perform some kind of search on the space of possible parameters, and Bayesian techniques have been very helpful at doing this. This paper suggests something entirely different and completely audacious. The authors are able to compute gradients for hyperparameters using automatic differentiation after going through a whole round of stochastic gradient descent learning. That's quite a feat. What this means is that we can answer questions about what the optimal hyperparameter settings look like in different settings, and it makes a whole set of things that were previously a "black art" a lot more scientific and understandable.
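To get a feel for what "a gradient for a hyperparameter" means, here is a toy sketch that differentiates the final training loss with respect to the learning rate. It is not the paper's method: the authors use reverse-mode automatic differentiation through the whole training run, whereas this example carries the derivative forward by hand on a one-parameter quadratic loss. The function name and all the constants are my own assumptions.

```python
def train_and_hypergradient(lr, w0=1.0, curvature=2.0, steps=10):
    """Run gradient descent on L(w) = 0.5 * curvature * w**2 and also return
    d(final loss)/d(lr), tracked through every update with the chain rule."""
    w, dw_dlr = w0, 0.0
    for _ in range(steps):
        grad = curvature * w
        # w_next = w - lr * grad; differentiate both sides with respect to lr.
        dw_dlr = dw_dlr - grad - lr * curvature * dw_dlr
        w = w - lr * grad
    final_loss = 0.5 * curvature * w ** 2
    dloss_dlr = curvature * w * dw_dlr
    return final_loss, dloss_dlr

loss, hypergrad = train_and_hypergradient(lr=0.1)
```

Here `hypergrad` comes out negative, which says that a slightly larger learning rate would have given a lower final loss on this toy problem. That is the kind of signal you could feed into an ordinary gradient-based optimiser instead of running a search.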
And more ...
There were many more interesting papers, too many to write up here. Take a look at the schedule and find your favourite! Let me know on Twitter.