How does #DeepDream work?

Last Update: 2015-07-18 Source: Internet

Author: User

Do neural networks dream of electronic dogs?

If you've been browsing the net recently, you might have stumbled upon some strange-looking images, with pieces of dog heads, eyes, legs and what look like buildings, sometimes superimposed on a normal picture, sometimes not. Although they can be nightmare-inducing (or because of that), they have gained a lot of popularity on the Internet. Often tagged #deepdream, they are made by a neural network trained on a huge set of categorized images and set free to generate new ones. The network comes from Google Research, and its code is currently available on GitHub, spawning more home-made neural image generators.

It turns out that, like many things on the Internet, it has something to do with cats.

In 1971, a British scientist named Sir Colin Blakemore raised a kitten in complete darkness, except for several hours a day in a small cage where the kitten could see only black and white horizontal stripes. A month or so later, the kitten was introduced to the normal world. It reacted to light, but otherwise seemed to be blind. It didn't follow moving objects unless they made a sound. When a recording was taken from the kitten's visual cortex, it turned out that its neurons reacted to horizontal lines but not to vertical ones, so its brain was unable to comprehend the complexity of the real world. Another kitten raised in a vertical-stripe environment had a similar disability.

The experiment was controversial, and Sir Colin had his share of threats from angry animal rights activists, but the result was interesting. The experiment tells us that the vision system, at least in cats, is something that develops after birth. The visual cortex of a kitten adapts to what the newborn eyes are exposed to and forms neurons that react to dominant basic patterns, like vertical or horizontal lines. From those basic patterns it can make sense of a more complex image and infer depth and motion. This gives us hope and suggests a method for creating an artificial vision system.

Convolutional Neural Networks

A feed-forward artificial neural network, when used as a classifier, takes an array of values as input and tries to assign it to a category in response. A relatively small network can be trained to guess a person's gender, given numeric height and hair length. We collect some sample data, present the samples to the network and gradually modify the network's weights to minimize the error. The network should then find the general rule and give correct answers to samples it has never seen before. If well trained, it will be correct most of the time and wrong in cases where people would probably be wrong too, given only the same information (height and hair length), and it will be much faster. People aren't that good with numbers. In turn, estimating gender from a photo is a much easier task for humans, but orders of magnitude harder for computers. An image, represented as a set of numerical pixel values, is a huge input space, impossible to process in reasonable time. And what if we don't want to detect gender, but dog breeds, or recognize plants, or find cancer in X-rays? This is where convolutional neural networks come in.
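The height-and-hair-length classifier above can be sketched with the smallest possible feed-forward network: a single logistic neuron trained by gradient descent. This is an illustrative sketch, not a method from the source; the eight samples are made up, the class labels 0 and 1 are arbitrary, and the normalization constants, learning rate and epoch count are our own assumptions.

```python
import math

# Hypothetical toy samples: (height in cm, hair length in cm, class label).
# The numbers are invented for illustration; labels 0 and 1 are arbitrary classes.
SAMPLES = [
    (182, 3, 0), (175, 5, 0), (190, 2, 0), (178, 8, 0),
    (160, 35, 1), (165, 40, 1), (158, 25, 1), (170, 30, 1),
]

def normalize(height, hair):
    # Rough rescaling to about [-1, 1] so gradient descent behaves well (assumed constants).
    return (height - 170) / 20, (hair - 20) / 20

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train(samples, lr=0.5, epochs=2000):
    """Stochastic gradient descent on the cross-entropy error of one logistic neuron."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for height, hair, label in samples:
            x1, x2 = normalize(height, hair)
            p = sigmoid(w1 * x1 + w2 * x2 + b)
            err = p - label  # gradient of cross-entropy w.r.t. the neuron's weighted sum
            w1 -= lr * err * x1
            w2 -= lr * err * x2
            b -= lr * err
    return w1, w2, b

def predict(weights, height, hair):
    w1, w2, b = weights
    x1, x2 = normalize(height, hair)
    return 1 if sigmoid(w1 * x1 + w2 * x2 + b) >= 0.5 else 0

weights = train(SAMPLES)
# Two samples the network has never seen.
print(predict(weights, 185, 4), predict(weights, 162, 38))  # 0 1
```

With only two inputs, this network is tiny; feeding it raw pixels instead would mean one weight per pixel, which is the input-size problem that convolutional networks address.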

Convolutional neural networks are a breed of neural networks introduced by Kunihiko Fukushima back in 1980 under the name "Neocognitron". You may also stumble across the name "LeNet", named after Yann LeCun, a researcher working with Facebook.

A kernel convolution is an image operation that, for each pixel, takes the pixels in its square neighborhood, calculates a weighted average of their values, and puts the result as the new value of the pixel. An equally weighted average would produce a blurry image. Negative weights for the nearest neighbors would make the image look sharper. With properly selected weights, we can enhance vertical or horizontal lines, or lines at any angle. With a bigger convolution kernel (neighborhood), we can find curved lines, color gradients or simple patterns. This technique has been known and used in image analysis and manipulation for years. In convolutional neural networks, though, the weights in the convolution matrices can be trained using error back-propagation. This way, instead of each pixel being an input, we can have one neuron reacting to vertical lines, another to horizontal lines, or angles, just like in a cat's visual cortex.
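A minimal pure-Python sketch of the kernel convolution described above. The kernels are hand-picked illustrations standing in for the weights a convolutional network would learn through back-propagation; the function and kernel names are our own.

```python
def convolve(image, kernel):
    """Slide a square kernel over a 2-D list of pixels (no padding, so the output shrinks).

    Like most deep-learning libraries, this computes cross-correlation: the kernel is not flipped.
    """
    k = len(kernel)
    height, width = len(image), len(image[0])
    return [
        [
            sum(kernel[dy][dx] * image[y + dy][x + dx] for dy in range(k) for dx in range(k))
            for x in range(width - k + 1)
        ]
        for y in range(height - k + 1)
    ]

# Hand-picked kernels; a CNN would learn values like these instead.
BLUR = [[1 / 9] * 3 for _ in range(3)]              # equal weights: blurs
SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]    # negative neighbors: sharpens
VERTICAL = [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]]  # responds to thin vertical lines

# 5x5 test images: a one-pixel-wide vertical line and a horizontal line.
vertical_line = [[1 if x == 2 else 0 for x in range(5)] for _ in range(5)]
horizontal_line = [[1 if y == 2 else 0 for _ in range(5)] for y in range(5)]

print(convolve(vertical_line, VERTICAL))    # [[-3, 6, -3], [-3, 6, -3], [-3, 6, -3]]
print(convolve(horizontal_line, VERTICAL))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

The vertical-line kernel fires strongly on the middle column of the first image and not at all on the horizontal line, which is the behavior the paragraph attributes to individual neurons in a cat's visual cortex.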

A layer of a convolutional neural network consists of a number of such image-transforming neurons, each emphasizing a different aspect of the image. An image then becomes split into a number of feature channels: instead of the initial RGB, we get one channel per kernel. That is hardly a solution to the input size problem. We use pooling layers for that. A pooling layer takes non-overlapping square neighborhoods of pixels, finds the highest value and returns that as the value of the neighborhood. Note that if we did this on the original image, we'd just get a badly pixelated, somewhat brighter miniature. When a pooling layer's input comes from a convolutional layer, however, its response means "there is this feature in this area", which is actually useful information. Pooling layers also make the network less sensitive to where exactly the features are in the image.
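Max pooling, as described above, fits in a few lines. In this sketch the two 4x4 feature maps are hypothetical outputs of a single convolutional channel; the strong response (9) sits at different positions inside the same 2x2 block, and the pooled result is identical.

```python
def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size block."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[y + dy][x + dx] for dy in range(size) for dx in range(size))
            for x in range(0, width - size + 1, size)
        ]
        for y in range(0, height - size + 1, size)
    ]

# Hypothetical feature maps: the same feature detected one pixel apart.
a = [[0, 9, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
b = [[0, 0, 0, 0],
     [9, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]

print(max_pool(a))  # [[9, 0], [0, 1]]
print(max_pool(b))  # [[9, 0], [0, 1]]
```

Each pooled value answers "is this feature present somewhere in this block?", and the map has a quarter as many values as its input.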

"Park or Bird" problem by XKCD

Still, knowing that there are vertical or horizontal lines, gradients or edges doesn't help us detect whether the photo contains a bird (see the famous "park or bird" problem, finally solved by Flickr). Well, we made a step in the right direction, so why not make the next step and stack another set of convolutional neurons and a pooling layer on top of this, creating a deep neural network? It turns out that, with enough layers, a network "gets" quite complex features. A face is, after all, a combination of eyes, nose and mouth, with a chance of ears and hair. We can then use these complex features as input for a regular feed-forward neural network and train it to return a category: bird, dog, building, electric guitar, school bus or pagoda.
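To see how stacking convolution and pooling layers tames the input size, it helps to follow the shape of the data through a small stack. This sketch assumes 3 x 3 kernels without padding and non-overlapping 2 x 2 pooling. The layer sizes (16 and then 32 kernels on a 32 x 32 RGB image) are hypothetical and are not taken from any particular network.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Standard output-size formula for a convolution or pooling window.
    return (size + 2 * padding - kernel) // stride + 1

def trace_shapes(height, width, channels, layers):
    """Follow (height, width, channels) through a stack of hypothetical layers."""
    shapes = [(height, width, channels)]
    for kind, arg in layers:
        if kind == "conv":    # arg = number of kernels, i.e. output channels; 3x3 kernel, no padding
            height, width, channels = conv_out(height, 3), conv_out(width, 3), arg
        elif kind == "pool":  # arg = pooling window; non-overlapping, so stride equals the window
            height, width = conv_out(height, arg, stride=arg), conv_out(width, arg, stride=arg)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        shapes.append((height, width, channels))
    return shapes

LAYERS = [("conv", 16), ("pool", 2), ("conv", 32), ("pool", 2)]
for shape in trace_shapes(32, 32, 3, LAYERS):
    print(shape)
```

The final 6 x 6 x 32 block holds 1,152 values, fewer than the 3,072 raw values of a 32 x 32 RGB image, and each one summarizes a more complex feature than a single pixel. That block is what would be flattened and handed to the regular feed-forward classifier.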

In one experiment, a convolutional neural network was trained on YouTube videos and allowed to freely self-organize categories for what it could see. It self-organized a category of cats (see, it all goes back to cats again). In another, convolutional networks working together with recurrent neural networks trained on full sentences learned to describe images in full sentences. Recurrent neural networks are another topic.

Source:
Deep Fragment Embeddings for Bidirectional Image-Sentence Mapping by Andrej Karpathy

Dreaming deep

When deep learning neural networks started winning image analysis competitions, reaching near-human accuracy in labeling images, there was initially some resistance from the computer vision community. One complaint was that the networks were winning but, well, not showing their work. The leading methods at the time, when detecting a face, for example, would give the exact positions of the eyes and mouth and return various proportions of the purported face. A deep neural network would just detect faces with uncanny accuracy and not tell us how. While the inner workings of the first convolutional layer were well understood (lines, edges, gradients), it was impossible to tell how the second layer used the information given by the first. The layers above that were a mystery.

Researchers have been successful in fooling these deep neural networks by generating images with an evolutionary algorithm. Make a random noise image and check the network's response. If the network thinks there's a chance of a bus in the image, generate more images like that (offspring) and feed those to the network again. The winners are the ones that increase the network's certainty, and they get to contribute to the next generation. What looks like random noise to us will be interpreted as a bus by the network. If we also optimize the generated images to have the statistical properties of real samples, the noise turns into shapes and textures that we can identify. Such generated images tell us, for example, that stripes are a key feature of bees, that bananas are usually yellow and that anemone fish have orange-white-black stripes.
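The evolutionary loop above can be sketched as follows. No real network is available here, so `bus_score` is a hypothetical stand-in that simply rewards similarity to an arbitrary hidden pattern; a real experiment would query a trained image classifier instead. The image size, population size, mutation rate and number of survivors are our own assumptions.

```python
import random

# Hypothetical stand-in for a trained network's confidence that an 8x8 image shows a "bus".
# Here the "network" just prefers an arbitrary hidden pattern (alternating vertical stripes).
HIDDEN_PATTERN = [float(i % 2) for i in range(64)]

def bus_score(image):
    """Toy confidence in [0, 1]: 1 minus the mean squared distance to the hidden pattern."""
    return 1.0 - sum((p - q) ** 2 for p, q in zip(image, HIDDEN_PATTERN)) / len(image)

def mutate(parent, rng, rate=0.1, scale=0.3):
    """Offspring: copy the parent and nudge a few pixels, clamped to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.uniform(-scale, scale))) if rng.random() < rate else p
            for p in parent]

def evolve(score, generations=200, population=20, survivors=5, seed=0):
    rng = random.Random(seed)
    # Start from pure random noise images.
    pop = [[rng.random() for _ in range(64)] for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        winners = pop[:survivors]  # the most convincing images contribute to the next generation
        pop = winners + [mutate(rng.choice(winners), rng) for _ in range(population - survivors)]
    return max(pop, key=score)

noise_best = evolve(bus_score, generations=0)  # best of the initial random noise
evolved = evolve(bus_score)
print(round(bus_score(noise_best), 3), round(bus_score(evolved), 3))
```

Keeping the winners unchanged from one generation to the next (elitism) means the best score can never get worse; the search only ever follows what the scoring function rewards, which is why the result can look meaningless to a human.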

Another approach is to feed a real image to the network, pick a layer, amplify whatever that layer detects in the image, and feed the result back to the network. For the first layer, as expected, the network enhances the dominant line directions in the image, giving it an impressionistic style of wavy brush strokes, dots or swirls. If we do that with a higher layer, reversing the process through the lower layers, we get an image painted with textures like fur, wood, feathers, scales, bricks, grass, waves, spaghetti or meatballs. Still higher layers turn any vertical lines into legs and draw eyes and noses on shapes that vaguely look like faces (by the way, people do that too; it's called pareidolia). Since there are a lot of animal photos in the training image set, the generated images often contain parts of animals, sometimes gruesomely distorted. Animal faces are superimposed on human faces, spiky leaves become bird beaks and everything seems a bit hairy. Our brains struggle to make sense of the network's dreams. The images are disturbing because they remind us of something, but at the same time, not exactly. This makes them interesting, maybe disturbing, and viral on the Internet.
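The "amplify whatever the layer detects" step can be sketched as gradient ascent on the squared activations of a chosen layer, the objective used in Google's published DeepDream notebook. In this toy version, a single hand-made vertical-line kernel stands in for a trained network layer, so there are no lower layers to back-propagate through; the names, step size and test image are our own assumptions.

```python
def layer(img, kernel):
    """Toy one-kernel 'layer': valid cross-correlation of a 2-D list with a square kernel."""
    m = len(kernel)
    return [
        [sum(kernel[dy][dx] * img[y + dy][x + dx] for dy in range(m) for dx in range(m))
         for x in range(len(img[0]) - m + 1)]
        for y in range(len(img) - m + 1)
    ]

def energy(img, kernel):
    # DeepDream-style objective: half the sum of squared activations of the chosen layer.
    return 0.5 * sum(a * a for row in layer(img, kernel) for a in row)

def dream_step(img, kernel, step=0.05):
    """One gradient-ascent step that amplifies whatever the layer already detects."""
    m = len(kernel)
    grad = [[0.0] * len(img[0]) for _ in img]
    # d(energy)/d(pixel) = sum of activation * kernel weight over every window containing the pixel.
    for y, row in enumerate(layer(img, kernel)):
        for x, a in enumerate(row):
            for dy in range(m):
                for dx in range(m):
                    grad[y + dy][x + dx] += a * kernel[dy][dx]
    # Scale the step by the mean absolute gradient, a normalization trick from Google's notebook.
    mean_abs = sum(abs(g) for r in grad for g in r) / (len(img) * len(img[0])) or 1.0
    return [[p + step * g / mean_abs for p, g in zip(pr, gr)] for pr, gr in zip(img, grad)]

VERTICAL = [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]]
# A 6x6 image with a faint vertical line in column 3.
img = [[0.2 if x == 3 else 0.0 for x in range(6)] for _ in range(6)]
for _ in range(5):
    img = dream_step(img, VERTICAL)
    print(round(energy(img, VERTICAL), 3))  # grows every step: the faint line is "dreamed" stronger
```

Because this objective is a convex quadratic in the pixels, every step along the gradient strictly increases it; with a real multi-layer network, back-propagation supplies the gradient through the lower layers instead.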

What does this say about the history of art, if a lower layer paints like Van Gogh or Seurat, while a higher layer reminds us of Picasso or Dalí?

Left: original photo by Zachi Evenor. Right: processed by Günther Noack, software engineer

Going deeper

Things get really interesting if we directly suggest to the network what it should see by triggering the last, classification layer. I'll leave the explanation of how that trigger is passed back through the network to those who have done it. Instead of starting with a random image, it helps to take a real image, blur it a bit, zoom in, and let the network "enhance" it by drawing what it sees. When we repeat the process, we get a potentially infinite zooming sequence of bits and pieces of the network's sample data.
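The blur, zoom and enhance loop can be sketched as a generator. The `enhance` callable is a hypothetical placeholder for the network's dreaming step (for example, a few gradient-ascent steps towards a chosen class); the 3x3 box blur, the 1.25 zoom factor and the nearest-neighbor resampling are our own assumptions.

```python
def box_blur(img):
    """3x3 mean filter with edge clamping, so the output keeps the input size."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[sum(px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9
             for x in range(w)] for y in range(h)]

def zoom_in(img, factor=1.25):
    """Crop the center and scale it back up to full size with nearest-neighbor sampling."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    return [[img[round(cy + (y - cy) / factor)][round(cx + (x - cx) / factor)]
             for x in range(w)] for y in range(h)]

def dream_zoom(img, enhance, frames=10):
    """Yield successive frames: blur a bit, zoom in, let the (hypothetical) network enhance."""
    for _ in range(frames):
        img = enhance(zoom_in(box_blur(img)))
        yield img

# Usage: an identity function stands in for the network so the sketch runs on its own.
start = [[(x * y) % 3 / 2 for x in range(8)] for y in range(8)]
frames = list(dream_zoom(start, enhance=lambda im: im, frames=3))
print(len(frames), len(frames[-1]), len(frames[-1][0]))  # 3 8 8
```

Because every frame keeps the same size and is fed back in, the sequence can in principle continue indefinitely, each zoom giving the network new blurry detail to "enhance".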

The Large Scale Deep Neural Network (LSD-NN) is able to hallucinate like this. It was made by Jonas Degrave and team, known as 317070 on GitHub. Interestingly, this one was built before Google published the Deep Dream code. 317070's network hallucinates in a Twitch stream, where users can shout categories from the ImageNet database and the network produces images that remind it of what the user suggested. The network doesn't exactly draw the requested objects, but it gets the essence. When the users ask for volcanoes, there's smoke and lava. When the users shout for pizza, there's melting cheese. You can see sausages in a butcher shop and spiders on spiderwebs, but most of the time it's a mesmerizing, colorful soup of textures and shapes. Really. Try it.

You can call it fun or nightmarish, but we learn a lot from making networks dream. We have found the equivalent of the first convolutional layer in cat brains. We have a model which supports the theory that dreaming helps us remember (networks can be trained on their own generated sample data). We have basically simulated pareidolia; perhaps we can infer some information about the mechanism behind schizophrenia.

As with chess and, more recently, Jeopardy, machines have crossed yet another threshold and taken over something we used to be better at. Remember when CAPTCHAs were a good way to stop crawlers and bots from stealing your online data? Not anymore. Now it's just a challenge of the bot's computational ability. The viral images under the #deepdream hashtag are just a sign that machine vision is becoming mainstream, and it's time to accept that dreaming of electronic sheep (or dogs) is just something androids do.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
