Solving Bongard problems with deep learning


Yunqi community guide: This article introduces Bongard problems and deep learning, and shows how deep learning methods can be used to solve them.


Bongard problems were proposed by the Soviet computer scientist Mikhail Bongard. Working on pattern recognition since the 1960s, he designed 100 such puzzles as a benchmark for pattern-recognition ability, and they remain challenging for both people and algorithms. A simple example:


Figure 1


As the figure shows, the six images on the left conform to one rule, while the six images on the right conform to a different rule. To solve the problem, you need to understand the patterns and find both rules (i.e. the solution). Here the rule is: "Left: triangles; right: quadrilaterals."

This example is simple and can be solved in a matter of seconds, but there are harder problems, such as:


Figure 2


You can try to find the rules yourself to test your own pattern-recognition ability; these are problems 22, 29, 37, and 54.


These problems became more widely known through Douglas Hofstadter's 1979 book "Gödel, Escher, Bach: An Eternal Golden Braid". Hofstadter's PhD student, Harry Foundalis, built an automated system for solving them, called Phaeaco, as his doctoral research project. Phaeaco not only solves Bongard problems but is also a cognitive architecture for visual pattern recognition.


Deep learning and Bongard problems


Phaeaco, created in 2006, was very influential: it demonstrated solutions to 15 problems, in many cases solving them more efficiently than humans. In principle it could solve more, but that would require extra work to extend its feature extractors and detectors.


Recent advances in artificial intelligence and machine learning have been significant. Convolutional neural networks (CNNs) trained on GPUs have been winning the ImageNet competition for years, and CNN algorithms and architectures continue to improve.


So I thought deep learning methods might help solve Bongard problems. The Bongard problems in fact stimulated my study of deep learning, but because they use simplified images and provide only a handful of examples per class, it was uncertain whether deep learning could answer the original questions.


Nevertheless, I decided to try to solve them.


Problem formulation and method


There are at least two difficulties in applying deep learning methods to Bongard problems.


1. It is a one-shot learning problem. Deep learning is most effective in supervised settings with abundant data: trained on millions of labeled images, neural networks show classification performance similar to or even better than humans.


But when learning from only a few examples — "one-shot learning" — machine learning approaches are far less flexible than humans and perform far worse. Each class in a Bongard problem has only six examples, which makes it hard to solve algorithmically.


2. It is actually a multimodal learning problem: the input is an image, while the output is a classification rule described in natural language. Although some methods address such problems, I found no definitive solution, so I decided to start with something simpler and eventually expand to the full formulation.


Instead of producing a verbal description of the rules, the Bongard problem can be treated as a classification problem. The 12 images are split into two groups: 10 "training" images and 2 "test" images. For each training image, the class — left or right — is known. The two test images are randomly swapped, and their classes are unknown. Solving the problem means looking at the training images and then deciding which class each test image belongs to.


Figure 3 shows the classification scheme for this formulation.


Figure 3


Having simplified the problem statement, I decided to use transfer learning. It is one approach to one-shot learning and works well on visual tasks: first train a model on many examples similar to the target problem, then reuse the model's learned parameters.


Deep neural networks learn hierarchical feature representations of their training data. A convolutional neural network trained on Bongard-like images will learn features corresponding to the different geometric shapes; each feature acts as a filter, a detector that activates when the corresponding shape is present.


To train a feature-extraction neural network (NN), I had to create a new dataset; the Bongard problem images themselves are too few and too similar to use.


Synthesizing a dataset


To train the feature-extraction network, I generated a set of random images resembling those in Bongard problems. Each image contains one geometric shape, randomly rotated, positioned, and scaled. Closed shapes may also be randomly filled with black. There are 24 classes in total; examples:


Figure 4


I generated a training set of 1M images and a test set of 10K images.
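To illustrate how such images might be generated, here is a minimal NumPy sketch that rasterizes one randomly rotated, scaled, and positioned triangle outline. The real dataset covered 24 shape classes with optional black fill; the function name and all parameter ranges below are illustrative assumptions, not the author's exact settings.

```python
import numpy as np

def random_triangle_image(size=64, rng=None):
    """Rasterize one randomly rotated, scaled, positioned triangle
    outline into a binary image (one class of a synthetic shape set)."""
    if rng is None:
        rng = np.random.default_rng()
    # unit equilateral triangle centered at the origin
    base = np.array([[0.0, 1.0], [0.87, -0.5], [-0.87, -0.5]])
    theta = rng.uniform(0, 2 * np.pi)                 # random rotation
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    scale = rng.uniform(0.15, 0.28) * size            # random scale
    center = rng.uniform(0.35, 0.65, size=2) * size   # random position
    verts = base @ rot.T * scale + center
    img = np.zeros((size, size), dtype=np.uint8)
    # draw each edge by densely sampling points along the segment
    for i in range(3):
        p, q = verts[i], verts[(i + 1) % 3]
        for t in np.linspace(0.0, 1.0, 4 * size):
            x, y = (1 - t) * p + t * q
            img[min(int(round(y)), size - 1), min(int(round(x)), size - 1)] = 1
    return img
```

Sampling this function with fresh random seeds would yield an arbitrarily large labeled set of one class; the remaining shape classes would follow the same pattern with different base polygons.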


Neural Network


To train a classifier on the generated synthetic images, I used a relatively small neural network. It is based on the "Darknet Reference" model, with some maxpool layers removed because the input images are relatively small. It has 9 convolutional layers; the architecture is described below:


Figure 5


After 8 epochs of training it converged to acceptable accuracy: top-1: 0.848, top-2: 0.968.


Processing the neural network output


The first step in building a classifier for a Bongard problem is a forward pass of all 12 images through the neural network. In a convolutional neural network, each layer has a set of filters with shared weights, and each filter's response forms a feature map. Figure 6 shows feature maps from all layers: the input image is on the left and is processed layer by layer from left to right.
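As a refresher on what "each filter's response forms a feature map" means, the response is just a sliding dot product of the filter over the image. A minimal, framework-free NumPy sketch, using a made-up vertical-edge kernel for illustration:

```python
import numpy as np

def feature_map(image, kernel):
    """Valid-mode cross-correlation of one filter over one image:
    each output value is the filter's response at that position,
    and the full output grid is that filter's feature map."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# a vertical-edge filter responds where intensity changes left-to-right
img = np.zeros((5, 5))
img[:, 2:] = 1.0                  # dark left half, bright right half
edge = np.array([[-1.0, 1.0]])    # illustrative hand-made kernel
fmap = feature_map(img, edge)     # nonzero only along the edge column
```

Real CNN layers add many such filters per layer, a bias, and a nonlinearity, but the feature maps shown in Figure 6 are built from exactly this kind of response.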


Figure 6


Each value in an activation map (each "pixel") is potentially a feature. However, these feature values are not invariant to the position, orientation, scale, and other parameters of the input image. A classifier trained on only 10 images with such features is unlikely to find an abstract classification rule; it will quickly fit the training images instead.


To make the features invariant, each feature map is collapsed into a single binary feature as follows: 1) normalize the feature maps across each layer; 2) apply a threshold of 0.3 (Figure 7); 3) if the feature map contains values above the threshold, set the resulting feature to 1, otherwise 0 (Figure 8).


Figure 7: Normalized and thresholded feature maps

Figure 8: Binary features derived from the feature maps


In this way each image can be described by a set of binary CNN features. I used only the features from layers 6-10, which contain 1050 feature maps in total, so each image is described by a binary vector of length 1050.
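A sketch of this binarization in NumPy. The per-layer max normalization below is one plausible reading of step 1 (the exact normalization scheme is not spelled out above), and the function name is an illustrative choice:

```python
import numpy as np

def binarize_features(layer_maps, threshold=0.3):
    """Collapse every feature map to a single binary feature.

    layer_maps: list of arrays, one per layer, each shaped
    (n_maps, H, W) with the activations for a single image.
    """
    bits = []
    for maps in layer_maps:
        peak = maps.max()
        norm = maps / peak if peak > 0 else maps  # normalize layer to [0, 1]
        # a map yields 1 if any of its normalized values clears the threshold
        bits.append((norm.reshape(len(maps), -1).max(axis=1) > threshold).astype(int))
    return np.concatenate(bits)
```

Applied to the 1050 maps of layers 6-10, this yields the length-1050 binary descriptor for each of the 12 images.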


Finding a classifier that solves the problem


With the features extracted, they can be used for the actual classification problem. I decided to use the simplest possible classifier: a one-level decision tree. It is usually a building block of more complex classifiers, but in this case the simplest classifier is sufficient — a single feature value determines whether an image belongs to the left or the right side.


The algorithm for learning this rule is a simple direct search. It can be demonstrated by example:

1) Extract features from the 10 training images. As described above, each image is represented by a binary feature vector that describes the image from the network's point of view.


Figure 9: Binary descriptors based on CNN features


2) For each feature, check its values across all 10 training images. If the feature values of the 5 left images all differ from those of the 5 right images, the feature is a candidate classifier.


3) If there are several candidate classifiers, a validation criterion is needed to select one. The feature values of the two test images can be compared: since the test images belong to different classes, their feature values should differ. Note that nothing about the correctness of the test classification is used — only the feature values of the test images serve as the validation criterion.


4) Apply the validated classifier to the test images and check whether they are classified correctly. If so, the problem is considered solved correctly.


5) If no candidate rule is found, or none passes validation, the problem is considered unsolved.
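The five steps above can be sketched end-to-end in a few lines, with binary descriptors in and a validated single-feature classifier out. The function name and the tie-break of taking the first validated feature are illustrative assumptions:

```python
import numpy as np

def solve_bongard(train_left, train_right, test_pair):
    """Direct search for a one-feature classifier.

    train_left, train_right: (5, n_features) binary descriptors.
    test_pair: (2, n_features), one image from each side, order unknown.
    Returns (feature_index, predicted_sides) or None if unresolved.
    """
    candidates = []
    for f in range(train_left.shape[1]):
        l, r = train_left[:, f], train_right[:, f]
        # step 2: constant within each side, and the two sides disagree
        if l.min() == l.max() and r.min() == r.max() and l[0] != r[0]:
            candidates.append(f)
    # step 3: the two test images must take different values on the feature
    validated = [f for f in candidates if test_pair[0, f] != test_pair[1, f]]
    if not validated:
        return None  # step 5: no rule found or none validated
    f = validated[0]
    left_val = train_left[0, f]
    # step 4: assign each test image to the side whose value it matches
    sides = ["left" if v == left_val else "right" for v in test_pair[:, f]]
    return f, sides
```

On synthetic descriptors where one feature perfectly separates the sides, the search returns that feature and the matching test assignment.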


Table 1 shows the search for classifying features on problem 6, shown in Figure 1. Only the features whose values differ between left and right images are displayed. Only feature 731 passed validation, and its test classification turned out to be correct.


Table 1: Feature analysis for problem 6


Table 2 shows an example of misclassification (problem 62). Although two features were selected as classifiers and passed validation, the test images were still classified incorrectly.


Table 2: Error analysis for problem 62


The code for the visualizations above and for the classification is available on GitHub.


Results Analysis


Running the algorithm above on 232 problems gave the following results: 47 resolved, 41 of them correctly, i.e. a resolution rate of 20% and an accuracy of 87%.


To display the results better, the solved problems are shown in Table 3 in different colors: green = correct, red = incorrect.


Table 3: Solved problems


Different parts of the problem set have different difficulty; according to the results, the first few dozen problems are relatively simple. Figure 10 shows accuracy as a function of problem number.


Figure 10: Solution accuracy, ordered by problem number


Table 4 shows the accuracy broken down by problem author. The first 100 problems, designed by M. Bongard, are the easiest to solve with the algorithm described in this article; the remaining problems are more challenging.


Table 4: Average accuracy by problem author


In his thesis, H. Foundalis collected data on humans solving problems 1-100. Figure 11 shows the correct-solution rate for the top 20 problems by score. Every problem is unique and the results vary, suggesting that some problems are quite challenging even for humans.


Figure 11: Human accuracy on Bongard problems


Conclusion


Even fairly simple deep learning methods are useful for solving Bongard problems, at least in the simplified classification form of the problem. Interestingly, M. Bongard predicted a similar approach in his book "Pattern Recognition": the feature space for classifying images is built by pre-training a neural network on images similar to those in the problem domain. The original formulation, which includes stating the classification rule in natural language, is fairly easy for people, and it seems tractable for "classic" algorithms such as Phaeaco that are built on hand-crafted features and pattern detectors. For neural networks, however, transparent and interpretable solutions are a known weakness, so the original formulation may be more challenging for them. There are several possible approaches:


Create a multimodal synthetic dataset that pairs Bongard-style images with natural-language explanations of the rules, and use it for supervised learning. However, generating meaningful puzzles is difficult, and even if it were possible, I am not sure transfer learning would still work in that case.


CNN visualization methods could be used to explain solutions: highlight the pixels used for classification, and show which patterns the filters used for classification respond to. Whether such a visual explanation can be considered as adequate as a verbal one is worth exploring.


Generative neural network architectures could also be applied to this problem, such as variational autoencoders (VAEs) or generative adversarial networks (GANs). In that case the solution would be an "explanation by example".


"What I cannot create, I do not understand." — Richard Feynman


To paraphrase the famous saying: "What I can create, I can understand." If a neural network generates new example images for a Bongard problem, and the generated images capture the concept expressed by the classification rule, that would be sufficient evidence of the network's understanding of the problem.


Overall, Bongard problems will remain a challenging benchmark for machine learning.


This article was translated by the Alibaba Cloud community.

Original title: "Solving Bongard problems with deep learning"

Author: Sergii Kharagorgiev

Translator: Ultraman; Reviewer: roman.


End

