CD cover retrieval sample
Imagine meeting a dog one day. You like it, but you can't figure out what breed it is, and you want an answer quickly. A search engine would normally help with this kind of question, but here, finding the right keywords to describe the dog is the harder problem.
The search engine has not failed, though; you just need to change the input mode, swapping text for an image. Letting users launch a search through multiple media is exactly the direction in which search engines have been evolving.
A few days ago, a Baidu engineer, Li, showed me an unreleased new product: Baidu mobile visual search, built for the iOS and Android platforms. The technology will ship in Baidu's mobile client: images captured with a phone camera help users find the book they want, a CD, or even that dog.
My first reaction: is this any different from the existing image search on the PC side, simply moved to mobile?
The answer I got: the two products do not differ fundamentally in how they select feature representations; the differences lie in the databases and the algorithms. Baidu Shitu (shitu.baidu.com), launched a few years ago, simply looked for identical pictures, which had limited practical value (it has recently added face recognition). The visual search now headed for mobile phones clearly does not want to end up as another feature of marginal value.
Book cover retrieval sample
The first difference is a narrower scope of application. The product will be rolled out category by category: first book search, then music CDs, and so on. This is the database change mentioned above: the new product works against much smaller datasets, which also gives the algorithm more room to maneuver.
The larger the database, the more demanding it is on the algorithm. A search algorithm is measured mainly along two dimensions: effect and efficiency. Effect means precision and recall, the two measures that determine search quality; efficiency can be understood as queries processed per unit of time, which directly determines how long a single query takes.
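To make "effect" concrete, precision and recall can be computed directly from a query's results. Here is a minimal sketch; the function and the toy numbers are my illustration, not Baidu's:

```python
def precision_recall(retrieved, relevant):
    """Precision: what fraction of retrieved items are relevant.
    Recall: what fraction of relevant items were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# A query returns 5 covers, 3 of them correct, while the database
# holds 4 correct covers in total.
print(precision_recall([1, 2, 3, 4, 5], [1, 2, 3, 9]))  # (0.6, 0.75)
```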
In fact, a smaller database makes it possible to sacrifice some of the algorithm's efficiency in exchange for better effect.
How is the effect improved? Li explained two ways. The first is to extract more feature representations; the benefit of more features is obvious. The second is to improve the index structure, turning a single-level index into a multilevel one: run a coarse first search over the images, then search again within those results, and so on, until the final results are returned to the user.
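A multilevel index of this kind can be sketched as a cheap coarse filter followed by a more expensive re-ranking pass. Everything below (the feature dimensions, candidate counts, and function names) is my own illustration under that assumption, not Baidu's implementation:

```python
import numpy as np

def coarse_candidates(query_vec, db_vecs, k=100):
    """First level: cheap dot-product scores over the whole database,
    keeping only the top-k candidates."""
    scores = db_vecs @ query_vec
    return np.argsort(scores)[::-1][:k]

def rerank(query_vec, db_vecs, candidates, k=10):
    """Second level: a more expensive comparison (here, full Euclidean
    distance) applied only to the surviving candidates."""
    dists = [np.linalg.norm(db_vecs[i] - query_vec) for i in candidates]
    order = np.argsort(dists)
    return [candidates[i] for i in order[:k]]

# Toy usage: 10,000 database images with 128-d descriptors.
rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 128)).astype(np.float32)
q = db[42] + 0.01 * rng.standard_normal(128).astype(np.float32)
cands = coarse_candidates(q, db)
print(rerank(q, db, cands)[:3])  # image 42 should rank first
```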
That, roughly, is the difference.
Movie poster retrieval sample
In the internal beta I saw, Baidu mobile visual search already covers several categories: books, music CDs, entertainment posters, text recognition, translation, and the dog mentioned at the start.
Take the iOS platform as an example. After the official release, the user opens the "Baidu Search" app and taps the camera icon on the right side of the search box to pop up the visual search menu. To find a book, just follow that procedure and photograph the book to get information about it, including its price at multiple online stores.
The same goes for CDs and posters. You don't have to photograph the whole object; a partial shot can still find the desired result, and even an oblique or upside-down photo won't affect the search results. This comes down to how features are extracted for recognition: machines and humans clearly identify patterns in different ways. The product manager decides which new category to expand to next.
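That kind of invariance to rotation and partial occlusion is typically achieved with local feature descriptors matched point by point rather than whole-image comparison. A minimal sketch using OpenCV's ORB features follows; the library and parameters are my choice, since the article doesn't say what Baidu uses:

```python
import cv2

def match_count(query_path, db_path, ratio=0.75):
    """Count local-feature matches between two images. Because each
    ORB keypoint is described independently and carries its own
    orientation, matches survive rotation and partial occlusion."""
    orb = cv2.ORB_create(nfeatures=1000)
    img1 = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(db_path, cv2.IMREAD_GRAYSCALE)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    # Lowe's ratio test: keep a match only if it is clearly better
    # than the second-best alternative.
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < ratio * n.distance]
    return len(good)

# A cropped or rotated photo of the same cover should still yield far
# more matches than an unrelated image.
print(match_count("photo_of_cover.jpg", "cover_in_database.jpg"))
```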
The easiest categories to expand into are "rigid bodies", that is, objects whose shape and volume do not change easily. For these, machine learning can run in an "unsupervised" mode, with no manual intervention guiding the learning process, which makes the approach fast and simple.
Clothes are a counterexample to rigid bodies: they deform easily. In "supervised" mode, a machine can learn to judge clothing types such as long sleeve versus short sleeve, but the deformation is too great for the results to be practical.
In fact, the dog mentioned at the start is another counterexample.
I was curious how the machine learns to recognize dogs in "supervised" mode. The process turns out to be simpler than it sounds: instead of writing a feature definition for a dog, you just provide enough labeled pictures to sketch a rough decision boundary, and the machine finds its own way to judge what is a dog.
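In outline, that supervised process amounts to fitting a classifier on labeled examples. A toy sketch with scikit-learn follows; the random features stand in for real image descriptors, which the article doesn't specify:

```python
import numpy as np
from sklearn.svm import SVC

# Stand-ins for feature vectors extracted from labeled photos:
# label 1 = dog, label 0 = not a dog.
rng = np.random.default_rng(0)
dog_feats = rng.normal(loc=1.0, size=(200, 64))
other_feats = rng.normal(loc=-1.0, size=(200, 64))
X = np.vstack([dog_feats, other_feats])
y = np.array([1] * 200 + [0] * 200)

# The classifier learns a decision boundary from the examples alone;
# no one hand-writes a definition of "dog".
clf = SVC(kernel="rbf").fit(X, y)

new_photo_feats = rng.normal(loc=1.0, size=(1, 64))
print("dog" if clf.predict(new_photo_feats)[0] == 1 else "not a dog")
```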