Image Information Retrieval on the Internet

Source: Internet
Author: User

Two stages of image retrieval technology

With the rapid growth of the Internet, the amount of multimedia information online has increased dramatically, and the demand for retrieving it has grown accordingly. Traditional information retrieval concentrates mainly on text, and comparatively little research has addressed multimedia. Because images are the dominant form of multimedia on the Internet, image retrieval has become an active research topic.

Image retrieval on the Internet has gone through two stages. The first is keyword-based retrieval; the second is retrieval based on the content of the image itself.

In a keyword-based image retrieval system, all images must first be labeled; they can then be searched using full-text search technology. This approach has two problems. First, it requires a great deal of human effort, so it becomes impractical as the number of images grows. Second, an image contains a large amount of information, and different people understand the same image differently, so there is no unified standard for annotation. As a result, the search results often fail to meet users' needs.
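The keyword-based approach described above can be illustrated with a minimal sketch, assuming a tiny set of hand-written labels. The image names, label texts, and the functions `build_inverted_index` and `search` are hypothetical and are not taken from any particular system; a real full-text engine would add stemming, ranking, and so on.

```python
from collections import defaultdict

# Hypothetical hand-written annotations: image id -> label text.
annotations = {
    "img1.jpg": "red flower in a garden",
    "img2.jpg": "cat sleeping on a sofa",
    "img3.jpg": "white cat in a garden",
}

def build_inverted_index(labels):
    """Map each lowercase keyword to the set of images labeled with it."""
    index = defaultdict(set)
    for image_id, text in labels.items():
        for word in text.lower().split():
            index[word].add(image_id)
    return index

def search(index, query):
    """Return the images whose labels contain every query keyword."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_inverted_index(annotations)
print(sorted(search(index, "cat garden")))
```

The sketch also makes the annotation problem visible: an image is found only if the annotator happened to use the same words as the person searching.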

Content-based retrieval differs from keyword-based retrieval in that it requires little human participation and is more objective, because it retrieves images using their visual characteristics (such as color, texture, and shape). However, because these characteristics do not represent the real semantic information of the image, content-based retrieval results are often unsatisfactory. For this reason, most current systems are based on keyword search, such as AltaVista, Yahoo!, ditto, etc.

In this paper, we propose a new method of image retrieval on the Internet that combines keyword-based retrieval with content-based retrieval and introduces user feedback to optimize the retrieval results. The following section describes how to collect images on the Internet, build indexes, and retrieve them. We then describe how to incorporate user interaction and use relevance feedback to improve the retrieval results. Finally, we summarize image retrieval.
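As a rough illustration of the idea, and not the method of this paper (which is not specified at this point), the sketch below combines a text similarity and a visual similarity with a linear weight, and adjusts a query vector with the classic Rocchio relevance-feedback formula. The function names, the weight `alpha`, and the coefficients `a`, `b`, `c` are all assumptions chosen for illustration.

```python
def combined_score(text_sim, visual_sim, alpha=0.5):
    """Linear mix of keyword-based and content-based similarity (alpha is assumed)."""
    return alpha * text_sim + (1.0 - alpha) * visual_sim

def rocchio_update(query, relevant, non_relevant, a=1.0, b=0.75, c=0.25):
    """Move the query vector toward the mean of the images the user marked
    relevant and away from the mean of those marked non-relevant."""
    dims = len(query)

    def mean(vectors):
        if not vectors:
            return [0.0] * dims
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

    rel = mean(relevant)
    non = mean(non_relevant)
    return [a * q + b * r - c * n for q, r, n in zip(query, rel, non)]

# One feedback round on a two-dimensional toy feature space.
new_query = rocchio_update([1.0, 0.0], relevant=[[0.0, 1.0]], non_relevant=[[1.0, 0.0]])
print(new_query)
```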

Image retrieval on the Internet

To build an image retrieval system on the Internet, three problems need to be solved: first, how to acquire images from the Internet; second, how to build an index over the acquired images; and third, how to search the image database according to the user's needs.

1. Acquisition of images

There is a wide variety of images on the Internet, and we need to collect representative images that users are most likely to be interested in. First, a hierarchical structure for image classification is established, following the classification used by some popular search engines. Then, for each category, some popular and representative sites are selected as candidates. For example, sites covering sports, politics, entertainment, and news (http://) were selected as sources from which to download images.

Next, an efficient software tool (a crawler) is designed to collect images automatically from the selected sites. Every page in a site is sent to a page parser for analysis, and the images found on each page are recorded in the corresponding database by their links. At the same time, heuristic information such as image size, file type, file name, and color histogram is used for a simple classification of the images. This separates advertising banners, backgrounds, icons, buttons, and other images without semantic content from the images that users actually want to query.
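The parsing and filtering steps can be sketched as follows, using only the Python standard library. This is a minimal illustration, not the crawler described here: the class `ImageLinkParser`, the function `looks_like_content`, the file-name hints, the size threshold, and the aspect-ratio threshold are all assumptions, and the sample page uses placeholder `example.com` URLs.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

def _to_int(value):
    """Parse a width/height attribute; return None when missing or malformed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

class ImageLinkParser(HTMLParser):
    """Collect the absolute URL and declared size of every <img> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        if attr.get("src"):
            self.images.append({
                "url": urljoin(self.base_url, attr["src"]),
                "width": _to_int(attr.get("width")),
                "height": _to_int(attr.get("height")),
            })

# Assumed heuristics: allowed file types and file-name hints for decoration.
ALLOWED_TYPES = {".jpg", ".jpeg", ".png", ".gif"}
DECORATION_HINTS = ("banner", "icon", "button", "background", "spacer")

def looks_like_content(image, min_side=64, max_aspect=4.0):
    """Guess whether an image carries content rather than page decoration."""
    name = posixpath.basename(urlparse(image["url"]).path).lower()
    if posixpath.splitext(name)[1] not in ALLOWED_TYPES:
        return False
    if any(hint in name for hint in DECORATION_HINTS):
        return False
    w, h = image["width"], image["height"]
    if w and h:
        if min(w, h) < min_side:                 # tiny images: icons, bullets
            return False
        if max(w, h) / min(w, h) > max_aspect:   # long strips: ad banners
            return False
    return True

page = """
<img src="/img/cat.jpg" width="400" height="300">
<img src="icons/home_icon.gif" width="16" height="16">
<img src="http://ads.example.com/banner_top.jpg" width="468" height="60"/>
"""
parser = ImageLinkParser("http://example.com/sports/index.html")
parser.feed(page)
kept = [img["url"] for img in parser.images if looks_like_content(img)]
print(kept)
```

A color-histogram check, which the text also mentions, would require downloading and decoding the image and is left out of this sketch.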

2. Extraction and indexing of image features

To improve retrieval efficiency, features must be extracted from the collected images and corresponding indexes built. Image features fall into two kinds: low-level features, such as color, texture, and shape, and semantic features.

The low-level features used are mainly color, texture, and shape. Color features are widely used in image retrieval because they are largely independent of the size and orientation of the image and are insensitive to the background color of the image. Color features include the color histogram, the color correlogram, color moments, and so on. Texture features represent the visual pattern of an object, capturing the structure of the object's surface and its relationship with the surrounding environment. Common methods include the co-occurrence matrix, texture representations such as coarseness and contrast, and the wavelet transform. Shape features are of two kinds: those based on the boundary of a shape and those based on its region. The most successful representations are Fourier descriptors and invariant moments. These low-level features are extracted by the various methods and combined into a set of feature vectors, which are indexed and stored in the database.

Because low-level features do not directly represent the semantic information of an image, we also extract semantic features. We use the text associated with the image in the Web page to characterize its semantics. The relevant information is:
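As one concrete instance of a color feature, the sketch below computes a normalized RGB color histogram and compares two histograms by histogram intersection. It is a simplified illustration in pure Python: pixels are assumed to be given as `(r, g, b)` tuples with values 0 to 255, and the function names and the choice of 4 bins per channel are assumptions.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized RGB histogram with bins_per_channel**3 bins.

    Normalizing by the pixel count makes the feature independent of image size.
    """
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index 0..n-1.
        idx = ((r * n // 256) * n + (g * n // 256)) * n + (b * n // 256)
        hist[idx] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms; 1 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

small = [(255, 0, 0)] * 3 + [(0, 0, 255)]   # 4 pixels: mostly red, some blue
large = small * 4                           # same colors, 4 times as many pixels
green = [(0, 255, 0)] * 4

print(histogram_intersection(color_histogram(small), color_histogram(large)))
print(histogram_intersection(color_histogram(small), color_histogram(green)))
```

Because the histogram ignores where pixels are located, it is also unchanged when the image is rotated or flipped, which is the property the text refers to.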

The file name of the image and its URL: Most authors indicate the content of an image directly in its file name, as in Redflower.jpg, Cat.jpg, and clinton.jpg. The image URL also provides some semantic information; for example, Anim_birds.jpg conveys both the category of the image and its content.

Alternate text for the image (alternate text): This is the text shown in place of the image in a Web page. It is often used to describe the semantics of the image and is one of the most accurate features. However, not all authors provide it.

Text around the image (surrounding text): The text around the image is the most likely to describe the full content of the image. Although some of this text may be unrelated to the image, it still expresses the image's semantics to some extent, so it is chosen as one of the semantic features.

The title of the page where the image is located: Some images are used to reinforce the author's intent and are therefore directly related to the content of the page. The page title thus becomes one of the semantic features.

The hyperlink of the image (Hyperlink): The hyperlink information of an image is related to the content of the image to some extent, so some semantic features can be computed by analyzing hyperlinks.

The links between the pages containing the image (Link-structure): By analyzing the links between pages (link analysis), the semantic similarity of the images contained in those pages can be estimated to some extent. This information can be used to improve image retrieval.
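Three of the text sources listed above (file name and URL, alternate text, and page title) can be pulled from a page with a short parser. The sketch below is a minimal illustration using the Python standard library; the class `ImageTextParser`, the functions `tokenize` and `image_text_features`, and the sample page are hypothetical. Surrounding text, hyperlinks, and link structure need more context than a single tag and are not covered.

```python
import posixpath
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

class ImageTextParser(HTMLParser):
    """Collect the page title and the (src, alt) pair of every <img>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img":
            attr = dict(attrs)
            if attr.get("src"):
                self.images.append((attr["src"], attr.get("alt") or ""))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def image_text_features(html):
    """Return, for each image src, its keyword lists grouped by source."""
    parser = ImageTextParser()
    parser.feed(html)
    features = {}
    for src, alt in parser.images:
        path = urlparse(src).path
        stem = posixpath.splitext(posixpath.basename(path))[0]
        features[src] = {
            "filename": tokenize(stem),
            "url_path": tokenize(posixpath.dirname(path)),
            "alt": tokenize(alt),
            "title": tokenize(parser.title),
        }
    return features

page = """<html><head><title>Garden Birds</title></head><body>
<img src="/animals/Anim_birds.jpg" alt="Two red birds">
</body></html>"""
features = image_text_features(page)
print(features["/animals/Anim_birds.jpg"])
```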

All these features are extracted automatically from the Web page by the page analyzer and assigned different weights. Then, following traditional text retrieval techniques, they are combined into the semantic feature vector of the image. Each component of the vector corresponds to a keyword, and its value depends on the distribution of that keyword in the Web pages related to the image. If a keyword appears many times in a Web page, the corresponding component is larger; conversely, if the keyword appears in many pages, the corresponding component is smaller. This weighting, known in text retrieval as TF-IDF, is widely used there and is also suitable for image retrieval.
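The weighting just described can be sketched as a small TF-IDF computation with per-source weights. This is an illustration under stated assumptions: the values in `FIELD_WEIGHTS`, the function names, and the toy documents are hypothetical, and the text does not say which weights or which exact TF-IDF variant were used.

```python
import math
from collections import Counter

# Assumed importance of each text source; the actual weights are not given.
FIELD_WEIGHTS = {"filename": 3.0, "alt": 3.0, "title": 2.0, "surrounding": 1.0}

def weighted_term_counts(fields):
    """Sum keyword occurrences, scaling each by the weight of its source."""
    counts = Counter()
    for field, tokens in fields.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for token in tokens:
            counts[token] += weight
    return counts

def tfidf_vectors(docs):
    """docs maps image id -> {source: [keywords]}.

    Returns image id -> {keyword: weight}, where the weight grows with the
    keyword's (weighted) frequency for that image and shrinks as the keyword
    appears for more images.
    """
    counts = {image: weighted_term_counts(fields) for image, fields in docs.items()}
    n = len(counts)
    df = Counter()
    for c in counts.values():
        df.update(c.keys())          # document frequency: +1 per image
    return {
        image: {t: tf * math.log(n / df[t]) for t, tf in c.items()}
        for image, c in counts.items()
    }

docs = {
    "a.jpg": {"alt": ["cat", "cat"], "title": ["photo"]},
    "b.jpg": {"alt": ["flower"], "title": ["photo"]},
}
vectors = tfidf_vectors(docs)
print(vectors["a.jpg"])
```

In this toy example "photo" occurs for every image, so its component is zero and it does not help distinguish one image from another, while "cat" occurs twice for a single image and receives the largest weight.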
