Image retrieval sounds like a deep subject. While enjoying the search services of Google, Baidu, TinEye, and others, have you ever thought about building an image retrieval system of your own? opendir is a simple open-source document image retrieval application hosted on Google Code. Most image retrieval work targets exact or similarity matching of natural images, whereas opendir implements similarity retrieval for text-based images, such as scans of the documents we handle in daily life. The current version of opendir uses two features, a projection histogram feature and a density distribution feature, and matches images by computing the cosine similarity between their feature vectors.
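To make the matching step concrete, here is a minimal sketch of cosine similarity between two feature vectors. It is not opendir's actual code (opendir is a VC/OpenCV project); the function name cosine_similarity and the short example vectors are made up for illustration, and treating an all-zero vector as "no match" is my own assumption.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an all-zero vector is treated as matching nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Short illustrative vectors; real opendir vectors are much longer.
query = [3, 12, 20, 20, 17]
same = [3, 12, 20, 20, 17]
other = [73, 85, 63, 53, 28]

print(cosine_similarity(query, same))   # identical vectors score (almost exactly) 1
print(cosine_similarity(query, other))  # different vectors score lower
```

Because the feature values are non-negative, the score falls between 0 and 1, which is presumably why the results can be read as a percentage.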
Compile:
The project home page is http://code.google.com/p/opendir/. Download the source package from the download section and decompress it; it is a VC 2008 project. The project also provides a compressed package of executables.
Before compiling the project, make sure OpenCV has been configured in VC. OpenCV is a widely used open-source computer vision library; it can even be found in "Green Dam", the well-known filtering software. For installation and configuration details, see http://www.opencv.org.cn/ and the official website http://opencv.willowgarage.com/wiki/
I will not go into details here. For VS2010, you can download a prebuilt OpenCV for VS2010 directly and skip the build steps.
With VC 2008 or later, open the opendir project directly (or let the IDE convert it) and build. With older versions such as VC 2005, you can create an empty project, add all of opendir's files to it, and build, or simply change the version parameter in the vcproject file.
Usage:
opendir accepts two command-line options, -W and -r: -W is followed by the name of the output feature data file, and -r is followed by the name of the input feature data file.
The image library must be listed in the inputimage.txt file. When -W is used, the features of all background images are computed and then appended to the feature data file.
Once the feature data file exists, you only need the -r option to load it and start searching right away, without re-computing the features of the background images.
The watermark image is used for testing.
In the VC project's Properties > Debugging > Command Arguments field, enter "-W feature.txt -r feature.txt" (the content inside the quotation marks) so that opendir computes the feature data file and then loads it directly.
While it runs, we can see that feature.txt is generated:
imagepath=testimg\1.jpg
index=0
DDFLength=120
PHFLength=50
FusionFeature=3, 12, 20, 20, 17, 12, 8, 8, 8, 8, 7, 3, 0, 0, 1, 41, 83, 86, 85, 80, 62, 0, 0, 0, 0, 0, 0, 48, 98, 100, 82, 85, 75, 0, 0, 0, 0, 0, 0, 48, 99, 99, 93, 100, 58, 0, 0, 0, 0, 0, 0, 48, 100, 100, 95, 84, 59, 0, 0, 0, 0, 5, 15, 16, 18, 19, 18, 14, 8, 4, 4, 0, 0, 11, 22, 13, 13, 13, 14, 13, 7, 7, 7, 0, 0, 1, 24, 21, 11, 10, 7, 8, 7, 8, 2, 0, 0, 8, 20, 13, 11, 5, 5, 0, 0, 0, 0, 0, 0, 3, 4, 4, 4, 4, 4, 9, 11, 13, 11, 0, 70, 13, 88, 7, 38, 56, 53, 57, 53, 53, 57, 53, 53, 57, 53, 57, 53, 53, 57, 51, 53, 57, 53, 57, 53, 26, 34, 61, 43, 80, 85, 24, 16, 27, 100, 18, 24, 77, 52, 18, 15, 15, 53, 38, 17, 0, 0, 0, 79, 37
imagepath=testimg\2.jpg
index=1
DDFLength=120
PHFLength=50
FusionFeature=7, 16, 15, 17, 16, 16, 10, 14, 15, 12, 13, 10, 8, 1, 29, 31, 32, 32, 32, 32, 32, 31, 0, 0, 0, 3, 100, 99, 100, 100, 100, 100, 100, 100, 3, 0, 0, 3, 100, 98, 96, 98, 90, 100, 100, 100, 3, 0, 0, 3, 100, 100, 99, 99, 97, 100, 100, 100, 3, 0, 4, 9, 51, 51, 51, 51, 52, 50, 51, 48, 1, 0, 14, 10, 10, 6, 2, 2, 3, 2, 2, 2, 0, 0, 7, 23, 20, 15, 12, 8, 0, 0, 0, 0, 0, 0, 19, 25, 16, 16, 16, 15, 13, 16, 15, 15, 17, 13, 9, 14, 13, 13, 14, 12, 12, 5, 5, 5, 5, 6, 56, 58, 15, 45, 66, 12, 8, 15, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 12, 33, 80, 12, 7, 13, 30, 20, 26, 39, 42, 36, 4, 16, 53, 99, 100, 16, 0, 0, 0, 91, 48
imagepath=testimg\3.jpg
index=2
DDFLength=120
PHFLength=50
FusionFeature=73, 85, 63, 53, 28, 21, 21, 20, 20, 20, 20, 17, 18, 99, 98, 100, 64, 24, 21, 21, 22, 24, 23, 21, 40, 78, 50, 53, 53, 46, 51, 45, 44, 23, 27, 22, 43, 71, 52, 54, 27, 23, 24, 31, 20, 22, 24, 19, 17, 73, 61, 75, 69, 72, 57, 56, 45, 28, 26, 17, 18, 65, 64, 66, 45, 61, 46, 51, 41, 19, 24, 22, 43, 36, 24, 25, 24, 12, 9, 6, 0, 0, 0, 0, 63, 59, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 59, 46, 48, 35, 13, 2, 0, 0, 0, 0, 0, 0, 57, 59, 60, 46, 33, 37, 63, 44, 53, 48, 52, 47, 59, 60, 7, 18, 57, 72, 21, 31, 32, 36, 8, 17, 14, 100, 79, 96, 34, 1, 34, 70, 40, 17, 41, 91, 61, 96, 26, 21, 49, 47, 3, 8, 37, 48, 12, 18, 21, 14, 20, 29, 46, 30, 32, 25, 39, 55, 18, 9, 95, 43
imagepath=testimg\4.jpg
index=3
DDFLength=120
PHFLength=50
FusionFeature=35, 44, 44, 44, 41, 41, 23, 25, 32, 31, 32, 28, 81, 99, 99, 82, 86, 91, 9, 0, 1, 3, 26, 31, 81, 99, 100, 94, 100, 80, 22, 31, 47, 16, 39, 0, 81, 100, 100, 97, 85, 72, 36, 59, 51, 12, 12, 14, 12, 11, 10, 4, 0, 0, 0, 0, 0, 0, 0, 0, 12, 16, 18, 15, 5, 0, 0, 0, 0, 0, 0, 0, 8, 23, 24, 26, 25, 18, 4, 0, 0, 0, 0, 0, 15, 20, 17, 11, 12, 10, 10, 0, 0, 0, 0, 0, 14, 5, 6, 6, 5, 6, 4, 5, 4, 0, 0, 0, 15, 15, 9, 5, 5, 5, 5, 5, 5, 5, 5, 6, 57, 59, 41, 100, 73, 64, 77, 75, 67, 80, 81, 86, 85, 82, 83, 88, 89, 85, 89, 93, 23, 15, 28, 2, 9, 14, 32, 39, 32, 39, 45, 49, 53, 48, 53, 30, 12, 18, 46, 30, 4, 4, 71, 34, 11, 10, 1, 0, 93, 20
imagepath=testimg\5.jpg
index=4
DDFLength=120
PHFLength=50
FusionFeature=67, 66, 67, 55, 29, 20, 20, 20, 20, 20, 20, 17, 36, 25, 25, 21, 25, 22, 29, 26, 25, 26, 14, 0, 29, 90, 84, 71, 84, 78, 55, 53, 22, 19, 22, 19, 40, 40, 41, 41, 45, 37, 35, 41, 26, 0, 0, 0, 52, 86, 26, 20, 22, 23, 20, 20, 23, 24, 22, 19, 23, 86, 93, 57, 40, 30, 0, 0, 0, 0, 0, 0, 18, 67, 49, 38, 32, 30, 21, 23, 23, 23, 29, 22, 26, 85, 100, 56, 0, 0, 0, 0, 0, 0, 0, 0, 15, 55, 54, 52, 47, 33, 31, 28, 0, 0, 0, 0, 35, 46, 52, 42, 22, 20, 54, 46, 56, 50, 54, 49, 62, 63, 1, 5, 28, 5, 4, 6, 80, 97, 87, 58, 53, 18, 63, 30, 6, 6, 66, 34, 11, 13, 87, 53, 17, 41, 42, 23, 48, 49, 74, 70, 22, 21, 29, 31, 31, 32, 22, 50, 45, 70, 23, 21, 34, 21, 31, 1, 100, 45
imagepath=testimg\6.jpg
index=5
DDFLength=120
PHFLength=50
FusionFeature=28, 39, 40, 38, 40, 39, 32, 17, 34, 32, 32, 26, 37, 30, 30, 25, 15, 13, 14, 11, 18, 16, 16, 12, 0, 50, 25, 13, 0, 0, 0, 0, 0, 0, 0, 0, 27, 45, 18, 12, 12, 12, 12, 12, 12, 12, 12, 10, 42, 57, 42, 43, 41, 46, 48, 28, 29, 23, 12, 10, 47, 39, 27, 19, 15, 16, 17, 13, 15, 13, 14, 13, 8, 31, 30, 41, 39, 49, 43, 30, 27, 23, 7, 0, 12, 44, 43, 28, 31, 47, 27, 29, 52, 27, 28, 22, 0, 0, 0, 3, 73, 57, 5, 100, 46, 0, 0, 0, 24, 33, 35, 55, 81, 14, 41, 71, 13, 12, 12, 17, 59, 60, 20, 59, 12, 7, 11, 37, 97, 33, 11, 10, 18, 21, 11, 13, 1, 15, 26, 99, 92, 80, 50, 9, 11, 88, 61, 8, 10, 16, 59, 57, 45, 89, 100, 97, 20, 6, 11, 11, 14, 25, 27, 28, 29, 20, 14, 0, 95, 37
FusionFeature is the feature vector. For each image it consists of a 120-dimensional density distribution feature (DDFLength=120) and a 50-dimensional projection histogram feature (PHFLength=50), 170 values in total.
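If you want to inspect feature.txt outside of opendir, a small parser along these lines might help. It is only a sketch based on the field names visible in the dump above. The helper name parse_records is hypothetical, the sample uses shortened lengths (3 + 2 instead of 120 + 50), and I am assuming that the density distribution values come first in FusionFeature, which the source does not state explicitly. The pattern tolerates fields that are run together as well as fields on separate lines.

```python
import re

# Field names are taken from the feature.txt dump; whitespace between
# fields is optional so both fused and line-separated records parse.
RECORD = re.compile(
    r"imagepath=(?P<path>.+?)\s*index=(?P<index>\d+)\s*"
    r"DDFLength=(?P<ddf>\d+)\s*PHFLength=(?P<phf>\d+)\s*"
    r"FusionFeature=(?P<values>[\d,\s]+)"
)


def parse_records(text):
    """Parse feature records and split each fused vector into two parts."""
    records = []
    for match in RECORD.finditer(text):
        values = [int(v) for v in match.group("values").split(",") if v.strip()]
        ddf_len = int(match.group("ddf"))
        phf_len = int(match.group("phf"))
        if len(values) != ddf_len + phf_len:
            raise ValueError(f"unexpected vector length for {match.group('path')}")
        records.append({
            "path": match.group("path"),
            "index": int(match.group("index")),
            # Assumption: density distribution values precede the
            # projection histogram values inside FusionFeature.
            "ddf": values[:ddf_len],
            "phf": values[ddf_len:],
        })
    return records


# Shortened, made-up sample in the same layout as feature.txt.
sample = (
    "imagepath=testimg\\1.jpgindex=0DDFLength=3PHFLength=2"
    "FusionFeature=3, 12, 20, 79, 37"
    "imagepath=testimg\\2.jpgindex=1DDFLength=3PHFLength=2"
    "FusionFeature=7, 16, 15, 91, 48"
)

for record in parse_records(sample):
    print(record["path"], record["ddf"], record["phf"])
```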
Then, let's try test.jpg as the query:
This test.jpg is the image shown on the project's main page. It is a binarized photo of a document taken with a mobile phone. It is quite noisy, and there appears to be an illustration in the middle.
Input test.jpg and look at the similarity results:
1.jpg in testimg gets the highest similarity. Here is the original 1.jpg; judging from the positions where the paragraphs begin, it does look like the right match:
Simple performance testing:
Next, Xiao Jin wants to test the search efficiency with 1000 background images.
Collecting 1000 document images from the Internet is quite tiring. Here we take a lazy approach: find a PDF of around 1000 pages and use a tool such as ipv2image to generate 1000 JPG document images from it.
In the end, each image is around 850*1100 pixels, and the resolution is acceptable.
These images then need to be entered into the inputimage.txt configuration file. A simple way to do this is to open cmd, change to the directory where the images are stored, and list them with the DIR command. This has a drawback, of course: the images and the opendir executable must be in the same directory.
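As an alternative to listing the files by hand, a short script can build the list. This is only a sketch: the helper write_image_list is hypothetical, and I am assuming that inputimage.txt holds one image path per line, which the source does not spell out. The `testimg\1.jpg` entries in feature.txt suggest that paths containing a directory are accepted, but I have not checked how opendir handles absolute paths. The demo runs against a temporary directory so it does not touch any real files.

```python
import tempfile
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def write_image_list(image_dir, list_file):
    """Write the image files found in image_dir to list_file, one per line.

    Assumption: one path per line is the format opendir expects.
    Returns the number of images written.
    """
    paths = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    Path(list_file).write_text(
        "".join(f"{p}\n" for p in paths), encoding="utf-8"
    )
    return len(paths)


# Demo on a throwaway directory with two fake images and one other file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("2.jpg", "1.jpg", "notes.txt"):
        (Path(tmp) / name).touch()
    list_path = Path(tmp) / "inputimage.txt"
    count = write_image_list(tmp, list_path)
    print(count, "images listed")
    print(list_path.read_text(encoding="utf-8"))
```

Note that the sort is lexicographic, so 10.jpg is listed before 2.jpg; that only affects the order of the entries.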
With that, the background image library is ready. As before, we first use the -W option to generate the feature data file. Computing the features of the 1000 images takes a little over a minute, but this work only has to be done once.
After that, use the -r option to start retrieval:
The identical image was found among the 1000 images with a similarity of 100%, and the search took 4 ms.
(Test environment: Turion X2 RM-74 2.2 GHz, 2 GB memory)
Across multiple tests of exact search, that is, retrieving an image that is already in the background image library, the image was found with 100% similarity every time.
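The exact-match behaviour is what you would expect from a linear scan with cosine similarity: a vector compared with itself always scores 1. Below is a rough sketch of such a scan over 1000 random 170-dimensional vectors. It is not opendir's implementation; the random data, the function names, and the idea of normalizing the library once up front (so each comparison is a plain dot product) are my own choices for illustration. Pure-Python timings are also not comparable to the 4 ms measured with the compiled program.

```python
import math
import random
import time


def normalize(vector):
    """Scale a vector to unit length (all-zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def search(query, unit_library, top_k=5):
    """Rank pre-normalized library vectors by cosine similarity to query."""
    unit_query = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(unit_query, vec)), idx)
        for idx, vec in enumerate(unit_library)
    ]
    scored.sort(reverse=True)
    return [(idx, score) for score, idx in scored[:top_k]]


# Made-up library: 1000 vectors of 170 values in the range 0..100.
random.seed(0)
raw_library = [[random.randint(0, 100) for _ in range(170)] for _ in range(1000)]
unit_library = [normalize(v) for v in raw_library]

start = time.perf_counter()
results = search(raw_library[42], unit_library)
elapsed_ms = (time.perf_counter() - start) * 1000

print("best match:", results[0])
print(f"scan took {elapsed_ms:.1f} ms")
```

The scan touches every vector once, so the cost grows linearly with the size of the library; at 1000 images and 170 dimensions that is a small amount of work.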
Although the search speed is satisfactory, the search results for similar (non-identical) images are less so, and sometimes the results are not even close. Better features may improve the accuracy in the future.
----------------------------------
Author: Chen Jin
This is an original article. If you repost or quote it, please credit the original author and include a link. Thank you.