Some notes on saliency detection:
It is generally agreed that a good saliency detection model should meet at least the following three criteria:
1) Good detection: the probability of missing truly salient regions, or of wrongly marking background as salient, should be low;
2) High resolution: the saliency map should have high or full resolution, so that salient objects are located accurately and the original image information is retained;
3) Computational efficiency: as the front end of other, more complex processes, these models should detect salient regions quickly.
Salient object detection was first studied in disciplines such as psychology and neuroscience. In computer vision, effort has gone into modeling the human attention mechanism, especially the bottom-up attention mechanism; this process is known as visual saliency detection. Methods for detecting salient regions are usually divided into top-down and bottom-up approaches, and the paper discussed here mainly uses the bottom-up approach.
1) Top-down: start from a fairly general rule and gradually add new literals to narrow the rule's coverage until a predetermined condition is met. This is also known as the generate-then-test method. It is a process of gradual specialization of rules, from the general to the specific.
2) Bottom-up: start from a fairly specific rule and gradually delete literals to widen the coverage until the condition is met. This is also known as the data-driven method. It is a process of gradual generalization of rules, from the specific to the general.
Top-down searches for rules from large coverage to small, and bottom-up does the opposite. The former tends to produce rules that generalize better, while the latter suits cases with few training samples; the former is also more robust to noise. For this reason the former is usually used in propositional rule learning, while the latter is used more in tasks with a complex hypothesis space, such as first-order rule learning.
Traditional saliency detection methods:
1. Selective attention algorithms that simulate the visual attention mechanism of organisms.
Pipeline: feature extraction -> feature combination / saliency computation -> salient region segmentation / point-of-interest marking.
The saliency value is the contrast between a pixel and its surrounding background in color, intensity, and orientation.
L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254-1259, 1998.
2. Building on the Itti model, this model uses the properties of Markov random fields to construct a Markov chain over the two-dimensional image and obtains the saliency map from the chain's equilibrium distribution.
Algorithm steps:
Feature extraction: similar to the Itti algorithm
Saliency map generation: the Markov chain method, which combines low-level visual mechanisms with mathematical computation
J. Harel, C. Koch, and P. Perona. Graph-based visual saliency. Advances in Neural Information Processing Systems, 19:545-552, 2006.
3. An algorithm based on spatial frequency domain analysis: the saliency map is obtained by applying the inverse Fourier transform to the spectral residual R(f).
Xiaodi Hou, Jonathan Harel, and Christof Koch. Image signature: highlighting sparse salient regions. PAMI, 2012.
4. Building on the spectral residual approach, this method obtains a spatio-temporal saliency map by computing the phase spectrum of the quaternion Fourier transform (PQFT) of the image.
In effect, the phase spectrum of an image indicates where its salient targets are. Each pixel in the image is represented by a quaternion built from color, intensity, and motion features.
The PQFT model is independent of prior information, needs no parameters, is computationally efficient, and is suitable for real-time saliency detection.
Chenlei Guo, Qi Ma, and Liming Zhang. Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform. CVPR 2008.
5. This paper argues that both local and global information should be taken into account, for example that attention should concentrate on specific areas rather than be scattered, and that points closer to the focus of attention are more easily noticed. The results are very good.
S. Goferman, L. Zelnik-Manor, and A. Tal. Context-aware saliency detection. CVPR 2010.
Saliency Detection via Graph-Based Manifold Ranking
The goal of this paper is to obtain better salient object segmentation by exploiting prior knowledge about the position distribution and connectivity of the background and foreground in an image.
The method ranks image elements (pixels or regions) against foreground cues or background priors using graph-based manifold ranking. The saliency of an image element is defined by its relevance to the given seeds (queries). The nodes are ranked by their similarity to background and foreground queries through affinity matrices. Ranking against the background prior first gives a more reliable estimate of the foreground, and a second ranking against foreground queries then strengthens it. Saliency detection is thus carried out in a two-stage scheme that extracts background regions and salient foreground objects effectively.
Algorithm flow: SLIC over-segmentation of the image, construction of the corresponding graph, manifold ranking with the background prior, adaptive segmentation, manifold ranking with foreground queries.
We observe that the background usually shows local or global appearance connectivity with each of the four image boundaries, while the foreground shows a consistent appearance. In this work, we use these cues to compute pixel saliency based on the ranking of superpixels. For each image, we construct a closed-loop graph in which each node is a superpixel. We model saliency detection as a manifold ranking problem and propose a two-stage scheme for graph labeling. In the first stage, we exploit the boundary prior by using the nodes on each side of the image as labeled background queries. From each labeled result, we compute the saliency of the nodes based on their relevance (that is, ranking) to those queries as background labels. The four labeled maps are then integrated to generate a saliency map. In the second stage, we apply binary segmentation to the saliency map from the first stage and take the labeled foreground nodes as salient queries. The saliency of each node is computed from its relevance to the foreground queries to give the final map.
Detailed process:
1. Graph Construction
We construct a single-layer graph G = (V, E), as shown in Figure 2, where V is a set of nodes and E is a set of undirected edges. In this work, each node is a superpixel generated by the SLIC algorithm.
We use a k-regular graph to exploit spatial relationships. First, each node is connected not only to its neighboring nodes but also to the nodes that share common boundaries with its neighbors (see Figure 2). By extending the scope of node connections with the same degree k, we effectively exploit local smoothness cues. Second, we enforce that the nodes on the four sides of the image are connected, that is, any pair of boundary nodes is considered adjacent. We therefore call the graph a closed-loop graph. This closed-loop constraint works well when salient objects appear near the image boundary or when some background regions differ from one another.
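The two connection rules can be sketched on a toy layout. This is only an illustration under simplifying assumptions: a hypothetical 5x5 grid in which every cell stands for one superpixel and direct adjacency is 4-connected, whereas real SLIC superpixels have irregular neighborhoods.

```matlab
% Toy layout (assumption): 5x5 grid, one cell per superpixel, labelled row by row.
rows = 5; cols = 5; n = rows * cols;
labels = reshape(1:n, cols, rows)';

% Direct adjacency: cells that share a border.
adj = zeros(n);
for r = 1:rows
    for c = 1:cols
        if c < cols
            adj(labels(r,c), labels(r,c+1)) = 1;
            adj(labels(r,c+1), labels(r,c)) = 1;
        end
        if r < rows
            adj(labels(r,c), labels(r+1,c)) = 1;
            adj(labels(r+1,c), labels(r,c)) = 1;
        end
    end
end

% Rule 1: also connect each node to the neighbours of its neighbours.
adj2 = (adj + adj * adj) > 0;

% Rule 2: closed loop, every pair of boundary nodes is adjacent.
bd = unique([labels(1,:), labels(end,:), labels(:,1)', labels(:,end)']);
adj2(bd, bd) = true;

adj2(1:n+1:end) = false;      % no self-loops

disp(find(adj2(13,:)));       % neighbours of the centre node 13: 3 7 8 9 11 12 14 15 17 18 19 23
```

The centre node ends up linked to its 4 direct neighbours and their 8 further neighbours, while two opposite corners (nodes 1 and 25) become adjacent only because of the closed-loop rule.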
Because of the edge constraints, the constructed graph is clearly sparsely connected; that is, most elements of the affinity matrix W are zero. In this work, the weight between two nodes is defined as:
$$w_{ij} = e^{-\frac{\lVert c_i - c_j \rVert}{\sigma^2}} \tag{4}$$
where c_i and c_j denote the mean colors of the superpixels corresponding to the two nodes in the CIELAB color space, and σ is a constant that controls the strength of the weight. The weights are computed from the distance in color space, as this has been shown to be effective for saliency detection. The smaller the color distance between two nodes, the stronger their relevance, which is an important cue for saliency detection. These weights give the affinity matrix W.
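A minimal sketch of the weight computation, assuming made-up mean CIELAB colors and a hypothetical edge list. Like the MATLAB listing further down, it rescales the color distances to [0,1] before applying the exponential, a step that the weight formula itself does not mention.

```matlab
% Hypothetical mean CIELAB colours of 4 superpixels (one row per node).
c = [50  10  10;
     52  11   9;
     80  -5  30;
     20  40 -20];
edges = [1 2; 2 3; 3 4];     % hypothetical node pairs joined by an edge
sigma2 = 0.1;                % sigma^2, the value quoted in the experiment setup

% Colour distance along every edge, rescaled to [0,1].
d = sqrt(sum((c(edges(:,1),:) - c(edges(:,2),:)).^2, 2));
d = (d - min(d)) / (max(d) - min(d));

w = exp(-d / sigma2);        % similar colours -> weight close to 1

% Symmetric sparse affinity matrix W and degree matrix D.
n = size(c, 1);
W = sparse([edges(:,1); edges(:,2)], [edges(:,2); edges(:,1)], [w; w], n, n);
D = diag(full(sum(W, 2)));
```

Nodes 1 and 2 have nearly the same color, so their edge gets the largest weight; the weight of the edge between the very different nodes 3 and 4 is close to zero.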
2. Manifold Ranking
Given a dataset X = {x_1, ..., x_l, x_{l+1}, ..., x_n} ∈ R^{m×n}, some data points are labeled as queries (seed points), and the remaining points need to be ranked according to their relevance to the queries. Let f: X → R^n be a ranking function that assigns a ranking value f_i to each data point x_i; f can be viewed as a vector f = [f_1, f_2, ..., f_n]^T. Let y = [y_1, y_2, ..., y_n]^T be an indicator vector with y_i = 1 if x_i is a query and y_i = 0 otherwise. Next we define a graph G = (V, E) on the dataset, where the nodes V are the dataset X and the edges E are weighted by the affinity matrix W = [w_ij]_{n×n}. The degree matrix is D = diag{d_11, ..., d_nn}, where d_ii = Σ_j w_ij. The optimal ranking of the queries is computed by solving the following optimization problem:
$$f^* = \arg\min_f \frac{1}{2}\left( \sum_{i,j=1}^{n} w_{ij} \left\lVert \frac{f_i}{\sqrt{d_{ii}}} - \frac{f_j}{\sqrt{d_{jj}}} \right\rVert^2 + \mu \sum_{i=1}^{n} \lVert f_i - y_i \rVert^2 \right) \tag{1}$$
where the parameter μ controls the balance between the smoothness constraint (the first term) and the fitting constraint (the second term). In other words, a good ranking function should not change too much between nearby points (smoothness constraint) and should not differ too much from the initial query assignment (fitting constraint). The minimum is found by setting the derivative of the above function to zero. After some transformation, the resulting ranking function can be written as:
$$f^* = (I - \alpha S)^{-1} y \tag{2}$$
where I is the identity matrix, α = 1/(1 + μ), and S = D^{-1/2} W D^{-1/2} is the normalized Laplacian matrix. Using the unnormalized Laplacian matrix instead gives the core formula used in the rest of these notes:
$$f^* = (D - \alpha W)^{-1} y \tag{3}$$
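To get a feel for the closed-form ranking f* = (D − αW)^{-1} y, here is a small sketch on a made-up chain graph with unit weights and a single query. The graph and the choice of query are assumptions for illustration only.

```matlab
% Toy chain graph 1 - 2 - 3 - 4 - 5 with unit weights (made-up data).
n = 5;
W = zeros(n);
for i = 1:n-1
    W(i, i+1) = 1;
    W(i+1, i) = 1;
end
D = diag(sum(W, 2));          % degree matrix
alpha = 0.99;

y = zeros(n, 1);
y(1) = 1;                     % node 1 is the only query

% Closed-form ranking, solved as a linear system instead of forming the inverse.
f = (D - alpha * W) \ y;
disp(f');                     % scores shrink as nodes get further from the query
```

Every score is positive, and the ranking decreases monotonically along the chain, which is the behavior the smoothness constraint asks for.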
3. Saliency Measure
Given an input image represented as a graph and some salient query nodes, the saliency of each node is defined as its ranking score computed by Equation 3, which is rewritten as f* = Ay for ease of analysis. The matrix A can be regarded as the learned optimal affinity matrix, equal to (D − αW)^{-1}. The ranking score f*(i) of the i-th node is the inner product of the i-th row of A and y. Because y is a binary indicator vector, f*(i) can also be viewed as the sum of the relevances of the i-th node to all the queries.
In conventional ranking problems, the queries are labeled manually with the ground truth. Here, however, the queries for saliency detection are selected by the proposed algorithm, so some of them may be incorrect. We therefore need to compute a confidence (that is, a saliency value) for each query, defined as its ranking score given by the other queries (excluding itself). To this end, we set the diagonal elements of A to 0 when computing ranking scores by Equation 3. Finally, we measure the saliency of nodes with the normalized ranking score f* when salient queries are given, and with 1 − f* when background queries are given.
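A short sketch of the saliency measure on a made-up chain graph: build A = (D − αW)^{-1}, zero its diagonal so that a query is scored only by the other queries, and min-max normalize the result. The graph and the queries are assumptions chosen only to keep the example small.

```matlab
% Toy chain graph with 5 nodes and unit weights (made-up data).
n = 5;
W = zeros(n);
for i = 1:n-1
    W(i, i+1) = 1;
    W(i+1, i) = 1;
end
D = diag(sum(W, 2));
alpha = 0.99;

A0 = inv(D - alpha * W);      % learned affinity matrix A
A = A0;
A(1:n+1:end) = 0;             % zero the diagonal: no self-relevance

y = [1; 1; 0; 0; 0];          % nodes 1 and 2 are the queries
f = A * y;                    % f(1) now equals A0(1,2), the relevance to the other query only

fbar = (f - min(f)) / (max(f) - min(f));   % normalised ranking score in [0,1]
sal_if_salient_queries    = fbar;          % queries are salient seeds
sal_if_background_queries = 1 - fbar;      % queries are background seeds
```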
4. Two-stage Saliency Detection
This section details the two-stage scheme for bottom-up saliency detection using ranking with background and foreground queries.
4.1 Ranking with Background Queries
We use the nodes on the image boundary as background seeds, that is, as the labeled data (query samples) against which the relevance of all other regions is ranked. Specifically, we construct four saliency maps using boundary priors and then integrate them into the final map, which is called the separation/combination (SC) approach.
Taking the top image boundary as an example, we use the nodes on this side as queries and the other nodes as unlabeled data. This gives the indicator vector y, and all nodes are ranked based on f* in Equation 3. f* is an N-dimensional vector (N is the total number of nodes in the graph). Each element of this vector is the relevance of a node to the background queries, and its complement is the saliency measure. We normalize this vector to the range between 0 and 1, and the saliency map using the top boundary prior, S_t, can be written as:
$$S_t(i) = 1 - \bar{f}^*(i), \quad i = 1, 2, \ldots, N \tag{5}$$
where i indexes a superpixel node on the graph and \bar{f}^*(i) denotes the normalized vector.
Similarly, we use the bottom, left, and right image boundaries as queries to compute the other three maps S_b, S_l, and S_r. Note that the saliency maps are computed with different indicator vectors y, while the weight matrix W and the degree matrix D are fixed. In other words, the inverse (D − αW)^{-1} needs to be computed only once for each image. Because the number of superpixels is small, the matrix inverse in Equation 3 can be computed efficiently, so the total computational load for the four maps is low. The four saliency maps are integrated as follows:
$$S_{bq}(i) = S_t(i) \times S_b(i) \times S_l(i) \times S_r(i) \tag{6}$$
There are two reasons for generating the saliency map with the SC approach. First, the superpixels on different sides are often dissimilar and should have a large distance between them. If we use all the boundary superpixels as queries at the same time (which implies that these superpixels are similar), the labeled results are usually less satisfactory, because these nodes do not form a compact group (see Figure 4). Second, it reduces the effect of imprecise queries, that is, truly salient nodes that are inadvertently selected as background queries. As shown in the second column of Figure 5, the saliency maps generated using all the boundary nodes are poor. Because the labeled results are imprecise, the pixels belonging to the salient objects have low saliency values. By integrating the four saliency maps, some salient parts of the object can be identified (although the whole object is not uniformly highlighted), which provides sufficient cues for the second stage of detection.
While most regions of the salient objects are highlighted in the first stage, some background nodes may not be suppressed adequately (see Figures 4 and 5). To alleviate this problem and improve the results, especially when objects appear near the image boundaries, the saliency maps are further improved by ranking with foreground queries.
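The effect of multiplying the four side-specific maps can be seen on made-up numbers. In this sketch the normalized ranking scores against each side are simply assumed (they are not computed from an image); only the complement and the product follow the description above.

```matlab
% Made-up normalised ranking scores of 5 superpixels against each side's queries.
% Rows: superpixels 1..5. Columns: top, bottom, left, right.
fbar = [0.9 0.8 0.9 0.7;     % looks like background from every side
        0.1 0.2 0.1 0.2;     % looks like an object from every side
        0.1 0.9 0.2 0.1;     % resembles only the bottom boundary
        0.5 0.5 0.5 0.5;     % undecided
        0.0 0.0 1.0 0.0];    % identical to the left boundary only

S = 1 - fbar;                % side-specific maps S_t, S_b, S_l, S_r (one per column)
Sbq = prod(S, 2);            % combined map: product of the four maps
Sbq = (Sbq - min(Sbq)) / (max(Sbq) - min(Sbq));   % rescale to [0,1]
disp(Sbq');
```

Because the maps are multiplied, a superpixel stays salient only if it is dissimilar to all four boundaries. Superpixel 3 resembles just one side, yet its combined score drops to about one eighth of that of superpixel 2.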
4.2 Ranking with Foreground Queries
The saliency map from the first stage is binarized (into salient foreground and background) using an adaptive threshold, which makes it easy to select the nodes of the foreground salient objects as queries. We expect the selected queries to cover as much of the salient object region as possible (that is, to have high recall). The threshold is therefore set to the mean saliency over the entire saliency map. Once the salient queries are given, an indicator vector y is formed and the ranking vector f* is computed using Equation 3. As in the first stage, the ranking vector f* is normalized to the range between 0 and 1 to form the final saliency map:
$$S_{fq}(i) = \bar{f}^*(i), \quad i = 1, 2, \ldots, N \tag{7}$$
where i indexes a superpixel node on the graph and \bar{f}^* denotes the normalized vector.
Note that some nodes may be selected as foreground queries by mistake at this stage. Despite some imprecise labeling, Figure 6 shows that the proposed algorithm can still detect salient objects well. This can be explained as follows. The regions of salient objects are usually relatively compact (in terms of spatial distribution) and homogeneous in appearance (in terms of feature distribution), whereas background regions are the opposite. In other words, the intra-object relevance (that is, between two nodes of a salient object) is statistically much larger than the object-background and intra-background relevance, which can be inferred from the affinity matrix A. To show this behavior, we computed the average intra-object, intra-background, and object-background relevance values in A for each of 300 images sampled from a dataset with ground-truth labels [2], as shown in Figure 7. Therefore, the sum of the relevance values of object nodes to the truly salient queries is much larger than the sum of the relevance values of background nodes to all the queries. In other words, background saliency can be suppressed effectively (fourth column of Figure 6). Similarly, although the saliency maps after the first stage in Figure 5 are imprecise, the salient objects are detected well after ranking with the foreground queries in the second stage. The algorithm below summarizes the main steps of the proposed salient object detection method.
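The second stage can be sketched on a toy graph with two tight clusters, an "object" (nodes 1 to 3) and a "background" (nodes 4 to 6), joined by one weak edge. Both the graph and the first-stage saliency values are made up for illustration; node 2 is deliberately under-detected in the first stage.

```matlab
% Toy graph (assumption): two fully connected clusters joined by a weak edge.
n = 6;
W = zeros(n);
W(1:3, 1:3) = 1;              % object cluster
W(4:6, 4:6) = 1;              % background cluster
W(1:n+1:end) = 0;             % no self-loops
W(3,4) = 0.005; W(4,3) = 0.005;

D = diag(sum(W, 2));
alpha = 0.99;
A = inv(D - alpha * W);
A(1:n+1:end) = 0;             % zero the diagonal, as in the saliency measure

Sbq = [0.9; 0.2; 0.8; 0.1; 0.3; 0.0];   % made-up first-stage saliency
th = mean(Sbq);               % adaptive threshold: mean saliency
y = double(Sbq >= th);        % foreground queries: nodes 1 and 3

f = A * y;
Sfq = (f - min(f)) / (max(f) - min(f)); % final map, rescaled to [0,1]
disp(Sfq');
```

Node 2 was missed by the first stage but gets the highest final score because it is strongly tied to both queries, while all three background nodes end up close to zero. This is the intra-object versus object-background relevance gap described above.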
Algorithm: Bottom-up saliency based on manifold ranking
Input: An image and the required parameters
- Segment the input image into superpixels, construct a graph G with the superpixels as nodes, and compute its degree matrix D and weight matrix W by Equation 4.
- Compute (D − αW)^{-1} and set its diagonal elements to 0.
- Form indicator vectors y with the nodes on each side of the image as queries, compute the corresponding side-specific maps by Equations 3 and 5, and then compute the saliency map S_bq by Equation 6.
- Bi-segment S_bq to form salient foreground queries and an indicator vector y. Compute the saliency map S_fq by Equations 3 and 7.
Output: A saliency map S_fq representing the saliency value of each superpixel.
MATLAB code:
clear all; close all; clc;
addpath('./function/');

%% ------------------------ set parameters ------------------------
theta = 0.1;                 % controls the edge weight
alpha = 0.99;                % balances the two terms of the manifold ranking cost function
spnumber = 200;              % number of superpixels
imgRoot = './test/';         % path of the test image
saldir = './saliencymap/';   % output path for the saliency maps
supdir = './superpixels/';   % path of the superpixel label files
mkdir(supdir);
mkdir(saldir);
imnames = dir([imgRoot '*' 'jpg']);   % this script assumes ./test/ holds exactly one .jpg image
disp(imnames);
imname = [imgRoot imnames.name];
[input_im, w] = removeframe(imname);  % preprocessing: remove the image frame
[m, n, k] = size(input_im);

%% ---------------------- generate superpixels ----------------------
imname = [imname(1:end-4) '.bmp'];    % the SLIC software only supports .bmp images
% arguments: <filename> <spatial_proximity_weight> <number_of_superpixels> <path_to_save_results>
% the spatial proximity weight 20 is the value used in the paper's public demo code
comm = ['SLICSuperpixelSegmentation' ' ' imname ' ' int2str(20) ' ' int2str(spnumber) ' ' supdir];
system(comm);
spname = [supdir imnames.name(1:end-4) '.dat'];

% read the superpixel label matrix
fid = fopen(spname, 'r');
A = fread(fid, m * n, 'uint32');      % fread(fid, N, 'type'): read N elements of the given type
A = A + 1;                            % shift labels to start at 1 so they can be used as indices
B = reshape(A, [n, m]);
superpixels = B';
fclose(fid);
spnum = max(superpixels(:));          % actual number of superpixels

%% ---------------------- design the graph model ----------------------
% compute the feature (mean color in Lab color space) of each superpixel
input_vals = reshape(input_im, m*n, k);
rgb_vals = zeros(spnum, 1, 3);
inds = cell(spnum, 1);
for i = 1:spnum
    inds{i} = find(superpixels == i);
    rgb_vals(i,1,:) = mean(input_vals(inds{i},:), 1);
end
lab_vals = colorspace('Lab<-', rgb_vals);   % convert RGB to Lab
seg_vals = reshape(lab_vals, spnum, 3);     % feature of each superpixel

% adjacency matrix: superpixels that touch each other
adjloop = zeros(spnum, spnum);
[m1, n1] = size(superpixels);
for i = 1:m1-1
    for j = 1:n1-1
        if (superpixels(i,j) ~= superpixels(i,j+1))
            adjloop(superpixels(i,j), superpixels(i,j+1)) = 1;
            adjloop(superpixels(i,j+1), superpixels(i,j)) = 1;
        end
        if (superpixels(i,j) ~= superpixels(i+1,j))
            adjloop(superpixels(i,j), superpixels(i+1,j)) = 1;
            adjloop(superpixels(i+1,j), superpixels(i,j)) = 1;
        end
        if (superpixels(i,j) ~= superpixels(i+1,j+1))
            adjloop(superpixels(i,j), superpixels(i+1,j+1)) = 1;
            adjloop(superpixels(i+1,j+1), superpixels(i,j)) = 1;
        end
        if (superpixels(i+1,j) ~= superpixels(i,j+1))
            adjloop(superpixels(i+1,j), superpixels(i,j+1)) = 1;
            adjloop(superpixels(i,j+1), superpixels(i+1,j)) = 1;
        end
    end
end
% closed loop: connect every pair of boundary superpixels
bd = unique([superpixels(1,:), superpixels(m,:), superpixels(:,1)', superpixels(:,n)']);
for i = 1:length(bd)
    for j = i+1:length(bd)
        adjloop(bd(i), bd(j)) = 1;
        adjloop(bd(j), bd(i)) = 1;
    end
end

% edge list: neighbors and neighbors of neighbors
edges = [];
for i = 1:spnum
    indext = [];
    ind = find(adjloop(i,:) == 1);
    for j = 1:length(ind)
        indj = find(adjloop(ind(j),:) == 1);
        indext = [indext, indj];
    end
    indext = [indext, ind];
    indext = indext((indext > i));
    indext = unique(indext);
    if (~isempty(indext))
        ed = ones(length(indext), 2);
        ed(:,2) = i * ed(:,2);
        ed(:,1) = indext;
        edges = [edges; ed];
    end
end

% compute the affinity matrix
valDistances = sqrt(sum((seg_vals(edges(:,1),:) - seg_vals(edges(:,2),:)).^2, 2));
valDistances = normalize(valDistances);   % scale to [0,1]; relies on normalize.m in ./function/
weights = exp(-valDistances / theta);
% sparse(i, j, s, m, n) builds an m-by-n sparse matrix with S(i(k), j(k)) = s(k)
W = sparse([edges(:,1); edges(:,2)], [edges(:,2); edges(:,1)], ...
    [weights; weights], spnum, spnum);

% learn the optimal affinity matrix (Equation 3)
dd = sum(W);
D = sparse(1:spnum, 1:spnum, dd);
clear dd;
optAff = eye(spnum) / (D - alpha*W);
mz = diag(ones(spnum,1));
mz = ~mz;                                 % mask that sets the diagonal of A to 0
optAff = optAff .* mz;

%% ----------------------------- stage 1 -----------------------------
% compute the saliency value of each superpixel
% top boundary as the query
Yt = zeros(spnum, 1);
bst = unique(superpixels(1,1:n));
Yt(bst) = 1;
bsalt = optAff * Yt;                      % relevance of each node to the background queries
bsalt = (bsalt - min(bsalt(:))) / (max(bsalt(:)) - min(bsalt(:)));   % normalize
bsalt = 1 - bsalt;                        % the complement is the saliency measure

% bottom boundary
Yd = zeros(spnum, 1);
bsd = unique(superpixels(m,1:n));
Yd(bsd) = 1;
bsald = optAff * Yd;
bsald = (bsald - min(bsald(:))) / (max(bsald(:)) - min(bsald(:)));
bsald = 1 - bsald;

% left boundary (first column; variable names kept from the original code)
Yr = zeros(spnum, 1);
bsr = unique(superpixels(1:m,1));
Yr(bsr) = 1;
bsalr = optAff * Yr;
bsalr = (bsalr - min(bsalr(:))) / (max(bsalr(:)) - min(bsalr(:)));
bsalr = 1 - bsalr;

% right boundary (last column)
Yl = zeros(spnum, 1);
bsl = unique(superpixels(1:m,n));
Yl(bsl) = 1;
bsall = optAff * Yl;
bsall = (bsall - min(bsall(:))) / (max(bsall(:)) - min(bsall(:)));
bsall = 1 - bsall;

% combine the four maps
bsalc = (bsalt .* bsald .* bsall .* bsalr);
bsalc = (bsalc - min(bsalc(:))) / (max(bsalc(:)) - min(bsalc(:)));

% assign the saliency value to each pixel
tmapstage1 = zeros(m, n);
for i = 1:spnum
    tmapstage1(inds{i}) = bsalc(i);
end
tmapstage1 = (tmapstage1 - min(tmapstage1(:))) / (max(tmapstage1(:)) - min(tmapstage1(:)));
mapstage1 = zeros(w(1), w(2));
mapstage1(w(3):w(4), w(5):w(6)) = tmapstage1;
mapstage1 = uint8(mapstage1 * 255);
outname = [saldir imnames.name(1:end-4) '_stage1' '.png'];
imwrite(mapstage1, outname);

%% ----------------------------- stage 2 -----------------------------
% binarize with an adaptive threshold (the mean saliency over the whole map)
th = mean(bsalc);
bsalc(bsalc < th) = 0;
bsalc(bsalc >= th) = 1;

% compute the saliency value of each superpixel
fsal = optAff * bsalc;

% assign the saliency value to each pixel
tmapstage2 = zeros(m, n);
for i = 1:spnum
    tmapstage2(inds{i}) = fsal(i);
end
tmapstage2 = (tmapstage2 - min(tmapstage2(:))) / (max(tmapstage2(:)) - min(tmapstage2(:)));
mapstage2 = zeros(w(1), w(2));
mapstage2(w(3):w(4), w(5):w(6)) = tmapstage2;
mapstage2 = uint8(mapstage2 * 255);
outname = [saldir imnames.name(1:end-4) '_stage2' '.png'];
imwrite(mapstage2, outname);
Execution results (on photos I picked at random): 1. Original image 2. Superpixel image 3. Saliency map, first stage 4. Saliency map, second stage
5. Experimental results
The proposed method is evaluated on three datasets. The first is the MSRA dataset [23], which contains 5,000 images with the ground truth of the salient region marked by bounding boxes. The second is the MSRA-1000 dataset, a subset of the MSRA dataset containing 1,000 images provided by [2], with accurate human-labeled masks for the salient objects. The last is the proposed DUT-OMRON dataset, which contains 5,172 images carefully labeled by 5 users. The source images, ground-truth labels, and a detailed description of this dataset are available at http://ice.dlut.edu.cn/lu/DUT-OMRON/Homepage.htm.
Experiment setup: We set the number of superpixel nodes to N = 200 in all experiments. The algorithm has two parameters: the edge weight σ in Equation 4 and the balance weight α in Equation 3. The parameter σ controls the strength of the weight between a pair of nodes, and the parameter α balances the smoothness and fitting constraints in the regularization function of the manifold ranking algorithm. For all experiments, the two parameters are set empirically to σ² = 0.1 and α = 0.99.
Evaluation metrics: We evaluate all methods by precision, recall, and F-measure. Precision is the ratio of correctly assigned salient pixels to all pixels of the extracted regions, and recall is the percentage of detected salient pixels relative to the ground-truth number. As in previous work, precision-recall curves are obtained by binarizing the saliency map with thresholds from 0 to 255. The F-measure is an overall performance measure computed as the weighted harmonic mean of precision and recall:
$$F_\beta = \frac{(1 + \beta^2)\,\mathrm{Precision} \times \mathrm{Recall}}{\beta^2\,\mathrm{Precision} + \mathrm{Recall}} \tag{8}$$
We set β² = 0.3 to emphasize precision.
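As a worked example of these metrics, the sketch below binarizes a made-up 4x4 saliency map at one fixed threshold and scores it against a made-up ground-truth mask. Sweeping the threshold from 0 to 255 in the same way would give the points of a precision-recall curve.

```matlab
% Made-up saliency map (values 0..255) and binary ground-truth mask.
S  = [ 10  20 100 220;
       15 180 240 230;
       12 150 210  40;
      130  10  20  15];
GT = logical([0 0 1 1;
              0 1 1 1;
              0 0 1 0;
              0 0 0 0]);

beta2 = 0.3;                  % beta^2, weights precision more than recall
th = 128;                     % one fixed threshold out of 0..255

B  = S >= th;                 % binarised saliency map
tp = nnz(B & GT);             % salient pixels that were detected correctly
precision = tp / nnz(B);      % 5 of the 7 detected pixels are correct
recall    = tp / nnz(GT);     % 5 of the 6 ground-truth pixels are found
F = (1 + beta2) * precision * recall / (beta2 * precision + recall);
fprintf('precision %.4f, recall %.4f, F %.4f\n', precision, recall, F);
```

With β² = 0.3 the F-measure (about 0.739) lies closer to the precision (about 0.714) than to the recall (about 0.833).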
5.1 MSRA-1000
5.2 MSRA
5.3 DUT-OMRON
We test the proposed model on the DUT-OMRON dataset, in which the images are annotated with bounding boxes by five users. As in the experiments on the MSRA database, we compute a rectangle from the binarized saliency map and then evaluate our model with both the fixed-threshold and the adaptive-threshold methods. Figure 12 shows that the proposed dataset is more challenging (all models perform worse), which leaves more room for improvement in future work.
5.4 Run time
6. Summary
This paper proposes a bottom-up method that detects salient regions in images through manifold ranking on a graph, which incorporates local grouping cues and boundary priors. A two-stage approach ranks with background and foreground queries to generate the saliency maps. The proposed algorithm is evaluated on large datasets and shows promising results in comparison with 14 state-of-the-art methods. In addition, the proposed algorithm is computationally efficient.
Two references cited in the paper are also good introductory material:
1. D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In NIPS, 2003.
2. D. Zhou, J. Weston, A. Gretton, O. Bousquet, and B. Schölkopf. Ranking on data manifolds. In NIPS, 2004.
2018-08-31 00:46:05 Pei Jialeng