Fisher Vector (II): Fisher Vector


In Fisher Vector (I), the linear kernel was introduced. To meet different requirements, practical applications use a variety of kernel functions, and the Fisher kernel is one of them.

Fisher Kernel

Consider again a binary classification problem with labels $y \in \{1, -1\}$; we want to learn $p(x, y) = p(x \mid y)\,p(y)$.

Because each class-conditional density is parameterized by its own parameter $\theta_y$, i.e. $p(x \mid y) = p(x \mid \theta_y)$,

we have $p(x, y) = p(x \mid y)\,p(y) = p(x \mid \theta_y)\,p(y)$.

By Bayes' rule, the posterior can be written using the sigmoid function $\sigma(\cdot)$:
$$p(y \mid x) = \frac{p(x \mid \theta_y)\,p(y)}{p(x \mid \theta_y)\,p(y) + p(x \mid \theta_{\bar y})\,p(\bar y)} = \sigma\big(\ln\!\big(p(x \mid \theta_y)\,p(y)\big) - \ln\!\big(p(x \mid \theta_{\bar y})\,p(\bar y)\big)\big)$$

If $p(y) = p(\bar y)$, the above can be written as $p(y \mid x) = \sigma\big(\ln p(x \mid \theta_y) - \ln p(x \mid \theta_{\bar y})\big)$.
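As a quick sanity check of this identity, here is a minimal sketch assuming two made-up univariate Gaussian class-conditional densities; it only verifies that Bayes' rule with equal priors and the sigmoid of the log-likelihood ratio give the same posterior.

```python
import numpy as np
from scipy.stats import norm

# Numeric check: with equal priors, the Bayes posterior equals
# sigma(ln p(x|theta_1) - ln p(x|theta_-1)). The two Gaussian
# class-conditional densities below are made-up placeholders.
x = 0.7
p_pos = norm.pdf(x, loc=1.0, scale=1.0)    # p(x | theta_1)
p_neg = norm.pdf(x, loc=-1.0, scale=1.0)   # p(x | theta_-1)

bayes = p_pos / (p_pos + p_neg)                                   # Bayes rule, p(y) = p(y_bar) = 1/2
sigmoid = 1.0 / (1.0 + np.exp(-(np.log(p_pos) - np.log(p_neg))))  # sigma(log-likelihood ratio)
print(bayes, sigmoid)                                             # the two values agree
```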

A first-order Taylor expansion of $\ln p(x \mid \theta_y)$ around $\theta$ gives
$$\ln p(x \mid \theta_y) \approx \ln p(x \mid \theta) + (\theta_y - \theta)\,\ln' p(x \mid \theta)$$

Correspondingly,
$$\ln p(x \mid \theta_{\bar y}) \approx \ln p(x \mid \theta) + (\theta_{\bar y} - \theta)\,\ln' p(x \mid \theta)$$

Substituting the two expansions into the expression above:
$$p(y \mid x) = \sigma\big((\theta_y - \theta_{\bar y})\,\ln' p(x \mid \theta)\big)$$

Let $U_x = \ln' p(x \mid \theta) = \frac{\partial}{\partial \theta}\ln p(x \mid \theta)$; this is the Fisher score. Then
$$p(y \mid x) = \sigma\big((\theta_y - \theta_{\bar y})^{\top} U_x\big) = \sigma\big((\theta_1 - \theta_{-1})^{\top} U_x\big)$$
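To make the Fisher score concrete, here is a minimal sketch assuming a univariate Gaussian model $p(x \mid \theta)$ with $\theta = (\mu, \sigma)$; it computes $U_x$ in closed form and checks it against finite differences. The function names are illustrative, not from the original post.

```python
import numpy as np

def log_p(x, mu, sigma):
    """ln p(x | theta) for a univariate Gaussian, theta = (mu, sigma)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def fisher_score(x, mu, sigma):
    """U_x = d/d theta ln p(x | theta), the Fisher score, for the same model."""
    d_mu = (x - mu) / sigma**2
    d_sigma = (x - mu)**2 / sigma**3 - 1.0 / sigma
    return np.array([d_mu, d_sigma])

# Finite-difference check of the closed-form gradient.
x, mu, sigma, eps = 1.3, 0.0, 2.0, 1e-6
numeric = np.array([
    (log_p(x, mu + eps, sigma) - log_p(x, mu - eps, sigma)) / (2 * eps),
    (log_p(x, mu, sigma + eps) - log_p(x, mu, sigma - eps)) / (2 * eps),
])
print(fisher_score(x, mu, sigma))  # analytic score
print(numeric)                     # matches the analytic score
```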

This introduces the Kullback–Leibler divergence, also known as relative entropy, which is commonly used to measure the distance between two distributions, for example the distance between $\theta_1$ and $\theta_{-1}$:
$$D_{KL}(\theta_1 \,\|\, \theta_{-1}) = \int_{-\infty}^{\infty} p(x \mid \theta_1)\,\ln\frac{p(x \mid \theta_1)}{p(x \mid \theta_{-1})}\,dx$$

Note also the identity
$$\int_{-\infty}^{\infty} p(x \mid \theta)\,\frac{\partial}{\partial \theta}\ln p(x \mid \theta)\,dx = \int_{-\infty}^{\infty} p(x \mid \theta)\,\frac{\partial p(x \mid \theta)/\partial \theta}{p(x \mid \theta)}\,dx = \frac{\partial}{\partial \theta}\int_{-\infty}^{\infty} p(x \mid \theta)\,dx = 0$$
that is, the expected Fisher score is zero.
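A quick Monte Carlo sketch of this identity, again assuming the univariate Gaussian model from the previous snippet: the sample average of the score under $p(x \mid \theta)$ should be close to zero for both parameters.

```python
import numpy as np

# Monte Carlo check that E_{x ~ p(.|theta)}[ d/d theta ln p(x|theta) ] = 0
# for a univariate Gaussian with theta = (mu, sigma).
rng = np.random.default_rng(0)
mu, sigma = 0.5, 1.5
samples = rng.normal(mu, sigma, size=200_000)

score_mu = (samples - mu) / sigma**2                      # d ln p / d mu
score_sigma = (samples - mu)**2 / sigma**3 - 1.0 / sigma  # d ln p / d sigma
print(score_mu.mean(), score_sigma.mean())                # both approximately 0
```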

Because of this identity, a first-order expansion would make the relative entropy vanish, so we expand $\ln p(x \mid \theta_{-1})$ to second order around $\theta_1$:
$$\ln p(x \mid \theta_{-1}) \approx \ln p(x \mid \theta_1) + (\theta_{-1} - \theta_1)\,\ln' p(x \mid \theta_1) + \frac{(\theta_{-1} - \theta_1)^2}{2}\,\frac{\partial^2 \ln p(x \mid \theta_1)}{\partial \theta_1^2}$$

So
$$D_{KL}(\theta_1 \,\|\, \theta_{-1}) = \int_{-\infty}^{\infty} p(x \mid \theta_1)\,\ln\frac{p(x \mid \theta_1)}{p(x \mid \theta_{-1})}\,dx = \int_{-\infty}^{\infty} p(x \mid \theta_1)\big[\ln p(x \mid \theta_1) - \ln p(x \mid \theta_{-1})\big]\,dx \approx -\int_{-\infty}^{\infty} p(x \mid \theta_1)\Big[(\theta_{-1} - \theta_1)\,\ln' p(x \mid \theta_1) + \frac{(\theta_{-1} - \theta_1)^2}{2}\,\frac{\partial^2 \ln p(x \mid \theta_1)}{\partial \theta_1^2}\Big]\,dx$$

Assuming $\theta_1$ and $\theta_{-1}$ are close enough for the expansion to hold, the first-order term vanishes exactly by the identity above, so
$$D_{KL}(\theta_1 \,\|\, \theta_{-1}) \approx -\frac{(\theta_1 - \theta_{-1})^2}{2}\int_{-\infty}^{\infty} p(x \mid \theta_1)\,\frac{\partial^2 \ln p(x \mid \theta_1)}{\partial \theta_1^2}\,dx$$

The Fisher information $I(\theta) = \int p(x \mid \theta)\,\big(\frac{\partial}{\partial \theta}\ln p(x \mid \theta)\big)^2\,dx$ (a matrix when $\theta$ is a vector) is introduced here; differentiating the zero-mean-score identity once more gives $-\int p(x \mid \theta)\,\frac{\partial^2}{\partial \theta^2}\ln p(x \mid \theta)\,dx = I(\theta)$, so
$$D_{KL}(\theta_1 \,\|\, \theta_{-1}) \approx \tfrac{1}{2}\,(\theta_1 - \theta_{-1})^2\,I(\theta_1)$$
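A small numeric illustration of this relation, assuming a univariate Gaussian where only the mean differs (its Fisher information with respect to $\mu$ is $1/\sigma^2$); in this special case the quadratic approximation matches the closed-form KL divergence exactly.

```python
# D_KL between N(mu1, sigma^2) and N(mu2, sigma^2) is (mu1 - mu2)^2 / (2 sigma^2),
# and the Fisher information for the mean parameter is I(mu) = 1 / sigma^2,
# so the 1/2 * (theta_1 - theta_-1)^2 * I(theta) approximation is exact here.
mu1, mu2, sigma = 1.00, 1.05, 2.0
kl_exact = (mu1 - mu2)**2 / (2 * sigma**2)    # closed-form KL, equal variances
fisher_info = 1.0 / sigma**2
kl_approx = 0.5 * (mu1 - mu2)**2 * fisher_info
print(kl_exact, kl_approx)                    # identical values
```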

I'm not quite sure about this part, π__π. According to the relative entropy we can specify a prior distribution over $\theta$: when $\theta_1$ and $\theta_{-1}$ are close, the relative entropy is small and the probability should be large (this is what Yumbo said; I did not fully understand it). So one can define $p(\theta) = e^{-(\theta_1 - \theta_{-1})^2 I(\theta)}$.

We can then estimate $\theta$ by maximum a posteriori probability and, similarly to the linear-kernel case, finally obtain
$$p(y \mid x) = \sigma\Big(\sum_{i=1}^{n}\lambda_i\,U_{x_i}^{\top} I^{-1} U_x\Big)$$

Similar to the linear kernel, define $K(x_i, x) = U_{x_i}^{\top} I^{-1} U_x$; this is called the Fisher kernel.
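Here is a minimal sketch of the Fisher kernel for the univariate Gaussian model used above; since the closed-form information matrix is not always convenient, $I$ is estimated empirically as the second moment of the scores, $E[U_x U_x^{\top}]$. Names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0

def score(x):
    """Fisher score U_x for a univariate Gaussian, theta = (mu, sigma)."""
    return np.stack([(x - mu) / sigma**2,
                     (x - mu)**2 / sigma**3 - 1.0 / sigma], axis=-1)

# Empirical Fisher information: I = E[U_x U_x^T] under p(x|theta).
samples = rng.normal(mu, sigma, size=100_000)
U = score(samples)              # (n, 2) matrix of scores
I = U.T @ U / len(U)
I_inv = np.linalg.inv(I)

def fisher_kernel(xi, x):
    """K(x_i, x) = U_{x_i}^T I^{-1} U_x."""
    return score(xi) @ I_inv @ score(x)

print(fisher_kernel(0.3, 1.2))
```

The resulting Gram matrix can then be fed to any standard kernel machine, for example an SVM with a precomputed kernel.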

Having said all this, finally to the point: the Fisher kernel maps the original feature $x$ to $U_x = \frac{\partial}{\partial \theta}\ln p(x \mid \theta)$, i.e. into another space, and this Fisher score is the Fisher vector. Many papers introduce the Fisher vector directly by its formula and then say that the gradient of the log-likelihood with respect to the parameters is very discriminative and is therefore used as a feature. Personally, I find it simpler to understand the Fisher vector as a mapping or transformation.

Don't forget that, as mentioned earlier, the Fisher vector is commonly used in combination with a Gaussian mixture model. The general process is:

Given a training set of features $x_1, x_2, \ldots, x_m \in \mathbb{R}^d$, first train a Gaussian mixture model with parameters $\theta = (\mu_k, \sigma_k, \pi_k),\ k = 1, \ldots, K$:
$$p(x \mid \theta) = \sum_{k=1}^{K} \pi_k\, p(x \mid \mu_k, \sigma_k)$$

The next step is to use $U_x = \frac{\partial}{\partial \theta}\ln p(x \mid \theta)$: take the derivative with respect to each of the three kinds of parameters and concatenate the results:
$$\Big(\frac{\partial}{\partial \mu_k}\ln p(x \mid \theta),\ \frac{\partial}{\partial \sigma_k}\ln p(x \mid \theta),\ \frac{\partial}{\partial \pi_k}\ln p(x \mid \theta)\Big),\quad k = 1, \ldots, K$$

This $(2d+1)K$-dimensional vector is the final feature; sometimes $\frac{\partial}{\partial \pi_k}\ln p(x \mid \theta)$ is not computed, in which case the final feature is $2Kd$-dimensional.
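To tie the pipeline together, here is a minimal sketch assuming a diagonal-covariance GMM fitted with scikit-learn; it computes the unnormalized Fisher score of a single descriptor with respect to the means and standard deviations (the $\pi_k$ derivatives are omitted, giving a $2Kd$-dimensional vector). The data, $K$, and $d$ are placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
d, K = 8, 4
train = rng.normal(size=(1000, d))            # placeholder training descriptors

gmm = GaussianMixture(n_components=K, covariance_type="diag", random_state=0)
gmm.fit(train)

def fisher_vector(x, gmm):
    """Unnormalized Fisher score of one descriptor x: gradients of ln p(x|theta)
    w.r.t. the GMM means and standard deviations, concatenated (pi_k omitted)."""
    x = np.atleast_2d(x)                      # shape (1, d)
    gamma = gmm.predict_proba(x)[0]           # responsibilities gamma_k(x), shape (K,)
    mu = gmm.means_                           # (K, d)
    sigma = np.sqrt(gmm.covariances_)         # (K, d) diagonal standard deviations
    diff = (x - mu) / sigma                   # (K, d)

    grad_mu = gamma[:, None] * diff / sigma                 # d ln p / d mu_k
    grad_sigma = gamma[:, None] * (diff**2 - 1.0) / sigma   # d ln p / d sigma_k
    return np.concatenate([grad_mu.ravel(), grad_sigma.ravel()])

fv = fisher_vector(rng.normal(size=d), gmm)
print(fv.shape)                               # (2 * K * d,) = (64,)
```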

From: http://bucktoothsir.github.io/blog/2014/11/27/10-theblog/
