Comparison of Statistical Models: HMM, Maximum Entropy Model, and CRF (Conditional Random Field)

Source: Internet
Author: User

HMM Model: The tag sequence is treated as a Markov chain, and a first-order Markov chain models the dependency between adjacent tags; each tag has an associated emission probability function. The HMM is a generative model that defines a joint probability distribution, where x and y are the random variables over the observation sequence and the corresponding tag sequence, respectively. In principle, defining this joint distribution requires accounting for all possible observation sequences, which is intractable in practice; the generative model therefore treats the elements of the observation sequence as isolated units, assuming each element is independent of the others, so that the observation at any time step depends only on the state at that time step.
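The joint distribution described above factorizes over initial, transition, and emission probabilities. A minimal sketch, with illustrative numbers not taken from the text:

```python
import numpy as np

# Toy first-order HMM over two tags and two observation symbols.
pi = np.array([0.6, 0.4])                 # initial tag distribution p(y_1)
A = np.array([[0.7, 0.3],                 # transition p(y_t | y_{t-1})
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],                 # emission p(x_t | y_t)
              [0.2, 0.8]])

def joint_prob(tags, obs):
    """p(x, y) = p(y_1) p(x_1|y_1) * prod_t p(y_t|y_{t-1}) p(x_t|y_t)."""
    p = pi[tags[0]] * B[tags[0], obs[0]]
    for t in range(1, len(tags)):
        p *= A[tags[t - 1], tags[t]] * B[tags[t], obs[t]]
    return p

print(joint_prob([0, 0, 1], [0, 0, 1]))
```

Note that the emission term p(x_t | y_t) is exactly the independence assumption the text describes: each observation depends only on its own state.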

The HMM's independence assumption is tolerable on small datasets, but in large real corpora the observation sequence is better described by many interacting features, with long-range correlations between observed elements. In a named entity recognition task, the complexity of entity structure means that simple feature functions cannot cover all relevant properties; here the HMM's assumptions prevent it from using complex, overlapping features (it cannot condition on more than one tag feature).

Maximum Entropy Model: Arbitrary complex, correlated features can be used. In terms of performance, the maximum entropy classifier outperforms the naive Bayes classifier. However, as classifiers, the two methods share a common drawback: each word is classified independently, so the relationship between adjacent tags cannot be exploited. An HMM, through its Markov chain, can establish dependencies between tags, which the maximum entropy classifier does not use.

Advantages of the Maximum Entropy Model: first, among all models that satisfy the constraints, the maximum entropy model is the one with the largest information entropy. Second, constraints can be set flexibly; by adjusting the number of constraints, one can tune the trade-off between the model's fit to known data and its generalization to unknown data. Third, it naturally handles the parameter-smoothing problem in statistical models.
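The resulting model is a log-linear classifier: a weighted sum of feature functions, exponentiated and normalized. A minimal sketch, in which the feature functions, tags, and weights are hypothetical:

```python
import math

# Hypothetical binary feature functions f_i(word, tag).
def features(word, tag):
    return [1.0 if word.endswith("ing") and tag == "VERB" else 0.0,
            1.0 if tag == "NOUN" else 0.0]

weights = [2.0, 0.5]  # illustrative; normally learned under feature constraints

def p_tag_given_word(word, tags):
    # p(tag | word) proportional to exp(sum_i w_i * f_i(word, tag))
    scores = {t: math.exp(sum(w * f for w, f in zip(weights, features(word, t))))
              for t in tags}
    z = sum(scores.values())              # normalization constant Z(word)
    return {t: s / z for t, s in scores.items()}

dist = p_tag_given_word("running", ["NOUN", "VERB"])
print(dist)
```

Each weight corresponds to one constraint; note that the binary features only record whether a feature fires, which is exactly the limitation raised below.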

Deficiencies of the Maximum Entropy Model: first, the binary features in the maximum entropy model only record whether a feature appears, while text classification also needs the strength of each feature, so the model is not optimal for classification. Second, because the training algorithm converges slowly, the computational cost is high in both time and space, and the data-sparsity problem is serious.

Maximum Entropy Markov Model (MEMM): The advantages of the HMM and the maximum entropy model are combined in a discriminative model. This model allows the state transition probabilities to depend on non-independent features of the sequence, so that context information is introduced into training and decoding, improving both precision and recall. Experiments have shown that this model performs much better on sequence tagging tasks than either the HMM or a stateless maximum entropy classifier.
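Concretely, a MEMM replaces the HMM's transition and emission tables with one maximum-entropy distribution p(y_t | y_{t-1}, x_t), normalized per state. A sketch with made-up feature weights:

```python
import math

# Hypothetical scoring function combining transition and observation features.
def score(prev_tag, obs, tag):
    s = 0.0
    if prev_tag == "B" and tag == "I":
        s += 1.5                      # transition feature weight (made up)
    if obs.isdigit() and tag == "I":
        s += 1.0                      # observation feature weight (made up)
    return s

def next_tag_dist(prev_tag, obs, tags):
    """MEMM transition p(y_t | y_{t-1}, x_t): maxent, normalized per state."""
    exp_s = {t: math.exp(score(prev_tag, obs, t)) for t in tags}
    z = sum(exp_s.values())           # local (per-state) normalization
    return {t: v / z for t, v in exp_s.items()}

dist = next_tag_dist("B", "42", ["B", "I", "O"])
print(dist)
```

The per-state normalization seen here is also the source of the label bias problem discussed in the CRF section.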

CRF Model features: first, given the observation sequence, a CRF defines a single exponential model for the joint probability of the entire tag sequence; one attractive property is the convexity of its loss function. Second, compared with improved hidden Markov models, the conditional random field can make better use of the context information in the text to be labeled, yielding better experimental results. CRFs are effective for Chinese chunking and avoid both strict independence assumptions and data-induced bias. Applied to Chinese name and entity recognition, with multiple feature templates defined based on the characteristics of Chinese characters, test results show that the CRF outperforms other probabilistic models when the same feature set is used. Third, part-of-speech tagging mainly faces the problems of ambiguous-word disambiguation and unknown-word tagging. The traditional hidden Markov approach cannot easily incorporate new features, while the maximum entropy Markov model suffers from label bias, among other problems. A CRF-based part-of-speech tagging model can easily incorporate new features and avoids the label bias problem.
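The feature templates mentioned for character-based Chinese tagging are typically window functions over neighboring characters. A hypothetical sketch (the template names here are invented for illustration):

```python
# Character-level feature templates for position i in a sentence.
def char_features(sent, i):
    feats = {"c0": sent[i], "bias": 1.0}     # current character + bias
    if i > 0:
        feats["c-1"] = sent[i - 1]           # previous character
        feats["c-1:c0"] = sent[i - 1] + sent[i]  # character bigram
    if i < len(sent) - 1:
        feats["c+1"] = sent[i + 1]           # next character
    return feats

print(char_features("北京大学", 1))
```

Because a CRF places no independence requirements on its features, overlapping templates like `c-1` and `c-1:c0` can be used together freely.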

CRFs have strong inference capabilities and can use complex, overlapping, non-independent features for training and inference. They can make full use of context information as features, and arbitrary external features can also be added, giving the model access to richer information. Meanwhile, CRFs solve the "label bias" problem of the maximum entropy Markov model. The essential difference between CRFs and the MEMM is that the MEMM uses a separate probability model at each state and must normalize at each state transition. If a state has only one successor, the probability of jumping to that successor is 1, so the model transitions to it regardless of the input. CRFs instead build a single probability model over all states, so that under this global normalization, even a state with only one successor does not force a transition probability of 1, which resolves the "label bias" problem. Theoretically, therefore, CRFs are well suited to Chinese part-of-speech tagging.
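The local-versus-global normalization contrast can be made concrete with a toy example (the scores below are arbitrary):

```python
import math

def local_norm(scores):
    """MEMM-style: normalize only over the successors of one state."""
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

def global_norm(path_scores):
    """CRF-style: normalize over whole candidate paths."""
    z = sum(math.exp(s) for s in path_scores.values())
    return {p: math.exp(s) / z for p, s in path_scores.items()}

# A state with a single successor: locally, the jump probability is 1
# no matter how poorly the input supports that transition.
print(local_norm([-5.0]))   # [1.0]

# Globally, the path through that weak transition still competes
# against alternative paths and can lose.
paths = global_norm({"via_weak_state": -5.0, "other_path": 2.0})
print(paths)
```

Under local normalization the low score -5.0 is invisible (it normalizes to 1); under global normalization the same score makes the whole path improbable.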

Advantages of the CRF model: first, the CRF model has advantages in combining multiple features and avoids the label bias problem. Second, CRF has better performance and better feature-fusion capability; compared with the maximum entropy model (ME), the recognition results of CRF are significantly better, even with limited training data.

Deficiencies of the CRF model: first, analysis of CRF-based methods that combine multiple features for English named entity recognition shows that feature selection and optimization are the key factors affecting the results; the quality of feature selection directly determines system performance. Second, training a CRF takes longer than training an ME model, and the resulting model is very large, which cannot be run on an ordinary PC.
