Natural language 19_lemmatisation

Source: Internet
Author: User

https://en.wikipedia.org/wiki/Lemmatisation

Lemmatisation (or lemmatization) in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form. [1]

In computational linguistics, lemmatisation is the algorithmic process of determining the lemma of a word based on its intended meaning. Unlike stemming, lemmatisation depends on correctly identifying the intended part of speech and meaning of a word in a sentence, as well as within the larger context surrounding that sentence, such as neighboring sentences or even an entire document. As a result, developing efficient lemmatisation algorithms is an open area of research. [2][3]

Contents
    • 1 Description
    • 2 Use in biomedicine
    • 3 See also
    • 4 References
    • 5 External links

Description

In many languages, words appear in several inflected forms. For example, in English, the verb 'to walk' may appear as 'walk', 'walked', 'walks', 'walking'. The base form, 'walk', that one might look up in a dictionary, is called the lemma for the word. The association of the base form with a part of speech is often called a lexeme of the word.
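As a minimal sketch of the idea, the inflected forms of 'walk' can be grouped under their lemma with a plain lookup table. The table `LEMMA_TABLE` and the function `group_by_lemma` are hypothetical names made up for this illustration; a real lemmatiser would draw on a full dictionary rather than four hand-written entries.

```python
from collections import defaultdict

# Hypothetical hand-written table: inflected form -> lemma.
LEMMA_TABLE = {
    "walk": "walk",
    "walks": "walk",
    "walked": "walk",
    "walking": "walk",
}


def group_by_lemma(tokens):
    """Group tokens under their lemma; unknown tokens map to themselves."""
    groups = defaultdict(list)
    for token in tokens:
        key = token.lower()
        lemma = LEMMA_TABLE.get(key, key)
        groups[lemma].append(token)
    return dict(groups)


print(group_by_lemma(["Walked", "walks", "walking", "talks"]))
# {'walk': ['Walked', 'walks', 'walking'], 'talks': ['talks']}
```

The word 'talks' is left ungrouped only because this toy table has no entry for it.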

Lemmatisation is closely related to stemming. The difference is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech. However, stemmers are typically easier to implement and run faster. The reduced "accuracy" may not matter for some applications. In fact, when used within information retrieval systems, stemming improves query recall, or true positive rate, when compared to lemmatisation. Nonetheless, stemming reduces precision, the proportion of retrieved results that are actually relevant, for such systems. [4]
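The two retrieval measures mentioned above can be computed as follows. This is only a sketch of the definitions: the document IDs and the two result sets are invented for illustration and are not measurements of any real stemmer or lemmatiser, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: documents 1-4 are the relevant ones for some query.
relevant = {1, 2, 3, 4}
# Assumed outcome of a broad match (finds everything, plus two false hits).
broad_match = {1, 2, 3, 4, 5, 6}
# Assumed outcome of a narrow match (no false hits, but misses document 4).
narrow_match = {1, 2, 3}

for name, result in [("broad", broad_match), ("narrow", narrow_match)]:
    p, r = precision_recall(result, relevant)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
# broad: precision=0.67 recall=1.00
# narrow: precision=1.00 recall=0.75
```

The broad match stands in for the behaviour attributed to stemming (higher recall, lower precision) and the narrow match for lemmatisation.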

For instance:

    1. The word 'better' has 'good' as its lemma. This link is missed by stemming, as it requires a dictionary look-up.
    2. The word 'walk' is the base form for the word 'walking', and hence this is matched in both stemming and lemmatisation.
    3. The word 'meeting' can be either the base form of a noun or a form of a verb ('to meet') depending on the context; e.g., "in our last meeting" or "We are meeting again tomorrow". Unlike stemming, lemmatisation attempts to select the correct lemma depending on the context.
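The three cases above can be sketched with a lookup keyed on both the word and its part of speech. This is a toy illustration, not a real lemmatiser: the table `LEMMAS`, the tag names `ADJ`, `NOUN` and `VERB`, and the function `lemmatise` are assumptions made for the example, and the part-of-speech tag is passed in by hand, whereas a real system would have to infer it from context.

```python
# Hypothetical exception table keyed on (word, part-of-speech tag).
LEMMAS = {
    ("better", "ADJ"): "good",        # case 1: needs a dictionary look-up
    ("walking", "VERB"): "walk",      # case 2: regular inflection
    ("meeting", "NOUN"): "meeting",   # case 3: noun reading keeps its form
    ("meeting", "VERB"): "meet",      # case 3: verb reading maps to 'meet'
}


def lemmatise(word, pos):
    """Return the lemma for (word, pos); fall back to the lowercased word."""
    key = word.lower()
    return LEMMAS.get((key, pos), key)


print(lemmatise("better", "ADJ"))    # good
print(lemmatise("walking", "VERB"))  # walk
print(lemmatise("meeting", "NOUN"))  # meeting
print(lemmatise("meeting", "VERB"))  # meet
```

Keying the table on the pair rather than on the word alone is what lets 'meeting' resolve to two different lemmas.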

Document indexing software like Lucene[5] can store the base stemmed format of the word without knowledge of its meaning, considering only word-formation grammar rules. The stemmed word itself might not be a valid word: 'lazy', for instance, is stemmed by many stemmers to 'lazi'. This is because the purpose of stemming is not to produce the appropriate lemma; that is a more challenging task that requires knowledge of the context. The main purpose of stemming is to map different forms of a word to a single form. [6] As a rules-based algorithm, dependent only upon the spelling of a word, it sacrifices accuracy to ensure that, for example, when 'laziness' is stemmed to 'lazi', it has the same stem as 'lazy'.
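A spelling-only stemmer of this kind can be sketched in a few lines. The rules below are a deliberately tiny, made-up rule set chosen to reproduce the 'lazy' and 'laziness' example; they are not the algorithm used by Lucene or by any published stemmer, and `toy_stem` is a hypothetical name.

```python
VOWELS = set("aeiou")


def has_vowel(text):
    """True if the text contains at least one vowel letter."""
    return any(ch in VOWELS for ch in text)


def toy_stem(word):
    """Strip one known suffix, then rewrite a final 'y' as 'i'."""
    word = word.lower()
    # Rule 1: drop 'ness' or 'ing' when what remains still contains a vowel.
    for suffix in ("ness", "ing"):
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and has_vowel(stem):
            word = stem
            break
    # Rule 2: a final 'y' becomes 'i' when the rest of the word has a vowel.
    if word.endswith("y") and has_vowel(word[:-1]):
        word = word[:-1] + "i"
    return word


for w in ["lazy", "laziness", "walking", "meeting", "better"]:
    print(w, "->", toy_stem(w))
# lazy -> lazi
# laziness -> lazi
# walking -> walk
# meeting -> meet
# better -> better
```

Both 'lazy' and 'laziness' land on the non-word 'lazi', 'meeting' always becomes 'meet' regardless of whether it is a noun or a verb, and 'better' is never linked to 'good' because no spelling rule connects them.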

Use in biomedicine

Morphological analysis of published biomedical literature can yield useful results. Morphological processing of biomedical text can be more effective with a specialised lemmatisation program for biomedicine, and may improve the accuracy of practical information extraction tasks. [7]
