PROLOG and WordNet (4)

3. Using WordNet

This section discusses the basic issues of using WordNet and introduces several convenience predicates that improve efficiency. For more documentation, see the source programs.

(1) Creating an index for the WordNet Prolog database. The WordNet database for Prolog contains a total of 484,381 lines of code, so improving the speed of file processing is one of the main goals. One way to reduce word lookup time is to create an index. To look up a word in WordNet we use the predicate s(+Synset_ID, +W_num, +Word, +Ss_type, +Sense_number, +Tag_count), and the predicate g(+Synset_ID, +Gloss) is available to query the meaning of a word. The word being searched for is not the first argument of s/6, which means that the default first-argument indexing does not help the matching.

There are two ways to create an index. The first is the built-in predicate index(+Predicate), which takes as its argument a template of the predicate to be indexed: every argument position on which an index should be built is given as 1, the others as 0. Remember that at most four of the first 32 arguments can be indexed. For word lookups we should therefore declare an index/1 template that marks the word argument (and, if useful, the synset identifier as well). The advantage of this technique is that more than one argument can be indexed, for example the synset identifier and the word itself; the disadvantage is that the index/1 declaration has to appear in every generated file, which hurts the portability of the database.

The second way to reduce word lookup time with an index is to change the order of the arguments in the wn_s.pl file. The predicate improve_file/0 rewrites the s clauses so that the word becomes the first argument, leaving the other arguments unchanged, and writes the result to the new file wn_s_new.pl (a sketch of this kind of conversion is given below, after subsection (3)). With this ordering the default indexing applies, and lookups become about nine times faster.

(2) Converting the WordNet files. A few common conventions simplify the use of WordNet for natural language processing and its interaction with other tools. In particular, when developing natural language processing tools in Prolog, the following conventions improve interoperability. First, only lowercase letters are used in the clauses. Second, multi-word entries are split into separate words rather than joined with underscores (_). Third, open lists are used to represent words, sentences and whole texts. As mentioned above, the predicate convert_file/0, an extension of improve_file/0, converts the wn_s.pl file in this way; the benefit of the index is preserved. The following clause is an example of the new structure: s([human, action|_G4016], 100022113, 2, n).

(3) A subset of WordNet. For testing it is convenient to use only a subset of WordNet containing just the most common English words. For this purpose we introduce the predicate subset_wn_s(+Number). It takes the wn_s.pl file as input and creates the new file wn_s_subset.pl, which contains only the most common words as recorded in the s/6 clauses. The Number argument specifies how many words the subset contains. This greatly speeds up experiments with WordNet. If you decide to use a subset in a project, an interesting exercise is to create subsets not only of the main file wn_s.pl but also of the other WordNet files. The predicate subset_wordnet(+Number) queries the subset of s clauses and converts the other files accordingly; each new file is named wn_operator_subset.pl, with operator replaced by the name of the relation.
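Both improve_file/0 and the subset predicates above are file-to-file conversions over the s clauses. To make the argument reordering of subsection (1) concrete, here is a minimal sketch of how a predicate like improve_file/0 could be written; the clause layout s(Synset_ID, W_num, Word, Ss_type, Sense_number, Tag_count) is the one used by the WordNet Prolog distribution, but the predicate body below is an assumption for illustration, not the actual code of the source programs.

    % Sketch only: read every s/6 clause from wn_s.pl and write it to
    % wn_s_new.pl with the word moved to the first argument position,
    % so that default first-argument indexing applies to word lookups.
    improve_file :-
        open('wn_s.pl', read, In),
        open('wn_s_new.pl', write, Out),
        copy_clauses(In, Out),
        close(In),
        close(Out).

    copy_clauses(In, Out) :-
        read(In, Term),
        (   Term == end_of_file
        ->  true
        ;   Term = s(Id, WNum, Word, Type, Sense, Tag)
        ->  % word first, remaining arguments keep their relative order
            format(Out, '~q.~n', [s(Word, Id, WNum, Type, Sense, Tag)]),
            copy_clauses(In, Out)
        ;   copy_clauses(In, Out)
        ).

After loading wn_s_new.pl, a query such as s(dog, Id, WNum, Type, Sense, Tag) can take advantage of the default first-argument indexing.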
These predicates make it possible to benefit from using a WordNet subset; in addition, they provide an example of how files can be converted automatically. Other subsets could be created in the future, for example WordNet subsets containing only nouns, only verbs, or only adjectives.

(4) Useful WordNet predicates. Using WordNet requires an interface to its Prolog database. All the predicates listed in this article are described in detail in their Prolog files; to use them, simply consult the files in which they are defined. The other files can then be queried once they have been loaded into memory.

First, you need to know what kind of input the predicates accept. Since we converted the files to use open lists, an open list is obviously an acceptable input. To make things easier for the user, the Prolog data type atom is also accepted. For example, to look up the word dog you can type dog or [dog|X]. However, as mentioned earlier, since other natural language processing tools do not recognize underscores, a two-word entry must be given as an open list, such as [physical, thing|X]. All predicates work on the converted WordNet files rather than on the original data, and you can choose whether to use the complete database or a subset of it.

The most direct use of WordNet is looking up a word. The predicate lookup(+Word) searches the wn_s.pl file for the word and displays its syntactic category and gloss. Its extension lookup(+Word, -SynsetList) looks up a word and returns the list of all synsets that contain it. This predicate is very useful for obtaining all synonyms of the different senses of a word. Both predicates are defined in the file lookup.pl.

In the earlier part of this article I explained the WordNet files and described how words are grouped into synonym sets. The predicate find_synset(+Synset_ID, -WordList) takes the identifier of a synonym set and returns a list of all words in that set. It is defined in the file find_synset.pl.

The various semantic and lexical relations were also explained there in detail. The following experimental predicates process these relations. The predicate find_hyp_chains(+Word, +Cat) finds the hypernym chains of a word and displays them as lists. For example, [organism|_G529] and [being|_G520] are hypernyms of the word dog; the chains also include [animal|_G616], [animate, being|_G607], [beast|_G595], [brute|_G586], and so on. Remember that only nouns and verbs have hypernyms, and that hypernymy only relates words of the same category, so the category must be given in the second argument. For example, if you ask for the hypernyms of the noun dog, you will find the word mammal, but you will not find the word to move, which is a hypernym of the verb to dog.

The predicate find_hyp(+Word, +Cat, -HypList) finds all hypernyms of a word and returns them in a list. The predicates find_ent_chains(+Word) and find_ent(+Word, -EntList) do the same for the entailment relation; the difference is that no word category needs to be specified, because entailment applies only to verbs. These four predicates are defined in the file hyp_ent.pl and are very similar: find_hyp_chains/2 and find_ent_chains/1 both call the general predicate find_chain/3, passing an argument that says whether the relation is hypernymy or entailment, and in the same way find_hyp/3 and find_ent/2 call the general predicate find/4 and select the relation through one of its arguments.
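To make the recursion behind find_hyp_chains/2 and find_chain/3 concrete, the following sketch climbs a hypernym chain directly over the original s/6 and hyp/2 facts from wn_s.pl and wn_hyp.pl (in wn_hyp.pl the second synset is a hypernym of the first). The predicate names and the choice of one representative word per synset are assumptions for illustration; the actual code in hyp_ent.pl may be organized differently.

    % Sketch only: climb the hypernym chain of Word (category Cat),
    % collecting one representative word per synset along the way.
    hyp_chain(Word, Cat, [Word|Chain]) :-
        s(Id, _, Word, Cat, _, _),          % a synset containing Word
        climb(Id, Chain).

    climb(Id, [Hyper|Rest]) :-
        hyp(Id, HyperId),                   % HyperId is a hypernym synset of Id
        s(HyperId, 1, Hyper, _, _, _),      % take its first word as representative
        climb(HyperId, Rest).
    climb(Id, []) :-
        \+ hyp(Id, _).                      % top of the hierarchy reached

A query such as ?- hyp_chain(dog, n, Chain). then enumerates, on backtracking, chains of increasingly general words for each sense of dog.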
The predicate find_sim_meanings(+Word) finds all similar meanings of an adjective and displays them. The related predicate find_sim(+Word, -SimList) returns a list of words similar to the given word. Similarity, unlike hypernymy and entailment, does not form chains, so internally the built-in predicate findall/3 is used to collect all similar words; no recursion over chains of the relation is needed.

As mentioned above, the files wn_mm.pl, wn_ms.pl and wn_mp.pl describe the relations between parts and wholes. The following predicates are the interface for expressing and processing these relations, which hold only between nouns.

The predicate member_of(+Word, -GroupList) takes a word as input, searches for the groups of which the word is a member, and outputs them as the list GroupList. Example: ?- member_of(faculty, X). X = [school|_G413]. A faculty is a member part of a school. The predicate has_member(+Word, -MemberList) finds the members of a group and outputs them in a list. Example: ?- has_member(faculty, X). X = [professor|_G407]. A professor is a member of the faculty.

The predicates substance_of(+Word, -List) and has_substance(+Word, -SubstanceList) find the things that the word is a substance of, or the substances that the word is made of, and return them in a list. Example: ?- substance_of(water, X). X = [tear|_G407]. ?- has_substance(water, X). X = [h2o|_G407]. Water is the substance that tears are made of, and H2O is the substance that water is made of.

The predicates part_of(+Word, -WholeList) and has_part(+Word, -PartList) return the wholes of which the word is a part, or the parts of the word. Example: ?- part_of(leg, X). X = [table|_G407]. ?- has_part(leg, X). X = [knee|_G407]. The leg is part of the table, and the knee is part of the leg. The input variable can be an atom or an open list, and the output variable is an open list. These predicates are defined in the file meronym_holonym.pl.

The predicates member_of/2, substance_of/2 and part_of/2 call the general predicate all_one/4 and select the appropriate relation, mm, ms or mp, through its third argument. First the relation name is used to form the corresponding goal; then the built-in predicate findall/3 is called to obtain the answers, collecting the words identified by the related synsets into the result list. Here, too, the input can be an atom or an open list. The predicate all_two/4 is similar to all_one/4, except that instead of matching the first argument of the relation it returns the words of the second; it serves has_member/2, has_substance/2 and has_part/2.

Finally, let us look at the predicate cause(+Verb, -CauseList). Given a verb, this predicate returns the list of verbs related to it by the cause relation; for example, the verb leak can result from the verbs break, get out and get around. It is defined in the file cause.pl and consists of a single line of code. Its mechanism is the same as that of member_of/2: it calls all_one/4 with the appropriate relation as the third argument. This also explains why all_one/4 needs a fourth argument to specify a category: for the part-whole relations alone, which involve only nouns, a category argument would seem unnecessary, but the same predicate is reused for the causal relation, and to handle verbs the category must be passed explicitly.
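The paragraph above describes how member_of/2, substance_of/2 and part_of/2 delegate to a generic all_one/4 that selects a relation by name and collects answers with findall/3. The sketch below illustrates that mechanism; the predicate name, the use of the original s/6 layout, and the assumed direction of the mm, ms and mp facts (first synset the whole, second synset the part or member) are assumptions for illustration, not the actual code of meronym_holonym.pl.

    % Sketch only: a generic helper in the spirit of all_one/4.
    % RelName is the relation name (mm, ms or mp, from wn_mm.pl, wn_ms.pl
    % and wn_mp.pl; cs from wn_cs.pl would work the same way for verbs),
    % and Cat is the word category selected by all_one/4's extra argument.
    all_one_sketch(Word, Results, RelName, Cat) :-
        Goal =.. [RelName, OtherId, WordId],       % e.g. mm(OtherId, WordId)
        findall(Other,
                ( s(WordId, _, Word, Cat, _, _),   % a synset containing Word
                  call(Goal),                      % assumed: Word's synset is the part/member
                  s(OtherId, 1, Other, Cat, _, _)  % representative word of the related synset
                ),
                Results).

For example, ?- all_one_sketch(faculty, Xs, mm, n). would play the role of member_of(faculty, Xs), and an all_two-style variant would simply swap which of the two synset arguments is matched against the input word.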
(5) Handling common words that are not in the WordNet database. This section is about words that cannot be found in WordNet, an important issue when you use WordNet as the knowledge base of a natural language processing tool. The predicate lookup_text(+Filename) tries to look up every word of a text and displays the words that are not in WordNet. Calling this predicate shows that quite a few common words are missing from the WordNet database, for example this, which, whether, of, from, the, my and is. How to deal with these words depends on your problem; extending the knowledge base is one possibility. It is also useful to have lookup_text/1 return the word list instead of printing it to the screen; that version is lookup_text(+Filename, -List).

(6) The ProNTo interface. The ProNTo morphological analyzer, designed by Jason Schlachter, is an independent natural language processing tool for Prolog that can be used on its own or integrated with other tools. For integrated use, the WordNet database must provide an interface to the morphological analyzer. ProNTo produces several possible analyses and needs WordNet to check whether a proposed word actually exists. ProNTo analyzes one word at a time; its output is a list containing the results of the morphological analysis, for example [[walk, -ed]]. The output was later extended to a list of all the ways a word can be split, such as [[walked], [walke, -d], [walk, -ed]]. You can then call the predicate morph_atoms_lookup(+Morph) to check whether a morphological analysis corresponds to a real word: the call succeeds if the word is found in WordNet and fails otherwise. The analyzer can also handle longer phrases or sentences; for example, the analysis of he walked slowly could be [[[he]], [[walked], [walke, -d], [walk, -ed]], [[slowly], [slow, -ly]]]. Results of this form are handled by the predicate morph_bag_lookup(+Morph), which succeeds if at least one analysis of each word is found in WordNet and fails otherwise.

These predicates are useful to anyone who wants to use the ProNTo tool, but success or failure alone is often not enough; the output information is more useful. I therefore designed the predicates morph_atoms_lookup(+Morph, -Result) and morph_bag_lookup(+Morph, -Result). They take the same input but return, in the second argument, the words that were found, each as a list of the form [Word, Synset_ID, W_num, Category]. Example:

?- morph_atoms_lookup([[talk, -ed]], R).
R = [[talk, 100677091, 1, n]] ;
R = [[talk, 105953501, 1, n]]

?- morph_bag_lookup([[[he]], [[talked], [talke, -d], [talk, -ed]]], R).
R = [[he, 105716399, 1, n], [talk, 100677091, 1, n]] ;
R = [[he, 105716399, 1, n], [talk, 105953501, 1, n]]

To learn more about how these predicates work, see the file morph_lookup.pl. A small sketch of this kind of lookup is given after the conclusion.

4. Conclusion. This article aims to provide good documentation for the Prolog version of WordNet. The material presented here should simplify programming against the Prolog WordNet database. Of course, the predicates listed can be extended by users according to their actual needs.
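Returning to the ProNTo interface of section (6), the following closing sketch shows one way a check like morph_atoms_lookup/2 could be realized against the original s/6 facts. The input format, a list of [Stem, Suffix...] analyses, follows the examples above, but the predicate name and its body are assumptions for illustration, not the actual code of morph_lookup.pl.

    :- use_module(library(lists)).              % for member/2

    % Sketch only: succeed once for every WordNet entry whose word form
    % matches the stem of one of the proposed morphological analyses.
    % Morph is a list such as [[walked], [walke, -d], [walk, -ed]].
    morph_atoms_lookup_sketch(Morph, [Stem, Id, WNum, Cat]) :-
        member([Stem|_Suffixes], Morph),        % one way of splitting the word
        s(Id, WNum, Stem, Cat, _, _).           % the stem exists in WordNet

    % Example (results depend on the database that is loaded):
    % ?- morph_atoms_lookup_sketch([[talked], [talke, -d], [talk, -ed]], R).
    % R = [talk, 100677091, 1, n] ;
    % ...

A bag-level variant in the spirit of morph_bag_lookup/2 would simply apply this check to every word of the sentence and collect one result per word.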
