Probabilistic Soft Logic (PSL): a general-purpose version that can handle Chinese data

I. Introduction

Probabilistic soft logic (PSL) is a machine learning framework for developing probabilistic models, developed by LINQS, the statistical relational learning group at the University of California, Santa Cruz and the University of Maryland. At present, its complicated environment setup and its Groovy-based model syntax make it hard for beginners such as the author, and its many dependencies mean that a model that once built correctly often fails with small errors.

After some effort, the author has packaged PSL into a single jar and added an encoding mechanism that lets it handle data in many languages. There are three main contributions:

1. PSL's complex dependencies are packaged into a single jar; add that jar as a dependency to get started.

2. The less familiar Groovy model constructs are replaced entirely with Java, so a model can be built with a single .java file.

3. An encoding mechanism is added so that PSL can easily handle data in languages other than English.

Link: pan.baidu.com/s/1pybpnoppvk4jmsmw7rm_7a Password: g1cx
The linked folder contains three files:
Psl_swust1.0.jar: the modified PSL model
Simpleacquaintances.zip: adapted version of the official PSL example (without weight learning or functions)
Entity_resolution.zip: adapted version of the official PSL example (with weight learning and functions)

To use it, add Psl_swust1.0.jar to the Java project's dependencies. When importing a sample project, also add the Psl_swust.jar in the project's lib folder to the dependencies.

II. Examples and explanations

The walkthrough below uses SimpleAcquaintances.java from the SimpleAcquaintances example.

1. Configuration Items
    /*
     * ====== Configuration ======
     */
    Tool tool = new Tool();
    DataStore datastore;
    HashMap<String, Partition> partitions = new HashMap<String, Partition>();
    // Change SimpleAcquaintances to the current class name
    String path = tool.getPath(new SimpleAcquaintances().getClass()) + "/../data/";
    String[] paths = tool.getFiles(path);
    // Change "H2" to "PostgreSQL" once a PostgreSQL database is installed
    Pslmodel psl = new Pslmodel(paths, "H2");
    datastore = psl.getDatastore();
    // Whether to encode the data (this flag only covers the data; predicates are encoded by default)
    psl.transcoding = false;

• SimpleAcquaintances must be changed to the current class name (the Tool class's getPath() function uses it to locate the current project folder).
• Once PostgreSQL is installed and configured, "H2" can be changed to "PostgreSQL" to use the PostgreSQL database (H2 is the database bundled with the model and runs in memory).
• Setting transcoding to true makes the model encode the data, which lets it process data in any language. (With encoding enabled, declaring the predicate arguments as UniqueIntID can also improve computational efficiency.)
• This modified version keeps PSL's similarity functions and custom functions, but PSL's built-in similarity functions cannot be applied to encoded data (because the encoded data is no longer the original string).
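
To make the last point concrete, here is a minimal sketch of what a transcoding step of this kind might look like. It is an assumption about the general idea, not the jar's actual implementation; the class ValueEncoder and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the jar): maps arbitrary strings,
// such as Chinese names, to stable integer codes and back.
final class ValueEncoder {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    // Returns the existing code for a value, or assigns the next free one.
    int encode(String value) {
        Integer code = codes.get(value);
        if (code == null) {
            code = values.size();
            codes.put(value, code);
            values.add(value);
        }
        return code;
    }

    // Recovers the original string, for example when writing results.
    String decode(int code) {
        return values.get(code);
    }
}
```

Once a value has been replaced by a code like this, a string-similarity function would compare the codes rather than the original text, which is why such functions are only meaningful when transcoding is false.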

2. Defining partitions
    // Weight-learning partitions
    // partitions.put("learn_obs", datastore.getPartition("learn_obs"));
    // partitions.put("learn_target", datastore.getPartition("learn_target"));
    // partitions.put("learn_truth", datastore.getPartition("learn_truth"));
    // Experiment partitions
    datastore = psl.getDatastore();
    partitions.put("obs", datastore.getPartition("obs"));
    partitions.put("target", datastore.getPartition("target"));
    partitions.put("truth", datastore.getPartition("truth"));
    psl.setPartitions(partitions);

• When weight learning is required (that is, training data is available), the weight-learning partitions must also be defined.
• "obs" is the partition of known (observed) data.
• "target" is the partition that stores the target atoms to be inferred (with lazy inference, no data needs to be loaded into it).
• "truth" is the partition of ground-truth data.

3. Predicate (function) definition
    HashMap<String, ConstantType[]> p = new HashMap<String, ConstantType[]>();
    HashMap<String, ExternalFunction> f = new HashMap<String, ExternalFunction>();
    // Add predicates
    p.put("Lived", new ConstantType[] { ConstantType.UniqueStringID, ConstantType.UniqueStringID });
    p.put("Likes", new ConstantType[] { ConstantType.UniqueStringID, ConstantType.UniqueStringID });
    p.put("Knows", new ConstantType[] { ConstantType.UniqueStringID, ConstantType.UniqueStringID });
    // Add functions
    // f.put("SameInitials", new SameInitials());
    // f.put("SameNumTokens", new SameNumTokens());
    psl.definePredicates(p, f); // pass the predicates and functions to the model

• To change or add predicate definitions, simply edit or extend the entries in the sample.
• Common argument types for predicates include UniqueStringID, UniqueIntID, and String.
• Functions can be PSL's built-in similarity functions (when transcoding is false) or custom functions that inherit from ExternalFunction.
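
As an illustration of the kind of logic a custom similarity function might wrap, the sketch below compares the initials of two names. It is a standalone example written for this walkthrough: the class InitialsSimilarity is hypothetical, and the wiring into ExternalFunction is deliberately not shown because it depends on the jar.

```java
// Standalone sketch of similarity logic; it does not use any PSL classes.
final class InitialsSimilarity {
    // Returns 1.0 when both names have the same number of tokens and
    // every pair of tokens starts with the same letter; otherwise 0.0.
    static double sameInitials(String a, String b) {
        String[] ta = a.trim().split("\\s+");
        String[] tb = b.trim().split("\\s+");
        if (ta.length != tb.length) {
            return 0.0;
        }
        for (int i = 0; i < ta.length; i++) {
            if (ta[i].isEmpty() || tb[i].isEmpty()) {
                return 0.0;
            }
            char ca = Character.toLowerCase(ta[i].charAt(0));
            char cb = Character.toLowerCase(tb[i].charAt(0));
            if (ca != cb) {
                return 0.0;
            }
        }
        return 1.0;
    }
}
```

A function like this works on the raw strings, so it only makes sense for data that has not been transcoded.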

4. Rule definitions
    String[] rules = {

        "20.0: ( Lived(P1, L) & (P1 != P2) & Lived(P2, L) ) >> KNOWS(P1, P2) ^2",
        "5.0: ( (L1 != L2) & (P1 != P2) & Lived(P2, L2) & Lived(P1, L1) ) >> ~( KNOWS(P1, P2) ) ^2",
        "10.0: ( Likes(P2, L) & (P1 != P2) & Likes(P1, L) ) >> KNOWS(P1, P2) ^2",
        "5.0: ( KNOWS(P1, P2) & KNOWS(P2, P3) & (P1 != P3) ) >> KNOWS(P1, P3) ^2",
        "1.0 * KNOWS(P1, P2) + -1.0 * KNOWS(P2, P1) = 0.0.",
        "5.0: ~( KNOWS(P1, P2) ) ^2"

    };
    psl.defineRules(rules); // pass the rules to the model

• Rule format: weight: rule body >> rule head
• ^2 marks the rule as squared, so its penalty is squared during optimization.
• To define the rules for your own project, add or remove rules following the sample format.
• Hint: an arithmetic rule can only be written in the form "a * Predicate1(A, B) + b * Predicate2(A, B) = 0.0.". PSL itself accepts a rule such as "Knows(P1, P2) = Knows(P2, P1).", but in the author's modified version it must be written as "1.0 * KNOWS(P1, P2) + -1.0 * KNOWS(P2, P1) = 0.0."; otherwise an error is raised. This will be improved later.
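
To show what the weight and the ^2 suffix do to a single ground rule, the sketch below evaluates a rule the way PSL's hinge-loss potentials are usually described: Lukasiewicz conjunction for the body, and the distance to satisfaction max(0, body - head) for the implication. The class GroundRule is a hypothetical illustration, not code from the jar.

```java
// Illustrative only: evaluates one ground logical rule with soft truth values in [0, 1].
final class GroundRule {
    // Lukasiewicz conjunction: max(0, sum of values - (n - 1)).
    static double and(double... values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return Math.max(0.0, sum - (values.length - 1));
    }

    // Distance to satisfaction of "body >> head".
    static double distance(double body, double head) {
        return Math.max(0.0, body - head);
    }

    // Weighted penalty; squared = true corresponds to the ^2 suffix.
    static double penalty(double weight, double body, double head, boolean squared) {
        double d = distance(body, head);
        return weight * (squared ? d * d : d);
    }
}
```

For the first sample rule, if both Lived atoms and the inequality are 1.0 and KNOWS(P1, P2) is 0.5, the body evaluates to 1.0, the distance to satisfaction is 0.5, and the squared penalty is 20.0 * 0.25 = 5.0.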

5. Import data
    /*
     * ====== Import data ======
     * "1-2" means that columns 1 and 2 of the data are transcoded.
     * It only takes effect when transcoding = true.
     */
    psl.loadData("Lived", path + "lived_obs.txt", "obs", "1-2");
    psl.loadDataTruth("Likes", path + "likes_obs.txt", "obs", "1-2");
    psl.loadData("Knows", path + "knows_obs.txt", "obs", "1-2");
    psl.loadData("Knows", path + "knows_targets.txt", "target", "1-2");
    psl.loadDataTruth("Knows", path + "knows_truth.txt", "truth", "1-2");
    // ArrayList<String[]> likePe = tool.fileToArrayList(path + "likes_obs.txt", "1-2-3");
    // psl.insertDataTruth("Likes", likePe, "obs");
    // psl.insertData("Likes", likePe, "obs");

• This version provides four ways to load data: loadData, loadDataTruth, insertData, and insertDataTruth. Format: loadData("predicate", path of the predicate's data file, "partition to import into", "columns to take (not including the probability-value column)")
• The "1-2" in loadData and loadDataTruth means that columns 1 and 2 of the data file are transcoded. It only takes effect when transcoding = true. More columns can be listed, separated by "-".
• The difference between loadData and loadDataTruth: when data is loaded with loadDataTruth, the last column is treated as the probability value by default.
• insertData and insertDataTruth are meant for a data file that stores the data of several predicates. The file must be converted to list data before use; the Tool class in this version provides fileToArrayList("file path", "columns to take") to help with the conversion.
• "1-2-3" means taking columns 1, 2, and 3 as the predicate data; with insertDataTruth, the nth item of each "1-2-...-n" record is treated as the probability value by default.
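
The column convention can be sketched in a few lines of plain Java. This is a hypothetical helper, not the jar's loader: the class ColumnSpec is invented for illustration, and it assumes tab-separated data files, which the post does not state explicitly.

```java
// Hypothetical helper illustrating the "1-2" column convention.
final class ColumnSpec {
    // Parses a spec such as "1-2" into zero-based column indices {0, 1}.
    static int[] parse(String spec) {
        String[] parts = spec.split("-");
        int[] cols = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            cols[i] = Integer.parseInt(parts[i].trim()) - 1;
        }
        return cols;
    }

    // Picks the requested columns from one tab-separated line (assumed format).
    static String[] select(String line, int[] cols) {
        String[] fields = line.split("\t");
        String[] out = new String[cols.length];
        for (int i = 0; i < cols.length; i++) {
            out[i] = fields[cols[i]];
        }
        return out;
    }

    // Reads the last column as the probability value, as described for loadDataTruth.
    static double truthValue(String line) {
        String[] fields = line.split("\t");
        return Double.parseDouble(fields[fields.length - 1]);
    }
}
```

With a line such as "Alice", "Bob", "0.8" separated by tabs, the spec "1-2" selects the two names as predicate arguments and the final column supplies the truth value.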

6. Weight Learning
    // psl.learnWeights("learn_target", "Lived-Likes", "learn_obs", "learn_truth", "MaxLikelihoodMPE");

• Format: learnWeights("target partition of the training data", "closed predicates, i.e. predicates treated as known data, for which no new atoms are generated during inference", "observed-data partition of the training data", "truth-data partition", "weight learning method")
• Weight learning optimizes the rule weights when training data is available.
• This version keeps five of PSL's weight learning methods:
"LazyMaxLikelihoodMPE",
"MaxLikelihoodMPE",
"MaxPiecewisePseudoLikelihood",
"MaxPseudoLikelihood",
"SimplexSampler"
Pass any one of them as the last argument.

7. Print output model
    psl.printModel();

• Use this to view the model that has been defined and the encoded model (rules).

8. Running Inference
    // psl.runLazyInference("observed-data partition", "target partition (stores the result)");
    // psl.runLazyInference("obs", "target");
    // psl.runInference("observed-data partition", "closed predicate 1-closed predicate 2", "target partition (contains the defined target atoms)");
    psl.runInference("obs", "Lived-Likes", "target");

• This version offers two inference methods: LazyMPEInference and MPEInference.
• LazyMPEInference format: runLazyInference("observed-data partition", "target partition for storing results")
• MPEInference format: runInference("observed-data partition", "closed predicate 1-closed predicate 2", "target partition containing the target atoms")

9. Data output
    psl.writeOutput("target", "Knows", path + "/result/knows_inffer.txt");

• Output format: writeOutput("target partition", "predicate 1 to output-predicate 2 to output", output path)

10. Evaluate the results of the experiment
    psl.evalResults("target", "truth", "Knows", path + "/result/evalresults.txt");

• Evaluation format: evalResults("target partition", "truth partition", "target predicate 1-target predicate 2", output path for the evaluation results)
• Note that the "target predicate 1-target predicate 2" argument must list every predicate that has data in the truth partition.
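
The post does not say which metrics evalResults reports. As one common way to compare inferred values with a truth partition, the sketch below thresholds both sides and computes precision, recall, and F1. The class DiscreteEval, the string atom keys, and the choice of metrics are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical sketch: thresholded comparison of inferred values against truth values.
final class DiscreteEval {
    // Returns {precision, recall, f1}; atoms missing from `inferred` count as 0.0.
    static double[] score(Map<String, Double> inferred, Map<String, Double> truth, double threshold) {
        int tp = 0;
        int fp = 0;
        int fn = 0;
        for (Map.Entry<String, Double> e : truth.entrySet()) {
            boolean actual = e.getValue() >= threshold;
            boolean predicted = inferred.getOrDefault(e.getKey(), 0.0) >= threshold;
            if (predicted && actual) {
                tp++;
            } else if (predicted) {
                fp++;
            } else if (actual) {
                fn++;
            }
        }
        double precision = (tp + fp == 0) ? 0.0 : (double) tp / (tp + fp);
        double recall = (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
        double f1 = (precision + recall == 0.0) ? 0.0 : 2 * precision * recall / (precision + recall);
        return new double[] { precision, recall, f1 };
    }
}
```

Only atoms present in the truth map are scored here, which mirrors the requirement that every predicate with data in the truth partition be listed for evaluation.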

11. Close the Model
    psl.closeModel();

• Once inference is complete, close the model.
