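JWS (Java WordNet Similarity) is an open-source project for computing semantic similarity, built on Java and WordNet and developed by David Hope and others at the University of Sussex. It implements many of the classic semantic similarity algorithms and is well worth studying.
JWS is a Java implementation of WordNet::Similarity (a Perl package for WordNet-based similarity comparison), which is good news for anyone who wants to compare word similarity with WordNet in Java. The steps for using it, in brief: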
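1. Download WordNet (Windows, version 2.1): http://wordnet.princeton.edu/wordnet/download/;
2. Download WordNet-InfoContent (version 2.1): http://wn-similarity.sourceforge.net/ or http://www.d.umn.edu/~tpederse/Data/;
3. Download JWS (current version: beta.11.01): http://www.cogs.susx.ac.uk/users/drh21/;
4. Install WordNet;
5. Unzip WordNet-InfoContent-2.1 and copy the folder into the WordNet directory D:/Program Files/WordNet/2.1;
6. Copy the two jars shipped with JWS, edu.mit.jwi_2.1.4.jar and edu.sussex.nlp.jws.beta.11.jar, into Java's lib directory and set the environment variables so that both jars are on the classpath;
7. Run the JWS example program TestExamples in Eclipse.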
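Note: because the WordNet downloaded here is version 2.1, a couple of places in the program need to be changed:
String dir = "C:/Program Files/WordNet"; // WordNet installation path; change it to match where you actually installed WordNet
JWS ws = new JWS(dir, "3.0"); // change "3.0" to "2.1"
Example program: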
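Because a wrong path or version string is the most likely stumbling block here, it may help to check the folder layout before constructing JWS. The sketch below uses only the Java standard library. It assumes the layout described in the comments of the JWS example program (a dict folder and a WordNet-InfoContent-<version> folder under <dir>/<version>, with ic-semcor.dat as the default IC file). The class name WordNetLayoutCheck and the method missingPaths are hypothetical helpers and are not part of JWS.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JWS): checks the assumed folder layout.
class WordNetLayoutCheck {

    // Returns every expected path that does not exist under <dir>/<version>.
    static List<Path> missingPaths(String dir, String version) {
        Path base = Paths.get(dir, version);
        Path[] expected = {
            base.resolve("dict"),
            // Assumption: the default IC file is ic-semcor.dat
            base.resolve("WordNet-InfoContent-" + version).resolve("ic-semcor.dat")
        };
        List<Path> missing = new ArrayList<>();
        for (Path p : expected) {
            if (!Files.exists(p)) {
                missing.add(p);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Defaults mirror the values used in this post; pass your own as arguments.
        String dir = args.length > 0 ? args[0] : "C:/Program Files/WordNet";
        String version = args.length > 1 ? args[1] : "2.1";
        List<Path> missing = missingPaths(dir, version);
        if (missing.isEmpty()) {
            System.out.println("Layout looks complete for WordNet " + version);
        } else {
            for (Path p : missing) {
                System.out.println("Missing: " + p);
            }
        }
    }
}
```

If anything is reported missing, fix the path or the version string before calling new JWS(dir, version).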
import java.util.TreeMap;
import java.text.*;
import edu.sussex.nlp.jws.*;

// 'TestExamples': how to use Java WordNet::Similarity
// David Hope, 2008
public class TestExamples
{
    public static void main(String[] args)
    {
        // 1. SET UP:
        // Let's make it easy for the user. So, rather than set pointers in 'Environment Variables' etc.
        // let's allow the user to define exactly where they have put WordNet(s)
        String dir = "E:/Commonly Application/WordNet/";
        // That is, you may have version 3.0 sitting in the above directory e.g. C:/Program Files/WordNet/3.0/dict
        // The corresponding IC files folder should be in this same directory e.g. C:/Program Files/WordNet/3.0/WordNet-InfoContent-3.0

        // Option 1 (Perl default): specify the version of WordNet you want to use (assuming that you have a copy of it)
        // and use the default IC file [ic-semcor.dat]
        JWS ws = new JWS(dir, "2.1");
        // Option 2: specify the version of WordNet you want to use and the particular IC file that you wish to apply
        //JWS ws = new JWS(dir, "3.0", "ic-bnc-resnik-add1.dat");

        // 2. EXAMPLES OF USE:

        // 2.1 [JIANG & CONRATH MEASURE]
        JiangAndConrath jcn = ws.getJiangAndConrath();
        //System.out.println("Jiang & Conrath\n");
        // all senses
        TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", "n"); // all senses
        //TreeMap<String, Double> scores1 = jcn.jcn("apple", 1, "banana", "n"); // fixed;all
        //TreeMap<String, Double> scores1 = jcn.jcn("apple", "banana", 2, "n"); // all;fixed
        for (String s : scores1.keySet())
            System.out.println(s + "\t" + scores1.get(s));
        // specific senses
        //System.out.println("\nspecific pair\t=\t" + jcn.jcn("apple", 1, "banana", 1, "n") + "\n");
        // max.
        //System.out.println("\nhighest score\t=\t" + jcn.max("java", "best", "n") + "\n\n\n");

        // 2.2 [LIN MEASURE]
        Lin lin = ws.getLin();
        //System.out.println("Lin\n");
        // all senses
        TreeMap<String, Double> scores2 = lin.lin("like", "love", "n"); // all senses
        //TreeMap<String, Double> scores2 = lin.lin("kid", "child", "n"); // fixed;all
        //TreeMap<String, Double> scores2 = lin.lin("apple", "banana", 2, "n"); // all;fixed
        //for (String s : scores2.keySet())
        //    System.out.println(s + "\t" + scores2.get(s));
        // specific senses
        System.out.println("\nspecific pair\t=\t" + lin.lin("like", 1, "love", 1, "n") + "\n");
        // max.
        System.out.println("\nhighest score\t=\t" + lin.max("From", "date", "n") + "\n\n\n");

        // ... and so on for any other measure
    }
} // eof
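The all-senses calls above return a TreeMap<String, Double> of per-sense-pair scores, which the example simply prints. As a rough sketch of how the best-scoring sense pair could be picked out of such a map, the snippet below uses only the standard library. The class BestSensePair is a hypothetical helper, and the keys and scores in main are made-up placeholders rather than real JWS output; the actual key format may differ.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of JWS): finds the highest-scoring entry.
class BestSensePair {

    // Returns the entry with the largest score, or null if the map is empty.
    static Map.Entry<String, Double> best(Map<String, Double> scores) {
        Map.Entry<String, Double> top = null;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (top == null || e.getValue() > top.getValue()) {
                top = e;
            }
        }
        return top;
    }

    public static void main(String[] args) {
        // Placeholder data standing in for the map returned by jcn.jcn(...) or lin.lin(...).
        TreeMap<String, Double> scores = new TreeMap<>();
        scores.put("apple#n#1,banana#n#1", 0.10);
        scores.put("apple#n#1,banana#n#2", 0.25);
        scores.put("apple#n#2,banana#n#1", 0.05);

        Map.Entry<String, Double> top = best(scores);
        System.out.println(top.getKey() + "\t" + top.getValue());
    }
}
```

The max(...) calls shown in the example program appear to serve the same purpose for the score alone; this sketch additionally keeps the key, so you can see which sense pair produced the score.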
A simple semantic similarity program built on JWS, for example:
import edu.sussex.nlp.jws.JWS;
import edu.sussex.nlp.jws.Lin;

public class Similar {

    private final String str1;
    private final String str2;
    // WordNet installation path; change it to match your own setup
    private final String dir = "E:/Commonly Application/WordNet/";
    private final JWS ws = new JWS(dir, "2.1");

    public Similar(String str1, String str2) {
        this.str1 = str1;
        this.str2 = str2;
    }

    // Averages the Lin score over every word pair taken from the two strings
    public double getSimilarity() {
        String[] strs1 = splitString(str1);
        String[] strs2 = splitString(str2);
        double sum = 0.0;
        for (String s1 : strs1) {
            for (String s2 : strs2) {
                double sc = maxScoreOfLin(s1, s2);
                sum += sc;
                System.out.println("Current pair: " + s1 + " VS " + s2 + ", similarity: " + sc);
            }
        }
        return sum / (strs1.length * strs2.length);
    }

    // Splits a phrase into words on single spaces
    private String[] splitString(String str) {
        return str.split(" ");
    }

    // Highest Lin score with both words read as nouns; falls back to verbs if that score is 0
    private double maxScoreOfLin(String str1, String str2) {
        Lin lin = ws.getLin();
        double sc = lin.max(str1, str2, "n");
        if (sc == 0) {
            sc = lin.max(str1, str2, "v");
        }
        return sc;
    }

    public static void main(String[] args) {
        String s1 = "departure";
        String s2 = "leaving from";
        Similar sm = new Similar(s1, s2);
        System.out.println(sm.getSimilarity());
    }
}
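The averaging step in getSimilarity does not depend on JWS itself, so it can be tried out without a WordNet installation. The following is a minimal sketch of the same pairwise-average idea with the scorer passed in as a function. PairwiseAverage and its exact-match toy scorer are hypothetical and stand in for the Lin measure only for illustration; the sketch also splits on any run of whitespace and returns 0.0 for empty input, which the program above does not do.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper (not part of JWS): mean score over all word pairs.
class PairwiseAverage {

    static double average(String a, String b, ToDoubleBiFunction<String, String> scorer) {
        String ta = a.trim();
        String tb = b.trim();
        if (ta.isEmpty() || tb.isEmpty()) {
            return 0.0; // nothing to compare; also avoids dividing by zero
        }
        String[] wordsA = ta.split("\\s+");
        String[] wordsB = tb.split("\\s+");
        double sum = 0.0;
        for (String x : wordsA) {
            for (String y : wordsB) {
                sum += scorer.applyAsDouble(x, y);
            }
        }
        return sum / (wordsA.length * wordsB.length);
    }

    public static void main(String[] args) {
        // Toy scorer: 1.0 for identical words, 0.0 otherwise.
        // With JWS available, a Lin-based scorer could be plugged in here instead.
        ToDoubleBiFunction<String, String> exactMatch = (x, y) -> x.equals(y) ? 1.0 : 0.0;
        System.out.println(average("big red", "red car", exactMatch));
    }
}
```

Note that a plain average penalises longer phrases: every unrelated word pair pulls the score down, so a phrase such as "leaving from" may score lower against "departure" than "leaving" alone would.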
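I came across JWS while trying to do semantic analysis with Protégé plus WordNet, but I did not have much time to study it in depth, which is a real pity. If you have looked into it further, please post your blog URL so everyone can use it as a reference!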