Java example code: counting word frequencies in a string


The code is as follows:

package com.gpdi.action;

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordsStatistics {

    // Mutable counter, so a count can be bumped in place without re-inserting into the map.
    private static class Obj {
        int count;

        Obj(int count) {
            this.count = count;
        }
    }

    // Counts how often each word occurs in the text and returns the result sorted by
    // descending count, then alphabetically. WordCount is not shown in this listing; it is
    // expected to provide a (String, int) constructor plus getWord() and getCount().
    public List<WordCount> statistics(String word) {
        List<WordCount> rs = new ArrayList<WordCount>();
        Map<String, Obj> map = new HashMap<String, Obj>();

        // Return an empty list rather than null so callers can iterate safely.
        if (word == null) {
            return rs;
        }

        // Normalise: lower-case, drop possessive 's, strip punctuation, and turn
        // line breaks into spaces so words on adjacent lines are not glued together.
        word = word.toLowerCase()
                .replace("'s", "")
                .replaceAll("[,\\-.':!]", "")
                .replace('\n', ' ');

        // Split on runs of whitespace; a leading run yields one empty token, skipped below.
        for (String simpleWord : word.split("\\s+")) {
            if (simpleWord.isEmpty()) {
                continue;
            }
            Obj cnt = map.get(simpleWord);
            if (cnt != null) {
                cnt.count++;
            } else {
                map.put(simpleWord, new Obj(1));
            }
        }

        for (Map.Entry<String, Obj> entry : map.entrySet()) {
            rs.add(new WordCount(entry.getKey(), entry.getValue().count));
        }

        // Highest count first; ties are broken alphabetically. Returning the string
        // comparison directly keeps the comparator consistent (0 for equal elements).
        Collections.sort(rs, new java.util.Comparator<WordCount>() {
            @Override
            public int compare(WordCount o1, WordCount o2) {
                if (o1.getCount() > o2.getCount()) {
                    return -1;
                }
                if (o1.getCount() < o2.getCount()) {
                    return 1;
                }
                return o1.getWord().compareToIgnoreCase(o2.getWord());
            }
        });
        return rs;
    }

    public static void main(String[] args) {
        String word = "Pinterest is might be aa ab aa ab marketer's dream - ths site is largely used to curate products ";
        WordsStatistics s = new WordsStatistics();
        List<WordCount> rs = s.statistics(word);
        // Print each word with its count, e.g. "aa*2".
        for (WordCount word1 : rs) {
            System.out.println(word1.getWord() + "*" + word1.getCount());
        }
    }

}
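
The listing relies on a WordCount type that the page does not show. Below is a minimal sketch of what it might look like, assuming only what the calls in statistics and main imply: a (String, int) constructor plus getWord() and getCount(). The field names, the immutability, and the toString format are assumptions for illustration, not the original author's class.

```java
// Hypothetical value class; only the constructor shape and the two getters
// are implied by the listing, everything else here is an assumption.
class WordCount {
    private final String word;
    private final int count;

    public WordCount(String word, int count) {
        this.word = word;
        this.count = count;
    }

    // The (normalised) word.
    public String getWord() {
        return word;
    }

    // Number of times the word occurred.
    public int getCount() {
        return count;
    }

    // Same "word*count" shape that main prints, for convenience when debugging.
    @Override
    public String toString() {
        return word + "*" + count;
    }
}
```

To compile it together with WordsStatistics, the class would need to live in the same package (com.gpdi.action in the listing) or be declared public in its own file and imported.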
