The previous article, on the Lucene tokenization process, explained how analysis works. We now have a basic picture of it and know that an analyzer consists of a Tokenizer plus one or more TokenFilters. This article uses those two building blocks to implement a simple synonym word breaker of our own; please point out anything I got wrong. (i) Analysis: how to implement synony
, containing the Traditional Chinese GBK dictionary, corresponding to 3. The following code is simplified:
package cwordseg;

import java.io.UnsupportedEncodingException;
// import utils.SystemParas;
import com.sun.jna.Library;
import com.sun.jna.Native;

/**
 * Function: basic word-segmentation features
 * Last updated: March 14, 2016 21:01:21
 */
public class CWordSeg {
    // define interface CLibrary, inheriting from com.sun.
1. Hystrix Circuit Breaker
Hystrix is an open-source library for latency and fault tolerance in distributed systems, where calls to the many dependencies inevitably fail at times (timeouts, exceptions, and so on). Hystrix ensures that a problem in one dependency does not bring down the whole service, avoiding cascading failures and improving the resilience of the distributed system.
"Circuit
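As a rough illustration of the circuit-breaker idea described above, here is a minimal sketch in plain Java. This is not the actual Hystrix API; the class name, threshold, and cooldown parameters below are all hypothetical, and real Hystrix additionally tracks rolling-window statistics:

```java
// Minimal circuit-breaker sketch (hypothetical, NOT the Hystrix API):
// after `threshold` consecutive failures the breaker opens and calls fall
// back immediately; after `cooldownMillis` one trial call is allowed again.
import java.util.function.Supplier;

public class SimpleBreaker {
    private final int threshold;
    private final long cooldownMillis;
    private int failures = 0;
    private long openedAt = -1;

    public SimpleBreaker(int threshold, long cooldownMillis) {
        this.threshold = threshold;
        this.cooldownMillis = cooldownMillis;
    }

    public <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        boolean open = failures >= threshold
                && System.currentTimeMillis() - openedAt < cooldownMillis;
        if (open) return fallback.get();   // fail fast, no cascading waits
        try {
            T result = primary.get();
            failures = 0;                  // a success closes the breaker
            return result;
        } catch (RuntimeException e) {
            if (++failures >= threshold) openedAt = System.currentTimeMillis();
            return fallback.get();
        }
    }
}
```

The key point is the fail-fast branch: while the breaker is open, the failing dependency is not called at all, so caller threads are not blocked waiting on it.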
Elasticsearch 5.3: installing the IK word breaker
I previously installed the Elasticsearch head plugin successfully, but installing the IK word breaker failed. It appears that from Elasticsearch 5.0 onward, IK can no longer be configured directly in elasticsearch.yml; the reason is explained below. First, download: https://www.elastic.co/downloads/elasticsearch and https://github.com/medcl/elasticsearch-analysis-ik (the latest version appears to be 5.3).
First step
Today we implement a simple word breaker as a demo, with the following features:
1. Tokens are split on spaces, hyphens, and dots;
2. "hi" and "hello" are treated as synonyms in queries;
3. "hi" and "hello" synonyms are highlighted.
MyAnalyzer implementation code:
public class MyAnalyzer extends Analyzer {
    private int analyzerType;
    public MyAnalyzer(int type) {
        super();
        analyzerType
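Since the Lucene analyzer above is truncated, here is a minimal plain-Java sketch of the same demo behavior: splitting on spaces, hyphens, and dots, and expanding hi/hello as synonyms. The class and method names are made up for illustration; a real implementation would do this inside a Tokenizer plus TokenFilter chain as the article describes:

```java
// Plain-Java sketch of the demo's intended behavior (illustrative only;
// not the Lucene classes). A SynonymFilter would emit the synonym token
// at the same position instead of appending it to a list.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SynonymDemo {
    private static final Map<String, String> SYNONYMS =
            Map.of("hi", "hello", "hello", "hi");

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // split on spaces, hyphens, and dots, as in feature 1 above
        for (String t : text.toLowerCase().split("[ \\-.]+")) {
            if (t.isEmpty()) continue;
            tokens.add(t);
            String syn = SYNONYMS.get(t);   // feature 2: hi <-> hello
            if (syn != null) tokens.add(syn);
        }
        return tokens;
    }
}
```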
SOLR's built-in handling of Chinese text is not very good, so Chinese applications often add a dedicated Chinese word breaker; IK-Analyzer is one of the better ones.
First, version information
SOLR version: 4.7.0
Requires Ik-analyzer version: IK Analyzer 2012ff_hf1
Ik-analyzer Download Address: http://code.google.com/p/ik-analyzer/downloads/list
Second, the confi
In a microservices architecture, the application is split into services along business lines, and services call each other over RPC; in Spring Cloud this can be done with RestTemplate+Ribbon or with Feign. To ensure high availability, each service is typically deployed as a cluster. Because of network issues or problems of its own, no service can guarantee 100% availability; if a single service fails, calling threads will block, and if there is a la
, Constants.NounAndVerbPos);
foreach (var keyword in keywords)
{
    Console.WriteLine(keyword);
}
The running result is:
The corresponding ExtractTagsWithWeight methods return, in addition to the keywords, their weight values. The TextRankExtractor interface is exactly the same as TfidfExtractor's, so it is not repeated here.
Summary
Word segmentation, POS tagging and keyword extraction are t
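To make the weighted-keyword idea concrete, here is a minimal term-frequency sketch in plain Java. This is not the jieba.NET API; the class name and the frequency-based weighting are assumptions, chosen only to show the shape of a result where each keyword carries a weight, as an ExtractTagsWithWeight-style call does:

```java
// Illustrative top-k keyword extraction by normalized term frequency
// (a stand-in for TF-IDF/TextRank weighting; class name is hypothetical).
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TfKeywords {
    public static LinkedHashMap<String, Double> topK(List<String> tokens, int k) {
        Map<String, Long> counts = tokens.stream()
                .collect(Collectors.groupingBy(t -> t, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(k)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        e -> e.getValue() / (double) tokens.size(), // weight
                        (a, b) -> a,
                        LinkedHashMap::new));                       // keep order
    }
}
```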
Because IK cannot handle ambiguous segmentations, I plan to use ANSJ instead and pass the data segmented by ANSJ to the front end. ANSJ documentation: http://nlpchina.github.io/ansj_seg/ I got stuck for a while at first because of jar-package problems, so I am sharing the jars here. Jar package: http://yunpan.cn/cmuTuFhBxREnx (extraction code: 20C4)
import java.util.List;
import org.ansj.domain.Term;
import org.ansj.splitword.analysis.BaseAnalysis;
import org.ansj.splitword.an
nameIsRealName=true
3. Create a collection under SOLR_HOME
1) Create a collection called collection1:
pwd   # /luxh/solr/
mkdir collection1
2) Copy the contents of solr-5.3.1/server/solr/configsets/basic_configs into the new collection1:
pwd   # /luxh/solr/solr-5.3.1/server/solr/configsets/
cp -r ./* /luxh/solr/solr_home/collection1/
4. Configure schema.xml in collection1 and add the ANSJ word-breaker configuration:
pwd   # /luxh/solr/solr_home/collection1/
ls    # currency.xml lang protwords.txt _rest_managed.js
t: time word. Takes the first letter of the English word "time".
u: particle. Takes the "u" from the English word "auxiliary".
vg: verbal morpheme. A verb-like morpheme; the verb code is v, so the morpheme code g is prefixed with v.
v: verb. Takes the first letter of the English word "verb".
vd: adverbial verb. A verb used directly as an adverbial; combines the verb code v with the adverb code d.
vn
PhpSplit is a Chinese word-segmentation library developed in PHP.
A PHP word breaker with a built-in Unicode-encoded dictionary.
It only works on PHP5 and requires the iconv function.
The program uses the RMM (reverse maximum matching) algorithm for segmentation; the dictionary must be specially compiled, and this class provides a makeDict() method for that purpose.
The basic workflow is: setSource, startAnalysis, getResult.
A special format is used to encode the main dictionary w
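The RMM (reverse maximum matching) algorithm mentioned above is easy to sketch: scan from the end of the text, try the longest dictionary word ending at the current position, and shrink the window until a match (or a single character) remains. This is a generic illustration in Java rather than PhpSplit's actual PHP code; the class name and dictionary are hypothetical:

```java
// Generic reverse-maximum-matching segmenter sketch (illustrative;
// not PhpSplit's implementation). Unmatched single characters pass through.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RmmSegmenter {
    private final Set<String> dict;
    private final int maxLen;

    public RmmSegmenter(Collection<String> words) {
        dict = new HashSet<>(words);
        int m = 1;
        for (String w : words) m = Math.max(m, w.length());
        maxLen = m;
    }

    public List<String> segment(String text) {
        Deque<String> out = new ArrayDeque<>();
        int end = text.length();
        while (end > 0) {
            int start = Math.max(0, end - maxLen);
            String piece = text.substring(start, end);
            // shrink the window from the left until a dictionary word matches
            while (piece.length() > 1 && !dict.contains(piece)) {
                piece = piece.substring(1);
            }
            out.addFirst(piece);       // build the result right-to-left
            end -= piece.length();
        }
        return new ArrayList<>(out);
    }
}
```

Matching from the right is what distinguishes RMM from forward maximum matching and is why it tends to resolve certain Chinese ambiguities differently.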
what, here you can see some basic ZooKeeper information, such as the port number: 2181. 5. Start ZooKeeper. Since we need to launch it from the bin directory, let's look at which file in bin is the entry point. Then just start it directly: zkServer.sh. So how do we confirm it started successfully? Check its status: it shows "standalone" here; a bit lonely, isn't it? Haha, that's because this is the single-node configuration, not configured as a clus
Initialization failed. Fail reason is ./file\data\NLPIR.user not valid license or your license expired! Please feel free to contact pipy_zhang@msn.com!
Exception in thread "main" java.lang.Error: Invalid memory access
Workaround:
Open the NLPIR official website and download the latest word-breaker package; in the download, open a path such as 20160509171502_ICTCLAS2016 word segmentation system download package\Chinese word segmentation 20140928\Data and find NLPIR.user.
Replace the old NLPIR.user in your project with this file.
2012FF_HF1 Chinese word-segmentation configuration =============
1. Copy IKAnalyzer2012FF_u1.jar into drive letter:\solr\server\solr\WEB-INF\lib
(the commons-io-2.3.jar and commons-logging-1.1.1.jar jars must also be present). 2. Copy IKAnalyzer.cfg.xml and stopword.dic into the conf of the core that needs the word breaker (drive letter:\solr\server\solr\WEB-INF\classes). Note: if there is no classes folder, create one yourself
Word breaker: Chinese word breaker and English word breaker
Input text --> keyword segmentation --> remove stop words (auxiliary words that don't affect the meaning of the text; this step speeds up indexing and shrinks the index files) --> morphological restoration (e.g. removing tense) --> convert to lowercase
private Analyzer analyzer = new StandardAnalyzer();
privat
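The pipeline described above can be sketched in plain Java (tokenize, lowercase, drop stop words; the morphological-restoration step is omitted here). The class name and the tiny stopword list are made up for illustration, and this simplified version lowercases before filtering rather than after:

```java
// Illustrative indexing pipeline sketch (hypothetical class; a real
// system would use an Analyzer chain and a full stopword list).
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class IndexPipeline {
    private static final Set<String> STOPWORDS = Set.of("the", "a", "an", "of", "to");

    public static List<String> analyze(String text) {
        return Arrays.stream(text.split("\\W+"))      // keyword segmentation
                .filter(t -> !t.isEmpty())
                .map(String::toLowerCase)             // lowercase step
                .filter(t -> !STOPWORDS.contains(t))  // stopwords shrink the index
                .collect(Collectors.toList());
    }
}
```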
For a Chinese search engine, Chinese word segmentation is one of the most fundamental parts of the whole system, because word-based Chinese search algorithms are not yet very good. Of course, this article is not about research into Chinese search engines, but about how to use PHP to build a site search engine; this article is one part of that series.
The word breaker I use is the open-source version of ICTCLAS from the CAS Institute of Computing Technolo
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.