Several common machine learning toolkits

Source: Internet
Author: User

 

Machine learning, as Wikipedia describes it, is roughly a family of programs that build models by analyzing datasets (the exact definition is not repeated here). With these methods we can model events, and usually the goal is to make fast judgments about new data by analyzing existing data. Common machine learning models (among those I have worked with) include CRF (conditional random fields), SVM (support vector machines), ME (maximum entropy), and EM (expectation-maximization, a maximum-likelihood method). The order is arbitrary, and the generative versus discriminative classification is not covered here. Most of these models have mature toolkits developed abroad, so users generally only need to get their data into the right format, train a model with the toolkit, and then use that model in their own applications. Below are several toolkits I have used, along with my experience with each.

 

1. Mallet
Developed by Andrew McCallum at UMass (the second author of the CRF paper and a big name in the field, no introduction needed). Written in Java. Besides CRFs, it also supports topic models and graphical models. I used it for Chinese chunk annotation before, but it kept having problems on my own machine, so I ran the experiments on a server.

2. CRF++
A well-known open-source tool, available on SourceForge. It ships with an example for the CoNLL-2000 task (chunking), and adapting it to other tasks is not hard: you mainly modify the feature template. I used it for POS (part-of-speech) tagging and found it quite convenient. CRF training is slow, though. A corpus of a few dozen megabytes takes a long time to run, and doing it on my own laptop was painful.
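
Since most of the work with CRF++ is getting the data into shape, here is a minimal Python sketch of that step. It assumes the usual CRF++ column layout: one token per line, whitespace-separated columns, the label in the last column, and an empty line between sentences. The function name `to_crfpp_lines` and the sample sentences are made up for illustration and are not part of CRF++.

```python
from typing import Iterable, List, Tuple

# Hypothetical input type: a sentence is a list of (token, pos_tag, chunk_label).
Sentence = List[Tuple[str, str, str]]


def to_crfpp_lines(sentences: Iterable[Sentence]) -> List[str]:
    """Render sentences in a column format that CRF++ can read.

    One token per line, columns separated by a tab, label in the last
    column, and an empty line after every sentence.
    """
    lines: List[str] = []
    for sentence in sentences:
        for token, pos, label in sentence:
            lines.append(f"{token}\t{pos}\t{label}")
        lines.append("")  # empty line marks the sentence boundary
    return lines


# Illustrative data only, not taken from any real corpus file.
sample = [
    [("He", "PRP", "B-NP"), ("reckons", "VBZ", "B-VP")],
    [("Confidence", "NN", "B-NP")],
]

crfpp_text = "\n".join(to_crfpp_lines(sample)) + "\n"
```

With this layout, a feature template line such as `U00:%x[0,1]` would refer to column 1 (the POS tag here) of the current token, so switching tasks is mostly a matter of changing which columns the template reads.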

3. libsvm
Developed by two experts at NTU (National Taiwan University). It is widely used in bioinformatics. As the name suggests, it is an SVM toolkit, and you can choose the kernel yourself. There is also a graphical demo on its official website. I used this toolkit for a simple binary text classification experiment. I had to write the conversion script for the data myself, and overall the experience was good.
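
The conversion script mentioned above might look something like the following Python sketch. It assumes the sparse text format LIBSVM reads, `label index:value ...`, with feature indices in ascending order, and uses simple word counts as features. The helper names `build_vocab` and `to_libsvm_line` and the toy documents are hypothetical, not the code from the original experiment.

```python
from collections import Counter
from typing import Dict, List


def build_vocab(docs: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct word a 1-based feature index, in first-seen order."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for word in doc:
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def to_libsvm_line(label: int, words: List[str], vocab: Dict[str, int]) -> str:
    """Format one document as 'label index:value ...' with ascending indices."""
    # Count occurrences per feature index, skipping out-of-vocabulary words.
    counts = Counter(vocab[w] for w in words if w in vocab)
    features = " ".join(f"{idx}:{counts[idx]}" for idx in sorted(counts))
    return f"{label} {features}".rstrip()


# Toy two-class data: label 1 for positive, -1 for negative.
docs = [["good", "movie", "good"], ["bad", "movie"]]
labels = [1, -1]

vocab = build_vocab(docs)
libsvm_lines = [to_libsvm_line(y, d, vocab) for y, d in zip(labels, docs)]
```

Only non-zero features are written, which keeps the file small for bag-of-words data where each document uses a tiny fraction of the vocabulary.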

4. maxent
An open-source maximum entropy toolkit, also written in Java. I have hardly used it. I plan to redo some of my earlier NLP tasks in the next stage and will give it another try then if I can.
