Day 14: Using the Stanford NER package to implement your own named entity recognizer

I'm not a big fan of machine learning or natural language processing (NLP), but I keep coming up with ideas that need them. The goal of this post is to build a real-time job search engine from Twitter data. Each search result should include the name of the company offering the position, the location of the job, and the person to contact at the company when applying. This requires us to analyze each tweet for three kinds of entity: person, location, and organization. This type of problem is known as named entity recognition (NER).

According to Wikipedia, named entity recognition is a subtask of information extraction that locates atomic elements in text and classifies them into predefined categories such as the names of persons, organizations, and locations, expressions of time, quantities, monetary values, percentages, and so on.

To make this clearer, consider an example. Suppose we have the following tweet:

A person can easily tell that an organization called PSI Pax has an open position in Baltimore. But how do we do this programmatically? The simplest approach is to maintain a list of all organization names and locations and search the text against that list. However, this approach does not scale.
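To see why the list-based approach breaks down, here is a minimal sketch of it in plain Java. The class name GazetteerLookup and its methods are hypothetical and exist only for this illustration; they are not part of Stanford NER. The lookup finds only names that were added to the list beforehand, so any company or city it has never seen is silently missed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of list-based lookup; not part of the Stanford NER API.
class GazetteerLookup {

    // Known names mapped to their entity type. A real list would have to
    // contain every organization and place we might ever encounter.
    private final Map<String, String> knownEntities = new LinkedHashMap<>();

    void add(String name, String type) {
        knownEntities.put(name, type);
    }

    // Returns "name=TYPE" for each known name that occurs verbatim in the text.
    List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : knownEntities.entrySet()) {
            if (text.contains(entry.getKey())) {
                hits.add(entry.getKey() + "=" + entry.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        GazetteerLookup lookup = new GazetteerLookup();
        lookup.add("PSI Pax", "ORGANIZATION");
        lookup.add("Baltimore", "LOCATION");

        // Both names are on the list, so both are found.
        System.out.println(lookup.find("Web development expert at PSI Pax (Baltimore, MD)"));
        // Neither name is on the list, so nothing is found.
        System.out.println(lookup.find("Java developer at Acme Corp (Austin, TX)"));
    }
}
```

The second call returns an empty list even though a human reader sees a company and a city right away. A statistical recognizer such as Stanford NER is meant to handle names it has not seen before.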

In this post, I will describe how to use the Stanford NER package to set up our own NER server.

What is Stanford NER?

Stanford NER is a Java implementation of a named entity recognizer. NER labels sequences of words in a text that are names of things, such as person and company names, or gene and protein names.

Prerequisites

1. Basic Java knowledge is required. Install the latest JDK for your operating system; either OpenJDK 7 or Oracle JDK 7 will do. OpenShift supports OpenJDK 6 and 7.

2. Download the Stanford NER package from the official website.

3. Register for an OpenShift account. It is completely free, and Red Hat gives each user three free gears on which to run applications. At the time of writing, OpenShift allocates 1.5 GB of RAM and 3 GB of disk space to each user.

4. Install the rhc client tool on your machine. rhc is a Ruby gem, so you need Ruby 1.8.7 or later installed. To install rhc, run:

sudo gem install rhc

To update rhc to the latest version, run:

sudo gem update rhc

For additional help with installing the rhc command-line tool, see: https://openshift.redhat.com/community/developers/rhc-client-tools-install

5. Set up your OpenShift account with the rhc setup command. This command creates a namespace for you and uploads your SSH keys to the OpenShift server.

Step 1: Create a JBoss EAP application

Now let's create the demo application, which we will call nerdemo:

$ rhc create-app nerdemo jbosseap

If you have access to medium gears, you can use the following command instead:

$ rhc create-app nerdemo jbosseap -g medium

This creates an application container for us, called a gear, and sets up the required SELinux and cgroup configuration automatically. OpenShift also creates a private git repository for us and clones it to the local system. Finally, OpenShift sets up a DNS entry so that the application is reachable from outside. The deployed application can be accessed at http://nerdemo-{domain-name}.rhcloud.com/. Replace {domain-name} with your own OpenShift domain (sometimes called a namespace).

Step 2: Add Maven dependencies

In the pom.xml file, add the following dependency:

<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.2.0</version>
</dependency>

Then switch the Maven project to Java 7 by updating these properties in pom.xml:

<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>

Now update the Maven project via right-click > Maven > Update Project.

Step 3: Enable CDI

We use CDI for dependency injection. CDI (Contexts and Dependency Injection) is a Java EE 6 feature that defines a type-safe dependency injection mechanism for Java EE projects. Almost any POJO can be injected as a CDI bean.

In the src/main/webapp/WEB-INF directory, create an XML file named beans.xml with the following contents:

<beans xmlns="http://java.sun.com/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/beans_1_0.xsd">

</beans>

Step 4: Application-scoped classifier bean

Now we create an application-scoped bean that creates an instance of the CRFClassifier class. This classifier is used to detect persons, locations, and organizations in text.

package com.nerdemo;

import javax.annotation.PostConstruct;
import javax.enterprise.context.ApplicationScoped;
import javax.enterprise.inject.Produces;
import javax.inject.Named;

import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

@ApplicationScoped
public class ClassifierConfig {

    // Location of the serialized 3-class model (person, location, organization)
    private String serializedClassifier = "classifiers/english.all.3class.distsim.crf.ser.gz";
    private CRFClassifier<CoreLabel> classifier;

    // Load the model once, when the bean is created
    @PostConstruct
    public void postConstruct() {
        CRFClassifier<CoreLabel> classifier = CRFClassifier.getClassifierNoExceptions(serializedClassifier);
        this.classifier = classifier;
    }

    // Expose the loaded classifier for injection
    @Produces
    @Named
    public CRFClassifier<CoreLabel> classifier() {
        return classifier;
    }
}

Copy the english.all.3class.distsim.crf.ser.gz classifier from the downloaded Stanford NER package to the src/main/resources/classifiers directory.

Step 5: Enable JAX-RS

To enable JAX-RS, create a class that extends javax.ws.rs.core.Application and specify the application path with the javax.ws.rs.ApplicationPath annotation, as shown below:

package com.nerdemo;

import javax.ws.rs.ApplicationPath;
import javax.ws.rs.core.Application;

@ApplicationPath("/api/v1")
public class JaxrsInitializer extends Application {

}

Step 6: Create the ClassifyRestResource class

Now we will create the ClassifyRestResource class, which returns the NER results. Create a new class named ClassifyRestResource and replace its contents with the following code:

package com.nerdemo;

import java.util.ArrayList;
import java.util.List;

import javax.inject.Inject;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;

@Path("/classify")
public class ClassifyRestResource {

    @Inject
    private CRFClassifier<CoreLabel> classifier;

    @GET
    @Path(value = "/{text}")
    @Produces(value = MediaType.APPLICATION_JSON)
    public List<Result> findNer(@PathParam("text") String text) {
        // One inner list of labeled tokens per sentence
        List<List<CoreLabel>> classify = classifier.classify(text);
        List<Result> results = new ArrayList<>();
        for (List<CoreLabel> coreLabels : classify) {
            for (CoreLabel coreLabel : coreLabels) {
                String word = coreLabel.word();
                String answer = coreLabel.get(CoreAnnotations.AnswerAnnotation.class);
                // "O" marks tokens outside any named entity; skip them
                if (!"O".equals(answer)) {
                    results.add(new Result(word, answer));
                }
            }
        }
        return results;
    }
}
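The resource above returns a list of Result objects, but the post does not show that class. A minimal sketch could look like the following. It assumes the JSON provider serializes bean getters, which would produce the word and answer fields seen in the JSON response later in this post. Only the two-argument constructor and those field names come from the post; the rest is an assumption. In the project the class would live in its own Result.java file in the com.nerdemo package, most likely declared public.

```java
// Hypothetical sketch of the Result type used by findNer; the original post does not show it.
class Result {

    private final String word;    // the token text, e.g. "Baltimore"
    private final String answer;  // the entity label, e.g. "LOCATION"

    public Result(String word, String answer) {
        this.word = word;
        this.answer = answer;
    }

    // Getter names determine the JSON property names ("word", "answer")
    // when a getter-based JSON serializer is used.
    public String getWord() {
        return word;
    }

    public String getAnswer() {
        return answer;
    }
}
```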
Deploy to OpenShift

Finally, deploy the changes to OpenShift:

$ git add .
$ git commit -am "NER demo app"
$ git push

Once the code is deployed, the application is available at http://nerdemo-{domain-name}.rhcloud.com. My sample application is running at http://nerdemo-t20.rhcloud.com.

Now send a request to http://nerdemo-t20.rhcloud.com/api/v1/classify/Microsoft%20SCCM%20Windows%20Server%202012%20Web%20development%20expert%20(SME3)%20at%20PSI%20Pax%20(Baltimore,%20MD)
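The text to classify travels as part of the URL path, so spaces and other special characters have to be percent-encoded. The following is a minimal sketch of building such a URL with the standard library. The class name ClassifyUrlBuilder and its methods are hypothetical; the /api/v1/classify/ path comes from the annotations used earlier in this post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for building request URLs; not part of the demo application.
class ClassifyUrlBuilder {

    // Percent-encodes text so that it can be used as a single URL path segment.
    static String encodePathSegment(String text) {
        try {
            // URLEncoder targets form data and turns spaces into '+',
            // so convert those to %20 for use in a path. A literal '+'
            // in the input is already encoded as %2B at this point.
            return URLEncoder.encode(text, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Joins the application base URL, the REST path, and the encoded text.
    static String classifyUrl(String baseUrl, String text) {
        return baseUrl + "/api/v1/classify/" + encodePathSegment(text);
    }

    public static void main(String[] args) {
        System.out.println(classifyUrl("http://nerdemo-t20.rhcloud.com",
                "Web development expert at PSI Pax (Baltimore, MD)"));
    }
}
```

URLEncoder also escapes the parentheses and the comma, which the URL above leaves as literal characters. Both forms decode to the same text.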

You will get the result in JSON format:

[
{"word":"Microsoft","answer":"ORGANIZATION"},
{"word":"PSI","answer":"ORGANIZATION"},
{"word":"Pax","answer":"ORGANIZATION"},
{"word":"Baltimore","answer":"LOCATION"}
]
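Note that the response lists one entry per token, so "PSI" and "Pax" appear separately even though they form one company name. The post does not cover grouping, but one possible approach is to join neighbouring tokens that share a label. The sketch below assumes access to the full token sequence including the "O" tokens, which findNer filters out, because two entries that are adjacent in the filtered response need not be adjacent in the text. The class name EntityMerger is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing step; not part of the original demo application.
class EntityMerger {

    // tokens and labels are parallel lists covering every token of the text,
    // with "O" as the label for tokens outside any entity.
    // Returns "entity text=LABEL" for each run of equally labeled tokens.
    static List<String> merge(List<String> tokens, List<String> labels) {
        if (tokens.size() != labels.size()) {
            throw new IllegalArgumentException("tokens and labels must have the same length");
        }
        List<String> entities = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentLabel = "O";
        for (int i = 0; i < tokens.size(); i++) {
            String label = labels.get(i);
            if (!label.equals(currentLabel)) {
                // The label changed: close the entity in progress, if any.
                if (!"O".equals(currentLabel)) {
                    entities.add(current + "=" + currentLabel);
                }
                current.setLength(0);
                currentLabel = label;
            }
            if (!"O".equals(label)) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(tokens.get(i));
            }
        }
        // Close an entity that runs to the end of the text.
        if (!"O".equals(currentLabel)) {
            entities.add(current + "=" + currentLabel);
        }
        return entities;
    }
}
```

A known limitation of this simple rule is that two distinct entities of the same type standing directly next to each other would be merged into one.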

That's it for today. Keep the feedback coming.

Next steps: sign up for an OpenShift Online account, or get your own private PaaS (Platform as a Service) by evaluating OpenShift Enterprise. Need help? Ask in the OpenShift community forum. Show off your app in the OpenShift Developer Spotlight, or browse the OpenShift application gallery.

Original: Day 14: Stanford NER - How to Setup Your Own Name, Entity, and Recognition Server in the Cloud
Translation: SegmentFault
