How can I tell if a string is Java code or an English word?

Source: Internet
Author: User
Tags: java, keywords

Consider the following two strings:

1. for (int i = 0; i < b.size(); i++) {
2. do something in English (not necessarily a sentence).

It's easy for a person to see that the first is Java code and the second is an English phrase. So how can a computer program tell the two apart?

The Java code may not be parseable, because it is not necessarily a complete method (or declaration or expression), so running a parser over it is not enough. The approach below works around this problem. Because there is sometimes no clear line between Java code and English, the result cannot be 100% accurate. However, the solution needs only slight adjustment to fit your own needs, and the code can also be downloaded from GitHub.

The basic idea is to turn a string into a sequence of tokens. For example, the code line above might become "key, separator, id, assign, number, separator, ...". Simple rules over that sequence can then distinguish Java code from English.
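To make the idea concrete, here is a minimal sketch of the string-to-labels step. It is not the article's Tokenizer class: the class name TokenLabelSketch, the tiny rule table, and the "unknown" label are assumptions made for illustration, and the keyword list is deliberately shortened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenLabelSketch {
    // Hypothetical rule table: {label, regex}. Rules are tried in order,
    // so the keyword rule must come before the identifier rule.
    private static final String[][] RULES = {
        {"key", "(?:for|int|if|else|while|return)\\b"},
        {"number", "[0-9]+"},
        {"id", "[a-zA-Z_][a-zA-Z0-9_]*"},
        {"assign", "="},
        {"separator", "[(){};,.]"},
        {"operator", "\\+\\+|--|[<>+\\-*/]"},
    };

    // Repeatedly strips the first matching token from the front of the
    // input and records the label of the rule that matched it.
    static List<String> labels(String input) {
        List<String> out = new ArrayList<>();
        String s = input.trim();
        while (!s.isEmpty()) {
            boolean matched = false;
            for (String[] rule : RULES) {
                Matcher m = Pattern.compile("^(" + rule[1] + ")").matcher(s);
                if (m.find()) {
                    out.add(rule[0]);
                    s = s.substring(m.end()).trim();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                // No rule applies: record it and skip one character.
                out.add("unknown");
                s = s.substring(1).trim();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", labels("for (int i = 0;")));
    }
}
```

Run on the fragment "for (int i = 0;", this sketch prints key,separator,key,id,assign,number,separator. The exact labels depend on the rule table, so a different table can give a slightly different sequence for the same input.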

A Tokenizer class that converts a string to a set of tokens is provided below.

package lexical;

import java.util.LinkedList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tokenizer {

    // A token class: the pattern that recognizes it and its numeric code.
    private class TokenInfo {
        public final Pattern regex;
        public final int token;

        public TokenInfo(Pattern regex, int token) {
            super();
            this.regex = regex;
            this.token = token;
        }
    }

    // A matched token: its numeric code and the matched text.
    public class Token {
        public final int token;
        public final String sequence;

        public Token(int token, String sequence) {
            super();
            this.token = token;
            this.sequence = sequence;
        }
    }

    private LinkedList<TokenInfo> tokenInfos;
    private LinkedList<Token> tokens;

    public Tokenizer() {
        tokenInfos = new LinkedList<TokenInfo>();
        tokens = new LinkedList<Token>();
    }

    // Registers a token class; the pattern is anchored at the start of the input.
    public void add(String regex, int token) {
        tokenInfos.add(new TokenInfo(Pattern.compile("^(" + regex + ")"), token));
    }

    public void tokenize(String str) {
        String s = str.trim();
        tokens.clear();
        while (!s.equals("")) {
            //System.out.println(s);
            boolean match = false;
            for (TokenInfo info : tokenInfos) {
                Matcher m = info.regex.matcher(s);
                if (m.find()) {
                    match = true;
                    String tok = m.group().trim();
                    s = m.replaceFirst("").trim();
                    tokens.add(new Token(info.token, tok));
                    break;
                }
            }
            if (!match) {
                //throw new ParserException("Unexpected character in input: " + s);
                tokens.clear();
                System.out.println("Unexpected character in input: " + s);
                return;
            }
        }
    }

    public LinkedList<Token> getTokens() {
        return tokens;
    }

    // Concatenates the numeric codes of all tokens, e.g. "1444".
    public String getTokensString() {
        StringBuilder sb = new StringBuilder();
        for (Tokenizer.Token tok : tokens) {
            sb.append(tok.token);
        }
        return sb.toString();
    }
}

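We can define token classes for Java keywords, separators and operators, numbers, and identifiers, and assign each class a numeric code: 1 for keywords, 2 for separators and operators, 3 for numbers, and 4 for identifiers. Simple rules over the resulting code string then distinguish Java code from English. Here, a string counts as English if it contains three identifiers in a row or consists only of identifiers.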

package lexical;

import org.apache.commons.lang.StringUtils;

public class EnglishOrCode {

    private static Tokenizer tokenizer = null;

    public static void initializeTokenizer() {
        tokenizer = new Tokenizer();

        // keywords
        String keyString = "abstract assert boolean break byte case catch "
                + "char class const continue default do double else enum "
                + "extends false final finally float for goto if implements "
                + "import instanceof int interface long native new null "
                + "package private protected public return short static "
                + "strictfp super switch synchronized this throw throws true "
                + "transient try void volatile while todo";
        String[] keys = keyString.split(" ");
        String keyStr = StringUtils.join(keys, "|");

        tokenizer.add(keyStr, 1);
        tokenizer.add("\\(|\\)|\\{|\\}|\\[|\\]|;|,|\\.|=|>|<|!|~|"
                + "\\?|:|==|<=|>=|!=|&&|\\|\\||\\+\\+|--|"
                + "\\+|-|\\*|/|&|\\||\\^|%|\'|\"|\n|\r|\\$|\\#", 2); // separators, operators, etc.
        tokenizer.add("[0-9]+", 3); // number
        tokenizer.add("[a-zA-Z][a-zA-Z0-9_]*", 4); // identifier
        tokenizer.add("@", 4);
    }

    public static void main(String[] args) {
        initializeTokenizer();

        String s = "do something in English";
        if (isEnglish(s)) {
            System.out.println("English");
        } else {
            System.out.println("Java Code");
        }

        s = "for (int i = 0; i < b.size(); i++) {";
        if (isEnglish(s)) {
            System.out.println("English");
        } else {
            System.out.println("Java Code");
        }
    }

    private static boolean isEnglish(String replaced) {
        tokenizer.tokenize(replaced);
        String patternString = tokenizer.getTokensString();

        // English if three identifiers appear in a row, or if every token is an identifier.
        if (patternString.matches(".*444.*") || patternString.matches("4+")) {
            return true;
        } else {
            return false;
        }
    }
}

Output:

English
Java Code
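One possible adjustment, offered as a suggestion rather than as part of the original solution: the tokenizer anchors each pattern only at the start of the remaining input, so a keyword alternative such as "int" or "for" can match the first letters of an ordinary word like "into" or "forest". The sketch below compares prefix matching with whole-word matching (a trailing \b). The class name KeywordBoundarySketch, the shortened keyword list, and the reduced operator set are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordBoundarySketch {
    // Shortened keyword list for illustration; the article's list is longer.
    private static final String KEYWORDS = "do|for|int|if|new|this|try";

    // Returns the token-class string: 1 = keyword, 2 = separator/operator,
    // 3 = number, 4 = identifier. Returns "" if some character matches no class.
    static String codes(String input, boolean wholeWordKeywords) {
        String keywordRegex = wholeWordKeywords
                ? "(?:" + KEYWORDS + ")\\b"   // keyword must end at a word boundary
                : KEYWORDS;                   // keyword may be a mere prefix
        String[] regexes = {
            keywordRegex,
            "[(){}\\[\\];,.=<>!+\\-*/]",
            "[0-9]+",
            "[a-zA-Z][a-zA-Z0-9_]*",
        };
        StringBuilder sb = new StringBuilder();
        String s = input.trim();
        while (!s.isEmpty()) {
            boolean matched = false;
            for (int i = 0; i < regexes.length; i++) {
                Matcher m = Pattern.compile("^(" + regexes[i] + ")").matcher(s);
                if (m.find()) {
                    sb.append(i + 1);
                    s = s.substring(m.end()).trim();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                return "";
            }
        }
        return sb.toString();
    }

    // Same rule as the article: three identifiers in a row, or identifiers only.
    static boolean isEnglish(String codes) {
        return codes.matches(".*444.*") || codes.matches("4+");
    }

    public static void main(String[] args) {
        String text = "into the forest";
        System.out.println(codes(text, false) + " " + isEnglish(codes(text, false)));
        System.out.println(codes(text, true) + " " + isEnglish(codes(text, true)));
    }
}
```

With prefix matching, "into the forest" becomes 14414 and is classified as Java code. With whole-word matching it becomes 444 and is classified as English. The for-loop example is still classified as Java code in both modes.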
