Splitting English sentences into words (handling irregular spacing and punctuation)
Recently I had a requirement: the backend sends me an English sentence, and I need to split out each word, count the words, and lay the words out side by side on one horizontal line. Some people may think this is easy: just split on spaces, then check whether the last character of each word is a comma or another punctuation mark, and if so, remove it.
That only works if the sentences coming from the backend are perfectly well-formed. If a sentence contains two consecutive spaces, splitting on a single space produces an empty entry between them, and the word count is wrong. You can work around that by discarding empty entries. But now take the sentence "Yeah,I need a word": there is only a comma between "Yeah" and "I", with no space, so "Yeah,I" comes back as one word and needs yet another special case. On top of that, backend data can be irregular in all sorts of other ways. Splitting on spaces alone cannot handle all of this.

So how can it be solved? By traversing the sentence one character at a time. Two methods are listed below, and both follow the same idea: letters (plus apostrophes and hyphens) are appended to the current word, and any other character ends it.
Assume that the sentence is:
NSString *sentence = @"Yeah,... I need a world .";
Method 1:
- (NSMutableArray *)componentsWithString:(NSString *)str
{
    NSMutableArray *wordArray = [NSMutableArray array];
    NSString *wordStr = @"";
    for (NSUInteger y = 0; y < str.length; y++) {
        // Use the substring methods to extract the single character at index y
        NSString *string1 = [str substringFromIndex:y];
        NSString *string2 = [string1 substringToIndex:1];
        // Convert it to a C string. NSUTF8StringEncoding is used because
        // NSASCIIStringEncoding returns NULL for any non-ASCII character.
        const char *s = [string2 cStringUsingEncoding:NSUTF8StringEncoding];
        // Keep letters (A-Z is 65-90, a-z is 97-122), the apostrophe '\''
        // and the hyphen '-', so "don't" and "well-known" stay whole.
        // s is NULL if the character cannot be converted, so check it first.
        if (s != NULL && ((s[0] >= 65 && s[0] <= 90) || (s[0] >= 97 && s[0] <= 122) || s[0] == '\'' || s[0] == '-')) {
            wordStr = [wordStr stringByAppendingString:string2];
        } else {
            // Any other character ends the current word
            if (wordStr.length > 0) {
                [wordArray addObject:wordStr];
            }
            wordStr = @"";
        }
    }
    // Add the final word when the sentence ends with a letter (no trailing punctuation)
    if (wordStr.length > 0) {
        [wordArray addObject:wordStr];
    }
    return wordArray;
}
Method 2:
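- (NSMutableArray *)componentsWithString1:(NSString *)str
{
    NSMutableArray *wordArray = [NSMutableArray array];
    NSString *wordStr = @"";
    // Get the UTF-8 bytes once instead of calling str.UTF8String on every comparison
    const char *utf8 = str.UTF8String;
    for (NSUInteger k = 0; utf8[k] != '\0'; k++) {
        char c = utf8[k];
        // Letters (a-z is 97-122, A-Z is 65-90), apostrophe and hyphen belong to a word
        if ((c >= 97 && c <= 122) || (c >= 65 && c <= 90) || c == '\'' || c == '-') {
            wordStr = [wordStr stringByAppendingFormat:@"%c", c];
        } else if (wordStr.length > 0) {
            // Any other byte ends the current word
            [wordArray addObject:wordStr];
            wordStr = @"";
        }
    }
    // Add the final word when the sentence ends with a letter (no trailing punctuation)
    if (wordStr.length > 0) {
        [wordArray addObject:wordStr];
    }
    return wordArray;
}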
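Method 2 is the simpler of the two: it works directly on the sentence's UTF-8 bytes instead of creating two temporary substrings and a C string for every character, so it is the one I recommend.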