Remove English characters from []

Source: Internet
Author: User

Recently I taught a course in the .NET job-training class at the Information Technology University. At noon, a teacher there asked me about regular expressions. Roughly, the question was this:
There is a string containing a mix of Chinese and English characters, with several pairs of square brackets in it. You need to remove all English letters inside the brackets and keep everything else.
At first glance I thought it would be easy, but when I wrote a quick solution I ran into some problems. Perhaps because my background is in mathematics, I always want to cover a problem as completely as possible, so over lunch I kept thinking about how to do it better. Here are my thoughts on the problem; perhaps they will give you some inspiration.
 
The idea of reduction
Mathematics has a very important idea called reduction. What is reduction? Simply put, it breaks a complicated problem down into a combination of simpler problems; once the simple problems are solved, the complicated problem is solved as well.
Some readers will say: "Mathematics gives me a headache. Why would I want to think about that..."
Don't be put off by the word "idea". It sounds intimidating, but this mathematical idea is very basic; we use it every day. Suppose I want to know whether a supermarket's goods are good in both quality and price. Obviously I cannot test every product one by one, but I can judge indirectly: if the supermarket is always crowded and consumers rarely complain, I can conclude that it is a good supermarket. Another example: how do you understand that inheritance in object-oriented C# is transitive? It is simple. The Child class inherits from the Parent class, so it contains the Parent's members; a GrandChild class inherits from Child, so it contains Child's members. Since Child already contains the Parent's members, GrandChild contains them too. It follows that every derived class has the members of all its base classes, and the transitive nature of inheritance becomes easy to understand.
 
Reducing the problem
So what does all this have to do with the problem at hand? Clearly, I used the idea of reduction to solve it. What are the key points of the problem?
Let's take an example:
There is a string
1 string str = "abc [Hello abc] abc ";
Remove the English character "abc" in "[you are good at abc]" in the string to change the string
1 string str = "abc [Hello] abc ";
Apparently, the regular expression can be
1 string str = Regex. Replace (str, @ "(\ [You) [a-zA-Z] + (good \])", "$1 $2 ");
Of course, the answer here is definitely not the only one, so it is just a method that I think. but the key is that this situation is quite special. first, the square brackets "[]" do not necessarily contain Chinese characters on both sides, and they do not necessarily represent hello. Second, there is not necessarily a string of English letters in the middle, for example, it may be "you are a good abc", so it will not work at this time.
The problems here are:
1. The number of Chinese blocks and English blocks inside the brackets is unknown;
2. Chinese and English characters inside the brackets are not arranged in any fixed order.
Once the problem is reduced, it is easy to solve. (Note that this article does not necessarily give the best method; it offers a way of thinking about the problem.)
Chinese characters on both sides
Since the task is to remove the English letters inside square brackets, whatever sits at the two ends inside the brackets, you can use the following C# statement:
str = Regex.Replace(str, @"(\[)[a-zA-Z]*(.+?)[a-zA-Z]*(\])", "$1$2$3");
It strips any English letters at the two ends inside each pair of brackets, so that the bracketed content begins and ends with Chinese characters.
That is to say, every string can be reduced to the following model:
... ... ...
 
The 中a中 pattern
Once both ends inside the brackets are Chinese, the remaining problem is that we don't know how many groups of English letters lie between them. Analyze it, though: if you set aside the Chinese block at the right end, the content becomes a repetition of the segment "中a", where 中 stands for a run of Chinese characters and a for a run of English letters: "中a", "中a", ..., followed by the final "中".
Then the problem is simple: as long as the "[中a中]" case is solved well, every other case can be handled with a loop.
Step 1: remove the first group of English letters from left to right (of course, you need to determine whether the English Group exists). For example, replace a string "[medium a]" with a regular expression to "[medium a]". C # statements can be used.
1 if (Regex. IsMatch (str, @ "\ [[^ \] +? [A-zA-Z] + [^ \] + \] ")
2 {
3 string str = Regex. Replace (str, @ "(\ [^ \] + ?) [A-zA-Z] + ([^ \] + \]) "," $1 $2 ");
4}

Step 2 is simple: repeat the replacement until only the "[中a中]" type is left. In fact, when only this pattern remains, the same statement still applies and removes the English in the middle.
 
Analyzing the matching rules
At this point the problem is actually solved. Next I will explain why I wrote the regular expressions this way, taking the last match as an example.
string str = "a[中a中]";
if (Regex.IsMatch(str, @"\[[^\]]+?[a-zA-Z]+[^\]]+\]"))
{
    str = Regex.Replace(str, @"(\[[^\]]+?)[a-zA-Z]+([^\]]+\])", "$1$2");
}

Analysis:
1. In this case, if is used to determine whether IsMatch
The Regular Expression in IsMatch writes "@" \ [[^ \] +? [A-zA-Z] + [^ \] + \] ", indicating matching the beginning of the brackets, the ending character "]" is not allowed in the middle, and the following "[a-zA-Z] +" is used to match the first group of English letters as much as possible. this matches "[medium a" in str ". then the remaining "[^ \] + \]" matches all the characters after the first group of citation letters and the ending brackets "]". therefore, true is returned.
2. Start replacement
With the if match, it is easy to Replace. In the Replace method, the first and last groups in the regular expression match "[medium" and "medium]" respectively. Therefore, "$1 $2" replaces the quote letter in the middle.
For non-last match, for example, replace "[medium a]", group 1 matches "medium", and group 2 matches "medium a medium]". I just replaced the first group of English letters on the left.
 
Putting it together
From the analysis above, the general approach to removing English inside brackets is:
-> Remove the English letters at the beginning and end inside the brackets.
-> In a loop, remove one group of English letters at a time, from left to right, until no English letters remain inside the brackets.
Here is a method based on the discussion above:
// Requires: using System.Text.RegularExpressions;
public static string MyReplace(string str)
{
    // Step 1: strip English letters at both ends inside each [...]
    str = Regex.Replace(str, @"(\[)[a-zA-Z]*(.+?)[a-zA-Z]*(\])", "$1$2$3");
    // Step 2: while a bracket still contains an inner English group, remove the leftmost one
    while (Regex.IsMatch(str, @"\[[^\]]+?[a-zA-Z]+[^\]]+\]"))
    {
        str = Regex.Replace(str, @"(\[[^\]]+?)[a-zA-Z]+([^\]]+\])", "$1$2");
    }
    return str;
}

However, this is not necessarily the definitive solution; I am sure there are better ways. Here I have only shown one way of thinking about the problem and the solution it led to.

 

From the sea of Dirac
