In Java we often run into the following situation: we want to pull information out of an img tag, but we don't need the alt attribute, only the src value. You might say we can simply extract it twice, and indeed we can, but sometimes a second extraction is not allowed. So here we use negation.

Let's take an example. Given str = "aaatggcccssaaakkcccaaaxvcccaaaxxccc", we want to extract the content between aaa and ccc, but not content that starts with xx. What should we do?

As we all know, negation in a regular expression is written as ^ inside a character class. Obviously regex = "aaa[^xx]ccc" won't work: [^xx] is a character class, so it matches exactly one character that is not x, not "anything other than the string xx".

You might then think of regex = "aaa[^x][^x]ccc", but that doesn't work either. Each [^x] is checked on its own, so whenever the first character is x, the first [^x] fails and the whole match is rejected, even for xv, which we want to keep. (It also only matches content exactly two characters long.)

PHP supports the negative lookahead (?!...) for "not followed by this string". Java's java.util.regex supports the same (?!...) syntax, but the problem can also be solved with plain character classes if we look at it from another angle. If xx is not allowed, then conversely x[^x], [^x][^x] and [^x]x are all allowed. That gives us our approach. Example:
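```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Content between aaa and ccc whose first two characters are anything but "xx"
String regex = "aaa(x[^x]\\w*?|[^x][^x]\\w*?|[^x]x\\w*?)ccc";
String str = "aaatggcccssaaakkcccaaaxvcccaaaxxccc";
Matcher m = Pattern.compile(regex).matcher(str);
while (m.find()) {
    System.out.println(m.group(1)); // group 1: the text between aaa and ccc
}
```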
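For comparison, here is the same extraction written with the negative lookahead (?!...), which java.util.regex does support. This is a minimal sketch, assuming (as the alternation does) that the content after the first characters consists of word characters (\w); the names LookaheadDemo and extract are just illustrative. The lookahead (?!xx) rejects any match whose content starts with xx, so there is no need to enumerate the allowed two-character prefixes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // (?!xx) is a zero-width negative lookahead: the match fails if the next
    // two characters are "xx", but the lookahead itself consumes nothing.
    private static final Pattern NOT_XX = Pattern.compile("aaa(?!xx)(\\w*?)ccc");

    // Returns every piece of text between aaa and ccc that does not start with xx.
    static List<String> extract(String str) {
        List<String> out = new ArrayList<>();
        Matcher m = NOT_XX.matcher(str);
        while (m.find()) {
            out.add(m.group(1)); // group 1: the text between aaa and ccc
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "aaatggcccssaaakkcccaaaxvcccaaaxxccc";
        for (String s : extract(str)) {
            System.out.println(s);
        }
    }
}
```

On the example string it prints the same three values, tgg, kk and xv. Unlike the alternation, it also accepts content shorter than two characters, so "aaaxccc" yields x.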
The alternation regex above prints tgg, kk and xv, one per line, which is correct. I tested it on several inputs and found no errors, though note that it only matches content at least two characters long. If you have a better method, please share it with us. Finally, extracting data this way is quite inefficient; if you don't mind a little extra work, extracting the data in two passes is more appropriate.
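A minimal sketch of the two-pass approach, assuming the same aaa/ccc delimiters and word-character content: the first pass collects everything between aaa and ccc with a simple lazy pattern, and the second pass drops the pieces that start with xx using String.startsWith. The names TwoPassDemo and extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPassDemo {
    // First pass: any (possibly empty) run of word characters between aaa and ccc.
    private static final Pattern BETWEEN = Pattern.compile("aaa(\\w*?)ccc");

    static List<String> extract(String str) {
        List<String> out = new ArrayList<>();
        Matcher m = BETWEEN.matcher(str);
        while (m.find()) {
            String content = m.group(1);
            // Second pass: keep only the pieces that do not start with "xx".
            if (!content.startsWith("xx")) {
                out.add(content);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "aaatggcccssaaakkcccaaaxvcccaaaxxccc";
        for (String s : extract(str)) {
            System.out.println(s);
        }
    }
}
```

On the example string this also yields tgg, kk and xv. The two approaches can differ on unusual inputs: for "aaaxxaaabccc" the first pass consumes xxaaab as one span and then discards it, whereas a single regex with (?!xx) skips the rejected aaa and still finds b.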