10 .NET methods for removing whitespace from strings

Source: Internet
Author: User
Tags string methods

There are countless ways to remove all whitespace from a string, but which one is fastest?

Introduction

If you ask what whitespace is, the answer is a bit messy. Many people think whitespace is just the SPACE character (Unicode U+0020, ASCII 32, HTML &#32;), but it actually includes every character that produces horizontal or vertical space in a layout. In fact, it is an entire class of characters defined in the Unicode Character Database.

Whitespace in this article refers to that correct definition, although the naive string.Replace(" ", "") approach, which handles only the space character, is also included for comparison.

The methods benchmarked here remove all whitespace: leading, trailing, and everything in between. That is the meaning of "all whitespace" in the title of the article.
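To make the difference between "space" and "whitespace" concrete, here is a minimal sketch. The class and method names are illustrative and not part of the original article. It counts how many char values char.IsWhiteSpace accepts and shows what a space-only replacement leaves behind.

```csharp
using System;

public static class WhitespaceDefinitionDemo
{
    // Counts every UTF-16 code unit that char.IsWhiteSpace treats as whitespace.
    // The exact number depends on the Unicode data shipped with the runtime.
    public static int CountWhitespaceChars()
    {
        int count = 0;
        for (int c = char.MinValue; c <= char.MaxValue; c++)
        {
            if (char.IsWhiteSpace((char)c))
                count++;
        }
        return count;
    }

    // Naive removal: only U+0020 is replaced, tabs and line breaks survive.
    public static string RemoveSpacesOnly(string s)
    {
        return s.Replace(" ", "");
    }
}
```

Calling WhitespaceDefinitionDemo.RemoveSpacesOnly("a b\tc\nd") removes the space but keeps the tab and the line feed, which is why the rest of the article uses the broader definition.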

Background

This article started out of curiosity. In practice, I have no need for the fastest possible algorithm for removing whitespace from strings.

Checking for whitespace characters

Checking for whitespace characters is easy. All the code you need is:

char wp = ' ';
char a = 'a';
Assert.True(char.IsWhiteSpace(wp));
Assert.False(char.IsWhiteSpace(a));

However, when I started hand-optimizing the removal methods, I realized this does not perform as well as expected. Digging through char.cs in Microsoft's reference source repository turned up:

public static bool IsWhiteSpace(char c) {
  if (IsLatin1(c)) {
    return (IsWhiteSpaceLatin1(c));
  }
  return CharUnicodeInfo.IsWhiteSpace(c);
}

Then CharUnicodeInfo.IsWhiteSpace:

internal static bool IsWhiteSpace(char c) {
  UnicodeCategory uc = GetUnicodeCategory(c);
  // In Unicode 3.0, U+2028 is the only character which is under the category "LineSeparator".
  // And U+2029 is the only character which is under the category "ParagraphSeparator".
  switch (uc) {
    case (UnicodeCategory.SpaceSeparator):
    case (UnicodeCategory.LineSeparator):
    case (UnicodeCategory.ParagraphSeparator):
      return (true);
  }
  return (false);
}

GetUnicodeCategory() calls InternalGetUnicodeCategory(), which is actually quite fast, but that is now four method calls in a row! The following code, contributed by a commenter, is a fast custom version that is marked so the JIT inlines it:

// whitespace detection method: very fast, a lot faster than Char.IsWhiteSpace
[MethodImpl(MethodImplOptions.AggressiveInlining)] // if it's not inlined then it will be slow!!!
public static bool isWhiteSpace(char ch) {
  // this is surprisingly faster than the equivalent if statement
  switch (ch) {
    case '\u0009': case '\u000A': case '\u000B': case '\u000C': case '\u000D':
    case '\u0020': case '\u0085': case '\u00A0': case '\u1680': case '\u2000':
    case '\u2001': case '\u2002': case '\u2003': case '\u2004': case '\u2005':
    case '\u2006': case '\u2007': case '\u2008': case '\u2009': case '\u200A':
    case '\u2028': case '\u2029': case '\u202F': case '\u205F': case '\u3000':
      return true;
    default:
      return false;
  }
}
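A hand-written table like this is only useful if it matches the built-in check. The sketch below compares the two over every char value and returns the code units where they disagree. The class name and the FindDisagreements helper are illustrative additions, not from the original article. An empty result means the two agree on the runtime in use, but runtimes that ship different Unicode data may report differences.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class WhitespaceCheck
{
    // Same table as the custom check discussed in the article.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool isWhiteSpace(char ch)
    {
        switch (ch)
        {
            case '\u0009': case '\u000A': case '\u000B': case '\u000C': case '\u000D':
            case '\u0020': case '\u0085': case '\u00A0': case '\u1680': case '\u2000':
            case '\u2001': case '\u2002': case '\u2003': case '\u2004': case '\u2005':
            case '\u2006': case '\u2007': case '\u2008': case '\u2009': case '\u200A':
            case '\u2028': case '\u2029': case '\u202F': case '\u205F': case '\u3000':
                return true;
            default:
                return false;
        }
    }

    // Hypothetical helper: lists every code unit on which the custom check
    // and char.IsWhiteSpace give different answers.
    public static List<char> FindDisagreements()
    {
        var result = new List<char>();
        for (int c = char.MinValue; c <= char.MaxValue; c++)
        {
            char ch = (char)c;
            if (isWhiteSpace(ch) != char.IsWhiteSpace(ch))
                result.Add(ch);
        }
        return result;
    }
}
```

Running FindDisagreements() once on the target runtime, before trusting any benchmark numbers, is a cheap way to confirm that the faster check has not changed the behavior.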

Different methods for removing whitespace

I implemented a variety of methods that remove all whitespace from a string.
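Since the methods are meant to be interchangeable, it helps to confirm that each one produces the same output before comparing speed. The following is a minimal sketch of such a check, not part of the original article. TrimAllChecker, Reference, and FirstMismatch are hypothetical names, and the reference implementation deliberately favors simplicity over speed.

```csharp
using System;
using System.Linq;

public static class TrimAllChecker
{
    // Reference implementation: simple and easy to verify by eye, not fast.
    public static string Reference(string s)
    {
        return new string(s.Where(c => !char.IsWhiteSpace(c)).ToArray());
    }

    // Edge cases: empty input, whitespace only, leading and trailing runs,
    // control characters, and non-ASCII whitespace.
    private static readonly string[] Samples =
    {
        "", " ", "abc", "  a b  ", "a\t\r\nb", "ab  ",
        "a\u00A0b\u3000c", " \u2028x\u2029 "
    };

    // Returns the first sample on which the candidate differs from the
    // reference, or null when every sample matches.
    public static string FirstMismatch(Func<string, string> candidate)
    {
        foreach (var sample in Samples)
        {
            if (candidate(sample) != Reference(sample))
                return sample;
        }
        return null;
    }
}
```

For example, TrimAllChecker.FirstMismatch(s => s.Replace(" ", "")) returns a non-null sample, because replacing only U+0020 leaves tabs and line breaks in place.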

Split and join

This is a very simple method I have been using for a long time. It splits the string on whitespace characters, discarding empty entries, and then joins the resulting fragments back together. It sounds silly, and at first glance it looks like a very wasteful solution:

public static string TrimAllWithSplitAndJoin(string str) {
  return string.Concat(str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
}

LINQ

This is an elegant, declarative way to implement the process:

public static string TrimAllWithLinq(string str) {
  return new string(str.Where(c => !isWhiteSpace(c)).ToArray());
}

Regular expression

Regular expressions are very powerful, and every programmer should be aware of them.

static Regex whitespace = new Regex(@"\s+", RegexOptions.Compiled);

public static string TrimAllWithRegex(string str) {
  return whitespace.Replace(str, "");
}
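One detail worth knowing about this approach is that the meaning of \s depends on the regex options. By default .NET treats \s as Unicode whitespace, while RegexOptions.ECMAScript narrows it to [ \f\n\r\t\v]. The sketch below shows the difference. The class and member names are illustrative and not part of the original benchmark.

```csharp
using System.Text.RegularExpressions;

public static class RegexWhitespaceDemo
{
    // Default behavior: \s also matches Unicode separators such as U+00A0.
    private static readonly Regex UnicodeWhitespace =
        new Regex(@"\s+", RegexOptions.Compiled);

    // ECMAScript-compliant behavior: \s is limited to [ \f\n\r\t\v].
    private static readonly Regex EcmaWhitespace =
        new Regex(@"\s+", RegexOptions.Compiled | RegexOptions.ECMAScript);

    public static string TrimAllUnicode(string s)
    {
        return UnicodeWhitespace.Replace(s, "");
    }

    public static string TrimAllEcma(string s)
    {
        return EcmaWhitespace.Replace(s, "");
    }
}
```

With the input "a\u00A0b", TrimAllUnicode removes the no-break space while TrimAllEcma leaves it in place, so the default options are the ones that match the definition of whitespace used in this article.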

In-place character array

This method converts the input string into a character array and then scans the array in place, removing whitespace characters as it goes (no intermediate buffer or string is created). Finally, a new string is built from the compacted part of the array.

public static string TrimAllWithInplaceCharArray(string str) {
  var len = str.Length;
  var src = str.ToCharArray();
  int dstIdx = 0;
  for (int i = 0; i < len; i++) {
    var ch = src[i];
    if (!isWhiteSpace(ch))
      src[dstIdx++] = ch;
  }
  return new string(src, 0, dstIdx);
}

Character array copy

This method is similar to the in-place character array method, but it uses Array.Copy to copy each contiguous run of non-whitespace characters while skipping the whitespace. Finally, it builds the new string from the compacted part of the array, as before.

public static string TrimAllWithCharArrayCopy(string str) {
  var len = str.Length;
  var src = str.ToCharArray();
  int srcIdx = 0, dstIdx = 0;
  // scan the whole input; shrinking the loop bound while scanning
  // would leave trailing whitespace unexamined
  for (int i = 0; i < len; i++) {
    if (isWhiteSpace(src[i])) {
      // copy the run of non-whitespace characters that ends just before i
      int count = i - srcIdx;
      Array.Copy(src, srcIdx, src, dstIdx, count);
      srcIdx = i + 1;
      dstIdx += count;
    }
  }
  // copy the final run that follows the last whitespace character
  int tail = len - srcIdx;
  if (tail > 0)
    Array.Copy(src, srcIdx, src, dstIdx, tail);
  return new string(src, 0, dstIdx + tail);
}

Loop with switch

This is a hand-coded loop that builds the new string with a StringBuilder, relying on StringBuilder's internal optimizations. To keep other factors from interfering with this implementation, it calls no other methods and avoids class member access by caching values in local variables. Finally, the buffer is trimmed to the right size by setting StringBuilder.Length.

// Code suggested by http://www.codeproject.com/Members/TheBasketcaseSoftware
public static string TrimAllWithLexerLoop(string s) {
  int length = s.Length;
  var buffer = new StringBuilder(s);
  var dstIdx = 0;
  for (int index = 0; index < s.Length; index++) {
    char ch = s[index];
    switch (ch) {
      case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
      case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
      case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
      case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
      case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
        length--;
        continue;
      default:
        break;
    }
    buffer[dstIdx++] = ch;
  }
  buffer.Length = length;
  return buffer.ToString();
}

Loop with isWhiteSpace

This method is almost the same as the previous loop, but it uses an if statement that calls isWhiteSpace() instead of the messy switch trick :).

public static string TrimAllWithLexerLoopCharIsWhitespce(string s) {
  int length = s.Length;
  var buffer = new StringBuilder(s);
  var dstIdx = 0;
  for (int index = 0; index < s.Length; index++) {
    char currentchar = s[index];
    if (isWhiteSpace(currentchar))
      length--;
    else
      buffer[dstIdx++] = currentchar;
  }
  buffer.Length = length;
  return buffer.ToString();
}

In-place string modification (unsafe)

This method uses unsafe character pointers and pointer arithmetic to modify the string in place. I do not recommend it for production use, because it breaks a fundamental contract of the .NET Framework: strings are immutable.

public static unsafe string TrimAllWithStringInplace(string str) {
  fixed (char* pfixed = str) {
    char* dst = pfixed;
    for (char* p = pfixed; *p != 0; p++)
      if (!isWhiteSpace(*p))
        *dst++ = *p;

    /*// reset the string size
      * ONLY IT DIDN'T WORK! A GARBAGE COLLECTION ACCESS VIOLATION OCCURRED AFTER USING IT
      * SO I HAD TO RESORT TO RETURN A NEW STRING INSTEAD, WITH ONLY THE PERTINENT BYTES
      * IT WOULD BE A LOT FASTER IF IT DID WORK THOUGH...
    Int32 len = (Int32)(dst - pfixed);
    Int32* pi = (Int32*)pfixed;
    pi[-1] = len;
    pfixed[len] = '\0';*/
    return new string(pfixed, 0, (int)(dst - pfixed));
  }
}
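For comparison, newer runtimes offer a way to write directly into a new string's buffer without unsafe code or mutating the input. The sketch below is not one of the benchmarked methods. It assumes .NET Core 2.1 or later, where string.Create is available, and it uses char.IsWhiteSpace only to stay self-contained. The class and method names are illustrative.

```csharp
using System;

public static class SafeTrimAll
{
    public static string TrimAllWithStringCreate(string str)
    {
        // First pass: count the characters that survive.
        int kept = 0;
        foreach (char ch in str)
        {
            if (!char.IsWhiteSpace(ch))
                kept++;
        }

        // Nothing to remove, so the original instance can be returned as is.
        if (kept == str.Length)
            return str;

        // Second pass: fill the new string's buffer exactly once.
        return string.Create(kept, str, (span, source) =>
        {
            int dst = 0;
            foreach (char ch in source)
            {
                if (!char.IsWhiteSpace(ch))
                    span[dst++] = ch;
            }
        });
    }
}
```

The trade-off is a second scan of the input in exchange for a single allocation of exactly the right size. Whether that is faster than the methods in this article would have to be measured.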

In-place string modification V2 (unsafe)

This method is almost identical to the previous one, but it accesses the pointer with array-style indexing. I was curious which of the two kinds of memory access would be faster.

public static unsafe string TrimAllWithStringInplaceV2(string str) {
  var len = str.Length;
  fixed (char* pStr = str) {
    int dstIdx = 0;
    for (int i = 0; i < len; i++)
      if (!isWhiteSpace(pStr[i]))
        pStr[dstIdx++] = pStr[i];
    // since the unsafe string length reset didn't work we need to resort to this slower compromise
    return new string(pStr, 0, dstIdx);
  }
}

String.Replace(" ", "")

This implementation is naive. Because it replaces only the space character, it does not use the correct definition of whitespace and misses many other whitespace characters. It should be the fastest method in this article, but it also does less than the others.

However, if you only need to remove actual space characters, it is hard to write pure .NET code that beats string.Replace. Most string methods fall back to hand-optimized native C++ code. String.Replace itself calls a C++ method in comstring.cpp:

FCIMPL3(Object*,   COMString::ReplaceString,   StringObject* thisRefUNSAFE,   StringObject* oldValueUNSAFE,   StringObject* newValueUNSAFE)

This is the method used in the benchmark suite:

public static string TrimAllWithStringReplace(string str) {
  // This method is NOT functionally equivalent to the others as it will only trim "spaces"
  // Whitespace comprises lots of other characters
  return str.Replace(" ", "");
}

Those are the 10 methods for removing whitespace from strings in .NET. I hope you find them useful.
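To answer the "which one is fastest" question on your own machine, each method has to be timed under the same conditions. The following is a minimal Stopwatch-based sketch, not the author's benchmark suite. TrimAllTiming and Measure are hypothetical names, and a dedicated benchmarking library such as BenchmarkDotNet would give more reliable numbers.

```csharp
using System;
using System.Diagnostics;

public static class TrimAllTiming
{
    // Runs `method` on `input` `iterations` times and returns the elapsed
    // wall-clock time in milliseconds.
    public static double Measure(Func<string, string> method, string input, int iterations)
    {
        // Warm-up call so that JIT compilation is not part of the measurement.
        method(input);

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            method(input);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }
}
```

A call such as TrimAllTiming.Measure(s => s.Replace(" ", ""), sampleText, 100000), where sampleText is any representative input, returns the total time for that method. Results vary with input size, whitespace density, and runtime version, so compare methods only within the same run.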
