Link: http://www.cnblogs.com/archimedes/p/c-library-ctype.html.
1. Background Knowledge
ctype.h is a header file in the C standard library. It defines a set of character classification functions that test whether a character belongs to a particular class, such as letters or control characters.
Programs often need to sort characters into categories. To identify a letter, you can write:
if('A' <= c && c <= 'Z' || 'a' <= c && c <= 'z') ......
This gives the correct result when the execution character set is ASCII, but it is not portable to other character sets, where the letters need not be contiguous (EBCDIC, for example).
Similarly, to test for a digit, you can write:
if('0' <= c && c <= '9') ......
To identify whitespace, you can write:
if(c == ' ' || c == '\t' || c == '\n') ......
But we would soon tire of code cluttered with tests like these. The most obvious remedy is to replace them with functions, so that the code looks like this:
if(isalpha(c)) ...
if(isdigit(c)) ...
if(isspace(c)) ...
This seems to solve the problem, but a typical text-processing program calls such a function about three times for each character in the input stream, so the function-call overhead can seriously hurt performance.
A further improvement is to replace these functions with macros:
#define isdigit(x) ((x) >= '0' && (x) <= '9')
This causes problems when the macro argument x has side effects. For example, if you call isdigit(x++) or isdigit(run_some_program()), it may not be obvious that the argument of isdigit is evaluated twice. Early versions of Linux used this error-prone approach. This article will not go over the other drawbacks of macros.
To keep the code both safe and compact, a further improvement is a set of macros that index one or more conversion tables. Each macro has the following form:
#define _XXXMASK 0x...
#define isXXX(c) (_Ctytable[c] & _XXXMASK)
The character c is used as an index into the conversion table named _Ctytable. Each table entry describes its index character with a set of bits, one per class. If any of the bits selected by the mask _XXXMASK is set, the character belongs to the tested class, so the macro expands to a compact expression that is nonzero for exactly the right arguments.
The drawback of this method: when a macro argument is outside the macro's domain, the macro accesses storage outside the conversion table.
2. Contents of <ctype.h>
The following table lists the functions declared in <ctype.h> (implementations often provide them as macros as well):
| Function | Description |
| --- | --- |
| isalnum | Tests for a letter or digit |
| isalpha | Tests for a letter |
| islower | Tests for a lowercase letter |
| isupper | Tests for an uppercase letter |
| isdigit | Tests for a decimal digit |
| isxdigit | Tests for a hexadecimal digit |
| iscntrl | Tests for a control character |
| isgraph | Tests for a graphic character (printable, excluding space) |
| isspace | Tests for a whitespace character (space, tab, carriage return, line feed, etc.) |
| isblank | Tests for a blank character (space or horizontal tab; added in C99/C++11) |
| isprint | Tests for a printable character |
| ispunct | Tests for a punctuation character |
| tolower | Converts to lowercase |
| toupper | Converts to uppercase |
Source: Standard C by Plauger and Brodie.
3. Implementation of <ctype.h>
In P. J. Plauger's implementation of the C standard library, <ctype.h> uses a conversion table to determine whether a character belongs to a given class.
Take the test for a lowercase letter as an example:
/* ctype.h */
#ifndef _CTYPE
#define _CTYPE

/* _Ctype conversion bits */
#define _XA 0x200 /* extra alphabetic */
#define _XS 0x100 /* extra space */
#define _BB 0x80  /* BEL, BS, etc. */
#define _CN 0x40  /* CR, FF, HT, NL, VT */
#define _DI 0x20  /* '0'-'9' */
#define _LO 0x10  /* 'a'-'z' */
#define _PU 0x08  /* punctuation */
#define _SP 0x04  /* space */
#define _UP 0x02  /* 'A'-'Z' */
#define _XD 0x01  /* '0'-'9', 'A'-'F', 'a'-'f' */

/* declare the external _Ctype conversion table */
extern const short *_Ctype;

/* the macro islower tests whether c is a lowercase letter */
#define islower(c) (_Ctype[(int)(c)] & _LO)

// The rest are omitted...
#endif
The _Ctype conversion table:
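/* xctype.c: _Ctype conversion table -- ASCII */
#include <limits.h>
#include <stdio.h>
#include "ctype.h"

#if EOF != -1 || UCHAR_MAX != 255
#error wrong ctype table
#endif

/* combination bits */
#define XDI (_DI | _XD)
#define XLO (_LO | _XD)
#define XUP (_UP | _XD)

/* conversion table */
static const short ctype_tab[257] = {
    0, /* EOF */
    _BB, _BB, _BB, _BB, _BB, _BB, _BB, _BB,
    _BB, _CN, _CN, _CN, _CN, _CN, _BB, _BB,
    _BB, _BB, _BB, _BB, _BB, _BB, _BB, _BB,
    _BB, _BB, _BB, _BB, _BB, _BB, _BB, _BB,
    _SP, _PU, _PU, _PU, _PU, _PU, _PU, _PU,
    _PU, _PU, _PU, _PU, _PU, _PU, _PU, _PU,
    XDI, XDI, XDI, XDI, XDI, XDI, XDI, XDI,
    XDI, XDI, _PU, _PU, _PU, _PU, _PU, _PU,
    _PU, XUP, XUP, XUP, XUP, XUP, XUP, _UP,
    _UP, _UP, _UP, _UP, _UP, _UP, _UP, _UP,
    _UP, _UP, _UP, _UP, _UP, _UP, _UP, _UP,
    _UP, _UP, _UP, _PU, _PU, _PU, _PU, _PU,
    _PU, XLO, XLO, XLO, XLO, XLO, XLO, _LO,
    _LO, _LO, _LO, _LO, _LO, _LO, _LO, _LO,
    _LO, _LO, _LO, _LO, _LO, _LO, _LO, _LO,
    _LO, _LO, _LO, _PU, _PU, _PU, _PU, _BB,
}; /* the remaining entries (128-255) are zero and match nothing */

const short *_Ctype = &ctype_tab[1];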
Here is an example:
To test whether 'a' is a lowercase letter, the preprocessor replaces islower(c) with the macro body, that is, (_Ctype[(int)(c)] & _LO)
With c equal to 'a', this becomes: (_Ctype[(int)('a')] & _LO)
In ASCII the value of 'a' is 97, so this is: (_Ctype[97] & _LO)
In the _Ctype conversion table, the entry _Ctype[97] is XLO, that is, _LO | _XD = 0x11, because 'a' is also a hexadecimal digit. The value of _LO is 0x10, so the final result is:
(XLO & _LO) ----> 0x11 & 0x10 ----> 0x10, which is nonzero, indicating that the character is a lowercase letter
Other character tests work through a similar substitution.
For comparison, here is the ctype.h implementation from the Linux kernel; the basic principle is the same.
ctype.h:
#ifndef _LINUX_CTYPE_H
#define _LINUX_CTYPE_H

/*
 * NOTE! This ctype does not handle EOF like the standard C
 * library is required to.
 */

#define _U  0x01 /* upper */
#define _L  0x02 /* lower */
#define _D  0x04 /* digit */
#define _C  0x08 /* cntrl */
#define _P  0x10 /* punct */
#define _S  0x20 /* white space (space/lf/tab) */
#define _X  0x40 /* hex digit */
#define _SP 0x80 /* hard space (0x20) */

extern const unsigned char _ctype[];

#define __ismask(x) (_ctype[(int)(unsigned char)(x)])

#define isalnum(c)  ((__ismask(c) & (_U|_L|_D)) != 0)
#define isalpha(c)  ((__ismask(c) & (_U|_L)) != 0)
#define iscntrl(c)  ((__ismask(c) & (_C)) != 0)
#define isdigit(c)  ((__ismask(c) & (_D)) != 0)
#define isgraph(c)  ((__ismask(c) & (_P|_U|_L|_D)) != 0)
#define islower(c)  ((__ismask(c) & (_L)) != 0)
#define isprint(c)  ((__ismask(c) & (_P|_U|_L|_D|_SP)) != 0)
#define ispunct(c)  ((__ismask(c) & (_P)) != 0)
/* Note: isspace() must return false for %NUL-terminator */
#define isspace(c)  ((__ismask(c) & (_S)) != 0)
#define isupper(c)  ((__ismask(c) & (_U)) != 0)
#define isxdigit(c) ((__ismask(c) & (_D|_X)) != 0)

#define isascii(c) (((unsigned char)(c)) <= 0x7f)
#define toascii(c) (((unsigned char)(c)) & 0x7f)

static inline unsigned char __tolower(unsigned char c)
{
    if (isupper(c))
        c -= 'A'-'a';
    return c;
}

static inline unsigned char __toupper(unsigned char c)
{
    if (islower(c))
        c -= 'a'-'A';
    return c;
}

#define tolower(c) __tolower(c)
#define toupper(c) __toupper(c)

/*
 * Fast implementation of tolower() for internal usage. Do not use in your
 * code.
 */
static inline char _tolower(const char c)
{
    return c | 0x20;
}

#endif

ctype.c:
/*
 *  linux/lib/ctype.c
 *
 *  Copyright (C) 1991, 1992  Linus Torvalds
 */

#include <linux/ctype.h>
#include <linux/compiler.h>
#include <linux/export.h>

const unsigned char _ctype[] = {
_C,_C,_C,_C,_C,_C,_C,_C,                              /* 0-7 */
_C,_C|_S,_C|_S,_C|_S,_C|_S,_C|_S,_C,_C,               /* 8-15 */
_C,_C,_C,_C,_C,_C,_C,_C,                              /* 16-23 */
_C,_C,_C,_C,_C,_C,_C,_C,                              /* 24-31 */
_S|_SP,_P,_P,_P,_P,_P,_P,_P,                          /* 32-39 */
_P,_P,_P,_P,_P,_P,_P,_P,                              /* 40-47 */
_D,_D,_D,_D,_D,_D,_D,_D,                              /* 48-55 */
_D,_D,_P,_P,_P,_P,_P,_P,                              /* 56-63 */
_P,_U|_X,_U|_X,_U|_X,_U|_X,_U|_X,_U|_X,_U,            /* 64-71 */
_U,_U,_U,_U,_U,_U,_U,_U,                              /* 72-79 */
_U,_U,_U,_U,_U,_U,_U,_U,                              /* 80-87 */
_U,_U,_U,_P,_P,_P,_P,_P,                              /* 88-95 */
_P,_L|_X,_L|_X,_L|_X,_L|_X,_L|_X,_L|_X,_L,            /* 96-103 */
_L,_L,_L,_L,_L,_L,_L,_L,                              /* 104-111 */
_L,_L,_L,_L,_L,_L,_L,_L,                              /* 112-119 */
_L,_L,_L,_P,_P,_P,_P,_C,                              /* 120-127 */
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,                      /* 128-143 */
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,                      /* 144-159 */
_S|_SP,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,  /* 160-175 */
_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,      /* 176-191 */
_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,      /* 192-207 */
_U,_U,_U,_U,_U,_U,_U,_P,_U,_U,_U,_U,_U,_U,_U,_L,      /* 208-223 */
_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,      /* 224-239 */
_L,_L,_L,_L,_L,_L,_L,_P,_L,_L,_L,_L,_L,_L,_L,_L};     /* 240-255 */

EXPORT_SYMBOL(_ctype);

References
C Standard Library
Standard C by Plauger and Brodie