I ran into a problem while building a personal app, caykanji:
How can the kanji data be updated incrementally from the server to the app, so that each time the user performs an update, only the content newer than the local cache is fetched?
Data format
To integrate seamlessly with MongoDB and avoid the hassle of writing back-end code, simply save the Chinese character data into a JSON fil
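The snippet is cut off before it shows how the update works, so the following is only a minimal sketch of one way to fetch "only what is newer than the local cache". It assumes each record in the JSON data carries a hypothetical `updated_at` stamp; the field name, the sample records, and the helpers `fetch_newer` and `apply_update` are all illustrative and do not come from the original article.

```python
import json

# Hypothetical record layout: each kanji entry carries an "updated_at" stamp.
SERVER_DATA = json.loads("""
[
  {"kanji": "日", "updated_at": 1},
  {"kanji": "月", "updated_at": 2},
  {"kanji": "火", "updated_at": 3}
]
""")


def fetch_newer(records, since):
    """Server side: return only the records stamped later than `since`."""
    return [r for r in records if r["updated_at"] > since]


def apply_update(cache, delta):
    """Client side: merge the delta and return the newest stamp now cached."""
    for record in delta:
        cache[record["kanji"]] = record
    return max((r["updated_at"] for r in cache.values()), default=0)


# The app already holds the first record, so it asks for stamps above 1.
cache = {"日": {"kanji": "日", "updated_at": 1}}
delta = fetch_newer(SERVER_DATA, since=1)
latest = apply_update(cache, delta)
```

The client keeps `latest` and sends it as `since` on the next update, so unchanged records are never transferred again.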
Author: Bluedoor
Original address: http://www.anbbs.com/anbbs/index.php?f_id=3page=1
For the past two days I have been writing a keyword-highlighting program. It ran well in local testing, but on the actual page it produced a pile of garbled text; never mind highlighting, it was unreadable!
I went looking for the bug and found that English text caused no problem, while Chinese characters were prone to problems, though only some of the time.
To summarize:
When using pattern matching, such as: preg_match_all(
The first method: regular expressions

    string text = "is not a Chinese character";
    for (int i = 0; i < text.Length; i++)
    {
        if (Regex.IsMatch(text[i].ToString(), @"[\u4e00-\u9fa5]+$"))
            Console.WriteLine("is Kanji");
        else
            Console.WriteLine("Not kanji");
    }
    Console.ReadKey();

The second method: Unicode encoding range of Chinese characters

    string text = "is not kanji";
    char[] c = text.ToCharArray
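Both checks can be sketched in Python (a swapped-in language, since the snippet itself is C# and is cut off). The range U+4E00 to U+9FA5 is the one used in the snippet; it covers the main CJK Unified Ideographs block but not the extension blocks. The function names are hypothetical.

```python
import re

# Same range as the snippet: U+4E00..U+9FA5.
HAN = re.compile(r"[\u4e00-\u9fa5]")


def is_han_regex(ch):
    """Method 1: test a single character with a regular expression."""
    return HAN.fullmatch(ch) is not None


def is_han_range(ch):
    """Method 2: compare the character's code point against the range."""
    return 0x4E00 <= ord(ch) <= 0x9FA5


def all_han(text):
    """True only for a non-empty string made entirely of Han characters."""
    return bool(text) and all(is_han_range(ch) for ch in text)
```

The two single-character checks agree by construction; the range comparison avoids compiling a pattern when only one character class is involved.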
Chinese character coding knowledge points
ASCII is an encoding for Western text. It uses 7 bits, so it has 2^7 = 128 code points, 128 characters in total: 34 of them are control or non-printing characters (such as line feed LF and carriage return CR), and the remaining 94 are English letters, punctuation marks and arithmetic symbols. In the computer, an ASCII code occupies 8 bits, and the highest bit (bit 7) is used as a parity bit. Odd parity rule: in a correct code, the number of 1s in the byte must be odd; if it is not odd, the highest bit b7 to f
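The odd parity rule can be illustrated with a short sketch. It assumes the usual convention that bit 7 is set exactly when the 7-bit code contains an even number of 1s; the helper names are hypothetical.

```python
def add_odd_parity(code):
    """Return the 8-bit byte for a 7-bit ASCII code under odd parity."""
    if not 0 <= code < 128:
        raise ValueError("ASCII codes are 7 bits wide")
    ones = bin(code).count("1")
    # Set bit 7 only when the count of 1s would otherwise be even.
    return code | 0x80 if ones % 2 == 0 else code


def check_odd_parity(byte):
    """A received byte is accepted only if its number of 1 bits is odd."""
    return bin(byte).count("1") % 2 == 1
```

For example, 'A' is 0x41 (two 1 bits), so it is sent as 0xC1, while 'C' is 0x43 (three 1 bits) and is sent unchanged.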
[Image: c.jpg]
View the amount of data after inserting data
[Image: 插入.jpg]
View the bytes consumed by the data
Table before insertion
Some may think handling Chinese characters in PHP is very simple, but in practice it differs a little between GBK encoding and UTF-8 encoding, as introduced below.
Chinese character regular expressions under GBK encoding
1. Determine whether a string consists entirely of Chinese characters
The code is as follows:
$str = 'All are kanji tests';
if (preg_match_all("/^([\x81-\xfe][\x40-\xfe])+$/", $str, $match)) {
echo 'All is
1. GBK code position distribution map
2. GBK code position description
GBK also uses a double-byte representation. The overall encoding range is 8140-FEFE: the first byte lies in 81-FE and the trail byte in 40-FE, excluding the xx7F line. This gives 23,940 code positions in total, holding 21,886 Chinese characters and graphic symbols: 21,003 Chinese characters (including radicals and components) and 883 graphic symbols.
All code positions are divided into three parts:
1. Chinese character area. Includes:
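The byte ranges can be checked with a short sketch. The function name is hypothetical, and the test only looks at byte ranges; it does not tell whether a given code position is actually assigned a character.

```python
def is_gbk_double_byte(lead, trail):
    """True if (lead, trail) falls inside the GBK double-byte range."""
    return (
        0x81 <= lead <= 0xFE
        and 0x40 <= trail <= 0xFE
        and trail != 0x7F  # the xx7F line is excluded
    )


# Count every byte pair that passes the range test.
code_positions = sum(
    is_gbk_double_byte(lead, trail)
    for lead in range(256)
    for trail in range(256)
)
```

There are 126 possible lead bytes and 190 possible trail bytes, so `code_positions` works out to 126 × 190 = 23,940.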
We can ask ourselves: is the object of our complaint mistaken?
Before proceeding, define several concepts.
Clear Concept 0:
"I am Kanji" is a string in the C language, a narrow string of char type. The above example can be written as
const char *str = "I am a kanji"; QString a = str;
or
char str[] = "I am a Chinese character"; QString a = str;
and so on.
Clear Concept 1:
The source f
967 295 byte maximum text data
The following information is required to create a MySQL data table:
Table name
Table field names
Definition of each table field
National standard GB2312: one kanji = 2 bytes
UTF-8: one kanji = 3 bytes (generally)
First determine the MySQL version.
In version 4.0, varchar(50) refers to 50 bytes; if stored U
orderby='scores' ranks by score
orderby='id' sorts by article ID
orderby='rand' randomly obtains a list of documents for the specified condition
idlist='' extracts specific documents (by document ID); example of calling documents with specified IDs: idlist='4,45,78,237'
limit='start ID, number of records' (starting from 0) limits the range of records (for example: limit='1,2' means starting from the record with ID 1 and taking 2 records)
keyword='' a list of documents with the specified
I recently came across a piece of code:
"; ?>
I checked a lot of material online, but I still don't really understand the PHP regular expression pattern modifier u. Please help...
Reply to discussion (solution)
u: short for Unicode. It indicates that the string to be matched conforms to Unicode encoding rules, such as a UTF-8 encoded string.
Under the u modifier, a kanji is treated as one character. \w has the or
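The difference between matching bytes and matching characters can be illustrated in Python, used here as an analogy rather than as PHP's actual behavior. A `bytes` pattern sees each UTF-8 byte separately, roughly like a PHP pattern without the u modifier, while a `str` pattern sees whole characters. The variable names are hypothetical.

```python
import re

text = "汉字"

# Byte-oriented matching: "." consumes one byte at a time.
byte_units = re.findall(rb".", text.encode("utf-8"))

# Character-oriented matching: "." consumes one whole character.
char_units = re.findall(r".", text)
```

Each of the two kanji occupies 3 bytes in UTF-8, so `byte_units` holds 6 single-byte pieces while `char_units` holds the 2 characters.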
Common PHPCMS tag calls
1. Call a single piece of system data
(Calls the information with ID 1, with a title length of at most 25 kanji, and displays the update date):
"SELECT * FROM phpcms_content WHERE contentid=1"/}
Title: {str_cut($r[title],)} URL: {$r[url]}
Update date: {date('y-m-d', $r[updatetime])}
2. Call multiple pieces of system data
(Calls 10 approved items from column ID 1, with a title length of not mo
encoding only supported 1-3 bytes, covering only the BMP part of the Unicode code space. What range is the BMP? Basically the area 0000 ~ FFFF.
Starting with MySQL 5.5, the 4-byte UTF encoding utf8mb4 is supported; a character can take up to 4 bytes, so more characters can be represented.
utf8mb4 is a superset of utf8: it is compatible with utf8 and can represent more characters than utf8.
As for when to use it, that depends on the project you are working on...
When you do a mobile app, you wi
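The following sketch shows why the BMP boundary matters: characters up to U+FFFF need at most 3 bytes in UTF-8, while characters beyond it, such as many emoji, need 4. The function names are hypothetical, and the check looks only at code points; it does not talk to MySQL.

```python
def utf8_len(ch):
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_three_byte_utf8(text):
    """True if every character lies in the BMP (U+0000..U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


EMOJI = "\U0001F600"  # a character outside the BMP
```

A string for which `fits_three_byte_utf8` returns False needs utf8mb4 to be stored intact.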
theoretically 65535 bytes. If the encoding is GBK, each character takes up to 2 bytes and the maximum length cannot exceed 32,766 characters; if the encoding is UTF8, each character takes up to 3 bytes and the maximum length cannot exceed 21,844 characters, whether the content is letters, digits or kanji.
Article reference: http://www.cnblogs.com/sochishun/p/7026762.html
For example: 1 Chinese character with UTF8 encoding is 3 bytes (byte), enc
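The arithmetic behind these limits can be sketched as follows. It assumes a table with a single nullable VARCHAR column, where 2 bytes of length prefix and 1 byte of NULL flag count against the 65,535-byte row limit; the constant and function names are hypothetical.

```python
ROW_LIMIT = 65535  # maximum row size in bytes
OVERHEAD = 3       # assumption: 2-byte length prefix + 1-byte NULL flag

# Worst-case bytes per character for each character set.
MAX_BYTES_PER_CHAR = {"latin1": 1, "gbk": 2, "utf8": 3, "utf8mb4": 4}


def max_varchar_chars(charset):
    """Largest N for which VARCHAR(N) still fits in a single row."""
    return (ROW_LIMIT - OVERHEAD) // MAX_BYTES_PER_CHAR[charset]
```

Under these assumptions GBK allows 32,766 characters and UTF8 allows 21,844, one fewer than a plain 65,535 / 3 would suggest, because the overhead bytes are subtracted first.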
, which effectively reduces the size of the database file. For the varchar type of the MySQL database in versions below 4.1: nvarchar (which stores Unicode data) stores every character, whether a letter or a Chinese character, as 2 bytes; it is generally used for input in Chinese or other languages, where it is less prone to garbling. varchar: a kanji takes 2 bytes and other characters take 1 byte, so varchar is suitable for English and numerals. In version 4.0, varchar(20) refers to 20 bytes; if UTF8 kanji are stored, only 6 fit (3 bytes per kanji). In version 5.0 and above, varchar(20) refers to 20 characters, regardless of whether what is stored is digits, letters or UTF8
# int: 0 ~ more than 4 billion
# mediumint: 0 ~ more than 16 million
# smallint: 0 ~ 65535
# tinyint: 0 ~ 255
# What is the difference between varchar(5) and char(5)?
# Can varchar(5) store 'abcdef'? No, and neither can char(5)!
# Storing 'abc' in char(5): 5 characters; bytes: gbk/Kanji * * utf8/Chinese characters
# Storing 'abc' in varchar(5): 4 characters (one is added by default); bytes: gbk/Kanji * * utf8/
Reposted from: http://www.cnblogs.com/doit8791/archive/2012/05/28/2522556.html
1. Changes in the varchar type
In MySQL versions below 4.1, the maximum length of the varchar type is limited to 255, with a data range of 0~255 or 1~255 (depending on the database version). In MySQL 5.0 and above, the varchar type supports lengths up to 65535, which means that 65,532 bytes of data can be stored, with the start and end bits taking up 3 bytes, which means Data tha