Discover Chinese punctuation marks, including articles, news, trends, analysis, and practical advice about Chinese punctuation marks on alibabacloud.com
In recent retrieval work, the user enters a query and the system returns the results. Because the user's input is hard to predict, unfiltered input may cause an exception to be thrown during the retrieval phase.
User input comes in many styles. The most common case is input that contains various punctuation marks. How should it be filtered?
I saw a post on the CSDN forum. After combining it, I got the fo
limited number of;
What are the fourteen punctuation marks in Grammar (Chinese edition)? Period: full stop; comma: comma; colon: colon; semicolon: semicolon; prime: an apostrophe, as in the first-order derivative f′(x) in mathematical analysis; underscore: underline; ellipsis: ellipsis; exclamation mark: exclamation point; dash: horizontal line; hyphen: hyphen; quotation marks
1. Some Chinese punctuation marks do not exist in English.
(1) Enumeration comma (、): used in Chinese to separate parallel components in a sentence. English has no enumeration comma and uses an ordinary comma instead. For example: She slowly, carefully, deliberately moved the box. Note: in similar cases, "and" is added after the last comma. This comma can also be omitte
public class Text {
    // Java: determine whether a single character is Chinese, including punctuation
    private static final boolean isChinese(char c) {
        Character.UnicodeBlock ub = Character.UnicodeBlock.of(c);
        if (ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
                || ub == Character.UnicodeBlock.GENERAL_PUNCTUATION
                || ub == Character.UnicodeBlo
1. A UnicodeBlock is simply a range of code points (some of which may be "holes", that is, values to which no character has been assigned).
2. The characters of one UnicodeScript may be spread across multiple UnicodeBlocks;
3. The characters of one UnicodeBlock may belong to multiple UnicodeScripts.
Distinguish Chinese punctuation marks.
Because the
[Daily] json_encode: differences in handling Chinese characters and quotation marks
json_encode()
1. By default, all characters other than printable ASCII characters are escaped as Unicode escape sequences. Once the characters are escaped this way, no garbled text appears regardless of whether the file encodings match. Therefore, converting
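The same default can be observed in other JSON encoders. As an analogy only (this is Python's standard `json` module, not PHP's `json_encode`), `json.dumps` also escapes non-ASCII characters unless told otherwise. The variable names below are hypothetical.

```python
import json

text = "中文"

# Default: non-ASCII characters are written as \uXXXX escapes (pure ASCII output).
escaped = json.dumps(text)

# With ensure_ascii=False the characters are kept as-is, so the output
# depends on the encoding used when it is later written to a file.
raw = json.dumps(text, ensure_ascii=False)

# Both forms decode back to the same string.
assert json.loads(escaped) == json.loads(raw) == text
```

The escaped form is longer but survives any ASCII-compatible encoding, which matches the reasoning in the excerpt.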
Characters encoded in UTF-8 may consist of 1 to 3 bytes. The exact number can be determined from the first byte. (In theory a character may be longer, but here the length is assumed not to exceed 3 bytes.)
Encoding method of UTF-8
If the first byte is 224 or greater, it forms a 3-byte UTF-8 character together with the two bytes after it
If the first byte is at least 192 and less than 224, it forms a 2-byte UTF-8 character together with the one byte after it
Otherwise, the first byte is an English
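A minimal Python sketch of this first-byte rule follows. It keeps the excerpt's simplifying assumption that no character is longer than 3 bytes, so it does not handle 4-byte sequences (first byte 240 or greater) and it assumes the input is valid UTF-8. The function names are hypothetical.

```python
def utf8_char_length(first_byte):
    """Number of bytes in the character that starts with first_byte."""
    if first_byte >= 224:   # 1110xxxx: 3-byte character
        return 3
    if first_byte >= 192:   # 110xxxxx: 2-byte character
        return 2
    return 1                # 0xxxxxxx: single-byte (ASCII) character


def split_utf8(data):
    """Split valid UTF-8 bytes (max 3 bytes per character) into characters."""
    chars = []
    i = 0
    while i < len(data):
        n = utf8_char_length(data[i])
        chars.append(data[i:i + n].decode("utf-8"))
        i += n
    return chars
```

For instance, `split_utf8("a,。".encode("utf-8"))` walks over one 1-byte character and then two 3-byte characters.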
Due to the need for naming in the code, I have compiled a common Chinese-English table of punctuation marks.
Chinese-English punctuation table
Symbol    English    Chinese
.         Period or full sto
At work I often run into many special punctuation marks, such as Chinese punctuation and English punctuation. English punctuation is easier to filter, while filtering Chinese punctuation
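One way to treat both kinds of punctuation uniformly is to rely on Unicode general categories rather than a hand-written list. The sketch below is an assumed approach, not the one from the truncated article: it drops every character whose category starts with "P", which covers ASCII punctuation as well as marks such as ,。《》. Symbols such as `$` or `+` belong to the "S" categories and are deliberately left alone. The function name is hypothetical.

```python
import unicodedata


def strip_punctuation(text):
    """Remove every character whose Unicode general category is punctuation (P*)."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P")
    )
```

With this, `strip_punctuation("你好,世界!Hello, world.")` keeps the letters, ideographs, and the space while dropping all four punctuation marks.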
On Mac OS X, Java-based programs (such as IntelliJ IDEA, jEdit, etc.) have a problem where Chinese punctuation input does not work: with a Chinese input method active you can enter Chinese text, but the punctuation that ends up being entered is the English
Before running Chinese word segmentation statistics, you often have to filter out the tags, punctuation, English letters, and so on contained in the crawled text. This process is called data cleansing.
# coding=utf-8
import re
import codecs

def strs_filter(file):
    with codecs.open(file, "r", "utf8") as f, codecs.open("Result.txt", "a+", "utf8") as c:
        lines = f.readlines()
        for line in lines:
            # line = line.decode('utf8
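The filtering step itself is cut off in the excerpt. As a hedged illustration of what such a cleansing step could look like, the sketch below keeps only characters in the range U+4E00 to U+9FA5 and discards tags, punctuation, digits, and English letters. The function name `keep_chinese` and the choice of range are assumptions, not the original author's code.

```python
import re

# Anything that is NOT a common CJK ideograph (U+4E00..U+9FA5) is removed.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fa5]")


def keep_chinese(line):
    """Return line with every non-Chinese character stripped out."""
    return NON_CHINESE.sub("", line)
```

Applied line by line, `keep_chinese("<p>你好, world!</p>")` leaves only `你好`, which is the kind of text a word segmenter would then receive.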
It can be written like this: string @" [\u4e00-\u9fa5]| [\(\)\《\》\——\;\,\。 \ "\" \"; The first half matches Chinese characters, and the second half lists the punctuation marks that need to be matched.
Other
For processing HTML source code, HtmlAgilityPack is recommended, with the following code to remove script, style, or comment content. public static HtmlDocument InitializeHtmlDoc(str
On Mac OS X, Java-based programs (such as IntelliJ IDEA, jEdit, etc.) fail to accept Chinese punctuation input: with a Chinese input method active you can enter Chinese characters, but the punctuation that ends up being entered is the English
This article shows how to match Chinese punctuation marks in JS. It is shared here for your reference; the details are as follows:
The screenshot of the running effect is as follows:
The specific code is as follows:
PS: Here are 2 very convenient regular expression tools for your reference:
JavaScript regular expression online test tool: http://tools.jb51.net
A detailed explanation, for users of the QQ Cloud Input Method, of the shortcut key for switching between Chinese and English punctuation.
Shared details:
The shortcut key for switching between Chinese and English punctuation in the QQ Cloud Input Method is CTRL + ...
Well, the above information is small make
the display, a Chinese, Japanese, or Korean character has a length of 1 and an English character has a length of 0.5
 *
 * @param s the string whose length is needed
 * @return the resulting string length
 */
public static double getLength(String s) {
    double valueLength = 0;
    String chinese = "[\u4e00-\u9fa5]";
    // Get the length of the field value; if it contains Chinese characters, the
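The counting rule can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a port of the truncated Java method: it uses the same `[\u4e00-\u9fa5]` pattern, so only common CJK ideographs count as 1, and every other character (including kana and Hangul, which this range does not cover) counts as 0.5. The function name is hypothetical.

```python
import re

CHINESE = re.compile("[\u4e00-\u9fa5]")


def display_length(s):
    """Sum 1 for each character matching CHINESE and 0.5 for every other character."""
    total = 0.0
    for ch in s:
        total += 1.0 if CHINESE.match(ch) else 0.5
    return total
```

A mixed string such as `"中文ab"` therefore measures two full-width plus two half-width units.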