Because I am new to coding and have not yet written any large engineering code, a senior colleague gave me the source of a large PHP project from last year to read through first. This afternoon I ran into the shtmlspecialchars() function.
Many people online use it, but it is not a PHP built-in; it is a user-defined function, not an official one. The regular expression inside it had me puzzled for a while. Enough preamble; let's get to the point.
[php]
function shtmlspecialchars($string) {
	if (is_array($string)) {
		// Arrays are escaped recursively, element by element.
		foreach ($string as $key => $val) {
			$string[$key] = shtmlspecialchars($val);
		}
	} else {
		// Escape & " < > first, then turn "&amp;" back into "&" where it starts an existing entity.
		$string = preg_replace('/&amp;((#(\d{3,5}|x[a-fA-F0-9]{4})|[a-zA-Z][a-z0-9]{2,5});)/', '&\\1',
			str_replace(array('&', '"', '<', '>'), array('&amp;', '&quot;', '&lt;', '&gt;'), $string));
	}
	return $string;
}
The above is the definition of the shtmlspecialchars() function. Without further ado, the part that confuses many people is this:
[php]
$string = preg_replace('/&amp;((#(\d{3,5}|x[a-fA-F0-9]{4})|[a-zA-Z][a-z0-9]{2,5});)/', '&\\1',
	str_replace(array('&', '"', '<', '>'), array('&amp;', '&quot;', '&lt;', '&gt;'), $string));
First, what the function is for.
It escapes the four special characters that may appear in HTML:
& becomes &amp;
" becomes &quot;
< becomes &lt;
> becomes &gt; (note: the trailing semicolon ";" is part of each entity, not punctuation added by the author)
This is the same conversion that PHP's built-in htmlspecialchars() applies to these four characters; the reverse direction is htmlspecialchars_decode().
Normally, the following code alone would be enough to do that:
[php]
str_replace(array('&', '"', '<', '>'), array('&amp;', '&quot;', '&lt;', '&gt;'), $string);
But wait!
Q: Wait for what? Isn't the function already complete?
A: No, and stopping here would be a big mistake. This replacement follows the saying "better to wrongly kill three thousand than to let one escape": it escapes every &, whether it should or not.
Q: So what exactly goes wrong?
A: Read on.
If we use only the replacement above, HTML entities and Unicode character references that are already present in the string are destroyed, because their leading & is turned into &amp;. That is not the result we want. For the list of entities, see the appendix at the end of the article.
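The damage is easy to reproduce. The sketch below is only an illustration: naive_escape() is a hypothetical helper name, not part of the project code, and the sample input is made up.

```php
<?php
// Naive escaping: replace the four special characters unconditionally.
// naive_escape() is a hypothetical name used only for this sketch.
function naive_escape($string) {
    return str_replace(
        array('&', '"', '<', '>'),
        array('&amp;', '&quot;', '&lt;', '&gt;'),
        $string
    );
}

// The input already contains a named entity and a decimal reference.
$input = '<b>&alpha; &#913;</b>';
$escaped = naive_escape($input);

echo $escaped, "\n";
// Prints: &lt;b&gt;&amp;alpha; &amp;#913;&lt;/b&gt;
```

A browser would now display the literal text "&alpha;" and "&#913;" instead of the Greek letters, which is exactly the problem the second step of shtmlspecialchars() has to undo.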
Someone went through all the entries in the entity table and came to the following conclusions:
1. An HTML entity starts with & and ends with ;. In between is either # followed by 3-5 digits, or one letter followed by 2-5 lowercase letters or digits.
2. A Unicode character reference starts with &#x, followed by 4 hexadecimal digits and a closing ;.
From the first point we get the regular expression &(#\d{3,5}|[a-zA-Z][a-z0-9]{2,5}); (note: the trailing semicolon ";" is part of the pattern).
From the second we get &#x[a-fA-F0-9]{4}; (hexadecimal digits run from 0 to f).
Because the previous step has already replaced & with &amp;, combining the two gives:
/&amp;((#(\d{3,5}|x[a-fA-F0-9]{4})|[a-zA-Z][a-z0-9]{2,5});)/
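To see what the combined pattern accepts, it can be run against a few already-escaped strings. This is a minimal check under assumed sample inputs; the $samples values are illustrative and not taken from the project.

```php
<?php
// The combined pattern from the text, applied to already-escaped input.
$pattern = '/&amp;((#(\d{3,5}|x[a-fA-F0-9]{4})|[a-zA-Z][a-z0-9]{2,5});)/';

$samples = array(
    '&amp;#913;',    // decimal reference with 3 digits
    '&amp;#x4e2d;',  // hexadecimal reference with 4 hex digits
    '&amp;alpha;',   // named entity
    '&amp; b',       // a plain escaped ampersand
    '&amp;#12;',     // only 2 digits, outside the 3-5 range
);

$results = array();
foreach ($samples as $sample) {
    $results[$sample] = preg_match($pattern, $sample) === 1;
    echo $sample, ' => ', $results[$sample] ? 'match' : 'no match', "\n";
}
```

The first three samples match and the last two do not, so a plain "&amp;" that was a real ampersand in the input stays escaped.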
Question 1:
Someone may ask: can the pattern be written like this, with the # moved out of the group?
/&amp;#(((\d{3,5}|x[a-fA-F0-9]{4})|[a-zA-Z][a-z0-9]{2,5});)/
It can be made to work, but it needs a further change, which I come back to below.
Step 1
[php]
str_replace(array('&', '"', '<', '>'), array('&amp;', '&quot;', '&lt;', '&gt;'), $string);
Store the result back in $string.
After that, the second step is simply a reverse replacement:
preg_replace('/&amp;((#(\d{3,5}|x[a-fA-F0-9]{4})|[a-zA-Z][a-z0-9]{2,5});)/', '&\1', $string)
By now the regular expression itself is clear, but the '&\1' in the replacement confused me. What does it mean?
After checking: \1 stands for whatever the first parenthesized group (capture group) of the regular expression matched.
I wrote a small test myself.
[php]
<?php
$string = 'x10p';
// Two groups: \1 is "x", the content of the first pair of parentheses.
$string1 = preg_replace('/(x)([0-9]+)p/', '&\1', $string);
// One group: \1 is "10".
$string2 = preg_replace('/x([0-9]+)p/', '&\1', $string);
echo $string1;
echo '<br/>';
echo $string2;
?>
The output is:
&x (the first pair of parentheses contains x)
&10 (the first pair of parentheses contains 10)
[php]
preg_replace('/&amp;((#(\d{3,5}|x[a-fA-F0-9]{4})|[a-zA-Z][a-z0-9]{2,5});)/', '&\1', $string)
So this call replaces &amp; with & and leaves the rest of the entity, the part captured by \1, unchanged.
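The backreference also answers Question 1: can the # be taken out of the group? If you do, the match consumes &amp;# and the replacement has to be written as '&#\1'. That works for numeric references, but it is clumsier, and with # in front of the whole group, named entities such as &amp;alpha; no longer match at all.
So, is the problem solved? Yes!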
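The difference between the two forms can be checked side by side. The $variant pattern below is an assumed reading of Question 1 (the # moved in front of the first group); it is not code from the project.

```php
<?php
// Original pattern: "#" is inside the first group, so '&\1' restores it.
$original = '/&amp;((#(\d{3,5}|x[a-fA-F0-9]{4})|[a-zA-Z][a-z0-9]{2,5});)/';
// Variant (assumed reading of Question 1): "#" sits in front of the group,
// so the replacement has to add it back as '&#\1'.
$variant  = '/&amp;#(((\d{3,5}|x[a-fA-F0-9]{4})|[a-zA-Z][a-z0-9]{2,5});)/';

$escaped = '&amp;#913; &amp;alpha;';

$fromOriginal = preg_replace($original, '&\1', $escaped);
$fromVariant  = preg_replace($variant, '&#\1', $escaped);

echo $fromOriginal, "\n"; // &#913; &alpha;
echo $fromVariant, "\n";  // &#913; &amp;alpha;
```

Keeping the # inside the capture group lets one replacement string serve both numeric references and named entities, which is presumably why the function is written that way.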
Appendix:
HTML entity table
Sym  Entity       Decimal    Sym  Entity       Decimal    Sym  Entity       Decimal
Α    &Alpha;      &#913;     Β    &Beta;       &#914;     Γ    &Gamma;      &#915;
Δ    &Delta;      &#916;     Ε    &Epsilon;    &#917;     Ζ    &Zeta;       &#918;
Η    &Eta;        &#919;     Θ    &Theta;      &#920;     Ι    &Iota;       &#921;
Κ    &Kappa;      &#922;     Λ    &Lambda;     &#923;     Μ    &Mu;         &#924;
Ν    &Nu;         &#925;     Ξ    &Xi;         &#926;     Ο    &Omicron;    &#927;
Π    &Pi;         &#928;     Ρ    &Rho;        &#929;     Σ    &Sigma;      &#931;
Τ    &Tau;        &#932;     Υ    &Upsilon;    &#933;     Φ    &Phi;        &#934;
Χ    &Chi;        &#935;     Ψ    &Psi;        &#936;     Ω    &Omega;      &#937;
α    &alpha;      &#945;     β    &beta;       &#946;     γ    &gamma;      &#947;
δ    &delta;      &#948;     ε    &epsilon;    &#949;     ζ    &zeta;       &#950;
η    &eta;        &#951;     θ    &theta;      &#952;     ι    &iota;       &#953;
κ    &kappa;      &#954;     λ    &lambda;     &#955;     μ    &mu;         &#956;
ν    &nu;         &#957;     ξ    &xi;         &#958;     ο    &omicron;    &#959;
π    &pi;         &#960;     ρ    &rho;        &#961;     ς    &sigmaf;     &#962;
σ    &sigma;      &#963;     τ    &tau;        &#964;     υ    &upsilon;    &#965;
φ    &phi;        &#966;     χ    &chi;        &#967;     ψ    &psi;        &#968;
ω    &omega;      &#969;     ϑ    &thetasym;   &#977;     ϒ    &upsih;      &#978;
ϖ    &piv;        &#982;     •    &bull;       &#8226;    …    &hellip;     &#8230;
′    &prime;      &#8242;    ″    &Prime;      &#8243;    ‾    &oline;      &#8254;
⁄    &frasl;      &#8260;    ℘    &weierp;     &#8472;    ℑ    &image;      &#8465;
ℜ    &real;       &#8476;    ™    &trade;      &#8482;    ℵ    &alefsym;    &#8501;
←    &larr;       &#8592;    ↑    &uarr;       &#8593;    →    &rarr;       &#8594;
↓    &darr;       &#8595;    ↔    &harr;       &#8596;    ↵    &crarr;      &#8629;
⇐    &lArr;       &#8656;    ⇑    &uArr;       &#8657;    ⇒    &rArr;       &#8658;
⇓    &dArr;       &#8659;    ⇔    &hArr;       &#8660;    ∀    &forall;     &#8704;
∂    &part;       &#8706;    ∃    &exist;      &#8707;    ∅    &empty;      &#8709;
∇    &nabla;      &#8711;    ∈    &isin;       &#8712;    ∉    &notin;      &#8713;
∋    &ni;         &#8715;    ∏    &prod;       &#8719;    ∑    &sum;        &#8721;
−    &minus;      &#8722;    ∗    &lowast;     &#8727;    √    &radic;      &#8730;
∝    &prop;       &#8733;    ∞    &infin;      &#8734;    ∠    &ang;        &#8736;
∧    &and;        &#8743;    ∨    &or;         &#8744;    ∩    &cap;        &#8745;
∪    &cup;        &#8746;    ∫    &int;        &#8747;    ∴    &there4;     &#8756;
∼    &sim;        &#8764;    ≅    &cong;       &#8773;    ≈    &asymp;      &#8776;
≠    &ne;         &#8800;    ≡    &equiv;      &#8801;    ≤    &le;         &#8804;
≥    &ge;         &#8805;    ⊂    &sub;        &#8834;    ⊃    &sup;        &#8835;
⊄    &nsub;       &#8836;    ⊆    &sube;       &#8838;    ⊇    &supe;       &#8839;
⊕    &oplus;      &#8853;    ⊗    &otimes;     &#8855;    ⊥    &perp;       &#8869;
⋅    &sdot;       &#8901;    ⌈    &lceil;      &#8968;    ⌉    &rceil;      &#8969;
⌊    &lfloor;     &#8970;    ⌋    &rfloor;     &#8971;    ◊    &loz;        &#9674;
♠    &spades;     &#9824;    ♣    &clubs;      &#9827;    ♥    &hearts;     &#9829;
♦    &diams;      &#9830;         &nbsp;       &#160;     ¡    &iexcl;      &#161;
¢    &cent;       &#162;     £    &pound;      &#163;     ¤    &curren;     &#164;
¥    &yen;        &#165;     ¦    &brvbar;     &#166;     §    &sect;       &#167;
¨    &uml;        &#168;     ©    &copy;       &#169;     ª    &ordf;       &#170;
«    &laquo;      &#171;     ¬    &not;        &#172;          &shy;        &#173;
®    &reg;        &#174;     ¯    &macr;       &#175;     °    &deg;        &#176;
±    &plusmn;     &#177;     ²    &sup2;       &#178;     ³    &sup3;       &#179;
´    &acute;      &#180;     µ    &micro;      &#181;     "    &quot;       &#34;
<    &lt;         &#60;      >    &gt;         &#62;      '    (none)       &#39;
Note: &nbsp; (non-breaking space) and &shy; (soft hyphen) have no visible glyph, so their symbol column is blank. The apostrophe has no named entity in this table.
Author: wolinxuebin