Recently, while using PHP for structure search, I found that SMILES strings could not be queried directly, so I thought of converting them to molecular formulas. This is mainly intended for organic compounds.
Difficulty: I matched C and O in the SMILES string with regular expressions. The problem is that other element symbols also contain the letter C (for example Cl, Ca, Cu), so a plain match does not count carbon correctly.
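To make the miscount concrete, here is a minimal sketch. The example string and the variable names are my own and are not taken from the original code. It counts every `C` character in the SMILES for dichloromethane, then counts how many of those are really the start of a two-letter symbol.

```php
<?php
// Hypothetical example input: dichloromethane, which has one carbon atom.
$smiles = 'ClCCl';

// Naive count: every 'C' character, including the 'C' that starts 'Cl'.
$naiveCount = substr_count($smiles, 'C');

// A 'C' followed by a lowercase letter is treated as part of a
// two-letter symbol (Cl, Ca, Cu, ...), not as a carbon atom.
$twoLetterCount = preg_match_all('/C[a-z]/', $smiles);

// What remains is the number of carbon atoms.
$carbonCount = $naiveCount - $twoLetterCount;

echo "naive: $naiveCount, corrected: $carbonCount\n";
```

This rule assumes that atoms are written with an uppercase first letter. A `C` followed by an aromatic lowercase atom, as in `Cc1ccccc1`, would also match `/C[a-z]/` and be subtracted by mistake.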
Solution: I use the original SMILES only to analyze the composition of the organic compound. Following the structure of organic compounds, I count C and O separately; the remaining elements are simply counted and appended at the end. The result therefore has three parts: the C count, the O count, and the other elements.
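The three-part layout described above can be sketched as a small helper. The function name `formatFormula` and its parameters are hypothetical and only illustrate how the parts are joined; how the counts are obtained is a separate step.

```php
<?php
/**
 * Builds a formula string from three parts: the carbon count, the oxygen
 * count, and a map of other element symbols to their counts.
 * A count of 1 is written without a number; a count of 0 is skipped.
 * (Hypothetical helper, not part of the original code.)
 */
function formatFormula(int $carbon, int $oxygen, array $others): string
{
    $part = function (string $symbol, int $count): string {
        if ($count <= 0) {
            return '';
        }
        return $symbol . ($count === 1 ? '' : (string) $count);
    };

    // Carbon first, oxygen second, everything else appended at the end.
    $formula = $part('C', $carbon) . $part('O', $oxygen);
    foreach ($others as $symbol => $count) {
        $formula .= $part($symbol, $count);
    }
    return $formula;
}

echo formatFormula(2, 1, ['Cl' => 3]), "\n"; // prints C2OCl3
```

Keeping the formatting apart from the counting makes it easy to change the element order later without touching the parsing code.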
1. The SMILES string received from the front end is assumed to follow the SMILES rules.
2. PHP processing
// $smiles holds the SMILES string received from the front end.
$Cnum = '';
$Onum = '';
$smiles_new = ''; // initialized here so the .= below does not hit an undefined variable

// Strip bond symbols, dots, the ring-closure digit 1, brackets and parentheses.
$find = array("=", "#", ".", "1", "[", "]", "(", ")");
$replace = array("");
$smiles = str_replace($find, $replace, $smiles, $j); // $j receives the number of replacements

// Two-letter element symbols to count.
// 'Ge' is assumed here; the original list had 'Gc', which is not an element symbol.
$ChemElement = array("Li", "Be", "Na", "Mg", "Al", "Si", "Cl", "Br", "Ca", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn", "Ga", "Ge", "Ag", "Au");
foreach ($ChemElement as $value) {
    $k_x = substr_count($smiles, $value);
    if ($k_x > 0) {
        $k_x = $k_x == 1 ? '' : $k_x; // a count of 1 is written without a number
        $smiles_new .= $value . $k_x;
    }
}

// Carbon: all 'C' characters minus those that start a two-letter symbol.
$k_c = substr_count($smiles, 'C');
$i_c = preg_match_all('/C[a-z]/m', $smiles); // 'C' that is not carbon
$j_c = $k_c - $i_c;                          // carbon count

// Oxygen: same approach.
$k_o = substr_count($smiles, 'O');
$i_o = preg_match_all('/O[a-z]/m', $smiles); // 'O' that is not oxygen
$j_o = $k_o - $i_o;                          // oxygen count

if ($j_c > 0) {
    $j_c = $j_c == 1 ? '' : $j_c;
    $Cnum = 'C' . $j_c;
}
if ($j_o > 0) {
    $j_o = $j_o == 1 ? '' : $j_o;
    $Onum = 'O' . $j_o;
}

// Final formula: carbon, then oxygen, then the other elements.
$smilesPara = $Cnum . $Onum . $smiles_new;

Result: this basically produces a usable molecular formula for the general case. Of course, I did not list every element; I think the commonly used ones are enough. The formula is only used for searching, and uncommon substances would not be in the chemical library anyway. For the search engine itself, I recommend Sphinx with PHP.
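As a possible variation, and not the method used in this article, the counting could be done in a single regex pass instead of counting and subtracting. The sketch below assumes the same short element list, with `Ge` assumed where the source list has `Gc`. The function name `countElements` is hypothetical. Two-letter symbols come before `C` and `O` in the alternation, so `Cl` is matched as chlorine and never as carbon.

```php
<?php
/**
 * Counts element symbols in a SMILES string with one regex pass.
 * Assumption: only atoms written with an uppercase first letter and
 * only the elements in this short list are handled.
 */
function countElements(string $smiles): array
{
    // 'Ge' is an assumption; the source list has 'Gc'.
    $twoLetter = ['Li', 'Be', 'Na', 'Mg', 'Al', 'Si', 'Cl', 'Br', 'Ca', 'Cr',
                  'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'Ag', 'Au'];

    // PCRE tries alternatives from left to right, so two-letter symbols win.
    $pattern = '/' . implode('|', $twoLetter) . '|C|O/';

    preg_match_all($pattern, $smiles, $matches);

    // Map each matched symbol to the number of times it occurred.
    return array_count_values($matches[0]);
}

print_r(countElements('ClCCl'));    // Cl => 2, C => 1
print_r(countElements('CC(=O)O'));  // C => 2, O => 2
```

With this approach there is no need to strip bond symbols or brackets first, because characters that are not in the pattern are simply skipped.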
This article is from the "one-stop solution" blog; please keep this source link when reposting: http://10725691.blog.51cto.com/10715691/1940277
PHP: converting SMILES to a molecular formula