For a given chemical formula represented by a string, count the number of atoms of each element contained in the molecule and return an object.
water = 'H2O'
parse_molecule(water)
# return {H: 2, O: 1}

magnesium_hydroxide = 'Mg(OH)2'
parse_molecule(magnesium_hydroxide)
# return {Mg: 1, O: 2, H: 2}

fremy_salt = 'K4[ON(SO3)2]2'
parse_molecule(fremy_salt)
# return {K: 4, O: 14, N: 2, S: 4}
The main idea is to convert the molecular formula into a dictionary of atom counts. The kata is rated 3 kyu on Codewars. The difficulty lies in handling the various cases and in keeping indices from running out of range, since the formula comes with a number of constraints.
My idea is roughly this: first convert square brackets and curly braces to parentheses, then expand the innermost brackets, then the outer ones, until a bracket-free expression remains, which is easy to process. The question is how to find the innermost brackets. My approach is to find the first ')' and then search backward for the matching '('. The expanded result then replaces '(...)2', where 2 stands for the number after the brackets. That number may be 1, in which case it is naturally omitted, so we add the 1 back during the conversion. In the final processing we also have to remember that a 1 may be omitted and must still be counted.
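The search for the innermost brackets described above can be sketched on its own. This is a minimal illustration, not the code from this post: `innermost_group` is a hypothetical helper name, and it assumes the brackets are balanced and have already been normalized to parentheses. Unlike the code in this post, it reads the whole run of digits after ')'.

```python
def innermost_group(formula):
    """Locate the first innermost (...)n group.

    Returns (start, end, inner, times), where formula[start:end] is the
    whole group including its multiplier, or None if there is no ')'.
    Assumes balanced brackets already normalized to parentheses.
    """
    close = formula.find(')')            # the first ')' closes an innermost group
    if close == -1:
        return None
    start = formula.rfind('(', 0, close)  # nearest '(' before it is its partner
    end = close + 1
    while end < len(formula) and formula[end].isdigit():
        end += 1                          # consume every digit of the multiplier
    times = int(formula[close + 1:end] or '1')  # an omitted multiplier means 1
    return start, end, formula[start + 1:close], times


print(innermost_group('K4(ON(SO3)2)2'))
print(innermost_group('(H2O)H10'))
```

Once the inner part has been expanded into a bracket-free string, it would replace `formula[start:end]`, and the search repeats until no ')' is left.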
The code is as follows:
def parse_molecule(formula):
    formula_dict = {}
    # Replace []{} with ()
    for bracket in '[{':
        formula = formula.replace(bracket, '(')
    for bracket in ']}':
        formula = formula.replace(bracket, ')')

    while '(' in formula:
        # Look for the innermost (): the first ')' and the nearest '(' before it
        for i in range(len(formula)):
            if formula[i] == ')':
                break
        for j in range(i - 1, -1, -1):
            if formula[j] == '(':
                break
        # If the multiplier 1 was omitted, fill it in
        if i + 1 == len(formula) or not formula[i + 1].isdigit():
            sub_formula = formula[j:i + 1]
            # A temporary variable is used to keep the later replace working.
            # Writing sub_formula = formula[j:i + 1] + '1' directly would make
            # sub_formula a string that does not occur in formula, so replace
            # would do nothing and the loop would never end.
            tmp = sub_formula + '1'
        else:
            sub_formula = formula[j:i + 2]
            tmp = sub_formula
        parsed_sub_formula = parse_paren(tmp)
        formula = formula.replace(sub_formula, parsed_sub_formula)

    # Process the bracket-free formula
    i = 0
    while i < len(formula):
        j = i + 1
        if j < len(formula) and formula[j].islower():
            j += 1
        tmp = formula[i:j]
        # Mind the boundary checks so that j does not run past the end.
        # There is a small bug here: I assume a subscript has at most two
        # digits. If a third digit appears, it is treated as an element
        # with subscript 1.
        # I did not expect this to pass.
        if j < len(formula) and formula[j].isdigit():
            k = j + 1
            if k < len(formula) and formula[k].isdigit():
                formula_dict[tmp] = formula_dict.get(tmp, 0) + int(formula[j:k + 1])
                i = k + 1
            else:
                formula_dict[tmp] = formula_dict.get(tmp, 0) + int(formula[j])
                i = j + 1
        elif j < len(formula) and formula[j].isupper():
            formula_dict[tmp] = formula_dict.get(tmp, 0) + 1
            i = j
        elif j == len(formula):
            formula_dict[tmp] = formula_dict.get(tmp, 0) + 1
            break

    return formula_dict


def parse_paren(sub_formula):
    # sub_formula looks like '(...)n', where n is a single digit
    result = {}
    times = int(sub_formula[-1])
    i = 1
    while i < len(sub_formula) - 2:
        j = i + 1
        if sub_formula[j].islower():
            j += 1
        tmp = sub_formula[i:j]
        if sub_formula[j].isdigit():
            k = j + 1
            # Again assumes a subscript has at most two digits
            if k < len(sub_formula) - 2 and sub_formula[k].isdigit():
                result[tmp] = result.get(tmp, 0) + int(sub_formula[j:k + 1]) * times
                i = k + 1
            else:
                result[tmp] = result.get(tmp, 0) + int(sub_formula[j]) * times
                i = j + 1
        elif sub_formula[j].isupper() or sub_formula[j] == ')':
            result[tmp] = result.get(tmp, 0) + 1 * times
            i = j

    # Join the counts back into a bracket-free string such as 'S2O6'
    t = []
    for key, val in result.items():
        t.append(key)
        t.append(str(val))
    return ''.join(t)


# The tests deliberately include some messy formulas that still follow the rules
print(parse_molecule('K4[ON(SO3)2]2'))
print(parse_molecule('(H2O)H10'))
print(parse_molecule('(OH123)2'))
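The two-digit assumption noted in the comments could be lifted by reading the whole run of digits after each element. The sketch below shows only that counting pass for a bracket-free formula. `count_flat` is a hypothetical name that is not part of the code above, and it assumes the input contains nothing but element symbols and subscripts.

```python
def count_flat(formula):
    """Count atoms in a bracket-free formula such as 'K4O14N2S4'.

    Assumes only element symbols (one uppercase letter followed by
    lowercase letters) and optional decimal subscripts.
    """
    counts = {}
    i = 0
    while i < len(formula):
        # Element symbol: the current character plus any lowercase letters.
        j = i + 1
        while j < len(formula) and formula[j].islower():
            j += 1
        element = formula[i:j]
        # Subscript: the whole run of digits, however long; empty means 1.
        k = j
        while k < len(formula) and formula[k].isdigit():
            k += 1
        counts[element] = counts.get(element, 0) + int(formula[j:k] or '1')
        i = k
    return counts


print(count_flat('O2H123'))
print(count_flat('H2O1H10'))
```

Because both inner loops check the index against `len(formula)` before reading a character, the separate end-of-string branch is no longer needed.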
It passed, but the code still has bugs, which I will fix when I have time (I don't know when; this one really wore me out, so next time... my level is still too low).
Regular expressions look like a better fit for this, though, so stay tuned...
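One way a regular-expression version might look is sketched below. It is an illustration only, not the follow-up promised here: the name `parse_molecule_re` and the token pattern are assumptions, and it swaps the string-rewriting approach for a stack of dictionaries. It assumes the brackets are balanced.

```python
import re

# One token is either an element with an optional count, an opening
# bracket, or a closing bracket with an optional multiplier.
TOKEN = re.compile(r'([A-Z][a-z]*)(\d*)|([(\[{])|[)\]}](\d*)')


def parse_molecule_re(formula):
    """Count atoms using regex tokens and a stack (assumes balanced brackets)."""
    stack = [{}]  # one dictionary per currently open bracket level
    for element, count, opener, times in TOKEN.findall(formula):
        if element:
            top = stack[-1]
            top[element] = top.get(element, 0) + int(count or '1')
        elif opener:
            stack.append({})
        else:
            # Closing bracket: scale the finished group and merge it upward.
            group = stack.pop()
            factor = int(times or '1')
            top = stack[-1]
            for name, n in group.items():
                top[name] = top.get(name, 0) + n * factor
    return stack[0]


print(parse_molecule_re('K4[ON(SO3)2]2'))
print(parse_molecule_re('(OH123)2'))
```

Since `\d*` matches a whole run of digits, subscripts and multipliers of any length are handled without special cases.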
Molecule to atoms