The subject is not a direct string disassembly,
The scenario is as follows, with a field in the table that represents the set of countries affected by the event, separated by commas, and unfortunately some countries themselves have commas, so how exactly do they find them when normalized?
The following code has certain limitations. But basically enough.
The following code is supported by the analysis function lag and lead as well as the cte,sqlserver2012 and later versions, and Oracle seems to support it more than 10g.
Main ideas:
For the decomposition of a string, you can use the numeric auxiliary table, then cross join brush Copy, and then the root separator where it appears and then cut out the string to the stuff we need. (The recursive CTE I use in the solution to handle finding the corresponding location)
One more step now is to verify and remove parts of the disassembly to the part that does not meet the requirements.
The advantage of using lag and lead is that you do not need to use the self-connection to find the next piece of data.
The principle of the problem is how to connect to the right country, then take long-term, otherwise take short items.
The code is as follows:
--preparing sample tables and dataDrop Tablemy_countries;Drop TableValid_country; Create TableMy_countries (RIDint, COUNTRY_NAME_CCvarchar( $));Insert intoMy_countries (RID,COUNTRY_NAME_CC)Values(1,'china,test, public of');Insert intoMy_countries (RID,COUNTRY_NAME_CC)Values(2,'us, public Of,china,evan, public of'); Create TableValid_country (CIDint, country_namevarchar( -));Insert intoValid_country (Cid,country_name)Values(1,' China');Insert intoValid_country (Cid,country_name)Values(2,'test, public of');Insert intoValid_country (Cid,country_name)Values(3,'Evan, public of');Insert intoValid_country (Cid,country_name)Values(4,'us, public of');Insert intoValid_country (Cid,country_name)Values(5,'Evan');--select * from My_countries;--select * from Valid_country;
The correct result is:
withSplit_country as(SELECTRID,1 asLVL,1 asStartpos,CHARINDEX(',', COUNTRY_NAME_CC+',')-1 asEndpos frommy_countriesUNION AllSELECTSC. RID,LVL+1 asLvl,endpos+2,CHARINDEX(',', COUNTRY_NAME_CC+',', Endpos+2)-1 frommy_countries CCJOINsplit_country SC onCc. RID=SC. RIDWHERE CHARINDEX(','Cc. Country_name_cc+',', Endpos+2)>0), Cte_country as (SELECTRid,lvl,startpos,endpos,lead (Endpos,1) Over(PARTITION byRIDORDER byLVL) asNextendpos fromsplit_country), CTE as (SELECTMC. RID,SC.LVL, Case whenNextendpos is not NULL and EXISTS(SELECT * fromValid_country VCWHEREVc. Country_name= SUBSTRING(Country_name_cc,startpos,nextendpos-Startpos+1)) ThenSUBSTRING(Country_name_cc,startpos,nextendpos-Startpos+1)ELSESUBSTRING(MC. Country_name_cc,startpos,endpos-Startpos+1)END asCountry fromMy_countries MCJOINcte_country SC onMc. RID=SC. RID), Check_valid as (SELECT Case when CHARINDEX(',', LAG (Country,1) Over(PARTITION byRIDORDER byLVL))>0 Then 0 ELSE 1 END asISVALID,* fromCTE)SELECTCV. RID,CV. COUNTRY,VC. Cid fromCheck_valid CVJOINvalid_country VC onCV. Country=VC. Country_name andISVALID=1 ORDER byRID;
Another option, based on the first one, is slightly modified:
withSplit_country as(SELECTRID,1 asLVL,1 asStartpos,CHARINDEX(',', COUNTRY_NAME_CC+',')-1 asEndpos frommy_countriesUNION AllSELECTSC. RID,LVL+1 asLvl,endpos+2,CHARINDEX(',', COUNTRY_NAME_CC+',', Endpos+2)-1 frommy_countries CCJOINsplit_country SC onCc. RID=SC. RIDWHERE CHARINDEX(','Cc. Country_name_cc+',', Endpos+2)>0), Cte_country as (SELECTRid,lvl,startpos,endpos,lead (Endpos,1) Over(PARTITION byRIDORDER byLVL) asNextendpos fromsplit_country), CTE as (SELECTMC. RID,SC.LVL,SUBSTRING(MC. Country_name_cc,startpos,endpos-Startpos+1) asCountry,SUBSTRING(Country_name_cc,startpos,nextendpos-Startpos+1) asCOUNTRY2 fromMy_countries MCJOINcte_country SC onMc. RID=SC. RID)SELECTCTE. RID,VC. COUNTRY_NAME,VC. CID fromCTEJOINvalid_country VC on( Case when EXISTS(SELECT * fromValid_country XWHEREX.country_name=Cte. COUNTRY2) ThenCTE. COUNTRY2ELSECte. CountryEND)=VC. country_name;
sql-a special string decomposition topic