Python data cleansing: CSV files with Chinese characters

Source: Internet
Author: User

Data cleansing: using Python to clean a CSV file that contains Chinese characters. The idea is to encode the Chinese text with a dictionary: each key is a Chinese string and each value is an integer index that simply increments for every new string. Because a dictionary cannot hold duplicate keys, every distinct string is mapped to exactly one index.
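A minimal sketch of the idea for a single column, assuming the column is given as a list of strings (the function name `encode_column` is hypothetical). Each unseen value gets the next integer, and repeated values reuse the index already stored in the dictionary.

```python
def encode_column(values):
    """Map each distinct string to an integer index, in order of first appearance."""
    mapping = {}  # Chinese text -> integer index
    codes = []
    for value in values:
        if value not in mapping:
            # Dictionary keys are unique, so each value is assigned an index only once.
            mapping[value] = len(mapping)
        codes.append(mapping[value])
    return codes, mapping


codes, mapping = encode_column(["北京", "上海", "北京", "广州"])
print(codes)  # [0, 1, 0, 2]
# mapping == {'北京': 0, '上海': 1, '广州': 2}
```

Using `len(mapping)` as the next index keeps the codes consecutive (0, 1, 2, ...) without a separate counter.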

The Python code is as follows (the data is in CSV format):

import csv

# Columns (0-based) that contain Chinese text, with their spreadsheet letters:
# 2=C, 4=E, 25=Z, 26=AA, 27=AB, 37=AL, 38=AM, 40=AO, 41=AP, 42=AQ, 45=AT, 49=AX
COLUMNS = [2, 4, 25, 26, 27, 37, 38, 40, 41, 42, 45, 49]

# One dictionary per column: Chinese string -> integer index
mappings = {col: {} for col in COLUMNS}

# Assumption: set this to the file's actual encoding
# (e.g. 'gbk' for a CSV saved by Excel on a Chinese-locale Windows system).
ENCODING = 'utf-8'

with open('E:/yuce/b.csv', 'r', newline='', encoding=ENCODING) as csv_file_read, \
        open('E:/yuce/test.csv', 'w', newline='', encoding=ENCODING) as csv_file_write:
    reader = csv.reader(csv_file_read)
    writer = csv.writer(csv_file_write)

    writer.writerow(next(reader))  # copy the header row unchanged

    for row in reader:
        for col in COLUMNS:
            mapping = mappings[col]
            # Assign a new index only to values not seen before, so the same
            # string gets the same code in every row.
            if row[col] not in mapping:
                mapping[row[col]] = len(mapping)
            row[col] = mapping[row[col]]
        writer.writerow(row)

# The with statement closes both files; no explicit close() is needed.
print('done!')
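The script above reads from and writes to files under E:/yuce. To check the encoding logic without those files, here is a self-contained sketch that applies the same per-column dictionary encoding to a small in-memory CSV via `io.StringIO`. The sample rows and the column positions (1 and 2) are made up for illustration, and `encode_csv` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample data: a header row, then rows with Chinese text in columns 1 and 2.
SAMPLE = "id,city,type\n1,北京,甲\n2,上海,乙\n3,北京,乙\n"
COLUMNS = [1, 2]  # illustrative positions, not those of the real data


def encode_csv(text, columns):
    """Replace the text in the given columns with per-column integer indices."""
    mappings = {col: {} for col in columns}
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out)
    writer.writerow(next(reader))  # copy the header row unchanged
    for row in reader:
        for col in columns:
            mapping = mappings[col]
            if row[col] not in mapping:  # new value: give it the next index
                mapping[row[col]] = len(mapping)
            row[col] = mapping[row[col]]
        writer.writerow(row)
    return out.getvalue(), mappings


result, mappings = encode_csv(SAMPLE, COLUMNS)
print(result)
```

Both occurrences of 北京 come out as 0, and both cells containing 乙 come out as 1, because an index is assigned only the first time a value is seen in that column.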


The example above comes from a real data-processing task: the original data has 200 attribute columns and 30,000 rows. It contains both Chinese characters and missing values, so it has to be cleaned step by step.
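One possible next step for the missing values, sketched under the assumption that a missing cell shows up as an empty (or whitespace-only) string: give missing cells a fixed code instead of letting the empty string take an index of its own. The sentinel value -1 and the function name `encode_with_missing` are illustrative choices, not part of the original script.

```python
MISSING = -1  # hypothetical sentinel code for missing cells


def encode_with_missing(values):
    """Encode strings to integer indices, mapping blank cells to MISSING."""
    mapping = {}  # text -> integer index (missing cells are never added)
    codes = []
    for value in values:
        value = value.strip()
        if not value:
            codes.append(MISSING)  # missing value: keep it out of the mapping
            continue
        if value not in mapping:
            mapping[value] = len(mapping)
        codes.append(mapping[value])
    return codes, mapping


codes, mapping = encode_with_missing(["北京", "", "上海", "  ", "北京"])
print(codes)  # [0, -1, 1, -1, 0]
```

Stripping whitespace also merges values that differ only in surrounding spaces, which is usually, but not always, what you want.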
