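Big Data Statistics
1. Requirement: compute the probability distribution of one parameter across a very large dataset.
2. Implementation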
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import re


def preprocess(fileName, pattern):
    '''
    Preprocess the dataset, e.g. extract the RSSI column.
    :param fileName: relative path of the input file
    :param pattern: regular expression used to extract the value
    :return: path of the region-of-interest (ROI) dataset
    '''
    with open(fileName, 'r', encoding='utf-8') as f, open('laterText.txt', 'w', encoding='utf-8') as f2:
        for line in f:
            result = re.findall(pattern, line)  # e.g. pattern is .*(-\d{2}),
            if result:
                newContent = result[0] + '\n'
                f2.write(newContent)
    return 'laterText.txt'


def sort(fileName):
    '''
    Read the ROI dataset into a list, sort the list,
    then count how often each value occurs.
    :param fileName: path of the ROI dataset
    :return: list of 'value:count' strings in sorted order
    '''
    s1 = []
    with open(fileName, 'r', encoding='utf-8') as f:
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines
                s1.append(fields[0])
    s1 = sorted(s1)  # string sort, so '-47' comes before '-48'
    counts = {}  # dicts keep insertion order (Python 3.7+), so the sorted order survives
    for i in s1:
        # Exact-key lookup; a substring test such as `i in j` can match the wrong entry.
        counts[i] = counts.get(i, 0) + 1
    return [key + ':' + str(count) for key, count in counts.items()]


def finalText(list1):
    '''
    Write the counted list to a file so the result is easier to read.
    :param list1: list of 'value:count' strings
    :return: True
    '''
    with open('result.txt', 'w', encoding='utf-8') as f2:
        for i in list1:
            new_line = i + '\n'
            f2.write(new_line)
    return True


if __name__ == '__main__':
    inputFile = input('Enter a file path:')  # relative path of the input file, e.g. trainText.csv
    pattern = input('Enter a re expression:')  # regular expression, e.g. .*(-\d{2}),
    laterText = preprocess(inputFile, pattern)  # path of the preprocessed file, 'laterText.txt'
    list1 = sort(laterText)  # sort the extracted values and count each one
    if finalText(list1):  # write the counts to result.txt
        print('Done. See result.txt for the counts.')
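The extract-and-count steps above could also be done in one pass without the intermediate laterText.txt file. The following is only a sketch under assumptions: `count_matches` is a hypothetical helper name that does not appear in the original script, and it swaps the hand-written counting loop for the standard library's `collections.Counter`.

```python
import re
from collections import Counter


def count_matches(lines, pattern):
    """Count the first regex match found on each line.

    Hypothetical helper for illustration; not part of the original script.
    """
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        result = regex.findall(line)
        if result:
            counts[result[0]] += 1
    return counts


if __name__ == '__main__':
    # Made-up sample lines, only to show the call shape.
    sample = ['a,-47,x', 'b,-48,y', 'c,-48,z', 'no match here']
    counts = count_matches(sample, r'.*(-\d{2}),')
    for key in sorted(counts):  # same string ordering as the original sort()
        print(key + ':' + str(counts[key]))
```

Because `lines` can be any iterable, an open file object can be passed directly, so the whole file never has to sit in memory at once.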
3. Demo
-47:1
-48:2
-49:7
-50:7
-51:23
-52:22
-53:33
-54:58
-55:157
-56:81
-57:200
-58:149
-59:214
-60:269
-61:603
-62:256
-63:636
-64:427
-65:525
-66:585
-67:1233
-68:483
-69:1127
-70:654
-71:676
-72:735
-73:1133
-74:432
-75:766
-76:418
-77:411
-78:395
-79:519
-80:184
-81:321
-82:137
-83:146
-84:138
-85:128
-86:110
-87:96
-88:36
-89:38
-90:20
-91:7
-92:11
-93:1
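The requirement asks for a probability distribution, while result.txt holds raw counts. One way to bridge the gap is to divide each count by the total, which gives the relative frequency of each value. This is a rough sketch, and `to_distribution` is a hypothetical helper name, not something from the original script.

```python
def to_distribution(count_lines):
    """Turn 'value:count' lines into (value, probability) pairs.

    Hypothetical helper for illustration; not part of the original script.
    """
    pairs = []
    for line in count_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value, count = line.rsplit(':', 1)
        pairs.append((value, int(count)))
    total = sum(count for _, count in pairs)
    if total == 0:
        return []
    # Relative frequency: each count divided by the total number of samples.
    return [(value, count / total) for value, count in pairs]


if __name__ == '__main__':
    demo = ['-47:1', '-48:2', '-49:7']  # first three lines of the demo output
    for value, p in to_distribution(demo):
        print('{}:{:.4f}'.format(value, p))
```

To run it on the full result, pass the open result.txt file object in place of `demo`.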