Using Python for data analysis (10) pandas basics: handling missing data
Incomplete data is common in data analysis. pandas uses the floating-point value NaN to represent missing data in both floating-point and non-floating-point arrays, and provides the isnull() and notnull() functions to detect missing values. The general processing method fo
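As a minimal sketch of the detection step described above, the following uses isnull() and notnull() on two small Series. The sample values and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one float Series, one string Series
floats = pd.Series([1.0, np.nan, 3.5])
strings = pd.Series(["a", None, "c"])

# isnull() marks missing entries, notnull() marks present ones
float_missing = floats.isnull()
string_present = strings.notnull()

print(float_missing.tolist())
print(string_present.tolist())
```

Both calls return a boolean Series, so the result can be used directly as a filter, for example floats[floats.notnull()].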
[[1, 3]] -> combine columns 1 and 3 and parse them as a single date column
dict, e.g. {'foo': [1, 3]} -> combine columns 1 and 3 and name the merged column 'foo'.
Example:
df = pd.read_csv(file_path, parse_dates=['time1', 'time2']) parses the time1 and time2 columns as dates.
Unfortunately, dates written with Chinese characters are not supported. For example, the format 'August 1' (in its original Chinese form) cannot be parsed.
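The parse_dates example above can be tried without a file on disk. This is a small sketch that assumes an in-memory CSV with made-up rows; only the column names time1 and time2 come from the text.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for file_path
csv_text = (
    "time1,time2,value\n"
    "2017-08-01,2017-08-02 10:30:00,1\n"
    "2017-08-03,2017-08-04 12:00:00,2\n"
)

# parse_dates takes a list of column names to convert to datetimes
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time1", "time2"])

print(df.dtypes)
```

After parsing, both columns support the .dt accessor, for example df["time2"].dt.hour.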
infer_datetime_format:
boolean, default False. If it is set to True and parse_dates is enabled,
pip install pandas
pip install xlrd
When there are many records, sorting them in Excel is laborious and the program can stop responding; pandas handles this well.
# We'll use data structures and data analysis tools provided in the pandas library
import pandas as pd
# Import retail sales data from an Excel workbook into a data frame
# path = '/documents/analysis/python/examples/2015sales.xlsx'
path = 'f:/pyt
Original link: http://www.datastudy.cc/to/27
When working with a pandas DataFrame, you often need to process strings, for example to determine whether a column contains certain keywords or whether its values are shorter than 3 characters. This is much easier once you know the methods built into a column's str accessor.
Let's take a detailed look at the str methods of the Series class.
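The two checks mentioned above (keyword match and length below 3) might look like this with the str accessor. The column name and values are hypothetical.

```python
import pandas as pd

# Hypothetical column of strings
df = pd.DataFrame({"title": ["pandas basics", "numpy", "io", "pandas tips"]})

# True where the value contains the keyword
has_keyword = df["title"].str.contains("pandas")

# True where the value has fewer than 3 characters
is_short = df["title"].str.len() < 3

print(df[has_keyword])
print(df[is_short])
```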
1,
Original English: 09-lesson
Export data from Microsoft's SQL database to CSV, Excel, or text files.
# Import libraries
import pandas as pd
import sys
from sqlalchemy import create_engine, MetaData, Table, select
print('Python version ' + sys.version)
print('Pandas version ' + pd.__version__)
Python version 3.6.1 | packaged by conda-forge | (default, Mar 2017, 21:57:00)
[GCC 4.2.1 compatible Apple LLVM
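A rough sketch of the export idea follows. It assumes an in-memory SQLite database as a stand-in for Microsoft SQL Server (so it runs without a server or SQLAlchemy), and the table name and rows are invented. The export calls are the same once the data is in a DataFrame.

```python
import io
import sqlite3

import pandas as pd

# Stand-in database: SQLite in memory, NOT Microsoft SQL Server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
conn.commit()

# Pull the query result into a DataFrame
df = pd.read_sql("SELECT id, amount FROM sales ORDER BY id", conn)
conn.close()

# Export as CSV (a file path could be passed instead of a buffer)
csv_buffer = io.StringIO()
df.to_csv(csv_buffer, index=False)
csv_text = csv_buffer.getvalue()

# Export as tab-delimited text
txt_buffer = io.StringIO()
df.to_csv(txt_buffer, sep="\t", index=False)
tab_text = txt_buffer.getvalue()

print(csv_text)
```

For Excel output, df.to_excel(path) is the usual counterpart, but it needs an extra writer package such as openpyxl, so it is left out of this sketch.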
Organize Pandas Operations
This article is original; if you reproduce it, please credit the source: http://www.cnblogs.com/xiaoxuebiye/p/7223774.html
Import Data:
pd.read_csv(filename): Import data from a CSV file
pd.read_table(filename): Import data from a delimited text file
pd.read_excel(filename): Import data from an Excel file
pd.read_sql(query, connection_object): Import data from a SQL table or database
pd.read_json(json_string): Import data fro
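Three of the import functions listed above can be tried on in-memory text, as sketched below. The sample records are made up, and io.StringIO is used because recent pandas versions prefer a file-like object over a raw JSON string.

```python
import io

import pandas as pd

# Hypothetical sample records in three formats
csv_df = pd.read_csv(io.StringIO("name,score\nann,90\nbob,85\n"))
tab_df = pd.read_table(io.StringIO("name\tscore\nann\t90\nbob\t85\n"))
json_df = pd.read_json(
    io.StringIO('[{"name": "ann", "score": 90}, {"name": "bob", "score": 85}]')
)

print(csv_df)
print(tab_df)
print(json_df)
```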
Excel has a SKEW() function for computing skewness, but it is unclear how to iterate over a large amount of data with Excel. Try solving it with Python instead.
This was my first time learning Python. I did not expect to get past the pain of installing the various packages, but the implementation actually worked.
python3.3:
# this is a test case
# -*- coding: gbk -*-
print("Hello python! Chinese")
# env config
import xlrd
import os
import xlwt3
import numpy
import pandas
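For the skewness calculation itself, one possible approach is Series.skew(). The sketch below uses made-up values and compares the result with the bias-corrected sample formula n / ((n-1)(n-2)) * sum(((x - mean) / s)^3). That this is also the formula behind Excel's SKEW() is an assumption worth checking against your own workbook.

```python
import pandas as pd

# Hypothetical sample values with a long right tail
values = pd.Series([2.0, 3.0, 3.0, 4.0, 10.0])

# pandas' built-in sample skewness
skewness = values.skew()

# The same quantity written out by hand for comparison
n = len(values)
deviations = values - values.mean()
s = values.std()  # sample standard deviation (ddof=1)
manual = n / ((n - 1) * (n - 2)) * ((deviations / s) ** 3).sum()

# Symmetric data should have skewness close to zero
symmetric_skew = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0]).skew()

print(skewness, manual, symmetric_skew)
```

On a DataFrame, df.skew() applies the same calculation to every numeric column at once, which avoids looping over columns by hand.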
Original English: 08-lesson
How to pull data from a Microsoft SQL database.
# Import libraries
import pandas as pd
import sys
from sqlalchemy import create_engine, MetaData, Table, select, engine
print('Python version ' + sys.version)
print('Pandas version ' + pd.__version__)
Python version 3.6.1 | packaged by conda-forge | (default, Mar 23 2017, 21:57:00) [GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-60
Original: Chapter 9
import pandas as pd
import sqlite3
So far, we have only read data from CSV files. That is a common way to store data, but there are many others. pandas can read data from HTML, JSON, SQL, Excel (!!!), HDF5, Stata, and other sources. In this chapter, we will discuss reading data from a SQL database.
You can use the pd.read_sql function to read data from an SQL da
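A minimal sketch of pd.read_sql with sqlite3 follows, matching the imports at the top of this chapter. The weather table, its columns, and the rows are hypothetical, and an in-memory database is used so the example runs on its own.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database with a small table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (station TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [("north", -2.5), ("south", 12.0), ("east", 4.5)],
)
conn.commit()

# read_sql takes a query and a connection; index_col picks the index column
df = pd.read_sql(
    "SELECT station, temp FROM weather WHERE temp > 0 ORDER BY temp",
    conn,
    index_col="station",
)
conn.close()

print(df)
```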
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
'''If you copy this code and get the error "SyntaxError: invalid character in identifier", the copied code contains Chinese (full-width) spaces or symbols.'''
data = pd.DataFrame(np.arange(6).reshape((3, 2)), index=pd.Index(['A', 'B', 'C'], name='state'), columns=pd.Index(['I', 'II'], name='number'))
print(data)
'''
number  I  II
state
A       0   1
B       2   3
C       4   5
'''
Pandas data visualization (III): time resampling
Python+pandas: generating specified dates and resampling - CSDN blog https://blog.csdn.net/LY_ysys629/article/details/73823803
Pandas resample method - CSDN blog https://blog.csdn.net/wangshuang1631/article/details/52314944
——————————————————————————————————————————————————
Time Series Conversions:
c=pd.Seri
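Since the original example is cut off, here is a separate small sketch of resampling a time series, using made-up daily values. It is not a reconstruction of the truncated code above.

```python
import pandas as pd

# Hypothetical daily series: values 0..5 on six consecutive days
index = pd.date_range("2018-01-01", periods=6, freq="D")
series = pd.Series(range(6), index=index)

# Downsample to 2-day bins and sum the values in each bin
resampled = series.resample("2D").sum()

print(resampled)
```

Other aggregations such as mean() or max() can replace sum() on the object returned by resample().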
', df['v1'])  # 2 is the insert position, v6 is the column name, and df['v1'] is the inserted value
print('insert column:')
print(df, '\n')
print(' * 50)
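The fragment above appears to describe DataFrame.insert. A self-contained sketch of that call is given below, with a hypothetical DataFrame; the names v1 and v6 follow the comment in the fragment.

```python
import pandas as pd

# Hypothetical starting frame
df = pd.DataFrame({"v1": [1, 2], "v2": [3, 4], "v3": [5, 6]})

# insert(position, column_name, values) modifies df in place
df.insert(2, "v6", df["v1"])

print("insert column:")
print(df)
```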
4. General selection methods:
Operation                          Syntax          Result
Select a column                    df[col]         Series
Select a row by label              df.loc[label]   Series
Select a row by integer position   df.iloc[2]      Series
Slice rows                         df[5:10]        DataFrame
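The selection methods in this section can be checked on a small frame. This sketch assumes a made-up DataFrame with letter labels as the index.

```python
import pandas as pd

# Hypothetical frame with 12 rows labelled 'a'..'l'
df = pd.DataFrame(
    {"x": range(12), "y": range(12, 24)},
    index=list("abcdefghijkl"),
)

column = df["x"]              # select a column -> Series
row_by_label = df.loc["c"]    # select a row by label -> Series
row_by_position = df.iloc[2]  # select a row by integer position -> Series
row_slice = df[5:10]          # slice rows -> DataFrame

print(type(column).__name__, type(row_slice).__name__)
```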
Modifying DataFrame column names in pandas
When doing data mining, I wanted to change the column names of a DataFrame, so I looked it up and summarized the options as follows.
The data are as follows:
>>> import pandas as pd
>>> a = pd.DataFrame({'A': [1,2,3], 'B': [4,5,6], 'C': [7,8,9]})
>>> a
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
Method one: brute force
>>> a.columns = ['a', 'b', 'c']
>>> a
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
The drawback is that you must write out all three names, otherwise an error is raised.
Method two: a better method
>>> a.rename(columns={'A': 'a', '
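Both methods can be compared in a short script. This is a sketch built on the same sample frame; the partial rename of only column A is an illustrative choice, not the truncated original.

```python
import pandas as pd

a = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Method one needs one name per column; too few names raises ValueError
try:
    a.columns = ["a", "b"]
    length_error = None
except ValueError as exc:
    length_error = exc

# Method two renames only the listed columns and returns a new frame
renamed = a.rename(columns={"A": "a"})

print(length_error)
print(renamed.columns.tolist())
```

rename leaves the original frame untouched unless inplace=True is passed.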
Workaround:
pd_data = pd.read_table(comment_file, header=None, encoding='utf-8', engine='python')
From the official documentation:
engine : {'c', 'python'}, optional
Parser engine to use. The C engine is faster while the python engine is currently more feature-complete.
1,
iterator : boolean, default False
Return TextFileReader object for iteration or getting chunks with get_chunk().
or get
from Chunk
pd_data = pd.read_table(comme
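The engine and iterator options quoted above can be combined as in the sketch below. It assumes tab-delimited text held in memory in place of comment_file, with invented rows.

```python
import io

import pandas as pd

# Hypothetical tab-delimited content standing in for comment_file
text = "good\t5\nbad\t1\nok\t3\n"

# iterator=True returns a TextFileReader instead of a DataFrame
reader = pd.read_table(
    io.StringIO(text), header=None, engine="python", iterator=True
)

first = reader.get_chunk(2)  # first two rows
rest = reader.get_chunk()    # whatever remains
reader.close()

print(first)
print(rest)
```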
A few tips that I find useful.
df is a DataFrame
se is a Series
1. After importing data, you often need to see what it looks like. Use the .head(n) function, which displays the first n rows of data.
df.head(5)
se.head(5)
2. To find out how many columns df has and what they are, use df.columns
3. To find out how many different elements are in a column of df or in se, use the .value_counts() function
df['mm'].value_counts()
se.value_counts()
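The three tips can be run together on a small example. The frame below is made up; only the column name mm comes from the text.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame(
    {"mm": ["a", "b", "a", "c", "a", "b"], "nn": [1, 2, 3, 4, 5, 6]}
)
se = df["nn"]

top = df.head(5)                  # first 5 rows
column_names = list(df.columns)   # column labels
counts = df["mm"].value_counts()  # occurrences of each distinct value

print(top)
print(column_names)
print(counts)
```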
Installation of pandas
In a CMD window, enter:
pip install pandas
V. Testing
1. Neither the Python interactive mode nor the PyCharm editor now reports an error.
2. Install Jupyter with pip
pip install jupyter
3. Open the notebook from cmd
# cmd command
jupyter notebook
4. Open a Jupyter notebook: click File > New and select the Python 2 version, enter the following code, then click Cell > Run All to execute it.
# coding: utf-8
import matplotlib.pyplot as plt
import numpy as np
X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
(C, S) =
Data conversion refers to filtering, cleaning, and other transformation operations on the data.
Remove duplicate data
Duplicate rows often appear in a DataFrame. DataFrame provides a duplicated() method to detect whether rows are duplicated, and a drop_duplicates() method to discard duplicate rows.
By default, duplicated() and drop_duplicates() consider all columns. If you do not want that, pass a collection of columns as a parameter to restrict the check to those columns, for example:
Dupl
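A small sketch of both methods, including the column-restricted form, is shown here. The frame and the column names k1 and k2 are hypothetical.

```python
import pandas as pd

# Hypothetical frame where row 1 repeats row 0
df = pd.DataFrame(
    {"k1": ["one", "one", "two", "two"], "k2": [1, 1, 2, 3]}
)

dup_mask = df.duplicated()               # compares all columns
deduped = df.drop_duplicates()           # drops fully duplicated rows
deduped_k1 = df.drop_duplicates(["k1"])  # compares only column k1

print(dup_mask.tolist())
print(deduped)
print(deduped_k1)
```

By default the first occurrence is kept; the keep parameter changes that behaviour.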