The charm of dynamic visualization: data visualization with D3 and Processing, data analysis with pandas, the scientific computing package NumPy, the plotting package Matplotlib, and visualization work in the MATLAB language; MATLAB's lack of pointers and references is a big problem
D3.js Getting Started Guide
What is D3?
D3 stands for Data-Driven Documents. According to the official definition of D3: D3.js is a JavaScript library that can manipulate documents through data. D3 can v
# coding: utf-8
__author__ = 'Weekyin'
import numpy as np
import pandas as pd
dates = pd.date_range('20140729', periods=6)
# First create a time index. The index is the ID of each row of data; it uniquely identifies each row.
print(dates)
# For a quick start, let's look at how to create 6x4 data: the randn function creates random numbers, its arguments are the number of rows and columns, and dates is the index column creat
()
print(c is a)
# False (c and a point to different memory addresses)
# Copy a and assign it to c
# If it were c = a, then c and a would be the same object (pointing to the same address),
# and print(c is a) would print True
c[1, 2] = 100
print(a)
"""
[[1 2 3] [4 5 100] [7 8 9]]
"""
# Here we find that when c is modified, a is modified as well.
# c and a have different addresses but share one set of data
d = a.copy()
print(d is a)
# False
d[1, 2] = 100
# There is no change here.
print(a)
Read a TXT file:
import numpy # The first para
The encoding of discrete features falls into two cases:
1. The values of the discrete feature have no ordinal meaning, such as color: [red, blue]; in that case use one-hot encoding.
2. The values of the discrete feature have ordinal meaning, such as size: [X, XL, XXL]; in that case use a value mapping {X: 1, XL: 2, XXL: 3}.
pandas makes it convenient to one-hot encode discrete features.
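The two cases above can be sketched as follows. This is a minimal illustration, assuming a small made-up DataFrame; the column names color and size and the variable names are illustrative and do not come from the original article. It uses pd.get_dummies for the one-hot case and Series.map for the ordinal mapping.

```python
import pandas as pd

# Hypothetical sample data; the values follow the examples in the text.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["X", "XL", "XXL"],
})

# Case 2: ordered values are mapped to integers that preserve the order.
size_mapping = {"X": 1, "XL": 2, "XXL": 3}
df["size"] = df["size"].map(size_mapping)

# Case 1: unordered values get one indicator column per category.
encoded = pd.get_dummies(df, columns=["color"])

print(encoded)
```

The ordinal column stays a single numeric column, while color becomes one indicator column per category, so no artificial order is imposed on the colors.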
import numpy as np
import pandas as pd
data = pd.DataFrame(np.arange(6).reshape((3, 2)), index=pd.Index(['a', 'b', 'c'], name='state'), columns=pd.Index(['I', 'II'], name='number'))
data
number  I  II
state
a       0   1
b       2   3
c       4   5
result = data.unstack()
result
number  state
I       a        0
        b        2
        c        4
II      a        1
        b        3
        c        5
type(result) # pandas.core.series.Ser
1 concat
concat is a top-level pandas function that provides simple merging of data along different axes.
pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None,
          verify_integrity=False)
Parameter description
objs: a list of Series, DataFrame, or Panel objects
axis: the axis to concatenate along; 0 is rows, 1 is columns
join: the join method i
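A minimal sketch of the objs, axis, join, and ignore_index parameters described above, using two made-up frames (df1 and df2 are illustrative names). The join_axes parameter in the quoted signature was removed in pandas 1.0, so the sketch does not use it.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "C": [7, 8]})

# axis=0 stacks rows; join="outer" keeps the union of the columns
# (cells missing from one input become NaN).
rows_outer = pd.concat([df1, df2], axis=0, join="outer", ignore_index=True)

# join="inner" keeps only the columns shared by every input.
rows_inner = pd.concat([df1, df2], axis=0, join="inner", ignore_index=True)

# axis=1 places the frames side by side, aligned on the row index.
side_by_side = pd.concat([df1, df2], axis=1)

print(rows_outer)
print(rows_inner)
print(side_by_side)
```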
Modifying DataFrame column names in pandas
The data are as follows:
>>> import pandas as pd
>>> a = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> a
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
Method one: the brute-force method
>>> a.columns = ['a', 'b', 'c']
>>> a
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
But the disadvantage is that you
Data manipulation is very simple with the pandas library. Here is a brief example of writing data to a CSV file:
In [1]: import pandas as pd
In [2]: data = {'row1': [1, 2, 3, 'biubiu'], 'row2': [3, 1, 3, 'kaka']}
In [3]: data
Out[3]: {'row1': [1, 2, 3, 'biubiu'], 'row2': [3, 1, 3, 'kaka']}
In [4]: data_df = pd.DataFrame(data)
In [5]: data_df
Out[5]:
  row1 row2
0
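The session above is cut off before the write step. One plausible way to finish it, offered as an assumption rather than the author's exact code, is DataFrame.to_csv. The sketch writes to an in-memory buffer so that it leaves no file behind.

```python
import io
import pandas as pd

data = {"row1": [1, 2, 3, "biubiu"], "row2": [3, 1, 3, "kaka"]}
data_df = pd.DataFrame(data)

# Write to an in-memory buffer; passing a path such as "data.csv"
# (a hypothetical file name) would write to disk instead.
buffer = io.StringIO()
data_df.to_csv(buffer, index=False)  # index=False omits the row labels
csv_text = buffer.getvalue()

print(csv_text)
```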
Objective
pandas is built on NumPy and provides more advanced data structures and tools. Just as the core of NumPy is the ndarray, pandas is centered around two core data structures, Series and DataFrame, which correspond to a one-dimensional sequence and a two-dimensional table, respectively. The conventional way to import pandas is as follows:
From
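To illustrate the two structures described in this snippet, here is a minimal sketch. The original import line is cut off, so the `import pandas as pd` alias used here is an assumption (it is a widely used convention), and the sample values are made up.

```python
import pandas as pd  # common alias; assumed, since the original import line is cut off

# A Series is a one-dimensional labeled sequence.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame is a two-dimensional table; each column is a Series.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# Both are backed by NumPy arrays.
print(type(s.to_numpy()), s.ndim, df.ndim)
```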
Operating system: Windows
Python: 3.5
Welcome to join the learning exchange QQ group: 657341423
The previous section described the libraries needed for data analysis and mining, the most important of which are pandas and matplotlib.
pandas: mainly for data analysis, calculation, and statistics, such as the mean and the variance.
matplotlib: mainly used together with pandas to generate plots. The two are often used in combination
Merging and splitting arrays in NumPy and pandas
Merging
In NumPy
In NumPy, you can combine two arrays along the vertical or horizontal axis with concatenate, specifying the parameter axis=0 or axis=1.
import numpy as np
import pandas as pd
arr1 = np.ones((3, 5))
arr1
Out[5]:
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
arr2 = np.random.randn(15).reshape(arr1.shape)
arr2
Out[8]: A
Modifying DataFrame column names in pandas
When doing data mining, I wanted to change the column names of a DataFrame, so I looked it up and summarized the options as follows:
The data are as follows:
>>> import pandas as pd
>>> a = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> a
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
Method one: the brute-force method
>>> a.columns = ['a', 'b', 'c']
>>> a
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
But the disadvantage is that you must write all three names, or an error is raised.
Method two: a better method
>>> a.rename(columns={'A': 'a', '
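The rename call above is cut off, so the mapping in this sketch is illustrative rather than the author's. It shows that DataFrame.rename changes only the columns listed in the mapping, and that assigning a list of the wrong length to .columns raises an error.

```python
import pandas as pd

a = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# rename returns a new DataFrame; only the listed columns change.
b = a.rename(columns={"A": "a", "C": "c"})

# Assigning a list with the wrong number of names raises ValueError.
try:
    a.columns = ["a", "b"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(list(b.columns))
print(error_message)
```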
Workaround:
pd_data = pd.read_table(comment_file, header=None, encoding='utf-8', engine='python')
Analysis from the official documentation:
engine : {'c', 'python'}, optional
Parser engine to use. The C engine is faster while the Python engine is currently more feature-complete.
iterator : boolean, default False
Return TextFileReader object for iteration or getting chunks with get_chunk().
Or read the data in chunks:
pd_data = pd.read_table(comme
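A minimal sketch of both options quoted above, reading from an in-memory string instead of the article's comment_file (the sample text and variable names here are made up). It first parses everything with engine='python', then uses iterator=True and get_chunk to read a few rows at a time.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample standing in for the real file.
text = "1\tgood\n2\tbad\n3\tfine\n"

# Parse the whole input with the Python engine.
df = pd.read_table(io.StringIO(text), header=None, engine="python")

# With iterator=True, read_table returns a TextFileReader;
# get_chunk(n) then reads up to n rows at a time.
reader = pd.read_table(io.StringIO(text), header=None,
                       engine="python", iterator=True)
first = reader.get_chunk(2)
second = reader.get_chunk(1)

print(df.shape, len(first), len(second))
```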
A few tips that I find especially useful.
df is a DataFrame
se is a Series
1. After importing data, you often need to see what the data looks like. Use the .head(n) function, which displays the first n rows of data.
df.head(5)
se.head(5)
2. To find out how many columns df has and what they are, use df.columns
3. To find out how many distinct elements a column of df or se contains, use the .value_counts() function
df['mm'].value_counts()
se.value_counts()
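The three tips can be tried on a small made-up frame. In this sketch the column name mm comes from the text, while the nn column and the sample values are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"mm": ["x", "y", "x", "x", "z", "y"], "nn": range(6)})
se = df["mm"]

first_rows = df.head(5)          # tip 1: first five rows
column_names = list(df.columns)  # tip 2: the column labels
counts = se.value_counts()       # tip 3: occurrences of each distinct value

print(first_rows)
print(column_names)
print(counts)
```

The length of counts is the number of distinct values, and its entries are sorted from most to least frequent.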
Installing pandas
In a CMD window, enter:
pip install pandas
V. Testing
1. Now neither Python interactive mode nor the PyCharm editor reports an error.
2. Install Jupyter with pip:
pip install jupyter
3. Open the notebook from the cmd command line:
# cmd command
jupyter notebook
4. Open a Jupyter notebook, click File > New, and select Python version 2. Enter the following code, then click Cell > Run All to execute it:
# coding: utf-8
import matplotlib.pyplot as plt
import numpy as np
X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
(C,S) =
Data transformation refers to filtering, cleaning, and other transformation operations on the data.
Removing duplicate data
Duplicate rows often appear in a DataFrame. DataFrame provides a duplicated() method to detect whether rows are duplicated, and a drop_duplicates() method to discard duplicate rows:
By default, duplicated() and drop_duplicates() consider all columns. If you do not want that, pass a collection of column names as an argument to restrict the check to those columns, for example:
Dupl
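A minimal sketch of the two methods, on a made-up frame (the column names k1 and k2 are illustrative). It shows the default behavior, which compares all columns, and the restricted check on a subset of columns.

```python
import pandas as pd

df = pd.DataFrame({"k1": ["one", "one", "two", "two"], "k2": [1, 1, 2, 3]})

# Default: a row is a duplicate only if every column matches an earlier row.
mask_all = df.duplicated()
unique_all = df.drop_duplicates()

# Restricted to k1: a row is a duplicate if its k1 value was seen before.
mask_k1 = df.duplicated(subset=["k1"])
unique_k1 = df.drop_duplicates(subset=["k1"])

print(mask_all.tolist())
print(mask_k1.tolist())
```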
Subtract the minimum from each column
The apply function takes a sequence as input
Use value_counts() to view the number of occurrences of each element, and .mode() to view the most frequently occurring element
Create a random sequence first
Call value_counts()
Call .mode() to see the most frequently occurring elements
Data merging
Create a 10*4 array first
(1) Call the concat() function to merge the arrays (concat accepts a list, which holds the arrays to be merged)
Check whether the merged array equals the original array
Or
(2)
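The steps listed above can be sketched as follows. The original code is not shown, so every name and value here is an assumption; in particular, the 10*4 "array" is modeled as a DataFrame, and a fixed sequence replaces the random one so the counts are predictable.

```python
import numpy as np
import pandas as pd

# Subtract the minimum from each column: apply passes each column as a Series.
df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["a", "b", "c"])
shifted = df.apply(lambda col: col - col.min())

# Count occurrences and find the most frequent element.
se = pd.Series([1, 3, 3, 2, 3, 1])
counts = se.value_counts()
most_common = se.mode()

# Split a 10x4 table into pieces, merge them with concat, and compare.
table = pd.DataFrame(np.random.randn(10, 4))
merged = pd.concat([table[:3], table[3:7], table[7:]])
same = merged.equals(table)

print(shifted)
print(counts)
print(most_common.tolist(), same)
```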
Exploring student alcohol consumption
For the data, see GitHub
Step 1 - Import the necessary libraries
import pandas as pd
import numpy as np
Step 2 - Data set
"./data/student-mat.csv"
Step 3 - Name the data student
student = pd.read_csv(path4)
student.head()
Output:
Step 4 - Slice the data from 'school' to 'guardian'
"school":"guardian"]
stud_alcoh.head()
Output:
Step 5 - Create a lambda function that capitalizes a string
lambda x: x.upper()
Step 6 - Capitalize the 'Fjo
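Steps 4 and 5 can be sketched on a small stand-in frame. The slicing expression in the original is cut off, so the .loc label slice below is an assumption about how such a slice might be written. Only the names school, guardian, student, and stud_alcoh come from the text; the other columns, the values, and the to_upper name are made up.

```python
import pandas as pd

# Hypothetical stand-in for the CSV file.
student = pd.DataFrame({
    "school": ["GP", "MS"],
    "age": [18, 17],
    "guardian": ["mother", "father"],
    "absences": [6, 4],
})

# Label-based slicing with .loc includes both endpoint columns.
stud_alcoh = student.loc[:, "school":"guardian"]

# A lambda that uppercases a string, applied to one text column.
to_upper = lambda x: x.upper()
stud_alcoh = stud_alcoh.assign(guardian=stud_alcoh["guardian"].apply(to_upper))

print(stud_alcoh)
```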