degree of flattening of the data distribution graph)
>>> df.kurt()
# Generate a DataFrame from a Python dictionary
>>> df = pd.DataFrame({'Weather': ['Cold', 'Hot'], 'Food': ['Soup', 'Ice Cream']})
>>> df
        Food Weather
0       Soup    Cold
1  Ice Cream     Hot
# Group by an attribute
>>> group = df.groupby('Weather')
>>> for name, gro in group:
...     print(name)
...     print(gro)
...
Cold
   Food Weath
some coincidences, so that the content you really want is not extracted, while other content that merely looks like the pattern is. Therefore, first take out the key blocks, and then take out the specific information from them.
import re

re_books = re.compile('
Check the source code of the webpage, find the matching rule that retrieves the main information, and obtain all of the intermediate content. What remains is to extract each item of information for every book with regular expressions. This is to observe their r
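The two-step approach described above (key block first, details second) can be sketched roughly as follows. The HTML fragment, the class names, and the patterns are all hypothetical, since the real page structure is not shown in the text.

```python
import re

# Hypothetical HTML fragment; the real page structure is an assumption.
html = """
<div class="nav"><a href="/home">Home</a></div>
<ul class="books">
  <li><a href="/book/1">Learning Python</a><span class="price">39.90</span></li>
  <li><a href="/book/2">Data Analysis</a><span class="price">45.00</span></li>
</ul>
"""

# Step 1: take out the key block, so look-alike content elsewhere is ignored.
re_block = re.compile(r'<ul class="books">(.*?)</ul>', re.S)
block = re_block.search(html).group(1)

# Step 2: extract each item of information from inside that block only.
re_book = re.compile(
    r'<a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>'
    r'<span class="price">(?P<price>[\d.]+)</span>'
)
books = [m.groupdict() for m in re_book.finditer(block)]

for book in books:
    print(book["title"], book["url"], book["price"])
```

Restricting the second pattern to the extracted block is what keeps navigation links and other similar markup out of the results.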
1. A pandas technique
apply() and applymap() are methods of the DataFrame type, and map() is a method of the Series type. apply() acts on a whole column or row of a DataFrame at a time. applymap() is element-wise and is applied to every value in the DataFrame. map() is also element-wise, calling a function once for each value in a Series. 2.PC
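A minimal sketch of the three calls on a small made-up DataFrame (the column names and values are assumptions). Note that recent pandas releases rename DataFrame.applymap to DataFrame.map, so the sketch picks whichever one is available.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# apply: the function receives a whole column (default) or a whole row (axis=1).
col_range = df.apply(lambda col: col.max() - col.min())
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Element-wise over a DataFrame: DataFrame.map in newer pandas, applymap in older.
elementwise = getattr(df, "map", None) or df.applymap
doubled = elementwise(lambda x: x * 2)

# map: element-wise over a single Series.
labels = df["a"].map(lambda x: "odd" if x % 2 else "even")

print(col_range, row_sum, doubled, labels, sep="\n")
```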
Original English: 04-lesson
In this lesson, we will return to some basic concepts. We'll use a smaller dataset so you can easily understand the concepts I'm trying to explain. We will add columns, delete columns, and slice the data in different ways. Enjoy!
# Import required libraries
import pandas as pd
import sys
print('Python version ' + sys.version)
print('Pandas version: ' + pd
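The operations this lesson names (adding columns, deleting columns, slicing) might look like the sketch below. The dataset and the column names are made up for illustration and are not taken from the lesson itself.

```python
import pandas as pd

# Small hypothetical dataset.
df = pd.DataFrame({"Rev": [0, 1, 2, 3, 4]}, index=["a", "b", "c", "d", "e"])

# Add columns: a constant column and one derived from an existing column.
df["NewCol"] = 5
df["Doubled"] = df["Rev"] * 2

# Delete a column.
del df["NewCol"]

# Slice the data in different ways.
first_two = df.iloc[0:2]   # by position; the end position is excluded
b_to_d = df.loc["b":"d"]   # by label; both end labels are included
rev_only = df["Rev"]       # a single column, returned as a Series

print(first_two, b_to_d, rev_only, sep="\n")
```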
A KNN algorithm for recognizing handwritten digits is written, as shown below. Reference link: http://blog.csdn.net/april_newnew/article/details/44176059.
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import os

def readtxt(filename):
    text = []
    f = open(filename, 'r', encoding='utf-8')
    for line in f.readlines():
        text.append(line)
    txt = list(text)
    txt = np.array(txt, dtype='float')
    txt = txt.tolist()
    return txt

def ReadData(rootfile):
    data = []
    label = []
    for root, dirs, files in os.walk(rootfile):
        for
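Since the listing above is cut off before the classifier itself, here is a rough, generic sketch of the KNN idea. It does not reproduce the linked article's code: Euclidean distance, majority voting, the function name knn_predict, and the toy data are all assumptions.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, sample, k=3):
    """Classify `sample` by majority vote among its k nearest training rows."""
    train_x = np.asarray(train_x, dtype=float)
    sample = np.asarray(sample, dtype=float)
    # Euclidean distance from the sample to every training row.
    distances = np.sqrt(((train_x - sample) ** 2).sum(axis=1))
    # Indices of the k closest rows.
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data standing in for flattened digit images.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["zero", "zero", "zero", "one", "one", "one"]

pred_a = knn_predict(train_x, train_y, [0.5, 0.5])
pred_b = knn_predict(train_x, train_y, [5.5, 5.5])
print(pred_a, pred_b)
```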
Python functions (1) Another way to define a DataFrame is to pass the data content (a multidimensional array) directly as data, and then define columns and index. (DataFrame.columns holds the column names and .index holds the row names; the type returned is similar to a tuple, so elements can be taken out directly with [0], [1], ...)
df = pd.DataFrame(data=[[34, 'null', 'Mark'], [[a], 'null', 'Mark'], [", 'null',
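A minimal sketch of this construction, with made-up values, column names, and row names (the original data is garbled, so none of these come from the source):

```python
import pandas as pd

# Hypothetical values; the row and column names are assumptions.
df = pd.DataFrame(
    data=[[34, "null", "mark"], [22, "null", "anna"], [28, "null", "lee"]],
    columns=["id", "temp", "name"],
    index=["r1", "r2", "r3"],
)

# columns and index can be read by position, much like a tuple.
first_col = df.columns[0]
second_row = df.index[1]
print(first_col, second_row)
```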
. As the name implies, a Series is a sequence, similar to a one-dimensional array; a DataFrame is the equivalent of a two-dimensional table, similar to a two-dimensional array, with each column being a Series. To locate the elements in a Series, pandas provides the Index object: every element has a corresponding index label that marks it. The labels are not necessarily numbers; they can also be letters, Chinese characters, and so on, similar to a primary key in SQL. Similarly, the DataFrame is a combination
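A short sketch of these ideas, using made-up labels and values: a Series with non-numeric index labels, and a DataFrame whose columns are each a Series sharing that index.

```python
import pandas as pd

# Index labels need not be numbers; letters or Chinese text work too.
s = pd.Series([90, 85, 70], index=["math", "physics", "语文"])
physics = s["physics"]
chinese = s["语文"]

# Each column of a DataFrame is itself a Series sharing the same index.
df = pd.DataFrame({"score": s, "passed": s >= 80})
col = df["score"]

print(physics, chinese)
print(df)
```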
This is a short introduction to pandas, geared mainly for new users.
Customarily, we import as follows:
In [1]: import pandas as pd
In [2]: import numpy as np
Object Creation
See the Data Structure Intro section.
Creating a Series by passing a list of values, letting pandas create a default integer index:
In [3]: s = pd.Series([1,3,5,np.nan,6,8])
In [4]: s
Out[4]:
0     1
1     3
2     5
3   NaN
4     6
5     8
Ten Minutes to Pandas
This is a short introduction to pandas, geared mainly for new users. You can see more complex recipes in the Cookbook.
Customarily, we import as follows:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: import matplotlib.pyplot as plt
Object Creation
See the Data Structure Intro section.
Creating a Series by passing a list of values, letting pandas create a default integer index:
In [4]: s =
Only the plots commonly used in data analysis are covered here; complex graphics are outside the scope of this discussion. A few kinds of plots are enough to meet the needs of the data analysis process. For report materials or other high-quality graphics, a separate piece on the basic use of ggplot2 will be written later. Python's main drawing tool is matplotlib, which is not complex and is simple to use.
There are two ways to draw with matplotlib: 1. matplotlib drawing, specifying parameter
)  # Randomly shuffle the rows and columns; each run gives a different result, so you can set the seed
n = 0.8
train = data[:int(n * len(data)), :]
test = data[int(n * len(data)):, :]

# Modeling data preparation
# k = 30
m = 100
record = pd.DataFrame(columns=['Acurrary_train', 'acurrary_test'])
for k in range(1, m + 1):
    # k feature expansion multiples, eigenvalues of 0-1, eac
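The seeded shuffle followed by an 80/20 split might be sketched as below. The placeholder array and the use of numpy's default_rng are assumptions; the original shuffling call is cut off in the listing.

```python
import numpy as np

# Fixed seed, so the shuffle gives the same result on every run.
rng = np.random.default_rng(0)

# Placeholder data: 10 samples with 5 features each.
data = np.arange(50).reshape(10, 5)

# permutation shuffles along the first axis, i.e. it reorders the rows.
shuffled = rng.permutation(data)

n = 0.8
split = int(n * len(shuffled))
train = shuffled[:split, :]
test = shuffled[split:, :]

print(train.shape, test.shape)
```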
1. In a pandas DataFrame, we often need to select the rows that meet a specified condition on some column; the isin method is particularly effective for this.
import pandas as pd
df = pd.DataFrame([[1,2,3],[1,3,4],[2,4,3]], index=['one','two','three'], columns=['A','B','C'])
print(df)
#
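Using the same small DataFrame, row selection with isin could be sketched like this. Which values to filter on is an assumption, since the original example is cut off.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [1, 3, 4], [2, 4, 3]],
    index=["one", "two", "three"],
    columns=["A", "B", "C"],
)

# Boolean mask: True where column A holds one of the listed values.
mask = df["A"].isin([1])
selected = df[mask]

# Negate the mask with ~ to keep the rows that are NOT in the list.
excluded = df[~df["B"].isin([2, 3])]

print(selected)
print(excluded)
```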
the writerows function. Read the CSV file as a DataFrame
Code
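# Read the CSV file as a DataFrame
import pandas as pd
# DataFrame.from_csv was removed in pandas 1.0; this read_csv call is its equivalent
dframe = pd.read_csv('E:/iris.csv', index_col=0, parse_dates=True)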
It can also be done in a more roundabout way:
import csv
import pandas as pd
with open('E:/iris.csv') as csvfile:
    reader = csv.DictReader(csvfile, fieldnames=None)  # fieldnames is None by default. If the csv
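A self-contained sketch of the roundabout route: read the rows with csv.DictReader, then build the DataFrame from them. An in-memory string with made-up columns stands in for the iris file. When fieldnames is left as None, DictReader takes the field names from the first row.

```python
import csv
import io

import pandas as pd

# In-memory stand-in for the CSV file; the columns are hypothetical.
csv_text = "sepal_length,species\n5.1,setosa\n7.0,versicolor\n"

with io.StringIO(csv_text) as csvfile:
    reader = csv.DictReader(csvfile)  # field names come from the first row
    rows = list(reader)

dframe = pd.DataFrame(rows)

# DictReader yields strings, so numeric columns need an explicit conversion.
dframe["sepal_length"] = dframe["sepal_length"].astype(float)
print(dframe)
```

The explicit conversion is the main cost of this route; pd.read_csv infers numeric types on its own.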
import numpy as np
import pandas as pd
Stack
Rotate the column index into the row index, producing a hierarchical index.
In the following example, first create a 5x2 DataFrame.
Then call stack on it, so that the original row index becomes the outer index and the original column index becomes the inner index.
df_obj = pd.DataFram
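A minimal sketch of the stack operation on a 5x2 DataFrame. The values and the column names data1 and data2 are assumptions, since the original constructor call is cut off.

```python
import numpy as np
import pandas as pd

# 5x2 DataFrame with placeholder values.
df_obj = pd.DataFrame(np.arange(10).reshape(5, 2), columns=["data1", "data2"])

# stack moves the column labels into an inner level of the row index.
stacked = df_obj.stack()

# unstack reverses the operation.
restored = stacked.unstack()

print(stacked)
print(restored)
```

The stacked result is a Series with a two-level index: the outer level is the original row label and the inner level is the original column label.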
Path to Mathematics - Python Data Processing (2)
Insert column
# -*- coding: utf-8 -*-
"""
Created on Mon Mar 09 11:21:02 2015
@author: myhaspl@myhaspl.com
"""
print(u"python data analysis\n")
import pandas as pd
import numpy as np
# Constructing product sales data
mydf = pd.DataFrame({u'item region Code': [,], u'item a': np.random.randi
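Inserting a column could be sketched as follows. The column names and the random sales figures are hypothetical, because the original data is cut off.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical product sales data.
mydf = pd.DataFrame({
    "region_code": [1, 2, 3],
    "item_a": rng.integers(0, 100, size=3),
})

# Assigning to a new name appends the column at the end.
mydf["item_b"] = rng.integers(0, 100, size=3)

# DataFrame.insert places the new column at a given position.
mydf.insert(1, "total", mydf["item_a"] + mydf["item_b"])

print(mydf)
```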