Original link: http://www.datastudy.cc/to/43
Let's look at how to learn a language's data structures efficiently. Today we look at Python.
A data structure is a collection of data elements that have one or more specific relationships with one another.
In the field of data analysis with Python, the most commonly used data structure,
the DataFrame
>>> np.sign(df)
>>> last_col = df.columns[-1]
>>> np.sign(df[last_col])
# head (take the first few rows) and tail (take the last few rows)
>>> df.head(2)
>>> df.tail(2)
# find a row of data by index
>>> last_col = df.index[-1]
>>> last_col
>>> df.iloc[last_col]
# select a range of rows by position
>>> df.iloc[2:9]
# iloc and iat return the same value for a single cell
>>> df.iloc[2,3]
>>> df.iat[2,3]
# logical lookup
>>> df[df > df.mean()]
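The lookups above can be tried on a small frame. This is only a minimal sketch: the frame `df` and its values are made up for illustration and do not come from the original article.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x3 frame with a default integer index.
df = pd.DataFrame(np.arange(12).reshape(4, 3) - 5, columns=["a", "b", "c"])

signs = np.sign(df)       # element-wise sign: -1, 0 or 1
first_two = df.head(2)    # first two rows
last_two = df.tail(2)     # last two rows

# iloc is purely positional; iat is its scalar-only counterpart.
by_iloc = df.iloc[2, 1]
by_iat = df.iat[2, 1]

# Logical lookup: cells that are not above their column mean become NaN.
above_mean = df[df > df.mean()]
print(above_mean)
```

For a single cell, `iat` is usually the faster choice, while `iloc` also accepts slices and lists.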
The following shares a pandas approach to selecting rows with specific index values. It should be a useful reference, and I hope it helps. Let's take a look.
As shown below:
>>> import numpy as np
>>> import pandas as pd
>>> index = np.array([2, 4, 6, 8, 10])
>>> data = np.array([3, 5, 7, 9, 11])
>>> data = pd.DataFrame({'num': data}, index=index)
>>> print(data)
    num
2     3
4     5
6     7
8     9
10   11
>>> select_index = index[index > 5]
>>> print(se
[Original] pandas in 10 minutes
This article is a simple translation of "Ten Minutes to Pandas" from the official pandas website; the original is here. It is a brief introduction to pandas; for details, refer to the Cookbook. By convention, we import the required packages in the following format:
First, create objects
See the Data Structure Intro section for details about this section's content.
1, you can create a Series, pandas by p
Function prototype: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html#pandas.DataFrame.fillna
pad/ffill: fills a missing value with the previous non-missing value
backfill/bfill: fills a missing value with the next non-missing value
None: specify a value to replace the missing values
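The three fill strategies listed above can be compared on one small Series. This is a minimal sketch with made-up values. It uses the `ffill()` and `bfill()` methods rather than `fillna(method=...)`, because recent pandas releases deprecate the `method` argument.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

forward = s.ffill()        # pad/ffill: carry the previous non-missing value forward
backward = s.bfill()       # backfill/bfill: pull the next non-missing value back
constant = s.fillna(0.0)   # no method: replace every missing value with a given value

print(pd.DataFrame({"raw": s, "ffill": forward, "bfill": backward, "const": constant}))
```

Note that `bfill` leaves the last entry missing here, because no later value exists to copy from.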
          3    6
h    7    3    7
i    8    3    8
j    9    3    9
By using .loc, we can select part of the data in the DataFrame.
df.loc['a']
rev     0
test    3
col     0
Name: a, dtype: int64
# df.loc[start label (inclusive):end label (inclusive)]
df.loc['a':'d']
   rev  test  col
a    0     3    0
b    1
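The inclusive label slice described above can be checked against a positional slice. This is a minimal sketch: the frame below is a reconstruction that assumes the columns `rev`, `test` and `col` and the labels `a` to `j`, with values guessed from the partial output.

```python
import pandas as pd

# Hypothetical frame shaped like the one in the output above.
df = pd.DataFrame(
    {"rev": range(10), "test": [3] * 10, "col": range(10)},
    index=list("abcdefghij"),
)

row_a = df.loc["a"]          # one row, returned as a Series
a_to_d = df.loc["a":"d"]     # label slice: both 'a' and 'd' are included
first_four = df.iloc[0:4]    # positional slice: the end position is excluded

print(a_to_d)
```

Both slices return the same four rows, which shows that `.loc` includes its end label while `.iloc` excludes its end position.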
http://www.cnblogs.com/batteryhp/p/5006274.html
pandas is the library of choice for the rest of this book. pandas meets the following requirements:
Data structures with automatic or explicit alignment of data by axis. This prevents many common errors caused by misaligned data and by working with differently indexed data from different sources.
Integrated time series capabilities
Data structures that can handle both time series data and non-time series data
Mathematical ope
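The automatic alignment mentioned in the list above can be seen when adding two Series whose indexes only partly overlap. This is a minimal sketch with made-up labels and values.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Addition matches values by index label, not by position.
total = s1 + s2

# Labels present in only one operand yield NaN instead of a silent mismatch.
print(total)
```

If missing labels should count as zero instead, `s1.add(s2, fill_value=0)` is one option.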
section "Intro to Data Structures". Open this page next to your Jupyter notebook. As you read the documentation, type out (rather than copy) the code and run it in the notebook. As you run the code, explore these operations and try new ways of using them. Then pick the section "Indexing and Selecting Data". Create a new Jupyter notebook, write and run the code, and then explore the different operations you have learned. T
previous data that can be used? Can it serve as a basis for prediction?
First, I'll define a function that grabs player data from Fox Sports, then grab each player's spring training or regular season batting stats.
Fox Sports link: https://www.foxsports.com/mlb/stats
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt  # needed for plt.style.use below
import requests
from bs4 import BeautifulSoup
plt.style.use('fivethirtyeight')
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
def batting_stats(url, season):
    r =
use anonymous functions
5 Column names
df.columns
df.columns = ['a','b','C','e','D','F']  # renaming
df.rename(columns={'A':'AA','B':'BB','C':'cc','D':'DD','E':'ee','F':'FF'}, inplace=True)
df.rename(columns=lambda x: x[1:].upper(), inplace=True)  # a function also works; inplace=True modifies the original object instead of returning a copy
6 Dummy variables
pd.Series(['a|b', 'a|c']).str.get_dummies()
7 Pure matrix of the DataFrame, i.e. without column labels and index
df.values
df.get_v
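The renaming and dummy-variable calls above can be run together on a toy frame. This is a minimal sketch; the column names and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "c"])

# Rename with an explicit mapping; returns a new frame unless inplace=True.
mapped = df.rename(columns={"a": "AA", "b": "BB"})

# Rename with a function applied to every column label.
upper = df.rename(columns=lambda x: x.upper())

# Split '|'-separated strings into one indicator column per token.
dummies = pd.Series(["a|b", "a|c"]).str.get_dummies()

# The underlying values as a plain NumPy array, without labels.
matrix = df.values

print(dummies)
```

Returning a new frame, rather than passing `inplace=True`, keeps the original `df` available for comparison.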
This is a short introduction to pandas, geared mainly toward new users.
Customarily, we import as follows:
In [1]: import pandas as pd
In [2]: import numpy as np
Object Creation
See the Data Structure Intro section.
Creating a Series by passing a list of values, letting pandas create a default integer index:
In [3]: s = pd.Series([1,3,5,np.nan,6,8])
In [4]: s
Out[4]:
0     1
1     3
2     5
3   NaN
4     6
5     8
dtype: float64
Creating a DataFrame by pass
Ten Minutes to Pandas
This is a short introduction to pandas, geared mainly toward new users. You can see more complex recipes in the Cookbook.
Customarily, we import as follows:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: import matplotlib.pyplot as plt
Object Creation
See the Data Structure Intro section.
Creating a Series by passing a list of values, letting pandas create a default integer index:
In [4]: s = pd.Series([1,3,5,np.nan,6,8])
In [5]: s
Out[5]:
0    1
1    3
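The Series creation shown above can be reproduced as a standalone script. This is a minimal sketch that uses the same list of values as the session.

```python
import numpy as np
import pandas as pd

# Passing a plain list lets pandas build the default integer index 0..5.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

# The single NaN forces the whole Series to a floating-point dtype.
print(s)
print(s.dtype)
```

The integers are stored as floats here because NaN has no representation in the default integer dtype.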
the data is already very clean, so the main task of data preprocessing is to discretize each attribute by clustering its values into 4 classes. This is needed because association rule algorithms cannot handle continuous data.
The key to clustering each attribute into 4 classes is finding the right dividing points. The dividing points are determined by using a clustering algorithm to find the cluster centers of each attribute, taking the average valu
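The discretization idea above can be sketched for a single attribute. Several things here are assumptions rather than the article's method: the text is cut off before it says how the dividing points are derived, so this sketch takes the midpoints between adjacent cluster centers. The helper names `kmeans_1d` and `discretize`, the plain NumPy k-means and the sample values are likewise made up for illustration.

```python
import numpy as np
import pandas as pd

def kmeans_1d(values, k, n_iter=100):
    """Plain Lloyd's algorithm on a 1-D array; returns the sorted cluster centers."""
    values = np.asarray(values, dtype=float)
    # Deterministic start: spread the initial centers over the quantiles.
    centers = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign every value to its nearest center.
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = values[labels == j]
            if members.size:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return np.sort(centers)

def discretize(values, k=4):
    """Map continuous values to class codes 0..k-1 using cluster-based cut points."""
    centers = kmeans_1d(values, k)
    # Assumption: dividing points are the midpoints of adjacent centers.
    cuts = (centers[:-1] + centers[1:]) / 2
    bins = np.concatenate(([-np.inf], cuts, [np.inf]))
    return pd.cut(values, bins=bins, labels=False), cuts

values = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 4.9, 10.0, 10.2, 9.8, 20.0, 20.5, 19.5])
codes, cuts = discretize(values, k=4)
print(codes)
print(cuts)
```

In practice a library implementation such as scikit-learn's `KMeans` would likely replace the hand-written loop; the plain NumPy version only keeps the sketch self-contained.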
', ' 110 ')
Replace
Data preprocessing
Sort the data
df.sort_values(by=['The number of messages sent by the customer on the Day'])
Sort
Data grouping (pivot tables in Excel)
Group the customer chat records
# If the customer sent more than 5 messages that day, the group column shows high; otherwise it shows low
df['group'] = np.where(df['customer sends messages on the day'] > 5, 'high', 'low')
df
Group
Grouping on multiple criteria
# Rows with broker level A1 and a broker response length > 24 are shown as 1 in the sign column
df
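The sorting and grouping steps above can be sketched on a toy chat-log summary. The column names `messages_sent`, `broker_level` and `response_length` and all values are hypothetical placeholders. The multi-criteria expression is an assumption based on the comment alone, since the original code is cut off.

```python
import numpy as np
import pandas as pd

# Hypothetical chat-log summary; column names are placeholders.
df = pd.DataFrame({
    "messages_sent": [2, 9, 6, 1],
    "broker_level": ["A1", "A2", "A1", "A1"],
    "response_length": [30, 40, 10, 25],
})

# Sort by the number of messages sent.
df = df.sort_values(by=["messages_sent"])

# Single condition: label each row high or low.
df["group"] = np.where(df["messages_sent"] > 5, "high", "low")

# Several conditions: combine boolean Series with & and parenthesize each one.
df["sign"] = np.where(
    (df["broker_level"] == "A1") & (df["response_length"] > 24), 1, 0
)

print(df)
```

The parentheses around each comparison are required because `&` binds more tightly than `==` and `>` in Python.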
The following shares the basic operations of pandas, the Python data analysis library. It should be a useful reference, and I hope it helps. Let's take a look.
What is pandas?
Is it this?
... Apparently pandas is not as cute as this guy ...
Let's take a look at how the official pandas website defines it:
pandas is an open source library providing easy-to-use data structures and data analysis tools for the Python programming language.
Obviously, pandas is a very powerful data analysis library for Pyth
Query and write operations
pandas offers powerful SQL-like query functions that are simple to use:
print(tips[['total_bill', 'tip', 'smoker', 'time']])  # show the 'total_bill', 'tip', 'smoker' and 'time' columns, similar to the SELECT command in SQL
print(tips[tips['time'] == 'Dinner'])  # show rows whose time column equals Dinner, similar to the WHERE command in SQL
print(tips[(tips['size'] >= 5) | (tips['total_bill'] > 45)])
print(tips[(tips['time'] == 'Dinner')
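The SQL-like queries above can be run on a small stand-in table. This is a minimal sketch: the `tips` frame below is made-up data with the same column names, not the real data set.

```python
import pandas as pd

# Hypothetical stand-in for the tips data set used above.
tips = pd.DataFrame({
    "total_bill": [16.99, 48.27, 21.01, 50.81],
    "tip": [1.01, 6.73, 3.50, 10.00],
    "smoker": ["No", "No", "Yes", "Yes"],
    "time": ["Dinner", "Dinner", "Lunch", "Dinner"],
    "size": [2, 4, 6, 3],
})

# Column selection, similar to SELECT.
selected = tips[["total_bill", "tip", "smoker", "time"]]

# Row filter, similar to WHERE.
dinners = tips[tips["time"] == "Dinner"]

# Two conditions joined with | (OR); each one needs its own parentheses.
big = tips[(tips["size"] >= 5) | (tips["total_bill"] > 45)]

print(big)
```

The bracket form `tips["size"]` is used because `tips.size` would return the number of cells in the frame rather than the column.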
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.25)
Get stop words
def get_custom_stopwords(stop_words_file):
    # Read one stop word per line and strip surrounding whitespace
    with open(stop_words_file, encoding="utf-8") as f:
        custom_stopwords_list = [i.strip() for i in f.readlines()]
    return custom_stopwords_list
stop_words_file = "stopwords.txt"
stopwords = get_custom_stopwords(stop_words_file)  # get the stop words
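The stop-word loader above can be exercised without the real `stopwords.txt`. This is a minimal sketch: it writes a tiny, made-up stop-word file to a temporary directory and reads it back with the same function.

```python
import os
import tempfile

def get_custom_stopwords(stop_words_file):
    """Read one stop word per line and return them as a list."""
    with open(stop_words_file, encoding="utf-8") as f:
        return [line.strip() for line in f]

# Write a tiny, made-up stop-word file so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stopwords.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("the\nof\nand\n")
    stopwords = get_custom_stopwords(path)

print(stopwords)
```

A list like this is the usual form for a vectorizer's stop-word argument, although the original snippet is cut off before it shows that step.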
Import the bag-of-words model
from sklearn.featur