Let's create a DataFrame by hand. [Python]
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(0, 60, 2).reshape(10, 3), columns=list('abc'))
df looks like this. So what are the three ways to pick out data? First, when each column already has a column name, df['a'] selects a whole column of data.
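As a minimal sketch of that first method, reusing the df built above (the print calls are just for illustration):

# Select one whole column by its name; the result is a pandas Series
col_a = df['a']
print(col_a)

# The same column is also reachable as an attribute
print(df.a)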
I believe many people, like me, have been quite confused by data selection and modification in pandas while learning Python (perhaps influenced by MATLAB habits)...
Only today have I finally figured it out completely.
Let's start by creating a DataFrame manually.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(0, 60, 2).reshape(10, 3), columns=list('abc'))
df looks like this. So what are the three ways to select data?
df.head(n): browse the first n rows of data. The default is 5 rows.
df.sample(n): randomly sample n rows of data. The default is 1 row.
df.shape: the number of rows and columns, as a tuple.
df.describe(): compute summary statistics (count, mean, std, min, quartiles, max) describing the data.
df.info(): memory usage and data types.
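A quick sketch of these browsing helpers applied to the df created above:

print(df.head())       # first 5 rows by default
print(df.head(3))      # first 3 rows
print(df.sample(2))    # 2 randomly chosen rows
print(df.shape)        # (10, 3): rows and columns as a tuple
print(df.describe())   # count, mean, std, min, quartiles, max for each column
df.info()              # column dtypes, non-null counts and memory usage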
3. It is easy to add new columns to a DataFrame.
However, an integer-based axis supports label-based indexing only, not position-based indexing; in such cases, using .iloc or .loc is usually more explicit. The .loc, .iloc, .ix, and [] indexers can all accept a callable object as an indexer (note that .ix is deprecated in recent pandas versions). The pandas documentation then lists the kinds of labels that can be used to get values from a multi-axis object, using .loc as an example (the same applies to .iloc).
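As a hedged sketch on the df above (the specific examples are illustrative, not taken from the documentation excerpt):

print(df.loc[0, 'a'])                 # a single label for each axis
print(df.loc[0:3, 'a':'b'])           # label slices (both end points included)
print(df.loc[df['a'] > 10, 'b'])      # a boolean array plus a column label

# .loc, .iloc and [] can also take a callable that returns a valid indexer
print(df.loc[lambda d: d['a'] > 10])  # rows where column 'a' exceeds 10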
Indexing by feature name (attribute access): simple and easy to understand.
2. Filtering the row and column data of a DataFrame
import numpy as np
import pandas as pd
from pandas import DataFrame
df = DataFrame(np.arange(20).reshape((4, 5)), columns=list('abcde'))
1. df[] or df.column: select column data
df['a']
df.a
df[['a', 'b']]
2. df.loc[[index], [column]]: use labels to select data
When you do not filter rows, use ':' in the row position, for example df.loc[:, ['a', 'b']] (see the sketch below).
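A small sketch of label-based selection with .loc on the df above:

print(df.loc[[0, 2], ['a', 'c']])   # rows labelled 0 and 2, columns 'a' and 'c'
print(df.loc[:, ['a', 'b']])        # ':' in the row position keeps all rows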
Zhang San    1.2
Reese        1.0
Harry        2.3
Chen Jiu     5.0
Xiao Ming    6.0
Name: price, dtype: float64
In general, we often need to look up values by row, and pandas provides loc and iloc for this; the difference between the two is:
print(frame2)
print(frame2.loc['Harry'])   # loc can use a string-type index label, whereas iloc only accepts integer positions
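To make the difference concrete, here is a small sketch that rebuilds a price Series like the one printed above (frame2 itself is not shown in this excerpt, so this Series is a stand-in):

import pandas as pd

prices = pd.Series([1.2, 1.0, 2.3, 5.0, 6.0],
                   index=['Zhang San', 'Reese', 'Harry', 'Chen Jiu', 'Xiao Ming'],
                   name='price')
print(prices.loc['Harry'])   # loc accepts the string label -> 2.3
print(prices.iloc[2])        # iloc only accepts the integer position -> 2.3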
In pandas, iloc and loc are not used the same way: iloc selects by integer position, while loc selects by row label.
>>> import pandas as pd
>>> import os
>>> os.chdir("d:\\")
>>> d = pd.read_csv("Gwas_water.qassoc", delimiter="\s+")
>>> d.loc[1:3]
   CHR SNP   BP  NMISS    BETA      SE       R2      T       P
1    1   .  447     44  0.1800  0.1783  0.02369  1.009  0.3185
2    1   .  449     44  0.2785  0.2473  0.02931  1.126  0.2665
3    1   .  452     44  0.1800  ...
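One practical consequence worth noting, shown here on the simple df from earlier rather than the GWAS file: .loc slices by label and includes the end point, while .iloc slices by position and excludes it.

print(df.loc[1:3])    # rows with labels 1, 2 and 3 (end label included)
print(df.iloc[1:3])   # rows at positions 1 and 2 only (end position excluded)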
section "Getting Started with data structures (Intro to data Structures)". Open this page next to your Jupyter notebook. When you read the document, write down (rather than copy) the code and execute it in the notebook. As you execute your code, explore these operations and try to explore new ways to use them.Then select the section "Index and select data (index
DataFrame data filtering with loc, iloc, ix, at, iat, and condition filters. Single-condition filter: to select the records where the col1 column is greater than n: data[data['col1'] > n]. To filter on col1 > n but display only the col2 and col3 columns: data.loc[data['col1'] > n, ['col2', 'col3']] (see the sketch below).
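A minimal sketch of those two filter patterns, using a small made-up data frame with the col1/col2/col3 names from the description:

import pandas as pd

data = pd.DataFrame({'col1': [1, 5, 9, 3],
                     'col2': ['a', 'b', 'c', 'd'],
                     'col3': [10.0, 20.0, 30.0, 40.0]})
n = 4
print(data[data['col1'] > n])                        # all columns, rows where col1 > n
print(data.loc[data['col1'] > n, ['col2', 'col3']])  # same rows, only col2 and col3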
…('relative importance')
plt.draw()
plt.show()
The code is a bit long, but it has two main parts: one trains the model, and the other uses the trained feature importances to screen out the important features and plot them.
The 18 most important attributes are obtained, as shown in the figure below:
It is notable that the three attributes TITLE_MR, title_id, and gender rank high. The title-related attributes come from our analysis of the Name field, which shows that some string properties can still carry useful information.
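The article's training code is not reproduced in this excerpt. As a hedged sketch of the general pattern (assuming a scikit-learn tree-based model and using synthetic data in place of the article's features), the "train, rank importances, plot" flow could look like this:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins for the article's preprocessed features and labels
X, y = make_classification(n_samples=500, n_features=30, random_state=0)
feature_names = ['feat_%d' % i for i in range(X.shape[1])]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

importances = clf.feature_importances_
order = np.argsort(importances)[::-1]      # indices sorted from most to least important
top = order[:18]                           # keep the 18 most important attributes

plt.barh(range(len(top)), importances[top][::-1])
plt.yticks(range(len(top)), [feature_names[i] for i in top][::-1])
plt.xlabel('relative importance')
plt.show()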
When using Docker, we need to view the data generated inside a container, and to share and back up data between containers and between a container and the host; this is container data management. Docker currently provides two ways to manage data: data volumes and data volume containers.
Data filtering and sorting: exploring the Euro 2012 data (the related data set is on GitHub).
Step 1 - Import the pandas library:
import pandas as pd
Step 2 - Path to the data set:
path2 = './data/Euro2012.csv'   # Euro2012.csv
Step 3 - Name the data set euro12:
euro12 = pd.read_csv(path2)
euro12.tail()
Output: the last rows of euro12, with columns including Team, Goals, Shots, …
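As a small follow-up sketch of the filtering-and-sorting theme (assuming the Team and Goals columns visible in the output):

print(euro12.sort_values('Goals', ascending=False)[['Team', 'Goals']])   # teams ranked by goals
print(euro12[euro12['Goals'] > 5][['Team', 'Goals']])                    # only teams with more than 5 goals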
Sometimes we need to perform a large-scale data test and insert a large amount of data into the database.
There are three points to consider:
[Protect existing data]
This has two purposes:
1. We only want to test the inserted data.
2. After the test, we need to delete the inserted test data (see the sketch below).
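A minimal sketch of that idea, assuming a local SQLite database and a hypothetical test_run marker column (neither is from the original text), so the inserted rows can be removed afterwards without touching existing data:

import sqlite3

conn = sqlite3.connect('test.db')
conn.execute('CREATE TABLE IF NOT EXISTS orders (id INTEGER, amount REAL, test_run INTEGER)')

# Insert a large batch of test rows inside one transaction, all tagged test_run = 1
rows = [(i, i * 0.5, 1) for i in range(100000)]
with conn:
    conn.executemany('INSERT INTO orders (id, amount, test_run) VALUES (?, ?, ?)', rows)

# ... run the test against the inserted data ...

# Afterwards, remove only the test rows, leaving existing data untouched
with conn:
    conn.execute('DELETE FROM orders WHERE test_run = 1')
conn.close()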
1. Course plan
Menu data management
Rights data management
Role data management
User data management
Dynamically querying user rights and roles in the Realm
Shiro integrated with Ehcache to cache permission data
2. Menu data additions
2.1 Using a combotree for the parent menu item
Replace
Data preprocessing
Sort the data:
df.sort_values(by=['number of messages the customer sent that day'])
Sort
Data grouping (the Excel PivotTable counterpart): group the customer chat records.
# If the value is above the threshold (the article's example: price > 3000), the group column shows 'high', otherwise 'low'
df['group'] = np.where(df['number of messages the customer sent that day'] > 5, 'high', 'low')
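A hedged sketch of the grouping step, the pandas counterpart of an Excel PivotTable, on a tiny made-up chat table using the translated column name above:

import numpy as np
import pandas as pd

chat = pd.DataFrame({'customer': ['A', 'B', 'C', 'D'],
                     'number of messages the customer sent that day': [3, 8, 6, 2]})
chat['group'] = np.where(chat['number of messages the customer sent that day'] > 5, 'high', 'low')

# PivotTable-style summary: customer count and total messages per group
summary = chat.pivot_table(index='group',
                           values='number of messages the customer sent that day',
                           aggfunc=['count', 'sum'])
print(summary)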
[-1]print ("lastvalue", Sunspots.loc[last_date])(4) The following describes how to query a date by using a date string in the YYYYMMDD format, as follows:Print ("Values Slice by date", sunspots["20020101": "20131231"])(5) The index list can also be used to queryPrint ("Slice from a list of indices", sunspots.iloc[[2,4,-4, 2])(6) To choose a scalar value, there are two methods, here is the speed of the obvious advantage of the second method. They require two integers as arguments, where the first
The previous article briefly introduced the basic concepts and characteristics of the conceptual data model, the logical data model, and the physical data model, as well as the database development stage each corresponds to. Now, regarding the kinds of data models used in logical data modeling...
Let me tell you: Big Data engineers can earn an annual salary of more than 500,000, and there is a shortage of about 1.5 million technical staff. In the future, high-end technical talent will be snapped up by enterprises. Big Data means scarcer talent and higher salaries. Next, we will analyze the Big Data talent shortage and the employment outlook...
Transferred from: http://blog.csdn.net/lifuxiangcaohui/article/details/40588929
Hive is built on the Hadoop Distributed File System (HDFS), and its data is stored in HDFS. Hive itself has no special data storage format and does not build indexes on the data; you only need to tell Hive the column and row separators in your data when creating a table, and Hive can then parse the data.