Common pandas knowledge required for data analysis and mining in Python
Objective: pandas is built on two data types, Series and DataFrame. A Series is a one-dimensional data type in which each element has a label. It is similar to a NumPy array whose elements are labeled. The label can be either a number or a
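As a minimal sketch of the two data types described above (the values, labels, and column names are made up for illustration and do not come from the original article):

```python
import pandas as pd

# A Series is one-dimensional; every element carries a label (the index).
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame is two-dimensional, with labeled rows and labeled columns.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])

first = s["a"]   # label-based lookup on the Series
col = df["x"]    # each DataFrame column is itself a Series
```

Here the labels are strings, but an integer index works the same way.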
Let's create a DataFrame by hand.
import numpy as np
import pandas as pd
# The arange/reshape arguments were garbled in the source; a 10x3 array of even numbers is assumed here.
df = pd.DataFrame(np.arange(0, 60, 2).reshape(10, 3), columns=list('abc'))
This is what df looks like. So how do you choose among the three ways of selecting data? One: when each column already has a column name, with df
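The snippet is cut off before the three approaches are spelled out. As an assumption, the sketch below shows three common ways to select data in pandas: by column name with [], by label with .loc, and by position with .iloc. The 10x3 frame is also an assumed reconstruction.

```python
import numpy as np
import pandas as pd

# Assumed example frame: 10 rows, columns a, b, c, filled with even numbers.
df = pd.DataFrame(np.arange(0, 60, 2).reshape(10, 3), columns=list("abc"))

col_a = df["a"]                      # select one column by its name
by_label = df.loc[0:2, ["a", "b"]]   # label-based; the end label 2 is included
by_position = df.iloc[0:2, 0:2]      # position-based; the end position 2 is excluded
```

Note the difference in slicing: .loc includes the end label, so it returns three rows here, while .iloc returns two.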
Most students who do data analysis start with Excel, the most highly rated tool in the Microsoft Office suite. But when the amount of data is very large, Excel is powerless. The third-party Python package pandas greatly extends what Excel can do; getting started takes a little time, but it really is the
Querying and analyzing data is an important function of pandas and the basis for learning it. The following article mainly introduces how to use Python pandas for data analysis to query
String object methods
The split() method splits a string:
The strip() method removes whitespace and line breaks:
split() used in combination with strip():
The "+" operator lets you concatenate multiple strings:
The join() method also concatenates strings; compare it with the "+" operator:
The in keyword determines whether a string is contained in another string:
The index() method and the find() method determine the location of a substring: the difference between the index()
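The string methods listed above can be sketched as follows. The sample text is invented for illustration. The last two lines show a documented difference between index() and find(): index() raises ValueError when the substring is missing, while find() returns -1.

```python
text = " a, b ,c \n"

parts = text.split(",")               # split on commas; whitespace is kept
clean = [p.strip() for p in parts]    # strip() removes spaces and the line break

plus = clean[0] + "::" + clean[1] + "::" + clean[2]   # concatenation with "+"
joined = "::".join(clean)                              # same result with join()

has_b = "b" in joined                 # substring test with the in keyword
pos_index = joined.index("b")         # position of the first match
pos_find = joined.find("z")           # find() returns -1 when nothing matches

try:
    joined.index("z")                 # index() raises instead of returning -1
    index_raised = False
except ValueError:
    index_raised = True
```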
method
Ranking: rank()
Axis indexes with duplicate values: the is_unique property of the index tells you whether its values are unique
Summarizing and computing descriptive statistics: sum(), mean(), describe()
Describing and summarizing statistical functions
Correlation coefficients and covariance: the Series and DataFrame methods are computed for the parameter pairs.
Unique values, value counts, and membership
Unique values: the unique() method
Value counts: the value_counts() method calculates how often each value
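A short sketch of the operations named in this list, on invented data. The membership test uses isin(), which is an assumption about what the truncated text goes on to describe.

```python
import pandas as pd

s = pd.Series([7, -5, 7, 4], index=["a", "b", "a", "c"])

ranks = s.rank()                      # ties share the average rank by default
index_is_unique = s.index.is_unique   # a property, so no parentheses
distinct = s.unique()                 # distinct values in order of appearance
counts = s.value_counts()             # how often each value occurs
member = s.isin([7, 4])               # membership test, one boolean per element

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 8.0]})
totals = df.sum()                     # column sums
means = df.mean()                     # column means
summary = df.describe()               # count, mean, std, min, quartiles, max
corr_xy = df["x"].corr(df["y"])       # Pearson correlation of two columns
cov_xy = df["x"].cov(df["y"])         # sample covariance of two columns
```

Because y is exactly twice x, the correlation comes out as 1 up to floating-point error.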
pandas objects have some common mathematical and statistical methods. For example:
The sum() method computes column subtotals:
sum() called with axis=1 sums horizontally, giving row subtotals:
idxmax() gets the index of the maximum value:
There is also a cumulative roll-up, cumsum(); compare the difference between it and sum():
The unique() method is used to return only values in the
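The difference between sum(), sum(axis=1), idxmax(), and cumsum() may be easier to see on a small frame. The column names and values below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"q1": [1, 5, 2], "q2": [4, 0, 3]}, index=["x", "y", "z"])

col_totals = df.sum()          # one total per column
row_totals = df.sum(axis=1)    # axis=1 sums across the columns, one total per row
max_labels = df.idxmax()       # row label of the maximum in each column
running = df.cumsum()          # running total down each column
```

sum() collapses each column to a single number, while cumsum() keeps the original shape and fills in the partial sums.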
Quick Guide: Steps to Perform Text Data Cleaning in Python
Introduction
Twitter has become an inevitable channel for brand management. It has compelled brands to become more responsive to their customers. On the other hand, the damage it can cause can't be undone. These character-limited tweets have now become a powerful tool for customers and users to convey messages directly to brands. For companies, these tweets carry
Hierarchical indexes
Hierarchical indexing means you can have more than one index level on an axis, for example: a bit like merged cells in Excel, right?
Select a subset of the data based on the outer index level:
Select a subset of the data from the inner level:
Select data in the same way as with a single-level index:
Converting a multi-index Series to a DataFrame
hierarchical indexes pl
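A minimal sketch of a hierarchical index, assuming a Series with two index levels (the labels and values are invented). It shows selecting by the outer level, selecting by the inner level, and converting to a DataFrame with unstack().

```python
import pandas as pd

# Two index levels: an outer letter and an inner number.
s = pd.Series(
    [1, 2, 3, 4, 5],
    index=[["a", "a", "b", "b", "c"], [1, 2, 1, 2, 1]],
)

outer = s["a"]        # everything under outer label "a", indexed by the inner level
inner = s.loc[:, 1]   # inner label 1 across all outer labels
frame = s.unstack()   # inner level becomes the columns of a DataFrame
```

The combination ("c", 2) does not exist in the Series, so unstack() fills that cell with NaN.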
pandas: powerful Python Data Analysis Toolkit
Official documentation: http://pandas.pydata.org/pandas-docs/stable/
1. Import the pandas package
import pandas as pd
2. Get the file names under a folder
import os
filenames = []
path = "C:/users/forrest/pycharmprojects/test"
for file in os.listdir(path):
    filenames.append(file)
3. R
For example, we have a DataFrame like this:
                   SPY        AAPL         IBM        GOOG         GLD
2017-01-03  222.073914  114.311760  160.947433  786.140015  110.470001
2017-01-04  223.395081  114.183815  162.940125  786.900024  110.860001
2017-01-05  223.217606  114.764473  162.401047  794.020020  112.580002
2017-01-06  224.016220  116.043915  163.200043  806.150024  111.750000
2017-01-09  223.276779  117.106812  161.390244  806.650024  112.669998
...
Now we only want to get highli
Summary
Using Python for data analysis requires installing some common tools, such as numpy, pandas, and scipy. During installation you often run into problems of detail, such as version mismatches or required dependency packages that were not installed properly. This article summarizes the installation of the following necessary packages
If you have no Python background, it is recommended to download and install Anaconda directly. It already integrates the various modules needed for data analysis, so they are not covered again here.
Download Address: https://www.continuum.io/downloads/
Here is how to install each module with Python's pip. pip is a tool for installing and managing Python
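Before installing anything with pip, it can help to see which packages are already importable and in which version, since version mismatches are one of the problems mentioned above. The helper name installed_versions below is hypothetical; it is a sketch using only the standard library and assumes Python 3.8 or later.

```python
from importlib import metadata
from importlib.util import find_spec


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        if find_spec(name) is None:
            versions[name] = None          # not importable in this environment
            continue
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "unknown"     # importable, but no package metadata found
    return versions


report = installed_versions(["numpy", "pandas", "scipy"])
```

Any entry that maps to None is a candidate for installation with pip.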
Extra, extra: DataCleaner cleans your data, but only if your data is in pandas DataFrame instances. Developer Randy Olson said: "DataCleaner is not magic, it can't magically parse your unstructured data."
It can delete rows that contain missing data, or fill missing
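The two operations mentioned here, deleting rows with missing data and filling missing values, can be sketched in plain pandas. This is not DataCleaner's own API, which the snippet does not show; the data and the fill strategy (median for numbers, a fixed placeholder for text) are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 40],
    "city": ["Oslo", "Rome", None],
})

# Option 1: drop every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: fill missing values per column (assumed strategy, see lead-in).
filled = df.fillna({"age": df["age"].median(), "city": "unknown"})
```

Dropping is simpler but loses two of the three rows here; filling keeps every row at the cost of introducing estimated values.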
values appear
df.boxplot(column='label 1', by='label 2')
plt.show()
The data under label 1 is then plotted as a numerical distribution grouped by label 2. As shown below, after grouping by education level, conclusions such as the extreme values among high-level wages can be obtained. Note: when you want to plot and entering the drawing instruction alone does not display a figure, you need to enter plt.show() on another li
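A self-contained version of the boxplot call might look like the sketch below. The column names wage and education and all values are invented, standing in for 'label 1' and 'label 2'. The Agg backend is selected only so the script also runs without a display; remove that line to open a window with plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for an on-screen window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: wages grouped by education level.
df = pd.DataFrame({
    "wage": [30, 35, 32, 60, 75, 150],
    "education": ["school", "school", "school", "college", "college", "college"],
})

# One box per education level, showing the distribution of wage in that group.
ax = df.boxplot(column="wage", by="education")
plt.show()
```

With a grouped boxplot like this, unusually high or low values in each group show up as individual points beyond the whiskers.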