8 Python Techniques for Efficient Data Analysis

Source: Internet
Author: User

Defining a list in one line of code

The following compares creating a list with a for loop against creating the same list with one line of code (a list comprehension).

x = [1, 2, 3, 4]
out = []
for item in x:
    out.append(item**2)
print(out)
# [1, 4, 9, 16]

# vs.

x = [1, 2, 3, 4]
out = [item**2 for item in x]
print(out)
# [1, 4, 9, 16]

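A comprehension can also filter while it builds the list. As a small sketch (the variable name even_squares is illustrative and not from the original), the following keeps only the even items and squares them:

```python
x = [1, 2, 3, 4]

# Keep only the even items, then square each one.
even_squares = [item**2 for item in x if item % 2 == 0]
print(even_squares)
```
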
Lambda expression

Tired of defining functions that are only used a few times? Lambda expressions are your savior! They create small, disposable, anonymous function objects in Python, so you do not have to write a full function definition.

The basic syntax for a lambda expression is:

lambda arguments: expression

Note that a lambda can do whatever a regular function can do, as long as its body is a single expression. The following example shows a lambda expression in action:

double = lambda x: x * 2
print(double(5))
# 10

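Lambdas are most useful when passed directly to another function. A minimal sketch, assuming some made-up (name, score) pairs, uses a lambda as the sort key for the built-in sorted:

```python
# Hypothetical sample data: (name, score) pairs.
records = [("ann", 82), ("bob", 95), ("cy", 77)]

# Sort by the second field of each pair, highest score first.
ranked = sorted(records, key=lambda pair: pair[1], reverse=True)
print(ranked)
```
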
Map and filter

Once you have mastered lambda expressions, learning to use them in conjunction with the map and filter functions allows for more powerful functionality.

Specifically, map applies a function to each element of a list and produces the transformed values. In this example, it iterates through each element and multiplies it by 2. Note that in Python 3 map returns an iterator, so the list() function is used to convert the output to a list.

# Map
seq = [1, 2, 3, 4, 5]
result = list(map(lambda var: var*2, seq))
print(result)
# [2, 4, 6, 8, 10]

The filter function accepts a rule (a function) and a list, just like map, but it returns the subset of the original list for which the rule evaluates to True.

# Filter
seq = [1, 2, 3, 4, 5]
result = list(filter(lambda x: x > 2, seq))
print(result)
# [3, 4, 5]

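The two functions can be chained. The sketch below, using the same seq as above, first filters and then maps, and shows that a list comprehension gives the same result (the variable names are illustrative):

```python
seq = [1, 2, 3, 4, 5]

# Keep values greater than 2, then double each remaining value.
chained = list(map(lambda v: v * 2, filter(lambda v: v > 2, seq)))

# The same logic written as a single list comprehension.
comprehension = [v * 2 for v in seq if v > 2]

print(chained)
print(chained == comprehension)
```
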
Arange and Linspace

Arange returns an array of evenly spaced values with a given step size. Its three parameters, start, stop, and step, give the starting value, the end value, and the step size. Note that stop is a cutoff value, so it is not included in the output array.

import numpy as np

# np.arange(start, stop, step)
np.arange(3, 7, 2)
# array([3, 5])

Linspace and arange are very similar, but slightly different. Linspace divides the interval evenly into a specified number of points. Given the start and end of the interval and the number of points num, linspace returns a NumPy array that includes both endpoints by default. This is especially useful for data visualization and for defining axes when plotting.

# np.linspace(start, stop, num)
np.linspace(2.0, 3.0, num=5)
# array([2.  , 2.25, 2.5 , 2.75, 3.  ])

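To make the difference concrete, here is a small side-by-side sketch over the interval from 0 to 1 (the interval and variable names are chosen only for illustration):

```python
import numpy as np

# arange: you choose the step; the stop value is excluded.
a = np.arange(0, 1, 0.25)

# linspace: you choose the number of points; the stop value is included by default.
b = np.linspace(0, 1, num=5)

print(a)
print(b)
```

Use arange when the spacing matters and linspace when the number of points matters.
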
What does axis stand for?

In pandas, you will encounter axis when you delete a column, and in NumPy when you sum the values of a matrix. Take deleting a column (or a row) as an example:

df.drop('Column A', axis=1)
df.drop('Row A', axis=0)

If you want to work with columns, set axis to 1, and if you want to work with rows, set it to 0. But why? Recall the shape attribute in pandas:

df.shape
# (number of rows, number of columns)

The shape attribute of a pandas DataFrame returns a tuple: the first value is the number of rows and the second is the number of columns. If you index that tuple in Python, the number of rows sits at position 0 and the number of columns at position 1, which matches how axis values are declared.
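The same numbering applies to reductions such as sum, where axis names the dimension that gets collapsed. A minimal sketch, assuming a made-up 2x3 DataFrame:

```python
import pandas as pd

# Hypothetical 2-row, 3-column frame for illustration.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

print(df.shape)  # (rows, columns)

# axis=0 collapses the rows, leaving one total per column.
col_totals = df.sum(axis=0)

# axis=1 collapses the columns, leaving one total per row.
row_totals = df.sum(axis=1)

# With drop, axis=1 says the label "A" is a column label.
dropped = df.drop("A", axis=1)

print(col_totals)
print(row_totals)
print(dropped)
```
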

Concat, merge and join

If you are familiar with SQL, these concepts may come easily to you. In any case, these functions are all ways of combining DataFrames. It can be hard to keep track of which one is best in which situation, so let's review them.

Concat appends one or more DataFrames either below or beside the original (depending on how you set axis).

Merge combines DataFrames on one or more shared columns that act as the key, much like a SQL join.

Join, like merge, combines two DataFrames. However, instead of matching on a specified key column, it matches on the index (row labels) by default.
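The three functions can be compared on a pair of small tables. This is only a sketch: the frames left and right and their columns key, x, and y are invented for illustration.

```python
import pandas as pd

# Hypothetical tables for illustration.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# concat: stack frames below each other (axis=0) or side by side (axis=1).
stacked = pd.concat([left, left], axis=0, ignore_index=True)

# merge: SQL-style join on a shared key column (inner join by default).
merged = pd.merge(left, right, on="key")

# join: aligns on the index (left join by default), so set the index first.
joined = left.set_index("key").join(right.set_index("key"))

print(stacked.shape)
print(merged)
print(joined)
```

Here merged keeps only the keys present in both tables, while joined keeps every row of left and fills the missing y value with NaN.
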

Pandas Apply

Apply is available on both pandas Series and DataFrames. If you're not familiar with a Series, you can think of it as similar to a NumPy array.
On a DataFrame, apply passes each column or each row (depending on the specified axis) to a function; on a Series, it calls the function on each element. With apply, you can format and manipulate the values of a DataFrame column (which is a Series) without writing a loop.

import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9],] * 3, columns=['A', 'B'])
df
#    A  B
# 0  4  9
# 1  4  9
# 2  4  9

df.apply(np.sqrt)
#      A    B
# 0  2.0  3.0
# 1  2.0  3.0
# 2  2.0  3.0

df.apply(np.sum, axis=0)
# A    12
# B    27

df.apply(np.sum, axis=1)
# 0    13
# 1    13
# 2    13

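Apply also accepts your own functions, including lambdas. The following sketch reuses the same 3x2 frame; the new column names A_plus_B and A_label are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# DataFrame.apply with axis=1: the function receives one row at a time.
df["A_plus_B"] = df.apply(lambda row: row["A"] + row["B"], axis=1)

# Series.apply: the function receives one element of column A at a time.
df["A_label"] = df["A"].apply(lambda v: f"value={v}")

print(df)
```

For simple arithmetic such as A + B, the vectorized form df["A"] + df["B"] is usually faster; apply is most helpful when the per-row logic has no vectorized equivalent.
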
Pivot Tables

Finally, pivot tables. If you are familiar with Microsoft Excel, you may have heard of PivotTables. The built-in pandas pivot_table function creates a spreadsheet-style pivot table in DataFrame form, which helps us quickly summarize the data in a few columns. Here are a few examples. The first groups the data by "Manager" and "Rep":

pd.pivot_table(df, index=["Manager", "Rep"])


Or you can restrict the result to specific value columns:

pd.pivot_table(df, index=["Manager", "Rep"], values=["Price"])
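The calls above assume a DataFrame df that the article does not define. As a self-contained sketch, the following builds a small made-up sales table (the names, prices, and the Quantity column are all hypothetical) and runs the same kind of pivot:

```python
import pandas as pd

# Hypothetical sales data; Manager, Rep and Price mirror the calls above.
df = pd.DataFrame({
    "Manager": ["Debra", "Debra", "Fred", "Fred"],
    "Rep": ["Craig", "Craig", "Cedric", "Wendy"],
    "Price": [30000, 10000, 5000, 35000],
    "Quantity": [1, 2, 1, 3],
})

# Average price per (Manager, Rep) pair; mean is the default aggregation.
avg_price = pd.pivot_table(df, index=["Manager", "Rep"], values=["Price"])

# Totals per pair, using an explicit aggregation function instead of the default.
totals = pd.pivot_table(
    df, index=["Manager", "Rep"], values=["Price", "Quantity"], aggfunc="sum"
)

print(avg_price)
print(totals)
```

Passing aggfunc explicitly makes it clear whether the table shows averages or totals.
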
