Data analysis with pandas-(1)-getting started with matrices

Source: Internet
Author: User

1. Reading data into NumPy

NumPy is a Python module that provides many functions for working with data. If you want to do serious work with data in Python, you'll be using NumPy a lot. We'll work through importing NumPy and loading in a CSV file.
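A minimal sketch of the loading step, assuming the data is comma-separated. The CSV text below is a small hypothetical stand-in for the real file (the column names and values are made up for illustration); with a real file you would pass its path to `genfromtxt` instead of the `StringIO` object.

```python
import io
import numpy as np

# Hypothetical stand-in for the CSV file; the values are made up.
csv_text = """Year,Region,Country,Type,Liters
1985,Africa,Algeria,Beer,0.5
1985,Africa,Algeria,Wine,0.25
1986,Americas,Uruguay,Beer,1.5
"""

# genfromtxt accepts a file path or a file-like object.
# With no dtype given, it tries to read every value as a float.
world_alcohol = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(world_alcohol)
```

Every value that cannot be parsed as a float, including the whole header row, comes back as `nan`.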

2. Fixing the data types

If you looked at the data you read in on the last screen, you may have noticed that it looked very strange. This is because genfromtxt reads the data into a NumPy array, and every element in an array has to be the same data type. So everything is a string, or everything is an integer, and so on. NumPy tried to convert all of our data to floats, which caused the values to become strange. We'll need to specify the data type when we read our data in so we can avoid this.
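One way to specify the data type is the `dtype` argument of `genfromtxt`. The sketch below reads everything as strings; the `"U75"` width and the use of `skip_header=1` to drop the header row are assumptions for this example, and the CSV text is again a hypothetical stand-in for the real file.

```python
import io
import numpy as np

# Hypothetical stand-in for the CSV file; the values are made up.
csv_text = """Year,Region,Country,Type,Liters
1985,Africa,Algeria,Beer,0.5
1985,Africa,Algeria,Wine,0.25
1986,Americas,Uruguay,Beer,1.5
"""

# dtype="U75" reads every value as a Unicode string of up to 75 characters.
# skip_header=1 drops the first line so only data rows remain.
world_alcohol = np.genfromtxt(
    io.StringIO(csv_text), delimiter=",", dtype="U75", skip_header=1
)
print(world_alcohol)
```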

3. Indexing the data

Now that we know how to read in a file, let's start pulling values out. Remember how every element in a matrix has an index? Indexes start at 0, so we can print the item at row 1, column 2 by typing print(world_alcohol[0, 1])
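A short sketch of indexing, using a small hypothetical string matrix in place of the loaded file (the rows are made up for illustration):

```python
import numpy as np

# Hypothetical stand-in for the loaded data; all values are strings.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "0.25"],
    ["1986", "Americas", "Uruguay", "Beer", "1.5"],
])

# Row 1, column 2 in human terms is index [0, 1] because counting starts at 0.
first_row_second_col = world_alcohol[0, 1]
print(first_row_second_col)
```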

4. Vectors

When we grab a whole row or column from the matrix, we actually end up with a vector. Just as a matrix is a 2-dimensional array because it has rows and columns, a vector is a 1-dimensional array. Vectors are similar to Python lists in that they can be indexed with only one number. Think of a vector as just a single row, or a single column.
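As an illustration, slicing with `:` pulls out a whole row or column. The matrix is a hypothetical stand-in for the loaded data:

```python
import numpy as np

# Hypothetical stand-in for the loaded data.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "0.25"],
    ["1986", "Americas", "Uruguay", "Beer", "1.5"],
])

first_row = world_alcohol[0, :]      # every column of the first row
country_col = world_alcohol[:, 2]    # every row of the third column

# Both results are 1-dimensional, so a single index is enough.
print(first_row[3])
print(country_col[2])
```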

5. Array shape

All arrays, whether they are 1-dimensional (vectors), 2-dimensional (matrices), or even larger, have a number of elements in each dimension. For example, a matrix has a number of rows and a number of columns. We can use the "shape" attribute (it is a property, not a method, so it takes no parentheses) to find these dimensions.
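A quick sketch of reading `shape` on a matrix and on a vector, using a hypothetical stand-in matrix:

```python
import numpy as np

# Hypothetical stand-in for the loaded data: 3 rows, 5 columns.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "0.25"],
    ["1986", "Americas", "Uruguay", "Beer", "1.5"],
])

matrix_shape = world_alcohol.shape        # (rows, columns)
vector_shape = world_alcohol[:, 0].shape  # a 1-tuple for a vector
print(matrix_shape, vector_shape)
```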

6. Boolean Elements

We can also use Boolean statements on arrays to get truth values. The interesting part is that the Booleans are computed elementwise.

A comparison such as world_alcohol[:,3] == "Beer" compares each element of the fourth column of world_alcohol with "Beer" and creates a new vector of True/False values.
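The elementwise comparison can be sketched as follows. The matrix is a hypothetical stand-in; the name `beer` follows the vector referred to later in the text.

```python
import numpy as np

# Hypothetical stand-in for the loaded data.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "0.25"],
    ["1986", "Americas", "Uruguay", "Beer", "1.5"],
    ["1985", "Americas", "Uruguay", "Wine", "2.0"],
])

# One True/False value per element of the fourth column.
beer = world_alcohol[:, 3] == "Beer"
print(beer)
```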

7. Subsets of vectors

We can subset vectors using Boolean vectors like the one we generated on the last screen.

Indexing the fourth column with the Boolean vector, as in world_alcohol[:,3][beer], selects only the elements of the fourth column whose value is "Beer". It goes through each position in the fourth column vector (from 0 to the last index) and checks whether the beer vector is True at the same position. If the beer vector is True, the element of the fourth column at that position is added to the subset. If the beer vector is False, the element is skipped.
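A minimal sketch of subsetting a vector with a Boolean vector, on a hypothetical stand-in matrix:

```python
import numpy as np

# Hypothetical stand-in for the loaded data.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "0.25"],
    ["1986", "Americas", "Uruguay", "Beer", "1.5"],
    ["1985", "Americas", "Uruguay", "Wine", "2.0"],
])

beer = world_alcohol[:, 3] == "Beer"

# Keep only the positions where beer is True.
beer_types = world_alcohol[:, 3][beer]
print(beer_types)
```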

8. Subsets of matrices

We can subset a matrix in the same way that we subset a vector.

Printing world_alcohol[beer,:] shows all of the rows in world_alcohol where the "Type" column equals "Beer". Note that because matrices are indexed using two numbers, we are substituting the Boolean vector beer for the first number. We can alter the second number to select different columns.

For example, world_alcohol[beer,1] selects the second column for the rows where the "Type" column equals "Beer".
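Both forms of matrix subsetting can be sketched like this, again on a hypothetical stand-in matrix:

```python
import numpy as np

# Hypothetical stand-in for the loaded data.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "0.25"],
    ["1986", "Americas", "Uruguay", "Beer", "1.5"],
    ["1985", "Americas", "Uruguay", "Wine", "2.0"],
])

beer = world_alcohol[:, 3] == "Beer"

beer_rows = world_alcohol[beer, :]     # all columns of the matching rows
beer_regions = world_alcohol[beer, 1]  # only the second column of those rows
print(beer_rows)
print(beer_regions)
```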

9. Subsets with multiple conditions

So, we can find all of the rows that correspond to "Algeria", for example. But what if what we really want is to find all the rows for "Algeria" in "1985"?

We'll have to use multiple conditions to generate our vector.

We can generate a Boolean vector that uses multiple conditions by wrapping each comparison in parentheses and joining them with &. The parentheses specify that the component vectors should be generated first (order of operations). Then the two vectors are compared index by index. If both vectors are True at index 1, then the resulting vector is True at index 1. If either vector is False at index 1, the result is False at index 1.
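An expanded sketch of the index-by-index comparison, first on two small hand-written vectors and then on a hypothetical stand-in matrix. Placing the year in the first column and the country in the third is an assumption of this example.

```python
import numpy as np

# Two small Boolean vectors combined position by position.
a = np.array([True, False, True, False])
b = np.array([True, True, False, False])
both = a & b  # True only where a and b are both True

# Hypothetical stand-in data: year in column 0, country in column 2 (assumed).
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1986", "Africa", "Algeria", "Wine", "0.25"],
    ["1985", "Americas", "Uruguay", "Beer", "1.5"],
])

# Each comparison needs its own parentheses because & binds tighter than ==.
algeria_1985 = (world_alcohol[:, 2] == "Algeria") & (world_alcohol[:, 0] == "1985")
print(both)
print(algeria_1985)
```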

We can add more than 2 conditions if we want; we just have to put an & symbol between each one. The resulting vector will contain True in the positions corresponding to rows where all conditions are True, and False for the rows where any condition is False.

10. Convert a column to floats

We now know almost everything we need to compute how much alcohol the people in a country drank in a given year! But there are a couple of things we need to work through first. First, we need to convert the "Liters of alcohol drunk" column (the fifth one) to floats. We need to do this because the values are strings right now, and we can't take the sum of strings: because they aren't numeric, their sum wouldn't make much sense. We can use the astype method on the array to do this.
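A minimal sketch of `astype` on the fifth column. The stand-in matrix is hypothetical and deliberately contains only clean numeric strings, so the conversion succeeds here.

```python
import numpy as np

# Hypothetical stand-in for the loaded data; the fifth column holds numeric strings.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "0.25"],
    ["1986", "Americas", "Uruguay", "Beer", "1.5"],
])

# astype returns a new array; the original string matrix is left unchanged.
liters = world_alcohol[:, 4].astype(float)
print(liters)
```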

11. Replace values in an array

There are values in our alcohol consumption column that are preventing us from converting the column from strings to floats. In order to fix this, we first have to learn how to replace values. We can replace values in a NumPy array by just assigning to them with the equals sign.

An assignment such as world_alcohol[:,4][world_alcohol[:,4] == '0'] = '10' replaces any item in the alcohol consumption column that equals '0' (remember that the world_alcohol matrix is all string values) with '10'.
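The replacement can be sketched as follows on a hypothetical stand-in matrix. Because `world_alcohol[:, 4]` is a view of the matrix, assigning into it changes the matrix itself.

```python
import numpy as np

# Hypothetical stand-in for the loaded data; one row has '0' in the fifth column.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "0"],
    ["1986", "Americas", "Uruguay", "Beer", "1.5"],
])

# Select the positions equal to '0' and assign '10' to them.
world_alcohol[:, 4][world_alcohol[:, 4] == "0"] = "10"
print(world_alcohol[:, 4])
```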

12. Convert the alcohol consumption column to floats

Now that you know what the bad value is, we can replace it and then convert the column to floats.
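The text does not say what the bad value is, so the sketch below assumes it is an empty string and replaces it with '0'. Both the bad value and its replacement are assumptions for illustration, as is the stand-in matrix.

```python
import numpy as np

# Hypothetical stand-in data; the empty string plays the role of the bad value (assumed).
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", ""],
    ["1986", "Americas", "Uruguay", "Beer", "1.5"],
])

# Calling astype(float) now would raise ValueError because '' is not a number.
# Replace the bad value first, then convert.
world_alcohol[:, 4][world_alcohol[:, 4] == ""] = "0"
liters = world_alcohol[:, 4].astype(float)
print(liters)
```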

13. Compute the total alcohol consumption

We can compute the total value of a column using the sum method.
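A short sketch of summing the converted column, using hypothetical values:

```python
import numpy as np

# Hypothetical stand-in for the loaded data.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "0.25"],
    ["1986", "Americas", "Uruguay", "Beer", "1.5"],
    ["1985", "Americas", "Uruguay", "Wine", "2.0"],
])

# Convert the fifth column to floats, then add up every element.
total_liters = world_alcohol[:, 4].astype(float).sum()
print(total_liters)
```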

14. Finding how much alcohol a person in a country drank in a year

We can subset a vector with another vector, as we learned earlier. This means we can now find the total alcohol consumed by any given country in any given year.
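Putting the earlier pieces together, one possible sketch filters the rows by country and year and then sums the fifth column. The column positions for year (first) and country (third) and the data itself are assumptions of this example.

```python
import numpy as np

# Hypothetical stand-in data: year in column 0, country in column 2 (assumed).
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "0.25"],
    ["1986", "Americas", "Uruguay", "Beer", "1.5"],
    ["1985", "Americas", "Uruguay", "Wine", "2.0"],
])

# Boolean vector marking the rows for Algeria in 1985.
rows = (world_alcohol[:, 2] == "Algeria") & (world_alcohol[:, 0] == "1985")

# Take the fifth column of those rows, convert to floats, and sum.
algeria_1985_total = world_alcohol[rows, 4].astype(float).sum()
print(algeria_1985_total)
```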

15. A function to sum yearly alcohol consumption

Now that we know how to find the total alcohol consumption of the average person in a country in a given year, we can make a function out of it. A function will make it easier for us to calculate the alcohol consumption for all countries.
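A minimal sketch of such a function. The name `total_consumption`, its parameters, and the column positions are hypothetical choices for this example, not taken from the original code.

```python
import numpy as np

# Hypothetical stand-in data: year in column 0, country in column 2 (assumed).
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "0.25"],
    ["1986", "Americas", "Uruguay", "Beer", "1.5"],
    ["1985", "Americas", "Uruguay", "Wine", "2.0"],
])

def total_consumption(data, country, year):
    """Sum the fifth column over the rows matching country and year."""
    rows = (data[:, 2] == country) & (data[:, 0] == year)
    return data[rows, 4].astype(float).sum()

algeria_1985 = total_consumption(world_alcohol, "Algeria", "1985")
uruguay_1986 = total_consumption(world_alcohol, "Uruguay", "1986")
print(algeria_1985, uruguay_1986)
```

If no row matches, the function returns 0.0, because the sum of an empty array is zero.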

16. Finding the country that drinks the least

We can now loop through our dictionary keys to find the country with the lowest amount of alcohol consumed per person in 1989.
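One way this could look, assuming the dictionary maps each country name to its 1989 total. The `totals` dictionary, the `total_consumption` helper, and the data are all hypothetical names and values for this sketch.

```python
import numpy as np

# Hypothetical stand-in data: year in column 0, country in column 2 (assumed).
world_alcohol = np.array([
    ["1989", "Africa", "Algeria", "Beer", "0.5"],
    ["1989", "Africa", "Algeria", "Wine", "0.25"],
    ["1989", "Americas", "Uruguay", "Beer", "1.5"],
    ["1989", "Americas", "Uruguay", "Wine", "2.0"],
    ["1986", "Americas", "Uruguay", "Beer", "0.125"],
])

def total_consumption(data, country, year):
    """Sum the fifth column over the rows matching country and year."""
    rows = (data[:, 2] == country) & (data[:, 0] == year)
    return float(data[rows, 4].astype(float).sum())

# Build a dictionary of country -> total for 1989.
totals = {}
for country in np.unique(world_alcohol[:, 2]):
    totals[str(country)] = total_consumption(world_alcohol, country, "1989")

# Loop through the keys, remembering the smallest total seen so far.
lowest_country = None
lowest_total = None
for country in totals:
    if lowest_total is None or totals[country] < lowest_total:
        lowest_country = country
        lowest_total = totals[country]

print(lowest_country, lowest_total)
```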
