Tips on Python data statistics

Source: Internet
Author: User

I have recently been using Python for data statistics. Here I summarize some tips I collected along the way, hoping they help others working in this area. Some of these tips are common knowledge that we rarely think about, yet in specific scenarios these small techniques can be of great help.

1. Map keys to multiple values in a dictionary

Sometimes we want to collect all entries that share a key into one list stored under that key, and then run further operations on each list. The target structure looks like this:

{'b': [4, 5, 6], 'a': [1, 2, 3]}

The following code builds it:

from collections import defaultdict

d = defaultdict(list)
print(d)
d['a'].append(1)
d['a'].append(2)
d['a'].append(3)
d['b'].append(4)
d['b'].append(5)
d['b'].append(6)
print(d)
print(d.get("a"))
print(d.keys())
print([d.get(i) for i in d])

Here we use defaultdict from the collections module. The module offers many other useful tools that are worth learning when time permits.

The output of the code above:

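defaultdict(<class 'list'>, {})
defaultdict(<class 'list'>, {'b': [4, 5, 6], 'a': [1, 2, 3]})
[1, 2, 3]
dict_keys(['b', 'a'])
[[4, 5, 6], [1, 2, 3]]

On Python 3.7 and later, dictionaries preserve insertion order, so 'a' is listed before 'b'.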

Once the data is filled in, it is already grouped by key, and we can iterate over each group to compute the statistics we need.
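As a minimal sketch of that grouping-then-counting step, the example below collects values per key with defaultdict and then computes a few statistics for each group. The records list and its category names are hypothetical sample data, not part of the original article.

```python
from collections import defaultdict

# Hypothetical sample records: (category, amount) pairs.
records = [("fruit", 3), ("veg", 5), ("fruit", 7), ("veg", 1), ("fruit", 2)]

# Collect every amount under its category.
groups = defaultdict(list)
for category, amount in records:
    groups[category].append(amount)

# Traverse each group and compute simple statistics.
stats = {
    category: {"count": len(amounts), "total": sum(amounts), "max": max(amounts)}
    for category, amounts in groups.items()
}
print(stats)
```

Because defaultdict(list) creates an empty list on first access, the loop needs no check for whether the key already exists.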

2. Quickly swap dictionary keys and values

data = {...}
zip(data.values(), data.keys())

Here data is a dictionary in our own format. zip pairs each value with its key, so the resulting (value, key) tuples can be passed to functions such as max and min, which then compare the pairs by value.
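A small sketch of this technique follows, assuming a hypothetical prices dictionary that maps item names to numbers. In Python 3, zip returns an iterator that can be consumed only once, so a fresh zip is built for each call.

```python
# Hypothetical data: item name -> price.
prices = {"apple": 3.5, "pear": 2.0, "melon": 8.25}

# zip yields (value, key) tuples; tuples compare by their first element first,
# so min and max pick the entry with the smallest or largest value.
cheapest = min(zip(prices.values(), prices.keys()))
priciest = max(zip(prices.values(), prices.keys()))
print(cheapest)  # (2.0, 'pear')
print(priciest)  # (8.25, 'melon')

# sorted works the same way and orders the pairs by value.
by_price = sorted(zip(prices.values(), prices.keys()))
print(by_price)
```

If two entries share the same value, the comparison falls back to the keys, so the keys must be comparable with each other.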

3. Sort a list of dictionaries by a common key

from operator import itemgetter

data = [
  {'name': "bran", "uid": 101},
  {'name': "xisi", "uid": 102},
  {'name': "land", "uid": 103}
]
print(sorted(data, key=itemgetter("name")))
print(sorted(data, key=itemgetter("uid")))

The records are stored in the list data. To sort them by name or by uid, we pass itemgetter as the sort key, as in the code above.
Output:

[{'name': 'bran', 'uid': 101}, {'name': 'land', 'uid': 103}, {'name': 'xisi', 'uid': 102}]
[{'name': 'bran', 'uid': 101}, {'name': 'xisi', 'uid': 102}, {'name': 'land', 'uid': 103}]

This is the result we expected.
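The same key function also works with reverse sorting and with min and max. The sketch below reuses the sample data from this tip; the variable name by_uid_desc is only illustrative.

```python
from operator import itemgetter

data = [
    {"name": "bran", "uid": 101},
    {"name": "xisi", "uid": 102},
    {"name": "land", "uid": 103},
]

# Sort by uid in descending order.
by_uid_desc = sorted(data, key=itemgetter("uid"), reverse=True)
print([row["name"] for row in by_uid_desc])  # ['land', 'xisi', 'bran']

# min and max accept the same kind of key function.
lowest_uid = min(data, key=itemgetter("uid"))
last_name = max(data, key=itemgetter("name"))
print(lowest_uid)  # {'name': 'bran', 'uid': 101}
print(last_name)   # {'name': 'xisi', 'uid': 102}
```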

4. Group multiple dictionaries in a list by a certain field

Note: sort the data before grouping, and choose the sort field according to your actual requirements.
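To see why the sort matters, here is a short sketch using itertools.groupby, the function used for the grouping step later in this tip. groupby only merges adjacent items with equal keys, so unsorted input produces fragmented groups. The rows and the helper uid_of below are hypothetical and chosen just to show the effect.

```python
from itertools import groupby

# Hypothetical rows in which uid 101 appears in two non-adjacent positions.
rows = [
    {"name": "bran", "uid": 101},
    {"name": "land", "uid": 103},
    {"name": "xisi", "uid": 101},
]


def uid_of(row):
    """Return the grouping field of a row."""
    return row["uid"]


# Without sorting, groupby starts a new group every time the key changes,
# so uid 101 is split into two separate groups.
unsorted_keys = [key for key, _ in groupby(rows, key=uid_of)]
print(unsorted_keys)  # [101, 103, 101]

# After sorting by the grouping field, each uid forms exactly one group.
sorted_keys = [key for key, _ in groupby(sorted(rows, key=uid_of), key=uid_of)]
print(sorted_keys)  # [101, 103]
```

If the fragmented groups were passed to dict(), the later group for uid 101 would silently overwrite the earlier one.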

Data to be processed:

rows = [
  {'name': "bran", "uid": 101, "class": 13},
  {'name': "xisi", "uid": 101, "class": 11},
  {'name': "land", "uid": 103, "class": 10}
]

Expected processing result:

{
  101: [
    {'name': 'xisi', 'class': 11, 'uid': 101},
    {'name': 'bran', 'class': 13, 'uid': 101}
  ],
  103: [
    {'name': 'land', 'class': 10, 'uid': 103}
  ]
}

We group by uid. This is only a demonstration; in real data, uid values are generally not repeated.

This is more complicated, so let's break it down step by step.

some = [('a', [1, 2, 3]), ('b', [4, 5, 6])]
print(dict(some))

Result:

{'b': [4, 5, 6], 'a': [1, 2, 3]}

Here we convert a list of (key, value) tuples into a dictionary with dict(). This is straightforward. Next, we sort the data to be processed:

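from operator import itemgetter

# Sort by a single field
data_one = sorted(rows, key=itemgetter("class"))
print(data_one)
# Sort by uid first, then by class when uid values are equal
data_two = sorted(rows, key=lambda x: (x["uid"], x["class"]))
print(data_two)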

Here are two ways of sorting that rely on the same principle but differ slightly in style. The first, data_one, uses itemgetter directly to sort by a single field, as in tip 3. Sometimes, however, we have another requirement:

Sort by one field first; when values of that field are equal, sort by a second field.

In this case we use the second form, data_two, whose key function returns a tuple of several field values.
The sorting results are as follows:

[{'name': 'land', 'class': 10, 'uid': 103}, {'name': 'xisi', 'class': 11, 'uid': 101}, {'name': 'bran', 'class': 13, 'uid': 101}]
[{'name': 'xisi', 'class': 11, 'uid': 101}, {'name': 'bran', 'class': 13, 'uid': 101}, {'name': 'land', 'class': 10, 'uid': 103}]

The two results are ordered differently: the first by class only, the second by uid and then by class.
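The multi-field sort can also be written with itemgetter, which returns a tuple when given several field names. The sketch below checks that both styles produce the same order for the sample rows; the names with_lambda and with_itemgetter are only illustrative.

```python
from operator import itemgetter

rows = [
    {"name": "bran", "uid": 101, "class": 13},
    {"name": "xisi", "uid": 101, "class": 11},
    {"name": "land", "uid": 103, "class": 10},
]

# itemgetter with several field names returns a tuple of their values.
first_key = itemgetter("uid", "class")(rows[0])
print(first_key)  # (101, 13)

# Both key functions sort by uid first and by class on ties.
with_lambda = sorted(rows, key=lambda x: (x["uid"], x["class"]))
with_itemgetter = sorted(rows, key=itemgetter("uid", "class"))
print(with_lambda == with_itemgetter)  # True
```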

Finally, we combine the two techniques above:

from itertools import groupby

# groupby yields (key, group) pairs; list() materializes each group
data = dict([(uid, list(group)) for uid, group in groupby(data_two, key=lambda x: x["uid"])])
print(data)

groupby groups the sorted data, the list comprehension turns each group into a (key, list) tuple, and dict() converts that list of tuples into a dictionary. The data is now grouped as expected.
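For reference, the steps of this tip can be put together in one self-contained sketch. It also shows, for comparison, the defaultdict approach from tip 1, which needs no prior sort but keeps rows in input order within each group. The variable names ordered, grouped and by_uid are illustrative only.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

rows = [
    {"name": "bran", "uid": 101, "class": 13},
    {"name": "xisi", "uid": 101, "class": 11},
    {"name": "land", "uid": 103, "class": 10},
]

# Approach from this tip: sort first, then group adjacent rows with groupby.
ordered = sorted(rows, key=itemgetter("uid", "class"))
grouped = {uid: list(group) for uid, group in groupby(ordered, key=itemgetter("uid"))}

# Alternative based on tip 1: append each row to the list stored under its uid.
by_uid = defaultdict(list)
for row in rows:
    by_uid[row["uid"]].append(row)

print([r["name"] for r in grouped[101]])  # ['xisi', 'bran'] (sorted by class)
print([r["name"] for r in by_uid[101]])   # ['bran', 'xisi'] (input order)
print(sorted(grouped) == sorted(by_uid))  # True: both have the same keys
```

The groupby version is a natural fit when the data must be sorted anyway; the defaultdict version avoids the sort when ordering inside each group does not matter.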

That concludes these tips on Python data statistics.
