Why device counts based on the IMEI and MAC address are inaccurate

Source: Internet
Author: User

1. IMEI and MAC


IMEI codes are allocated centrally by the GSM (Global System for Mobile Communications) Association, which authorizes BABT (British Approvals Board for Telecommunications) to assign them.

An IMEI consists of 15 digits, each in the range 0~9. Its composition is:

1, the first 6 digits (TAC, Type Approval Code) are the "model approval number" and generally identify the model.
2, the next 2 digits (FAC, Final Assembly Code) are the "final assembly number" and generally identify the place of origin.
3, the following 6 digits (SNR, Serial Number) are the factory serial number and generally identify the production sequence.
4, the last digit (SP, spare) is usually "0" and serves as a check or spare digit.

The IMEI is meant to be unique. It is printed on a label on the back of the phone and stored in the phone's memory. It also serves as the handset's "file" and "ID number" at the manufacturer. For example, the IMEI of a Samsung GT-I9308 phone is 355065 05 331100 1/01. Here 355065 is the TAC, 05 is the FAC, 331100 is the SNR, 1 is the SP, and 01 is the software version number.

A MAC (Media Access Control, or Medium Access Control) address, also called a physical address or hardware address, identifies the location of a network device. In the OSI model, layer 3 (the network layer) is responsible for IP addresses, and layer 2 (the data link layer) is responsible for MAC addresses. So a host has a MAC address, and each network location has an IP address that belongs exclusively to it.

We sometimes mistakenly treat the IMEI and the MAC address as the physical identity of the phone. Why are counts based on them inaccurate?
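The IMEI layout described above can be sketched in a few lines of Python. This is only an illustration of the 6 + 2 + 6 + 1 split given in this article; the names `ImeiParts` and `split_imei` are hypothetical, and the "/01" software version suffix is assumed to be stripped before the call.

```python
from typing import NamedTuple


class ImeiParts(NamedTuple):
    tac: str  # type approval code (first 6 digits in the layout described here)
    fac: str  # final assembly code (next 2 digits)
    snr: str  # serial number (next 6 digits)
    sp: str   # spare / check digit (last digit)


def split_imei(imei: str) -> ImeiParts:
    """Split a 15-digit IMEI into the four fields described in the text."""
    # Ignore spaces or other separators and keep only the digits.
    digits = "".join(ch for ch in imei if ch.isdigit())
    if len(digits) != 15:
        raise ValueError(f"expected 15 digits, got {len(digits)}")
    return ImeiParts(digits[:6], digits[6:8], digits[8:14], digits[14])


# The example from the text, without the "/01" software version suffix.
parts = split_imei("355065 05 331100 1")
print(parts.tac, parts.fac, parts.snr, parts.sp)
```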


The IDs used by most mobile analytics are derived from system IDs, including but not limited to the IMEI, the MAC address, and the Android ID. The best-known system ID is the UDID; under privacy pressure, Apple eventually abandoned both the UDID and the MAC address.

Most website statistics are cookie-based, and cookies are transient (temporary) IDs. OpenUDID is a typical transient ID.

Apple's IDFA and IDFV are system IDs, but they are also transient IDs.


2. ID Quality

Distinct-count statistics depend on a reliable identifier. This seems simple: pick an ID, or construct a cookie-like ID artificially, and you can compute unique users, retention, and similar metrics. Unfortunately, apart from the UDID that Apple has abolished, there is hardly any ID that comes close to perfect.

To simplify the discussion, first ignore fake data and assume that each device has a true identity X. The goal of distinct-count statistics is to choose an identifier I such that the results computed from I are as consistent as possible with those computed from X.

First, we introduce two concepts: ID conflict (collision) and ID drift (jitter).

ID conflict

For a device cohort, we can always measure the number of distinct values of X and of I over a given time period, denoted count(X) and count(I). If, within a short enough period of time,

count(X) > count(I)

then we call I an ID that has a conflict.

ID Drift

For a device cohort, we can always measure the number of distinct values of X and of I over a given time period, denoted count(X) and count(I). If, over a long enough period of time,

count(X) < count(I)

then we call I an ID that has a drift.
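As a minimal sketch of the two definitions, the Python below compares count(X) and count(I) on two small made-up cohorts. The true identity X is not observable in practice; it is known here only because the data is invented, and the helper name `counts` is hypothetical.

```python
# Each observation is (true_device, reported_id, day).
conflict_cohort = [
    ("dev_a", "id_1", 0),
    ("dev_b", "id_1", 0),  # dev_a and dev_b report the same ID
    ("dev_c", "id_2", 0),
]

drift_cohort = [
    ("dev_c", "id_2", 0),
    ("dev_c", "id_3", 30),  # dev_c reports a different ID a month later
    ("dev_d", "id_4", 0),
    ("dev_d", "id_4", 30),
]


def counts(observations, first_day, last_day):
    """Return (count(X), count(I)) for observations within [first_day, last_day]."""
    window = [(x, i) for x, i, day in observations if first_day <= day <= last_day]
    return len({x for x, _ in window}), len({i for _, i in window})


# Short window (a single day): fewer IDs than devices means conflict.
short_x, short_i = counts(conflict_cohort, 0, 0)
print("conflict:", short_x > short_i)

# Long window (the whole month): more IDs than devices means drift.
long_x, long_i = counts(drift_cohort, 0, 30)
print("drift:", long_x < long_i)
```

Note that if both effects occur in the same cohort they can partly cancel in the raw counts, which is one reason the definitions use a short window for conflict and a long window for drift.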

The IMEI of an Android device is an ID with serious conflicts; according to our estimates, its conflict rate is greater than 3%. This is because many shanzhai (knockoff) phones share the same IMEI.

The MAC address of an Android device is also an ID with conflicts, because many virtual machines share the same MAC. In addition, the MAC is a typical ID with severe drift: the Android source code contains logic that randomly generates the last 24 bits of the MAC address, and this logic has been abused.

Qualitative analysis

Next, we can qualitatively analyze the impact of ID collisions and drift on statistics:

When an ID has only conflicts, DAU and install counts based on it are underestimated, while retention may be overestimated. These effects are moderate, however: for example, a 5% ID conflict causes DAU to be underestimated by at most 5%, and the effect on retention is negligible.

When an ID has only drift, DAU and install counts based on it are overestimated, and retention is affected as well. When the drift is large, the impact on the metrics is dramatic. For example, an ID with 5% daily drift may cause DAU to be overestimated by 2%, but it produces 5% false installs per day, because drift affects all users, including inactive ones. These false installs show high retention in the short term but low retention in the long term (short-term drift will be high; over a longer time, drift will be low). Any cookie-like ID has similar properties, which is why traditional website analytics is moving toward more reliable device fingerprints.

When an ID has both conflicts and drift, DAU and install counts based on it are completely unreliable. Take MAC addresses: on the devices affected by drift, the MAC address changes frequently, producing a large number of spurious installs with very low retention. For applications with a small number of users, the consequences of choosing such an ID are catastrophic.

In summary, when the drift and conflict of an ID are small enough, their effect on distinct-count statistics can be ignored. When these errors are not negligible, the impact of ID conflicts is moderate, while ID drift can seriously distort install and retention statistics.
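The effect of drift on installs and retention can be illustrated with a small deterministic simulation. This is a sketch under simplifying assumptions that are not in the original: 100 devices, all active every day, and each device changing its ID once every 20 days (so 5% of devices drift per day). The function `reported_id` and the constants are hypothetical, and the resulting numbers are not meant to reproduce the percentages quoted above.

```python
NUM_DEVICES = 100
NUM_DAYS = 10
DRIFT_PERIOD = 20  # each device changes ID every 20 days: 5% of devices per day


def reported_id(device: int, day: int) -> str:
    """ID a device reports on a given day; the generation grows at each drift."""
    generation = (device + day) // DRIFT_PERIOD
    return f"{device}-{generation}"


seen_ids = set()
installs_per_day = []
ids_by_day = []
for day in range(NUM_DAYS):
    todays_ids = {reported_id(d, day) for d in range(NUM_DEVICES)}
    # An "install" is an ID that has never been seen before.
    installs_per_day.append(len(todays_ids - seen_ids))
    seen_ids |= todays_ids
    ids_by_day.append(todays_ids)

true_installs = NUM_DEVICES
measured_installs = sum(installs_per_day)

# Retention of the day-0 cohort on the last day, measured by ID.
measured_retention = len(ids_by_day[0] & ids_by_day[-1]) / len(ids_by_day[0])

print("installs per day:", installs_per_day)
print("true installs:", true_installs, "measured installs:", measured_installs)
print("measured retention of day-0 cohort:", measured_retention)
```

In this toy setup no device ever leaves, so the true retention is 100% and the true install count is 100; every extra install and every "lost" user in the output is produced by drift alone.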


Android Platform

As for the Android platform, the choice of ID has always been a headache due to the openness of the ecosystem.

(1) Single ID

As mentioned earlier, neither the IMEI nor the MAC address is a good ID. The MAC address in particular is almost unusable.

(2) Combination ID

Some developers choose to combine multiple IDs into a single combination ID (CID), such as:

CID = MD5(imei + mac + android_id)



From the previous analysis it is easy to see that a combination ID greatly reduces conflicts but amplifies drift: the drift of any one source ID causes the combination ID to drift.
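A minimal Python sketch of this trade-off, using the standard `hashlib` module. The function name `combination_id` and all the sample IMEI, MAC, and Android ID values are made up for illustration.

```python
import hashlib


def combination_id(imei: str, mac: str, android_id: str) -> str:
    """CID = MD5(imei + mac + android_id), returned as a hex string."""
    return hashlib.md5((imei + mac + android_id).encode("utf-8")).hexdigest()


# Two devices that share a cloned IMEI still get different CIDs,
# because their other source IDs differ (less conflict).
cid_a = combination_id("355065053311001", "aa:bb:cc:00:00:01", "android-a")
cid_b = combination_id("355065053311001", "aa:bb:cc:00:00:02", "android-b")
print(cid_a != cid_b)

# The same device gets a new CID as soon as any one source ID changes,
# here the MAC address (more drift).
cid_a_new_mac = combination_id("355065053311001", "aa:bb:cc:12:34:56", "android-a")
print(cid_a != cid_a_new_mac)
```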

Developers should try to avoid the CID; if one must be used, avoid including the MAC address in it. If you are already using a CID, be sure to persist it as a cookie-like ID in the next release, and regenerate it only if the stored value is lost. This strategy preserves the continuity of the ID as far as possible while mitigating the impact of drift.
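The persistence strategy can be sketched as follows. This is an illustration only: it stores the CID in a plain file, whereas a real Android app would use its own storage mechanism, and the names `compute_cid` and `load_or_create_cid` are hypothetical.

```python
import hashlib
from pathlib import Path


def compute_cid(imei: str, mac: str, android_id: str) -> str:
    """CID = MD5(imei + mac + android_id), returned as a hex string."""
    return hashlib.md5((imei + mac + android_id).encode("utf-8")).hexdigest()


def load_or_create_cid(store: Path, imei: str, mac: str, android_id: str) -> str:
    """Return the stored CID if there is one; otherwise compute and store it."""
    if store.exists():
        saved = store.read_text(encoding="utf-8").strip()
        if saved:
            # A stored value wins, even if the source IDs have drifted since.
            return saved
    cid = compute_cid(imei, mac, android_id)
    store.write_text(cid, encoding="utf-8")
    return cid
```

With this approach, a later change of the MAC address does not change the reported ID as long as the stored value survives; the CID is recomputed only when the file is missing or empty.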

