How to do an algorithm to estimate a time to crawl to an app daily download volume

Source: Internet
Author: User
Keywords Php
Suppose I go to a certain website every day, such as a pea pod that crawls a download of an app, and if it's a fixed time crawl every time, I'm sure I can catch every download.

But I can not do a fixed time every day crawl, I can only do not fixed time to crawl once a day, how to use the algorithm to estimate the number of downloads a day.

Reply content:

Suppose I go to a certain website every day, such as a pea pod that crawls a download of an app, and if it's a fixed time crawl every time, I'm sure I can catch every download.

But I can not do a fixed time every day crawl, I can only do not fixed time to crawl once a day, how to use the algorithm to estimate the number of downloads a day.

Math problems

How to estimate the results when the data is not enough? The first step, make assumptions, limit it.

1) The simplest kind of hypothesis, the average number of user downloads between times of every two fetches.

Yesterday, today, tomorrow the total amount of capture, s0,s1,s2;
Yesterday, today, tomorrow crawl time Point, t0,t1,t2;

So today's total = (T1-today 0 O'Clock)/(T1-T0) (S1-S0) + (today 24 o'clock-T1)/(T2-t1 ) (S2-S1);

This value is sufficient for the general estimate of the total daily download.
But the disadvantage is that the user download frequency at the acquisition point mutation is not common sense, if the app is new or meet the promotion or hit the point of outbreak, this estimate of the single-day deviation will be very large.

2) A more detailed hypothesis: The user download times change is smooth, will not mutate.

You can refer to the Bezier formula for the problem of multi-point even smoothing curves. I will not do the derivation, write only a few features.

When the data is long enough, the curve of the simulation is smooth, but when the data is small, the detail data just looks real, but it is not worth reference.
Multipoint curves, adding new points (and collecting data for a day), the overall curve will change, unstable.

In summary, use the first kind, who let the data is not enough.

  • Related Article

    Contact Us

    The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

    If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

    A Free Trial That Lets You Build Big!

    Start building with 50+ products and up to 12 months usage for Elastic Compute Service

    • Sales Support

      1 on 1 presale consultation

    • After-Sales Support

      24/7 Technical Support 6 Free Tickets per Quarter Faster Response

    • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.