Suppose I visit a certain website every day, for example crawling an app's download count from Wandoujia (literally "Pea Pod", an Android app store). If I crawled at the same fixed time every day, I could get each day's downloads exactly, as the difference between two consecutive snapshots.
But I can't crawl at a fixed time; I can only crawl once a day at an unpredictable time. What algorithm can I use to estimate the number of downloads per day?
Answer:
This is essentially a math problem.
How do you estimate a result when the data is insufficient? The first step is to make assumptions that constrain the problem.
1) The simplest assumption: downloads are spread evenly over the interval between every two consecutive crawls, i.e. the download rate is constant between crawls.
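Let $s_0, s_1, s_2$ be the cumulative download totals captured yesterday, today, and tomorrow, and $t_0, t_1, t_2$ the corresponding crawl times. Then today's total is
$$D_{\text{today}} = \frac{t_1 - T_{\text{start}}}{t_1 - t_0}\,(s_1 - s_0) + \frac{T_{\text{end}} - t_1}{t_2 - t_1}\,(s_2 - s_1)$$
where $T_{\text{start}}$ and $T_{\text{end}}$ are today's 00:00 and 24:00. Each interval's downloads are split between the two days in proportion to how much of the interval falls on each day.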
For a general estimate of daily downloads, this is accurate enough.
The drawback is that it implies the download rate jumps abruptly at each crawl time, which is not realistic. If the app is new, is being promoted, or suddenly takes off, the estimate for an individual day can be far off.
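A minimal Python sketch of the constant-rate estimate above, assuming crawl times are measured in hours on one continuous clock (so today runs from `day_start` to `day_end = day_start + 24`) and that `s0`, `s1`, `s2` are cumulative totals. The function name and argument order are illustrative only, not part of the original answer.

```python
def estimate_daily_downloads(t0, s0, t1, s1, t2, s2, day_start, day_end):
    """Estimate downloads between day_start and day_end (e.g. today's 00:00
    and 24:00), assuming a constant download rate between consecutive crawls.

    t0, t1, t2: crawl times yesterday, today, tomorrow (hours, one clock)
    s0, s1, s2: cumulative download totals observed at those crawls
    """
    if not (t0 < t1 < t2 and t0 <= day_start <= t1 <= day_end <= t2):
        raise ValueError("need one crawl before, one during and one after the day")
    # Share of the t0..t1 interval that falls inside today, at a constant rate.
    before = (t1 - day_start) * (s1 - s0) / (t1 - t0)
    # Share of the t1..t2 interval that falls inside today, at a constant rate.
    after = (day_end - t1) * (s2 - s1) / (t2 - t1)
    return before + after


if __name__ == "__main__":
    # Hours counted from yesterday 00:00, so today is [24, 48].
    # Crawls: yesterday 15:00, today 09:00, tomorrow 20:00.
    print(estimate_daily_downloads(15, 10000, 33, 10900, 68, 12300, 24, 48))  # prints 1050.0
```

Here half of the 900 downloads between the first two crawls and 15/35 of the 1,400 downloads between the last two are assigned to today, giving 450 + 600 = 1,050. Because the rate is assumed constant inside each interval, a burst anywhere between two crawls is smeared across the whole interval, so part of it lands on the wrong day; that is the single-day error described above.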
2) A more refined assumption: the download rate changes smoothly, with no sudden jumps.
For fitting a smooth curve through many points, you can refer to Bézier curves. I won't go through the derivation; I'll just list a few properties.
When the data series is long enough, the fitted curve is smooth; but when there are only a few data points, the detailed values merely look plausible and aren't worth relying on.
With a multi-point curve, adding a new point (one more day of collected data) changes the whole curve, so the estimates are unstable.
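A Bézier curve does not in general pass through its control points, so this sketch swaps in a different smooth fit: monotone piecewise cubic Hermite interpolation (PCHIP, with Fritsch-Butland slopes) of the cumulative totals. It passes through every crawl and never decreases when the totals don't, so a day's downloads can be read off as the curve's value at 24:00 minus its value at 00:00. The function names are hypothetical, times are assumed to be hours on one clock and strictly increasing, and both midnights must lie within the crawled range (no extrapolation).

```python
from bisect import bisect_right


def pchip_slopes(xs, ys):
    """Slopes at each crawl for a monotone piecewise cubic Hermite curve
    (weighted harmonic mean of neighbouring rates, as in PCHIP)."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    d = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]  # average rates
    m = [0.0] * n
    m[0], m[-1] = d[0], d[-1]  # one-sided slopes at the two ends
    for i in range(1, n - 1):
        if d[i - 1] * d[i] <= 0:
            m[i] = 0.0  # a neighbouring interval is flat: slope 0 keeps the curve monotone
        else:
            w1 = 2 * h[i] + h[i - 1]
            w2 = h[i] + 2 * h[i - 1]
            m[i] = (w1 + w2) / (w1 / d[i - 1] + w2 / d[i])
    return m


def pchip_eval(xs, ys, m, x):
    """Evaluate the cubic Hermite curve at x, with xs[0] <= x <= xs[-1]."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the crawled range; no extrapolation")
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)  # segment containing x
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1  # Hermite basis functions
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * ys[i] + h10 * h * m[i] + h01 * ys[i + 1] + h11 * h * m[i + 1]


def estimate_daily_smooth(times, totals, day_start, day_end):
    """Downloads between day_start and day_end, read off the smooth curve."""
    m = pchip_slopes(times, totals)
    return (pchip_eval(times, totals, m, day_end)
            - pchip_eval(times, totals, m, day_start))


if __name__ == "__main__":
    times = [15, 33, 68]            # crawl hours, counted from yesterday 00:00
    totals = [10000, 10900, 12300]  # cumulative downloads seen at each crawl
    print(estimate_daily_smooth(times, totals, 24, 48))
    # One more crawl (with a jump) reshapes the curve and today's estimate.
    print(estimate_daily_smooth(times + [90], totals + [16000], 24, 48))
```

The second call illustrates the instability mentioned above: the new point changes the slope at the previous last crawl, which changes the curve, and therefore today's estimate. If SciPy is available, `scipy.interpolate.PchipInterpolator` provides a tested implementation of the same kind of interpolant.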
In short, use the first method; after all, there isn't enough data for anything finer.