First, the business logic.
In the flatMap phase, the current metric checks Redis to see whether each record belongs to a new user. If it does, the user is counted in the statistics and then added to our Redis store ...
The logic itself is very simple ... The trap is that the RDD produced by flatMap is then used by two different actions ...
And then ... I found that the second action never counted a single new user ....
Can you see the cause just from that description? It took me a whole day to find out why.
flatMap is a transformation, so it is lazy and is executed again for every action. I run two actions, so flatMap runs twice. By the second run every user has already been written to Redis by the first run, so of course there are no new users left to find ...
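To make the double execution concrete, here is a minimal sketch in plain Python. It is not Spark and not Redis: `LazyDataset` is a hypothetical toy stand-in for an RDD that recomputes itself on every action, and the `seen_users` set is a hypothetical stand-in for the Redis store. All names here are made up for illustration.

```python
from typing import Callable, Iterable, List, Set


class LazyDataset:
    """Toy stand-in for an RDD: it remembers how to compute itself and caches nothing."""

    def __init__(self, compute: Callable[[], Iterable[str]]):
        self._compute = compute

    def flat_map(self, fn: Callable[[str], Iterable[str]]) -> "LazyDataset":
        # Transformation: only records the function, nothing runs yet.
        parent = self._compute
        return LazyDataset(lambda: (out for item in parent() for out in fn(item)))

    def collect(self) -> List[str]:
        # Action: re-runs the whole chain of transformations from the source.
        return list(self._compute())

    def count(self) -> int:
        # Action: also re-runs the whole chain from the source.
        return sum(1 for _ in self._compute())


seen_users: Set[str] = set()  # hypothetical stand-in for the Redis store


def keep_new_users(user_id: str) -> List[str]:
    # Side effect inside the transformation: check the store, then write to it.
    if user_id in seen_users:
        return []
    seen_users.add(user_id)
    return [user_id]


batch = LazyDataset(lambda: ["u1", "u2", "u3"])
new_users = batch.flat_map(keep_new_users)

first_action = new_users.collect()  # flat_map runs, all three users get recorded
second_action = new_users.count()   # flat_map runs again, nobody is new any more

print(first_action)   # ['u1', 'u2', 'u3']
print(second_action)  # 0

# One possible way around it (an assumption, not something the original setup used):
# run the side-effecting step once and let both consumers share that result.
seen_users.clear()
materialized = new_users.collect()
first_result = list(materialized)
second_result = len(materialized)
print(second_result)  # 3
```

In real Spark the closest equivalent of the last part would be calling `cache()` or `persist()` on the RDD before the two actions. That usually avoids the second run, but evicted or lost partitions can still be recomputed, so keeping side effects out of transformations is the safer habit.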
What a trap.
This trap nearly killed me ....
So that is a look at a Spark Streaming problem where validating data against Redis produced incorrect results.