Big data is handled in two main ways: memory-based stream processing and hard disk-based storage processing.
Stream processing is like building a sluice in front of the data. The data flows through the gate, is filtered and analyzed for valuable content, and is then discarded, never to be used again.
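The sluice idea can be sketched in a few lines of code: records pass through a filter one at a time, are analyzed on the fly, and are never kept. This is a minimal illustration, not any particular framework's API; the names `sensor_readings` and `is_valuable` are made up for the example.

```python
def sensor_readings():
    """Simulate a data stream (here a small finite sample stands in
    for an unbounded flow of incoming records)."""
    for value in [3, 18, 7, 42, 1, 25]:
        yield value

def is_valuable(value):
    # The "gate" of the sluice: keep only readings above a threshold.
    return value > 10

total = 0
count = 0
for reading in sensor_readings():   # each record passes the gate exactly once
    if is_valuable(reading):
        total += reading            # analyze on the fly
        count += 1
    # the record is now gone -- nothing is stored for later reuse

print(count, total)                 # -> 3 85
```

Note that once the loop has consumed a reading, there is no way to go back and ask a different question of the same data, which is exactly the one-shot nature of streaming described above.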
Storage processing is like building a reservoir. The data is first poured into the reservoir and kept there; when needed, you go into the reservoir, sift through it, and analyze it to find the valuable content. Because the water never leaves the reservoir, it can be used again next time.
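The reservoir idea can be sketched the same way: write the data to disk first, then read it back as many times as needed for different analyses. The file name and the JSON-lines record format here are illustrative assumptions, not a specific system's storage layout.

```python
import json
import os
import tempfile

records = [{"id": i, "value": v} for i, v in enumerate([3, 18, 7, 42, 1, 25])]

# Fill the "reservoir": persist every record to disk before analyzing.
path = os.path.join(tempfile.mkdtemp(), "reservoir.jsonl")
with open(path, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

def scan(path):
    """Read the stored data back, one record at a time."""
    with open(path) as f:
        for line in f:
            yield json.loads(line)

# First analysis: sift out the valuable records.
valuable = [r["value"] for r in scan(path) if r["value"] > 10]

# A second, different analysis over the SAME stored data --
# possible only because the data still sits in the reservoir.
total = sum(r["value"] for r in scan(path))

print(valuable, total)   # -> [18, 42, 25] 96
```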
Storage-based processing can therefore be repeated: the same data can be used again and again. However, the mechanical characteristics of hard disks make processing slow and throughput low, although there are now various optimizations for hard drives.
Because stream processing works on data in memory, and memory is many times faster than a hard disk, its processing rate is far higher than the storage approach. But because the data lives in memory, which loses its contents when power is cut, each piece of data can only be used once. Streaming is therefore usually a one-shot, disposable process.
Among big data products, Spark takes the streaming approach, while Laxcus and Hadoop take the storage-based approach.
Those, in short, are the two ways to deal with big data.