The terms stream processing, real-time computing, ad hoc query, offline computing, and real-time query come up often in data processing. Here we briefly sort out how they differ.
Stream processing and real-time computing both compute when the data changes. They suit scenarios where the computation must keep up with incoming data and results must be returned in real time, typically within seconds. Yahoo's S4 and Twitter's Storm are examples of stream processing and real-time computing.
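To make "compute when the data changes" concrete, here is a minimal sketch in plain Python. It is not how S4 or Storm are implemented; the class name `RunningCount` and the sample events are hypothetical and only illustrate that the result is updated as each event arrives.

```python
class RunningCount:
    """Keeps per-key counts up to date as each event arrives (illustrative only)."""

    def __init__(self):
        self.counts = {}

    def on_event(self, key):
        # Update the result immediately when a new event arrives,
        # instead of waiting for a batch job or for a query.
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


# Hypothetical event stream; in a real system events would arrive continuously.
stream = ["click", "view", "click"]

counter = RunningCount()
for event in stream:
    counter.on_event(event)
```

After every event the counts are already current, so a reader of `counter.counts` never has to wait for a recomputation.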
Ad hoc query and real-time query both compute at query time. A real-time query serves responses that change constantly and cannot be enumerated and stored in advance: each response has to be looked up in real time based on the user's input, so the latency requirement is high. Examples are queries against HBase, the in-memory database Redis, and MongoDB. Ad hoc query covers scenarios with lower real-time requirements. It is a way to handle temporary, custom needs that are not known ahead of time. Hive is an example: the requirements are uncertain, and each new question is answered by writing a SQL statement. On Hadoop, Impala addresses the real-time query requirement and is more efficient than Hive.
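The difference between the two kinds of query-time computation can be sketched as follows. This is an illustration under loud assumptions: a Python dict stands in for a key-value store such as Redis or HBase, and the standard-library `sqlite3` module stands in for a SQL engine such as Hive or Impala. The table `visits`, the sample rows, and the helper names `lookup_user` and `ad_hoc` are all hypothetical.

```python
import sqlite3

# Real-time query stand-in: a key lookup driven by user input.
# (A dict here; a real system would use a store such as Redis or HBase.)
profiles = {
    "u1": {"name": "Ann", "city": "Oslo"},
    "u2": {"name": "Bo", "city": "Lima"},
}


def lookup_user(user_id):
    # Fast lookup for one key; returns None if the user is unknown.
    return profiles.get(user_id)


# Ad hoc query stand-in: an in-memory SQLite table queried with
# whatever SQL the analyst writes at that moment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("u1", "home"), ("u1", "cart"), ("u2", "home")],
)


def ad_hoc(sql):
    # Runs an arbitrary, not-known-in-advance SQL statement.
    return conn.execute(sql).fetchall()


rows = ad_hoc("SELECT page, COUNT(*) FROM visits GROUP BY page ORDER BY page")
```

The lookup answers one narrow question quickly, while the SQL path accepts any question at the cost of scanning and aggregating data when the query runs.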
Offline computing is generally batch processing of stored data, for example with Hadoop MapReduce. Spark, an in-memory computing engine, is similar to MapReduce but keeps data in memory, which makes it more efficient.
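The batch model can be illustrated with a word count written in the map and reduce style. This is a single-process sketch in plain Python, not Hadoop or Spark code; the function names `map_phase` and `reduce_phase` and the sample input are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    # Map step: emit a (word, 1) pair for every word in the stored input.
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_phase(pairs):
    # Reduce step: sum the counts for each word.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


# The whole data set already exists before the job starts (offline input).
stored_lines = ["a b a", "b c"]

result = reduce_phase(map_phase(stored_lines))
```

The job reads the complete input and produces its result once at the end, which is the opposite of updating a result on every incoming event.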