Netflix recently open-sourced a tool called Suro, which the company uses to route data in real time from the hosts where it originates to the hosts where it is consumed. Suro plays an important role in Netflix's data pipeline, and the scale at which it operates is impressive.
Netflix's applications generate tens of billions of events per day. Suro collects these events before they are sent onward: part of the stream goes through Amazon S3 to Hadoop for batch processing, and another part goes through Apache Kafka to Druid and Elasticsearch for real-time analysis. According to the Netflix blog, the company is also considering how to have Suro support real-time processing engines such as Storm or Samza, so that machine learning can be performed on event data.
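The split described above, with one part of the event stream going to a batch sink and another to a real-time sink, can be pictured with a minimal sketch. This is not Suro's actual API or configuration format: the `ROUTES` table, the `InMemorySink` class, the `route` function and the `routing_key` field are all hypothetical names chosen for illustration, and the sinks are in-memory lists standing in for S3 and Kafka.

```python
# Illustrative sketch only: every name here is hypothetical, not part of Suro.

# Routing table: routing key -> names of the sinks that should receive the event.
ROUTES = {
    "batch": ["s3"],           # stand-in for the S3 -> Hadoop batch path
    "realtime": ["kafka"],     # stand-in for the Kafka -> Druid/Elasticsearch path
    "both": ["s3", "kafka"],   # an event may go to more than one sink
}


class InMemorySink:
    """Collects events in a list; a placeholder for a real S3 or Kafka writer."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def write(self, event):
        self.events.append(event)


def route(events, sinks, routes=ROUTES):
    """Deliver each event to every sink listed for its routing key.

    Returns the events whose routing key matched no route.
    """
    unrouted = []
    for event in events:
        targets = routes.get(event.get("routing_key"), [])
        if not targets:
            unrouted.append(event)
        for target in targets:
            sinks[target].write(event)
    return unrouted


sinks = {"s3": InMemorySink("s3"), "kafka": InMemorySink("kafka")}
unrouted = route(
    [
        {"routing_key": "batch", "payload": "nightly-report"},
        {"routing_key": "realtime", "payload": "playback-error"},
    ],
    sinks,
)
```

Keeping the routing table separate from the sink implementations means a new destination can be added without touching the code that produces events, which is the general appeal of placing a collector between applications and storage.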
People familiar with big data know that many of its technologies are tied to the companies that created them: Netflix created Suro, LinkedIn created Kafka and Samza, Twitter (through its acquisition of BackType) created Storm, and Metamarkets created Druid. The Suro blog post also acknowledges that Suro is based on the Apache Chukwa project and is similar to Apache Flume and Facebook's Scribe. Admittedly, the best known of all these projects is Hadoop.
Why companies have to build their own technology has long been a point of debate. Like many things in life, these tools generally grow out of a company's own needs, so the answer has to be worked out case by case. Storm, for example, is becoming a very popular stream-processing tool, but LinkedIn felt it needed something different and so created Samza. Netflix created Suro rather than using an existing technology largely because the company is a heavy user of cloud services (primarily AWS) but also relies on non-AWS technologies, including the Apache Cassandra database.
The ultimate winners of this wave of innovation are the mainstream users of these technologies: without having to recruit specialists in-house, a company can still benefit from these open-source projects. For example, we have seen Hadoop vendors trying to make frameworks such as Storm and Spark available to their enterprise customers. At the same time, we believe that Hadoop will certainly not be the last such technology. AWS has a great many users, after all, and they may want the capabilities that a technology like Suro provides without being tied to more AWS-launched services.