Abstract: The big data world is full of small, scrappy start-ups that use ingenuity and open-source software to build complex systems. Disney is by no means one of them, and this article tells the story of how this Fortune 100 company built a big data platform from scratch.
There is no doubt that Disney is a huge entertainment company, but when it comes to its big data platform, the entertainment giant looks more like a start-up. Many small companies, relying on determination and ingenuity, can build a distinctive big data platform with a small team using Hadoop, NoSQL databases and other open-source technology. For better or worse, when a company this large moves into big data, it plays by a completely different set of rules.
Arun Jacob, head of Disney's Big Data technology and Services Solutions team, introduced Disney's big data platform on Thursday at IE Group's Big Data Innovation Summit in Boston. Unlike many other companies, Disney chose to build its big data platform from scratch rather than buy one from a software vendor. High cost was an important factor, but the root of the final decision was flexibility.
Reducing, reusing, recycling
To deliver the most value to the company, Disney's big data platform has to be everything to everyone, which is a formidable task. At first, Jacob said, "We think of ourselves as a small advisory body, except that we have something to sell." But when a department wants to use the platform to develop a specific feature, Jacob's team acts immediately.
Architecturally, the platform is assembled from purpose-specific components, so the team can reroute data paths or swap out a component when needed. Disney's big data platform is built on Hadoop, Cassandra and MongoDB. The operations team can use the platform to view, analyze and index error messages; application developers get the high-throughput, low-latency data access they need; and the analysis team gets the high-latency data access it needs.
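The idea of swappable, purpose-specific components can be illustrated with a small sketch. This is not Disney's code: the `Store` interface, the `DataPlatform` router and the workload names are all hypothetical, and in-memory lists stand in for real back ends such as Cassandra or Hadoop.

```python
from typing import Dict, List, Protocol


class Store(Protocol):
    """Hypothetical minimal interface every storage component must satisfy."""

    def write(self, record: dict) -> None: ...

    def read_all(self) -> List[dict]: ...


class InMemoryStore:
    """In-memory stand-in for a real back end (assumption for illustration)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._records: List[dict] = []

    def write(self, record: dict) -> None:
        self._records.append(record)

    def read_all(self) -> List[dict]:
        return list(self._records)


class DataPlatform:
    """Routes each workload to whichever component is registered for it."""

    def __init__(self) -> None:
        self._routes: Dict[str, Store] = {}

    def register(self, workload: str, store: Store) -> None:
        # Registering again under the same workload replaces the component.
        self._routes[workload] = store

    def write(self, workload: str, record: dict) -> None:
        self._routes[workload].write(record)

    def read_all(self, workload: str) -> List[dict]:
        return self._routes[workload].read_all()


platform = DataPlatform()
platform.register("low_latency", InMemoryStore("cassandra-like"))
platform.register("batch_analytics", InMemoryStore("hadoop-like"))
platform.write("low_latency", {"event": "page_view"})

# Swap the low-latency component; callers keep using the same workload name.
platform.register("low_latency", InMemoryStore("replacement"))
```

Because callers address a workload name rather than a concrete store, replacing a component is a one-line change. A real migration would also have to move the existing data, which this sketch leaves out.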
Although Jacob relies on open-source software to keep costs down, he has a luxury most start-ups lack: a budget for outside support and unplanned purchases. When he needs support for the Hadoop cluster, he can call Cloudera. When the team needed to deploy Solandra (an open-source search engine built on Solr and Cassandra), he could simply have bought DataStax's Cassandra-based enterprise product, but he did not.
Flexibility is not free
The Solandra episode illustrates the trade-offs of relying on free open-source software. "You can work on open source projects late into the night, you can learn to run them, but that doesn't make any sense," Jacob said, adding that if you are willing to devote the time and energy, these difficulties can be overcome.
But a company of Disney's scale has more problems to overcome. Jacob says that although each deployment can address fault tolerance, high availability and security in its own way, the team ultimately has to find a way to deliver all of them.
The best for the public
Although the system can be built on open-source software that anyone can use, open source alone does not provide enough of a framework for a scalable, stable system, and the system must serve thousands of in-house developers of different types and skill levels. Jacob said a six-person start-up could easily spend a month learning Hadoop and then start using it to deploy a big data platform. For a big business, that is simply not feasible.
His team made deployment easier
To remove any excuse that business users cannot get their data into the system, the team made loading simple: users only need to point their files at a user-defined interface. Jacob said that although the platform already handles 5TB of data per day, many other types of data still need to be stored. Because the underlying technology is encapsulated, Jacob's team talks little about Hadoop and MongoDB and focuses instead on analysis and queries. The platform also provides client frameworks in several programming languages, so developers can interact with it without writing RESTful API calls themselves.
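A thin client library of the kind described above might look like the following sketch. Everything specific here is an assumption made for illustration: the `PlatformClient` class, the `/datasets/...` paths, the JSON payloads and the `platform.example` host are hypothetical and do not describe Disney's actual API. The transport is injectable, so the example runs without a network connection.

```python
import json
from typing import Callable, Optional
from urllib import request

# A transport takes (method, url, body) and returns the decoded JSON reply.
Transport = Callable[[str, str, Optional[dict]], dict]


def http_transport(method: str, url: str, body: Optional[dict]) -> dict:
    """Send one JSON request over HTTP using only the standard library."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class PlatformClient:
    """Hypothetical client that hides the REST calls behind plain methods."""

    def __init__(self, base_url: str, transport: Transport = http_transport) -> None:
        self._base_url = base_url.rstrip("/")
        self._transport = transport

    def load_file(self, dataset: str, path: str) -> dict:
        # Hypothetical endpoint: register a file for ingestion into a dataset.
        url = f"{self._base_url}/datasets/{dataset}/files"
        return self._transport("POST", url, {"path": path})

    def query(self, dataset: str, expression: str) -> dict:
        # Hypothetical endpoint: run a query against a dataset.
        url = f"{self._base_url}/datasets/{dataset}/query"
        return self._transport("POST", url, {"q": expression})


def fake_transport(method: str, url: str, body: Optional[dict]) -> dict:
    """Echo the request back instead of touching the network."""
    return {"method": method, "url": url, "body": body}


client = PlatformClient("https://platform.example/api/", transport=fake_transport)
reply = client.load_file("weblogs", "/data/2012-10-01.log")
```

Developers call `load_file` or `query` and never assemble a URL or an HTTP request themselves. Providing one such wrapper per language is one plausible way to offer the multi-language clients the article mentions.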
With that groundwork done, Jacob is devoting his energy to the platform itself, and he does not want it to duplicate what other data platforms already do. As tools for big data management keep improving, Jacob says he is still weighing whether to build or buy each new tool, and there is still time to change course. Building a custom tool is a good choice when there is no alternative, but it is sometimes wiser to buy something ready-made and save countless hours of effort.
For more technical detail on Disney's big data platform, Jacob recently presented a PDF document at the Cassandra Summit. Unfortunately the download link has been removed because of data confidentiality, so only excerpts from the PDF follow, in the hope that they are useful:
The role big data plays at Disney:
Data management platform:
Data management platform objectives:
Collecting, searching and analyzing application data:
Evolution of use cases:
Recommendation engine:
Postscript: Disney, in full The Walt Disney Company and named after its founder Walt Disney, is a large multinational corporation headquartered in Burbank whose businesses include entertainment production, theme parks, toys, books, video games and media networks. Pixar Animation Studios, Marvel Entertainment, Touchstone Pictures, Miramax, Buena Vista Home Entertainment, Hollywood Pictures, ESPN and ABC are all among its companies and brands.
While researching Disney for this article, I was astonished by what I found. Most people think of Disney as an animated film company; those who think a little further may think of Hong Kong Disneyland. I suspect that seeing a Hollywood film company on the list will surprise you as well. With the advent of cloud computing, big data has become a hot topic and, as one of cloud computing's most important components, is getting more and more attention. Now Disney has made a crossover move of its own and entered the big data field. Cloud computing and the film industry, however, were linked long ago: the exquisite visuals of Pixar's Oscar-nominated 3D film "Toy Story 3" owe a great deal to cloud computing. DreamWorks, the studio behind "Kung Fu Panda 2", which is familiar to Chinese audiences, has been moving toward cloud computing since 2003. Cloud computing provides a platform for producing, storing and processing many film and television productions, and it plays a key role in supplying the computing resources needed to make these films. We look forward to seeing Disney shine on its new platform. (Compiled by Wang Peng; reviewed by Bao)
(Responsible editor: Lu Guang)