Abstract: An idea from seven years ago grew into Twitter, today's popular social network and microblogging service. Twitter now has more than 200 million monthly active users, and about 500 million tweets are sent every day. Behind all of this is the support of a large number of open source projects.
Twitter, known as the "Internet SMS service", lets users post messages of no more than 140 characters. The idea came from Twitter co-founder Jack Dorsey, and seven years ago analysts dubbed it "the dumbest" idea. Today it is a social network and microblogging service that is popular worldwide, with 218.3 million monthly active users and about 500 million tweets sent every day, an average of roughly 5,800 tweets per second.
On November 7, 2013, Twitter was officially listed on the New York Stock Exchange at an offering price of 26 U.S. dollars; the stock opened 73% higher, at 45.1 U.S. dollars.
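The figures quoted above can be sanity-checked with a little arithmetic. The sketch below assumes exactly 500 million tweets spread evenly over a day, which is a simplification of "about 500 million", and uses the offering and opening prices as stated.

```python
# Back-of-the-envelope check of the figures quoted in the text.
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = 500_000_000  # assumption: "about 500 million" taken as exact
tweets_per_second = tweets_per_day / SECONDS_PER_DAY

offer_price = 26.0    # IPO offering price in U.S. dollars
opening_price = 45.1  # opening price in U.S. dollars
opening_gain = (opening_price - offer_price) / offer_price

print(f"average tweets per second: {tweets_per_second:,.0f}")   # 5,787
print(f"opening gain over offer price: {opening_gain:.0%}")      # 73%
```

The average comes out slightly below 6,000 tweets per second; peak rates would of course be higher than the daily average.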
Twitter can fairly be said to be built on open source projects. The company says that without open source software Twitter would not exist, and that every tweet users send and receive, on both mobile and PC, relies on open source software.
At Twitter, when planning a new project, engineers first weigh the requirements against the capabilities of existing open source projects, then tailor those projects to better meet demand. This is part of why Twitter has grown so quickly and has coped with ever-increasing traffic and request volume.
Open source projects used by Twitter
Twitter needs to handle more than 400 million tweets a day, plus a large number of timelines (all the tweets from the people a user follows), so the engineering effort is vast and complex. Twitter uses many open source projects, ranging from tools to libraries. Without these open source projects, its day-to-day operations would not be possible.
Let's see which open source projects are at work behind the tweets.
1. Analysis and search services
Twitter's search service supports more than 1 billion queries a day, and the open source projects behind it include:
Apache Cassandra: A distributed NoSQL database system that combines the fully distributed design of Amazon's proprietary Dynamo with the column-family data model of Google BigTable, making it a good fit for online social computing. The project was originally developed by Facebook, released as open source in 2008, and donated to the Apache Software Foundation.
Apache Hadoop: A distributed-systems infrastructure developed by the Apache Software Foundation. It lets applications harness the power of a cluster for high-speed computation and storage, and lets users write distributed programs without knowing the underlying details of distribution.
Apache Lucene: A full-text search engine toolkit. It gives software developers an easy-to-use library for adding full-text retrieval to a target system, or for building a complete full-text search engine on top of it.
Apache Pig: A large-scale data analysis platform based on Hadoop. It provides an SQL-like language called Pig Latin, and its compiler translates SQL-like data analysis requests into a series of optimized MapReduce operations. Pig offers a simple operating and programming interface for complex, massively parallel computation over large data sets.
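To make the Pig entry more concrete, here is a minimal sketch of what "translating an SQL-like request into MapReduce operations" can look like for a simple "count tweets per user" request. It is plain Python running in a single process, not Pig or Hadoop; the function names and sample records are hypothetical and only illustrate the map, shuffle, and reduce steps.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]
Pair = Tuple[str, int]

def map_phase(records: Iterable[Record]) -> List[Pair]:
    """Map: emit one (user, 1) pair per tweet record."""
    return [(record["user"], 1) for record in records]

def shuffle_phase(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical sample data, not real tweets.
tweets = [
    {"user": "alice", "text": "hello"},
    {"user": "bob", "text": "open source"},
    {"user": "alice", "text": "hello again"},
]

counts = reduce_phase(shuffle_phase(map_phase(tweets)))
print(counts)  # {'alice': 2, 'bob': 1}
```

On a real cluster the map and reduce steps would run in parallel on many machines, which is the part a platform such as Pig hides from the analyst.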
2. Server and Storage
Twitter needs to store the tweets that users send every day in a database and push them to other relevant users. The open source projects used in this process include:
Linux: Used mainly on Twitter's servers.
Memcached: Used primarily for Twitter's caching infrastructure; its role is to speed up dynamic web applications and reduce database load.
MySQL: A popular open source relational database, used by Twitter to store tweets.
Node.js: A JavaScript runtime for writing high-performance web servers, used at Twitter for queue processing (receiving tweets and writing them to the database). It lets the server handle each connection without blocking.
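The Memcached entry says a cache speeds up a web application and reduces database load. A common way to get that effect is the cache-aside pattern, sketched below. This is an illustration only: the dictionary stands in for a Memcached client, `TweetStore` stands in for the database, and both class names are hypothetical rather than anything from Twitter's code.

```python
from typing import Dict, Optional

class TweetStore:
    """Hypothetical stand-in for the relational database."""

    def __init__(self, rows: Dict[int, str]) -> None:
        self._rows = rows
        self.reads = 0  # how many times the "database" was queried

    def fetch(self, tweet_id: int) -> Optional[str]:
        self.reads += 1
        return self._rows.get(tweet_id)

class CachedTweetReader:
    """Cache-aside reads: check the cache first, fall back to the store."""

    def __init__(self, store: TweetStore) -> None:
        self._store = store
        self._cache: Dict[int, str] = {}  # stand-in for a Memcached client

    def get(self, tweet_id: int) -> Optional[str]:
        if tweet_id in self._cache:
            return self._cache[tweet_id]    # cache hit: no database read
        text = self._store.fetch(tweet_id)  # cache miss: read the database
        if text is not None:
            self._cache[tweet_id] = text    # populate the cache for next time
        return text

store = TweetStore({1: "hello world"})
reader = CachedTweetReader(store)
first = reader.get(1)   # miss, goes to the store
second = reader.get(1)  # hit, served from the cache
print(first, second, store.reads)  # hello world hello world 1
```

The second read never reaches the store, which is where the reduction in database load comes from. A production setup would also need expiry and invalidation, which this sketch leaves out.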
3. The Twitter engineer's toolbox
Apache Subversion: An open source version control system.
Git: A distributed version control system.
Eclipse: The well-known Java IDE.
Gerrit: A web-based code review and project management tool, primarily for projects that use the Git version control system.
Jenkins: A continuous integration engine designed to build and test software projects continuously and automatically, and to monitor scheduled jobs.
RSpec: A BDD (behavior-driven development) testing tool.
4. The programming languages and frameworks behind Twitter
OpenJDK: The open source implementation of Java. Twitter has moved some projects from Rails to Java.
Python: An efficient, dynamic, interpreted programming language.
Ruby and Ruby on Rails: Twitter was initially developed primarily in Ruby and Rails.
Scala: One of the main application programming languages used by Twitter; much of Twitter's infrastructure is written in Scala.
Clojure: A Lisp dialect that runs on the Java platform, bringing the power of Lisp to any place with a Java virtual machine. Twitter's big data processing system Storm is based on Clojure.
Drupal: An open source content management framework (CMF) written in PHP, made up of a content management system (CMS) and a PHP development framework. The Twitter developer community site is built on Drupal.
Sinatra: A lightweight, fast Ruby web development framework.
5. Twitter's front-end solutions
jQuery: The most widely used JavaScript library in the world.
LESS: A widely used CSS preprocessor that extends CSS with simple syntax and variables and can reduce the amount of CSS code.
MooTools: A concise, modular, object-oriented open source JavaScript framework that gives developers a cross-browser JavaScript solution.
Zepto.js: A lightweight JavaScript framework, primarily for mobile development.
6. Twitter service development frameworks
Twisted (Twisted Matrix): A Python framework for developing non-blocking, asynchronous network services and applications.
Netty: An asynchronous, event-driven network application framework and toolkit for rapidly developing high-performance, highly reliable network servers and clients. Netty currently serves as the communication module of Kestrel, Twitter's core queue.
Apache Thrift: An open source remote service invocation (RPC) framework that originated at Facebook. It uses an interface description language to define and create services, supports scalable cross-language service development, and includes a code generation engine for building efficient, seamless services in multiple languages.
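The Twisted and Netty entries both describe non-blocking, event-driven network services. The sketch below shows the core idea using Python's standard `asyncio` module, which is a substitute chosen for illustration and not the API of Twisted or Netty. The connections are simulated: `asyncio.sleep` stands in for waiting on network I/O, and the names and delays are made up.

```python
import asyncio
from typing import List, Tuple

async def handle_connection(name: str, io_delay: float, log: List[str]) -> str:
    """Handle one simulated connection without blocking the event loop."""
    log.append(f"{name} started")
    await asyncio.sleep(io_delay)  # stand-in for waiting on network I/O
    log.append(f"{name} finished")
    return f"{name} ok"

async def serve() -> Tuple[List[str], List[str]]:
    """Run two simulated connections concurrently on one event loop."""
    log: List[str] = []
    results = await asyncio.gather(
        handle_connection("conn-1", 0.05, log),  # slow connection
        handle_connection("conn-2", 0.01, log),  # fast connection
    )
    return list(results), log

results, log = asyncio.run(serve())
print(results)
print(log)
```

Both handlers start before either finishes, and the fast connection completes first even though it was accepted second. A single thread can therefore serve many connections as long as each handler yields while it waits.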
Open source projects from Twitter
Twitter has benefited a great deal from the open source community and has been giving back in turn. It has open-sourced much of its infrastructure and tooling so that other companies and developers can build on these projects and reach their goals faster, without reinventing the wheel.
1. Big data processing
Scalding: A Scala API for Cascading. Cascading is an API built on Hadoop for creating complex, fault-tolerant data-processing workflows. It abstracts away cluster topology and configuration, letting developers quickly build complex distributed applications without thinking about the underlying MapReduce jobs.
Summingbird: Lets developers write MapReduce programs in a style close to native Scala or Java and run them on most well-known distributed MapReduce platforms, including Storm and Scalding.
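The Summingbird entry describes writing a job once and running it on different platforms, including a streaming one (Storm) and a batch one (Scalding). The sketch below illustrates that idea in plain Python and is not Summingbird's API. It assumes the job's aggregation step is associative, so the same per-event logic and merge function give the same answer whether the data arrives all at once or one event at a time. All names and sample tweets are hypothetical.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, Iterator

def to_counts(tweet: str) -> Counter:
    """The job's per-event logic: count the words in one tweet."""
    return Counter(tweet.lower().split())

def merge(left: Counter, right: Counter) -> Counter:
    """Associative merge of two partial results."""
    return left + right

def run_batch(tweets: Iterable[str]) -> Counter:
    """Batch mode: process the whole data set in one pass."""
    return reduce(merge, (to_counts(t) for t in tweets), Counter())

def run_streaming(tweets: Iterable[str]) -> Iterator[Counter]:
    """Streaming mode: yield an updated running total after every event."""
    total: Counter = Counter()
    for tweet in tweets:
        total = merge(total, to_counts(tweet))
        yield total

# Hypothetical sample data.
tweets = ["open source rocks", "Twitter loves open source"]

batch_result = run_batch(tweets)
stream_result = list(run_streaming(tweets))[-1]
print(batch_result == stream_result)  # True
```

Only `run_batch` and `run_streaming` differ; `to_counts` and `merge` are shared, which is the part a developer would write once.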
2. Front-end projects
Bootstrap: A toolkit for front-end development that includes base CSS and HTML components for typography, forms, buttons, tables, grids, navigation, and more.
TwUI: A hardware-accelerated UI framework for the Mac, inspired by UIKit.
Typeahead.js: A fast, full-featured autocomplete library.
Hogan.js: A compiler for the Mustache templating language.
3. Back-end services
Twitter MySQL: Twitter's branch of MySQL.
Parquet: A columnar storage format used with Hadoop inside Twitter. It is available to any project in the Hadoop ecosystem and supports efficient compression of columnar data, regardless of data processing framework, data model, or programming language.
Finagle: A library for building asynchronous RPC servers and clients in Java, Scala, or any other JVM language, used primarily for Twitter's back-end services.
Iago: A load generator used to load-test a service with traffic before it is officially released.
Twemproxy: A fast, lightweight proxy server for Memcached and Redis.
Zipkin: A distributed tracing system. Twitter uses it to collect monitoring data from its various services and to provide a query interface over that data.
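As a rough illustration of the Zipkin entry, the sketch below shows the general shape of distributed tracing: each service records a timed span tagged with a shared trace ID, and a collector lets you query all spans for one request. It is a single-process toy with hypothetical class and service names, not Zipkin's data model or API.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Dict, Iterator, List

@dataclass
class Span:
    trace_id: str       # shared by every span that belongs to one request
    service: str        # which service did the work
    duration_ms: float  # how long the work took

class Collector:
    """Hypothetical in-memory collector with a minimal query interface."""

    def __init__(self) -> None:
        self._spans: Dict[str, List[Span]] = {}

    @contextmanager
    def span(self, trace_id: str, service: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the span when the timed block exits, even on error.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self._spans.setdefault(trace_id, []).append(
                Span(trace_id, service, elapsed_ms)
            )

    def query(self, trace_id: str) -> List[Span]:
        """Return all recorded spans for one trace, in completion order."""
        return list(self._spans.get(trace_id, []))

collector = Collector()
trace_id = uuid.uuid4().hex

# One request that passes through two (made-up) services.
with collector.span(trace_id, "timeline-service"):
    with collector.span(trace_id, "tweet-store"):
        time.sleep(0.01)  # stand-in for real work

spans = collector.query(trace_id)
services = [s.service for s in spans]
print(services)  # ['tweet-store', 'timeline-service']
```

The inner call is recorded first because it finishes first, and the outer span's duration includes the inner one. In a real deployment the trace ID would travel between machines with each request.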
4. Twitter infrastructure common libraries
Commons: Twitter's common libraries for Python and the JVM.
Util: Twitter's reusable code libraries.
Cassovary: A simple JVM-based library for processing large graphs.
5. Projects open-sourced after acquiring other companies
Twitter has also acquired companies and released their software as open source.
Storm: A real-time data processing framework similar to Hadoop. It was originally developed by BackType, which was later acquired by Twitter; Twitter now uses Storm as its real-time analytics system.
All of Whisper Systems' projects: Whisper Systems is a mobile security start-up that delivers enterprise-class security and management solutions for users of Android phones and tablets. Twitter bought the company in December 2011 and then announced that it would gradually open-source all of Whisper Systems' software.
More open source projects: http://twitter.github.io/
The open source atmosphere within Twitter
1. More freedom than at Google
Although Google has a large number of open source projects, it is not as thoroughly open as Twitter. In its data centers, for example, Google keeps a great deal secret, whereas Twitter is much more open and gives staff more room to experiment freely.
At some large companies, the software and systems in use are fairly fixed, and employees must build on that infrastructure. According to Twitter employees, Twitter lets them try new and different things, and even lets them use different languages and open source projects to refactor some of Twitter's services.
Google's "20% time" was long admired, but that benefit has now been cancelled. At Twitter, the company holds a Hack Week every quarter, during which employees can spend a week on projects that need not be related to their day-to-day responsibilities.
2. Open source technology training inside the company
In August 2013, Twitter acquired Marakana, a company dedicated to open source technology training, and set up Twitter University. Its main aim is to give in-house staff richer training resources, and the company also hopes it will attract better engineering talent.
Twitter will continue to open up to the public and will also put some of its educational resources online, such as Scala School (scala_school), a series of tutorials on the Scala programming language.
Twitter's support for open source foundations
Twitter also sponsors a number of open source foundations and organizations with money and code.
Ada Initiative: An organization that supports women's participation in open source technology and culture.
The Apache Software Foundation: Twitter engineers are also involved in some of the Apache Software Foundation projects.
Eclipse Foundation
JCP: The Java Community Process, primarily responsible for developing Java specifications and standards.
Linux Foundation: Responsible for coordinating and promoting the development of Linux.
OIN: The Open Invention Network, a body designed to reduce the pressure of patent lawsuits on Linux developers.
OpenJDK: The open source implementation of Java.
Summary
Twitter created an "open source office" in 2011 to support the open source organizations that are critical to Twitter. This alone shows how important open source is to Twitter.
"If you spend a bit of energy in the open source community, you will realize the positive impact that openness has on the whole world. Twitter has an open mindset and a strong open source environment, and every employee has a chance to get involved," says Chris Aniszczyk, head of open source at Twitter. At the same time, Twitter is thankful for the great work done by the open source community and will keep a healthy relationship with it.
Turning to the domestic scene, some large Internet companies are also starting to pay attention to open source: while using open source projects to build basic services, they do not forget to give back to the open source community. The "Enterprise Open Source Series" will also focus on domestic open source companies and guide readers through the emerging domestic open source ecosystem.
For more information, refer to the presentation given at LinuxCon Europe 2013 by Chris Aniszczyk, Twitter's head of open source, and to Twitter's open source site.
Original link: http://blog.ithomer.net/2014/04/twitter-tweets-behind-open-source/