Taobao's Ten-Year Technology Journey

Recently I was lucky enough to borrow Mr. Ziliu's "Taobao Technology This Decade" from the school library. I have read part of it and was deeply moved.

I. Taobao's core technology

Taobao's core technology ranks among the best in China and abroad (these are 2011 figures):

the largest distributed Hadoop cluster in China (Yunti, "cloud ladder": 2,000 nodes, 24,000 CPU cores, 48,000 GB of memory, 40 PB of storage);

more than 80 distributed CDN nodes that automatically serve each user from the nearest node and carry over 800 Gbps of traffic, enough to overwhelm a whole city's network;

a search engine second only to Baidu's, searching billions of items;

the world's largest e-commerce platform;

top-tier load balancing, top-tier distributed systems and top-tier Internet thinking;

diverse and extremely stable features and operations, a rich ecosystem, advanced data mining technology... and much more.

II. The birth of Taobao

On April 7, 2003, Jack Ma secretly summoned 10 Alibaba employees to a bare, unfinished apartment in Hangzhou. He asked them to build a C2C website in about a month.

The answer, of course, was to buy one. They bought a LAMP-based site called PhpAuction, an auction site developed by Americans, and modified it before putting it into use. (I once built my own blog on front-end pages that someone else had developed, so I know how convenient, and how lazy, it is to use other people's code -_-. Still, I believe predecessors such as Xuzhu, Sanfeng and Dolong were fully capable of writing their own site. Jack Ma was simply pushing them too hard on time.)

At the time, deep-pocketed eBay was swaggering through China, and SARS was rampant, which may have given people a new appreciation of online shopping. Taobao deliberately kept a low profile. Even Alibaba's own employees did not know it was their company's product.

Taobao employees actively answered users' questions and worked from early morning until dark. Their way of exercising was doing handstands.

Taobao's features kept improving: listing, management, search, item details, purchasing and so on. The number of servers grew to three. Because of the large amount of data, Taobao's search was very slow (it used LIKE matching...), so iSearch, the predecessor of Alibaba's search engine, was brought over.

At that time MySQL's default storage engine, MyISAM, locks whole tables. This caused long read/write lock waits and similar problems, so incidents were still frequent.

At the end of 2003, Taobao had 230,000 registered users and 310,000 page views (PV) per day, with turnover of 33.71 million yuan over the half year.

III. Upgrading Taobao

Clearly MySQL could not handle that much traffic, and the database became a bottleneck. Fortunately, Alibaba's DBA team was strong enough to replace MySQL with Oracle.

Oracle already had a powerful design for concurrent access: connection pooling. Taking a connection from a pool is much cheaper than opening a separate connection each time. However, PHP had no official support for connection pooling. So Dolong searched with Google (not Baidu) and found an open-source tool, SQL Relay, which temporarily resolved the database software bottleneck.
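
The cost argument can be made concrete with a minimal in-process connection pool in Python. This is a sketch under stated assumptions. It is not SQL Relay, which runs as a separate proxy process, and it is not Oracle's own pooling. ConnectionPool and the factory callable are hypothetical names, and the factory stands in for whatever actually opens a database connection.

```python
import queue
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ConnectionPool(Generic[T]):
    """Minimal fixed-size pool: connections are opened lazily up to max_size
    and handed out again after release instead of being reopened."""

    def __init__(self, factory: Callable[[], T], max_size: int = 4) -> None:
        self._factory = factory  # hypothetical: opens one real DB connection
        self._max_size = max_size
        self._idle = queue.LifoQueue()  # released connections ready for reuse
        self._created = 0
        self._lock = threading.Lock()

    def acquire(self, timeout: Optional[float] = None) -> T:
        # Cheap path: reuse an idle connection if one is available.
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        # Expensive path: open a new connection while under the limit.
        with self._lock:
            if self._created < self._max_size:
                conn = self._factory()
                self._created += 1
                return conn
        # Pool exhausted: wait for a release (raises queue.Empty on timeout).
        return self._idle.get(timeout=timeout)

    def release(self, conn: T) -> None:
        self._idle.put(conn)

    @property
    def created(self) -> int:
        return self._created
```

A new connection is opened only when no idle connection is available and the cap has not been reached. Otherwise callers wait for a release. This keeps a burst of requests from opening a large number of expensive connections at once.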

The hardware was not enough either. Alibaba bought NAS storage (and later EMC's low-end storage, which had serious latency). Combined with Oracle's high-performance RAC, this meant hardware capacity was temporarily no longer a problem.

Open source is a good thing, but using it boldly is also a process of trial and error. SQL Relay frequently caused deadlocks, so engineers had to restart the service periodically. The book's description makes clear how hard Taobao's engineers worked.

Taobao did not want to stop at simply providing a trading site for sellers and buyers. It also needed a comprehensive third-party system to guarantee the security of transactions between them, and so Alipay was born. A bigger headache was the banks. Many of them had opened online banking interfaces, but they could not even guarantee whether money would actually be deducted after a payment succeeded. Engineers had to reconcile the accounts painstakingly...

To make communication between users easier, Taobao developed an IM tool, Wangwang. Buyers and sellers used it, and Alibaba staff also used Wangwang internally.

IV. The first milestone

Because SQL Relay's problems were so serious, in 2004 Taobao finally made an epoch-making decision: rewrite the site in Java (applause~~~~).

Taobao invited senior engineers from Sun to help design the Java architecture. But how do you change the programming language without disrupting the live site? The answer was module-by-module replacement. You rewrite one module, open a new domain name for it and connect it up, while all other modules stay unchanged. Once every module has been rewritten, the original domain name is retired.
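
The module-by-module switchover can be sketched as a router that sends each request either to a rewritten module or to the legacy site. This is a hypothetical illustration and MigrationRouter is an invented name. The book describes a separate domain name for each new module. This sketch instead keys modules on the first URL path segment, purely to keep the example self-contained.

```python
from typing import Callable, Dict, Iterable

Handler = Callable[[str], str]


class MigrationRouter:
    """Sends a request to the rewritten module if one exists, else to the legacy site."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy  # at first the old PHP site handles everything
        self._migrated: Dict[str, Handler] = {}  # module name -> new Java handler

    def migrate(self, module: str, handler: Handler) -> None:
        """Switch one module over; every other module keeps using the legacy site."""
        self._migrated[module] = handler

    def handle(self, path: str) -> str:
        # Assumption: the module is the first path segment, e.g. "/search/phone".
        module = path.strip("/").split("/", 1)[0]
        return self._migrated.get(module, self._legacy)(path)

    def fully_migrated(self, modules: Iterable[str]) -> bool:
        """True once every listed module has a new handler, so the old site can retire."""
        return set(modules).issubset(self._migrated)
```

Users keep hitting the same entry point throughout, and each module flips over independently. This is what allows the rewrite to happen without a big-bang cutover.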

Frameworks used:

Taobao's architects built their own MVC framework, WebX, based on Jakarta Turbine.

Sun insisted on EJB for the control layer (presumably only they could really master EJB).

iBATIS served as the persistence layer.

Together these produced a scalable, efficient Java EE application. BTW, Sun engineers also designed Alipay with the same architecture.

After the Sun experts left, Alibaba's data storage hit a bottleneck again. So it reluctantly bought an IBM minicomputer (I'd guess it cost at least several million...). That is how the legend of IOE (IBM + Oracle + EMC) came about.

At the end of 2004, Taobao had 4 million registered users and 40 million PV per day, with total turnover of 1 billion yuan.

V. Redoubling efforts

Oracle has its limits too. Once data reaches the scale of billions of rows, a single Oracle server can no longer cope. The DBAs therefore split the data by ID into two databases, with shared general information kept in a third, and used the first character of the ID to decide which database to query. For example, IDs starting with '0' to '7' go to database A, IDs starting with '8' to 'F' go to database B, and general information goes to database C. But how do you query data starting with '3' and data starting with 'E' in one go? The architects wrote a database routing framework, DBRoute. It handles the merging uniformly and transparently to the upper layers.
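
The routing idea can be sketched roughly in Python. It picks a database from the first hexadecimal digit of the ID and merges the results from each database. route and query_many are hypothetical names, and the shard callables stand in for real database queries. Nothing here reflects DBRoute's actual interface, which the book does not show.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

ShardQuery = Callable[[List[str]], List[dict]]


def route(item_id: str) -> str:
    """Pick a database from the first hex digit of the ID: '0'-'7' -> A, '8'-'F' -> B."""
    first = int(item_id[0], 16)  # raises ValueError for a non-hex ID
    return "A" if first < 8 else "B"


def query_many(ids: Iterable[str], shards: Dict[str, ShardQuery]) -> List[dict]:
    """Group IDs by database, query each database once, and merge the results
    so the caller never needs to know which database held which row."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item_id in ids:
        groups[route(item_id)].append(item_id)
    merged: List[dict] = []
    for shard_name, shard_ids in sorted(groups.items()):
        merged.extend(shards[shard_name](shard_ids))
    return merged
```

Grouping the IDs first means each database receives a single batched query rather than one query per ID.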

Then Spring arrived and became an integral part of web applications. At Taobao, Spring achieved what Rod Johnson designed it for: replacing EJB.

At the end of 2005, Taobao had 13.9 million registered users, 89.31 million PV per day and 16.63 million items.

To tell the truth, I really admire that the site held up under so much traffic. With future growth in mind, however, this infrastructure could barely meet the needs of the moment. So CDN technology came into play. Taobao first used the commercial ChinaCache, and later a low-power CDN network built by Dr. Zhang Wensong, and its performance kept improving.

At the end of 2006, Taobao had 30 million registered users, 150 million PV per day and 50 million items, with total turnover of 16.9 billion yuan.

VI. Technical innovation

To keep transactions fair, Taobao added a transaction snapshot feature, which saves the current item page as an image. Because Taobao's trading volume is so large, this created a problem: far too many small image files. By 2010, Taobao's back end stored 28.6 billion images.

Before 2007, Taobao used NetApp's commercial storage, but it still could not keep up with the rapid growth. Drawing on the design ideas Google had published for GFS (in 2003), Taobao developed its own file system, TFS. The book does not go into TFS's internals (or maybe I just didn't understand them -_-). What is clear is that TFS was designed for huge numbers of images. After TFS went online, users went from 1 picture to 5 pictures and then to 1 GB of picture space. This was made possible by TFS's clustered file storage and a large fleet of image servers. To keep image access fast and efficient, Taobao generates thumbnails in real time and uses global load balancing plus first- and second-level caches.
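
TFS itself is a distributed file system and too large for a short example. The serving path described here can be sketched, though: thumbnails generated on demand behind two cache levels. The class names and the make_thumbnail callback are hypothetical. Both levels here are simple in-process LRU caches, whereas a production setup would place them on separate cache servers or CDN nodes.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used


class TwoLevelImageCache:
    def __init__(self, l1_size: int, l2_size: int,
                 make_thumbnail: Callable[[str, int], bytes]) -> None:
        self.l1 = LRUCache(l1_size)  # small and hot
        self.l2 = LRUCache(l2_size)  # larger and warm
        self._make_thumbnail = make_thumbnail  # hypothetical resize of the original
        self.generated = 0

    def fetch(self, image_id: str, width: int) -> bytes:
        key = f"{image_id}_{width}"
        data = self.l1.get(key)
        if data is not None:
            return data
        data = self.l2.get(key)
        if data is None:
            # Miss at both levels: build the thumbnail on demand from the original.
            data = self._make_thumbnail(image_id, width)
            self.generated += 1
            self.l2.put(key, data)
        self.l1.put(key, data)  # promote into the hot level
        return data
```

Thumbnails are only ever built for sizes that are actually requested. An image evicted from the small first level can still be served from the second level without being regenerated.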

Taobao's web server software is Tengine, an optimized web server that Taobao built on top of nginx.

Taobao also had products that failed, for market reasons rather than technical ones. The first was "group purchase", whose failure lay in Mancing. Next was "My Taobao". It used the Ajax technology then popular around the world, but used it so heavily that it may have been too hard for users to get started with (as Jack Ma himself said). Then there was "wealth", which rivals attacked as breaking Taobao's promise to stay "free".

Recording item page views through traditional database I/O was too costly, so Taobao turned to caching. It first used ESI (Edge Side Includes) to cache page fragments. Some large stores received so many visits that frequent I/O for them was not worthwhile, so Dolong wrote TBstore, which could cache large amounts of data. Its core idea was fast lookup through hashing. TBstore was built on Berkeley DB, an embedded key-value database. Its weakness was that once the data grew large, it was still flushed from memory to disk, so performance was not that good.
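
Fragment caching with ESI can be shown with a small Python sketch. It expands <esi:include src="..."/> tags in a page template and caches each fragment separately with its own time-to-live. A fast-changing fragment, such as a view counter, can then expire without the whole page being re-rendered. In practice ESI is processed by an edge cache or CDN rather than in application code. FragmentCache and the fetch callback are hypothetical, and only the src attribute of esi:include is supported.

```python
import re
import time
from typing import Callable, Dict, Tuple

ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


class FragmentCache:
    """Assembles a page whose dynamic parts are ESI include tags, caching
    each fragment separately with its own time-to-live."""

    def __init__(self, fetch: Callable[[str], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # hypothetical: returns (fragment_html, ttl_seconds)
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # src -> (html, expires_at)

    def _fragment(self, src: str) -> str:
        now = self._clock()
        hit = self._cache.get(src)
        if hit is not None and hit[1] > now:
            return hit[0]  # still fresh: no backend call
        html, ttl = self._fetch(src)
        self._cache[src] = (html, now + ttl)
        return html

    def render(self, template: str) -> str:
        # Replace every include tag with its (possibly cached) fragment.
        return ESI_INCLUDE.sub(lambda m: self._fragment(m.group(1)), template)
```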

Later, Taobao split out the UIC (User Information Center) as a service that every module could call. Dolong wrote TDBM for it, a cache held entirely in memory and modeled on memcached. Taobao then merged TBstore and TDBM into Tair, a distributed key-value caching system, and after that upgraded its iSearch system.
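
This post does not describe how Tair actually distributes data. The following is therefore only a generic illustration of spreading a key-value cache across several memory nodes, using consistent hashing with virtual nodes. DistributedCache and the node names are hypothetical, and each "node" is just an in-process dict.

```python
import bisect
import hashlib
from typing import Dict, List, Optional


def _hash(key: str) -> int:
    # Stable 64-bit hash so every client maps a key to the same position.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class DistributedCache:
    """Key-value cache spread over several in-memory nodes via a consistent-hash ring."""

    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        self._ring: List[int] = []
        self._owner: Dict[int, str] = {}
        self._stores: Dict[str, Dict[str, str]] = {n: {} for n in nodes}
        for node in nodes:
            for i in range(vnodes):  # virtual nodes even out the key distribution
                h = _hash(f"{node}#{i}")
                self._owner[h] = node
                bisect.insort(self._ring, h)

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]

    def set(self, key: str, value: str) -> None:
        self._stores[self.node_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self._stores[self.node_for(key)].get(key)
```

The advantage of the ring comes when a node is added or removed. Only the keys near that node's positions on the ring are remapped, whereas a plain hash(key) % N scheme would reshuffle almost every key.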

At the end of 2007, Taobao had 50 million registered users, 250 million PV per day and 100 million items, with total turnover of 43.3 billion yuan.

VII. More technology

One indispensable detail of an e-commerce platform is product data handling. There are so many product categories that classifying goods by category became a problem. The wise senior engineer Yideng proposed turning these attributes into labels and "pasting" them directly onto products (which is essentially tagging).
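
Tagging maps naturally onto an inverted index from (attribute, value) pairs to product IDs, where a query on several attributes is a set intersection. This is a hypothetical sketch. AttributeIndex and the sample attributes are invented for illustration and do not represent Taobao's actual category and property model.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class AttributeIndex:
    """Inverted index from (attribute, value) tags to product IDs."""

    def __init__(self) -> None:
        self._postings: Dict[Tuple[str, str], Set[int]] = defaultdict(set)

    def add(self, product_id: int, attributes: Dict[str, str]) -> None:
        # "Paste" each attribute onto the product as a tag.
        for name, value in attributes.items():
            self._postings[(name, value)].add(product_id)

    def find(self, **wanted: str) -> Set[int]:
        """Return the products that carry every requested tag."""
        tags = list(wanted.items())
        if not tags:
            return set()
        result = set(self._postings.get(tags[0], set()))
        for tag in tags[1:]:
            result &= self._postings.get(tag, set())  # intersect tag by tag
        return result
```

Adding a new kind of attribute needs no schema change, because it is just another tag. This is what makes tagging attractive when the number of categories keeps growing.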

In 2008, Taobao split off Alipay. The underlying transaction business became the Trade Center (TC), which handles atomic operations such as orders. The upper layer became the Trade Manager (TM), which does not involve logistics operations.

This gave rise to the second milestone project: splitting up the system. An Alibaba veteran described it to us at the Ali Round Table as "changing the engine of a plane in high-speed flight", a breathtaking refactoring job. Partitioning the components was so difficult that I could not really follow the complex logic diagrams... In short, this is how Taobao's middleware was born.

HSF (High-speed Service Framework): the core piece, with the nickname "very comfortable". See the author's blog: http://www.blogjava.net/BlueDavy/archive/2008/01/24/177533.html

Notify (message middleware): a message queuing product developed in-house by Taobao, supporting more than 1 billion message notifications.
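
Message middleware can be pictured roughly as an in-process topic broker in which every subscriber group receives its own copy of each published message. MessageBroker, the topic name trade.paid and the group names are hypothetical. Notify's real features, such as persistence, acknowledgements and redelivery, are not modeled.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class MessageBroker:
    """In-process topic broker: each subscriber group has its own queue, so one
    published message is delivered once to every group subscribed to the topic."""

    def __init__(self) -> None:
        self._groups: Dict[str, List[str]] = defaultdict(list)  # topic -> groups
        self._queues: Dict[str, Deque[dict]] = defaultdict(deque)  # group -> pending

    def subscribe(self, topic: str, group: str) -> None:
        if group not in self._groups[topic]:
            self._groups[topic].append(group)

    def publish(self, topic: str, message: dict) -> int:
        """Fan the message out to every subscribed group; return how many got it."""
        groups = self._groups.get(topic, [])
        for group in groups:
            self._queues[group].append({"topic": topic, **message})
        return len(groups)

    def poll(self, group: str) -> List[dict]:
        """Drain and return everything queued for a subscriber group."""
        q = self._queues[group]
        items = list(q)
        q.clear()
        return items
```

A trade system can publish a single "order paid" event, and downstream systems consume it at their own pace. The trade system therefore does not have to call each of them synchronously.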

TDDL (Distributed Data Access Layer): an evolution of DBRoute that sits between JDBC and the databases and takes care of database optimization.

TBSession: sessions are normally stored on the server, but a user may be switched between servers frequently without knowing it. Taobao's solution was to keep session information in cookies, and eventually it stored sessions in Tair.
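
The cookie approach can be sketched as follows. The session is serialized into the cookie and signed with an HMAC key shared by all web servers. Any server can then validate it, so the user can be routed anywhere. This is a hypothetical illustration and not TBSession's actual format. encode_session, decode_session and SECRET are invented names, and the sketch covers only the cookie step, not the later move to Tair. Signing prevents tampering but does not hide the contents, so nothing secret belongs in such a cookie.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"change-me"  # hypothetical key shared by every web server


def encode_session(data: dict, secret: bytes = SECRET) -> str:
    """Serialize the session and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    sig = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig


def decode_session(cookie: str, secret: bytes = SECRET) -> Optional[dict]:
    """Return the session if the signature checks out, otherwise None."""
    try:
        payload, sig = cookie.rsplit(".", 1)
    except ValueError:
        return None  # malformed cookie
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered with, or signed with a different key
    return json.loads(base64.urlsafe_b64decode(payload.encode()))
```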

Alibaba's open platform also has quite a history. If you are interested, visit http://open.taobao.com/index.htm

VIII. Summary

While you are still climbing in an industry, you can learn from the leaders. Once you become the leader, you have to keep surpassing yourself and use your own strength to change the whole industry, and the world. Whether it is Huawei or Alibaba, the responsibility becomes even heavier at the top of the industry.

I have always felt that I just go with the flow, yet in my heart I am not content with that. Now I have the chance to learn about the best Internet companies in China. I am honored by their years of dedication, and they keep pushing me to become stronger so that I can one day join them.

The task is heavy and the road is long. Looking back over the years of Alibaba and Taobao's development, the most admirable people are the ones who went unnoticed yet had the courage to explore. They never admitted defeat when they hit problems, and they always found a way to solve them. As the HR at the Ali Round Table said, "Everyone here is someone who loves to push and tinker." I admit I felt ashamed. My body cannot yet handle fighting all out without holding back. I go running every day, but my foundation is still weak. To become a true "martial artist" I will need to keep going down a long road. Willpower, at least, I have.

Keep studying, study hard, and put what I study into practice. I hope I can stick to these three principles.

I greatly admire Jack Ma's vision and the way he works with people. I also admire the many capable and loyal people around him. They truly deserve their success.
