Starting January 11, the 12306 website began selling train tickets for New Year's Eve. Every year at this point 12306, the railway system's only official ticketing site, becomes a target of criticism. This year is no exception: 12306 is once again drowning in complaints.
On January 10, an IT professional with the ID "Code Dog", a former Taobao engineer who later served as technical VP at an e-commerce company, posted on the well-known forum "Sisi" to share his views on the 12306 system. Notably, "Code Dog" was himself a harsh critic of 12306 when it first went online. To prove that the system would be easy to build, he even launched an open source project called "Designing a system for 12306." Hands-on work has since given him a new understanding of the 12306 system.
The full text reads as follows:
I am a Taobao technical expert. In 2012 I was the e-commerce VP at a top-100 private enterprise, where under extremely difficult conditions I led the development of a consumer-facing website (for the enterprise's own e-commerce activities; Observer network note), with payment through Alipay and UnionPay and annual turnover in the tens of millions. (Author's note: of course that is very little; my point is only that this site actually went into production.)
At that time I also sneered at 12306. I thought they had done a rotten job and that I could lead a team to build a good one in half a year for a few million. So I arrogantly set out to build an open source booking system for them. I spent a week thinking through the data model, and when I reached the inventory step I discovered that 12306's inventory is many times more complex than that of Taobao or Jingdong, and the amount of computation is many times larger too. Traditional distributed databases, caching, and load balancing cannot meet 12306's requirements.
In ordinary times, 12306 is just a normal e-commerce website. But once a Golden Week arrives, 12306 becomes a monster: every item on the whole site is in a flash sale (a "second kill"), and every SKU has dynamic inventory.
Even setting aside the existing channels such as telephone booking and ticket outlets, building a 12306 would take at least tens of millions in hardware investment. (Author's note: this was my estimate at the time, not a precise calculation, and it may differ considerably from reality. In short, I may not be right. 12306's business may not be as complicated as I say, but it is certainly not as simple as some people claim.) Those who clamor that 40 servers, 2 architects, and 4 programmers would be enough, and who talk about database sharding and a front-end CDN, are just armchair strategists. These newborn calves have done CMS and BBS work for three years and use that experience to bash 12306, which is far too naive.
Media people bash 12306 because they do not understand the technology and have neither the ability nor the patience to analyze the difficulty behind it. Technical people bash it because most of them, thinking about the problem only briefly, easily fall into over-optimism. The classic example is workload estimation: programmers tend to estimate a very short schedule, optimistically imagining that programming is like a typist hitting keys.
As for this forum thread, I do not think it is a whitewash. The first- and second-ranked answers are very objective. Taobao's technology is many times stronger than 12306's, but Taobao's current system also took 10 times the money, time, and talent that 12306 had. The root cause is still that railway capacity cannot meet Spring Festival demand, and Taobao cannot solve that problem either.
12306 has made great progress this year, from the animated CAPTCHA and time-staggered ticket releases on the front end to minicomputers, virtualization, and an in-memory database on the back end. It is fair to say that 12306 is the strongest e-commerce system run by any Chinese government agency. Making such changes in just a year or two is almost a miracle. Even some market-oriented private enterprises are far behind it, and some listed companies are not as good (such as 51job and Ctrip)!
It is not hard to see that most of the people criticizing 12306 online have settled into the mindset that "state-owned enterprise = monopoly + corruption + inefficiency." A small part genuinely underestimates the difficulty.
As for whether the 300 million (including hardware) spent on the first phase of 12306 is expensive, I will not judge. I only offer one number for reference: Baidu's R&D spending for one year (excluding hardware) is 1 billion. This number comes from Baidu's earnings report and can be found online. 300 million looks like a big number, but for a super-large e-commerce system or search engine it is really not astronomical.
Let me explain why flash sales ("second kills") put so much pressure on a system, and why 12306's dynamic inventory is complex.
First, flash sales.
Around December 25, 2013, Tmall ran a Christmas-season points redemption event lasting several days. At 10:12 a.m. on the 25th it released 15,000 Tmall Boxes (some people resell them on Taobao for about 190 to 230). Judging from the transaction records, they were all grabbed in 19 seconds.
I actually took part in that flash sale too. The question that day was especially simple (please enter the first letters of the Chinese pinyin of xxx). I must have answered and submitted the order within 5 seconds, but I was told that too many people were queuing, that I could not get in, and that I should try again in 14 seconds. There were too many people because the question was too simple: the lower the threshold, the more people squeeze in within 5 seconds. If the question were replaced with "how many kW of electricity can 2 grams of 3% U235 generate at the Daya Bay nuclear power station", there would not be 15,000 people competing with me even after 5 minutes.
I thought: what will be left for me after 14 seconds? So I answered the question and tried again, and got a server error page. I refreshed several times and was then told the flash sale had ended.
I asked my colleagues in a group chat. Fewer than 10 people answered, and all said they had not got one (it is also possible that those who did are quietly enjoying their luck and not replying to me).
What is Taobao's technical level? Taobao has at least 4,000 technical staff and at least 40,000 servers (this is public data from two years ago, which the rules allow me to discuss). Its turnover on November 11, 2013 was 35.1 billion, and its turnover for all of 2012 was more than 1 trillion.
Taobao has independent R&D teams in many areas: servers and switches (you can search online for the open green-server standard Taobao has published); operating systems (a Taobao version of the Linux kernel; the YunOS mobile operating system belongs to Aliyun and is not counted here); web server (Tengine); Java virtual machine (a Taobao version of the JVM); databases (a Taobao version of the MySQL kernel, as Google and Facebook also have their own versions; a Taobao version of HBase; and OceanBase, developed entirely from scratch); load balancer (LVS, whose founder works at Taobao as a researcher); and Java runtime container (JBoss, one of whose founders, Wang Wenbin, is also at Taobao, as a vice president).
Taobao also has countless open source projects and pieces of middleware, such as the high-performance Java communication middleware HSF, the distributed database middleware TDDL, and the asynchronous messaging system Notify.
Even with this level of technology, Taobao cannot run a flash sale in which no user feels any crowding. Why?
The first reason is to respect the laws of physics. The amount of computation a server can handle in one second is limited. However you optimize, with more efficient algorithms and programming languages, you cannot break through a certain limit, just as an F1 car driven by a car engine cannot exceed 400 kilometers per hour (the supersonic car Thrust SSC reached 1000 kilometers per hour, but it is driven by aircraft engines). Going deeper would make this hard to follow; if you are interested, start with the famous C10K problem.
The second reason is economics. During the October 1 Golden Week, the road from central Beijing to Badaling is jammed solid, but nobody would rebuild that stretch as a 10-lane highway like Chang'an Avenue just because of the Golden Week peak. The cost would be astronomical (truly astronomical: 12306's 300 million would pay for only 1 to 3 km). With such a road, traffic during Golden Week could fly at 80 km/h, but the rest of the year, would you hand it to the residents on both sides for drying millet?
Taobao's current hardware and bandwidth already exceed what day-to-day operation requires; that is, a considerable margin is reserved for big promotions (the well-known ones are Double 11 and Double 12, but in fact there is a big promotion basically every quarter, promotions every month, and even a promotion every day on Juhuasuan). Amazon had to buy a large number of servers to cope with its big Black Friday promotions; since ordinary order volume is not that large, Amazon put the surplus servers to work for cloud computing. Incidentally, Aliyun is one of China's first world-class cloud computing providers, and its path is somewhat similar to Amazon's.
Now for dynamic inventory.
In Taobao's Tmall Box flash sale there was only one product (one SKU, in the jargon), with an inventory of 15,000. Each successful buyer reduces the inventory by 1. Selling out in 19 seconds means about 789 successful orders per second (the number of requests may have been 80,000 per second; that is only a guess, not a real number, and it may have been 10,000; I use it to illustrate the scale). Imagine selling train tickets in a square while 80,000 people a second wave money at you and shout: sell it to me!
Anyone who has been to college knows that there are units of time smaller than a second: milliseconds, microseconds, and nanoseconds. But registering a transaction in a trading system is not as simple as an electron circling a nucleus. The system has to do all of these things: check for malicious access, get the system time, fetch the customer's default delivery address, check the customer's eligibility for the flash sale (the rule at the time was Tmall T2 and T3 members), generate an order number, write the customer ID, system time, order number, and delivery address to the order system, deduct the customer's Tmall points, reduce the product inventory by one, mark the customer (each person may win only once and cannot win again), and so on. Each of these steps takes time on the order of milliseconds, and added together they may approach 1 second. Because Taobao's servers are relatively powerful and use distributed and cluster technology, the actual result is better than 1 second. But even with 10,000 servers, this time cannot be divided down to one ten-thousandth of a second, because there is only one product. It has an inventory of 15,000, but it corresponds to a single row in the database, and every transaction request has to be processed there.
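As a rough back-of-the-envelope sketch of why that single inventory row matters: if every order must update the same row one at a time, the observed order rate caps how long each serialized update may take. The function names are hypothetical, and the model (fully serialized updates, no batching) is a simplifying assumption rather than a description of Taobao's real system.

```python
def orders_per_second(total_orders: int, seconds: float) -> float:
    """Average rate of successful orders over the sale."""
    return total_orders / seconds


def max_serialized_ms(rate_per_second: float) -> float:
    """Longest time (ms) one serialized row update may take at this rate."""
    return 1000.0 / rate_per_second


# Figures from the article: 15,000 units sold out in 19 seconds.
rate = orders_per_second(15_000, 19)
budget_ms = max_serialized_ms(rate)
print(f"{rate:.0f} orders/s, at most {budget_ms:.2f} ms per serialized update")
# prints: 789 orders/s, at most 1.27 ms per serialized update
```

Under this assumption, adding servers does not raise the budget: only the time spent on the shared row counts.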
Could you split the 15,000 units into 5,000 products and assign them to 5,000 servers, so that 5,000 servers can process orders at the same time? The answer is no. First, 5,000 products means 5,000 product detail pages and 5,000 buy buttons, which is a disaster for pre-sale marketing and traffic acquisition. There is basically no way to funnel traffic to a single entry point, and it clearly violates the principles of business management by artificially increasing information clutter. Second, the Tmall Box flash sale is not a big deal: even at the official price of 399 yuan, it is only about 6 million in transactions. If 6 million in transactions required that much supporting cost, it would be a terrible bargain. Third, Taobao has on the order of a billion products, and the display, trading, and management of those billion products are already distributed across tens of thousands of servers. There is no need to split each product into several by inventory.
The 789 people who got one will not necessarily pay. (Redeeming a Tmall Box for 99 points is fine: it does not go through online banking, the cost is low, and most people will pay. A 3,999 flash sale of an iPhone 5s is another matter: some people may have online-banking problems and some may change their minds.) This brings in the problem of order cancellation and restocking. Meanwhile, consumers who still want one will think there is still a chance and keep refreshing the page, so after the flash sale ends, enthusiastic consumers will keep hammering the site for another 30 seconds to 1 minute.
A minute has passed. Can the servers finally take a breath? Not yet: there is still overselling. It turns out that two servers each accepted an order in the same millisecond and each went on to reduce the inventory, so 15,500 orders were placed against an inventory of 15,000, and some orders have to be cancelled ... With a single-threaded exclusive lock, only one server thread at a time can reduce the inventory, but then peak concurrency is far worse. With 80,000 people waving money, perhaps only 8 manage to place an order, and the frenzied crowding lasts more than 10 minutes. For an everyday Tmall Box flash sale, 10 minutes is just 10 minutes. On Double 11 it would be a disaster: with checkout capacity cut by 90%, still hoping to do 35 billion means either dreaming or adding 10 times the servers and bandwidth. So business is imperfect. It has to trade off absolute correctness against absolute speed, aiming to be relatively fast and very (but not absolutely) correct, and tolerating a certain amount of inventory error and overselling (I do not know exactly how much is allowed).
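The exclusive-lock option can be sketched in a few lines. This is a minimal single-process illustration, assuming Python threads stand in for server threads and a hypothetical `Inventory` class stands in for the one database row; it is not how Taobao or 12306 actually implement inventory. Because the stock check and the decrement sit inside one critical section, the sale can never oversell, at the price of handling buyers strictly one at a time.

```python
import threading


class Inventory:
    """One SKU's stock, guarded by an exclusive lock (hypothetical class)."""

    def __init__(self, stock: int) -> None:
        self.stock = stock
        self.sold = 0
        self._lock = threading.Lock()

    def try_buy(self) -> bool:
        # Check and decrement happen inside the same critical section, so
        # two threads can never both see the last unit and both sell it.
        with self._lock:
            if self.stock > 0:
                self.stock -= 1
                self.sold += 1
                return True
            return False


def run_sale(stock: int, buyers: int, workers: int = 8) -> Inventory:
    """Spread `buyers` purchase attempts evenly over `workers` threads."""
    inventory = Inventory(stock)
    per_worker = buyers // workers

    def worker() -> None:
        for _ in range(per_worker):
            inventory.try_buy()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return inventory


result = run_sale(stock=150, buyers=800)
print(result.sold, result.stock)  # prints: 150 0
```

Removing the lock makes the check-then-decrement sequence racy, which is the overselling scenario described above; keeping it serializes every purchase, which is the throughput cost.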
Well, after all this talk about Taobao, can we get to 12306?
Take the G71 high-speed train from Beijing West to Shenzhen North as an example (consider only the southbound direction here; Shenzhen North to Beijing West is a different train, called G72). It has 17 stations (Beijing West is station No. 01 and Shenzhen North is station No. 17) and 3 seat classes (business, first, and second). On the surface, isn't that just 3 products: G71 business class, G71 first class, and G71 second class? Most of the technical people who bash 12306 (including experts and CTOs from some mid-sized companies) stumble right here at the first step.
In fact, G71 has 136 * 3 = 408 products (408 SKUs). How is that calculated? See below:
If you board at Beijing West, there are 16 ways to sell the ticket (because there are 16 stations after it). Beijing West to Baoding, Shijiazhuang, Zhengzhou, Wuhan, Changsha, Guangzhou, Humen, Shenzhen, and so on: each is an independent product. Likewise, boarding at the second station leaves 15 possible places to get off, and so on. Adding up over every boarding station gives 136 kinds of tickets: 16 + 15 + 14 + ... + 2 + 1 = 136. Each kind of ticket has 3 seat classes, for 408 products altogether.
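The count can be checked with a short sketch that enumerates every (boarding station, alighting station) pair on a 17-station line. The function name `count_skus` is hypothetical, and stations are simply numbered 1 to 17 as in the article.

```python
from itertools import combinations


def count_skus(num_stations: int, seat_classes: int) -> tuple[int, int]:
    """Return (origin-destination pairs, total SKUs) for a one-way train."""
    # Every pair (i, j) with i < j is one sellable journey.
    journeys = list(combinations(range(1, num_stations + 1), 2))
    return len(journeys), len(journeys) * seat_classes


pairs, skus = count_skus(num_stations=17, seat_classes=3)
print(pairs, skus)  # prints: 136 408
```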
Next, look at how selling a ticket reduces inventory. The business, first, and second class seats are independent and their inventory operations are identical, so below I will ignore seat class and discuss only the departure and arrival stations. Also, what follows is a theoretical model; I am not saying 12306's database is actually designed this way.
Passenger A buys a ticket from Beijing West (station No. 01) to Baoding East (station No. 02). The inventory of the product "Beijing West to Baoding East" goes down by one. At the same time, the inventory of the 15 other products starting at Beijing West (Beijing West to Shijiazhuang, Zhengzhou, Wuhan, Changsha, Guangzhou, Humen, Shenzhen, and so on) must also go down by one. In other words, selling one ticket from Beijing West to Baoding East actually reduces the inventory of 16 products!
And that is not the most complicated case. If passenger B buys a ticket from Beijing West (station No. 01) to Shenzhen North (station No. 17), then besides "Beijing West to Shenzhen North" itself, the 15 other products from Beijing West (to Baoding East, Shijiazhuang, Zhengzhou, Wuhan, Changsha, Guangzhou, Humen, and so on) must each go down by 1, the 15 products from Baoding East (to Shijiazhuang, Zhengzhou, Wuhan, Changsha, Guangzhou, Humen, Shenzhen North, and so on) must each go down by 1, and so on down the line. The total number of products whose inventory must be reduced is 16 + 15 + 14 + ... + 1 = 136.
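A minimal sketch of this theoretical model, assuming a sold seat is unavailable to any other journey that shares at least one leg of track with it. The function name `affected_skus` is hypothetical, and this is an illustration of the article's model rather than 12306's actual design. A journey from station `o` to station `d` overlaps a ticket from `board` to `alight` exactly when `o < alight` and `d > board`.

```python
def affected_skus(num_stations: int, board: int, alight: int) -> list[tuple[int, int]]:
    """List every (origin, destination) product that loses one seat
    when a ticket from `board` to `alight` is sold."""
    return [
        (o, d)
        for o in range(1, num_stations)
        for d in range(o + 1, num_stations + 1)
        # Overlap test: the two journeys share at least one leg of track.
        if o < alight and d > board
    ]


# Passenger A: station 1 to station 2 on a 17-station line.
print(len(affected_skus(17, 1, 2)))   # prints: 16
# Passenger B: station 1 to station 17, the full route.
print(len(affected_skus(17, 1, 17)))  # prints: 136
```

Under this model a full-route ticket touches every origin-destination product of the train, while a short ticket in the middle of the route touches only the journeys passing through that leg.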
Of course, not every ticket sale requires recalculating all inventory in real time. Based on operating experience from past years, some quota allocation can be done in advance for peak periods such as Golden Week, for example more long-distance tickets such as Beijing to Wuhan and fewer short-distance ones such as Baoding to Shijiazhuang. I have no evidence that the Ministry of Railways does this, but I do not believe the Ministry had no such manual allocation strategy back before the 12306 website existed.
Imagine 80,000 people waving money and shouting at you: sell it to me. You finally pick one hand out of the pile of money, take the cash, and turn around to find more than a hundred colleagues and tell them to reduce their inventory. Those colleagues are each surrounded by 80,000 people too, and just like you, every time they sell one product they have to find dozens of people to reduce inventory ... This is what makes 12306's dynamic inventory so monstrous. It is more than ten times as complex as the inventory mechanism of any website you have ever bought from.
Now for ticket grabbing: machines are always faster than people. When you finally fight your way through the 80,000 people and reach the counter, you discover, good grief, 100,000 bamboo poles with money tied to them. When a refunded ticket comes back on sale, you have to push through 3 layers of human bodies to reach the counter, while a bamboo pole simply extends from 8 people behind you and its money arrives at the counter first. You glance down at your phone and the ticket is gone. The bamboo poles are always there, outstretched, never looking down, never blinking. Without those 100,000 poles you might well still not get a ticket, but you would not feel so frustrated: why is my hand always the slowest?!
Stopping ticket-grabbing robots is also not as simple as adding an image CAPTCHA. I have written an article that systematically analyzed this: there are 6 ways for a machine to brute-force an image CAPTCHA, and the ticket-grabbing plug-ins use the third one I described, OCR (optical character recognition; Observer network note). Google's wavy-distortion CAPTCHA does a better job of defeating machine OCR. The CAPTCHA on ems.com.cn is a negative example, with a machine OCR success rate of nearly 100%. 12306's is slightly stronger than the EMS image CAPTCHA. But make the CAPTCHA a bit more complex and people complain: doesn't this only benefit college students and office workers? Migrant workers cannot even recognize all 26 letters, so what are they supposed to do? Make it an animated CAPTCHA and people complain again: what about those with poor eyesight? In the end the CAPTCHA is made too simple and everyone is happy, though in fact the happiest are the companies that develop ticket-grabbing tools.
Even if machines could not recognize the CAPTCHA at all, that would not stop social-engineering attacks. Recruit a bunch of teenagers who play games in internet cafes, pay 1 yuan, or the equivalent in virtual currency or game equipment, for every 50 CAPTCHAs entered correctly, and I guarantee countless people will want to earn that money. For the profit on a resold ticket, this cost is acceptable. Is there any technology that can prevent social-engineering attacks? The only CAPTCHA that can stop an internet-cafe teenager is "how many kW of electricity can 2 grams of 3% U235 generate at the Daya Bay nuclear power station".
The discussion above treats 12306 as a trading system built from scratch with no historical baggage, like Taobao. In reality it is not: the ticket pool behind it also has to serve traditional channels such as telephone booking, station ticket windows, and ticket agencies. Besides passenger service, 12306 also runs the country's largest (and possibly the world's largest) bulk cargo delivery system.
Talking about technology while setting aside policy (including pricing policy, the police crackdown on scalpers, and identity verification policy) cannot solve the Spring Festival ticket rush. Making sure nobody feels any congestion when buying on 12306 during the Spring Festival (without necessarily getting a ticket, since railway capacity is what it is) would mean forcing 12306 to buy a huge number of servers for the Spring Festival and then, after the festival, to turn into a cloud computing provider as formidable as Amazon. It is the same logic as forcing Beijing to build a 10-lane highway to Badaling.
The current 12306 still has technical problems. For example, during the ticket rush, even typing an ID number and the image CAPTCHA lags (I tested it). If the server side is busy, why on earth should the browser freeze?
But they are making progress. I believe that by the 2014 Spring Festival, technology will no longer be the main reason tickets are hard to get. With no rapid increase in railway capacity, policy adjustments are needed to make ticket buying fairer during the Spring Festival.
The following proposals are for extraordinary periods such as the Spring Festival and National Day. At other times, most lines can simply stay as they are, since the problems are small; the very few lines where tickets are scarce can be handled the same way as the Spring Festival:
1. Auction: the highest bidder wins
When a hard-seat ticket is bid up to the price of a plane ticket, I believe tickets will no longer be hard to buy (unfortunately they will be expensive), and there will not be so many scalpers. If you ask what technical problems Taobao could help 12306 with, Taobao's auction system could help: the Zhejiang Provincial High Court has been auctioning on Taobao for more than a year, with 2.6 billion in completed deals.
Unfortunately, this method cannot be implemented. Today's high-speed rail fares are already attacked by the media and opinion leaders, let alone auctions. Moreover, train tickets are a basic necessity, and the fact that fares have not risen in 20 years includes an element of welfare subsidy, so auctioning all of them may well be inappropriate.
2. Lottery: luck decides
Registration opens 2 months before departure and the lottery is drawn 7 days before departure; applicants can cancel midway. The fare is deposited in advance, with no refund. Applicants upload their ID card and a selfie of their face, which are checked by machine.
This makes it far more likely that scalpers are intercepted. A scalper can deposit the money in advance and can find plenty of real ID numbers, but just try asking everyone who gives you an ID number to also hand over a photo of their ID card and a selfie. Even someone who really wants to use a scalper will hesitate to hand over an ID photo. There is also a lot of manual work in between, which raises the scalper's costs without guaranteeing a ticket. In any case, I think real consumers would rather try their own luck first.
This method is also very hard to implement. However the lottery rules are designed, someone is bound to shout "it's rigged, don't trust the government."
With lottery results out 7 days before departure, anyone who needs to change plans must decide 7 days ahead, and there is still time to find other options. Of course it need not be 7 days; 15 or 10 days would also work, and the specific number should be calculated with a data model.
3. Auction + Lottery
Soft sleepers, high-speed rail business class, and other high-priced seats go to auction. People who buy these are relatively well off anyway, so let ability to pay decide.
Hard-seat tickets go by lottery.
4. Passengers enter the station with their ID card; tickets and invoices are reimbursement vouchers, not station-entry vouchers. After a refund, the money goes into the passenger's 12306 account and cannot be withdrawn; it can only be used for that passenger's next trip. During Golden Week, a personal account may order at most 10 tickets.
This method can combat scalpers who hoard tickets for resale. After it has run for a while, account balances can reveal who the scalpers are. Unfortunately it requires retrofitting station equipment and coordination.