Zhihu is a genuine online Q&A community with a friendly, rational atmosphere that connects elites from all walks of life. Users share their expertise, experience, and insights, providing the Chinese Internet with a steady stream of high-quality information.
Many people may not yet know that Zhihu is now the largest UGC (user-generated content) community on the Chinese Internet after Baidu Tieba and Douban. In the three years since it started, it has grown from zero to more than 100 servers. It currently has more than 11 million registered users and more than 80 million visitors per month. The site serves more than 220 million page views (PV) per month and roughly 2,500 or more dynamic requests per second.
At the ArchSummit Beijing 2014 conference, Zhihu co-founder and CTO Li Shenshen gave the first comprehensive talk on Zhihu's technology in its more than three years of existence.
Initial Architecture Selection
When work on the product really started in October 2010, there were only two engineers, including Li Shenshen. By the time the site went online in December 2010, there were four.
Zhihu's main development language is Python. Python is simple yet powerful, quick to pick up, and efficient to develop in; it has an active community, and the team members like it.
Zhihu uses the Tornado framework. Tornado supports asynchronous I/O, which makes it a good fit for real-time Comet applications. It is also simple, lightweight, and cheap to learn, it has FriendFeed as a proven case, and it has community support from Facebook. One feature of Zhihu's product is that it wants to keep a long-lived connection to the browser so that it can push feeds and notifications in real time, which made Tornado the more suitable choice.
At first, the whole team focused on developing product features. Everywhere else it saved time where it could and used the simplest solution available, which of course caused some problems later on.
The initial idea was to use cloud hosting to save costs. Zhihu's first server was a Linode host with 512 MB of memory. After the site went online, however, the private beta was more popular than expected, and many users reported that the site was slow. Cross-border network latency was higher than anticipated, and because domestic networks are uneven, access conditions differed for users in different parts of the country. Because of this problem, and because the domain also had to be registered for filing at the time, Zhihu went back to the old path of buying its own machines and finding a data center.
Once the machines were bought and a data center found, a new problem appeared: the service often went down. The memory in the machines supplied by the provider kept failing, and the machines kept having to be restarted. Eventually a machine went down outright, and at that point Zhihu made the web tier and the database highly available. In a startup, you never know what problem you will face when you wake up in the morning.
In the architecture at that stage, both the web tier and the database ran as master and slave. The image service was hosted in the cloud. On top of the master-slave setup, reads and writes were separated for better performance. To solve the synchronization problem, a server was added to run offline scripts so that they would not delay responses from the online service. The network equipment was also replaced to improve internal throughput and latency, which increased overall network throughput 20-fold.
In the first half of 2011, Zhihu came to depend heavily on Redis. Queues and search used it from the beginning, and caching later began to use it as well. Single-machine storage then became a bottleneck, so sharding was introduced, along with measures to keep the data consistent.
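The talk does not say which sharding scheme Zhihu used for Redis. As an illustration only, here is a minimal sketch of consistent hashing, one common way to spread keys across shards so that adding a shard moves only a fraction of the keys. The class name, shard names, and replica count are all assumptions made for this example.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping keys to shard names (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring many times ("virtual nodes")
        # so that keys spread evenly.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def get_node(self, key):
        # The first ring position clockwise from the key's hash owns the key.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical shard names; no real Redis connection is made here.
ring = HashRing(["redis-a", "redis-b", "redis-c"])
bigger = HashRing(["redis-a", "redis-b", "redis-c", "redis-d"])

keys = [f"answer:{i}" for i in range(1000)]
moved = [k for k in keys if ring.get_node(k) != bigger.get_node(k)]
print(f"{len(moved)} of {len(keys)} keys move when a fourth shard is added")
```

With plain modulo sharding (hash(key) % N), changing N would remap most keys. On the ring, the only keys that move are the ones taken over by the new shard.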
The Zhihu team believes in tools and believes that tools improve efficiency. A tool is really a process: there is no best tool, only the most suitable one, and which one is most suitable keeps changing as the state of the system and the environment change. The tools Zhihu has developed or used include Profiling (function-level request tracing for analysis and tuning), Werkzeug (debugging tools), Puppet (configuration management), and Shipit (one-click deployment and rollback).
Log System
Zhihu started out invitation-only. In the second half of 2011 it opened registration by application: users without an invitation code could fill in some information and apply to register. The number of users rose by a step, and some advertising accounts appeared that had to be cleaned out. The need for a log system was put on the agenda.
The log system had to support distributed collection, centralized storage, real-time delivery, and subscription, and it had to be simple. The team investigated several open-source systems. Scribe did not support subscription. Kafka is written in Scala, and the team had little Scala experience. Flume was in a similar position and was heavier as well. So the team chose to build its own log system, Kids (Kids Is Data Stream). As the name implies, Kids is used to aggregate all kinds of data streams.
Kids borrows ideas from Scribe. On each server, Kids can be configured as either an agent or a server. An agent accepts messages directly from applications, aggregates them, and forwards them either to the next agent or directly to the central server. A subscriber can read logs from the server or from agents on central nodes.
The specifics are as follows:
Zhihu also built a web tool, Kids Explorer, on top of Kids. It supports real-time viewing of online logs and is now the most important tool for debugging production problems.
Kids has been open-sourced and is available on GitHub.
Event-Driven Architecture
Zhihu's product has one notable characteristic. In the earliest version, the only follow-up operations after an answer was added were updating notifications and updating feeds. As features were added, more operations joined them, such as updating the index, updating counts, and content review, and the follow-up work became very varied. With the traditional approach, this logic would keep growing and become very hard to maintain. The scenario is well suited to an event-driven approach, so the team reworked the whole architecture into an event-driven one.
The first requirement is a message queue. It must accept all kinds of events, and it has strict consistency requirements. To meet this need, Zhihu developed a small tool called Sink. When Sink receives a message, it first saves it locally and persists it, and only then distributes it. If the machine goes down, the messages can be fully restored on restart, so no message is lost. Sink then uses the Miller development framework to put messages into the task queue. Sink is more like a serial message subscription service, but tasks need to be processed in parallel. This is where beanstalkd comes in, managing the full life cycle of each task. The architecture looks like this:
For example, when a user answers a question, the system first writes the question to MySQL, pushes a message into Sink, and then returns the question to the user. Sink hands the event on to beanstalkd through Miller, and workers pick up the tasks and process them.
At launch there were about 10 messages per second, generating 70 tasks. Now there are 100 events per second, generating 1,500 tasks, all supported by the current event-driven architecture.
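The flow just described (persist the event, fan it out into tasks, let workers consume them) can be sketched in a few lines. This is a toy model, not Zhihu's code: the `Sink` class, the `fan_out` function standing in for Miller, the task names, and the in-memory `queue.Queue` standing in for beanstalkd are all assumptions for illustration, and no real Sink, Miller, or beanstalkd API is used.

```python
import json
import os
import queue
import tempfile


class Sink:
    """Toy stand-in for Sink: journal each event locally, then dispatch it."""

    def __init__(self, journal_path, dispatch):
        self.journal_path = journal_path
        self.dispatch = dispatch

    def publish(self, event):
        # Persist first so the event survives a crash, then hand it on.
        with open(self.journal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.dispatch(event)

    def replay(self):
        """Re-dispatch every journaled event, e.g. after a restart."""
        with open(self.journal_path, encoding="utf-8") as f:
            for line in f:
                self.dispatch(json.loads(line))


# Hypothetical mapping from an event type to its follow-up tasks.
TASKS_FOR_EVENT = {
    "answer_created": ["notify", "update_feed", "update_index", "update_count"],
}

task_queue = queue.Queue()  # stands in for beanstalkd


def fan_out(event):
    """Stand-in for Miller: turn one event into several independent tasks."""
    for task in TASKS_FOR_EVENT.get(event["type"], []):
        task_queue.put({"task": task, "payload": event})


def run_workers():
    """Drain the queue; a real worker would reserve, process, then delete."""
    done = []
    while not task_queue.empty():
        job = task_queue.get()
        done.append((job["task"], job["payload"]["answer_id"]))
    return done


journal = os.path.join(tempfile.mkdtemp(), "sink.journal")
sink = Sink(journal, fan_out)
sink.publish({"type": "answer_created", "answer_id": 42})
handled = run_workers()
print(handled)
```

Because `replay` re-dispatches everything in the journal, this sketch gives at-least-once delivery. A real implementation would need to track which events were already delivered, or make the tasks idempotent.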
Page Rendering Optimization
By 2013, Zhihu had millions of page views per day. Page rendering is compute-intensive, and because it has to fetch data it is also I/O-intensive. The team therefore split pages into components and upgraded the data-fetching mechanism. Data is fetched level by level, top-down, following the structure of the page's component tree. Once an upper level has fetched a piece of data, the lower levels do not need to fetch it again, so the number of data fetches is roughly equal to the number of levels.
Building on this idea, Zhihu wrote its own template rendering framework, ZhihuNode.
After this series of improvements, page performance improved greatly. The question page went from 500 ms to 150 ms, and the feed page from 1 s to 600 ms.
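The level-by-level fetching idea can be illustrated with a small sketch. It is not ZhihuNode. The `Component` class, the `fetch_by_level` function, the data keys, and the fake batch fetcher are hypothetical names chosen for this example.

```python
class Component:
    """A node in the page's component tree."""

    def __init__(self, name, needs, children=()):
        self.name = name
        self.needs = set(needs)        # data keys this component renders with
        self.children = list(children)


def fetch_by_level(root, batch_fetch):
    """Walk the tree top-down, issuing at most one batched fetch per level."""
    data, level, rounds = {}, [root], 0
    while level:
        # Collect what this level needs, minus what upper levels already got.
        wanted = set().union(*(c.needs for c in level)) - set(data)
        if wanted:
            data.update(batch_fetch(sorted(wanted)))
            rounds += 1
        level = [child for c in level for child in c.children]
    return data, rounds


calls = []


def fake_batch_fetch(keys):
    """Stand-in for a real data layer; records each batched request."""
    calls.append(keys)
    return {k: f"<{k}>" for k in keys}


page = Component("question_page", ["question"], [
    Component("answer_list", ["answers", "question"], [
        Component("author_card", ["authors"]),
        Component("vote_bar", ["votes", "answers"]),
    ]),
    Component("sidebar", ["related"]),
])

data, rounds = fetch_by_level(page, fake_batch_fetch)
print(rounds, calls)
```

The tree has three levels, so the sketch makes three batched fetches, and keys such as `question` and `answers` that an upper level has already loaded are not requested again.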
Service-Oriented Architecture (SOA)
As Zhihu's features became more and more complex, the whole system grew larger and larger. How did Zhihu move to a service-oriented design?
First, a basic RPC framework was needed, and that framework went through several versions.
The first version was Wish, a model with strictly defined serialization. Its transport layer used STP, a very simple transport protocol written in-house that runs over TCP. It worked well at first, because only one or two services had been written. As the number of services grew, problems began to appear. First, Protocol Buffers generates descriptor code that is very verbose and looks ugly when checked into the code base. Second, the strict definitions made the framework inconvenient to use.
An engineer then developed a new RPC framework, Snow, which uses plain JSON for data serialization. The problem with loose data definitions is that when a service is upgraded and its data structures are rewritten, it is hard to know which services are using it and hard to notify them, so errors occurred often.
That led to a third RPC framework. The engineer who wrote it wanted to combine the strengths of the previous two: keep the simplicity of Snow while using a relatively strict serialization protocol. This version introduced Apache Avro. It also added a special mechanism that makes both the transport layer and the serialization layer pluggable: serialization can use either JSON or Avro, and the transport layer can use either STP or a binary protocol.
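To make the idea of pluggable serialization and transport layers concrete, here is a minimal sketch. None of it is Zhihu's code. `StrictCodec` is only a stand-in for a schema-based codec such as Avro (it does not use the Avro library), `LengthPrefixedTransport` is a made-up framing and is not STP, and the call is an in-process loopback rather than a real network request.

```python
import json
import struct


class JsonCodec:
    """Loose codec: any JSON-serializable dict is accepted."""

    def dumps(self, obj):
        return json.dumps(obj).encode("utf-8")

    def loads(self, data):
        return json.loads(data.decode("utf-8"))


class StrictCodec(JsonCodec):
    """Schema-checked codec: rejects messages that do not match the schema."""

    def __init__(self, schema):
        self.schema = schema  # field name -> required Python type

    def dumps(self, obj):
        if set(obj) != set(self.schema):
            raise ValueError(f"fields {sorted(obj)} do not match the schema")
        for field, typ in self.schema.items():
            if not isinstance(obj[field], typ):
                raise TypeError(f"{field} must be {typ.__name__}")
        return super().dumps(obj)


class LengthPrefixedTransport:
    """Toy framing: 4-byte big-endian length, then the payload."""

    def frame(self, payload):
        return struct.pack(">I", len(payload)) + payload

    def unframe(self, data):
        (size,) = struct.unpack(">I", data[:4])
        return data[4:4 + size]


class RpcClient:
    """Client that works with any codec and transport offering these methods."""

    def __init__(self, codec, transport, handler):
        self.codec, self.transport, self.handler = codec, transport, handler

    def call(self, request):
        wire = self.transport.frame(self.codec.dumps(request))
        # In-process loopback replaces a real socket for this sketch.
        return self.handler(self.codec.loads(self.transport.unframe(wire)))


def add_votes(request):
    """Hypothetical service handler."""
    return {"answer_id": request["answer_id"], "votes": request["delta"] + 10}


loose = RpcClient(JsonCodec(), LengthPrefixedTransport(), add_votes)
strict = RpcClient(
    StrictCodec({"answer_id": int, "delta": int}),
    LengthPrefixedTransport(),
    add_votes,
)
print(loose.call({"answer_id": 1, "delta": 2}))
```

Swapping the codec changes how strictly messages are checked without touching the client or the handler, which is the kind of trade-off between simplicity and strictness described above.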
Next came service registration and discovery: defining the name of a service is enough to find out which machine it runs on. There are also tuning tools to go with it, including an in-house tracing system based on Zipkin.
By call relationship, Zhihu's services are divided into three tiers: the aggregation layer, the content layer, and the base layer. By attribute, they fall into three categories: data services, logic services, and channel services. Data services mainly store special types of data, such as the image service. Logic services are mostly CPU-intensive, compute-heavy operations, such as defining and parsing the answer format. Channel services have no storage and mostly forward data; Sink is an example.
This is the overall architecture after the move to services.
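The source gives no details of how registration and discovery work. As a rough illustration of looking a service up by name, here is a minimal in-memory registry with round-robin selection. The class, the service name, and the addresses are hypothetical, and a production registry would also need health checks and a shared store.

```python
import itertools


class ServiceRegistry:
    """Maps a service name to its instances and hands them out round-robin."""

    def __init__(self):
        self._instances = {}
        self._cursors = {}

    def register(self, name, address):
        self._instances.setdefault(name, []).append(address)
        self._reset_cursor(name)

    def deregister(self, name, address):
        self._instances[name].remove(address)
        self._reset_cursor(name)

    def _reset_cursor(self, name):
        # Restart the rotation over a snapshot of the current instance list.
        self._cursors[name] = itertools.cycle(list(self._instances[name]))

    def lookup(self, name):
        if not self._instances.get(name):
            raise LookupError(f"no instance registered for {name!r}")
        return next(self._cursors[name])


registry = ServiceRegistry()
registry.register("answer", "10.0.0.1:7000")
registry.register("answer", "10.0.0.2:7000")
picked = [registry.lookup("answer") for _ in range(3)]
print(picked)
```

Callers only need the service name; which machine actually serves the request is decided at lookup time.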
Product Service
The home page has roughly four functional areas. On the left is "Latest Activity", which takes up about 70% of the page and mainly shows the latest questions, answers, and other updates that the user follows. Besides viewing the latest questions and answers here, users can take part in questions that interest them through features such as "Settings", "Follow", "Add Comment", "Share", "Thanks", and "Favorite". With the "Settings" feature, users can choose to block a topic. Under a question followed by someone the user follows, the user can also follow the question, add comments, and so on.
At the top right of the page is the area for managing the user's own activity on Zhihu: "My Drafts", "My Favorites", "All Questions", "Questions I Follow", and "Questions I Was Invited to Answer". In the middle right is the off-site invitation feature, "Invite Friends to Join Zhihu", where users can invite friends to join the community by email or Sina Weibo. At the lower right is the recommendation area for topics and users that the user follows or may be interested in. These recommendations may come from operators aggregating information around the topics a user follows, or from statistics on the user's recorded behavior on the site, which allows fairly accurate recommendations and summaries. Finally, the "Topic Square" section at the bottom right shows all of the site's topics as category tags, giving users another good way to find information besides search and navigation.
The topic page can be divided into two sections, as shown in Figure 2: "Topic Activity" and "Frequently Visited Topics". On the left is "Topic Activity", which takes up about 70% of the layout. Here users can view the questions under the topics they follow (in chronological order), and they can pin or unfollow those topics.
At the lower right is the "Go to Topic" section. Here users can see details of a specific topic, such as its sub-topics, number of followers, and recent activity.
The notification page can be divided into four sections, as shown in Figure 3. On the left is "All Notifications", which lists other users' answers to the questions the user follows (in chronological order). On the right are the user activity summary, "Invite Friends to Join Zhihu", and the topic recommendation sections. These are the same as on the home page and are not described again here.
The personal home page is broadly divided into five sections, as shown in Figure 4: "Profile", "Personal Answers", "Personal Page", "Search This User's Questions and Answers", and "Following, Followers, and Followed Topics".
In the "Profile" section, clicking "View Details" shows five kinds of information about the user: "Personal Achievements" (the number of upvotes, thanks, favorites, and shares received), "Work Experience", "Location", "Education", and "Areas of Expertise". A user viewing their own profile can fill in these five items by clicking "Edit My Profile".
At the lower left is the "Personal Answers" section, which lists the user's answers (sorted by number of upvotes in descending order, or by answer time from newest to oldest). Together, the "Profile" and "Personal Answers" sections take up about 70% of the page.
At the top right, the "Personal Page" section summarizes the user's questions, answers, favorites, and logs, along with their latest activity.
In the middle right is a search box, which can be used to search this user's questions and answers.
At the bottom right are the people the user follows, the user's followers, and the topics the user follows. Clicking an icon takes the visitor straight to the corresponding section.
The question page is the most important page on Zhihu. Here users can read, edit, and answer a specific question. By function it can be divided into six parts: "Question and Answers", "Follow", "Invite", "Related Questions", "Share", and "Question Status".
On the left is the "Question and Answers" section, which takes up about 70% of the page. Here users can edit, comment on, report, and vote on the question and its answers. Users can modify a question, its tags, and its supplementary description where they find them inappropriate. They can also comment on a question they find interesting or report one they find inappropriate. In the answers area, users can sort the answers in whichever way suits them (Zhihu offers three views: sorted by votes, sorted by time, and answers from people the user follows).
It is also worth mentioning that to the left of each answer there are two triangles, one for upvoting and one for downvoting, as shown in Figure 6. Users can vote on each answer according to their own understanding or interest.
On the right side of the page, the first feature from the top is "Follow". Here users can follow the question. This is a bit like following on Sina Weibo. The difference is that on Zhihu the target is a specific question, while on Sina Weibo it is mainly a specific user.
Below that on the right is the "Invite Others to Answer" section. It works like the invitation sections described for the home page and the notification page, so it is not described again here.
Further down is the "Related Questions" section. This is one of the most common recommendation methods used by websites. Although it is relatively mature in technique and experience, its results are still far from perfect. Zhihu's related questions are recommended by algorithm based on the characteristics of the specific question and are not personalized for different users (personalization is the future trend of Internet development, and e-commerce platforms pay particular attention to this technology).
Below that is the question sharing feature. Users can share a question through Weibo or email, or through a private message within the site.
At the bottom right is the question status. Here users can see the time of the most recent activity, the number of views, how many people follow the related topics, and how many people follow the question.
User Experience
1. Strictly speaking, Zhihu is more like a forum: users discuss topics they are interested in, and they can follow people who share their interests. For explanations of concepts, online encyclopedias already cover almost every question you might have, but bringing divergent thinking together is a major feature of Zhihu. It encourages discussion during the question-and-answer process to broaden the scope of a question. It encourages answers that are not narrowly targeted and answers that can serve as a wiki-like reference.
2. Zhihu is more exclusive than a forum. Every registered user has a PR (Person Rank), and each action a user takes directly affects their PR value. Answers are ordered by number of upvotes, and when two answers have the same number of upvotes they are ordered by the author's PR value. Answers judged to be invalid are hidden. This filters out a good deal of junk information.
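The ordering rule described in this item (upvotes first, PR value as the tie-breaker, invalid answers hidden) can be written as a short sort. The `Answer` fields and the sample values below are hypothetical, and how Zhihu actually computes PR is not described in the source.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    author: str
    upvotes: int
    author_pr: float      # the author's Person Rank (scale assumed)
    invalid: bool = False


def rank_answers(answers):
    """Hide invalid answers, then sort by upvotes and break ties by PR."""
    visible = [a for a in answers if not a.invalid]
    return sorted(visible, key=lambda a: (-a.upvotes, -a.author_pr))


answers = [
    Answer("alice", upvotes=12, author_pr=3.1),
    Answer("bob", upvotes=30, author_pr=1.0),
    Answer("carol", upvotes=12, author_pr=7.4),
    Answer("spammer", upvotes=50, author_pr=0.1, invalid=True),
]
ranked = [a.author for a in rank_answers(answers)]
print(ranked)
```

Here `bob` comes first on upvotes, `carol` beats `alice` on PR because their upvotes are tied, and the invalid answer does not appear at all.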
3. Zhihu once insisted on a strict invitation system, first to make sure users had genuine, quasi-real-name identities, and second to avoid too much junk information. Quasi-real names make it easy for users to put questions to the people they are interested in. "Solo Regiment", the magazine produced by Han Han, had a rather interesting column called "Everyone Asks Everyone", and Zhihu is in a sense the real-world version of it. The strict invitation system also gave Zhihu a rigorous atmosphere, typified by users such as Keso, who speak rarely but are convincing when they do.
Since March 2013, Zhihu has been open to the public for registration.
4. SNS relationships built on credibility. If this were simply a matter of combining an SNS with Q&A, Renren should be able to develop faster in China. But as mentioned above, the strict invitation system excluded a large amount of invalid information. If Renren launched social Q&A, it would inevitably bring in your existing friends, and those friends obviously cannot all be people who care about the things you follow. This almost rules out any large Internet company entering the Quora category.
The reason is that large Internet companies build mass-market products, while Quora-style Q&A does not rest purely on popularity. It rests on the ratio of valuable information (valuable information / total content), that is, on the production of elite information.
However, Qianxiang (Oak Pacific) has quietly launched Jingwei, a vertical SNS that has gathered a considerable number of professionals. If Qianxiang used it as the entry point for integrating Quora-style Q&A, it would have real potential.
5. Like Quora, Zhihu uses blue as its base color. Compared with Quora, its features still need to be refined, such as highlighting the best content under a topic.