Author: Fenng | When reprinting, the original source, author information, and copyright statement must be indicated in hyperlink form.
Web: http://www.dbanotes.net/arch/infoq_interview_review.html
After InfoQ published an interview with me, I saw that a website had reposted the text. Inevitably, some of my spoken remarks did not come across well once put into writing. After checking with InfoQ, I am making corrections to some of the Q&A here to avoid misleading readers.
The following is the text:
JASON: As a senior DBA, Dahui has written many articles about website architecture on his blog. Can you talk about the relationship between a DBA and website architecture?
Fenng: Many of my friends joke with me, saying that I am a DBA yet keep writing about architecture: "Isn't this a cook reading recipes while studying battle tactics?" Actually, I think the two are related. Database maintenance, and even design and architecture-related work, to a certain extent requires taking a few extra steps, that is, integrating some architecture-related concerns into the work. Of course, a DBA does not need to do hands-on work such as coding the way the architects do, but the things closely tied to the DB must be watched carefully. That is why I have written so many architecture-related articles on my blog.
JASON: In general, what are the main bottlenecks in improving website performance? What are the best practices for resolving these bottlenecks?
Fenng: In the past, most bottlenecks were in the database; that is, the final bottleneck fell on I/O. However, as the Web has developed, related technical solutions have emerged. Whether a website can cope with high traffic and high concurrency largely comes down to whether the cache can be used fully, flexibly, and correctly, which is very important. [Supplement: this topic is basically about Web 2.0 sites, so caching is the major problem here. As you know, for e-commerce sites, transaction processing capability is undoubtedly the tricky part.]
JASON: What should be paid attention to when designing a large-scale, high-concurrency, high-traffic website?
Fenng: In the preliminary planning, some preventive measures must be taken, such as selecting an appropriate technical architecture; this is the first thing that must be considered. In addition, much attention should be paid to product design. Many Web 2.0 websites, including some emerging Web 2.0 websites in China, are more or less over-designed. These designs may inadvertently have a catastrophic impact on the back end, which puts great pressure on developers, architects, and even maintenance staff.
On the other hand, it is critical to build a DIY architecture on top of mature technologies from the industry. It is like building a car: we do not have to build a top-end luxury car like a Mercedes-Benz (the cost would be very high); we just need to build a car that runs. A car that runs well may already be half the success.
JASON: You mentioned earlier that for website performance and optimization, the role of the cache is very important. So how important is the cache, exactly? And how can the cache be optimized to improve performance?
Fenng: Based on my past experience and practice, mostly in Oracle environments, on the one hand there are precautions to take when designing highly concurrent applications; on the other hand, it depends on whether Oracle's memory can be fully utilized, and ultimately on whether its own cache mechanism can be fully exploited. Web 2.0 websites rarely use Oracle databases (they mostly use MySQL). MySQL has its own cache mechanism, and it should be said it works reasonably well, but most websites will still consider external components such as memcached. In that case, we ultimately use the hit rate as the measure; this is an indicator of scalability and performance that everyone must pay attention to.
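The hit rate Fenng refers to is simply hits divided by total requests. As a minimal sketch: memcached exposes cumulative `get_hits` and `get_misses` counters in its `stats` output, and the hit rate can be computed from them. The counter values below are made up for illustration.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Cache hit rate: hits / (hits + misses), or 0.0 if no traffic yet."""
    total = hits + misses
    return hits / total if total else 0.0

# Hypothetical counters, in the shape memcached's "stats" command reports.
stats = {"get_hits": 8_000_000, "get_misses": 2_000_000}

rate = hit_rate(stats["get_hits"], stats["get_misses"])
print(f"cache hit rate: {rate:.1%}")  # prints "cache hit rate: 80.0%"
```

Because the counters are cumulative since server start, in practice the rate is usually computed over the delta between two samples rather than over the raw totals.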
The I/O requests that miss the cache fall through to our database, and the requests that miss in the database may fall through to the underlying disks, so the disks [or storage] must support our current minimum requirements. For example, if memcached in our application has a hit rate of 80%, the remaining 20% is pushed down to the DB below it; if that DB's hit rate reaches 95%, the remaining 5% is multiplied by the earlier 20%, so total I/O volume x 20% x 5% is what finally gets routed to the hard disks or storage. The overall response capability of the disks [or even the storage] is limited... we may be running RAID, or even a single disk supporting the application. From this baseline, we can calculate the bottleneck the current system can carry with the cache in place and further understand the overall I/O processing capability. This must be considered during design; otherwise, when the pressure suddenly increases and the system cannot handle it, hastily implementing expansion measures can be quite troublesome.
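The cascade above can be sketched as a small calculation: only the misses at each cache tier fall through to the next, so the fraction of requests reaching the disks is the product of the miss rates. The 80% and 95% figures are the hypothetical numbers from the example.

```python
def disk_fraction(tier_hit_rates: list[float]) -> float:
    """Fraction of total requests that miss every cache tier in order."""
    fraction = 1.0
    for hit_rate in tier_hit_rates:
        fraction *= (1.0 - hit_rate)  # only misses pass to the next tier
    return fraction

# memcached hits 80%, the DB cache hits 95% of what remains:
# 20% x 5% = 1% of all requests end up on the disks.
reaches_disk = disk_fraction([0.80, 0.95])
print(f"{reaches_disk:.0%} of all requests hit the disks")
```

Multiplying that 1% by peak request volume gives the IOPS the disks must sustain, which is the sizing exercise the interview describes.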
JASON: Speaking of cache hit rates, what is the typical cache hit rate for a successful website?
Fenng: In Oracle, the hit rate is 90% or even 99.99%. That may not be achievable in MySQL. For memcached, as far as I know, 70%~80% is already good. [Performance varies greatly across applications. For example, friends at Douban told me their hit rate is 97%!] Of course, the hit rate is only a surface phenomenon; we also need to look at what the actual application is like, and access frequency may differ between web application types, so there is no fixed ratio. Here we can only rely on experience. In general, the higher the hit rate, the better.
That is the first part. I will correct the remaining parts tomorrow when I have time.