Source: Vanilla Technology Blog (@ Ordinary vanilla)
Continuing from the previous post, today I describe my interview experience at Xiaomi. Some of my technical understanding and solutions may be debatable, and discussion is welcome. Also, yesterday I received 7 donations totaling 95 yuan, enough for a few cups of coffee. Thank you to all the friends who donated.
Xiaomi: Operations Department
At Xiaomi I talked with two departments. The first was the Operations Department, where @wilbur (Gan) received me warmly and treated me to a meal. I am sorry that I had not brought enough cash, so I could not even make a "polite" offer to pay. I will return the invitation another day.
Gan, two of his colleagues, and I, four of us in all, chatted over the meal. I briefly introduced the service architecture of our current website and some of the technical design behind the business: the distributed site architecture, how we use the distributed file system FastDFS, and the deployment structure and techniques for Redis and MySQL. I went into particular detail on monitoring (see "Some considerations and practices of service availability monitoring") and mentioned two concepts along the way. Active monitoring means that the operations department designates what to watch (system resources, interfaces, pages, logs, and so on) and the monitoring system probes for problems itself; its alarms carry a high severity. Passive monitoring means instrumenting all operations, especially network interfaces, through a JS lib or client lib that reports exceptions, so that availability problems are discovered by collecting logs.
Of course we also covered the topics essential to operations: running and tuning HAProxy (see "HAProxy configuration"), MySQL architecture and optimization (see "MySQL architecture and operations"), common Redis performance problems (see "Redis architecture and operations"), a side-by-side comparison of FastDFS with other distributed storage systems (MogileFS, TFS, lusterfs) in terms of features and operating cost, multi-IDC image cache deployment and performance tuning (see "Multi-IDC image cache deployment"), Linux kernel parameters (see "Linux kernel configuration"), and the results I am particularly proud of, NIC SMP affinity/RPS/RFS tuning (references 3/4/5).
Since this was a serious operations department, I also laid out my understanding of operations and maintenance work: 60% analysis and organizing plus 40% technical skill. The ability to analyze and organize is the foundation of operations.
Gan also asked a few security questions. My superficial understanding, from a system administrator's (SA) experience, is that with good IT system planning and a sensible separation of server roles, iptables can block the majority of illegitimate requests at the access layer. For web application security, I said that SQL injection and CSRF attacks come from insufficiently strict filtering of input, and that sensible use of a good framework or lib during development avoids most vulnerabilities. There was also an interesting question about overflows. I can no longer compute an overflow address. I studied it a little back when I was a script kiddie, but I have forgotten all of it, to my shame...
Things moved efficiently on Gan's side, and the atmosphere over the meal was very relaxed, but many questions stayed at the level of ideas and result figures. With nothing to sketch on, we did not get into much in-depth discussion.
E-commerce Department
I got to the E-commerce Department around 8:30. The first round was the usual technical interview, details included. The interviewer was a team leader surnamed Zhang.
The interview took place in a conference room with a whiteboard and pens, so I wrote as I talked. I started with my general understanding of web service architecture, which I think always comes down to a few layers: the access layer (load balancing), the business service layer, and the data layer, plus a good number of back-office programs that handle, synchronously or asynchronously, the service units that do not fit into the business layer. The data layer can include DB, cache, files, and so on, and it may also contain middleware or proxy servers that provide load balancing, HA, and sharding for the data layer. I then walked the interviewer through the technology we currently use at each layer: HAProxy, nginx+PHP, Twemproxy+Redis, MySQL+Redis cache, and Varnish+Squid+nginx+FastDFS.
The HAProxy servers are configured and tuned for a target of 1 million concurrent connections. The plan assumes 1 million client connections, each of which may produce 1 internal connection. At 4 KB per connection (since corrected to 17 KB, HAProxy's official figure, see reference 8; thanks to @Gnuer for the correction) that comes to about 8 GB of memory (corrected to 32 GB). This calculation needs some care: I am not sure whether the 17 KB that HAProxy consumes per connection already covers the connection to the internal server, and real usage is often larger than the estimate. The highest connection count we have measured so far is a little over 160,000. The system-level tuning at the access layer consists of NIC interrupt tuning (references 3/4/5) and Linux kernel parameter tuning (see "Linux sysctl.conf configuration").
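The arithmetic behind those two memory figures can be checked with a few lines of Python. This is only a rough sketch: the function name and the `internal_per_client` parameter are made up for illustration, and it assumes the per-connection figure applies separately to the client-side and the internal connection, which is exactly the point the author is unsure about.

```python
KIB = 1024
GIB = 1024 ** 3

def haproxy_memory_gib(client_conns, per_conn_kib, internal_per_client=1):
    """Rough memory estimate in GiB (hypothetical helper, not an HAProxy API).

    Assumption: each client connection opens `internal_per_client` internal
    connections and every connection costs `per_conn_kib` KiB.
    """
    total_conns = client_conns * (1 + internal_per_client)
    return total_conns * per_conn_kib * KIB / GIB

original_estimate = haproxy_memory_gib(1_000_000, 4)    # the first 4 KB figure
corrected_estimate = haproxy_memory_gib(1_000_000, 17)  # the corrected 17 KB figure
print(f"{original_estimate:.1f} GiB vs {corrected_estimate:.1f} GiB")
```

If the 17 KB already includes the internal connection, passing `internal_per_client=0` halves the corrected estimate, so the real requirement would sit somewhere between the two readings.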
It is worth mentioning that our HAProxy servers have 64 GB of memory, far more than HAProxy actually needs, so we also deploy the outermost cache of the image service, Varnish, on the HAProxy servers.
These outermost servers take more than 500 million requests a day (100 to 150 million+ dynamic requests and 300 to 400 million+ image requests) on a total of seven Dell R410 machines with 64 GB each. The load is still very low, and judging by the system's resource usage there should be no problem even if the request volume doubles.
There is one thing to watch in the configuration of the outermost servers: in sysctl.conf, timestamps (net.ipv4.tcp_timestamps) must be set to 0. This is mentioned in the TCP extensions standard; otherwise connections from clients behind NAT may behave abnormally, and the anomalies can be seen in the output of netstat -s. Note also that with timestamps=0, tw_reuse has no effect.
Making a server accept a large number of concurrent connections is not hard, but there is one detail to consider: for every request, HAProxy has to allocate at least one local TCP port to reach the business server or cache server behind it, and a single IP address has at most 65535 ports, from which you generally have to subtract the first 1024. It is worth considering a smaller tw_bucket capacity, so that once tw_bucket is full the system simply discards connections entering the TIME_WAIT (TW) state, which recycles ports quickly. By default a TW connection is only recycled after twice the MSL. Another approach is to configure several IP addresses.
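The port budget described above can be put into numbers. The sketch below follows the author's simplification of counting ports per source IP; the function names are hypothetical, and the 60-second TIME_WAIT value is an assumption (the usual Linux figure) rather than something stated in the post.

```python
def max_upstream_connections(source_ips, port_low=1024, port_high=65535):
    """Upper bound on simultaneous outgoing connections (author's rule of thumb).

    Each source IP contributes 65535 minus 1024 usable ports.
    """
    return source_ips * (port_high - port_low)

def sustainable_new_connections_per_second(source_ips, time_wait_seconds=60):
    """New short-lived connections per second if every closed connection
    holds its port in TIME_WAIT for `time_wait_seconds` (assumed value)."""
    return max_upstream_connections(source_ips) / time_wait_seconds

one_ip = max_upstream_connections(1)
four_ips = max_upstream_connections(4)
print(one_ip, four_ips, round(sustainable_new_connections_per_second(1)))
```

The second function shows why lingering TIME_WAIT sockets matter more than the raw port count: with one IP and no recycling, only about a thousand new backend connections per second can be sustained, which is the motivation for shrinking tw_bucket or adding IP addresses.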
One more point: access-layer servers usually run iptables, so the kernel's netfilter settings need tuning as well, for example nf_conntrack_max and nf_conntrack_tcp_timeout_established.
The business layer tuning covers nginx+PHP (the FastCGI connection mode and the settings in php-fpm.conf). My experience is that when nginx and the PHP CGI processes run on the same server, speaking FastCGI over a UNIX socket is the fastest option, much faster than going through the loopback address 127.0.0.1. In 2008 I tuned one server (Dell 2960, 16 GB memory) from 900 QPS to over 6000 QPS in two steps: first, running the FastCGI protocol over a UNIX socket; second, choosing a sensible number of spawn-fcgi processes. Nowadays PHP CGI basically always runs under PHP-FPM, whose process pool logic is one of the features I admire most.
If nginx and PHP-FPM are not on the same server, consider nginx's FastCGI keepalive settings (the fastcgi_keep_conn directive together with keepalive in the upstream block) so that nginx holds persistent connections to the FastCGI server, which improves efficiency.
The runtime status that nginx and PHP-FPM expose is very useful. The nginx status module tells us how the nginx processes are handling requests, and the PHP-FPM status output tells us whether the PHP-FPM process pool is sized sensibly. We currently collect both through Nagios and chart them, and the charts are quite a sight.
Two more parameters in php-fpm.conf matter for tuning. One is pm.max_requests, the condition under which a worker process restarts automatically. The other is the PHP slow log, which is very important input for optimizing PHP code. In my current environment the PHP slow logs are shipped through rsyslog and analyzed centrally, which in turn pushes the developers to optimize their PHP code.
Under high concurrency a PHP server may be unable to open enough connections to the Redis servers because of the limit on the number of local ports. In that case, setting timestamps=1 together with tw_reuse/tw_recycle in sysctl.conf recycles ports quickly, so that connections to the data layer can be established more easily. The HAProxy servers at the access layer cannot do this.
This layer also raises a security issue: PHP code being modified and having a web shell (trojan) planted in it. My solution is to run PHP-FPM as one user, make a different user the owner of the PHP code, and make sure the PHP-FPM user has no write permission on the PHP code.
For the data layer, we use a MySQL master-slave structure with a high-availability setup based on MHA+keepalived, which you can basically understand by reading the documentation. With the new MySQL 5.6, high-availability monitoring may be easier, since MySQL now provides official tools for it, but I have not tested them yet. As for MHA's monitoring, I think its highlight is how it fetches and applies the MySQL binlog during failover, which avoids data loss as far as possible. It has drawbacks too: the monitoring process stops after it triggers a switch, so once it has fired you must restart the process to continue monitoring.
In 2006 at Sina I did a project called Trust DMM, which used DNS, mon, and plugins we wrote ourselves to monitor the availability of a MySQL master-slave cluster. It could switch automatically between the master and the standby master (though it lacked the binlog handling step), and since the slaves were a group of servers, a slave with problems could be taken offline automatically. The system was just rather cumbersome to deploy. This project won the first prize in Sina's innovation awards.
I also said that a DBA's day-to-day work should include, at a minimum: reviewing and executing SQL that goes online; periodically checking and analyzing the MySQL slow logs and feeding the results back to the development department; and periodically reviewing the efficiency and usefulness of the indexes in the database and feeding back optimization suggestions. Being an average DBA is actually quite easy these days: once you thoroughly understand the Percona tools, you can solve a great many database problems.
MySQL has another thorny problem: memory efficiency on large-memory servers with a NUMA architecture, which calls for adjusting the numactl policy. If you use the Percona build of MySQL, you can use the memlock setting to lock the memory of the InnoDB engine and keep it from going to swap.
Among the common MySQL architectures there is one where the master and the slaves use different storage engines: the master uses InnoDB to improve concurrent write capacity, while the slaves use MyISAM. This is what we currently run. The aim is better read performance, and MyISAM also saves memory. MyISAM already runs efficiently with index data read from memory and row data read from disk. The myisam_use_mmap option additionally makes MySQL mmap the MyISAM data files into memory, which is efficient while still keeping the features of the MyISAM engine.
One thing must be avoided on the master: unconditional deletes and unconditional updates, that is, statements such as "Delete from table" and "Update table set XXX=YYYY" without a Where clause. In principle these should be forbidden. Such permissions should not be given to developers, and even DBAs should not be able to run them without restriction. My current solution is sql_safe_updates=1, but in my experience this setting cannot be written in my.cnf; it can only be set from the console after MySQL has started.
We also currently use Redis as a DB, in a master-slave architecture that spans IDCs. The problem we face is that after the replication connection drops, Redis retransmits a snapshot, and the slave shows a brief performance jitter while it swaps the snapshot in. The PSYNC feature in the new Redis 2.8 should improve this. We also use Twemproxy, currently deployed on every PHP server and listening on a UNIX socket, with PHP connecting through the phpredis module. This effectively saves the time of the TCP three-way handshake. Twemproxy has many other fine features: it builds the cache cluster with consistent hashing, which effectively avoids the cache migration problem, and through its health checks on the back-end Redis servers it can automatically take a faulty Redis offline.
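Why consistent hashing avoids the cache migration problem can be seen in a small Python sketch. This is a generic hash ring written for illustration, not Twemproxy's actual implementation; the class, the node names, and the choice of MD5 with 100 virtual points per node are all assumptions.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        self._replicas = replicas
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        # Each node owns many virtual points so keys spread evenly.
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def get(self, key):
        # The first point clockwise from the key's hash owns the key.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["redis-a", "redis-b", "redis-c"])
before = {k: ring.get(k) for k in keys}
ring.remove("redis-b")  # simulate ejecting a faulty Redis
after = {k: ring.get(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys changed server")
```

Only the keys that lived on the removed server are remapped; everything cached on the surviving servers stays where it was, which a plain `hash(key) % server_count` scheme would not guarantee.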
Then there is the multi-IDC deployment of image storage and caching. The image CDN we built ourselves currently carries about 400 million requests a day for the site, with a peak of about 1.5G. Structurally, the central IDC stores the original images and a Squid disk cache stores the thumbnails, while the remote IDCs use a two-level cache: one layer of Squid disk cache (two machines, for HA) and one layer of Varnish cache (up to four machines). In fact, if you only consider normal operation, the Squid cache layer is basically unnecessary. However, the current structure reduces the requests that Varnish sends back to the central node and so relieves the bandwidth pressure on the central data center. The structure is also simple. Under highly concurrent requests Varnish has some resource settings that need attention, such as nfiles/varnish_max_threads/nuke_limit.
We exchanged views on many more technical topics, including the monitoring framework already discussed with Gan, and in particular my rsyslog tuning. The tuned rsyslog is very commendable in terms of reliability (see reference 6 for the tuning).
I have folded the questions from the third round at E-commerce, with a colleague from operations, into the description here, and will not repeat them.
The second round deserves a mention. The interviewer was another development leader, who struck me as a very independent thinker. He asked me a very interesting question, roughly this: in system architecture there are several levels, from the bottom up: using open source, mastering open source, optimizing and modifying open source software, and creating open source software. He asked which level I thought I was on. I thought about it seriously and decided I am around the second level: I have mastered some software and modified some.
The session with the head of E-commerce was the longest, at least two hours, and it was already late at night when it ended. I think the head of E-commerce really ought to send me a little money on Alipay; I wonder whether any Xiaomi folks could pass that on, ha. We talked about a great many things, including a flash sale ("second kill") solution, my understanding of continuous integration and automated testing, and my view of calculation errors in the development of back-end data services. From time to time I got the comment "we think very much alike".
It was nearly midnight by then and my memory was running low, so I cannot recall the more trivial things, and I will not repeat the technical solutions already covered. Here is a brief description of my flash sale solution. The task: 100,000 items, sold from 0 up to 100,000, with no overselling. The problem is that every time a flash sale opens, 1 million requests/connections may arrive at the same moment. How do you handle that?
My solution, leaving aside user, session, and other externally dependent services: two HAProxy machines at the front to absorb the concurrent connections (later I decided these do not matter and could serve as PHP servers instead), three PHP servers (no framework, just the simplest plain PHP code), and two Redis servers (originally I said one). The specific tuning is as follows:
HAProxy: tune it to support a million concurrent connections, which is easy.
nginx: tune worker connections, and tune nginx's capacity for concurrent connections and for accepting the request queue.
PHP-FPM: tune listen.backlog, that is, the capacity for accepting the FastCGI request queue.
Redis: assuming the server does not fail during the 1 minute of the flash sale, tune the maximum number of Redis connections.
All servers: tune the NICs and the sysctl parameters.
The PHP logic can be understood simply as an atomic INCR operation on one Redis key: if the value returned is within 100,000 (or within 50,000 on each server when there are two Redis servers), the request counts as having grabbed an item.
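The counter logic can be sketched in Python. To keep the example self-contained it uses an in-process stand-in for Redis that models only an atomic INCR; the class `FakeRedis`, the key name, and the helper functions are hypothetical, and a real deployment would issue INCR against an actual Redis server from the PHP code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class FakeRedis:
    """In-process stand-in for a Redis server; only INCR is modelled."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def incr(self, key):
        # Like Redis INCR: atomically add 1 and return the new value.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + 1
            return self._data[key]

def try_buy(store, key, stock):
    """A request wins an item only if its ticket number is within the stock."""
    return store.incr(key) <= stock

def simulate(requests, stock, workers=8):
    """Fire `requests` concurrent purchase attempts and count the winners."""
    store = FakeRedis()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(
            lambda _: try_buy(store, "seckill:counter", stock),
            range(requests)))
    return sum(results)

print(simulate(requests=5000, stock=1000))
```

Because every request receives a unique, strictly increasing number from the atomic increment, exactly `stock` requests can succeed no matter how many arrive at once, which is what rules out overselling.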
From the figures I have seen, Redis at its best reaches about 80,000 QPS. nginx+PHP was already tuned to 6000 QPS in 2008, and on today's server hardware (two-way, 16 CPU cores, 64 GB memory) reaching 20,000 to 30,000 QPS should not be hard either (I do not have the latest figures). The configuration above should be able to complete 100,000 Redis INCR operations within 5 s at the very least. Add the operating system's support for request queues, and it can come close to no errors with only a short delay.
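A quick back-of-the-envelope check of the 5-second claim, as a Python sketch. The function is hypothetical and the model is deliberately crude: it assumes the slower of the PHP tier and the Redis tier is the bottleneck, takes the lower end of the author's 20,000 to 30,000 QPS guess for each PHP server, and ignores network and queueing effects.

```python
def seconds_to_serve(requests, php_servers, php_qps_each,
                     redis_servers, redis_qps_each):
    """Time to push `requests` through the slower of the two tiers."""
    bottleneck_qps = min(php_servers * php_qps_each,
                         redis_servers * redis_qps_each)
    return requests / bottleneck_qps

# Figures from the text: 3 PHP servers, 2 Redis servers at about 80,000 QPS each.
first_100k = seconds_to_serve(100_000, 3, 20_000, 2, 80_000)
all_million = seconds_to_serve(1_000_000, 3, 20_000, 2, 80_000)
print(f"{first_100k:.2f} s for 100,000 INCRs, {all_million:.1f} s for 1,000,000")
```

Under these assumptions the PHP tier, not Redis, is the limit, and the first 100,000 increments finish well inside 5 s, while draining all 1 million requests takes noticeably longer and relies on the request queues mentioned above.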
If you think the request load on a single Redis server would be too high, you can shard, with 50,000 on each.
Of course, this is only a solution I came up with in under 1 minute. Looking at it now, HAProxy can be dropped, since nginx carries concurrent connections well enough too. Every detail needs to be validated with a stress test. In the real situation, with dependencies on other services added (I do not know what other odds and ends would have to be cleared out of the way), the solution will be more complex. According to the head of E-commerce, in reality the flash sale service uses more than 10 servers and still has the occasional failure. The folks who run flash sales at Xiaomi are under a lot of pressure.
If you also need to record the UID and the sequence number of each user who grabbed an item, that goes in Redis as well.
(The Linux version of WPS suddenly crashed and I could only recover up to this point. The rest was rewritten and may be a little disorganized.)
To explain the solution, I drew a simple architecture diagram on the whiteboard: HAProxy + nginx/PHP + Redis. HAProxy and nginx/PHP both scale out linearly, and Redis can be scaled through sharding. In theory a scalable architecture can meet any performance requirement, let alone logic this simple, where even a single machine can deliver very high performance.
While I was laying out the plan, the head of E-commerce, Mr. Wang, asked what the difficulty in this requirement was. I looked at the whiteboard and smiled: as it stands, there should be no difficulty. If a problem does come up, you look at the logs, the service status, and the server status.
The fourth round covered a lot of ground. Several times we were about to wrap up when another question suddenly came up, and each one was discussed for a long time. One example was the view that some back-end calculations are better suited to Java because Java can be more rigorous. I said this may not be a matter of language but of the programmer's habits and quality. If a change really is needed, I would rather try something new, such as Go, which might also take care of the performance concerns.
Then the conversation suddenly turned to continuous integration. I admitted that my understanding of continuous integration stays at the level of using tools to automate testing and release, and that I have no hands-on experience. But my superficial view is that the prerequisite for continuous integration is automated testing, and automated testing has two difficulties: 1, designing the automated test cases; and 2, how well programmers understand automated testing and how much they resist it psychologically. I made a brief attempt at my current company: professional, traditional testers design the test cases, and the requirements a programmer receives include both the product requirements for the normal logic and the requirements expressed by the test cases. Development is considered done when your own code fully passes the test cases you were given.
At this point the interviewer could not help giving a thumbs up. (Hahaha)
There were more or less some other topics as well. I felt I was on a roll that night, as if giving a speech, but it was past midnight and I cannot recall the other details. If I remember more, or if the Xiaomi folks who took part in the interview remind me, I will add it.
The whole Xiaomi interview, both departments together, took about 7 hours, the longest interview I have ever been through... Interviewing at Xiaomi was hard work, and typing all this up today was hard work too; it is already 1:30 a.m. If you found the above helpful or interesting, chip in a little money if you can, or simply cheer me on: http://me.alipay.com/chunshengster
Reference:
1. Profiling: from single systems to data centers
2. X-Trace: A Pervasive Network Tracing Framework
3. http://blog.chunshengster.me/2013/05/smp_irq_affinity.html
4. http://weibo.com/1412805292/zvnrdsqtt
5. http://weibo.com/2173311714/zw40tv3D2
6. http://blog.chunshengster.me/2013/07/high_performance_rsyslogd.html
7. http://www.wandoujia.com/blog/from-qa-to-ep
8. http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#4-maxconn