Reproduced from the author's answer.
Since you are still a student and your foundation in programming languages and data structures is solid, I think the effort should go into two courses, "Operating Systems" and "Computer Architecture", before moving on to programming books such as APUE and UNP.
Below are my brief views and suggestions on studying these two courses, all from the standpoint of a server-side programmer and from a pragmatic point of view.
The purpose of studying operating systems is not to invent your own OS kernel, defeat Linux, or become a kernel developer, but to understand what kind of environment the operating system provides to user-space processes: which of its facilities a programmer can put to good use, which are useless, and which actually get in the way.
The purpose of studying computer architecture is not to design your own CPU (a new ISA or microarchitecture), defeat Intel and ARM, or join a CPU design team to improve an existing microarchitecture, but to understand the capabilities and characteristics of modern processors (e.g. instruction-level parallelism such as pipelining, multiple issue, branch prediction, and out-of-order execution; memory locality and caches; the multi-processor memory model, visibility, and reordering), so that when programming you can organize code and data appropriately to get the most out of the CPU and avoid its pitfalls. (Recommended reading: the article "Modern Microprocessors".)
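As a tiny, hedged illustration of what "organizing code and data for the CPU" can mean in practice (the matrix size and timing loop are arbitrary, not from the original text): summing a matrix along rows is cache- and prefetcher-friendly, while summing it along columns touches a new cache line for almost every element once the matrix no longer fits in cache.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
  const int n = 4096;
  std::vector<int> m(static_cast<size_t>(n) * n, 1);

  auto time_sum = [&](bool row_major) {
    auto start = std::chrono::steady_clock::now();
    long long sum = 0;
    for (int i = 0; i < n; ++i) {
      for (int j = 0; j < n; ++j) {
        sum += row_major ? m[static_cast<size_t>(i) * n + j]   // sequential access
                         : m[static_cast<size_t>(j) * n + i];  // strided access
      }
    }
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - start).count();
    std::printf("%-12s sum=%lld  %lld ms\n",
                row_major ? "row-major" : "column-major", sum,
                static_cast<long long>(ms));
  };

  time_sum(true);   // friendly to the cache and the hardware prefetcher
  time_sum(false);  // mostly cache misses on a matrix larger than the cache
  return 0;
}
```

The two loops do exactly the same arithmetic; only the memory access pattern differs, which is the kind of thing an architecture course should leave you able to reason about.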
How should you study these two courses, and which books should you read? Here is a general method: go to the course pages of top American CS departments and look up these two courses' syllabi, lecture notes, bibliographies, reading lists, exercises, homework assignments, programming labs, final projects, and so on from the past few years.
In studying any course you should grasp the main thread, set priorities, and focus on the key points: master the framework of the subject and learn the genuinely useful knowledge and skills, rather than spreading your energy evenly over trivia.
Allow me to quote Meng Yan's view again: http://blog.csdn.net/myan/article/details/5877305
I (Meng Yan) advocate that, once you have the basics, when learning anything new you must seize the main thread and focus on the key points. Study the core theory in a concentrated and rapid way; the side branches and non-essential details can be left to be picked up piecemeal in practice.
The reason is that in any body of advanced knowledge, only a small part is creative and has a significant impact; much of the rest is trivial and non-essential. Concentrated study must therefore grab the truly important part and leave the rest to practice. For the key knowledge, only focused study of its theory can guarantee a systematic, coherent, and correct understanding; for the side branches, only learning them alongside practice lets you see how small their real value is and leaves a more vivid impression. If you put your energy in the wrong place, for example spending large blocks of concentrated time on small tricks you could simply look up in a manual, while picking up the really important, thought-demanding material piecemeal, then the return will be meager, or even counterproductive.
That is why I am dissatisfied with most development books on the market: they are basically organized around the knowledge system itself rather than around the reader. They pile all the related details together and stack them into a book. The result is a flat presentation with no emphasis, and the tedious details usually kill the reader's enthusiasm before chapter three.
For example, an operating systems course should focus on process management and scheduling, memory management, concurrent programming and synchronization, efficient I/O, and so on, and should not spend too much time on initialization (loading the boot sector from the BIOS, setting up the GDT, entering protected mode), which is a one-off task. I have noticed that domestic Linux kernel books often put the initialization details in the first few chapters, while foreign books usually put them in an appendix; draw your own conclusions. Initialization is of course important to the operating system itself, but for someone writing user-space service programs, is figuring out why the A20 address line on the PC has to be enabled really useful? (It is nothing more than a historical burden.)
For example, "Computer network", one of the key is to understand how to design a reliable network protocol under the condition of packet loss, heavy packet and disorderly order, which is not difficult. The hard part is that this reliable protocol achieves "the ability to take full advantage of bandwidth and be fair enough (sharing bandwidth roughly evenly for concurrent connections)". Instead of learning to hand over CRC32, this is better suited to information theory or other courses.
Pay attention to distinguishing levels of knowledge. It is like the difference between building a car and driving one. I think a driver's skill lies mainly in driving safely and reaching the destination under all kinds of road and weather conditions (city streets, highways, country roads; clear days, rain, snow, fog). For a driver, understanding how a car works is certainly a good thing; it helps you drive better and fix some common malfunctions. But it should not become a distraction: unless you actually work in automotive design, no matter how hard you study engines, transmissions, and steering, you will not surpass the engineers at the car factory, because that is their full-time job. And beyond a certain point, studying the internals of the car does not improve your driving much; it becomes a personal hobby. "Some people end up becoming language experts but forget that they originally set out to solve problems." (Meng Yan, "Quickly mastering the most commonly used 50% of a language")
For concurrent programming, what a general server programmer needs is to master the correct usage of mutexes and condition variables, avoid misuse (such as busy-waiting and data races), and avoid performance pitfalls. How to implement an efficient mutex is something for libc and kernel developers to worry about, and as hardware evolves (the interconnect between CPUs and memory changes, the number of cores grows), the best practices change too. If you cannot keep up with developments in this field, the knowledge you gain from digging deep may become a burden a few years later: the tricks that were best for the hardware of the day (such as a home-made mutex or lock-free data structure) may actually hurt performance a few years on. It is better to write the code in the clearest way, using the ready-made synchronization facilities of the language and the library, and let the compiler and libc authors worry about "keeping up with the times".
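As a minimal sketch of that "ordinary usage" level (the class and its names below are illustrative, not taken from any particular library): a blocking queue built from std::mutex and std::condition_variable in C++11, where waiting on a predicate avoids both busy-waiting and spurious wakeups.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// A hypothetical blocking queue: the point is to rely on the standard
// synchronization facilities rather than hand-rolled spinning or custom locks.
class BlockingQueue {
 public:
  void put(int x) {
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push_back(x);
    not_empty_.notify_one();  // wake one waiting consumer
  }

  int take() {
    std::unique_lock<std::mutex> lock(mutex_);
    // Wait on a predicate instead of looping on queue_.empty() ourselves:
    // no busy-waiting, and spurious wakeups are handled for us.
    not_empty_.wait(lock, [this] { return !queue_.empty(); });
    int x = queue_.front();
    queue_.pop_front();
    return x;
  }

 private:
  std::mutex mutex_;
  std::condition_variable not_empty_;
  std::deque<int> queue_;
};
```

Code at this level stays correct as hardware changes, because the clever parts live in the standard library and libc, not in your source tree.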
Pay attention to identifying outdated knowledge. For instance, when "Operating Systems" covers disk I/O scheduling it often teaches the elevator algorithm, but nowadays disks generally have this built in (NCQ) and the operating system no longer needs to worry about it. If you are at a good school, the OS instructor should point out such spots so you do not waste energy on them; if you are self-taught, I have no good remedy other than trying to use the newest edition of the textbook. "Computer Architecture" has similar examples: it may still cover the delay slot in RISC CPU pipelines, which now appears obsolete. "Computer Networks" has plenty of such cases. First, the OSI seven-layer model has proven to be a dead end; popular foreign textbooks now basically use the five-layer model (the Internet protocol suite). If your textbook still solemnly presents OSI, or even paints it as the hope of the future, throw it away and get another one. Second, at the LAN level Ethernet dominates (it has almost become a synonym for LAN); FDDI, Token Ring, and ATM are basically no longer used by any company. As for Ethernet itself, CSMA/CD is hardly used any more either (10M coaxial cable and 10M/100M hubs are obsolete and switches are everywhere), so for collision detection it is enough to understand the point that "the minimum Ethernet frame must take longer to transmit than twice the maximum propagation delay".
Another point is that low-level optimization knowledge goes out of date very easily, so avoid overfitting your code to it. For example, some domestic textbooks (especially first-year programming language texts) still teach outdated rules such as "replace multiplication with addition", "floating-point arithmetic is slower than integer arithmetic", and "bit operations are the fastest". The reality on a modern general-purpose CPU is that integer addition, subtraction, and multiplication are about equally fast while integer division is much slower, and floating-point addition, subtraction, and multiplication are about as fast as their integer counterparts while floating-point division is much slower. Therefore replacing multiplication with additions (or replacing arithmetic with bit operations) does not necessarily make anything faster, but it does make the code harder to understand. Modern compilers can also turn integer division by a constant into multiplication, so the programmer need not worry about it. (Replacing floating-point division with multiplication, e.g. multiplying by 0.1 instead of dividing by 10, still seems worth doing today, because the peculiarities of floating-point arithmetic, which satisfies neither associativity nor distributivity, prevent the compiler from making that transformation on its own.)
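A small sketch of this last point (the function names are illustrative): the integer case can be left entirely to the compiler, while for floating point the two forms are genuinely different operations, so the choice is up to the programmer.

```cpp
#include <cstdio>

// Integer division by a constant: compilers routinely turn this into a
// multiply-and-shift sequence, so just write the obvious code.
int tenth_int(int x) { return x / 10; }

// Floating point: x / 10.0 and x * 0.1 are not the same operation, because
// 0.1 has no exact binary representation and the results can differ in the
// last bit. The compiler must preserve what you wrote, so if the division is
// a bottleneck, you must decide yourself whether the tiny accuracy difference
// is acceptable and write the multiplication.
double tenth_div(double x) { return x / 10.0; }
double tenth_mul(double x) { return x * 0.1; }

int main() {
  double x = 123456.789;
  std::printf("%.17g\n%.17g\n", tenth_div(x), tenth_mul(x));
  return 0;
}
```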
A similar low-level optimization example: years ago, a codec for a popular image format was written in assembly language, but as CPU microarchitectures evolved it is no longer any faster than a modern C version (possibly using SIMD); worse, because it was written in 32-bit assembly, it is now a headache to port to 64-bit. If you cannot assign people to keep such a private library up to date, you are better off using a third-party library. About the only case today where assembly can still produce code clearly faster than C is using the CPU's new instructions for specific algorithms; for example, Intel's newer CPUs have (or will have) built-in instructions for algorithms such as AES, CRC32, SHA1, and SHA256. But mainstream third-party libraries (such as OpenSSL) will certainly use these facilities and follow up in a timely manner, so there is basically no need to roll your own. (Another example: suppose your company wrote a highly efficient big-integer arithmetic library in assembly years ago, it has been working well, and the expert who wrote it has since been promoted or moved on to another job. Intel's Haswell microarchitecture, released in 2013, added the MULX instruction, which can further improve the efficiency of big-integer multiplication (see "GMP on Intel Haswell"). Is anyone in your company keeping up with these CPU developments and updating the big-integer library in time? Or is it better to use the open-source GMP library directly and let the GMP authors worry about such things?)
If you want to remember a conclusion, be sure to remember its premises and the conditions under which it applies. There is no end of funny examples of applying an originally correct conclusion in the wrong setting.
- In "Linux Kernel Source Code Scenario Analysis", it is deduced from the kernel's use of GDT/LDT entries that the number of processes cannot exceed 4090. If you are going to remember this conclusion, be sure to also remember that it was established for the Linux 2.4.0 kernel on the 32-bit Intel x86 platform, and that it probably does not hold for newer kernels or other hardware platforms. Do not go around declaring, after reading that book, that "the maximum number of processes in Linux is 4090".
- A Linux process can create at most about 300 threads: this conclusion holds only under the condition of a 3GB user address space with a per-thread stack of 10MB or 8MB, and it does not hold on 64-bit systems.
- The Reactor pattern can support no more than 64 handles: this conclusion holds only for the WFMO_Reactor implemented with the WaitForMultipleObjects function on Windows; Reactors implemented with poll/epoll on Linux have no such limit.
- The vector container in the C++ STL does not release memory after clear(); you need to swap with an empty vector. This is intentional (C++11 added the shrink_to_fit() member function for this purpose). But do not remember it as "all STL containers need to be swapped with an empty one to free memory"; in fact the other containers (map/set/list/deque) release their memory with clear() alone. Only containers with reserve()/capacity() member functions need the swap trick, and in C++ only vector and string qualify. (A small demonstration follows.)
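Here is a minimal sketch of the vector-specific behaviour described above (the sizes are arbitrary, and note that shrink_to_fit() is only a non-binding request to the implementation):

```cpp
#include <cstdio>
#include <vector>

int main() {
  std::vector<int> v(1000000, 42);
  v.clear();  // size() becomes 0, but capacity() is typically unchanged
  std::printf("after clear():          capacity = %zu\n", v.capacity());

  std::vector<int>().swap(v);  // the classic "swap with an empty vector" trick
  std::printf("after swap with empty:  capacity = %zu\n", v.capacity());

  v.assign(1000000, 42);
  v.clear();
  v.shrink_to_fit();  // C++11: request (not guarantee) that memory be released
  std::printf("after shrink_to_fit():  capacity = %zu\n", v.capacity());
  return 0;
}
```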
One last piece of advice: server-side development in recent years has standardized on 64-bit multi-core hardware, so when studying operating systems you need not pay too much attention to techniques unique to single-core machines (in the single-core era, kernel code could enter a critical section simply by disabling interrupts, a practice that no longer works in the multi-core era), nor spend too much effort on 32-bit platforms. In particular, 32-bit x86 had to adopt many workarounds to support large memory (the difficulty being that a 32-bit address space is not enough to map all physical memory into the kernel), which brought extra complexity. Those practices had their value at the time, but studying them in depth now does not seem worthwhile.
As for projects, here are two practice exercises:
One: multi-machine data processing. There are 10 machines, each holding 1 billion 64-bit integers (not necessarily exactly 1 billion; the count may fluctuate by tens of millions), about 10 billion integers in total (in fact only about 80GB of data, which is not big; this scale is chosen with the capacity of VPS virtual machines in mind, so the experiment is easy to run). Write programs to find:
1. The average of these numbers.
2. The median of these numbers (a sketch of one approach appears after this list).
3. The 1 million numbers that occur most frequently.
4. Sort the 10 billion integers and store the result, in order, across the 10 machines.
*5. (Additional robustness requirement) Your program should correctly handle input data with various distributions (uniform, normal, Zipf).
*6. (Additional scalability requirement) Your program should scale smoothly to more machines and larger data volumes, e.g. 20 machines with 20 billion integers in total, or 50 machines with 50 billion integers in total.
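As a sketch of one way to approach the median (problem 2): binary-search over the 64-bit value range, where in each round every machine only reports how many of its numbers are at most the pivot. The code below simulates the 10 machines as in-memory shards in one process; in the real exercise each count would be a remote call, and the data sizes here are tiny and purely illustrative.

```cpp
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

// What each "machine" computes locally per round: a single count.
int64_t count_le(const std::vector<int64_t>& shard, int64_t pivot) {
  int64_t n = 0;
  for (int64_t x : shard) {
    if (x <= pivot) ++n;
  }
  return n;
}

int main() {
  // Simulate 10 shards of random 64-bit integers (tiny, just to show the idea).
  std::mt19937_64 rng(42);
  std::vector<std::vector<int64_t>> shards(10);
  int64_t total = 0;
  for (auto& s : shards) {
    s.resize(100001);
    for (auto& x : s) x = static_cast<int64_t>(rng());
    total += static_cast<int64_t>(s.size());
  }

  // Find the k-th smallest value (k = (total+1)/2, i.e. the lower median)
  // without ever moving the data between machines.
  int64_t k = (total + 1) / 2;
  int64_t lo = INT64_MIN, hi = INT64_MAX;
  while (lo < hi) {
    // Overflow-safe midpoint of two int64_t values.
    int64_t mid = lo + static_cast<int64_t>(
        (static_cast<uint64_t>(hi) - static_cast<uint64_t>(lo)) / 2);
    int64_t c = 0;
    for (const auto& s : shards) c += count_le(s, mid);  // one "round trip"
    if (c >= k) hi = mid; else lo = mid + 1;
  }
  std::printf("median: %lld\n", static_cast<long long>(lo));
  return 0;
}
```

At most 64 rounds are needed, so the communication cost is negligible compared with scanning the data; the other sub-problems call for different decompositions, which is the point of the exercise.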
Two: multi-machine parallel solution of the N-queens problem. Use multiple machines to count how many solutions the N-queens problem has. (Note that the current world record is N = 26; see A000170 on OEIS.)
1. The 8-queens problem runs in milliseconds on a single machine and has 92 solutions; implement this first.
2. Study parallel algorithms for the N-queens problem and write a single-machine multithreaded program that achieves a linear speedup in the number of CPU cores (a sketch appears after this list). Then try to extend the algorithm to run in parallel across machines.
3. Use 10 8-core machines (80 CPU cores in total) to solve the 19-queens and 20-queens problems and see how long each takes. Can your solution scale smoothly to even more machines?
4. If the 10 machines are of different models, some with 8 cores and some with 16, some with older CPUs and some with faster new ones, what load-balancing strategy should you use to shorten the overall solution time (at least better than plain round-robin)?
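A sketch of the single-machine starting point for this exercise (everything here is illustrative): a bit-twiddling backtracking counter for the N-queens problem, parallelized by handing each top-level branch (the column of the queen in row 0) to its own task. Distributing those branches across machines, and the load balancing of problem 4, are left to the exercise.

```cpp
#include <cstdint>
#include <cstdio>
#include <future>
#include <vector>

// Count the solutions of the subtree where 'cols', 'diag1', 'diag2' mark the
// columns and the two diagonal directions already attacked by queens placed
// in previous rows.
uint64_t solve(int n, uint32_t cols, uint32_t diag1, uint32_t diag2) {
  const uint32_t full = (1u << n) - 1;
  if (cols == full) return 1;  // queens placed in all rows
  uint64_t count = 0;
  uint32_t free_slots = full & ~(cols | diag1 | diag2);
  while (free_slots) {
    uint32_t bit = free_slots & (~free_slots + 1);  // lowest free column
    free_slots -= bit;
    count += solve(n, cols | bit,
                   ((diag1 | bit) << 1) & full,
                   (diag2 | bit) >> 1);
  }
  return count;
}

int main() {
  const int n = 8;  // 8 queens: expect 92 solutions
  const uint32_t full = (1u << n) - 1;
  std::vector<std::future<uint64_t>> tasks;
  for (int c = 0; c < n; ++c) {  // one task per column of the row-0 queen
    uint32_t bit = 1u << c;
    tasks.push_back(std::async(std::launch::async, solve, n,
                               bit, (bit << 1) & full, bit >> 1));
  }
  uint64_t total = 0;
  for (auto& t : tasks) total += t.get();
  std::printf("%d-queens solutions: %llu\n", n,
              static_cast<unsigned long long>(total));
  return 0;
}
```

For larger N one would split at a deeper level than row 0, so that there are many more tasks than cores and a slower machine simply takes fewer of them; that is one natural way to attack the load-balancing question.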
You can use Amazon EC2 or Google GCE to verify your program's correctness and performance; both are billed by the hour (or even shorter intervals), so opening 10 virtual machines for an afternoon of experiments does not cost much.
Related: "How to walk the path of Linux C++ server-side development" (reprint).