Java Tour-Hardware and Java concurrency (God's Source)

Last Update:2015-05-04 Source: Internet

Author: User

Tags: CAS

Fantasy novels often have a protagonist who is exceptionally powerful. Others cast top-tier ice arrows or fireballs; he does not. He uses only a level-one ice arrow, but he uses it at exactly the right moment and casts it instantly, and that makes him formidable.

Why can he do this? Because he has grasped something like "the origin of the gods" or "the source of power": he has mastered the nature of magic and pushed his control to the extreme, and that is what makes him the protagonist.

This lecture does the same for Java concurrency: starting from the bottom, at the hardware level, we look at the nature of Java concurrency, as if we had mastered "God's Source" ourselves.

Modern computer

Let's start with the state of modern computers (as of 2015). Architectural solutions and the low-level behavior of languages are essentially shaped by the current state of the hardware.

Calculation

In modern computers, CPU clock frequencies are no longer rising; development is moving toward more cores instead. CPUs are already extremely fast, and making a single core faster means little for everyday applications, so the focus has shifted to multi-core, parallel computing.

On a 3.0 GHz Core 2, most simple instructions execute in a single clock cycle, or 1/3 of a nanosecond.

Data read location	CPU clock cycles	Time (nanoseconds)
Register	1 cycle	1/3
L1 cache	3-4 cycles	0.5-1
L2 cache	10-20 cycles	3-7
L3 cache	40-45 cycles	15
Cross-socket transfer		20
Memory	120-240 cycles	60-120
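The nanosecond column follows from the clock rate: at 3.0 GHz there are three cycles per nanosecond. A minimal sketch of that conversion is shown here; the class and method names (`CycleTime`, `cyclesToNanos`) are made up for illustration, and the table's own figures are rounded measurements, so they need not match the pure arithmetic exactly.

```java
/** Converts CPU clock cycles to nanoseconds for a given clock frequency. */
class CycleTime {
    /**
     * @param cycles number of clock cycles
     * @param ghz    clock frequency in GHz (cycles per nanosecond)
     */
    static double cyclesToNanos(double cycles, double ghz) {
        // At f GHz the CPU runs f cycles per nanosecond, so time = cycles / f.
        return cycles / ghz;
    }

    public static void main(String[] args) {
        double ghz = 3.0; // the 3.0 GHz Core 2 mentioned in the text
        System.out.printf("register, 1 cycle: %.2f ns%n", cyclesToNanos(1, ghz));
        System.out.printf("L2 cache, 10-20 cycles: %.1f-%.1f ns%n",
                cyclesToNanos(10, ghz), cyclesToNanos(20, ghz));
        System.out.printf("L3 cache, 40-45 cycles: %.1f-%.1f ns%n",
                cyclesToNanos(40, ghz), cyclesToNanos(45, ghz));
    }
}
```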

Reference article:

http://duartes.org/gustavo/blog/post/what-your-computer-does-while-you-wait/

Speed

A vivid analogy helps with these speeds. If we think of one CPU clock cycle as one second, then:

Reading from the L1 cache is like picking up a sheet of scratch paper from your desk (3 seconds);

Reading from the L2 cache is like taking a book from a nearby bookshelf (14 seconds);

Reading from main memory is like walking downstairs in the office building to buy a snack (4 minutes);

A hard-disk access is like leaving the office building and setting off on a round-the-world trip lasting a year and three months.

Cache

Caches are everywhere: hard disks, network cards, video cards, RAID cards, and HBA cards all have one. Caching is a relatively easy and very effective way to solve performance problems.

Speed is improved by caching level by level, for example:

A CPU's L1 cache is only 512 KB and its L2 is 2 MB; L3 is found only on good servers, at 18 MB, and is very expensive;

A hard disk cache is generally only 64 MB, and that 64 MB is what raises the disk's speed.

An aside on hard disks:

What happens if the power goes off?

A server-grade hard disk can guarantee that the contents of its cache are still written to disk, while a home-computer hard disk cannot; this is one reason for the large price difference.

Going further, EMS disk arrays perform very well and are very fast: they use multi-head technology to reduce the load on any single disk, together with an internal cache.

With caches, one of the most common problems to address in architecture design is the safety of reads and writes. On a write, make sure both the persistent device and the cache are written; on a read, read from the cache. Server-grade devices, such as the server hard disks mentioned above, already take these problems into account.
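The write rule above (write both the persistent device and the cache, read from the cache) is commonly called a write-through cache. The following is only a minimal sketch under stated assumptions: the class name `WriteThroughCache` is hypothetical, and a plain `Map` stands in for the persistent device.

```java
import java.util.HashMap;
import java.util.Map;

/** Write-through cache sketch: every write goes to both the backing store and the cache. */
class WriteThroughCache {
    private final Map<String, String> store;                    // stands in for the persistent device
    private final Map<String, String> cache = new HashMap<>();  // the fast in-memory copy

    WriteThroughCache(Map<String, String> store) {
        this.store = store;
    }

    synchronized void put(String key, String value) {
        store.put(key, value); // persist first, so data never lives only in the cache
        cache.put(key, value);
    }

    synchronized String get(String key) {
        String value = cache.get(key);
        if (value == null) {            // cache miss: fall back to the store
            value = store.get(key);
            if (value != null) {
                cache.put(key, value);  // remember it for the next read
            }
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> disk = new HashMap<>();
        WriteThroughCache cache = new WriteThroughCache(disk);
        cache.put("user:1", "alice");
        System.out.println(disk.get("user:1") + " " + cache.get("user:1"));
    }
}
```

Both methods are synchronized so that a reader never observes the store and the cache out of step; a real system would also need an invalidation policy, which this sketch leaves out.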

CAS

CAS, compare-and-swap, is supported by all modern CPUs. It is also the foundation of lock-free algorithms and lock-free concurrency implementations in all kinds of languages.

It works at the CPU hardware level, without the OS being involved, so it can bring a performance improvement, though it also has side effects.

CAS performs better under intense contention; in other words, when many threads want to access a shared resource, the JVM spends less time scheduling threads and more time executing them.

However, it is intended for advanced users and should be used only when you deeply understand the scenarios that really need it.

Reference article:

More flexible and scalable locking mechanism in JDK 5.0
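To make the compare-and-swap contract concrete, here is a small sketch using the standard `AtomicInteger.compareAndSet(expected, update)` method: the update happens only if the current value equals the expected one. The class name `CasSemantics` and the method `demo` are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Shows that a CAS succeeds only when the expected value matches the current value. */
class CasSemantics {
    static String demo() {
        AtomicInteger value = new AtomicInteger(5);
        boolean first = value.compareAndSet(5, 6);  // current value is 5: swap succeeds
        boolean second = value.compareAndSet(5, 7); // current value is now 6: swap is refused
        return first + " " + second + " " + value.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "true false 6"
    }
}
```

The second call fails without blocking; it is up to the caller to decide whether to re-read the value and try again.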

Java concurrency

A sample program illustrates Java concurrency by demonstrating four scenarios: thread-unsafe, synchronized, simulated CAS, and an atomic class:

import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

import sun.misc.Unsafe; // internal, unsupported API; used here only to demonstrate raw CAS

/**
 * Demonstrates several implementations of Java concurrency.
 * The "factorial" names come from the original; each field is really a shared counter.
 */
public class CompareAndSwapShow {
    private static int factorialUnsafe;
    private static int factorialSafe;
    // volatile so that the CAS loop re-reads the field on every retry
    private static volatile int factorialCas;
    private static Object factorialCasBase;
    private static long factorialCasOffset;
    private static AtomicInteger factorialAtomic = new AtomicInteger(0);
    private static int SIZE = 200;
    private static CountDownLatch latch = new CountDownLatch(SIZE * 4);
    private static Object lock = new Object();
    private static Unsafe unsafe;

    // Get the Unsafe instance, then the base object and memory offset of the static field factorialCas
    static {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            unsafe = (Unsafe) field.get(null);
            Field casField = CompareAndSwapShow.class.getDeclaredField("factorialCas");
            factorialCasBase = unsafe.staticFieldBase(casField);
            factorialCasOffset = unsafe.staticFieldOffset(casField);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /** Stores the final result of each calculation method. */
    private static Map<String, Integer> factorialMax = new HashMap<String, Integer>();

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < SIZE; i++) {
            new Thread(new IncreamUnsafe()).start();
            new Thread(new IncreamSafe()).start();
            new Thread(new IncreamCas()).start();
            new Thread(new IncreamAtomic()).start();
        }
        latch.await(); // wait until all SIZE * 4 threads have finished
        System.out.println("IncreamUnsafe result: " + factorialMax.get("unsafe"));
        System.out.println("IncreamSafe result: " + factorialMax.get("safe"));
        System.out.println("IncreamCas result: " + factorialMax.get("cas"));
        System.out.println("IncreamAtomic result: " + factorialMax.get("atomic"));
    }

    /** Non-thread-safe increment. */
    static class IncreamUnsafe implements Runnable {
        @Override
        public void run() {
            for (int j = 0; j < 1000; j++) {
                factorialUnsafe++;
            }
            recordMax("unsafe", factorialUnsafe);
            latch.countDown();
        }
    }

    /** Thread-safe increment using synchronized. */
    static class IncreamSafe implements Runnable {
        @Override
        public void run() {
            synchronized (lock) {
                for (int j = 0; j < 1000; j++) {
                    factorialSafe++;
                }
            }
            recordMax("safe", factorialSafe);
            latch.countDown();
        }
    }

    /**
     * Thread-safe increment using the CAS algorithm. Java's atomic classes work this way:
     * an infinite loop that keeps the CPU busy until the swap succeeds.
     */
    static class IncreamCas implements Runnable {
        @Override
        public void run() {
            // The loop bound was lost in the source text; 1000 is assumed, matching the loops above.
            for (int j = 0; j < 1000; j++) {
                for (;;) {
                    int current = factorialCas;
                    int next = current + 1; // derive next from the value actually read
                    if (unsafe.compareAndSwapInt(factorialCasBase, factorialCasOffset, current, next)) {
                        break;
                    }
                }
            }
            recordMax("cas", factorialCas);
            latch.countDown();
        }
    }

    /** Thread-safe increment using AtomicInteger. */
    static class IncreamAtomic implements Runnable {
        @Override
        public void run() {
            // Loop bound assumed to be 1000 here as well; it was lost in the source text.
            for (int j = 0; j < 1000; j++) {
                factorialAtomic.incrementAndGet();
            }
            recordMax("atomic", factorialAtomic.get());
            latch.countDown();
        }
    }

    /**
     * Records the largest result seen for each method.
     *
     * @param key    name of the method
     * @param target value observed by the calling thread
     */
    static synchronized void recordMax(String key, int target) {
        Integer value = factorialMax.get(key);
        if ((value == null) || (value < target)) {
            factorialMax.put(key, target);
        }
    }
}

Focus on scenario 1: factorialUnsafe++ is not atomic. It is broken down into three CPU instructions: load the value, add 1, and store the result. If the scheduler switches threads in the middle of these three instructions, a thread-safety problem can occur.
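The lost update can be replayed deterministically by writing out the three steps for two threads by hand. This is a single-threaded sketch of one bad interleaving, not real concurrent code; the names `LostUpdate` and `replayBadInterleaving` are made up for illustration.

```java
/** Replays, step by step, one interleaving of two threads that each run counter++. */
class LostUpdate {
    static int counter = 0;

    static int replayBadInterleaving() {
        counter = 0;
        int a = counter; // thread A: load, reads 0
        int b = counter; // thread B: load, also reads 0 because A was interrupted here
        a = a + 1;       // thread A: add 1
        b = b + 1;       // thread B: add 1
        counter = a;     // thread A: store 1
        counter = b;     // thread B: store 1, overwriting A's update
        return counter;  // 1, although two increments ran
    }

    public static void main(String[] args) {
        System.out.println(replayBadInterleaving()); // prints 1
    }
}
```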

Summary

Why does the implementation of AtomicInteger use an infinite loop? Because the CPU is so fast: retrying a failed CAS costs far less than suspending and rescheduling the thread. Other concepts, such as the "spin lock", rest on the same reasoning.
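The retry loop can be sketched on top of the public `AtomicInteger.compareAndSet` method. This is a conceptual illustration only, not the JDK's actual source; the names `CasRetryLoop`, `incrementWithRetry`, and `run` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Increments a shared counter with a CAS retry loop instead of a lock. */
class CasRetryLoop {
    static int incrementWithRetry(AtomicInteger counter) {
        for (;;) {
            int current = counter.get();
            int next = current + 1;
            if (counter.compareAndSet(current, next)) {
                return next; // nobody changed the value in between: done
            }
            // Another thread won the race; loop and try again with a fresh value.
        }
    }

    /** Starts the given number of threads, each incrementing perThread times, and returns the total. */
    static int run(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    incrementWithRetry(counter);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // prints 40000
    }
}
```

No increment is lost, because a thread whose CAS fails simply retries rather than overwriting the other thread's result.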

Performance tuned to the extreme means keeping the working data in L1 and L2. Java programmers cannot do that; they are too far from the hardware. In real-world scenarios, the biggest performance bottleneck is usually the database, not the program. If the problem does lie in the program, it is almost always a bug rather than a lack of extreme optimization. For example, I once saw a CPU load gradually climb into the hundreds, and the cause turned out to be a thread stuck in an infinite loop under a special condition.

The CPU has many caches added to improve speed, and this affects our programs. If there is only one CPU, does the thread-safety problem go away? No: the scheduler can still interrupt a thread between the three instructions.

The volatile keyword conceptually bypasses L1 and L2: a write to the variable is made visible in memory, so other threads always read the latest value, which makes reads thread-safe. It does not make compound operations such as ++ atomic.
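A typical use of this visibility guarantee is a stop flag shared between threads. The sketch below assumes a hypothetical class `VolatileFlag`; the 5-second join timeout is an arbitrary choice for the demo.

```java
/** A worker thread spins until another thread clears a volatile flag. */
class VolatileFlag {
    // volatile guarantees that the worker sees the write made by the main thread
    private static volatile boolean running = true;

    /** Starts a spinning worker, clears the flag, and reports whether the worker stopped. */
    static boolean stopWorker() {
        running = true;
        Thread worker = new Thread(() -> {
            while (running) {
                // busy-wait; each iteration re-reads the volatile flag
            }
        });
        worker.start();
        running = false; // this write becomes visible to the worker
        try {
            worker.join(5_000); // wait at most 5 seconds for the worker to exit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + stopWorker());
    }
}
```

Without volatile, the JIT compiler is allowed to read the flag once and spin forever, which is why the keyword matters here.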

ThreadLocal is a Java class that stays bound to a thread and stores that thread's own variables, much like the CPU's L1 and L2 caches.
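A short sketch of this per-thread storage, using the standard `ThreadLocal.withInitial` factory; the class name `ThreadLocalDemo` and the iteration counts are made up for illustration.

```java
/** Each thread increments its own copy of a ThreadLocal counter; the copies never mix. */
class ThreadLocalDemo {
    private static final ThreadLocal<Integer> LOCAL = ThreadLocal.withInitial(() -> 0);

    /** Increments the calling thread's private copy the given number of times. */
    private static int countTo(int times) {
        for (int i = 0; i < times; i++) {
            LOCAL.set(LOCAL.get() + 1);
        }
        return LOCAL.get();
    }

    static int[] run() {
        int[] seen = new int[2];
        Thread first = new Thread(() -> seen[0] = countTo(3));
        Thread second = new Thread(() -> seen[1] = countTo(5));
        first.start();
        second.start();
        try {
            first.join();   // join makes the workers' writes to seen visible here
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] seen = run();
        System.out.println(seen[0] + " " + seen[1]); // prints "3 5"
    }
}
```

No synchronization is needed on the counter itself, because no thread can reach another thread's copy.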

Since the CPU is so fast, why do CPU load alarms still go off so often? Application-level causes aside, there are locks at the language and operating-system level. Tracing a JVM with strace reveals many futex calls, which the JVM uses to guarantee the order of code execution. So even though my code takes no locks, locks are still there underneath. This is one reason C can be much faster than Java.

Programmer's Sky

Programming, like writing, painting, and composing, is first of all a creative activity, not a technical task.

Of course, it is important to keep practicing and to become familiar with a technology or programming language. That is learning to use tools and techniques, but it does not in itself make you a better programmer. It only lets you use the tools more skillfully.

What makes you a better programmer is learning how to think, because in the end you convert the logic in your mind into a series of instructions that the computer can follow to solve the problem.

Learning how to think correctly (how to abstract, how to combine, how to analyze information, how to examine yourself) can be done in many ways, many of them far removed from programming.

There is a book, "Hackers and Painters"; read it when you have time.

Postscript

Knowledge is fragmented. Many people know the individual pieces, but stringing them together is not easy; doing so is the stage of mastery, the first step of enlightenment.

Depth and breadth are mutually reinforcing.

Depth is needed more at the technical level, and breadth more at the management level.

With both complementing each other, you are invincible.

This article is an English version of an article originally published in Chinese on aliyun.com.