CPU working mode, multi-core, Hyper-threading technical details [repost]

Source: Internet
Author: User
Tags gromacs xeon e5

The CPU is the soul of a computer and determines its overall performance. Mainstream CPUs today are multi-core, and some also use Hyper-Threading (HT). Multi-core is fairly easy to understand, and I believe many players can explain it. But what Hyper-Threading really means, and how a CPU with HT enabled differs from the same CPU with HT turned off, is something few people can explain clearly. So I am opening this post to introduce multi-core and Hyper-Threading. It draws on what I have accumulated in my day-to-day work, on exchanges with the manufacturer (Intel), and on my own understanding as a DIY player. I aim to be as authoritative and accurate as I can while staying easy to follow, and I hope a few simple examples will quickly bring you close to a hardware expert's level of understanding.

But a few things up front:

1) This is a forum post, not a published paper, so some points can only be touched on briefly.

2) Some points are only as accurate as I can make them; to keep things easy to understand, they may not reach academic precision.

3) This post is about knowledge and understanding. Whether in practice you should spend 600 or 700 on an i3 or 1000 on an i5 depends on your specific situation; there is no fixed answer.

4) If you are a big spender who only wants the most expensive part and does not care about value for money, this post is not for you, because no theory can explain why you need a 4-core, 8-thread i7 just to keep QQ running.

I hope that after reading this, choosing a CPU for your build will no longer be a headache!

Experienced players will know the five most common Intel consumer CPU lines below. I call them consumer-grade to distinguish them from the enterprise processor line, Xeon:

- Celeron: dual-core, no Hyper-Threading; entry-level players

- Pentium: dual-core, no Hyper-Threading; low-end players

- i3: dual-core, with Hyper-Threading; mid-range players

- i5: 4 cores, no Hyper-Threading; mid-to-high-end players

- i7: 4 cores, with Hyper-Threading; high-end players

Low-end Xeon CPUs can also be used by ordinary players, for example:

- E3: 4 cores, with Hyper-Threading; high-end players

Of course, the monstrous i7 Extreme can reach 6 cores/12 threads or 8 cores/16 threads, but those are generally bought by enthusiasts and are uncommon among ordinary players.

A note on the E3: it basically uses the same design as the i7. The highly regarded E3 1231v3, for example, is very cost-effective; it is in effect an i7 with no integrated graphics that cannot be manually overclocked. But it is a lot cheaper: the so-called i5 price, i7 performance.

CPU Architecture

To talk about Hyper-Threading and multi-core, we have to talk about CPU architecture and logic. There are too many unrelated technical details, so I omit them here. Let's focus on the two relevant modules in the CPU:

1) Processing Unit, abbreviated PU

2) Architectural State, abbreviated AS

The PU generally performs computation, such as arithmetic (addition, subtraction and so on). The AS performs logic and scheduling work, such as controlling memory access.

 

Single-core CPUs (let's start simple)

A CPU in the traditional sense generally has one PU and one AS.

Analogy: a small restaurant (single-core CPU) run by a married couple. The husband is both owner and chef, stir-frying in the kitchen; the wife is both proprietress and waiter, taking orders. A guest comes in, goes to the wife's counter, and studies the menu. After about 5 minutes the guest orders a rice bowl. The wife writes up the order and hands it to her husband in the kitchen, who starts cooking. In this example, the wife can be understood as the AS, and the husband/chef as the PU (the one doing the actual work).

 

Multi-core CPUs

Multi-core means multiple physical cores, such as the i3's two cores or the i5's four. In this architecture each physical core has one PU and one AS. So an i3 has 2 PUs and 2 ASs in total, and an i5 has 4 PUs and 4 ASs in total.

Analogy: the small restaurant above can probably still cope with 5 or 6 guests. But imagine 16 guests arriving at once: the queue would stretch out into the street. And if I tell you another 16 guests will arrive every 10 minutes to order... it's over. The business cannot go on; the owner and his wife would be worked to death.

At this point we need a larger staff canteen (multi-core CPU) with 4 waiters and 4 chefs. The 4 waiters take orders at the same time and the 4 chefs cook at the same time (waiter 1 takes orders only for chef 1, waiter 2 only for chef 2, and so on). Where the small restaurant had one queue, here there are 4, so throughput is immediately about 4 times that of the small restaurant. With 16 guests split evenly over 4 queues, each queue has only 4 guests. Isn't that much better?

This part should be fairly easy to understand.

 

Hyper-Threading Technology (HT)

Now for the main event: what exactly is Hyper-Threading? Is it the multithreading we usually talk about?

Hyper-Threading (HT) is not what we generally call multithreading. Multithreading usually refers to the program side; put simply, it is 'soft', at the code level. Hyper-Threading refers to the hardware architecture, the 'hard' side: 'logical cores' that are simulated by duplicating the AS.

Put simply, with Hyper-Threading one physical core has two ASs and one PU, and the two ASs share the one PU. Why do this? Look at the following example:

Analogy: take the staff canteen again, with 4 waiters, 4 chefs and 4 queues. Will efficiency be a problem?

Yes!

Every guest holds a menu, but can you guarantee that every guest orders at a glance? Some guests will inevitably dither and ask about everything, taking 15 minutes to order one dish. Now suppose the chef needs only 10 minutes on average to cook a dish. What about the remaining 5 minutes? The chef is idle in the kitchen, drinking tea and reading the newspaper. All that time is wasted on the guest-waiter ordering step.

Is there a way out? I think everyone can guess.

--- Add waiters!

So we give each chef an extra waiter, going from one waiter to two (AS). Waiter 1A and waiter 1B open two queues and both pass orders to the same chef (PU). When waiter 1A's guest still has not finished ordering after 15 minutes, waiter 1B's guest may well have an order ready for the chef (PU) in 3 minutes, so the chef does not stand around in the kitchen waiting for 1A's guest. In this way the chef's labor is squeezed to the maximum (the chef will probably curse); for the CPU, utilization is maximized and idle time is reduced. Sometimes you really cannot blame the chef (PU) for not working hard; it is the waiters (AS) who dawdle too much.
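To put rough numbers on the analogy, here is a minimal Python sketch of one chef fed by one or two waiters. It is only a toy model: the 15-minute ordering time and 10-minute cooking time come from the example above, while the function name `simulate_kitchen`, the guest count of 8, and the assumption that every guest takes the full 15 minutes are mine. It says nothing about how a real CPU schedules instructions.

```python
def simulate_kitchen(n_waiters, n_guests, order_minutes=15, cook_minutes=10):
    """Toy model: one chef (PU) cooks orders handed over by n_waiters (AS).

    Guests are split evenly between the waiters (n_guests is assumed to be
    divisible by n_waiters). Returns (finish_time, chef_utilization).
    """
    per_waiter = n_guests // n_waiters
    # Time at which each order reaches the chef: every waiter handles
    # its guests one after another, each taking order_minutes.
    ready_times = sorted(
        (k + 1) * order_minutes
        for _ in range(n_waiters)
        for k in range(per_waiter)
    )
    clock = 0  # time at which the chef becomes free again
    busy = 0   # total minutes the chef spends cooking
    for ready in ready_times:
        start = max(clock, ready)  # the chef idles if no order is ready yet
        clock = start + cook_minutes
        busy += cook_minutes
    return clock, busy / clock


one_waiter = simulate_kitchen(n_waiters=1, n_guests=8)
two_waiters = simulate_kitchen(n_waiters=2, n_guests=8)
print("1 waiter : done after %d min, chef busy %.0f%%" % (one_waiter[0], 100 * one_waiter[1]))
print("2 waiters: done after %d min, chef busy %.0f%%" % (two_waiters[0], 100 * two_waiters[1]))
```

In this toy model the same 8 guests are finished after 95 minutes with two waiters instead of 130 minutes with one, purely because the chef spends less time waiting for orders.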


In the figure (not reproduced here), orange and blue indicate that the chef (PU) is working, and white cells indicate that the chef (PU) is idle. Diagram A is a single core without Hyper-Threading, diagram B is dual-core without Hyper-Threading, and diagram C is a single core with Hyper-Threading enabled. It is clear that going from single-core to dual-core (without Hyper-Threading) does not raise CPU utilization, whereas with Hyper-Threading overall utilization rises even though there is only one core.

In the second figure, the diagram on the left is a single core with Hyper-Threading and the one on the right is dual-core without Hyper-Threading. Do you see the difference?


  

Now let's look at practical questions about multi-core and Hyper-Threading:

1) The i3 is dual-core with 4 threads and the i5 is 4-core with 4 threads. Are they the same thing?

Start with the i3: it is dual-core, and with HT on it becomes 4 logical cores (4 threads). I don't know about the latest Win10, but in Win7 logical cores are displayed just like physical cores, so an i3 looks the same as an i5. Are the i3 and i5 therefore the same? If you think so, everything I wrote above was wasted.

The i3 is 4 waiters and 2 chefs; the i5 is 4 waiters and 4 chefs. Does that look the same to you?
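For readers who want to check this on their own machine, here is a small Python sketch that counts waiters (logical CPUs) and chefs (physical cores) separately. It assumes a Linux x86 system, where `/proc/cpuinfo` lists one block per logical CPU with `processor`, `physical id` and `core id` fields. The function name `count_cores` and the trimmed `SAMPLE` listing (a made-up 2-core, 4-thread CPU) are illustrative only, not output from a real machine.

```python
def count_cores(cpuinfo_text):
    """Return (physical_cores, logical_cpus) from /proc/cpuinfo-style text."""
    logical = 0
    physical = set()
    # Blocks describing each logical CPU are separated by blank lines.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            logical += 1
            # Two logical CPUs with the same (package, core) pair are
            # Hyper-Threading siblings sharing one physical core.
            physical.add((fields.get("physical id"), fields.get("core id")))
    return len(physical), logical


# Hypothetical, heavily trimmed listing for a 2-core / 4-thread CPU.
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 0
core id : 0

processor : 3
physical id : 0
core id : 1
"""

print(count_cores(SAMPLE))  # (2, 4): two chefs, four waiters
```

On a real Linux box you could pass `open("/proc/cpuinfo").read()` instead of `SAMPLE`; other operating systems expose this information differently.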

2) How does the 4-core, 4-thread i5 compare with an i7 with HT on (4 cores, 8 threads)?

The i5 is 4 waiters and 4 chefs. The i7 with HT on is 8 waiters and 4 chefs. In terms of CPU utilization, especially when running many processes/threads, the i7 with HT is of course better.

3) How does the 4-core, 4-thread i5 compare with an i7 with HT turned off (4 cores, 4 threads)?

The i5 is 4 waiters and 4 chefs. The i7 with HT off is also 4 waiters and 4 chefs. At first glance the number of chefs (PU) and waiters (AS) is tied. But the i7's single-core performance is slightly stronger than the i5's, which means the i7's chefs are first-class chefs and the i5's are second-class. So there is still a gap between the i5 and i7, but in theory it is not especially large.

Summary: in theory the gap between the i3 and i5 is quite large. The difference between the i5 and i7 is mainly the quality of the chefs (PU) and the 4 extra waiters, and that gap is not as large as the one between the i5 and i3.

4) For the same CPU, such as an i7, what are the advantages of turning HT on?

- Better parallelism: the ability to handle multiple processes/threads is enhanced, which is most noticeable in games that support multithreading.

- Higher CPU utilization: in general theory, overall performance rises by roughly 20%-30%. By that measure, an i3 with Hyper-Threading gains 20%-30% overall. But does that mean it can match an i5? If that were true, the i5 would not sell. Two chefs (i3), however hard you whip them, cannot keep up with four chefs (i5)...
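The arithmetic behind that last point can be written down in a few lines of Python. This is only a back-of-the-envelope sketch: it uses the 20%-30% figure quoted above and assumes every physical core is equally fast, ignoring clock speed and cache differences. The name `core_equivalents` is made up for this illustration.

```python
def core_equivalents(physical_cores, ht_gain=0.0):
    """Rough throughput in units of 'one physical core without HT'."""
    return physical_cores * (1.0 + ht_gain)


i3_low = core_equivalents(2, ht_gain=0.20)   # 2 chefs, +20% from HT
i3_high = core_equivalents(2, ht_gain=0.30)  # 2 chefs, +30% from HT
i5 = core_equivalents(4)                     # 4 chefs, no HT

print("i3 with HT: %.1f to %.1f core-equivalents" % (i3_low, i3_high))
print("i5 without HT: %.1f core-equivalents" % i5)
```

Even with the optimistic 30% gain, two cores with HT come out at about 2.6 core-equivalents, well short of four real cores.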

5) What are the disadvantages of HT?

- Single-core performance drops:

Typically by 5% to 15%, mainly when running a single-threaded program. The overhead of two ASs is greater than that of one AS.

Analogy: only one guest comes to order from one chef, but two waiters are standing there, and the guest may freeze up wondering: should I go to waiter 1A or waiter 1B? While he thinks, half a minute goes by... With just one waiter it would have been simpler.

So in practice, when we benchmark supercomputing systems we generally turn HT off, because we are chasing extreme performance. The latest CPUs keep the loss to 5%-15%, but on old Hyper-Threading CPUs, such as the Pentium 4 of ten-odd years ago, I have seen single-core performance drop by more than 50%; the extra overhead of enabling HT was huge.

- Average power consumption rises by about 30%. You hired 4 more waiters; don't they need to be paid?

- Congestion is likely when there are many cores, as in dual-socket servers.

Analogy: imagine a large canteen with 56 waiters (dual Xeon E5 series CPUs, 28 cores and 56 threads in total) and hundreds of people arriving at once. Won't it be chaos? Nobody entering the canteen knows which queue to join (which queue is generally decided by the operating system). Under the operating system's direction, a guest checks the 56 queues one by one to find the one with the fewest guests...

What I want to ask is: in real life, if the canteen had 56 queues, would you check them one by one, find the shortest, and then decide? By the time you had looked over all 56, 15 minutes would have passed and your friends would have finished eating. Wouldn't it be easier with only 28 queues? (Of course, 28 queues is still tiring enough.)

- Poor support in old operating systems:

Older systems such as Win2008 and Win2000 have poor support for Hyper-Threading.

Analogy: suppose the canteen is fairly empty. Two guests, A and B, come to order, and both end up in the queues of waiters 1A and 1B, who serve the same chef (this is normally the operating system's decision). Can you see what is wrong?

The right approach is for A to go to the first chef (physical core 1) and B to the second chef (physical core 2). If A and B both squeeze in at one chef while the second, third and fourth chefs have nothing to do and sit idle, what is the point?

The root of the problem is that the operating system cannot tell physical cores from logical cores. It sees two waiters and two queues, assumes there are two chefs, and sends guests A and B to queue at 1A and 1B, with no idea of the actual situation in the kitchen, that is, how many chefs there really are.
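The difference between the two placement policies can be sketched in a few lines of Python. This is a simplified illustration, not how any real operating system scheduler is implemented: the function names and the `(core, sibling)` slot representation are invented for this example, and it assumes 2 hardware threads per core.

```python
def naive_placement(n_tasks, n_cores, threads_per_core=2):
    """HT-unaware policy: fill logical CPUs in order 1A, 1B, 2A, 2B, ..."""
    slots = [(core, sibling)
             for core in range(n_cores)
             for sibling in range(threads_per_core)]
    return slots[:n_tasks]


def spread_placement(n_tasks, n_cores, threads_per_core=2):
    """HT-aware policy: use one waiter per chef first, then double up."""
    slots = [(core, sibling)
             for sibling in range(threads_per_core)
             for core in range(n_cores)]
    return slots[:n_tasks]


# Two guests (tasks) arrive at a 4-chef canteen with 2 waiters per chef.
print(naive_placement(2, 4))   # [(0, 0), (0, 1)]: both share chef 0
print(spread_placement(2, 4))  # [(0, 0), (1, 0)]: one chef each
```

With the HT-aware ordering, a second hardware thread on the same core is only used once every physical core already has work.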

Back to reality: what kind of CPU do I need?

Let me go through the cases.

1) Web browsing, QQ chat, simple office use (such as Office documents), a PC for elderly relatives

A Celeron is actually enough. The Celeron is 2 cores/2 threads; compared with the 2-core/4-thread i3, for this kind of application, setting aside differences in clock frequency and cache, the i3's advantages simply do not come into play. Note that the i3 costs almost 3 times as much as the Celeron.

Then there is the Pentium, which is really a Celeron with a slightly higher clock and a slightly larger cache. It is also 2 cores/2 threads and performs only a little better than the Celeron, yet it costs almost as much as 1.5 Celerons. Personally I see no point; with the extra money you would do better to buy a good keyboard, mouse or monitor. At least that improvement in experience is real.

2) Light games, graphic design work (e.g. PS)

The i3 is quite suitable. Small games, some browser games, PS and so on may be multithreaded (PS, for instance), but the load on the CPU is not especially heavy. The bottleneck may instead be disk I/O and the like. So a Hyper-Threading i3 handles this kind of workload without much trouble.

3) Heavy-duty 3D game

Today's 3D games hand many tasks, such as 3D acceleration, to the GPU. While the GPU is working, the CPU is generally in a blocked (waiting on an interrupt) state until the GPU finishes executing its commands, and then the CPU continues. So there are two possible bottlenecks, one at the CPU and one at the GPU.

For 3D games an i5 is generally fully capable. You say you want an i7? Of course there is nothing wrong with an i7, and benchmark scores will certainly improve. But if the budget is limited, it may be simpler and more direct to put the money into a better graphics card. For example, an i5 with a high-end card such as the 970 is more balanced than an i7 with a 950.

Is the E3, with its i7-level performance, worth buying? Certainly. But if resellers have pushed the price of the E3 1231v3 through the roof, you may as well go with an i5.

4) 3D graphics work

If the work involves a lot of 3D modeling and rendering, the CPU matters and so does the GPU. The more (logical) CPU cores the better, because the various rendering methods are algorithmically highly parallel. Every logical core's task queue can be kept full, squeezing out the CPU's performance to the maximum. You will never have the big-canteen-with-only-one-or-two-customers situation. Here the gap between E3/i7 and i5 is likely to be very large.

The load on the GPU is heavy too, and an ordinary gaming card such as the GTX980 may not be up to the job; a Quadro card is needed. It is not that the 980 is not powerful enough, but that some graphics-related drivers/libraries are not provided for gaming cards like the GTX980. Without driver support the work cannot run on the GPU and can only be emulated on the CPU. As a result, the CPU runs its own logic and also the work the GPU could not take, so everything ends up on your CPU. Can a CPU that is not strong enough cope?

So for this kind of application you must choose a powerful CPU, such as an i7 or E3, or even a mid-range Xeon E5 with 6 cores/12 threads or 8 cores/16 threads.

Advanced: why do we turn off Hyper-Threading when benchmarking supercomputing systems?

At this point you may ask: since HT improves system performance, especially for multithreaded workloads, why turn it off when benchmarking? For example, turning HT off on a 4-core/8-thread E3 1231v3 leaves only 4 cores and 4 threads: 4 waiters, 4 chefs, 4 queues. Doesn't performance get worse? Doesn't CPU idle time go up?

This is a very practical and interesting question, and to answer it we need to go back to how Hyper-Threading works in principle.

Example:

Say there are 64 guests, each ordering one rice bowl, and two cases:

1) In a canteen with 8 waiters, 8 queues and 4 chefs, how many guests are in each queue? 8.

2) In a canteen with 4 waiters, 4 queues and 4 chefs, how many guests are in each queue? 16.

Which is faster? It should be the first, because with 8 waiters taking staggered orders at the same time, the delays caused by guests hesitating and dawdling are reduced, and the 4 chefs are kept busy.

But don't forget what we discussed: with Hyper-Threading on, the 4 extra waiters bring extra overhead. Every guest hesitates before joining a queue and takes time to think: "Which of the two queues should I join? Which is shorter? Which waiter is better looking?" This extra overhead (processing latency, performance loss) is at the hardware level and was fixed when Intel designed the CPU. None of us can solve a hardware problem; the only thing we can do is turn off HT. But with HT off, each queue has 16 guests, and each waiter goes from serving 8 guests to serving 16 (the AS delay grows from 8 units to 16). How do we get around that?

Here is the key: we certainly cannot change the hardware, but we can optimize the software. We can rewrite the program's parallel scheduling algorithm so that the program is optimized for the CPU's native hardware architecture. The details of the algorithm are too specialized, so here is an example that should make the idea clear:

Example:

Say 64 guests each want a rice bowl and come to a canteen with 4 waiters, 4 queues and 4 chefs. How many guests are in each queue? 16.

Now, for each queue, instead of letting all 16 people line up, have the queue elect 1 representative to order on behalf of all 16: one order slip for 16 rice bowls, and the other 15 step back. That way only 4 guests (the representatives) place orders while the remaining 60 rest. As for ordering speed, each queue is held up by at most one dawdler (its representative). The chef in the back receives an order for 16 rice bowls and can only work flat out; he no longer gets a 5-minute break after each rice bowl.
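In programming terms, the "one representative per queue" trick is batching: hand each worker one large order instead of many small ones. Below is a minimal Python sketch of the idea using the standard `concurrent.futures` module. The helper names (`split_into_batches`, `cook`, `cook_batch`, `run_batched`) and the doubling "work" are placeholders of mine, and because of CPython's global interpreter lock, threads here only illustrate the structure; they will not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(items, n_batches):
    """Split items into n_batches contiguous, nearly equal batches."""
    size, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        end = start + size + (1 if i < extra else 0)
        batches.append(items[start:end])
        start = end
    return batches


def cook(order):
    """Placeholder for the real per-item work (one rice bowl)."""
    return order * 2


def cook_batch(batch):
    """One 'order slip' covering a whole batch of rice bowls."""
    return [cook(order) for order in batch]


def run_batched(orders, n_workers=4):
    # 4 submissions (one per representative) instead of 64 tiny ones.
    batches = split_into_batches(orders, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(cook_batch, batches))  # order is preserved
    return [dish for batch in results for dish in batch]


orders = list(range(64))
print([len(b) for b in split_into_batches(orders, 4)])  # [16, 16, 16, 16]
print(run_batched(orders)[:4])
```

The scheduler is bothered only 4 times, and each worker then runs through its 16 items without further hand-offs.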

See, is the problem solved?

1) The extra AS overhead of 8 waiters and 8 queues is eliminated

2) The chefs are used to the fullest (PU idle time is reduced)

For supercomputing systems, everyone is chasing extreme performance. In the yearly TOP500 supercomputer ranking, a small performance difference can push you down many places, so we need to squeeze out every last bit of the system's performance.

This example also tells DIY players that hardware matters and software matters too. Even when the hardware is strong, the software (drivers) must be optimized to match. Without targeted optimization, strong hardware cannot deliver 100% of its power. This also explains indirectly why some hardware is a 'benchmark king': it scores high in tests such as 3DMark but performs poorly in actual games.

When buying hardware, buy what many people use; do not go for niche products.

In short, strong hardware is not enough; software optimization must also be done properly.

1) Compared with 4 cores/8 threads (Hyper-Threading on), the disadvantage of 4 cores/4 threads (Hyper-Threading off) in handling (scheduling) multiple threads can be completely cancelled out by modifying the source code. The extra overhead of the 8-thread hardware architecture (4 more hardware ASs), on the other hand, is at the integrated-circuit level and beyond our reach.

So it is like the Sword Sect and the Qi Sect of Huashan.

The Sword Sect way: simply increase the number of program threads and turn on CPU Hyper-Threading.

The Qi Sect way: modify the program, make the algorithm changes described above, work out the operation cycles by hand, adjust the parallel strategy, and hide the latency.

The Sword Sect is fast to learn and the Qi Sect is slow. After one year of practice, the Sword Sect reaches level 6 while the Qi Sect reaches only level 3. But given enough time, the Sword Sect tops out at level 9 and cannot break through, whereas the Qi Sect will eventually reach level 10.

2) Optimization also involves a balance: performance vs. generality.

For a simple example, take this addition:

1+1+1+1+1+1+1+1 (eight 1s added together. Of course, in reality an operation this fine-grained is not worth parallelizing; it is just an example.)

First scenario (low performance + maximum generality):

No optimization at all; the programmer only needs a first-grade education. The program is simple and clear: one line of work thrown at the CPU, which does 7 additions at 1 second each, for 7 seconds. Multi-core is not used at all; it all comes down to single-core performance. Emphasis: a serial program such as adding eight 1s does not become a multi-core program by itself unless you parallelize it at the code level. In other words, it will use only one core! No miracles, no magic here. How do you parallelize? Change your program to use pthread, fork, MPI, OpenMP or one of many other approaches; I won't go into the details. If you are interested, please look them up.

Total time: 7 seconds

Second scenario (good performance + high generality):

Do parallel optimization:

1) First, count how many cores the machine has (physical cores only, we assume). Say the count comes out as 4 cores and takes 0.5 seconds (the time value is just an example).

2) Then, based on the core count (4), split the operation into 4 parts and create 4 (program) threads, matched 1:1 to the hardware cores (4), in the form below. This scheduling costs another 0.5 seconds.

3) Then run the following computation:

(1+1) (1+1) (1+1) (1+1)  First round: each of the 4 cores does one addition, 1 second in total.

(2+2) (2+2)  Second round: cores 1 and 2 each do one addition, 1 second in total.

(4+4)  Third round: only core 1 works, 1 second in total.

That takes 1+1+1 = 3 seconds. There is also a mathematical justification (divide and conquer; I won't go into the details, search for it if you are interested). I believe we have all learned logarithms: log2(8) = 3.

Total time: 0.5+0.5+3 = 4 seconds
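The round counting above can be checked with a short Python sketch of a pairwise (tree) reduction. It runs the rounds one after another on a single core and only counts them; the 1-second-per-addition and 0.5-second overhead figures are the example values from the text, and the function name `tree_sum` is my own. The input is assumed to be non-empty.

```python
def tree_sum(values):
    """Pairwise (tree) reduction. Returns (total, number_of_rounds).

    Within one round all additions are independent, so with enough cores
    a round costs the time of a single addition.
    """
    level = list(values)
    rounds = 0
    while len(level) > 1:
        # Add neighbours in pairs: (a+b), (c+d), ...
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:        # odd element carries over unchanged
            nxt.append(level[-1])
        level = nxt
        rounds += 1
    return level[0], rounds


ones = [1] * 8
total, rounds = tree_sum(ones)

serial_seconds = (len(ones) - 1) * 1.0       # 7 additions, one after another
parallel_seconds = 0.5 + 0.5 + rounds * 1.0  # core count + scheduling + rounds

print(total, rounds)                     # 8 3
print(serial_seconds, parallel_seconds)  # 7.0 4.0
```

Eight inputs need log2(8) = 3 rounds, which is where the 3 seconds of pure computation in the second scenario come from.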

Third scenario (extreme performance + low generality):

If I already know my system has 4 cores, can't 1) counting the cores and 2) the scheduling cost both be saved? Fine, remove those steps and go straight to step 3).

----> In this case it takes only 3 seconds.

However, this method suits only 4-core machines. Give it a dual-core or 8-core machine and overall speed will fall well below the second scenario, because the second has a degree of generality and adaptability, while the third is 'hard-wired' and inflexible. This is what programmers call hard coding, a habit that is not recommended because the resulting program is poorly portable.
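In code, the difference between the second and third scenarios often comes down to a single constant. Here is a small Python sketch under stated assumptions: `pick_worker_count` and `split_work` are hypothetical helper names, and `os.cpu_count()` reports logical CPUs (so it includes Hyper-Threading siblings), which differs slightly from the physical-core count assumed in the text.

```python
import os


def pick_worker_count(hard_coded=None):
    """Scenario 3 passes a fixed number; scenario 2 asks the machine."""
    if hard_coded is not None:
        return hard_coded          # fast to decide, but tied to one machine
    return os.cpu_count() or 1     # logical CPUs; may be None on odd platforms


def split_work(n_items, n_workers):
    """How many items each worker gets when the work is split evenly."""
    base, extra = divmod(n_items, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


# Scenario 3: written for a 4-core machine, always 4 parts.
print(split_work(8, pick_worker_count(hard_coded=4)))  # [2, 2, 2, 2]

# Scenario 2: adapts to whatever machine it runs on.
print(split_work(8, pick_worker_count()))
```

The hard-coded version skips the detection step, but on a dual-core machine its 4 parts would have to share 2 cores, and on an 8-core machine half the cores would have nothing to do.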

Can you see where this is going?

In fact, our system benchmarking uses the third scenario, because we know our system's architecture very well. There is no need to consider dual-core or 4-core CPUs (we typically use 8-core CPUs), and the sizes of the L2 and L3 caches in the CPU are fixed. In other words, our code optimization can be very extreme, targeted entirely at specific hardware models. The price of this optimization is low generality: the performance figures we reach hold only on our system.

 

Let me give you an interesting example:

Take the Fermi generation of Nvidia graphics cards, for example the GTX460, with roughly 300 CUDA cores (which can be understood as stream processors). The generation after Fermi is Kepler, for example the GTX660, which almost doubles the 460's performance. But do you know how it doubles it? The GTX660's stream processor count was raised to around 1000, while a single stream processor on the 660 has only about half the performance of one on the 460. So the 660 wins entirely by quantity.

Then the problem appeared. One of our users ran the same GPU program (GROMACS) on a Fermi card and on a Kepler card (Tesla cards, roughly equivalent to the 460 and 660), and Kepler turned out to be noticeably slower than Fermi. That should not happen, so the user asked us for help. We checked: it was not a hardware fault, both cards worked properly, and the NV driver was fine. Then I started reading the program's source code and, after some effort, found the reason: the number of threads the program launched was hard-coded! It was fixed at a maximum of 256 threads. (Why 256 rather than 257? CUDA has the concept of a warp, a bundle of 32 threads, so thread counts are customarily multiples of 32. I won't say more; if you are interested, see the book "GPU high-performance programming Cuda combat".) On the roughly 300-core Fermi, that nearly fills the whole card's stream processors. But on Kepler, with about 1000 stream processors, 256 is only about 1/4, and more than 700 stream processors sit idle from start to finish. And don't forget, Kepler's single-core performance is about 50% of Fermi's. That explains why the Kepler card ran slowly.
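The numbers in this story can be put into a crude Python model. It assumes one thread keeps exactly one stream processor busy and uses the rounded figures from the text (about 300 and 1000 cores, Kepler cores at half speed, 256 hard-coded threads). Real GPUs schedule threads in warps across multiprocessors, so treat this purely as illustrative arithmetic; `relative_throughput` is an invented name.

```python
def relative_throughput(n_cores, per_core_speed, n_threads):
    """Crude model: each thread occupies one core; extra cores sit idle."""
    active_cores = min(n_cores, n_threads)
    return active_cores * per_core_speed


HARD_CODED_THREADS = 256

fermi = relative_throughput(300, 1.0, HARD_CODED_THREADS)
kepler = relative_throughput(1000, 0.5, HARD_CODED_THREADS)
kepler_scaled = relative_throughput(1000, 0.5, 1000)  # threads match cores

print("Fermi,  256 threads :", fermi)           # 256.0
print("Kepler, 256 threads :", kepler)          # 128.0
print("Kepler, 1000 threads:", kepler_scaled)   # 500.0
```

With the thread count fixed at 256, the model gives Kepler half of Fermi's throughput; once the thread count scales with the hardware, Kepler comes out ahead, in line with the "almost doubled" figure for the generation.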

We do not understand GROMACS's internal logic, and in fact do not need to, because we are not computational molecular dynamics experts. So we reported our findings to the GROMACS developers and let them optimize and improve the program: rework the parallel algorithm and increase the number of threads. The next release finally solved the problem, so that Kepler-generation cards were fully supported.

This shows how important targeted program optimization is!

(Below is my understanding as a player, not necessarily professional and accurate)

In the consumer market, if a consumer-grade product, especially its hardware driver, takes the third approach, the 'benchmark king' phenomenon is likely to appear. Let me use graphics cards as an example, which may be easier to understand. Many years ago, ATI's drivers may have been optimized to the extreme for benchmark software such as 3DMark. Why? Largely because ATI (using the second scenario) could not beat Nvidia (using the second scenario), and the vendors know that consumers look at review benchmarks before buying a card. So they play Tian Ji's horse-racing trick: pit ATI (third scenario) against Nvidia. The benchmark results hold up, but generality may be greatly reduced. In practice there are all kinds of games, and if a game is not further optimized together with the graphics driver, performance drops sharply. So there is a three-way matching and scheduling-optimization problem: game program <--> hardware driver <--> hardware architecture. (In the real world there may also be the game engine, making 4 parties.)

The developers of games like Crysis (including the developers of the CryEngine 3D engine) generally do not think the same way as the vendors who write hardware drivers. Crysis's developers must consider generality across platforms and generally do not take a particularly extreme approach (such as the third scenario). In fact, when a game runs slowly or stutters, people usually blame the card vendors (ATI, NV) for weak products, but would you blame the game developers for writing garbage code? Even though Crysis has always been known as a hardware killer, what can you do? As a consumer, you can only keep paying to upgrade your graphics card...

If Crysis takes the second approach, then things are clear: 2 cores/4 threads, 4 cores/4 threads and 4 cores/8 threads will of course behave differently. You never have an absolute performance benchmark: how fast Crysis has to run to count as smooth, or whether the absolute performance of hardware like the GTX980 has been squeezed to the limit. You only have a relative impression: playing Crysis on an i3 with few cores is certainly not as enjoyable as on an i7 with many cores.

That's all for now.

Original address: http://bbs.zol.com.cn/diybbs/d231_821332_uid_qq_9rp1y964o048.html

