Spark: Cluster Computing with Working Sets


Translation: Esri Lucas

This is a translation of the first paper on the Spark framework, published by Matei Zaharia of the AMP Lab at the University of California, Berkeley. My English is limited, so the translation is bound to contain mistakes; if you find any, please contact me directly. Thanks.

(Italicized text in parentheses is my own explanation.)

Abstract:

MapReduce and its variants have been highly successful at implementing large-scale data-intensive applications on commodity clusters. However, most of these systems are built around an acyclic data flow model that is not suited to many popular applications. This paper focuses on one such class of applications: those that reuse a working set of data across multiple parallel operations. This class includes many iterative machine learning algorithms as well as interactive data analysis tools. We propose a new framework called "Spark" that supports these applications while retaining the scalability and fault tolerance of MapReduce. To achieve these goals, Spark introduces an abstraction called resilient distributed datasets (RDDs). An RDD is a read-only, partitioned collection of objects distributed across a set of machines that can be rebuilt if a partition is lost. Spark can outperform Hadoop by more than 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.

1. Introduction

A new model of cluster computing has become widely popular, in which data-parallel computations are executed on clusters of unreliable machines by systems that automatically provide locality-aware scheduling, fault tolerance, and load balancing. Google pioneered this model with MapReduce, while systems such as Dryad (Microsoft's distributed computing framework) and Map-Reduce-Merge (an extension of the MapReduce model) generalize the types of data flow it supports. (In what follows, "data flow" refers to this MapReduce-style data flow pattern.) These systems achieve their scalability and fault tolerance by providing a programming model in which the user creates acyclic data flow graphs to pass input data through a set of operators, and they allow the underlying system to manage task scheduling without user intervention.

This data flow programming model is very effective for a large class of applications, but there are applications that cannot be expressed efficiently as acyclic data flows. In this paper, we focus on one such class: applications that reuse a working set of data across multiple parallel operations. From reports by academia and industry, and from the many users of Hadoop, we have seen two example use cases that the MapReduce framework itself does not handle well:

A. Iterative jobs: Many common machine learning algorithms repeatedly apply a function to the same dataset to optimize a parameter (for example, gradient descent). In MapReduce or Dryad, each iteration is expressed as a new job, and each job must reload the data from disk, incurring a significant performance penalty.

B. Interactive analytics: Hadoop is often used to run ad-hoc exploratory queries on large datasets through SQL interfaces such as Hive or Pig. Ideally, the user would load the dataset of interest into memory across a number of machines and query it repeatedly. In Hadoop, however, each query is treated as a separate MapReduce job that reads data from disk, so it incurs significant latency (at least several seconds).

This paper presents a new cluster computing framework called "Spark", which supports such applications with working sets while providing scalability and fault tolerance similar to MapReduce.

The main abstraction in Spark is the resilient distributed dataset (RDD), which represents a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost. Users can explicitly cache an RDD in memory across machines and reuse it in multiple MapReduce-like parallel operations. RDDs achieve fault tolerance through the traditional notion of lineage: if a partition of an RDD is lost, the RDD has enough information about how it was derived from other RDDs to rebuild just that partition. Although RDDs are not a general shared memory abstraction, they represent a new concept that strikes a balance between expressiveness on the one hand and reliability and scalability on the other, and we have found them well suited to many scenarios.

Spark is implemented in Scala, a statically typed high-level language that runs on the Java virtual machine, and it exposes a functional programming interface similar to DryadLINQ (Microsoft's language-integrated framework for large-scale data-parallel processing). In addition, Spark can be used interactively from a modified version of the Scala interpreter, which lets users define RDDs, functions, variables, and classes on the command line and use them in parallel operations on a cluster. We believe that Spark is the first system to allow an efficient, general-purpose, interactive programming language to be used to process large datasets on a cluster.

Although our current implementation of Spark is still a prototype, its early performance is encouraging. We show experimentally that Spark can outperform Hadoop by more than 10x in iterative machine learning workloads, and can achieve sub-second response times when interactively running full-text scans over a 39 GB dataset.

This paper is organized as follows. Section 2 describes Spark's programming model and RDDs. Section 3 gives some example jobs. Section 4 describes how we implemented the system, including how we integrate with Scala and its interpreter. Section 5 presents early results, Section 6 surveys related work, and Sections 6 and 7 close with discussion and future directions.

2. Programming model

To use Spark, developers write a driver program that implements the high-level control flow of their application and launches various operations in parallel. Spark provides two main abstractions for parallel programming: resilient distributed datasets and parallel operations on those datasets (invoked by passing a function to apply to the dataset). In addition, Spark supports two restricted types of shared variables that can be used in functions running on the cluster, which we explain later.

2.1. Resilient Distributed Datasets (RDDs)

A resilient distributed dataset (RDD) is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost. The elements of an RDD need not exist in physical storage; instead, a handle to an RDD contains enough information to compute the RDD starting from data in reliable storage. This means that RDDs can always be reconstructed if nodes fail.

In Spark, each RDD is represented by a Scala object. Spark lets programmers construct RDDs in four ways:

1. From a file in a shared file system, such as the Hadoop Distributed File System (HDFS).

2. By parallelizing a Scala collection (for example, an array built in the driver program), which means dividing it into a number of slices that are sent to multiple nodes.

3. By transforming an existing RDD. A dataset with elements of type A can be transformed into a dataset with elements of type B using an operation called flatMap, which passes each element through a user-provided function of type A => List[B]. Other transformations can be expressed using flatMap, including map (passing elements through a function of type A => B) and filter (keeping elements that match a predicate).

4. By changing the persistence of an existing RDD. By default, RDDs are lazy (they are not materialized when defined, but only filled in when a computation needs them) and ephemeral. That is, partitions of a dataset are materialized on demand when they are used in a parallel operation (for example, by passing a block of a file through a map function) and are discarded from memory after use. However, a user can alter the persistence of an RDD through two actions:

• The cache action leaves the dataset lazy, but hints that it should be kept in memory after the first time it is computed, because it will be reused.

• The save action evaluates the dataset and writes it to a distributed file system such as HDFS; the saved version is then used in future operations on it. (A brief sketch of items 3 and 4 follows this list.)
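As a rough sketch of the transformations and persistence actions just described (the file paths are placeholders, and the exact argument of save is assumed from the description above rather than taken from a released API):

val lines = spark.textFile("hdfs://...")                   // dataset of lines from a shared file system (way 1)
val words = lines.flatMap(line => line.split(" ").toList)  // an A => List[B] transformation (way 3)
val longWords = words.filter(w => w.length > 5)            // keep elements matching a predicate (way 3)
val cachedWords = longWords.cache()                        // hint: keep in memory after the first computation (way 4)
cachedWords.save("hdfs://...")                             // evaluate and write to a distributed file system (way 4)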

We note that the cache action is only a hint: if there is not enough memory in the cluster to cache all partitions of a dataset, Spark will simply recompute them when they are used. We chose this design so that Spark programs keep working (at reduced performance) if nodes fail or if a dataset is too big; the idea is loosely analogous to virtual memory.

We also plan to extend Spark to support other levels of persistence (for example, in-memory replication across multiple nodes). Our goal is to let users trade off between the cost of storing an RDD, the speed of accessing it, the probability of losing part of it, and the cost of recomputing it.

2.2. Parallel operations

Several parallel operations can be performed on RDDs:

1. reduce: combines dataset elements using an associative function to produce a result at the driver program.

2. collect: sends all elements of the dataset to the driver program. For example, an easy way to update an array in parallel is to parallelize, map, and collect the array.

3. foreach: passes each element through a user-provided function, relying only on the function's side effects (this might be a function that copies data to another system, or that updates a shared variable as explained below). (Translator's note: foreach here is for traversal only; it should not be used to modify the collection itself, or strange results may follow.)

Note that Spark does not currently support the grouped reduce operation found in MapReduce; reduce results can only be collected at one process, the driver. We plan to support grouped reductions in the future, using a "shuffle" transformation on distributed datasets, as described in Section 7. However, even a single reduce is sufficient to express a variety of useful algorithms. For example, a recent paper on implementing machine learning algorithms with MapReduce on multi-core systems describes ten algorithms that do not rely on parallel reduction, and most of them can be implemented in Spark.
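To make these operations concrete, here is a small illustrative sketch; the integer dataset is just an example, and parallelize is used as in the ALS example of Section 3.3:

val nums = spark.parallelize(1 to 100)   // distribute a Scala collection across the cluster
val sum = nums.reduce(_ + _)             // reduce: combine elements with an associative function at the driver
val all = nums.collect()                 // collect: bring every element back to the driver program
nums.foreach(n => println(n))            // foreach: run a side-effecting function on each element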

2.3. Shared variables

Programmers invoke operations such as map, filter, and reduce in Spark by passing closures (self-contained blocks of function code that can capture variables and be passed as arguments). As is typical in functional programming, these closures can refer to variables in the scope where they are created. Normally, when Spark runs a closure on a worker node, these variables are copied to the worker. However, Spark also lets programmers create two restricted types of shared variables to support two simple but common usage patterns:

Broadcast variables: If a large read-only piece of data (such as a lookup table) is used in multiple parallel operations, it is preferable to distribute it to the workers only once rather than packaging it with every closure.

Accumulators: These are variables that workers can only "add" to using an associative operation, and that only the driver program can read. They can be used to implement counters, as in MapReduce parallel computations. An accumulator can be defined for any type that has an "add" operation and a "zero" value. Because only additions are performed, they are easy to make fault-tolerant.
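A minimal usage sketch combining both kinds of shared variables; the lookup table, the records dataset, and reading the broadcast value through .value are illustrative assumptions rather than details given in the paper:

val table = Map("a" -> 1, "b" -> 2)              // large read-only lookup table (stand-in example)
val tableBc = spark.broadcast(table)             // shipped to each worker once, not with every closure
val misses = spark.accumulator(0)                // counter that workers may only add to
records.foreach { key =>                         // records: some RDD of keys (assumed)
  if (!tableBc.value.contains(key)) misses += 1  // workers add; only the driver reads the total
}
println(misses.value)                            // read the accumulated count at the driver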

3. Examples

We now show some example Spark programs. Note that we omit the types of variables because Scala supports type inference; Scala is nonetheless a statically typed language that runs on the JVM and performs comparably to Java.

3.1. Text Query

Suppose that a number of very large log files are stored in HDFS and we need to find the lines that represent errors. We can generally start by creating a file dataset object as follows:

val file = spark.textFile("hdfs://...")
val errs = file.filter(_.contains("ERROR"))
val ones = errs.map(_ => 1)
val count = ones.reduce(_ + _)

We first create a distributed dataset representing the file in HDFS as a collection of lines. We then use this dataset to create the set of lines containing "ERROR" (errs), map each such line to a 1 (one line counts as 1), and add up the ones using reduce. Here filter, map, and reduce are named after Scala's functional operations of the same name.

Note that because RDDs are lazy, errs and ones are not materialized when they are defined. Rather, when reduce is invoked, each worker node scans its input blocks in a streaming fashion, computes the counts, and sends them to the driver. This lazy treatment of datasets lets Spark emulate MapReduce closely.

Where Spark differs from other frameworks is that it can let some intermediate datasets persist across operations. For example, if we want to reuse the errs dataset, we simply create a cached RDD from it with the following statement:

val cachedErrs = errs.cache()

After this, we can invoke parallel operations on cachedErrs, or on datasets derived from it, just as we would on any other dataset. But its contents are cached in memory after the first time it is computed, so invoking the cache greatly speeds up subsequent operations.
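For example, a second pass over the same data might look like this (the derived values are purely illustrative):

val errCount = cachedErrs.map(_ => 1).reduce(_ + _)  // recount; this time the lines come from the in-memory cache
val allErrs = cachedErrs.collect()                   // or pull the cached error lines back to the driver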

3.2. Logistic regression

The following program implements logistic regression, an iterative classification algorithm that attempts to find a hyperplane w that best separates two sets of points.

The algorithm uses gradient descent: it starts w at a random value and, on each iteration, sums a function of w over the data to move w in a direction that improves it. We do not explain the details of logistic regression here, but we use it to show some features of Spark, because implementing this algorithm on Spark benefits greatly from being able to iterate over the data in memory.

// Read points from a text file and cache them
val points = spark.textFile(...).map(parsePoint).cache()

// Initialize w to a random D-dimensional vector
var w = Vector.random(D)

// Run multiple iterations to update w
for (i <- 1 to ITERATIONS) {
  val grad = spark.accumulator(new Vector(D))
  for (p <- points) { // Runs in parallel
    val s = (1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y
    grad += s * p.x
  }
  w -= grad.value
}

First we create an RDD called points and process it by running a loop over it. The for keyword is Scala syntax for invoking a loop over a collection, with a loop body that behaves like foreach. That is, the code for (p <- points) { body } is equivalent to points.foreach(p => { body }), so we are invoking Spark's parallel foreach operation.

Second, we define a gradient accumulator called grad (of type Vector). Note that the loop adds to grad using an overloaded += operator. The combination of the for syntax and the += accumulator syntax makes the Spark program look very much like a serial program. Indeed, this example differs from a serial version of logistic regression in only three lines of code.

3.3. Alternating least squares

Our final example is an algorithm called alternating least squares (ALS). ALS is used for collaborative filtering problems, such as predicting users' ratings for movies they have not seen based on their viewing history and ratings (as in the Netflix Challenge). Unlike the previous examples, ALS is CPU-intensive rather than data-intensive.

We briefly sketch ALS for the reader's reference. Suppose that we want to predict the ratings of u users for m movies, and that we have a partially filled matrix R containing the known ratings of some users for some movies. ALS models R as the product of two matrices M and U of dimensions m x k and k x u respectively; that is, each user and each movie has a k-dimensional "feature vector" describing its characteristics, and a user's rating for a movie is the dot product of the user's feature vector and the movie's. ALS solves for M and U using the known ratings and then computes M x U to predict the unknown entries. This is done with the following iterative process:

1. Initialize M to a random value.

2. Optimize U given M to minimize the error on R.

3. Optimize M given U to minimize the error on R.

4. Repeat steps 2 and 3 until convergence.
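In symbols, step 2 solves a least-squares problem over the known entries of R (and step 3 solves the symmetric problem for M); a minimal statement of the objective, ignoring any regularization term the full algorithm may add, is:

\min_{U} \sum_{(i,j)\,:\,R_{ij}\ \text{known}} \bigl( R_{ij} - (MU)_{ij} \bigr)^{2}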

ALS can be parallelized by having each node update a different subset of users or movies in steps 2 and 3. However, because all of the steps use the ratings matrix R, it is very effective to make R a broadcast variable, so that it does not have to be re-sent to each node with every step that uses it. A Spark implementation of ALS is shown below. Note that we parallelize the range 0 until u (a Scala range) and use collect (which gathers all the elements of an RDD back into a Scala collection) to update each array.

val Rb = spark.broadcast(R)
for (i <- 1 to ITERATIONS) {
  U = spark.parallelize(0 until u)
           .map(j => updateUser(j, Rb, M))
           .collect()
  M = spark.parallelize(0 until m)
           .map(j => updateMovie(j, Rb, U))
           .collect()
}

4. Implementation

Spark is built on top of Mesos, a "cluster operating system" that lets multiple parallel applications share a cluster in a fine-grained manner and provides an API for applications to launch tasks on the cluster. This allows Spark to run alongside existing cluster computing frameworks, such as Mesos ports of Hadoop and MPI, and to share data with them. In addition, building on Mesos greatly reduced the programming effort that had to go into Spark, making it easier to develop and use.

The core of Spark is the implementation of resilient distributed datasets. As a running example, suppose that we define a cached dataset called cachedErrs representing error messages in a log file, and that we use map and reduce on it, as in Section 3.1:

val file = spark.textFile("hdfs://...")
val errs = file.filter(_.contains("ERROR"))
val cachedErrs = errs.cache()
val ones = cachedErrs.map(_ => 1)
val count = ones.reduce(_ + _)

These datasets are stored as a chain of objects capturing the lineage of each RDD, as shown in Figure 1. Each dataset object contains a pointer to its parent and information about how the dataset was transformed from that parent.

Internally, each RDD object implements the same simple interface, which consists of three operations (a Scala sketch of this interface follows the list):

• getPartitions, which returns a list of partition IDs.

• getIterator(partition), which iterates over a partition.

• getPreferredLocations(partition), which is used by the task scheduler to achieve data locality.
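A hypothetical Scala rendering of this interface, with trait and type names chosen only to mirror the three operations above:

trait Split { def id: Int }                                // hypothetical handle for one partition

trait Dataset[T] {
  def getPartitions: Seq[Split]                            // list of partition identifiers
  def getIterator(split: Split): Iterator[T]               // iterate over the elements of one partition
  def getPreferredLocations(split: Split): Seq[String]     // nodes the scheduler should prefer for locality
}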

When a parallel operation is invoked on a dataset, Spark creates a task to process each partition of the dataset and sends these tasks to worker nodes. We try to send each task to one of its preferred locations using a technique called delay scheduling. Once launched on a worker, each task calls getIterator to start reading its partition.

How are the different types of RDDs implemented? They differ only in how they implement this interface. For example, for an HdfsTextFile, the partitions are block IDs in HDFS, their preferred locations are the locations of the blocks, and getIterator opens a stream to read a block. In a MappedDataset, the partitions and preferred locations are the same as the parent's, but the iterator applies the map function to elements of the parent. Finally, in a CachedDataset, the getIterator method looks for a locally cached copy of a transformed partition; each partition's preferred locations start out equal to the parent's preferred locations, but are updated after the partition has been cached on some nodes so that those nodes tend to be reused. This design makes failures easy to handle: if a node fails, its partitions are re-read from their parent datasets and eventually cached on other nodes.
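Continuing the hypothetical sketch above, the mapped and cached dataset types described in this paragraph might be written roughly as follows (the per-node cache here is a deliberate simplification):

class MappedDataset[A, B](parent: Dataset[A], f: A => B) extends Dataset[B] {
  def getPartitions = parent.getPartitions                                       // same partitions as the parent
  def getPreferredLocations(split: Split) = parent.getPreferredLocations(split)
  def getIterator(split: Split) = parent.getIterator(split).map(f)               // apply f to the parent's elements
}

class CachedDataset[A](parent: Dataset[A]) extends Dataset[A] {
  private val cache = scala.collection.mutable.Map[Int, Seq[A]]()                // per-node cache (simplified)
  def getPartitions = parent.getPartitions
  def getPreferredLocations(split: Split) = parent.getPreferredLocations(split)  // plus caching nodes, omitted here
  def getIterator(split: Split) = cache.get(split.id) match {
    case Some(elems) => elems.iterator                                           // reuse the locally cached copy
    case None =>
      val elems = parent.getIterator(split).toSeq                                // otherwise recompute from the parent
      cache(split.id) = elems
      elems.iterator
  }
}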

Finally, shipping tasks to workers requires shipping closures to them, both the closures used to define a distributed dataset and the closures passed to operations such as reduce. To achieve this, we rely on the fact that Scala closures are Java objects and can be serialized using Java serialization; this is a feature of Scala that makes it relatively simple to send a computation to another machine.

Scala's built-in closure implementation is not ideal, however, because we have found cases where a closure object references variables in its enclosing scope that are not actually used in its body. We have filed a bug report about this; in the meantime, we work around the issue by statically analyzing the bytecode of closure classes to detect these unused variables and setting the corresponding fields in the closure object to null.

Shared variables: The two types of shared variables in Spark, broadcast variables and accumulators, are implemented using classes with custom serialization formats. When someone creates a broadcast variable b with value v, v is saved to a file in a shared file system. The serialized form of b is then simply a path to this file. When b's value is queried on a worker node, Spark first checks whether v is in a local cache and, if not, reads it from the file system at that path. We initially used HDFS to broadcast variables, but we are developing a more efficient streaming broadcast system.
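A rough sketch of the lookup just described; the Broadcast class shape and the readValueFromPath placeholder are illustrative assumptions, not the actual Spark code:

object BroadcastSketch {
  // Placeholder for "read and deserialize the value stored at this shared-file-system path".
  def readValueFromPath[T <: AnyRef](path: String): T =
    throw new UnsupportedOperationException("read from shared storage")

  class Broadcast[T <: AnyRef](val path: String) extends Serializable {
    @transient private var cached: T = _          // null until first use on a node (and after deserialization)
    def value: T = {
      if (cached == null)
        cached = readValueFromPath[T](path)       // read once from the shared file system, then reuse locally
      cached
    }
  }
}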

Accumulators are implemented using a different "serialization trick": each accumulator is given a unique ID when it is created, and its serialized form contains its ID and the "zero" value for its type. On the workers, a separate copy of the accumulator is created for each thread that runs a task, using thread-local variables, and it is reset to zero at the start of each task; after the task runs, the worker sends the accumulated updates back to the driver.

Interpreter integration: Due to space constraints, we only sketch how Spark is integrated into the Scala interpreter. The Scala interpreter normally operates by compiling a class for each line typed by the user; this class includes a singleton object containing the variables or functions on that line. For example, if the user types:

var x = 5

followed by

println(x)

then the interpreter defines a class (say Line1) containing x, and compiles the second line to:

println(Line1.getInstance().x)
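Roughly, and only as an illustration of the mechanism (the real generated code is more involved), the classes the interpreter generates for these two lines behave like this:

class Line1 { val x = 5 }                         // wraps the user's first line
object Line1 {
  private val inst = new Line1
  def getInstance(): Line1 = inst                 // singleton accessor used by later lines
}

class Line2 { println(Line1.getInstance().x) }    // compiled form of the user's second line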

Each of these per-line classes is loaded into the JVM and run. To make Spark work in the interpreter, we made the following two changes:

1. We made the interpreter output the classes it defines to a shared file system, from which they can be loaded by the workers using a custom Java class loader.

2. We changed the generated code so that the singleton object for each line references the singleton objects for previous lines directly, rather than going through static methods such as getInstance. This allows closures to capture the current state of the singletons they reference whenever they are serialized and shipped to a worker. If we had not done this, an update to a singleton object (for example, a later line setting x = 7 after the definition above) would not propagate to the workers.

5. Results

Although Spark is still at an early stage, the following three experiments demonstrate its promise as a cluster computing framework.

Logistic regression: We compared the logistic regression implementation in Section 3.2 against Hadoop, using a 29 GB dataset on 20 m1.xlarge Amazon EC2 nodes with 4 cores each. (An m1.xlarge instance provides 15 GB of memory, 8 EC2 compute units across 4 virtual cores with 2 compute units each, 1.69 TB of instance storage, a 64-bit platform, and high I/O performance.) As Figure 2 shows, each iteration takes 127 seconds on Hadoop, because each one runs as an independent MapReduce job. On Spark, the first iteration takes 174 seconds (likely because Scala is used instead of Java), but subsequent iterations take only 6 seconds each because they reuse cached data, making the job run more than 10x faster.

We also experimented with node failures. With 10 iterations, a failure slows the job down by 50 s (21%) on average. The partitions and cached data on the failed node are recomputed and re-cached in parallel on the other nodes, but the recovery time is fairly long because we used a large HDFS block size (128 MB), so each node held only 12 blocks and not all cores in the cluster could participate in recovery. Smaller block sizes would yield faster recovery times.

Alternating least squares: We implemented the alternating least squares job from Section 3.3 to measure the benefit of broadcast variables for iterative jobs that copy a shared dataset to multiple nodes. We found that without broadcast variables, the most time-consuming part of each iteration was resending the ratings matrix R. Furthermore, with a naive broadcast implementation (using HDFS or NAS), the broadcast time grows linearly with the number of nodes, limiting the scalability of the job, so we implemented an application-level multicast system to mitigate this. Even with fast broadcast, however, resending R on each iteration is costly. Caching R in memory on the workers using a broadcast variable improved performance by 2.8x in an experiment with 5,000 movies and 15,000 users on a 30-node EC2 cluster.

Interactive Spark queries: We used the Spark interpreter to load 39 GB of Wikipedia data in memory across 15 m1.xlarge EC2 nodes and query it interactively. The first query takes roughly 35 seconds, comparable to running a Hadoop job on the same data. Subsequent queries, however, take only 0.5 to 1 second, even when they scan the entire dataset. This is a qualitative improvement in experience, comparable to working with a local dataset.

6. Related work

Distributed shared memory: Spark's resilient distributed datasets can be viewed as an abstraction of distributed shared memory (DSM), which has been studied extensively. RDDs differ from DSM interfaces in two ways. First, RDDs provide a much more restricted programming model, but one that lets datasets be rebuilt efficiently if cluster nodes fail. While some DSM systems achieve fault tolerance through checkpointing, Spark reconstructs lost partitions of an RDD using lineage information captured in the RDD objects about their parent datasets. This means that only the lost partitions need to be recomputed, that they can be recomputed in parallel on different nodes, and that the program does not have to revert to a checkpoint. In addition, there is no overhead if no nodes fail. Second, RDDs push computation to the data, as in MapReduce, rather than letting arbitrary nodes access a global address space.

Other systems have also restricted the DSM programming model to improve performance, reliability, and programmability. Munin (a DSM system for running shared-memory parallel programs across multiple processors) requires programmers to annotate variables with their access patterns so that an appropriate consistency protocol can be chosen for them. Linda (a distributed coordination language for networked computers) provides a tuple-space programming model that can be implemented in a fault-tolerant fashion. Thor (an object-oriented database storage system designed for heterogeneous distributed environments, providing highly reliable and highly available persistent storage and supporting the safe sharing of objects) provides an interface to persistent shared objects.

Cluster computing frameworks: Spark's parallel operations fit into the MapReduce model. However, they operate on RDDs, which can persist across operations and jobs.

Twister also extends MapReduce to support iterative jobs by keeping long-lived map tasks that hold static data in memory between jobs. However, Twister does not currently support fault tolerance. Spark's resilient distributed datasets are both fault-tolerant and more general than iterative MapReduce: a Spark program can define multiple RDDs and alternate between running operations on them, whereas a Twister program has only one map function and one reduce function. This also makes Spark useful for interactive data analysis, where a user can define several datasets and then query them.

Spark's broadcast variables provide a facility similar to Hadoop's distributed cache, which can disseminate a file to all nodes running a particular job. However, broadcast variables can be reused across parallel operations.

Language integration: Spark's language integration is similar to that of DryadLINQ, which uses .NET's language-integrated query (LINQ) support to capture an expression tree defining a query and run it on a cluster. Unlike DryadLINQ, however, Spark allows RDDs to persist in memory across parallel operations. In addition, Spark enriches the language integration model by supporting shared variables (broadcast variables and accumulators), implemented as classes with custom serialized forms.

We were inspired to integrate Spark into Scala by SMR (Scala MapReduce), a Scala interface to Hadoop that uses closures to define map and reduce tasks. Our contributions over SMR are shared variables and a more robust implementation of closure serialization.

Finally, IPython is a Python interpreter aimed at scientific researchers that lets users launch computations on a cluster through a fault-tolerant task queue interface. Spark provides a similar interactive interface, but we believe that applications focused on data-intensive computation benefit from a more efficient programming language such as Scala.

Lineage: Capturing lineage or provenance information for data has long been a research topic in scientific computing and databases, for applications such as explaining results, allowing them to be reproduced and recomputed by others, and recovering data when a bug is found in a workflow step or when data is lost. We refer the reader to "Lineage Retrieval for Scientific Data Processing: A Survey", "A Survey of Data Provenance in E-Science", and "Map-Reduce for Machine Learning on Multicore" for this work. Spark provides a restricted, parallel programming model in which fine-grained lineage is inexpensive to capture, so that this information can be used to recompute lost elements of a dataset.

7. Future work and discussion

Spark provides three simple abstractions for programming clusters: resilient distributed datasets (RDDs), broadcast variables, and accumulators. Although these abstractions are still limited, we have found them powerful enough to express several applications that pose challenges for existing cluster computing frameworks, including iterative and interactive computations. In addition, we believe that the core idea behind RDDs, of a dataset handle that contains enough information to reconstruct the dataset from data in reliable storage, may prove useful in developing other abstractions for programming clusters.

Original link: http://blog.csdn.net/allenlu2008/article/details/39324123
