DC/OS Practice Sharing (4): How to Integrate SMACK on DC/OS (Spark, Mesos, Akka, Cassandra, Kafka)


This article was selected as a CSDN Geek Headline.


Today, designing a usable, good-looking product is no longer enough to keep a business competitive. Consumers around the world want products that are smart enough to improve their experience through big data analytics. In short, the Internet of Things and big data will ultimately be the driving forces behind life-changing technology.

In recent years a number of architectures and design patterns have emerged that developers and data scientists can use to build real-time analytics workflows for big data and the Internet of Things. Representative examples include the batch, streaming, Lambda, and Kappa architectures. All of them need a scalable big data processing platform underneath. So at the end of 2014, a set of mutually compatible, cooperating open source components was integrated into such a base platform, and SMACK was born.

The SMACK stack consists of Spark, Mesos, Akka, Cassandra, and Kafka, and has the following characteristics:

    • Lightweight toolkits that are widely used in big data processing scenarios
    • Strong community support: open source software that is well tested and widely adopted
    • Scalability and data replication with low latency
    • A unified cluster management platform for diverse workloads

In real deployments, a big data platform usually has to run alongside ordinary applications. With microservices and container technology booming in recent years, we need a platform that can manage both containers and the big data platform.

There are many frameworks for managing containers, including the Docker camp and the Kubernetes camp, each with its own pros and cons. Likewise, there are many platforms for managing big data, from Hadoop to Spark. In practice, however, each usually has to be operated as a separate cluster: a container cluster, a Hadoop cluster, a Spark cluster. This increases both operational complexity and hardware overhead. DC/OS solves this problem by managing containers, ordinary applications, and big data applications in a single framework, so that they share resources and operations are simplified.

This article shows how to run a SMACK-based application on DC/OS and how the SMACK components are integrated with each other.

Overall architecture

The following is the overall architecture of a classic SMACK-based application. The application ingests and analyzes a large amount of data. Specifically, it collects energy usage data from smart meters in users' homes and analyzes it with big data tools to produce a map of regional energy consumption, which relevant authorities can use to estimate energy consumption in other areas.

Smart meter data is sent over the Internet to the data center by calling the HTTP interface of the metering service. The metering service forwards the messages through Kafka to the simulator service, which stores the data in Cassandra.

Spark reads the data from Cassandra, analyzes it, and writes the analysis results back to Cassandra.

The simulator service can read the analysis results from Cassandra.

When a user opens the web page on a phone or computer, the page calls the metering service's HTTP interface; the metering service reads the analysis results from the simulator service and presents them to the user.

Detailed Design

Prerequisites

    • Install a DC/OS cluster
    • Install the DC/OS command-line tool

DC/OS Services

Next, we need to make sure that all required DC/OS services are running normally. Some services in the following list are core components of DC/OS; we list them here because our application depends heavily on them.


Marathon

Marathon is a core component of DC/OS and is installed together with it. Before using it, it is best to check its status:

| jq ".elected == true"
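The jq filter above passes only when Marathon's status JSON has `elected` set to true, that is, when a Marathon leader has been elected. For scripted health checks the same test can be written in Python. This is a minimal sketch: it assumes the status JSON has already been fetched (for example with the command above or from Marathon's `/v2/info` REST endpoint), the helper name `marathon_is_leader_elected` is hypothetical, and the sample payload is illustrative rather than real Marathon output.

```python
import json


def marathon_is_leader_elected(info_json: str) -> bool:
    """Return True if Marathon's status JSON reports an elected leader.

    Mirrors the jq filter `.elected == true`: a missing or non-boolean
    `elected` field counts as "not elected".
    """
    info = json.loads(info_json)
    return info.get("elected") is True


# Illustrative payload only; real status JSON has many more fields.
sample = '{"name": "marathon", "elected": true}'
print(marathon_is_leader_elected(sample))  # True
```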
Marathon-LB external (Marathon load balancer for external access)

By default, the Marathon-LB framework creates a load balancer instance for external access.

The DC/OS documentation has more information on how DC/OS uses the Marathon-LB framework.

Quick installation:

$ dcos package install marathon-lb

Marathon-LB is based on HAProxy; its statistics page is available at http://p1.dcos:9090/haproxy?stats. Mesos DNS, the internal DNS of DC/OS and also a core DC/OS component, registers this load balancer under the name marathon-lb.marathon.mesos. Marathon-LB is installed on a DC/OS public node; in this example p1.dcos is the domain name of a public node, so the load balancer can be reached at http://p1.dcos.
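To check the load balancer from a script rather than a browser, HAProxy can also export these statistics as CSV when `;csv` is appended to the stats URL (here http://p1.dcos:9090/haproxy?stats;csv). The Python sketch below parses such an export into a status map; the helper name `haproxy_status` is hypothetical, and the sample keeps only three of HAProxy's many CSV columns, with made-up proxy names.

```python
import csv
import io


def haproxy_status(stats_csv: str) -> dict:
    """Map (proxy name, server name) -> status from HAProxy's CSV stats."""
    lines = stats_csv.splitlines()
    if not lines:
        return {}
    # HAProxy prefixes the header row with "# "; drop it so DictReader
    # sees plain column names such as "pxname", "svname", "status".
    lines[0] = lines[0].lstrip("# ")
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    return {(row["pxname"], row["svname"]): row["status"] for row in reader}


# Trimmed, illustrative sample: real exports have dozens of columns.
sample = "# pxname,svname,status\nmeter_19002,BACKEND,UP\nsimulator_19001,BACKEND,DOWN\n"
for (proxy, server), status in haproxy_status(sample).items():
    print(proxy, server, status)
```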

Marathon-LB internal (Marathon load balancer for internal access)

We need a second Marathon-LB instance so that internal components can reach each other without their traffic having to go through the public network.

Quick installation:

$ cat <<EOF > marathon-internal-lb-options.json
{
  "marathon-lb": {
    "name": "marathon-lb-internal",
    "haproxy-group": "internal",
    "bind-http-https": false,
    "role": ""
  }
}
EOF
$ dcos package install --options=marathon-internal-lb-options.json marathon-lb
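The options file can also be generated from a script, which helps when several load-balancer groups are needed. A minimal Python sketch, assuming the marathon-lb package accepts the option keys `name`, `haproxy-group`, `bind-http-https`, and `role`; the helper name `internal_lb_options` is hypothetical, and the comment on `role` reflects an assumption about marathon-lb's default placement on public agents.

```python
import json


def internal_lb_options(name: str = "marathon-lb-internal",
                        group: str = "internal") -> dict:
    """Package options for a second, internal-only marathon-lb instance."""
    return {
        "marathon-lb": {
            "name": name,              # Marathon app name of this instance
            "haproxy-group": group,    # serves apps whose HAPROXY_GROUP matches
            "bind-http-https": False,  # do not claim host ports 80/443
            "role": "",                # assumption: not pinned to public agents
        }
    }


with open("marathon-internal-lb-options.json", "w") as f:
    json.dump(internal_lb_options(), f, indent=2)
# Then: dcos package install --options=marathon-internal-lb-options.json marathon-lb
```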

In Mesos DNS, the internal Marathon-LB is registered as marathon-lb-internal.marathon.mesos.


Kafka

Kafka is already in the DC/OS service catalog, so we can install it directly instead of managing and maintaining a Kafka cluster ourselves.

Quick installation:

$ dcos package install --yes kafka

You can then verify the status of the service with the DC/OS CLI.


The Kafka service runs as a Marathon app, which gives it long-running operation, high availability, and elastic scaling. Installing Kafka takes a few minutes; you can follow its progress in Marathon.

By default, Kafka runs three broker instances. You can configure the Kafka service to create more brokers, depending on the load you need to handle. Creating topics and consuming messages are left to the application layer.


Cassandra

As part of the big data infrastructure, Cassandra also needs to run on DC/OS, and it is already available in the DC/OS service catalog.

Quick installation:

$ dcos package install cassandra
Installing Marathon app for package [cassandra] version [1.0.0-2.2.5]
Installing CLI subcommand for package [cassandra] version [1.0.0-2.2.5]
New command available: dcos cassandra
DC/OS Cassandra Service is being installed.

It takes a few minutes to install Cassandra. By default, Cassandra installs 3 nodes, 2 of which are seed nodes.

SSH to Cassandra Cluster

The Cassandra cluster is now running, and next we need to connect to it. First, get the connection information with the following command:

$ dcos cassandra connection
{
    "nodes": [
        "",
        "",
        ""
    ]
}
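The connection output can also be consumed from a script, for example to pick a node for cqlsh. A minimal Python sketch, assuming the `{"nodes": [...]}` shape shown above; the sample addresses are made up, and whether an entry carries a `:port` suffix depends on the service version, so both forms are handled. The helper names `cassandra_nodes` and `cqlsh_command` are hypothetical; the command they build follows cqlsh's `cqlsh [host [port]]` argument order.

```python
import json


def cassandra_nodes(connection_json: str) -> list:
    """Node addresses from `dcos cassandra connection` output."""
    nodes = json.loads(connection_json).get("nodes", [])
    return [node for node in nodes if node]  # skip empty entries


def cqlsh_command(node: str) -> list:
    """docker command that opens cqlsh against one node."""
    host, _, port = node.partition(":")
    return ["docker", "run", "-ti", "cassandra:2.2.5", "cqlsh", host] + ([port] if port else [])


# Made-up private addresses standing in for the real output.
sample = '{"nodes": ["10.0.1.11:9042", "10.0.1.12:9042", "10.0.1.13:9042"]}'
print(" ".join(cqlsh_command(cassandra_nodes(sample)[0])))
# docker run -ti cassandra:2.2.5 cqlsh 10.0.1.11 9042
```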

Because the node addresses are private IPs, we first have to SSH into the DC/OS cluster before we can work with the Cassandra cluster:

$ dcos node ssh --master-proxy --leader

Now that we are inside the DC/OS cluster, we can connect to the Cassandra cluster directly. Use the cqlsh client from the cassandra Docker image, passing the address of one of the Cassandra nodes returned above:

$ docker run -ti cassandra:2.2.5 cqlsh <node-ip>
cqlsh>

Create Keyspace

Once connected to the Cassandra cluster, create a keyspace named iot_demo:

cqlsh> CREATE KEYSPACE iot_demo WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
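The replication setting has a direct consequence for availability. Assuming the default three-node cluster, `replication_factor` 3 places a copy of every row on every node, and a `QUORUM` read or write in a single datacenter (the `SimpleStrategy` case here) needs floor(RF/2) + 1 replicas to answer. A small Python check of that arithmetic; the function names are illustrative:

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond to a QUORUM read or write (one datacenter)."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be unavailable while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


print(quorum(3), tolerated_failures(3))  # 2 1
```

So with a replication factor of 3, the keyspace keeps serving QUORUM requests while one of the three nodes is down.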

With the keyspace created, we can add tables and some mock data to it so that our application can use Cassandra.

Service discovery with the DC/OS command line

We can use the DC/OS CLI for service discovery: a script embedded in the Docker entrypoint discovers services through the DC/OS command line and exports the results as environment variables.

Akka service discovery

To discover the Akka nodes, embed the following command in the Docker entrypoint script docker-entrypoint.sh:

export AKKA_SEED_NODES=`dcos marathon app show  | jq -r ".tasks[].host" | tr '\n' ',' | sed 's/,$//g'`
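The pipeline above takes every `.tasks[].host` value from the Marathon app description, joins them with commas, and strips the trailing comma. An equivalent Python sketch, useful for testing the parsing without a cluster; the sample JSON is an illustrative fragment, and the actor system name `iot-demo` and port 2552 (the classic Akka remoting default) used to build seed-node URIs are assumptions, since the article does not show the Akka configuration. Both helper names are hypothetical.

```python
import json


def seed_hosts(app_json: str) -> str:
    """Comma-separated task hosts, like `jq -r ".tasks[].host" | tr | sed`."""
    tasks = json.loads(app_json).get("tasks", [])
    return ",".join(task["host"] for task in tasks)


def seed_node_uris(hosts: str, system: str, port: int) -> list:
    """Turn "h1,h2" into Akka seed-node URIs (classic akka.tcp remoting)."""
    return [f"akka.tcp://{system}@{host}:{port}" for host in hosts.split(",") if host]


# Illustrative `dcos marathon app show` fragment with two running tasks.
app = '{"id": "/meter", "tasks": [{"host": "10.0.2.5"}, {"host": "10.0.2.6"}]}'
hosts = seed_hosts(app)
print(hosts)  # 10.0.2.5,10.0.2.6
print(seed_node_uris(hosts, "iot-demo", 2552))
```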

The application configuration can then use this environment variable. If the node is the first node of the Akka cluster, it creates the cluster; if an Akka cluster already exists, the node discovers it and joins.

We have also considered the following special case:

The Akka node in the current container is the first node. In this case, service discovery finds only the node itself. This result is correct, and the node proceeds with the default handling.
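The special case can be made explicit in the entrypoint logic. A minimal Python sketch, assuming the container knows its own host address and has the discovered hosts as a list (for example AKKA_SEED_NODES split on commas); `join_plan` is a hypothetical helper, not an Akka API. This simple rule does not resolve the race in which two nodes start at the same moment and each sees the other before either has formed the cluster.

```python
def join_plan(discovered_hosts: list, self_host: str) -> tuple:
    """Decide whether this node bootstraps a new Akka cluster or joins one.

    If discovery found no host other than this node, the node is the
    first one and forms the cluster by using itself as the only seed.
    Otherwise it joins using the other discovered hosts as seeds.
    """
    others = [host for host in discovered_hosts if host and host != self_host]
    if not others:
        return ("bootstrap", [self_host])
    return ("join", others)


print(join_plan("10.0.2.5".split(","), "10.0.2.5"))  # ('bootstrap', ['10.0.2.5'])
print(join_plan("10.0.2.5,10.0.2.6".split(","), "10.0.2.6"))  # ('join', ['10.0.2.5'])
```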

Kafka service discovery

Similarly, we can embed the following script in the Docker entrypoint to discover all of Kafka's brokers:

export KAFKA_BROKERS_LIST=`dcos kafka connection --dns | jq -r ".names[]" | tr '\n' ',' | sed 's/,$//g'`
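Applications then read the variable at startup. A minimal Python sketch of turning KAFKA_BROKERS_LIST into a list of bootstrap servers that a Kafka client library could accept; it assumes each entry is a `host:port` pair (the exact names that `dcos kafka connection --dns` prints depend on the service version), and `bootstrap_servers` and the sample hostnames are hypothetical.

```python
import os


def bootstrap_servers(env=None) -> list:
    """Split KAFKA_BROKERS_LIST ("host:port,host:port") into a list."""
    env = os.environ if env is None else env
    raw = env.get("KAFKA_BROKERS_LIST", "")
    servers = [s.strip() for s in raw.split(",") if s.strip()]
    for server in servers:
        host, sep, port = server.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {server!r}")
    return servers


print(bootstrap_servers({"KAFKA_BROKERS_LIST": "broker-0.example:9092,broker-1.example:9092"}))
```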

Service discovery with Marathon-LB

Both the internal and the external Marathon-LB instances can also be used for service discovery.

Deployment of the application tier

We have deployed the DC/OS services and configured service discovery. Next, let's deploy the application that uses these services.

We deploy the application tier with Marathon so that its components run as long-running applications. Each component runs in Docker as a Marathon task, and the configuration and dependencies between components are handled through Marathon.

The application tier consists of two microservices, the metering service and the simulator service, plus a simple web page for presentation.

Metering Service

The metering service forms an Akka cluster and exposes a REST interface that the simulator service and the web page call.

The metering service is defined by the following JSON, which is submitted to Marathon for deployment:

{
  "id": "meter",
  "container": {
    "type": "DOCKER",
    "docker": { "image": "cakesolutions/iot-demo-meter" }
  },
  "labels": { "HAPROXY_GROUP": "external,internal" }
  …
}
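If the definitions are generated from code rather than written by hand, the visible part of this definition could be produced as in the Python sketch below. The helper names are hypothetical, and the fields elided in the original after "labels" are intentionally left out rather than guessed. `lb_groups` shows how the comma-separated HAPROXY_GROUP value splits into the two load-balancer groups.

```python
import json


def meter_app(image: str = "cakesolutions/iot-demo-meter") -> dict:
    """The visible part of the metering service's Marathon app definition.

    The original definition continues after "labels" (elided there);
    those fields are deliberately not guessed here.
    """
    return {
        "id": "meter",
        "container": {"type": "DOCKER", "docker": {"image": image}},
        # Comma-separated groups: exposed by both the external and the
        # internal marathon-lb instance.
        "labels": {"HAPROXY_GROUP": "external,internal"},
    }


def lb_groups(app: dict) -> set:
    """Marathon-LB groups an app opts into via its HAPROXY_GROUP label."""
    label = app.get("labels", {}).get("HAPROXY_GROUP", "")
    return {group.strip() for group in label.split(",") if group.strip()}


print(json.dumps(meter_app(), indent=2))
print(sorted(lb_groups(meter_app())))  # ['external', 'internal']
```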

The Marathon app definition contains a special label, HAPROXY_GROUP, that tells each Marathon-LB instance whether to expose the application. "external" refers to the default Marathon-LB for external access and means that the service can be reached from outside the cluster.

"internal" refers to the Marathon-LB for internal access and means that other components inside the cluster can reach this service at marathon-lb-internal.marathon.mesos:19002. The simulator service uses this address to call the metering service's REST API.

The Marathon-LB for external access must make the internal address marathon-lb.marathon.mesos:19002 reachable from outside as p1.dcos:19002.

The web page needs this external address because it runs in the browser, outside the data center, where internal DNS names cannot be resolved.
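The addressing rules in the last few paragraphs can be summarized in a small lookup. A Python sketch using the service ports and HAPROXY_GROUP values from this article; `service_url`, the table layout, and the plain-HTTP scheme are assumptions for illustration.

```python
# Load-balancer hostnames used in this article.
LB_HOSTS = {
    "internal": "marathon-lb-internal.marathon.mesos",  # for in-cluster callers
    "external": "p1.dcos",  # public node running the external marathon-lb
}

# Service port and HAPROXY_GROUP values for each service.
SERVICES = {
    "meter": (19002, {"external", "internal"}),
    "simulator": (19001, {"external"}),
}


def service_url(service: str, lb: str) -> str:
    """Base URL of a service reached through the given load balancer."""
    port, groups = SERVICES[service]
    if lb not in groups:
        raise ValueError(f"{service} is not exposed on the {lb} load balancer")
    return f"http://{LB_HOSTS[lb]}:{port}"


print(service_url("meter", "internal"))
# http://marathon-lb-internal.marathon.mesos:19002
print(service_url("simulator", "external"))
# http://p1.dcos:19001
```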

Next, deploy the metering service with the following command:

$ dcos marathon app add meter.json

Marathon apps can be (re)deployed either through the Marathon API or through the DC/OS CLI. So far we have mostly used the Marathon API, either directly or through the Python driver.
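For the API route, an app definition can be POSTed to Marathon's `/v2/apps` endpoint. A minimal standard-library Python sketch, assuming Marathon is reached through the DC/OS master at a hypothetical `http://dcos-master.example/marathon` and that, on clusters with authentication, a DC/OS token is sent as `Authorization: token=<token>`. The helper `deploy_request` is hypothetical and only builds the request; it does not send it.

```python
import json
import urllib.request


def deploy_request(marathon_url: str, app: dict, token: str = None) -> urllib.request.Request:
    """Build (but do not send) a POST /v2/apps request for Marathon."""
    headers = {"Content-Type": "application/json"}
    if token:
        # DC/OS expects the auth token in the form "token=<value>".
        headers["Authorization"] = f"token={token}"
    return urllib.request.Request(
        url=f"{marathon_url.rstrip('/')}/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = deploy_request("http://dcos-master.example/marathon", {"id": "meter"})
print(req.get_method(), req.full_url)
# POST http://dcos-master.example/marathon/v2/apps
# urllib.request.urlopen(req) would actually submit it.
```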
Simulator Service

The Marathon JSON for the simulator service is as follows:

{
  "id": "simulator",
  "container": {
    "type": "DOCKER",
    "docker": { "image": … }
  },
  "env": {
    "METER_HOST": "marathon-lb-internal.marathon.mesos",
    "METER_PORT": "19002"
  },
  "labels": { "HAPROXY_GROUP": "external" }
}

Similarly, deploy the simulator service as a long-running Marathon app with the following command:

$ dcos marathon app add simulator.json

The simulator service needs to call the metering service's API, so we pass the metering service's internal address to it as environment variables.
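Inside the simulator, those variables only need to be combined into a base URL. A minimal Python sketch, assuming the upper-case names METER_HOST and METER_PORT and plain HTTP; `meter_base_url` is a hypothetical helper, and the environment is passed in explicitly so it can be tested.

```python
import os


def meter_base_url(env=None) -> str:
    """Metering service base URL from METER_HOST and METER_PORT."""
    env = os.environ if env is None else env
    host = env["METER_HOST"].strip()
    port = int(env["METER_PORT"])  # fails fast on a non-numeric port
    return f"http://{host}:{port}"


print(meter_base_url({"METER_HOST": "marathon-lb-internal.marathon.mesos",
                      "METER_PORT": "19002"}))
# http://marathon-lb-internal.marathon.mesos:19002
```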

The Marathon-LB for external access must also make the internal address marathon-lb.marathon.mesos:19001 reachable from outside as p1.dcos:19001, so that the web page can reach the simulator service.

Web Client

The web page is also deployed from a Marathon JSON definition:

{
  "id": "web",
  "container": {
    "type": "DOCKER",
    "docker": { "image": … }
  },
  "env": {
    "METER_HOST": "p1.dcos",
    "METER_PORT": "19002",
    "SIMULATOR_HOST": "p1.dcos",
    "SIMULATOR_PORT": "19001"
  },
  "labels": { "HAPROXY_GROUP": … }
}

Deploy the web page as a Marathon app:

$ dcos marathon app add web.json

The web page has to reach both the metering service and the simulator service from the browser, so the external addresses of both services are passed to the web page's Docker container as environment variables.

p1.dcos is the DNS name of the DC/OS public node on which the external Marathon-LB runs.


Summary

In this article we have seen how the SMACK frameworks run on DC/OS, how complex components such as Kafka and Cassandra can be installed and configured with little effort, and how to build our own services on top of these frameworks.

So we can conclude that DC/OS is indeed:

    • The most convenient way to deploy container applications in a production environment
    • The most convenient way to make full and efficient use of our infrastructure
    • A very convenient way to install different frameworks in the same cluster
    • A very convenient way to scale services elastically

All in all, DC/OS is a complete solution to data center problems. DC/OS is built on Mesos, which is highly reliable and proven in production environments.

