Tachyon cluster: ZooKeeper-based master high availability (HA) configuration

1. Introduction to Tachyon

Tachyon is a highly fault-tolerant distributed file system that enables reliable file sharing at memory speed across cluster frameworks such as Spark and MapReduce. It achieves high performance by leveraging lineage information and using memory aggressively. Tachyon caches working-set files in memory so that different jobs/queries and frameworks can access them at memory speed; as a result, frequently used datasets no longer have to be read from disk every time.

2. What problems does Tachyon solve? (from "The Tachyon Distributed Memory File System")

1. Slow data sharing between different frameworks
Consider a scenario in which the output of a MapReduce job is stored in Tachyon and a Spark job then reads that output as its input. If the sharing goes through files on disk, write performance is very low; if it goes through memory, writes are very fast, and the handoff is quick enough that the Spark job hardly notices it is consuming the output of a different computing framework. Likewise, Impala output can be used as Spark input.
2. Spark executor crashes
Spark's execution engine and storage engine both live in the executor process: multiple tasks run inside one executor, and cached RDD blocks are held in the executor's memory.
The problem is that once an executor crashes, its tasks fail and the cached RDD blocks are lost. This forces Spark to re-fetch data and recompute; recursively recomputing the lost data from its lineage is, of course, resource-consuming and inefficient.
3. Memory redundancy
Memory redundancy here means that different Spark jobs may read the same file at the same time. For example, if the tasks of job1 and job2 both have to read an account-information table, each job caches the table in its own executors: one piece of data, two copies in memory. This is completely unnecessary and redundant.
4. GC time is too long
Sometimes it is not the code itself that stalls program execution but the sheer number of Java objects held in memory. If an executor caches too many objects in the JVM, say 80 GB worth, full GC pauses become long and frequent; you may wonder why the program seems to hang, and the GC log (see the sketch below) will show it was busy garbage-collecting.
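To see this for yourself, turn on GC logging for the executor JVM. A minimal sketch using standard JDK 7/8-era HotSpot flags; the log path is an illustrative assumption, and with Spark you would typically pass these via spark.executor.extraJavaOptions:

# Hypothetical example: flags to surface long GC pauses in a log file
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/executor-gc.log

During an apparent hang, long "Full GC" entries in /tmp/executor-gc.log confirm the diagnosis.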

3. Implementing a fault-tolerant Tachyon cluster based on ZooKeeper

3.0 Prerequisites

  • Hadoop version: 2.2.0.2.0.6.0-101
  • ZooKeeper version: 2.3.5
  • Tachyon version: 0.4.1

Cluster status:

Cluster   Masters                  Slaves
Tachyon   bigdata001, bigdata002   bigdata001, bigdata002, bigdata003, bigdata004, bigdata005, bigdata006, bigdata007, bigdata008

ZooKeeper URL: bigdata001:2181,bigdata002:2181,bigdata003:2181
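Before wiring Tachyon to ZooKeeper, it is worth confirming that the ensemble is healthy. A minimal check using ZooKeeper's standard ruok four-letter command (assuming nc is available on the box):

# Every healthy ZooKeeper server answers "imok"
for host in bigdata001 bigdata002 bigdata003; do
  echo "$host: $(echo ruok | nc $host 2181)"
done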

3.1 HA architecture

3.2 Configuration (conf/tachyon-env.sh)

1. Refer to the official documentation: Fault Tolerant Tachyon Cluster

① HDFS

export TACHYON_UNDERFS_ADDRESS=hdfs://[namenodeserver]:[namenodeport]
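For this cluster the under file system is the HDFS NameNode on bigdata001, so the line becomes (matching the full configuration in the next step):

export TACHYON_UNDERFS_ADDRESS=hdfs://192.168.1.101:8020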

② ZooKeeper:

Property name               Example          Meaning
tachyon.usezookeeper        true             Whether master processes should use ZooKeeper.
tachyon.zookeeper.address   localhost:2181   The hostname and port ZooKeeper is running on.
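These properties are passed to Tachyon as JVM system properties. For this cluster they translate into the following additions to TACHYON_JAVA_OPTS (the values used in the full configuration below):

-Dtachyon.usezookeeper=true
-Dtachyon.zookeeper.address=bigdata001:2181,bigdata002:2181,bigdata003:2181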

③ Master node configuration

export TACHYON_MASTER_ADDRESS=[externally visible address of this machine]

Extend TACHYON_JAVA_OPTS to include:

-Dtachyon.master.journal.folder=hdfs://[namenodeserver]:[namenodeport]/tachyon/journal
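The journal lives in HDFS so that a newly elected master can replay it after failover. The directory is created by ./bin/tachyon format (step 3 below); afterwards, a quick sanity check, assuming the hadoop client is on the PATH:

hadoop fs -ls hdfs://192.168.1.101:8020/tachyon/journal/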

④ Worker node configuration

export TACHYON_MASTER_ADDRESS=[address of one of the master nodes in the system]
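With tachyon.usezookeeper=true, workers look up the current leader in ZooKeeper, so any master's address should work here; this cluster points every worker at bigdata001:

export TACHYON_MASTER_ADDRESS=192.168.1.101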

 

2. Cluster configuration

Master node configuration: add the following to tachyon/conf/tachyon-env.sh on the bigdata001 node (the ZooKeeper-related options at the end are the key additions):

export TACHYON_MASTER_ADDRESS=192.168.1.101
export TACHYON_UNDERFS_ADDRESS=hdfs://192.168.1.101:8020
export TACHYON_JAVA_OPTS+="
  -Dlog4j.configuration=file:$CONF_DIR/log4j.properties
  -Dtachyon.debug=false
  -Dtachyon.underfs.address=$TACHYON_UNDERFS_ADDRESS
  -Dtachyon.underfs.hdfs.impl=$TACHYON_UNDERFS_HDFS_IMPL
  -Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data
  -Dtachyon.workers.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/workers
  -Dtachyon.worker.memory.size=$TACHYON_WORKER_MEMORY_SIZE
  -Dtachyon.worker.data.folder=$TACHYON_RAM_FOLDER/tachyonworker/
  -Dtachyon.master.worker.timeout.ms=60000
  -Dtachyon.master.hostname=$TACHYON_MASTER_ADDRESS
  -Dtachyon.master.journal.folder=$TACHYON_UNDERFS_ADDRESS/tachyon/journal/
  -Dtachyon.master.pinlist=/pinfiles;/pindata
  -Dorg.apache.jasper.compiler.disablejsr199=true
  -Dtachyon.user.default.block.size.byte=67108864
  -Dtachyon.user.file.buffer.bytes=8388608
  -Dtachyon.usezookeeper=true
  -Dtachyon.zookeeper.address=bigdata001:2181,bigdata002:2181,bigdata003:2181
"

Synchronize this configuration to all slave nodes (bigdata002, bigdata003, bigdata004, bigdata005, bigdata006, bigdata007, bigdata008), for example with the sketch below.
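A minimal sketch for pushing the file out, assuming passwordless SSH and the same Tachyon install path on every node (the /opt/tachyon path is an illustrative assumption):

# Hypothetical helper: copy the master's tachyon-env.sh to every slave
for host in bigdata002 bigdata003 bigdata004 bigdata005 bigdata006 bigdata007 bigdata008; do
  scp /opt/tachyon/conf/tachyon-env.sh $host:/opt/tachyon/conf/
done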

Because we want bigdata002 to serve as the second master, we must change the value of TACHYON_MASTER_ADDRESS in that node's configuration, as shown below:

export TACHYON_MASTER_ADDRESS=192.168.1.102
export TACHYON_UNDERFS_ADDRESS=hdfs://192.168.1.101:8020
export TACHYON_JAVA_OPTS+="
  -Dlog4j.configuration=file:$CONF_DIR/log4j.properties
  -Dtachyon.debug=false
  -Dtachyon.underfs.address=$TACHYON_UNDERFS_ADDRESS
  -Dtachyon.underfs.hdfs.impl=$TACHYON_UNDERFS_HDFS_IMPL
  -Dtachyon.data.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/data
  -Dtachyon.workers.folder=$TACHYON_UNDERFS_ADDRESS/tmp/tachyon/workers
  -Dtachyon.worker.memory.size=$TACHYON_WORKER_MEMORY_SIZE
  -Dtachyon.worker.data.folder=$TACHYON_RAM_FOLDER/tachyonworker/
  -Dtachyon.master.worker.timeout.ms=60000
  -Dtachyon.master.hostname=$TACHYON_MASTER_ADDRESS
  -Dtachyon.master.journal.folder=$TACHYON_UNDERFS_ADDRESS/tachyon/journal/
  -Dtachyon.master.pinlist=/pinfiles;/pindata
  -Dorg.apache.jasper.compiler.disablejsr199=true
  -Dtachyon.user.default.block.size.byte=67108864
  -Dtachyon.user.file.buffer.bytes=8388608
  -Dtachyon.usezookeeper=true
  -Dtachyon.zookeeper.address=bigdata001:2181,bigdata002:2181,bigdata003:2181
"
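Only the TACHYON_MASTER_ADDRESS line differs from the bigdata001 copy, so a one-line edit on bigdata002 after the sync is enough (same hypothetical install path as above):

# Hypothetical: run on bigdata002; only the master address changes
sed -i 's/^export TACHYON_MASTER_ADDRESS=.*/export TACHYON_MASTER_ADDRESS=192.168.1.102/' /opt/tachyon/conf/tachyon-env.sh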

3. Start the Cluster

[root@bigdata001 tachyon]# ./bin/tachyon-stop.sh

Killed processes
Killed processes
192.168.1.103: Killed processes
192.168.1.101: Killed 0 processes
192.168.1.102: Killed processes
192.168.1.104: Killed processes
192.168.1.106: Killed processes
192.168.1.105: Killed processes
192.168.1.107: Killed processes
192.168.1.108: Killed processes

[root@bigdata001 tachyon]# ./bin/tachyon format

192.168.1.101: Formatting Tachyon Worker @ bigdata001
192.168.1.102: Formatting Tachyon Worker @ bigdata002
192.168.1.103: Formatting Tachyon Worker @ bigdata003
192.168.1.104: Formatting Tachyon Worker @ bigdata004
192.168.1.105: Formatting Tachyon Worker @ bigdata005
192.168.1.106: Formatting Tachyon Worker @ bigdata006
192.168.1.107: Formatting Tachyon Worker @ bigdata007
192.168.1.102: Removing local data under folder: /mnt/ramdisk/tachyonworker/
192.168.1.101: Removing local data under folder: /mnt/ramdisk/tachyonworker/
192.168.1.103: Removing local data under folder: /mnt/ramdisk/tachyonworker/
192.168.1.104: Removing local data under folder: /mnt/ramdisk/tachyonworker/
192.168.1.108: Formatting Tachyon Worker @ bigdata008
192.168.1.105: Removing local data under folder: /mnt/ramdisk/tachyonworker/
192.168.1.106: Removing local data under folder: /mnt/ramdisk/tachyonworker/
192.168.1.107: Removing local data under folder: /mnt/ramdisk/tachyonworker/
192.168.1.108: Removing local data under folder: /mnt/ramdisk/tachyonworker/
Formatting Tachyon Master @ 192.168.1.101
Formatting JOURNAL_FOLDER: hdfs://192.168.1.101:8020/tachyon/journal/
Formatting UNDERFS_DATA_FOLDER: hdfs://192.168.1.101:8020/tmp/tachyon/data
Formatting UNDERFS_WORKERS_FOLDER: hdfs://192.168.1.101:8020/tmp/tachyon/workers

[root@bigdata001 tachyon]# ./bin/tachyon-start.sh all Mount

Killed 0 processes
Killed 0 processes
192.168.1.103: Killed 0 processes
192.168.1.101: Killed 0 processes
192.168.1.105: Killed 0 processes
192.168.1.102: Killed 0 processes
192.168.1.107: Killed 0 processes
192.168.1.106: Killed 0 processes
192.168.1.104: Killed 0 processes
192.168.1.108: Killed 0 processes
Starting master @ 192.168.1.101
192.168.1.101: Formatting RamFS: /mnt/ramdisk (2gb)
192.168.1.102: Formatting RamFS: /mnt/ramdisk (2gb)
192.168.1.101: Starting worker @ bigdata001
192.168.1.103: Formatting RamFS: /mnt/ramdisk (2gb)
192.168.1.102: Starting worker @ bigdata002
192.168.1.103: Starting worker @ bigdata003
192.168.1.104: Formatting RamFS: /mnt/ramdisk (2gb)
192.168.1.105: Formatting RamFS: /mnt/ramdisk (2gb)
192.168.1.104: Starting worker @ bigdata004
192.168.1.105: Starting worker @ bigdata005
192.168.1.106: Formatting RamFS: /mnt/ramdisk (2gb)
192.168.1.106: Starting worker @ bigdata006
192.168.1.107: Formatting RamFS: /mnt/ramdisk (2gb)
192.168.1.107: Starting worker @ bigdata007
192.168.1.108: Formatting RamFS: /mnt/ramdisk (2gb)
192.168.1.108: Starting worker @ bigdata008

[root@bigdata001 tachyon]# jps

The jps output shows the master and worker process IDs:

8315 Master
8458 Worker

Start the second master on the other master node, bigdata002:

[root@bigdata002 tachyon]# ./bin/tachyon-start.sh master

Starting master @ 192.168.1.102
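To confirm the standby came up, a quick check on bigdata002 (the Worker there was already started by the cluster-wide start above):

# On bigdata002: both a Master and a Worker process should be listed
jps | grep -E 'Master|Worker'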

4. Test HA

View the web UI at http://bigdata001:19999

Kill the master process on bigdata001; failover takes about 20 seconds. Then check the new web UI at http://bigdata002:19999/home (see the sketch below).
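A minimal sketch of the failover test, assuming jps labels the process "Master" as in the output above:

# On bigdata001: forcibly kill the leading master to trigger re-election via ZooKeeper
kill -9 $(jps | awk '$2 == "Master" {print $1}')
# After ~20 s, the standby should answer as the new leader (expect HTTP 200)
curl -s -o /dev/null -w "%{http_code}\n" http://bigdata002:19999/home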

5. View the election state in ZooKeeper

[root@bigdata001 conf]# zkCli.sh

[zk: localhost:2181(CONNECTED) 61] ls /election
[_c_ae6213f4-a2e3-46f9-8fc0-5c5c64d7e773-lock-0000000027, _c_12297d87-56fc-4cd9-8f8d-7312a6af4cc2-lock-0000000026]

[zk: localhost:2181(CONNECTED) 63] ls /leader
[bigdata001:19998, bigdata002:19998]
