Shifei: Hello, my name is Shifei, and I am from Intel. Next I will introduce Tachyon. Before I start, I'd like to know whether you have heard of Tachyon or have some understanding of it. What about Spark? First of all, I'm from Intel's Big Data team, and our team is focused on software development for big data and the promotion and application of these softw
Tags: hadoop spark tachyon
1. Configure the system environment
1. Clear default firewall rules
# service iptables save
iptables: Saving firewall rules to /etc/sysconfig/iptables: [ OK ]
2. Disable SELinux
# cat /etc/sysconfig/selinux | grep SELINUX | grep -v ^#
SELINUX=disabled
SELINUXTYPE=targeted
#
3. Configure the IP address
# cat /etc/sysconfig/network-scripts/ifcfg-eth0 | grep IPADDR
IPADDR=192.168.1.1
#
4. Configure the H
Tachyon is a killer technology of the big data era and one that anyone working in it must master. With Tachyon, distributed machines can share data through the distributed in-memory file storage system that Tachyon provides. This matters greatly for machine collaboration, data sharing, and speeding up distributed systems. In this
/dependency, then reads the incremental operation records from the subsequent editlog files (possibly several). The editlog content basically corresponds to operations issued by the Tachyon file system client, including adding, deleting, and renaming files, and adding data blocks.
Note that the log records here do not include the actual file content, only metadata. Therefore, if the file content in the cache is lost and has not been persisted, it
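The recovery idea described above (start from an image, then replay editlog records that carry metadata only) can be illustrated with a minimal sketch. This is not Tachyon's journal format or API: the function name replay, the record layout, and the operation names are all hypothetical, chosen only to mirror the operations listed in the text (add, delete, rename, add data block).

```python
import copy

# Hypothetical sketch: rebuild file metadata from an image plus an edit log.
# The record layout and operation names are assumptions, not Tachyon's format.
def replay(image, edit_log):
    """Return the metadata state after applying edit_log on top of image."""
    # path -> list of block ids; metadata only, never file contents
    files = copy.deepcopy(image)
    for record in edit_log:
        kind = record["op"]
        if kind == "create":
            files[record["path"]] = []
        elif kind == "add_block":
            files[record["path"]].append(record["block_id"])
        elif kind == "rename":
            files[record["dst"]] = files.pop(record["src"])
        elif kind == "delete":
            files.pop(record["path"], None)
        else:
            raise ValueError(f"unknown operation: {kind}")
    return files

image = {"/a": [1, 2]}
edit_log = [
    {"op": "create", "path": "/b"},
    {"op": "add_block", "path": "/b", "block_id": 3},
    {"op": "rename", "src": "/a", "dst": "/c"},
    {"op": "delete", "path": "/b"},
]
state = replay(image, edit_log)
print(state)  # {'/c': [1, 2]}
```

Because the replayed state contains only paths and block ids, it shows why replay alone cannot bring back block contents that existed only in memory.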
1. Modify the Hadoop configuration file
1. Modify the core-site.xml file
Add the following properties so that MapReduce jobs can use the Tachyon file system as input and output.
2. Configure hadoop-env.sh
Add environment variables for the tachyon client jar package path at the beginning of the hadoop-env.sh file.
export HADOOP_CLASSPATH=/usr/local/tachyon/client
because too many Java objects are stored in memory. If the executor caches too many objects in the JVM, for example 80 GB or more, full GC may run several times, and you will wonder why the program is not making progress. Check the GC log and you will find that it was in GC the whole time.
3. Implement a fault-tolerant Tachyon cluster based on ZooKeeper
3.0 Prerequisites
Hadoop version: 2.2.0.2.0.6.0-101
Zookeeper
Property | Default | Description
tachyon.underfs.glusterfs.volumes | null | GlusterFS volume names, such as tachyon_vol
tachyon.underfs.glusterfs.mapred.system.dir | glusterfs:///mapred/system | The MapReduce data subdirectory in GlusterFS; specify as needed
3. Master configuration
Property | Default | Description
tachyon.master.journal.folder | $tachyon.home + "/journal/" | Storage directory for the master journal log
tachyon.ma
no persistence and no associated lineage information, then the specific contents of the corresponding file will be lost. After the file system metadata has been restored, and before it officially starts serving, the Tachyon master writes the current metadata to a new snapshot image. With ZooKeeper enabled, the standby master periodically merges the editlog and creates an image for the standby,
MapReduce framework's data. In this case, the data exchange generally has to go through disk, which is usually inefficient. When the Tachyon layer is introduced, the data exchange actually happens in memory.
Problem 2: The execution engine and the storage engine are in the same process
This is the problem mentioned earlier, of letting Spark manage memory on its own. By default, Spark's task execution and the data itself are withi
through heartbeat. The slave node receives the commands sent by the master to complete an interaction.
The master is the core. If it fails, the impact on the entire system is fatal. A single point of failure (SPOF) is a problem that every distributed framework must consider. ZooKeeper, which implements a Paxos-like consensus protocol (Zab), is a powerful tool for solving the consistency problem. HDFS, Storm, and HBase all rely on ZooKeeper for high availability (HA) of metadata, and Tachyon is no exception.
1 Worker heartbeat
1.1 Overall pr
Tachyon is a fault-tolerant distributed file system that enables reliable file sharing at memory speed across cluster frameworks such as Spark and MapReduce. Tachyon achieves high performance by leveraging lineage information and using memory aggressively. It caches working set files in memory and allows different jobs/queries and frameworks
What is Tachyon?
Tachyon is a high-performance, fault-tolerant, memory-based, open-source distributed storage system with Java-like file APIs, a pluggable underlying file system, and compatibility with Hadoop MapReduce and Apache Spark. Tachyon provides cross-cluster file sharing at memory speed for cluster frameworks such as Spark, MapReduce, a
Author: Liu Xuhui (Raymond). Please credit the source when reprinting.
Email:colorant at 163.com
blog:http://blog.csdn.net/colorant/
Tachyon is a memory-based distributed file system developed by Li Haoyuan of AMPLab. It started out as a component of AMPLab's BDAS.
Overall design ideas
Tachyon's design goal is to provide a memory-based distributed file-sharing framework, which needs fault-
Environment: Hadoop 2.2.0, JDK 1.7, Tachyon 0.5.0, no ZooKeeper
Both Tachyon and Hadoop run in pseudo-distributed mode
1. Modify the core-site.xml file
2. Configure hadoop-env.sh
Add an environment variable for the Tachyon client jar package path in the hadoop-env.sh file
export HADOOP_CLASSPATH=/home/hadoop/tachyon-0.5.0-bin/client/target/
where COMMAND is one of:
format [-s]: formats Tachyon (with -s, only if the underfs does not exist)
bootstrap-conf: generates a configuration file if one does not exist
tfs: command-line client
loadufs: loads the existing underlying file system into Tachyon
runTest
3. Tachyon command-line operations
TachyonWorker is responsible for interacting with TachyonMaster, and the two exchange messages directly through Apache Thrift. TachyonWorker has three main parts: the DataServer, the heartbeat, and the Thrift server. The DataServer handles in-memory data and serves users' reads. Currently Thrift supports reading data remotely, while data can only be written locally. Local reads are optimized: locally read data will not go to the s