A brief introduction to Tachyon, a high-performance, highly fault-tolerant, memory-based, open-source distributed storage system

Last Update: 2015-04-03 Source: Internet

What is Tachyon?

Tachyon is a high-performance, fault-tolerant, memory-based, open-source distributed storage system with a Java-like file API, a pluggable underlying file system, and compatibility with Hadoop MapReduce and Apache Spark. Tachyon provides reliable file sharing at memory speed across cluster frameworks such as Spark and MapReduce. Because it makes full use of memory and of the lineage information of file objects, Tachyon is very fast; its developers claim throughput up to 300 times higher than HDFS. Many companies (such as Pivotal, EMC, and Red Hat) already use Tachyon, and more than 60 contributors from 20 organizations or companies (such as Yahoo and Red Hat) are contributing code. Tachyon is the storage layer of UC Berkeley's Berkeley Data Analytics Stack (BDAS), and it also ships as an application with the Fedora operating system.

The important characteristics of Tachyon are as follows:

Java-like file API: Tachyon's native API is very similar to Java's File class, providing InputStream and OutputStream interfaces and supporting memory-mapped I/O;
Compatibility with MapReduce and Spark: Tachyon implements the Hadoop FileSystem interface, so MapReduce and Spark can use Tachyon without any modification;
Pluggable underlying file system: to provide fault tolerance, Tachyon checkpoints in-memory data to an underlying file system. Tachyon has a generic interface that makes it easy to plug in different underlying file systems. Currently supported file systems include HDFS, S3, GlusterFS, and the single-node local file system, and support for other file systems is planned;
Native support for raw tables: Tachyon provides native support for multi-column data, with the option of keeping only hot columns in memory to save space;
Web interface for browsing the file system: users can browse the file system through a web interface, and in debug mode administrators can also view details of each file, such as its location and checkpoint path;
Command-line interaction: users can interact with Tachyon through the command "./bin/tachyon tfs", for example to copy data into and out of the file system;
High fault tolerance: Tachyon has a sound fault-tolerance mechanism, and both the master and the workers have their own fault-tolerance methods. The master uses ZooKeeper for fault tolerance, the metadata stored on the master is protected by a journal, and the master monitors the status of each worker so that it can automatically restart a worker that fails. For the file data itself, Tachyon relies on lineage for fault tolerance.
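The lineage mechanism mentioned in the last item can be illustrated with a small sketch. The Python below is a hypothetical model, not Tachyon's actual API or implementation: the names `LineageRecord`, `LineageStore`, and `Job` are invented for illustration, and plain dictionaries stand in for worker memory and the underlying file system. Each job records which files it read and wrote; when an in-memory output is lost and has no checkpoint, the store re-runs the recorded job, recovering that job's own inputs first.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical job signature: takes the contents of the input files and
# returns the contents of the output files (same order as `outputs`).
Job = Callable[[List[bytes]], List[bytes]]


@dataclass
class LineageRecord:
    inputs: List[str]   # files the job read
    outputs: List[str]  # files the job wrote
    job: Job            # how to re-run it


class LineageStore:
    """Toy store that recovers lost in-memory files by replaying lineage."""

    def __init__(self) -> None:
        self.memory: Dict[str, bytes] = {}      # stands in for worker memory
        self.checkpoint: Dict[str, bytes] = {}  # stands in for the underlying file system
        self.producer: Dict[str, LineageRecord] = {}
        self.recomputations = 0

    def write_source(self, path: str, blob: bytes) -> None:
        # Source data has no lineage, so in this sketch it is also checkpointed.
        self.memory[path] = blob
        self.checkpoint[path] = blob

    def run(self, inputs: List[str], outputs: List[str], job: Job) -> None:
        self._execute(LineageRecord(list(inputs), list(outputs), job))

    def _execute(self, record: LineageRecord) -> None:
        results = record.job([self.read(p) for p in record.inputs])
        for path, blob in zip(record.outputs, results):
            self.memory[path] = blob
            self.producer[path] = record

    def lose(self, path: str) -> None:
        # Simulate a worker failure that drops an in-memory file.
        self.memory.pop(path, None)

    def read(self, path: str) -> bytes:
        if path in self.memory:
            return self.memory[path]
        if path in self.checkpoint:
            self.memory[path] = self.checkpoint[path]
            return self.memory[path]
        record = self.producer.get(path)
        if record is None:
            raise FileNotFoundError(path)
        # Lost and not checkpointed: re-run the job that produced it.
        # Its inputs go through read(), so they are recovered recursively.
        self.recomputations += 1
        self._execute(record)
        return self.memory[path]


if __name__ == "__main__":
    store = LineageStore()
    store.write_source("/logs", b"a b a c")
    store.run(["/logs"], ["/words"], lambda ins: [b"\n".join(sorted(ins[0].split()))])
    store.run(["/words"], ["/count"], lambda ins: [str(len(ins[0].split())).encode()])
    for path in ("/logs", "/words", "/count"):
        store.lose(path)
    print(store.read("/count"), store.recomputations)  # b'4' 2
```

In this example, losing all three files costs two recomputations: `/count` is rebuilt from `/words`, which is in turn rebuilt from the checkpointed `/logs`. The trade-off the sketch makes visible is that recording how data was produced is cheap at write time, while recovery pays for it by re-running jobs.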
Tachyon uses a master-worker architecture: a running Tachyon system consists of one master and multiple workers. The Tachyon master manages the metadata of all files and also monitors the status of each Tachyon worker. To manage files efficiently, Tachyon stores file data in memory as blocks. File and block metadata is kept on the master, while each worker stores and manages the blocks themselves.
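The division of labor described above, with metadata on the master and block data on the workers, can be sketched as follows. This is a hypothetical Python model, not Tachyon's code: `Master`, `Worker`, `FileInfo`, the 4-byte `BLOCK_SIZE`, and round-robin block placement are all assumptions chosen to keep the example small, and worker status monitoring is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical, deliberately tiny block size in bytes


@dataclass
class FileInfo:
    """Per-file metadata kept only on the master."""
    path: str
    length: int
    block_ids: List[int] = field(default_factory=list)


class Worker:
    """Holds block contents; knows nothing about files."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, bytes] = {}  # block id -> block bytes


class Master:
    """Tracks files, their blocks, and which worker holds each block."""

    def __init__(self, workers: List[Worker]) -> None:
        self.workers = {w.name: w for w in workers}
        self.files: Dict[str, FileInfo] = {}
        self.block_locations: Dict[int, str] = {}  # block id -> worker name
        self._next_block_id = 0

    def write(self, path: str, data: bytes) -> FileInfo:
        info = FileInfo(path, len(data))
        names = list(self.workers)
        for offset in range(0, len(data), BLOCK_SIZE):
            block_id = self._next_block_id
            self._next_block_id += 1
            # Round-robin placement, an assumption made only for this sketch.
            worker = self.workers[names[block_id % len(names)]]
            worker.blocks[block_id] = data[offset:offset + BLOCK_SIZE]
            self.block_locations[block_id] = worker.name
            info.block_ids.append(block_id)
        self.files[path] = info
        return info

    def read(self, path: str) -> bytes:
        # Look up the block list on the master, then fetch each block from its worker.
        info = self.files[path]
        return b"".join(
            self.workers[self.block_locations[b]].blocks[b] for b in info.block_ids
        )


if __name__ == "__main__":
    master = Master([Worker("w1"), Worker("w2")])
    info = master.write("/a", b"hello tachyon")
    print(info.block_ids, master.block_locations)
    print(master.read("/a"))  # b'hello tachyon'
```

A worker in this model only ever sees block IDs and bytes; reassembling a file requires the master's block list, which is why the master's metadata needs a fault-tolerance mechanism of its own.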

Tachyon originated in UC Berkeley's AMPLab, where it was started by Li Haoyuan, a computer science PhD student in the lab. It is released under the Apache License 2.0 and hosted on GitHub, and the latest version is currently 0.6.1. In October of last year, Li Haoyuan said in an interview with InfoQ:

In the long run, they will treat Tachyon the same way as Apache Mesos and Apache Spark: Tachyon will also enter the Apache Software Foundation, and more developers are welcome to join.

According to a recent Wall Street Journal report, Tachyon has received a $7.5 million Series A investment from the Silicon Valley venture capital firm a16z. AMPLab's other projects include Spark, an open-source cluster computing environment similar to Hadoop that supports in-memory distributed datasets; PIQL, a SQL-like query language for key-value storage; MLbase, a machine learning system built on distributed systems; Akaros, an operating system for many-core and large-scale SMP systems; and Sparrow, a low-latency cluster scheduling system. In addition, the Tachyon website provides related documentation, such as user and developer documentation. Readers who want to learn more about Tachyon can visit its website or the wiki on GitHub.

Source:http://www.infoq.com/cn/news/2015/03/tachyon-distributed--system

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
