I. Introduction
Tachyon is a highly fault-tolerant distributed file system that allows files to be shared reliably, at memory speed, across cluster frameworks such as Spark and MapReduce. Tachyon achieves high performance by leveraging lineage information and using memory aggressively. It caches working-set files in memory and lets different jobs, queries, and frameworks access the cached files at memory speed. As a result, Tachyon reduces how often frequently used datasets have to be loaded from disk.
II. System Structure
Tachyon organizes a cluster in the usual master/worker pattern: the master node manages and maintains the file system metadata, and the file data is kept in the memory of the worker nodes.
In terms of fault tolerance, the main technical points include:
A pluggable underlying file system, such as HDFS, is used to persist user-specified files
File system metadata is persisted using a journal mechanism
ZooKeeper is used to provide high availability (HA) for the master
Instead of replicating in-memory data, Tachyon uses lineage, similar to Spark RDDs, for recovery
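The lineage idea in the last point can be illustrated with a small Python sketch. This is not Tachyon's implementation: the `LineageStore` class and all of its methods are hypothetical names chosen for illustration. Each derived dataset remembers the function and inputs that produced it, so lost in-memory data is recomputed instead of being restored from a replica.

```python
from typing import Callable, Dict, List, Tuple


class LineageStore:
    """Toy in-memory store that recovers lost data by recomputation (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, list] = {}
        # dataset name -> (function that builds it, names of its input datasets)
        self._lineage: Dict[str, Tuple[Callable[..., list], List[str]]] = {}

    def put_source(self, name: str, values: list) -> None:
        # Source data has no lineage; it is assumed to be persisted elsewhere.
        self._data[name] = list(values)

    def derive(self, name: str, func: Callable[..., list], inputs: List[str]) -> None:
        # Record the lineage first, then materialize the result in memory.
        self._lineage[name] = (func, list(inputs))
        self._data[name] = func(*(self.get(i) for i in inputs))

    def evict(self, name: str) -> None:
        # Simulate losing the in-memory copy, e.g. after a worker failure.
        self._data.pop(name, None)

    def get(self, name: str) -> list:
        if name not in self._data:
            if name not in self._lineage:
                raise KeyError(f"{name} is lost and has no lineage")
            func, inputs = self._lineage[name]
            # Recompute from the inputs, recursively recovering them if needed.
            self._data[name] = func(*(self.get(i) for i in inputs))
        return self._data[name]


store = LineageStore()
store.put_source("raw", [1, 2, 3])
store.derive("doubled", lambda xs: [x * 2 for x in xs], ["raw"])
store.evict("doubled")          # the cached copy is lost
print(store.get("doubled"))     # recomputed from "raw" via lineage
```

The trade-off in this sketch is the same one the text points at: no memory is spent on replicas, at the cost of recomputation time when data is lost.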
III. Work Process
1. Initializing the file system
Create and empty the working directories required by the master and the workers. For the master node, these are the data, worker, and journal directories on the underlying persistent file system. The worker directory here is actually used by the worker nodes (to hold scattered persisted files, blocks whose meta information is missing, and so on), but it is created on the master side, essentially to simplify the creation logic (on HDFS it only needs to be created once). The directory required on each worker node is the local RAMDisk directory. Finally, an empty file with a specific prefix is created in the master's journal folder to mark the file system as formatted.
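The formatting step described above can be sketched roughly as follows. This is a simplified local-disk illustration, not Tachyon's code: the function names and the `_format_` marker prefix are assumptions, and a real deployment would create these directories on the underlying file system (for example HDFS) rather than in a temporary local directory.

```python
import os
import shutil
import tempfile
import time

FORMAT_PREFIX = "_format_"  # assumed marker prefix, for illustration only


def format_fs(root: str) -> str:
    """Recreate empty data/worker/journal directories and drop a marker file."""
    for name in ("data", "worker", "journal"):
        path = os.path.join(root, name)
        shutil.rmtree(path, ignore_errors=True)  # empty the directory if it exists
        os.makedirs(path)
    marker = os.path.join(root, "journal", f"{FORMAT_PREFIX}{int(time.time() * 1000)}")
    open(marker, "w").close()  # an empty file whose name carries the prefix
    return marker


def is_formatted(root: str) -> bool:
    """A file system counts as formatted if the journal holds a marker file."""
    journal = os.path.join(root, "journal")
    if not os.path.isdir(journal):
        return False
    return any(name.startswith(FORMAT_PREFIX) for name in os.listdir(journal))


with tempfile.TemporaryDirectory() as root:
    print(is_formatted(root))  # nothing has been created yet
    format_fs(root)
    print(is_formatted(root))  # the marker file now exists
```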
2. Tachyon Master Start-up
First, of course, the master reads its configuration parameters. At present these are all passed to Java through -D options; ideally they would come from a configuration file. Some of the parameters are set in the env file and then passed through -D options, some are hard-coded directly in the -D options, and some are not configured by the startup script at all, in which case the MasterConf code uses default values. The master then determines whether the file system has been formatted by looking for the file with the specific format prefix.
Next, the master rebuilds the file system information in memory. This information depends on the saved journal, which has two parts: a snapshot image of the meta information at some point in time, and an incremental log (the EditLog). Tachyon Master first reads the file system meta information from the snapshot image file, including information about the various data nodes (files, directories, raw tables, checkpoints, dependencies, and so on), and then reads the incremental operation records from the subsequent EditLog files (there may be several). The content of the EditLog basically corresponds to the operations of the Tachyon file system client, including adding, deleting, and renaming files, adding data blocks, and so on.
Note that the log records contain only meta information, not the actual file content. If the content of a cached file is lost and the file has neither been persisted nor bound to lineage information, the content of that file is gone. After the file system information has been restored, and before it officially starts serving, Tachyon Master writes the current meta data to a new snapshot image. With ZooKeeper enabled, the standby master periodically merges the EditLog into a standby image; if there is no standby master, the merge into a new image happens only during startup. Multiple masters operate on the image and EditLog concurrently here without any lock or mutex mechanism, and it is unclear whether this can lead to race conditions, stale data, or data loss.
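The recovery of meta information from a snapshot image plus edit logs can be sketched as follows. The operation names and the tuple layout are hypothetical and far simpler than a real journal format; the sketch only shows the replay order (image first, then each edit log in sequence) and the fact that only metadata, not file content, is involved.

```python
import copy
from typing import Dict, List, Tuple


def replay_journal(image: Dict[str, List[int]],
                   edit_logs: List[List[Tuple]]) -> Dict[str, List[int]]:
    """Rebuild metadata (path -> list of block IDs) from an image and edit logs."""
    meta = copy.deepcopy(image)  # start from the snapshot, leaving it untouched
    for log in edit_logs:        # edit logs are applied in order
        for op in log:
            kind = op[0]
            if kind == "create":
                meta[op[1]] = []
            elif kind == "delete":
                meta.pop(op[1], None)
            elif kind == "rename":
                meta[op[2]] = meta.pop(op[1])
            elif kind == "add_block":
                meta[op[1]].append(op[2])
            else:
                raise ValueError(f"unknown journal operation: {kind}")
    return meta


image = {"/a": [1]}
logs = [
    [("create", "/b"), ("add_block", "/b", 2)],
    [("rename", "/a", "/c"), ("delete", "/b")],
]
recovered = replay_journal(image, logs)
print(recovered)
```

Writing `recovered` back out as a new image, as the master does before serving, would let the next startup skip the logs that have already been applied.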
3. File storage
Files that Tachyon stores on the RAMDisk are divided into blocks (1 GB by default). The master assigns a block ID to each block, and the worker stores the data for the block directly on the RAMDisk, using the block ID as the actual file name.
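As a rough illustration of this layout, the sketch below splits a file length into fixed-size blocks and derives the on-RAMDisk path from a block ID. The helper names and the example directory are hypothetical, and how the master actually generates block IDs is not covered here.

```python
import os
from typing import List, Tuple

BLOCK_SIZE = 1 << 30  # 1 GiB, matching the default mentioned in the text


def block_index(offset: int, block_size: int = BLOCK_SIZE) -> int:
    """Index of the block that contains the given file offset."""
    return offset // block_size


def split_into_blocks(file_length: int,
                      block_size: int = BLOCK_SIZE) -> List[Tuple[int, int, int]]:
    """Return (block index, start offset, length) for each block of a file."""
    blocks = []
    start = 0
    while start < file_length:
        length = min(block_size, file_length - start)  # last block may be short
        blocks.append((start // block_size, start, length))
        start += length
    return blocks


def block_path(ramdisk_dir: str, block_id: int) -> str:
    """The block ID itself is used as the file name on the RAMDisk."""
    return os.path.join(ramdisk_dir, str(block_id))


print(split_into_blocks(25, block_size=10))
print(block_path("/mnt/ramdisk", 42))  # example directory, not a real default
```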
4. Data Read and Write
For file reads and writes, Tachyon uses the Java NIO API wherever possible to map files directly into memory and operate on them as data streams. This avoids using large amounts of memory on the Java heap, which reduces GC overhead and improves response time.
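The memory-mapping approach can be illustrated with Python's standard `mmap` module, used here as an analogue of mapping a file through Java NIO. It is only a sketch of the technique: the function name is hypothetical, and the temporary file stands in for a block file on the RAMDisk.

```python
import mmap
import os
import tempfile


def read_range_mmap(path: str, offset: int, length: int) -> bytes:
    """Read a byte range by mapping the file instead of loading all of it."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mapping an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]  # only this slice is copied


# A temporary file stands in for a block file stored on the RAMDisk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello tachyon")
    block_file = tmp.name

print(read_range_mmap(block_file, 6, 7))
os.remove(block_file)
```

Because the mapped pages are managed by the operating system, the process copies only the requested slice into its own memory, which mirrors the motivation given above for keeping bulk data off the managed heap.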
During reads and writes, every operation that involves meta information is performed by calling the server API that Tachyon Master exposes via Thrift. File reads support both local and remote modes, and the difference is transparent to the user from the perspective of the client API.
A file read basically works as follows. The client obtains the ID of the block that corresponds to the file offset, then asks the local worker for the file name that corresponds to that block ID. If the file exists, the client code notifies the worker to lock the block, then maps the file itself through RandomAccessFile and reads it directly, instead of reading the actual data through the worker as a proxy.
If there is no local worker, or the file does not exist on the local worker, the client code asks the master API for the worker that holds the block and then reads the block content through the DataServer interface that the worker exposes. Inside the DataServer the process is the same: lock the block, map the file, read it, and return the data to the client. In addition, when reading data through the TachyonFile API, if the FileStream interface is used and the remote worker does not have the block, RemoteBlockInStream also attempts to read the data from the underlying persistent file system (if one exists), while the ReadByteBuffer interface has no such fallback.
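The lookup order described above (local worker, then the remote worker reported by the master, then optionally the underlying file system) can be sketched as follows. Workers and the underlying file system are modelled as plain dictionaries, and every name in the sketch is hypothetical; locking, Thrift calls, and the DataServer protocol are left out.

```python
from typing import Dict, Optional, Tuple

Blocks = Dict[int, bytes]  # block ID -> block content


def read_block(block_id: int,
               local_worker: Optional[Blocks],
               block_locations: Dict[int, str],
               remote_workers: Dict[str, Blocks],
               under_fs: Optional[Blocks] = None,
               stream_mode: bool = True) -> Tuple[str, bytes]:
    """Return (source, data) for a block, trying the cheapest source first."""
    # 1. Local worker: the client reads the block file directly.
    if local_worker is not None and block_id in local_worker:
        return "local", local_worker[block_id]
    # 2. Ask the master which worker holds the block, then read it remotely.
    worker_name = block_locations.get(block_id)
    if worker_name is not None:
        blocks = remote_workers.get(worker_name, {})
        if block_id in blocks:
            return "remote", blocks[block_id]
    # 3. Only the stream-style read falls back to the underlying file system.
    if stream_mode and under_fs is not None and block_id in under_fs:
        return "under_fs", under_fs[block_id]
    raise FileNotFoundError(f"block {block_id} is not available")


local = {1: b"one"}
locations = {2: "worker-b", 3: "worker-b"}
remotes = {"worker-b": {2: b"two"}}   # block 3 is no longer cached on worker-b
ufs = {3: b"three"}

for block in (1, 2, 3):
    print(read_block(block, local, locations, remotes, ufs))
```

Passing `stream_mode=False` models the byte-buffer style read, which raises an error for block 3 instead of falling back to the underlying file system.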
Tachyon currently supports only local writes. Write operations are divided into cache (written to the Tachyon in-memory file system) and through (written to the underlying persistent file system). The concrete write type is a legal combination of these, such as cache only, cache + through, and so on. There is also an async mode, which writes to the underlying persistent file system asynchronously, presumably to optimize the case where data must be persisted but the write is also latency-sensitive.
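One way to picture these combinations is as a set of flags, as in the sketch below. The `WriteType` enum, its member names, and the helper functions are illustrative assumptions rather than Tachyon's actual API, and the asynchronous path is reduced to a queue that is flushed later.

```python
from enum import Flag, auto
from typing import List


class WriteType(Flag):
    CACHE = auto()    # write to the in-memory file system
    THROUGH = auto()  # write synchronously to the underlying file system
    ASYNC = auto()    # queue the data and persist it later


def write(data: bytes, write_type: WriteType,
          memory: List[bytes], under_fs: List[bytes], pending: List[bytes]) -> None:
    if WriteType.CACHE in write_type:
        memory.append(data)
    if WriteType.THROUGH in write_type:
        under_fs.append(data)
    if WriteType.ASYNC in write_type:
        pending.append(data)  # the caller returns before the data is persisted


def flush_async(pending: List[bytes], under_fs: List[bytes]) -> None:
    """Persist queued writes; a real system would do this in the background."""
    while pending:
        under_fs.append(pending.pop(0))


memory, under_fs, pending = [], [], []
write(b"x", WriteType.CACHE | WriteType.THROUGH, memory, under_fs, pending)
write(b"y", WriteType.CACHE | WriteType.ASYNC, memory, under_fs, pending)
flush_async(pending, under_fs)
print(memory, under_fs, pending)
```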
IV. Why Tachyon
Many frameworks now rely on memory to meet the demand for speed, but persistence is still necessary, so data safety and disk I/O become the bottlenecks. Tachyon is meant to solve the problems this causes.
Problem 1
Different jobs have to go through disk reads and writes to share data, which is often too slow
Each job copies the blocks it needs into memory and has to perform the corresponding garbage collection
Solution to the two problems above:
When multiple jobs need the same data, it only has to be copied into Tachyon once, and the jobs do not need to garbage-collect it
Problem 2
If an executor fails, the executor's cache is lost
Solution:
If an executor crashes, the cached data is retained in Tachyon
Problem 3
Sharing data between different frameworks also requires reading and writing disk
Solution:
Tachyon sits between the data processing frameworks and the HDFS storage layer, so files can be shared between different computation frameworks at memory speed.