A Golang-based open-source distributed storage project

Project Address: https://code.google.com/p/weed-fs/

Weed-fs is a simple, high-performance distributed storage system with two goals:

1. Store a massive number of files
2. Serve the saved files fast

Weed-fs uses a key-to-file mapping for file addressing rather than the mechanisms a POSIX filesystem already provides, which makes it somewhat like a NoSQL system; you could call it "NoFS".

Weed-fs manages file metadata on the volume servers instead of on a central node: each volume server manages its own files and their metadata. This greatly relieves the pressure on the central node. The metadata is kept in the volume server's memory, and files are automatically compressed with gzip, which keeps file access fast.

Weed-fs's theoretical model is based on Facebook's Haystack design paper: http://www.usenix.org/event/osdi10/tech/full_papers/Beaver.pdf

How to use

The master node of Weed-fs runs on port 9333 by default, and volume nodes run on port 8080. For example, you can start one master node and two volume nodes as follows. Here all nodes run on localhost, but they can be spread across multiple machines.

Start the master node:

./weed master

Start the volume nodes:

weed volume -dir="/tmp" -volumes=0-4 -mserver="localhost:9333" -port=8080 -publicUrl="localhost:8080" &
weed volume -dir="/tmp/data2" -volumes=5-7 -mserver="localhost:9333" -port=8081 -publicUrl="localhost:8081" &

Write a file

Here is a simple example of saving a file.

The first step is to send an HTTP GET request to the master to obtain a fid and a volume server URL:

curl http://localhost:9333/dir/assign
{"fid":"3,01637037d6","url":"127.0.0.1:8080","publicUrl":"localhost:8080"}

The second step is to send an HTTP multipart POST request to url + '/' + fid to store the actual file:

curl -F file=@/home/chris/myphoto.jpg http://127.0.0.1:8080/3,01637037d6
{"size": 43234}
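
The same two-step flow can be written as a short Go program. The sketch below is only an illustration under the assumptions already shown (a local master on port 9333, the JSON field names above, and the example file path); it is not code from the weed-fs project itself.

// Minimal sketch of the two-step write flow: ask the master for a fid,
// then multipart-POST the file to the returned volume server.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

// assignResult mirrors the JSON shape shown above for /dir/assign.
type assignResult struct {
	Fid       string `json:"fid"`
	URL       string `json:"url"`
	PublicURL string `json:"publicUrl"`
}

func main() {
	// Step 1: GET a file id and volume server location from the master.
	resp, err := http.Get("http://localhost:9333/dir/assign")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	var a assignResult
	if err := json.NewDecoder(resp.Body).Decode(&a); err != nil {
		panic(err)
	}

	// Step 2: multipart POST the actual file to url + "/" + fid.
	f, err := os.Open("/home/chris/myphoto.jpg")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	part, _ := w.CreateFormFile("file", "myphoto.jpg")
	io.Copy(part, f)
	w.Close()

	post, err := http.Post("http://"+a.URL+"/"+a.Fid, w.FormDataContentType(), &buf)
	if err != nil {
		panic(err)
	}
	defer post.Body.Close()
	body, _ := io.ReadAll(post.Body)
	fmt.Printf("stored as fid %s: %s\n", a.Fid, body)
}

Note that Go's JSON decoder matches field names case-insensitively, so minor differences in key casing (fid vs FID) do not matter here.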

Save File ID

You can save the fid, 3,01637037d6 in this example, in any database. The 3 before the comma is the volume id, an unsigned 32-bit integer. The 01 after the comma is the file key, an unsigned 64-bit integer. The trailing 637037d6 is a file cookie, an unsigned 32-bit integer, which protects the URL from guessing. The file key and cookie are both hex-encoded. You can store the tuple in your own format, or simply keep the fid as a string; in theory you would need 8+1+16+8 = 33 bytes.
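
To make the layout concrete, here is a small Go sketch that splits a fid into its three parts. The parsing rule (decimal volume id before the comma, then a hex string whose last 8 digits are the cookie) follows the description above; this is an illustrative helper, not the project's actual parser.

// Hypothetical fid parser for strings like "3,01637037d6".
package main

import (
	"fmt"
	"strconv"
	"strings"
)

func parseFid(fid string) (volumeID uint32, key uint64, cookie uint32, err error) {
	parts := strings.SplitN(fid, ",", 2)
	if len(parts) != 2 || len(parts[1]) <= 8 {
		return 0, 0, 0, fmt.Errorf("malformed fid: %q", fid)
	}
	// Volume id is decimal, before the comma.
	v, err := strconv.ParseUint(parts[0], 10, 32)
	if err != nil {
		return 0, 0, 0, err
	}
	rest := parts[1]
	// File key: everything after the comma except the last 8 hex digits.
	k, err := strconv.ParseUint(rest[:len(rest)-8], 16, 64)
	if err != nil {
		return 0, 0, 0, err
	}
	// Cookie: the last 8 hex digits.
	c, err := strconv.ParseUint(rest[len(rest)-8:], 16, 32)
	if err != nil {
		return 0, 0, 0, err
	}
	return uint32(v), k, uint32(c), nil
}

func main() {
	v, k, c, err := parseFid("3,01637037d6")
	if err != nil {
		panic(err)
	}
	fmt.Printf("volume=%d key=%d cookie=%x\n", v, k, c)
}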

Read file

Here is an example of how to look up a file's URL by its volume id:

curl http://localhost:9333/dir/lookup?volumeId=3
{"Url":"127.0.0.1:8080","PublicUrl":"localhost:8080"}

First, the volume server's URL is looked up by volumeId. Since volume servers are usually few and do not change often, you can cache the lookup result. Now you can load the file from the volume server via its URL:

http://localhost:8080/3,01637037d6.jpg
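
Here is a minimal Go sketch of this read path with the suggested caching, assuming the single-location JSON shape shown above (again an illustration, not project code):

// Resolve a volume id via the master once, cache it, then fetch the file
// directly from the volume server.
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

type location struct {
	URL       string `json:"url"`
	PublicURL string `json:"publicUrl"`
}

// cache maps volume id -> volume server location; volume servers are few
// and change rarely, so entries can be kept for a long time.
var cache = map[string]location{}

func lookupVolume(volumeID string) (location, error) {
	if loc, ok := cache[volumeID]; ok {
		return loc, nil
	}
	resp, err := http.Get("http://localhost:9333/dir/lookup?volumeId=" + volumeID)
	if err != nil {
		return location{}, err
	}
	defer resp.Body.Close()
	var loc location
	if err := json.NewDecoder(resp.Body).Decode(&loc); err != nil {
		return location{}, err
	}
	cache[volumeID] = loc
	return loc, nil
}

func main() {
	loc, err := lookupVolume("3")
	if err != nil {
		panic(err)
	}
	resp, err := http.Get("http://" + loc.PublicURL + "/3,01637037d6.jpg")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	data, _ := io.ReadAll(resp.Body)
	fmt.Printf("fetched %d bytes\n", len(data))
}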

Architecture

In most distributed storage systems, a file is split into many chunks, and a central node stores the mapping from file names to chunk indexes, where a chunk index holds the chunk server and chunk handle information. This design cannot handle large numbers of small files efficiently, and because every access request goes through the master node, responses are slow under high concurrency.

In Weed-fs, volume servers manage the data. Each volume is 32GB and can hold a large number of files, and each storage node can have multiple volumes. The master node only needs to manage volume metadata, while the actual file metadata is stored in each volume. Since each metadata entry is only 16 bytes, all file lookups can be served from memory; the only disk operation is reading the actual file.

Master server and volume server

The architecture is very simple: the actual data is stored in volumes, a volume server holds multiple volumes, and each volume supports both read and write operations. All volumes are managed by the master server, which holds the mapping between volumes and volume servers; this mapping is fairly static and can easily be cached.

For each write request, the master server also generates a key for the file. Because writes are much less frequent than reads, one master server can handle a large number of requests.

Read and write files

When a client issues a write request, the master server returns a fid and the volume server's URL; the client then sends a POST request to that volume node, transferring the file contents RESTfully. When the client needs to read a file, it obtains the volume server's URL from the master server (or from its cache) and finally fetches the content through the public URL.

Storage size

In the current design, each volume can store 32GB of data, so the size of a single file is limited by the volume size; this capacity can be adjusted in the code.

Memory storage

All the metadata on a volume server is kept in memory, so lookups never need to read from disk; each entry in the mapping table is only 16 bytes.
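
For illustration, one plausible shape for such a 16-byte entry, following the Haystack design cited earlier (an assumption, not the project's actual struct), is an 8-byte file key plus a 4-byte offset and a 4-byte size:

// Hypothetical 16-byte in-memory index record (8 + 4 + 4 bytes),
// modeled on the Haystack paper; not the project's actual definition.
package main

import "fmt"

type needleEntry struct {
	Key    uint64 // file key assigned by the master
	Offset uint32 // position of the file inside the volume file
	Size   uint32 // stored size of the file
}

func main() {
	// One entry per file: a million files cost only ~16 MB of RAM, so a
	// lookup never touches disk; the disk is read once, for the file itself.
	index := map[uint64]needleEntry{
		1: {Key: 1, Offset: 0, Size: 43234},
	}
	fmt.Println("entries in memory:", len(index))
}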

Comparison with similar products

HDFS splits up large files and handles reading and writing of big files well. Weed-fs is aimed at small files, pursuing higher speed and concurrency.

MogileFS has three layers of components: trackers, database, storage nodes. Weed-fs has two layers: directory server and storage nodes. One extra layer means slower access, more complex operation, and a higher probability of error.

GlusterFS is fully POSIX-compliant and therefore more complex; Weed-fs is only partially POSIX-compatible.

Mongo's GridFS uses MongoDB to manage the chunks a file is split into, so every read and write request requires a database query for the metadata. That is fine for a small number of requests, but under high concurrency it easily collapses. In Weed-fs the volume servers manage the actual data, and lookup work is spread across the volume nodes, so high-load scenarios are handled easily.
