Javaweb Project Architecture Fastdfs Distributed File System


Overview

A distributed file system (Distributed File System, DFS), also known as a network file system, is a file system that allows files to be shared across multiple hosts on a network, so that users on multiple machines can share files and storage space.

FastDFS is an open-source distributed file system written in C. It provides mechanisms such as redundant backup, load balancing, and linear scaling, and focuses on high availability and high performance. Its features include file storage, file synchronization, and file access (upload and download), addressing the problems of mass storage and load balancing. It is particularly suitable for online services built around small and medium-sized files (recommended range: 4KB < file_size < 500MB), such as photo album sites and video sites.

FastDFS Architecture

The FastDFS architecture consists of tracker servers and storage servers. A client requests a tracker server for file upload or download; the tracker server schedules the request, and a storage server ultimately completes the upload or download.

Tracker Server

The tracker server mainly performs scheduling and plays a load-balancing role. It manages all storage servers and groups: after startup, each storage server connects to the tracker, reports its group and other information, and maintains a periodic heartbeat. Based on this heartbeat information, the tracker builds a group ==> [storage server list] mapping table.

The metadata the tracker needs to manage is very small and is kept entirely in memory. Since this metadata is generated from the information reported by storage servers, the tracker itself does not need to persist any data, which makes it very easy to scale: simply adding tracker machines extends it into a tracker cluster. Every tracker in the cluster is fully equivalent; all trackers accept storage heartbeat information, generate metadata, and provide read and write services.

Storage Server

The storage server mainly provides capacity and backup services, organized in groups. Each group can contain multiple storage servers whose data are backups of each other. Group-based storage facilitates application isolation, load balancing, and per-group replica customization (the number of storage servers in a group is the number of replicas for that group). For example, different applications' data can be stored in different groups to isolate them, and applications can be assigned to different groups according to their access characteristics to balance load. The drawback is that a group's capacity is limited by the storage capacity of a single machine, and when a machine in the group fails, data recovery can only rely on the other machines in that group, so recovery can take a long time.

Each storage server in a group relies on the local file system and can be configured with multiple data storage directories. For example, with 10 disks mounted at /data/disk1 through /data/disk10, all 10 directories can be configured as storage data directories. When a storage server receives a write request, it selects one of the storage directories according to the configured rules to store the file. To avoid having too many files in a single directory, when a storage server first starts it creates a two-level subdirectory tree in each data storage directory, 256 entries per level, 65,536 subdirectories in total. A new file is routed to one of these subdirectories by hashing, and the file data is then stored there as a local file.
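
The routing step above can be sketched as follows. This is an illustrative model only: the class name and the hash function are stand-ins, not the actual FastDFS C implementation.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch of FastDFS-style two-level directory routing.
// The real implementation is in C; the hash used here is a stand-in.
public class SubdirRouter {
    static final int SUBDIRS_PER_LEVEL = 256; // 256 * 256 = 65,536 subdirectories

    // Route a file id to a "XX/XX" two-level subdirectory path.
    static String route(String fileId) {
        int h = 0;
        for (byte b : fileId.getBytes(StandardCharsets.UTF_8)) {
            h = h * 31 + (b & 0xFF); // simple stand-in hash
        }
        int level1 = (h >>> 8) & 0xFF; // 0..255, picks the first-level directory
        int level2 = h & 0xFF;         // 0..255, picks the second-level directory
        return String.format("%02X/%02X", level1, level2);
    }

    public static void main(String[] args) {
        System.out.println(route("rBD8EFqVACuAI9mcAAC_ornlYSU088"));
    }
}
```

The same file id always maps to the same subdirectory, so no lookup table is needed to find a file again.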

Storage Policies of FastDFS

To support large capacity, storage nodes (servers) are organized into volumes (also called groups). The storage system consists of one or more volumes; the files in different volumes are independent of each other, and the combined file capacity of all volumes is the capacity of the entire storage system. A volume can consist of one or more storage servers, and all storage servers in a volume hold the same files; multiple storage servers in a volume provide redundant backup and load balancing.

When a server is added to a volume, the system synchronizes the existing files automatically; once synchronization is complete, the system automatically switches the new server into online service. When storage space is low or about to be exhausted, volumes can be added dynamically: just add one or more servers and configure them as a new volume, which increases the capacity of the storage system.

Upload Process of FastDFS

FastDFS provides users with basic file access interfaces such as upload, download, append, and delete, delivered in the form of client libraries.

Each storage server periodically reports its storage information to the tracker server. When there is more than one tracker server in the tracker cluster, the trackers are fully equivalent peers, so the client can choose any tracker when uploading.

When a tracker receives an upload request from a client, it assigns a group that can store the file, then decides which storage server in that group to assign to the client. Once a storage server is assigned, the client sends a write request to it; the storage server assigns a data storage directory to the file, then assigns a fileid, and finally generates a file name from the information above.

Select Tracker Server

When there is more than one tracker server in the cluster, the client can select any tracker when uploading files, because all trackers are fully equivalent peers.

Select the Storage Group

When the tracker receives an upload request, it assigns a group that can store the file. The following group-selection rules are supported:
1. Round robin: poll across all groups
2. Specified group: use a designated group
3. Load balance: the group with the most free storage space takes priority
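
The three rules above can be sketched as follows. The class, field, and group names are hypothetical; this is a model of the selection logic, not the tracker's actual implementation.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of the tracker's three group-selection rules.
public class GroupSelector {
    static class Group {
        final String name;
        final long freeSpace;
        Group(String name, long freeSpace) { this.name = name; this.freeSpace = freeSpace; }
    }

    private final List<Group> groups;
    private int rrIndex = 0;

    GroupSelector(List<Group> groups) { this.groups = groups; }

    // 1. Round robin: poll across all groups
    Group roundRobin() {
        Group g = groups.get(rrIndex % groups.size());
        rrIndex++;
        return g;
    }

    // 2. Specified group: use a designated group by name
    Group specified(String name) {
        return groups.stream().filter(g -> g.name.equals(name)).findFirst().orElseThrow();
    }

    // 3. Load balance: the group with the most free space wins
    Group mostFreeSpace() {
        return Collections.max(groups, Comparator.comparingLong(g -> g.freeSpace));
    }
}
```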

Select Storage Server

Once a group is selected, the tracker chooses a storage server within the group for the client. The following storage-selection rules are supported:
1. Round robin: poll across all storage servers in the group
2. First server ordered by IP: sorted by IP address
3. First server ordered by priority: sorted by priority (priority is configured on the storage server)

Select the Storage Path

Once a storage server is assigned, the client sends a write request to it. The storage server assigns a data storage directory to the file, supporting the following rules:
1. Round robin: poll across the multiple storage directories
2. The directory with the most remaining space takes priority

Generate Fileid

After the storage directory is selected, the storage server creates a fileid for the file. It is composed of the storage server's IP address, the file creation time, the file size, the file's CRC32 checksum, and a random number; this binary string is then Base64-encoded into a printable string.
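
As a rough illustration of this composition, the sketch below packs the same fields into bytes and Base64-encodes them. The exact byte layout and Base64 variant FastDFS uses are assumptions here; this is not the real encoder.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.zip.CRC32;

// Sketch of building a FastDFS-like file id: pack storage IP, create
// time, file size, CRC32 and a random number into bytes, then
// Base64-encode into a printable string. Layout and Base64 variant
// are assumptions for illustration only.
public class FileIdSketch {
    static String makeFileId(int ipAsInt, int createTime, long fileSize, byte[] content, int rand) {
        CRC32 crc = new CRC32();
        crc.update(content);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 8 + 4 + 4);
        buf.putInt(ipAsInt).putInt(createTime).putLong(fileSize)
           .putInt((int) crc.getValue()).putInt(rand);
        // URL-safe Base64 keeps the id printable and path-friendly
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes();
        System.out.println(makeFileId(0xC0A801BE, 1518000000, data.length, data, 42));
    }
}
```

Because the id embeds the creation time and size, the tracker can later recover that metadata by decoding the file name alone.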

Select a Two-Level Subdirectory

After the storage directory is selected and the fileid assigned, the storage server routes the file into that directory's 256*256 two-level subdirectory tree by hashing the fileid twice, and then stores the file in the chosen subdirectory using the fileid as the file name.

Generate file name

Once the file is stored in a subdirectory, it is considered stored successfully, and a file name is then generated for it. The file name is composed of the group, the storage directory, the two-level subdirectory, the fileid, and the file extension (specified by the client, mainly used to distinguish file types).

FastDFS File Synchronization

When writing a file, the client only needs to write it to one storage server within the group for the write to be considered successful; after that storage server finishes writing the file, a background thread synchronizes the file to the other storage servers in the same group.

Each time a storage server writes a file, it also writes an entry to a binlog. The binlog does not contain the file data, only the file name and other metadata, and is used for background synchronization. Each storage server records its synchronization progress toward the other storage servers in the group, so that progress can be resumed after a restart. Progress is recorded as a timestamp, so it is best to keep the clocks of all servers in the cluster in sync.

Each storage server reports its synchronization progress to the tracker as part of its metadata, and the tracker uses this progress as a reference when choosing a storage server to serve reads.

For example, suppose a group has three storage servers A, B, and C. A has synchronized to C up to timestamp T1 (all files written before T1 have been synchronized to C), and B has synchronized to C up to timestamp T2 (T2 > T1). When the tracker receives this progress information, it takes the smallest value as C's synchronization timestamp, in this case T1 (meaning all data written before T1 has been synchronized to C). Likewise, the tracker generates synchronization timestamps for A and B by the same rule.
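
The tracker's rule in this example, taking the minimum of the reported progress values, can be sketched as follows (illustrative only; names are not from the FastDFS source):

```java
import java.util.Map;

// Sketch of how the tracker derives a storage server's read-safe
// synchronization timestamp: the minimum progress reported by every
// peer in the group toward that server.
public class SyncTimestamp {

    // progressTowardTarget: peer name -> timestamp up to which that peer
    // has synchronized its files to the target server
    static long syncTimestampFor(Map<String, Long> progressTowardTarget) {
        // All files written before the minimum timestamp are guaranteed
        // to be on the target, whichever peer wrote them.
        return progressTowardTarget.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(0L);
    }

    public static void main(String[] args) {
        // A synced to C up to T1 = 100, B synced to C up to T2 = 150
        long tsC = syncTimestampFor(Map.of("A", 100L, "B", 150L));
        System.out.println(tsC); // prints 100
    }
}
```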

File Download in FastDFS

After a client's upload succeeds, it receives a file name generated by the storage server, and the client can then access the file using that name.

As with uploads, the client can select any tracker server when downloading. The download request sent to the tracker must carry the file name; the tracker parses the group, size, creation time, and other information from the file name, and then selects a storage server to serve the read request.
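
Since the client-side APIs shown later take the group name and the remote file id as separate arguments, a small helper (hypothetical, not part of the FastDFS client library) can split a returned full file name:

```java
// Sketch: split a full FastDFS file name like
// "group1/M00/00/00/rBD8EFqVACuAI9mcAAC_ornlYSU088.jpg"
// into the group name and the remote file id, as download_file and
// delete_file expect them separately.
public class FileNameParser {
    static String[] split(String fullName) {
        int slash = fullName.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("no group in: " + fullName);
        }
        return new String[] { fullName.substring(0, slash), fullName.substring(slash + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("group1/M00/00/00/rBD8EFqVACuAI9mcAAC_ornlYSU088.jpg");
        System.out.println(parts[0]); // prints group1
        System.out.println(parts[1]); // prints M00/00/00/rBD8EFqVACuAI9mcAAC_ornlYSU088.jpg
    }
}
```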

FastDFS Performance Solutions

FastDFS Installation
Package versions
FastDFS v5.05
libfastcommon v1.0.7
Download and install Libfastcommon
    • Download
wget https://github.com/happyfish100/libfastcommon/archive/V1.0.7.tar.gz
    • Extract
tar -xvf V1.0.7.tar.gz
cd libfastcommon-1.0.7
    • Compiling, installing
./make.sh
./make.sh install
    • Create a soft link
Download and install Fastdfs
    • Download Fastdfs
 wget https://github.com/happyfish100/fastdfs/archive/V5.05.tar.gz
    • Extract
tar -xvf V5.05.tar.gz
cd fastdfs-5.05
    • Compiling, installing
./make.sh
./make.sh install
Configuring the Tracker Service

After a successful installation, there will be an fdfs directory under /etc; enter it and you will see three files with a .sample suffix. These are the sample files provided by the author. We need to copy tracker.conf.sample to a tracker.conf configuration file and modify it:

cp tracker.conf.sample tracker.conf
vi tracker.conf

Edit tracker.conf

# Whether this config file is disabled; false means it takes effect
disabled=false
# Service port
port=22122
# Tracker data and log directory
base_path=/home/data/fastdfs
# HTTP service port
http.server_port=80

Create the tracker's base data directory, i.e. the directory base_path points to:

mkdir -p /home/data/fastdfs

Use ln -s to create soft links:

ln -s /usr/bin/fdfs_trackerd /usr/local/bin
ln -s /usr/bin/stop.sh /usr/local/bin
ln -s /usr/bin/restart.sh /usr/local/bin

Start the service

service fdfs_trackerd start

Check the listening ports

netstat -unltp|grep fdfs

If you see that port 22122 is listening normally, the tracker service has started successfully!

Tracker Server directory and file structure
After the tracker service starts successfully, two directories, data and logs, are created under base_path. The directory structure is as follows:

${base_path}
  |__data
  |   |__storage_groups.dat: storage group information
  |   |__storage_servers.dat: storage server list
  |__logs
Configuring the Storage Service

Enter the /etc/fdfs directory, copy the FastDFS storage sample configuration file storage.conf.sample, and rename it storage.conf:

# cd /etc/fdfs
# cp storage.conf.sample storage.conf
# vi storage.conf

Edit storage.conf

# Whether this config file is disabled; false means it takes effect
disabled=false
# Group (volume) this storage server belongs to
group_name=group1
# Storage server service port
port=23000
# Heartbeat interval in seconds (heartbeats are actively sent to the tracker server)
heart_beat_interval=30
# Storage data and log directory (the root directory must exist; subdirectories are generated automatically)
base_path=/home/data/fastdfs/storage
# A storage server supports multiple paths for storing files.
# Configure the number of base paths here; usually only one is used.
store_path_count=1
# Configure store_path_count paths one by one, indexed from 0.
# If store_path0 is not configured, it defaults to the same path as base_path.
store_path0=/home/data/fastdfs/storage
# FastDFS stores files using two-level directories. Configure the number of directories per level here.
# If this parameter is N (e.g. 256), the storage server automatically creates
# N * N file subdirectories under store_path on first run.
subdir_count_per_path=256
# List of tracker servers; the storage server actively connects to them.
# With multiple tracker servers, write one per line.
tracker_server=192.168.1.190:22122
# Time window during which synchronization is allowed (default is all day).
# Usually set to avoid problems caused by synchronization at peak hours.
sync_start_time=00:00
sync_end_time=23:59

Use ln -s to create a soft link:

ln -s /usr/bin/fdfs_storaged /usr/local/bin

Start the service

service fdfs_storaged start

Check the listening ports

netstat -unltp|grep fdfs

Make sure the tracker is started before starting storage. On the first successful launch, two directories, data and logs, are created under /home/data/fastdfs/storage. If you see that port 23000 is listening normally, the storage service has started successfully!

Check whether storage and tracker are communicating:

/usr/bin/fdfs_monitor /etc/fdfs/storage.conf
FastDFS: Configuring the Nginx Module
Package versions
OpenResty v1.13.6.1
fastdfs-nginx-module v1.1.6

FastDFS stores files on storage servers via the tracker server, but files must be replicated between storage servers in the same group, which introduces synchronization latency.

Suppose the tracker server uploads a file to 192.168.1.190, and the file ID is returned to the client as soon as the upload succeeds. The FastDFS storage cluster then begins synchronizing this file to the other storage servers in the same group. If the file has not yet been replicated and the client requests it from a server other than 192.168.1.190, the file will be inaccessible. fastdfs-nginx-module can redirect such requests to the source server to fetch the file, preventing client errors caused by replication delay.

Download and install Nginx and fastdfs-nginx-module.

It is recommended to use yum to install the following development libraries:

yum install readline-devel pcre-devel openssl-devel -y

Download the latest version and unzip:

wget https://openresty.org/download/openresty-1.13.6.1.tar.gz
tar -xvf openresty-1.13.6.1.tar.gz
wget https://github.com/happyfish100/fastdfs-nginx-module/archive/master.zip
unzip master.zip

Configure the Nginx build, adding the fastdfs-nginx-module module:

./configure --add-module=../fastdfs-nginx-module-master/src/

Compile, install:

make && make install

To view Nginx modules:

/usr/local/openresty/nginx/sbin/nginx -V

If the output's configure arguments include the fastdfs-nginx-module path, the module was added successfully.

Copy the configuration file from the fastdfs-nginx-module source to the /etc/fdfs directory and modify it:

cp /fastdfs-nginx-module/src/mod_fastdfs.conf /etc/fdfs/

# Connection timeout
connect_timeout=10
# Tracker server
tracker_server=192.168.1.190:22122
# Storage server default port
storage_server_port=23000
# Set to true if the file ID URI contains /group**
url_have_group_name = true
# store_path0 configured on the storage server; must match storage.conf
store_path0=/home/data/fastdfs/storage

Copy some of the FastDFS configuration files to the /etc/fdfs directory:

cp /fastdfs-nginx-module/src/http.conf /etc/fdfs/
cp /fastdfs-nginx-module/src/mime.types /etc/fdfs/

Configure Nginx, modify nginx.conf:

location ~/group([0-9])/M00 {
    ngx_fastdfs_module;
}

Start Nginx:

# ./nginx
ngx_http_fastdfs_set pid=9236

Test upload:

# /usr/bin/fdfs_upload_file /etc/fdfs/client.conf /etc/fdfs/4.jpg
group1/M00/00/00/rBD8EFqVACuAI9mcAAC_ornlYSU088.jpg

Deployment structure diagram:

JAVA Client Integration

Add the dependency in pom.xml:

<!-- fastdfs -->
<dependency>
    <groupId>org.csource</groupId>
    <artifactId>fastdfs-client-java</artifactId>
    <version>1.27</version>
</dependency>

fdfs_client.conf configuration:

# Timeout for connecting to the tracker server
connect_timeout = 2

FastDFSClient upload class:

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.io.IOUtils;
import org.csource.common.MyException;
import org.csource.common.NameValuePair;
import org.csource.fastdfs.ClientGlobal;
import org.csource.fastdfs.StorageClient;
import org.csource.fastdfs.StorageServer;
import org.csource.fastdfs.TrackerClient;
import org.csource.fastdfs.TrackerServer;

public class FastDFSClient {

    private static final String CONFIG_FILENAME = "D:\\itstyle\\src\\main\\resources\\fdfs_client.conf";
    private static final String GROUP_NAME = "market1";

    private TrackerClient trackerClient = null;
    private TrackerServer trackerServer = null;
    private StorageServer storageServer = null;
    private StorageClient storageClient = null;

    static {
        try {
            ClientGlobal.init(CONFIG_FILENAME);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (MyException e) {
            e.printStackTrace();
        }
    }

    public FastDFSClient() throws Exception {
        trackerClient = new TrackerClient(ClientGlobal.g_tracker_group);
        trackerServer = trackerClient.getConnection();
        storageServer = trackerClient.getStoreStorage(trackerServer);
        storageClient = new StorageClient(trackerServer, storageServer);
    }

    /**
     * Upload a file
     * @param file file object
     * @param fileName file name
     */
    public String[] uploadFile(File file, String fileName) {
        return uploadFile(file, fileName, null);
    }

    /**
     * Upload a file
     * @param file file object
     * @param fileName file name
     * @param metaList file metadata
     */
    public String[] uploadFile(File file, String fileName, Map<String, String> metaList) {
        try {
            byte[] buff = IOUtils.toByteArray(new FileInputStream(file));
            NameValuePair[] nameValuePairs = null;
            if (metaList != null) {
                nameValuePairs = new NameValuePair[metaList.size()];
                int index = 0;
                for (Map.Entry<String, String> entry : metaList.entrySet()) {
                    nameValuePairs[index++] = new NameValuePair(entry.getKey(), entry.getValue());
                }
            }
            return storageClient.upload_file(GROUP_NAME, buff, fileName, nameValuePairs);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Get file metadata
     * @param fileId file ID
     */
    public Map<String, String> getFileMetadata(String groupName, String fileId) {
        try {
            NameValuePair[] metaList = storageClient.get_metadata(groupName, fileId);
            if (metaList != null) {
                HashMap<String, String> map = new HashMap<String, String>();
                for (NameValuePair metaItem : metaList) {
                    map.put(metaItem.getName(), metaItem.getValue());
                }
                return map;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Delete a file
     * @param fileId file ID
     * @return -1 on failure, 0 otherwise
     */
    public int deleteFile(String groupName, String fileId) {
        try {
            return storageClient.delete_file(groupName, fileId);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return -1;
    }

    /**
     * Download a file
     * @param fileId file ID (returned after a successful upload)
     * @param outFile location to save the downloaded file
     */
    public int downloadFile(String groupName, String fileId, File outFile) {
        FileOutputStream fos = null;
        try {
            byte[] content = storageClient.download_file(groupName, fileId);
            fos = new FileOutputStream(outFile);
            InputStream ips = new ByteArrayInputStream(content);
            IOUtils.copy(ips, fos);
            return 0;
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (fos != null) {
                try {
                    fos.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        FastDFSClient client = new FastDFSClient();
        File file = new File("D:\\23456.png");
        String[] result = client.uploadFile(file, "png");
        System.out.println(result.length);
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}

Running the main method prints:

2
group1
M00/00/00/rBD8EFqTrNyAWyAkAAKCRJfpzAQ227.png

Source: https://gitee.com/52itstyle/spring-boot-fastdfs

Xiao Qi

Source: https://blog.52itstyle.com

Sharing is a joy, and it also documents my personal growth. These articles are mostly summaries of work experience and notes accumulated while studying; given the limits of my own understanding, mistakes are unavoidable, so corrections are welcome, and let's improve together.
