Commercial cloud storage services such as Amazon S3 and Google Cloud Storage offer highly available, scalable object storage with virtually unlimited capacity at affordable prices. To speed up adoption, these providers have built solid developer ecosystems around their products with well-documented APIs and SDKs. Cloud-backed file systems are a typical product of these active developer communities, and several open-source implementations exist.
S3QL is one of the most popular open-source cloud file systems. It is a FUSE file system that supports several commercial and open-source cloud storage backends, such as Amazon S3, Google Cloud Storage, Rackspace CloudFiles, and OpenStack. As a full-featured file system, S3QL offers a number of powerful features: a maximum file size of 2 TB, compression, encryption, standard UNIX attributes, copy-on-write snapshots, immutable trees, data de-duplication, and support for hard and symbolic links. Any data written to an S3QL file system is compressed and encrypted locally before being uploaded to the cloud backend. When you read content from an S3QL file system, the corresponding objects are downloaded from the cloud if they are not already in the local cache, then decrypted and decompressed on the fly.
To be clear, S3QL does have its limitations. For example, you cannot mount the same S3QL file system on several different computers at the same time; only one computer can access it. In addition, ACLs (access control lists) are not supported.
In this tutorial, I will describe how to configure an encrypted S3QL file system backed by Amazon S3. As a usage example, I'll also show how to run the rsync backup tool on a mounted S3QL file system.
Preparatory work
This tutorial first requires you to create an Amazon AWS account (registration is free, but requires a valid credit card).
Then create an AWS access key (access key ID and secret access key); S3QL uses this information to access your AWS account.
Then go to AWS S3 via the AWS admin panel and create a new empty bucket for S3QL.
For best performance consideration, select a region that is geographically closest to you.
Install S3QL on Linux
There are precompiled S3QL packages in most Linux distributions.
For Debian, Ubuntu, or Linux Mint:
$ sudo apt-get install s3ql
For Fedora:
$ sudo yum install s3ql
For Arch Linux, install the s3ql package from the AUR.
Configuring S3QL for the First Time
Create the authinfo2 file in the ~/.s3ql directory; this is the default configuration file for S3QL. It contains the required AWS access key, the S3 bucket name, and an encryption passphrase. The passphrase is used to encrypt a randomly generated master key, and the master key is what actually encrypts the S3QL file system data.
$ mkdir ~/.s3ql
$ vi ~/.s3ql/authinfo2
[s3]
storage-url: s3://[bucket-name]
backend-login: [your-access-key-id]
backend-password: [your-secret-access-key]
fs-passphrase: [your-encryption-passphrase]
The specified AWS S3 bucket needs to be created in advance via the AWS admin panel.
For security, make the authinfo2 file accessible only to you.
$ chmod 600 ~/.s3ql/authinfo2
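Put together, the steps above can be sketched as a small shell function that creates the configuration directory and writes authinfo2 with owner-only permissions in one go. The function name and all credential values below are placeholders of my own, not real keys or part of S3QL:

```shell
# Sketch: create ~/.s3ql and write an authinfo2 file with safe permissions.
# All credential values are placeholders to be replaced with real ones.
setup_authinfo2() {
    dir="$1"                 # normally ~/.s3ql
    mkdir -p "$dir"
    umask 077                # newly created files get owner-only permissions
    cat > "$dir/authinfo2" <<'EOF'
[s3]
storage-url: s3://bucket-name
backend-login: your-access-key-id
backend-password: your-secret-access-key
fs-passphrase: your-encryption-passphrase
EOF
    chmod 600 "$dir/authinfo2"
}
```

After running something like `setup_authinfo2 ~/.s3ql`, open the file and replace the placeholders with your real credentials.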
Creating the S3QL File System
Now you are ready to create an S3QL file system on AWS S3.
Use the mkfs.s3ql tool to create a new S3QL file system. The bucket name in this command should match the one specified in the authinfo2 file. The "--ssl" parameter forces SSL when connecting to the backend storage server. By default, the mkfs.s3ql command enables compression and encryption on the S3QL file system.
$ mkfs.s3ql s3://[bucket-name] --ssl
You will be asked to enter an encryption passphrase. Enter the one you specified in ~/.s3ql/authinfo2 as "fs-passphrase".
If the new file system is created successfully, the command will print a confirmation message.
Mounting the S3QL File System
Once you have created an S3QL file system, the next step is to mount it.
First create a local mount point, then use the mount.s3ql command to mount the S3QL file system.
$ mkdir ~/mnt_s3ql
$ mount.s3ql s3://[bucket-name] ~/mnt_s3ql
Mounting an S3QL file system does not require root privileges; just make sure you have write permission to the mount point.
Optionally, you can use the "--compress" parameter to specify a compression algorithm (such as lzma, bzip2, or zlib). If you don't specify one, lzma is used by default. Note that a custom compression algorithm applies only to newly created data objects and does not affect existing ones.
$ mount.s3ql --compress bzip2 s3://[bucket-name] ~/mnt_s3ql
For performance reasons, the S3QL file system maintains a local file cache of recently accessed (parts of) files. You can customize the cache size with the "--cachesize" and "--max-cache-entries" options.
Use the "--allow-other" option if you want users other than you to access a mounted S3QL file system.
If you want to export a mounted S3QL file system to another machine via NFS, use the "--nfs" option.
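Since the mount options above can be combined in different ways, it may help to see the option handling written out. The sketch below only assembles and prints a mount.s3ql command line without executing anything; the helper's name and argument order are my own invention, not part of S3QL:

```shell
# Hypothetical helper: build a mount.s3ql command line from options.
# Nothing is executed against S3; the resulting command is only printed.
build_s3ql_mount() {
    bucket="$1"; mountpoint="$2"; compress="$3"
    cmd="mount.s3ql"
    if [ -n "$compress" ]; then
        cmd="$cmd --compress $compress"   # lzma is the default when omitted
    fi
    cmd="$cmd s3://$bucket $mountpoint"
    printf '%s\n' "$cmd"
}
```

For example, `build_s3ql_mount my-bucket ~/mnt_s3ql bzip2` prints the full command so you can inspect it before running it for real.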
After running mount.s3ql, check whether the S3QL file system was mounted successfully:
$ df ~/mnt_s3ql
$ mount | grep s3ql
Unmounting the S3QL File System
To safely unmount an S3QL file system that may contain uncommitted data, use the umount.s3ql command. It waits until all data (including the relevant parts of the local file cache) has been successfully transferred to the backend server. Depending on how much data is waiting to be written, this may take some time.
$ umount.s3ql ~/mnt_s3ql
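In scripts it can be useful to check that the path is actually a mounted file system before calling umount.s3ql. A minimal sketch, assuming Linux (it reads /proc/mounts) and with helper names of my own choosing:

```shell
# Sketch: only call umount.s3ql when the path is really a mount point.
# is_mounted reads /proc/mounts, so this check is Linux-specific.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' /proc/mounts
}

safe_umount_s3ql() {
    if is_mounted "$1"; then
        umount.s3ql "$1"      # flushes pending data to the backend first
    else
        echo "not mounted: $1"
    fi
}
```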
Viewing Statistics and Repairing the S3QL File System
To view S3QL file system statistics, use the s3qlstat command, which displays information such as the total size of the data, the metadata size, the de-duplication ratio, and the compression ratio.
$ s3qlstat ~/mnt_s3ql
You can use the fsck.s3ql command to check and repair an S3QL file system. As with the regular fsck command, the file system to be checked must be unmounted first.
$ fsck.s3ql s3://[bucket-name]
S3QL Use Case: rsync Backup
Let me end this tutorial with a popular use case: backing up a local file system. For this, I recommend rsync-based incremental backups, especially since S3QL provides a wrapper script (/usr/lib/s3ql/pcp.py) for rsync. This script lets you recursively replicate a directory tree to an S3QL target using multiple rsync processes.
$ /usr/lib/s3ql/pcp.py -h
The following command uses four concurrent rsync processes to back up everything in ~/documents to an S3QL file system.
$ /usr/lib/s3ql/pcp.py -a --quiet --processes=4 ~/documents ~/mnt_s3ql
These files are first copied to the local file cache and then synchronized to the back-end server in the background.
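Building on this, backups are often kept in time-stamped directories so that each run lands in its own snapshot folder. The sketch below uses plain cp as a stand-in for the pcp.py call so it works even without S3QL installed; the function name and directory layout are my own convention, not something S3QL prescribes:

```shell
# Sketch: copy SRC into a new time-stamped directory under DST.
# In real use, DST would be the mounted S3QL file system (e.g. ~/mnt_s3ql)
# and the cp line would be replaced by the pcp.py invocation shown above:
#   /usr/lib/s3ql/pcp.py -a --quiet --processes=4 "$src" "$target"
s3ql_backup() {
    src="$1"; dst="$2"
    target="$dst/backup-$(date +%Y-%m-%d_%H%M%S)"
    mkdir -p "$target"
    cp -a "$src/." "$target/"
    printf '%s\n' "$target"   # print where this run's backup went
}
```

Each invocation creates a fresh backup-YYYY-MM-DD_HHMMSS directory, which pairs naturally with S3QL's snapshot and immutable-tree features mentioned below.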
To learn more about S3QL, such as automatic mounting, snapshots, and immutable trees, I strongly recommend reading the official user's guide. Feel free to share what you think of S3QL and your experience with any similar tools.