Windows 8.1 Data Deduplication - Planning for Deployment (II)


I. Planning the deployment goals
Data deduplication for Windows 8.1 & Server 2012 is designed to be installed on primary data volumes without adding any dedicated hardware, which means you can install and use the feature without affecting the primary workload on the server. The defaults are non-intrusive: a file must reach a "lifetime" of five days before it is processed, and the default minimum file size is 32 KB. The implementation is designed for low memory and CPU utilization; if memory utilization becomes high, deduplication waits for resources to become available. Administrators can plan for more aggressive deduplication based on the type of data involved and the frequency and volume of changes to the volume or to specific file types.
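For example, a more aggressive per-volume policy can be sketched with the Set-DedupVolume cmdlet (covered in more detail later in this article; the one-day value below is an illustrative choice, not a recommendation):

PS C:\> Set-DedupVolume F: -MinimumFileAgeDays 1   # Process files after one day instead of the five-day default
PS C:\> Set-DedupVolume F: -MinimumFileSize 32768  # Minimum file size in bytes; 32 KB is the floor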
The ideal workloads for data deduplication include:
General file shares: group content publishing and sharing, user home folders, and profile redirection (Offline Files)
Software deployment shares: software binaries, images, and updates
VHD libraries: virtual hard disk (VHD) file storage for provisioning to hypervisors

II. Identify the volumes that are candidates for data deduplication
Deduplication can be very effective at optimizing storage and reducing disk space consumption: 50% to 90% less space is used when it is applied to the appropriate data. When you select data to be deduplicated, evaluate the following three considerations:
1. Is there duplication in the data?

File shares that host user documents, software deployment binaries, or virtual hard disk files often contain large amounts of duplicate data, so deduplication saves a lot of space there. The following table shows typical data deduplication savings for various content types; results will vary depending on data type, mix, and size. Before you enable deduplication, it is a good idea to evaluate a sample of the data first; a sketch follows the table.

[Table screenshot: typical space savings by content type]
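A data sample can be evaluated with the Deduplication Evaluation Tool (DDPEval.exe), which estimates savings without modifying any data. A minimal sketch, assuming the tool is present under \Windows\System32 once the feature is installed; the share path is hypothetical:

C:\> C:\Windows\System32\DDPEval.exe F:\Shares\GroupContent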

2. Does the data access pattern allow enough time for data deduplication?

Files that change frequently or are constantly accessed by users or applications are not good candidates for data deduplication. Constant access and churn can cancel out any optimization gains, and deduplication may never get a chance to process those files.

    • Good candidates for data deduplication include file shares that host user documents, software deployment shares, and VHD libraries: data that is modified infrequently but read often.

    • Bad candidates for data deduplication include hosts with constantly mounted and running virtual machines, live SQL Server databases, and live Exchange Server databases.

Good candidates leave enough idle time for files to be deduplicated. You can apply a file age policy to control how long a file must go unmodified before it is deduplicated, which prevents premature or repeated deduplication of files that are still likely to be heavily modified.

3. Does the server have sufficient resources and time to run data deduplication?

Data deduplication reads, processes, and writes large amounts of data, which consumes server resources and must be considered when planning the deployment. Servers typically have periods of peak activity and periods of low resource utilization, and deduplication can do most of its work while resources are idle. Servers that constantly run at maximum capacity are poor candidates, even though deduplication can use a background optimization job to optimize some files; a scheduling sketch follows below.
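As a sketch of steering that work into an off-peak window, a custom schedule can be created with New-DedupSchedule (the name, start time, duration, and days below are illustrative assumptions):

PS C:\> New-DedupSchedule -Name "NightlyOptimization" -Type Optimization -Start 23:00 -DurationHours 6 -Days Monday,Tuesday,Wednesday,Thursday,Friday

Get-DedupSchedule lists the built-in and custom schedules so you can confirm the window.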

Based on observed savings and typical resource utilization, candidates for data deduplication deployment rank as follows:

    • Excellent deduplication candidates: folder redirection servers, virtualization depots or provisioning libraries, software deployment shares, SQL Server and Exchange Server backup volumes;

    • Candidates that should be evaluated based on workload: line-of-business servers, static content providers, web servers, high-performance computing (HPC);

    • Poor deduplication candidates: Hyper-V hosts; VDI VHDs; WSUS; servers running SQL Server or Exchange Server; files close to or greater than 1 TB in size;


III. Data deduplication server and volume requirements
Server:
For data deduplication server requirements, see the following list:
Server hardware should meet the minimum requirements for running Windows 8.1 & Server 2012. Data deduplication is designed to support minimal configurations, such as a single-processor system with 4 GB of RAM and one SATA hard drive.
If you plan to support data deduplication on multiple volumes on the same server, size the system appropriately so that it can keep up with the data. As a rule of thumb, the server needs 1 CPU core and free memory to run a deduplication job on a single volume, which can process roughly 100 GB of data per hour, or about 2 TB of data per day. The data deduplication feature uses additional CPU cores and available memory to scale out and process multiple volumes in parallel.
Deduplication supports up to 90 volumes at a time, and it can simultaneously process one volume per physical CPU core, plus one extra volume. Hyper-threading has no effect on this because only physical cores are used to process volumes. A system with 16 CPU cores and 90 volumes will therefore process 17 volumes at a time until all 90 volumes are fully processed, provided there is sufficient memory (see the sizing sketch below).
Virtual server instances should follow the same resource guidelines as physical hardware.
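A back-of-the-envelope sizing sketch based on the figures above (the core and volume counts are the example values, not measurements):

PS C:\> $cores = 16; $volumes = 90
PS C:\> $parallel = [math]::Min($volumes, $cores + 1)    # one volume per physical core, plus one
PS C:\> "{0} volumes in parallel, ~{1} TB/day" -f $parallel, ($parallel * 2)   # ~2 TB per volume per day
17 volumes in parallel, ~34 TB/day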
Volume:
Volumes that are candidates for deduplication must meet the following requirements:
    • Cannot be a system or boot volume. Data deduplication is not supported on operating system volumes.
    • Can be partitioned with a master boot record (MBR) or a GUID partition table (GPT), and must be formatted with the NTFS file system.
    • Can reside on shared storage, such as Fibre Channel or SAS arrays, or an iSCSI SAN; Windows failover clusters are fully supported.
    • Cannot be a Cluster Shared Volume (CSV). If you convert a deduplication-enabled volume to CSV, its data remains accessible, but files can no longer be deduplicated.
    • Cannot use the Resilient File System (ReFS).
    • Must be exposed to the operating system as a non-removable drive. Remotely mapped drives are not supported.
  Notes
Files with extended attributes, encrypted files, files smaller than 32 KB, and reparse point files are not processed by the data deduplication feature.
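A quick eligibility sketch for this article's example volume F: (Get-Volume ships with Windows 8.1 and Server 2012; the checks mirror the list above):

PS C:\> $v = Get-Volume F
PS C:\> $v.FileSystem -eq 'NTFS' -and $v.DriveType -eq 'Fixed'   # must be True for a candidate volume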


IV. Enabling the data deduplication feature on Windows 8.1:

1. Prepare the CAB files (the official download page always returns an error!). Recommended download location: http://pan.baidu.com/s/1o6xEI9s. This example uses drive F:.

    • microsoft-windows-dedup-package~31bf3856ad364e35~amd64~~6.3.9600.16384

    • microsoft-windows-dedup-package~31bf3856ad364e35~amd64~en-us~6.3.9600.16384

    • microsoft-windows-fileserver-package~31bf3856ad364e35~amd64~~6.3.9600.16384

    • microsoft-windows-fileserver-package~31bf3856ad364e35~amd64~en-us~6.3.9600.16384

    • microsoft-windows-vdsinterop-package~31bf3856ad364e35~amd64~~6.3.9600.16384

    • microsoft-windows-vdsinterop-package~31bf3856ad364e35~amd64~en-us~6.3.9600.16384


2. Run the following DISM commands from an elevated (Administrator) command prompt:

    • dism /online /add-package /packagepath:microsoft-windows-vdsinterop-package~31bf3856ad364e35~amd64~~6.3.9600.16384.cab /packagepath:microsoft-windows-vdsinterop-package~31bf3856ad364e35~amd64~en-us~6.3.9600.16384.cab /packagepath:microsoft-windows-fileserver-package~31bf3856ad364e35~amd64~~6.3.9600.16384.cab /packagepath:microsoft-windows-fileserver-package~31bf3856ad364e35~amd64~en-us~6.3.9600.16384.cab /packagepath:microsoft-windows-dedup-package~31bf3856ad364e35~amd64~~6.3.9600.16384.cab /packagepath:microsoft-windows-dedup-package~31bf3856ad364e35~amd64~en-us~6.3.9600.16384.cab

    • dism /online /enable-feature /featurename:Dedup-Core /all
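To confirm that the feature was enabled, DISM can report its state (a standard /get-featureinfo query, shown here as a sketch):

    • dism /online /get-featureinfo /featurename:Dedup-Core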


3. Enable data deduplication using Windows PowerShell:

1) To enable data deduplication on a volume, run the following Windows PowerShell command on the server. In this example, data deduplication is enabled on the F: volume.

PS C:\> Enable-DedupVolume F:

2) Optionally, use the following command to set the minimum number of days a file must be kept before it is deduplicated.

PS C:\> Set-DedupVolume F: -MinimumFileAgeDays 10

3) View the volumes that have data deduplication enabled:

PS C:\> Get-DedupVolume   # Returns summary information

PS C:\> Get-DedupVolume | Format-List   # Returns detailed volume deduplication settings


Note: If MinimumFileAgeDays is set to 0, data deduplication processes all files regardless of their age. This is appropriate for a test environment where you want to maximize deduplication. In a production environment, however, it is better to wait a few days (the default is 5), because files tend to change heavily for a short period before their rate of change slows down. Waiting makes the most efficient use of server resources.
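For a disposable test volume, the zero-day setting is simply (a sketch; do not use it in production for the reasons above):

PS C:\> Set-DedupVolume F: -MinimumFileAgeDays 0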

4. Set up data deduplication optimization jobs:

The deduplication feature ships with built-in jobs that start automatically and optimize the specified volumes on a regular schedule. An optimization job deduplicates the data on the volume according to the policy settings and compresses the file chunks. After the initial optimization completes, the optimization job processes the files covered by the policy according to the job schedule you configure, or the default schedule that ships with the product.

The Start-DedupJob cmdlet triggers an optimization job:

PS C:\> Start-DedupJob -Volume F: -Type Optimization   # Starts the job asynchronously and returns immediately

PS C:\> Start-DedupJob F: -Type Optimization -Wait   # Returns only after the job has finished

The Get-DedupJob cmdlet queries the progress of a job:

PS C:\> Get-DedupJob

Get-DedupJob displays the jobs that are currently running or queued to run.
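A small polling sketch that watches the jobs until the queue drains (the 30-second interval is an arbitrary choice; Progress is one of the columns in the default output):

PS C:\> while (Get-DedupJob) { Get-DedupJob | Format-Table Type, Progress, Volume; Start-Sleep -Seconds 30 }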

The Get-DedupStatus cmdlet returns key statistics, including the savings achieved on the volume:

PS C:\> Get-DedupStatus | Format-List


Get-DedupStatus shows free space, saved space, the number of optimized files, InPolicyFiles (the number of files covered by the volume's deduplication policy, based on the configured file age, size, type, and location criteria), and the associated drive letter.
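A sketch that converts those counters into gigabytes for easier reading (property names match the output described above; the rounding is cosmetic):

PS C:\> Get-DedupStatus F: | Select-Object Volume,
            @{ Name = 'SavedSpaceGB'; Expression = { [math]::Round($_.SavedSpace / 1GB, 2) } },
            @{ Name = 'FreeSpaceGB';  Expression = { [math]::Round($_.FreeSpace / 1GB, 2) } },
            OptimizedFiles, InPolicyFiles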

Data scrubbing jobs:

PS C:\> Start-DedupJob F: -Type Scrubbing   # Creates a job that attempts to repair every corruption recorded in deduplication's internal corruption log (populated when I/O against deduplicated files fails)

PS C:\> Start-DedupJob F: -Type Scrubbing -Full   # The -Full parameter scrubs the entire deduplicated data set and looks for any corruption that could cause data access to fail


Garbage Collection Jobs:

PS C:\> Start-DedupJob F: -Type GarbageCollection   # Deletes chunks that are no longer referenced and compacts containers in which more than 5% of the data is unreferenced

PS C:\> Start-DedupJob F: -Type GarbageCollection -Full   # The -Full parameter compacts every container as much as possible
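Putting the pieces together, a periodic maintenance pass might chain the jobs with -Wait so each step finishes before the next begins (the ordering is a suggestion, not a product default):

PS C:\> Start-DedupJob F: -Type GarbageCollection -Wait
PS C:\> Start-DedupJob F: -Type Scrubbing -Wait
PS C:\> Get-DedupStatus F:   # Confirm the volume's savings afterwards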

For more details, refer to the official documentation. I wish you a happy life!

This article is from the "Heard" blog; please be sure to keep this source: http://wenzhongxiang.blog.51cto.com/6370734/1636286

