Determine correct system storage settings for compression, deduplication, erasure coding and validate expected savings
I covered Compression, Deduplication, and Erasure Coding in the NCP Study Guide. However, there’s always more to learn!
Compression
Workloads Recommended for Compression
- Almost all
Workloads Not Ideal for Compression
- Encrypted datasets
- Already compressed datasets (for example, images, audio, or video)
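To see why encrypted or already compressed data gains so little, here is a small sketch. It uses Python's standard `zlib` module as a stand-in compressor, which is an assumption for illustration and not the algorithm the storage layer uses. Random bytes act as a proxy for encrypted or compressed content; the sample data and the function name `compression_ratio` are hypothetical.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    if not data:
        raise ValueError("data must not be empty")
    return len(zlib.compress(data)) / len(data)


# Repetitive text stands in for typical, compressible workload data.
compressible = b"2024-01-01 INFO service started OK\n" * 4096

# Random bytes stand in for encrypted or already compressed data.
incompressible = os.urandom(len(compressible))

if __name__ == "__main__":
    print(f"repetitive data: {compression_ratio(compressible):.3f}")
    print(f"random data:     {compression_ratio(incompressible):.3f}")
```

The repetitive sample shrinks to a small fraction of its size, while the random sample stays at roughly its original size or grows slightly, so enabling compression on such datasets costs CPU without saving capacity.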
Deduplication
Workloads Recommended for Deduplication
- Base images (cache)—you can manually fingerprint them using vdisk_manipulator
- P2V and V2V when using Hyper-V (ODX uses a full data copy)
- Cross-container clones (not usually recommended because single containers are preferred)
Workloads Not Ideal for Deduplication
- Anything outside the recommendations above. In most cases, compression yields the highest capacity savings and should be used instead.
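A rough way to reason about expected deduplication savings is to fingerprint fixed-size chunks and count the unique ones. The sketch below does that with SHA-256 from `hashlib`. The 16 KiB chunk size, the function name `dedup_ratio`, and the sample data are assumptions for illustration, not the platform's actual fingerprinting parameters.

```python
import hashlib
import os

CHUNK_SIZE = 16 * 1024  # assumed chunk size, for illustration only


def dedup_ratio(data: bytes, chunk_size: int = CHUNK_SIZE) -> float:
    """Return unique chunks / total chunks (lower means more savings)."""
    if not data:
        raise ValueError("data must not be empty")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    fingerprints = {hashlib.sha256(chunk).digest() for chunk in chunks}
    return len(fingerprints) / len(chunks)


# Hypothetical "base image": 8 distinct chunks of random data.
base_image = os.urandom(8 * CHUNK_SIZE)

# Four full copies of the base image, as with repeated full-copy clones.
clones = base_image * 4

# Unrelated data with no repeated chunks.
unique_data = os.urandom(32 * CHUNK_SIZE)

if __name__ == "__main__":
    print(f"cloned base image: {dedup_ratio(clones):.2f}")
    print(f"unique data:       {dedup_ratio(unique_data):.2f}")
```

Only data made of repeated full copies benefits here; data without repeated chunks keeps a ratio of 1.0, which matches the guidance that deduplication pays off for a narrow set of workloads.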
Erasure Coding
EC-X pairs perfectly with inline compression; you can safely enable them together for maximum efficiency.
The savings from erasure coding depend on the cluster size and on how cold the data is.
Consider a 6-node cluster configured with redundancy factor 2. A strip size of 5 is possible: 4 nodes for data and 1 node for parity. Because the data and parity in the erasure-coded strip occupy five nodes, one node is left over, so if a node fails, a node is still available for the rebuild. With a (4, 1) strip, the overhead is 25% (1 parity block for every 4 data blocks). Without erasure coding, the overhead is 100%.
So what savings can be attributed to EC-X?
With a replication factor of 2 (RF2), about 50% of raw storage capacity is usable. With a (4, 1) strip, EC-X can raise this to as much as 80% for the data that has been erasure coded.
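The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the `strip_fits` rule (strip width must be smaller than the node count) is a simplification taken from the 6-node example, not a complete statement of how the platform sizes strips.

```python
def ec_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Parity overhead relative to data, e.g. (4, 1) -> 0.25."""
    if data_blocks <= 0 or parity_blocks <= 0:
        raise ValueError("block counts must be positive")
    return parity_blocks / data_blocks


def ec_usable_fraction(data_blocks: int, parity_blocks: int) -> float:
    """Fraction of raw capacity that holds data in an erasure-coded strip."""
    if data_blocks <= 0 or parity_blocks <= 0:
        raise ValueError("block counts must be positive")
    return data_blocks / (data_blocks + parity_blocks)


def replication_usable_fraction(replication_factor: int) -> float:
    """Fraction of raw capacity that holds data with plain replication."""
    if replication_factor <= 0:
        raise ValueError("replication factor must be positive")
    return 1 / replication_factor


def strip_fits(nodes: int, data_blocks: int, parity_blocks: int) -> bool:
    """Assumed rule from the example: keep at least one node free for rebuild."""
    return data_blocks + parity_blocks < nodes


if __name__ == "__main__":
    print(f"(4, 1) overhead:      {ec_overhead(4, 1):.0%}")
    print(f"(4, 1) usable:        {ec_usable_fraction(4, 1):.0%}")
    print(f"RF2 usable:           {replication_usable_fraction(2):.0%}")
    print(f"fits on 6 nodes:      {strip_fits(6, 4, 1)}")
```

For the 6-node example, the (4, 1) strip gives 25% overhead and 80% usable capacity, against 100% overhead and 50% usable capacity for RF2. These figures apply only to data that has actually been erasure coded, so real savings depend on how much of the data is cold.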
Workloads Recommended for Erasure Coding
- Write once, read many (WORM) workloads
- Backups
- Archives
- File servers
- Log servers
- Email (depending on usage)
Workloads Not Ideal for Erasure Coding
- Anything write- or overwrite-intensive
- VDI