Determine what capacity optimization method(s) should be used based on a given workload
Erasure Coding
- Similar to RAID, where parity is calculated, EC encodes a strip of data blocks located on different nodes and calculates parity
- In the event of a failure, the parity is used to calculate the missing data blocks (decoding)
- A data block is an extent group, and each block in the strip is on a different node and belongs to a different vDisk
- The number of data blocks/parity blocks is configurable based on the desired failures to tolerate
EC Strip Size:
- Ex. RF2 = N+1
- 3 or 4 data blocks + 1 parity block = 3/1 or 4/1
- Ex. RF3 = N+2
- 3 or 4 data blocks + 2 parity blocks = 3/2 or 4/2
Overhead:
- Recommended to have a cluster size at least 1 node larger than the combined strip size (data + parity); see the sketch below
- Allows for rebuilding in the event of a failure
- Ex. a 4/1 strip would need a cluster of at least 6 nodes
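A minimal sketch (not Nutanix code) of the space math behind these examples: a d/p strip keeps d/(d+p) of raw capacity usable, and the recommended minimum cluster size is one node more than the combined strip.

```python
# Rough sketch: compare usable-to-raw ratios for RF vs. EC strips and the
# minimum recommended cluster size (data + parity + 1 spare node for rebuilds).

def rf_overhead(rf: int) -> float:
    """Usable-to-raw ratio for replication factor rf (e.g., RF2 stores 2 copies)."""
    return 1 / rf

def ec_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Usable-to-raw ratio for an EC strip of data_blocks + parity_blocks."""
    return data_blocks / (data_blocks + parity_blocks)

def min_cluster_size(data_blocks: int, parity_blocks: int) -> int:
    """At least one node more than the combined strip size, to allow rebuilds."""
    return data_blocks + parity_blocks + 1

for d, p in [(3, 1), (4, 1), (3, 2), (4, 2)]:
    print(f"{d}/{p} strip: usable {ec_overhead(d, p):.0%} of raw, "
          f"min cluster size {min_cluster_size(d, p)} nodes")
print(f"RF2: usable {rf_overhead(2):.0%} of raw; RF3: usable {rf_overhead(3):.0%}")
```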
- Encoding is done post-process leveraging Curator MapReduce framework

- When Curator scan runs, it finds eligible extent groups to be encoded.
- Must be “write-cold” = not written to for more than 1 hour
- Tasks are distributed/throttled via Chronos
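A rough sketch of how a periodic scan could select write-cold candidates for encoding, assuming an illustrative ExtentGroup record with a last_write_time field (not actual Curator code):

```python
# Illustrative only: pick "write-cold" extent groups (no writes for more than
# an hour) as erasure-coding candidates during a periodic scan.
import time
from dataclasses import dataclass

WRITE_COLD_SECONDS = 60 * 60  # write-cold = not written to for > 1 hour

@dataclass
class ExtentGroup:
    egroup_id: int
    last_write_time: float  # epoch seconds of the last write
    encoded: bool = False

def find_ec_candidates(egroups, now=None):
    now = now or time.time()
    return [eg for eg in egroups
            if not eg.encoded and (now - eg.last_write_time) > WRITE_COLD_SECONDS]
```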


- EC pairs well with Inline Compression
Compression
The Capacity Optimization Engine (COE) performs data transformations to increase data efficiency on disk
Inline
- Sequential streams of data or large I/Os are compressed in memory before being written to disk
- Random I/O’s are written uncompressed to OpLog, coalesced, and then compressed in memory before being written to Extent Store
- Leverages Google Snappy compression library
- For inline compression, set the Compression Delay to 0 minutes
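A hedged sketch of the inline write-path decision above; the 64K “large I/O” cutoff and the helper shape are assumptions, and zlib only stands in for the compression library:

```python
# Sketch of the inline-compression decision (not AOS internals).
import zlib  # stand-in codec for the sketch; the notes mention Google Snappy

LARGE_IO_BYTES = 64 * 1024  # assumed "large I/O" cutoff, for illustration only

def write_io(data: bytes, sequential: bool, compression_delay_min: int) -> bytes:
    inline = (compression_delay_min == 0)
    if inline and (sequential or len(data) >= LARGE_IO_BYTES):
        # Sequential/large I/O: compress in memory, then write to the Extent Store.
        return zlib.compress(data)
    # Random or small I/O: lands uncompressed in the OpLog first; it is
    # coalesced and compressed later before draining to the Extent Store.
    return data
```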

Offline
- New write I/O is written in an uncompressed state following the normal I/O path
- After the compression delay is met, the data is considered cold (migrated down to the HDD tier via ILM) and can be compressed
- Leverages Curator MapReduce framework
- All nodes perform compression task
- Throttled by Chronos
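A small sketch of the offline (post-process) eligibility check, with illustrative field names:

```python
# Sketch: data written uncompressed becomes a compression candidate once the
# configured delay has passed. Field names are illustrative only.
import time

def offline_compression_candidates(extents, compression_delay_min: int, now=None):
    now = now or time.time()
    delay_s = compression_delay_min * 60
    # Curator-style scan: pick cold, still-uncompressed extents; the actual
    # compression tasks are then distributed across nodes and throttled.
    return [e for e in extents
            if not e.get("compressed") and (now - e["last_write_time"]) >= delay_s]
```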

- For read I/O, data is decompressed in memory and then the I/O is served
- Heavily accessed data is decompressed in the HDD tier, and ILM moves it up to the SSD tier and/or cache

Elastic Dedupe Engine
- Allows for dedupe in capacity and performance tiers
- Streams of data are fingerprinted during ingest using a SHA-1 hash at 16K granularity (see the sketch below)
- Fingerprints are stored persistently as part of the written block’s metadata
- Data that can be deduplicated isn’t scanned or re-read; duplicate copies are just removed
- Fingerprint refcounts are monitored to track dedupability

- Intel hardware acceleration is leveraged for SHA-1 computation
- When not done on ingest, fingerprinting is done as a background process
- Where duplicates are found, a background process removes the duplicate data using the DSF MapReduce framework (Curator)
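A simplified sketch of fingerprint-based dedupe as described above (SHA-1 over 16K chunks with refcounts); this is illustrative only, not Elastic Dedupe Engine internals:

```python
# Sketch: fingerprint 16K chunks with SHA-1, keep one copy per fingerprint,
# and track refcounts to see how dedupable the data is.
import hashlib
from collections import Counter

CHUNK = 16 * 1024

def fingerprints(stream: bytes):
    """Yield (offset, SHA-1 fingerprint) per 16K chunk of the ingested stream."""
    for off in range(0, len(stream), CHUNK):
        yield off, hashlib.sha1(stream[off:off + CHUNK]).hexdigest()

refcounts = Counter()  # fingerprint -> number of references (dedupability)
store = {}             # fingerprint -> single stored copy of the chunk

def ingest(stream: bytes):
    for off, fp in fingerprints(stream):
        refcounts[fp] += 1
        # Duplicate chunks are not stored again; only metadata is updated.
        store.setdefault(fp, stream[off:off + CHUNK])
```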

Global Deduplication
- DSF can dedupe by just updating metadata pointers
- Same concept in DR/Replication
- Before sending data over the wire, DSF queries the remote site to check for the fingerprint(s) on the target
- If the fingerprints don’t exist, the data is compressed and sent to the target
- If the data already exists, no data is sent and only metadata is updated
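A sketch of that replication-side check, using hypothetical query_remote_fingerprints and send_chunk helpers:

```python
# Sketch: ask the remote site which fingerprints it already has and only ship
# the missing chunks; existing chunks need only a metadata update on target.
import zlib  # stand-in for the wire compression step

def replicate(chunks: dict, query_remote_fingerprints, send_chunk):
    """chunks: fingerprint -> chunk data for the extents being replicated."""
    missing = set(query_remote_fingerprints(list(chunks)))  # fingerprints the target lacks
    for fp, data in chunks.items():
        if fp in missing:
            send_chunk(fp, zlib.compress(data))  # compress, then send over the wire
        # else: data already exists on the target; only metadata is updated there
```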

Storage Tiering + Prioritization
- ILM responsible for triggering data movement events
- Keeps hot data local to the node running the VM
- ILM constantly monitors I/O patterns and down/up migrates as necessary
- Local node SSD = highest priority tier for all I/O
- When local SSD utilization is high, disk balancing kicks in to move the coldest data on the local SSDs to other SSDs in the cluster
- All CVMs + SSDs are used for remote I/O to eliminate bottlenecks
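A rough sketch of the down-migration idea: once the local SSD tier crosses a utilization threshold (the 75% value here is an assumption), the coldest extents move down first:

```python
# Sketch of ILM down-migration: when the SSD tier is too full, move the least
# recently accessed extents down to HDD until utilization drops back under
# the threshold. Threshold and field names are illustrative.
SSD_UTIL_THRESHOLD = 0.75  # assumed value for the sketch

def ilm_down_migrate(ssd_extents, ssd_capacity_bytes, migrate):
    used = sum(e["size"] for e in ssd_extents)
    if used / ssd_capacity_bytes <= SSD_UTIL_THRESHOLD:
        return
    # Coldest first: least recently accessed extents leave the SSD tier first.
    for extent in sorted(ssd_extents, key=lambda e: e["last_access_time"]):
        migrate(extent, target_tier="HDD")
        used -= extent["size"]
        if used / ssd_capacity_bytes <= SSD_UTIL_THRESHOLD:
            break
```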



Data Locality
- VM data is served locally from the CVM and stored on local disks under the CVM’s control
- When reading old data (after an HA event, for instance), the I/O will be forwarded by the local CVM to the remote CVM
- DSF will migrate data locally in the background
- Cache Locality: vDisk data stored in Unified Cache. Extents may be remote.
- Extent Locality: vDisk extents are on same node as VM.

- Cache locality determined by vDisk ownership
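A sketch of the read path implied above, with illustrative names: serve locally when the extent is local, otherwise forward to the remote CVM and schedule background localization:

```python
# Sketch: local read when extent locality holds; otherwise the local CVM
# forwards the read to the remote CVM and the data is migrated back locally
# in the background. All names are illustrative.
def read_extent(extent, local_node, forward_to, schedule_migration):
    if extent["owner_node"] == local_node:
        return extent["data"]                                 # extent locality: local read
    data = forward_to(extent["owner_node"], extent["id"])     # remote read via CVM
    schedule_migration(extent["id"], target=local_node)       # background localization
    return data
```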
Disk Balancing
- Works on a node’s utilization of its local storage
- Integrated with DSF ILM
- Leverages Curator
- Scheduled process
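An illustrative balancing pass: if the utilization spread between nodes gets too wide, move cold data from the fullest node toward the emptiest (the 15% spread threshold is an assumption):

```python
# Sketch of a scheduled disk-balancing pass; move_cold_data is a hypothetical
# helper representing the Curator-driven data movement.
def balance(node_utilization: dict, move_cold_data, spread_threshold=0.15):
    """node_utilization: node_id -> fraction of local storage used."""
    fullest = max(node_utilization, key=node_utilization.get)
    emptiest = min(node_utilization, key=node_utilization.get)
    if node_utilization[fullest] - node_utilization[emptiest] > spread_threshold:
        move_cold_data(src=fullest, dst=emptiest)
```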


- With a “storage only” node, more of the node’s memory can be given to the CVM, allowing a much larger read cache
