Host Disconnects Due to NFS

Ah, another lovely Tuesday. The Monday fires have been put out, TPS reports have all been filed (with the new cover letter no less), and now we can get back to focusing on innovating. Not so fast. It looks like one of our hosts has disconnected from vCenter. This should be fun! All of the…

Determine ESXi Installation Media

I was welcomed to my Monday morning with an interesting SolarWinds alert for one of my ESXi hosts: Direct, Local USB Direct-Access (mpx.vmhba32:C0:T0:L0) has a current status of Critical. Awesome! I love it when USB devices fail. But wait, something didn’t add up. It was my understanding that none of my hosts are USB…

Create and modify a Protection Domain

You configure a protection domain by using the Async DR feature (up to 1-hour RPO) or NearSync DR (up to 1-minute RPO) and defining a group of entities (VMs and volume groups) that are backed up locally on a cluster and optionally replicated to one or more remote sites. See the DR best practices guide from https://portal.nutanix.com/#/page/solutions for guidance…

Configure a Remote Site

A remote site is the target location that stores replicated data for protection domains. The remote site can be either another physical cluster or a cluster located in a public cloud. To configure a remote physical cluster that can be used as a replication target, do the following: Network and vStore Mappings are used to…

Explain failover and failback processes

After a protection domain is replicated to at least one remote site, you can carry out a planned migration of the contained entities by failing over the protection domain. You can also trigger failover in the event of a site disaster. Failover and failback events re-create the VMs and volume groups at the other site,…

Describe and differentiate Nutanix data protection technologies such as NearSync, Cloud Connect, and Protection Domains

NearSync: Building upon the traditional asynchronous (async) replication capabilities mentioned previously, Nutanix has introduced support for near synchronous replication (NearSync). NearSync provides the best of both worlds: zero impact to primary I/O latency (like async replication) in addition to a very low RPO (like sync replication (metro)). This allows users to have a very low RPO…

Describe and differentiate component, service, and CVM failover processes such as Disk Failure, CVM Failure, and Node Failure

Disk Failure: Monitored via SMART data; Hades is responsible for monitoring. VM impact: HA event: no; failed I/O: no; latency: no. In the event of a failure, a Curator scan occurs immediately. It scans metadata to find data previously hosted on the failed disk and re-replicates (distributes) it to nodes throughout the cluster. All CVMs participate. CVM Failure: Failure = I/Os redirected to other…

Identify Data Resiliency requirements and policies related to a Nutanix Cluster

Data Resiliency Levels

The following table shows the level of data resiliency (simultaneous failure) provided for the following combinations of replication factor, minimum number of nodes, and minimum number of blocks.

Replication Factor | Minimum Number of Nodes | Minimum Number of Blocks | Data Resiliency
2 | 3 | 1 | 1 node or 1 disk failure
2 | 3 | 3 | …
