This post is my personal study note on the Nutanix Hyperconverged Solution, drafted during my own knowledge update. It is not intended to cover every part of the solution, and some notes are based on my own understanding. My intention in drafting this note is to outline the key solution elements for quick readers.
This is the first part of the full set of notes.
Solution Components
The Nutanix solution is composed of the components below:
- Hardware: Nutanix Node/Block
- Software: Nutanix Controller VM & Acropolis Operating System (AOS)
  - Distributed Storage Fabric (DSF)
  - App Mobility Fabric (AMF)
  - Acropolis Hypervisor (AHV)
- Software: Industry-standard hypervisor (ESXi, Hyper-V, or AHV)
- Software: Nutanix Prism Element/Central
Nutanix Node
Nutanix Node – Standard x86 server
- Processors
- Memory
- Local storage (SSD & HDD)
  - The home partition is mirrored across the first two SSDs, metadata is mirrored across SSDs, and the OpLog is distributed across SSDs.
  - A Curator reservation is kept on each SSD/HDD.
- Five network adapters
  - Two 10GbE adapters (empty SFP+ ports)
  - Two 1GbE adapters
  - One 10/100 Ethernet adapter for out-of-band management (CIMC)
A Nutanix block is a hardware/software bundle housing up to four nodes in 2RU. Node naming within a block:
- A (front left, bottom)
- B (front left, top)
- C (front right, top)
- D (front right, bottom)
Disk configuration (per node):
- NX-1050 (1 metadata SSD + 4 data HDD)
- NX-3050/3060 (1 metadata SSD + 1 hot-tier data SSD + 4 data HDD)
- SX-1065-G5, SX-1065S, SX-1065-G4, NX-1020 (1 SSD + 2 data HDD)
Controller VM
The Controller VM (CVM) is composed of the components below:
- Stargate – Data I/O management for the cluster (moves data between the hypervisor and Nutanix storage)
- Medusa – Access interface for Cassandra (abstraction layer over Cassandra)
- Cassandra – Distributed metadata store (runs on each node as a distributed database)
- Zeus – Access interface for ZooKeeper
- Prism – Management interface (UI, nCLI, and API)
- Zookeeper – Manages cluster configuration (runs on 3-5 nodes with one leader for write operations)
- Curator – MapReduce-based cleanup for the cluster (runs on every node with an elected master)
Some other services include (see the summary sketch after this list):
- Genesis – Cluster component and service manager (runs on each node)
- Chronos – Job and task scheduler (runs on each node with an elected master)
- Cerebro – Replication/DR manager (runs on each node with an elected master)
- Pithos – vDisk configuration manager
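The service list above is easier to scan as a single lookup table. Below is a minimal Python sketch (an illustrative data structure only, not any Nutanix API) that records each service, its role, and where the notes say it runs; entries the notes do not specify are left as None.

```python
# Illustrative summary of the CVM services described above (not a Nutanix API).
CVM_SERVICES = {
    # service       (role,                                   where it runs, per the notes)
    "Stargate":    ("data I/O management for the cluster",   None),
    "Medusa":      ("access interface for Cassandra",        None),
    "Cassandra":   ("distributed metadata store",            "every node"),
    "Zeus":        ("access interface for ZooKeeper",        None),
    "Prism":       ("management interface (UI, nCLI, API)",  None),
    "Zookeeper":   ("cluster configuration",                 "3-5 nodes, one leader for writes"),
    "Curator":     ("MapReduce cleanup for the cluster",     "every node, elected master"),
    "Genesis":     ("cluster component and service manager", "every node"),
    "Chronos":     ("job and task scheduler",                "every node, elected master"),
    "Cerebro":     ("replication/DR manager",                "every node, elected master"),
    "Pithos":      ("vDisk configuration manager",           None),
}

def services_with_elected_master():
    """List the services the notes describe as electing a master/leader."""
    return [name for name, (_, placement) in CVM_SERVICES.items()
            if placement and ("master" in placement or "leader" in placement)]

print(services_with_elected_master())  # ['Zookeeper', 'Curator', 'Chronos', 'Cerebro']
```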
The Controller VM network interfaces are configured as below:
- Network: Backplane LAN – eth2
- Network: Hypervisor LAN – eth1
- Network: Management LAN – eth0
Nutanix Cluster
General Cluster
- Minimum of three nodes; no upper limit on node count
- Maximum of 12 nodes with a Starter license
- An Acropolis slave runs on every CVM, with an elected Acropolis master (scheduling, execution, IPAM, etc.)
A single-node cluster is supported for running a limited number of VMs:
- Requires AOS 5.5 or later
- Not the same as a single-node replication target (VM creation and snapshot restoration are not supported on a replication target)
- Two SSDs with a minimum of two HDDs
- A single SSD failure puts the node into read-only mode; it returns to normal once an SSD holding the Cassandra data is available again. (An override mode is provided, but it is not best practice.)
- Read operations work the same as in a multi-node cluster
- Write operations replicate the write to two different disks on the same node
- No cluster expansion
- No Encryption
File System – DSF
Acropolis Distributed Storage Fabric (DSF)
- Storage Pool – A group of physical storage devices (HDDs/SSDs); can span multiple Nutanix nodes.
- Storage Container – A logical segmentation of the storage pool that contains VMs or files; mapped to hosts via NFS/SMB.
- vDisk – A subset of available storage in a container providing storage for a VM, composed of vBlocks. For NFS containers, vDisk creation is handled by the cluster.
- vBlock – A 1 MB chunk of the vDisk address space (see the sketch after this list).
- Volume Group – A collection of logically related virtual disks; provides benefits for backup, protection, restoration, and migration.
- Datastore/SMB share – A logical container for files necessary for VM operations.
- The CVM accesses the SCSI controller directly (ESXi: VMDirectPath I/O; Hyper-V: pass-through disks).
In general, for a single cluster, one storage pool with one container using all available storage will suit the needs of most customers.
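To make the vDisk-to-vBlock relationship concrete, here is a minimal Python sketch based only on what the notes state (a vBlock is a 1 MB chunk of vDisk address space). The function names are illustrative, not Nutanix code.

```python
# Minimal sketch: map a vDisk byte offset to the vBlock(s) it falls into,
# assuming only the 1 MB vBlock size stated in the notes.
VBLOCK_SIZE = 1 * 1024 * 1024  # 1 MB of vDisk address space per vBlock

def vblock_index(offset_bytes: int) -> int:
    """Return the vBlock index that a given vDisk byte offset falls into."""
    return offset_bytes // VBLOCK_SIZE

def vblocks_for_io(offset_bytes: int, length_bytes: int) -> range:
    """Return the range of vBlock indexes touched by an I/O."""
    first = vblock_index(offset_bytes)
    last = vblock_index(offset_bytes + length_bytes - 1)
    return range(first, last + 1)

# Example: a 256 KB write starting near the 4 MB mark spans vBlocks 3 and 4.
print(list(vblocks_for_io(int(3.9 * 1024 * 1024), 256 * 1024)))  # [3, 4]
```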
DSF uses multiple storage tiers (MapReduce tiering technology): MapReduce tiering migrates data between SSD and HDD depending on the data temperature, as sketched below.
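The sketch below is a toy Python illustration of that idea: hot data stays on SSD, and when the SSD tier fills up the coldest data is migrated down to HDD. The utilization threshold and the "demote the coldest quarter" rule are assumptions for illustration, not documented Nutanix internals.

```python
# Toy tiering policy: demote the coldest SSD extents once utilization is high.
SSD_UTILIZATION_THRESHOLD = 0.75  # assumed value, not a documented Nutanix setting

def extents_to_demote(extents, ssd_utilization):
    """extents: list of (extent_id, last_access_timestamp) currently on SSD.
    Returns the extent ids this toy policy would migrate down to HDD."""
    if ssd_utilization < SSD_UTILIZATION_THRESHOLD:
        return []                                         # hot tier still has room
    coldest_first = sorted(extents, key=lambda e: e[1])   # oldest access first
    demote = coldest_first[: max(1, len(coldest_first) // 4)]
    return [extent_id for extent_id, _ in demote]

# Example: the SSD tier is 80% full, so the coldest extent gets demoted.
print(extents_to_demote([("e1", 100), ("e2", 500), ("e3", 900)], 0.80))  # ['e1']
```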
Storage Capacity Optimization:
- Erasure Coding
  - Increases efficiency by performing erasure coding on cold data
  - Data blocks are written to two or three nodes initially, and erasure coding is performed later to reclaim capacity (see the sketch after this list)
- Compression
  - Option to choose post-process or inline compression
  - Post-process compression needs a delay time defined by the customer; there is no single recommended value (a 4-6 hour delay is typical for general user data and file servers)
  - Inline compression is recommended for workloads that perform batch processing
- Deduplication
  - Enabled per container or vDisk
  - Choice of cache or capacity deduplication
  - Cache deduplication applies to the read cache and is disabled by default; requires a Starter or higher license
  - Capacity deduplication applies to persistent data and is disabled by default; requires a Pro or higher license and is enabled together with cache deduplication
  - RAM requirement: 24 GB for cache deduplication, 32 GB for capacity deduplication
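As a back-of-the-envelope illustration of why erasure coding cold data saves capacity compared with keeping full replicas, here is a small Python sketch. The 4+1 and 4+2 strip sizes are assumptions for the example; actual strip sizes depend on cluster size and configuration.

```python
# Usable-capacity comparison: full replication vs. an erasure-coded strip.
def usable_fraction_rf(replication_factor: int) -> float:
    """Usable fraction when every block is stored as RF full copies."""
    return 1.0 / replication_factor

def usable_fraction_ec(data_blocks: int, parity_blocks: int) -> float:
    """Usable fraction for an erasure-coded strip of data + parity blocks."""
    return data_blocks / (data_blocks + parity_blocks)

print(f"RF2:    {usable_fraction_rf(2):.0%} usable")     # 50%
print(f"EC 4+1: {usable_fraction_ec(4, 1):.0%} usable")  # 80%
print(f"EC 4+2: {usable_fraction_ec(4, 2):.0%} usable")  # 67%
```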
Write IO Data Flow
- Data I/O is passed from the VM to the Controller VM
- The Controller VM writes the I/O to the OpLog portion of the metadata SSD
- The data is then replicated to the OpLog on metadata SSDs of other nodes
- The data in the OpLog is drained asynchronously to the lower-tier extent store by ILM (sketched below)
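Here is a minimal Python sketch of that write path as the notes describe it: land the write in the local OpLog, replicate synchronously to OpLogs on other nodes, acknowledge the guest, and drain to the extent store asynchronously. The class and function names are illustrative, not Nutanix code.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    oplog: list = field(default_factory=list)         # SSD-backed write log
    extent_store: list = field(default_factory=list)  # persistent tier

def write_io(local: Node, peers: list, data: bytes, rf: int = 2) -> str:
    """Synchronous step: land the write in rf OpLogs, then ack the guest VM."""
    for node in [local] + peers[: rf - 1]:
        node.oplog.append(data)
    return "ack"   # the guest sees the ack only after rf copies exist

def drain_oplog(node: Node) -> None:
    """Asynchronous step: drain OpLog data down to the extent store (ILM)."""
    node.extent_store.extend(node.oplog)
    node.oplog.clear()

# Example: an RF2 write lands on the local node and one peer, then drains later.
n1, n2 = Node("A"), Node("B")
write_io(n1, [n2], b"block-0")
drain_oplog(n1)
print(len(n1.extent_store), len(n2.oplog))  # 1 1
```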
Read IO Data Flow
- A read can be initiated from any node.
- If the local cache (OpLog or unified cache) does not have a copy, the local extent store is referenced.
- If no local copy is available, a remote copy is fetched and stored locally for future reads (sketched below).
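And a matching sketch of the read path: check the local cache, then the local extent store, and finally fetch a remote copy and keep it locally for future reads. All names here are illustrative.

```python
# Minimal read-path sketch following the lookup order in the notes.
def read_block(block_id, local_cache, local_extent_store, fetch_remote):
    """local_cache / local_extent_store: dicts of block_id -> data;
    fetch_remote: callable that retrieves the block from another node."""
    if block_id in local_cache:            # OpLog or unified cache hit
        return local_cache[block_id]
    if block_id in local_extent_store:     # local extent store hit
        return local_extent_store[block_id]
    data = fetch_remote(block_id)          # fall back to a remote copy
    local_extent_store[block_id] = data    # keep it locally for future reads
    return data

# Example: the first read misses locally and is fetched; later reads are local.
extent_store = {}
fetch = lambda _id: b"remote-data"
print(read_block("blk1", {}, extent_store, fetch))   # b'remote-data'
print("blk1" in extent_store)                        # True
```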
To be continued …