Presenters: Fei Guo & Rajesh Venkatasubramanian
The session was packed and there was a long lineup of folks on standby. In that sense I was glad I signed up early for this session.
Rajesh began with an overview of the types of memory issues usually observed in VMware environments. Active and idle memory is where our focus should be from an overcommitment standpoint.
Some general concepts and best practices were discussed, summarized below.
Memory undercommit
- Sum (memory size of all VMs in a host) <= host memory
- There is no reclamation of guest memory even if a VM has lots of free/idle memory
- ESXi always has some free memory as a result
Memory overcommit
- Sum (memory size of all VMs in a host) > host memory
- ESX may map only a subset of VM memory
- As a result ESXi may want to give a VM more memory but cannot do so directly; under contention, memory must be reclaimed from other VMs (e.g. through ballooning or by swapping pages to disk)
Memory reclamation –
ESXi samples 100 pages at random from a VM's memory and tracks how many of them are touched during the sampling interval. Repeating this over time yields an estimate of the VM's active memory, and this estimate feeds the memory reclamation features.
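If I understood the sampling correctly, it works roughly like this (a minimal Python sketch; the page counts, seed, and "25% touched" fraction are my own illustrative numbers, not ESXi internals):

```python
import random

def estimate_active_memory(total_pages, touched_pages, sample_size=100):
    """Estimate the fraction of guest memory that is active by sampling
    pages at random and checking whether each was recently touched."""
    sample = random.sample(range(total_pages), sample_size)
    touched = sum(1 for page in sample if page in touched_pages)
    return touched / sample_size  # estimated active fraction

# Hypothetical VM: 262144 pages (1 GB of 4 KB pages), first 25% recently touched
random.seed(7)  # fixed seed so the sketch is repeatable
total = 262144
touched_set = set(range(total // 4))
print(estimate_active_memory(total, touched_set))
```

Sampling only 100 pages keeps the overhead negligible while still converging on a usable active-memory estimate over repeated intervals.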
- Configured memory size – amount of memory guest sees.
- Memory reservation – specifies the guaranteed minimum. For VMs whose memory utilization is consistently high, set the reservation above active memory.
- Memory limit – specifies the maximum. Where application owners won't allow appropriate memory settings (right-sizing), set a memory limit to avoid memory sitting unused.
- Memory shares – specifies the relative priority of VM
- Idle memory – an 'idle memory tax' reduces a VM's entitlement for memory it holds idle rather than actively uses
Memory reclamation techniques
- Transparent page sharing – background process for removing duplicate memory pages
- Ballooning – “Pushes” memory pressure from ESX host into VM
- Compression – “zips” memory instead of swapping it
- Swapping – writes memory to disk
Transparent page sharing – a low CPU cost technique to de-dupe identical memory pages
Periodically scans pages and finds duplicates using content hashing
ESXi has a background thread that scans memory in 4 KB pages and computes a hash of each page's content. When pages match, they are collapsed into a single copy – e.g. all zero pages are collapsed into a single zero page.
Scanning is done at a very low rate, so CPU usage is minimal. This technique is preferred because it uses the least resources.
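A toy version of the content-hashing approach (Python sketch; note that real TPS also does a full byte-by-byte comparison after a hash match before collapsing pages, which this sketch omits):

```python
import hashlib

def share_pages(pages):
    """Collapse identical 4 KB pages into one shared copy by hashing
    their content -- a toy model of transparent page sharing.
    (Real TPS verifies byte-by-byte after a hash match.)"""
    unique = {}    # content hash -> single shared copy
    mapping = []   # per-page hash, indexing into the shared copies
    for page in pages:
        h = hashlib.sha1(page).hexdigest()
        if h not in unique:
            unique[h] = page
        mapping.append(h)
    return unique, mapping

# A few VMs' worth of pages, several of them all-zero
zero = bytes(4096)
pages = [zero, zero, b"A" * 4096, zero, b"A" * 4096]
unique, mapping = share_pages(pages)
print(len(pages), "pages backed by", len(unique), "physical copies")
# prints: 5 pages backed by 2 physical copies
```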
VMware uses a balloon driver inside the VM. When memory needs to be reclaimed, ESXi inflates the balloon: the driver allocates and pins guest memory up to the required target, so the guest OS does not swap out those pages. Since the balloon driver is never going to use that memory, ESXi can free the backing machine memory and provide it to another VM. If more memory must be reclaimed, ESXi inflates the balloon further.
A couple of important things to note –
Ballooning != guest paging
Guest OS paging is a possible side effect of ballooning but is not desired
The guest gives up cached pages first when extra ballooning is required. If ESXi keeps inflating the balloon, it can start hitting actual application pages; the guest OS then has to write those pages out, which leads to guest swapping.
Very few large pages are available when ballooning is used.
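A toy model of the inflation step (Python; the function name and page-list representation are invented for illustration — the real driver is the vmmemctl module shipped with VMware Tools):

```python
def inflate_balloon(balloon, guest_free_pages, pages_needed):
    """Toy balloon inflation: the in-guest driver allocates and pins
    pages so the guest OS won't touch them, letting the host reclaim
    the backing machine pages for another VM. Once the guest's free
    pages run out, further inflation would force guest paging."""
    for _ in range(pages_needed):
        if not guest_free_pages:
            break  # guest would have to page out to inflate further
        balloon.append(guest_free_pages.pop())
    return len(balloon)  # pages the host can now reclaim

balloon = []
free = list(range(10))  # hypothetical free guest pages
print(inflate_balloon(balloon, free, 4))  # prints: 4
```

The `break` is where the "side effect" above kicks in: once free (and then cached) pages are exhausted, continuing to inflate pushes the guest into paging out application memory.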
Swapping and compression
- When page sharing & ballooning do not reclaim enough, swapping and compression are used.
- ESX lacks knowledge of importance of each piece of VM memory
- Randomly chooses a page and compresses or swaps it
- Compressing memory avoids the disk reads/writes that swapping it would incur. Additionally, swapping to SSD is introduced
When to balloon, compress or swap?
If the host has more than 4% free memory, ballooning is used. If free memory becomes critically low, compression is used. If a page cannot be compressed to half its size or less, it is swapped instead.
Additionally, if ballooned memory is > 0 then swap is also used
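The decision flow above can be sketched like this (a toy Python model; the real ESXi logic uses several free-memory states and combines techniques, so treat this as illustrative only):

```python
def choose_reclamation(free_pct, compresses_to_half):
    """Simplified model of the thresholds described above.
    free_pct: host free memory as a percentage of total.
    compresses_to_half: whether the candidate page shrinks to <= 50%
    of its size (otherwise compression isn't worth it).
    Illustrative only -- real ESXi combines techniques per state."""
    if free_pct > 4.0:
        return "balloon"   # balloon while free memory is above 4%
    # free memory critically low: compress if worthwhile, else swap
    return "compress" if compresses_to_half else "swap"

print(choose_reclamation(10.0, True))   # balloon
print(choose_reclamation(2.0, True))    # compress
print(choose_reclamation(2.0, False))   # swap
```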
Using vCOps we can monitor these parameters at the cluster/datacenter level
Page Sharing & Large Pages
Intra VM – self sharing within a VM
Inter VM – sharing across VMs.
For VDI – 80% of memory sharing is Intra VM sharing
Workload homogeneity mostly affects inter-VM sharing
Host features –
- ESXi does not share large pages
- Page sharing scanning thread still works (generates hashes)
Why Large Pages
- Fewer TLB misses
- Faster page table lookup time
- ESX enables host large pages by default
- Large pages bring higher memory pressure due to no sharing
- Large pages can be broken when any small page is ballooned or swapped (sharing happens thereafter)
- When ballooning or swapping happens the total amount of large pages is reduced. Large pages delay page sharing until memory is overcommitted.
- Don’t disable page sharing (even if host large pages are used)
- By default host large pages are enabled
- Disable ONLY when high consolidation ratio is desired as in VDI
- Install VMware Tools and enable ballooning on all VMs
- Ballooning is much better for performance than host swapping
- Provide sufficient swap space inside guest.
- Place guest swap file/partition on separate disk to allow monitoring guest swap activity through virtualDisk stats
- Swap to SSD vs Swap to HDD – even if smaller SSD is used, swapping out to SSD helps greatly. Performance improves by more than 50% because SSD read latency is much lower than disk
- Don’t disable memory compression
- Host cache is ‘nice to have’
- Host cache can be a small portion of SSD (say 20%)
- Too big host cache potentially is a waste
Which statistics to watch
- mem.swapInRate – a constant nonzero value indicates a performance problem
- mem.latency – % of time waiting for decompression and swap-in. Estimate the performance impact due to compression and swapping
- mem.active – if active is low, reclaimed memory is less likely to be a problem
- virtualDisk.readRate & virtualDisk.writeRate (VM) – for the virtual disk that has the VM’s swap partition. The larger the numbers, the more in-guest swapping is happening
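To make the "constant nonzero swapInRate" check concrete, here is a small sketch (the stat name matches the vSphere counter, but the sample format and flagging logic are my own):

```python
def flag_memory_problems(samples):
    """Flag VMs whose mem.swapInRate stays nonzero across every
    sample -- the 'constant nonzero value' symptom noted above.
    samples: hypothetical dict of VM name -> list of swap-in rates."""
    flagged = []
    for vm, rates in samples.items():
        if rates and all(r > 0 for r in rates):
            flagged.append(vm)
    return flagged

# Hypothetical mem.swapInRate samples (KB/s) per VM
samples = {
    "db01": [120, 88, 240, 150],  # constantly swapping in -> problem
    "web01": [0, 0, 35, 0],       # occasional blip, likely fine
}
print(flag_memory_problems(samples))  # prints: ['db01']
```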
I’ll leave you with a few more slides