Virtual Ascetic

everything about virtualization, storage, and technology


Archives for November 2012

vSphere Datastore size selection

November 15, 2012 By asceticadmin

The VMware ESXi suite has evolved over the years, and it is now possible to run more than the roughly 20 VMs per datastore that was once a practical limit, one that complicated IT infrastructure design. It complicated design because customers were forced to either create smaller datastores or waste space (whichever was preferable) when allocating VMs per datastore. Many customers were stuck with a large number of smaller VMs (more than 20 of them) that together came close to 1 TB, but could not justify a 2 TB datastore because the remaining space would essentially be wasted.

With the latest release, ESXi 5.1, VMware has improved the number of VMs you can place on a datastore without running into I/O contention. But for the many customers now looking beyond a pure VMs-per-datastore calculation, deciding the ideal datastore size remains a catch-22.

In my personal experience, customers are best served by creating 2 TB VMFS datastores and then load balancing their VMs across them. The only problem that arises is what to do with the rest of the space if you are only using 50% of it. I propose that customers use the extra space for cloned copies or temporary VMs so that it does not go to waste. This avoids creating a dedicated datastore that is used only for testing purposes. While the optics of mixing production, test, or staging workloads on the same datastore are not ideal, many SMBs may not actually be affected by it, because a large number of SMBs operate only about 50-100 VMs in their environment. That means three to five datastores can handle their entire workload.
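
As a quick sanity check while rebalancing, the per-datastore capacity and free space can be pulled straight from the host shell on ESXi 5.x; this is only a sketch, and the same numbers are visible in the vSphere Client:

# esxcli storage filesystem list

The output lists each mounted VMFS volume with its size and free space, which is enough to spot the datastores sitting at roughly 50% utilization.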

Example

A customer has three datastores of 2 TB each and runs 75 VMs. Of the 75 VMs, about 50 are production and the remaining 25 are test/dev. Of the 50 production VMs, about 25 are highly critical and the rest are low-utilization servers. Each server is approximately 50-80 GB in size; at an average of 80 GB the total comes to roughly 6,000 GB.

A layout that would work for this environment: place 10 highly critical VMs together with 10 low-utilization production servers on one datastore. That gives you two datastores that are purely production and handle 40 servers. Then create a third datastore for the remaining production servers plus a few test/dev servers on which you are never going to perform stress testing; web servers or file servers are the ideal candidates. Finally, create a fourth datastore to hold the remaining test/dev servers. As you can see, four datastores accommodate all 75 servers with some room for growth. The layout would look like this:

Datastore 1 – 2 TB – 20 VMs (critical prod + regular prod) – used space 1.6 TB
Datastore 2 – 2 TB – 20 VMs (critical prod + regular prod) – used space 1.6 TB
Datastore 3 – 2 TB – 15 VMs (critical prod + regular prod + some test/dev) – used space 1.2 TB (keep clones etc. in this datastore)
Datastore 4 – 2 TB – 20 VMs (remaining test/dev) – used space 1.6 TB; use this one for stress testing

We used 8 TB of storage capacity to accommodate 75 VMs that would otherwise have been split across several smaller datastores. When a new project brings testing requirements, shuffle some of the VMs into the free space to avoid the impact of stress testing in the test/dev datastore.

The above is just one indication of the type of datastore split that can be attempted. From ESXi 5.1 onwards you can pack more VMs into a datastore, as long as the VMs are not too demanding in terms of I/O and you know their metrics.

If you do not know the I/O metrics, let the workload run for a while in the test/dev environment and then review the reads and writes for each day, along with the total IOPS and throughput per day.
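
If you prefer to watch this live on the host rather than in the vCenter performance charts, esxtop is the usual tool; a minimal sketch, assuming shell access to the host (or the vCLI for the remote variant):

# esxtop
# resxtop --server <vcenter-or-host>

Within esxtop, the d, u, and v screens show reads/s, writes/s, and latency per adapter, per device, and per VM respectively; sample these over a few representative days before committing the VMs to a shared datastore.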

This post is aimed mainly at less experienced virtualization administrators who want an understanding of how datastore allocation should be performed.

Filed Under: General Tagged With: best practice, datastore, datastore size, vsphere

VAAI (vStorage APIs for Array Integration)

November 6, 2012 By asceticadmin

In version 4.1 of ESX/ESXi, VMware introduced a new feature called VAAI. As the name suggests, it integrates with the storage array to offload certain operations from the host to the array. I will first discuss why this feature is needed and then look at some of the use cases.

VAAI is essentially a logical step forward in virtualization: once you hit a bottleneck in the network and virtualization layers, it uses the storage more efficiently to perform tasks that really belong at the storage layer anyway. Instead of getting the virtualization or network layer involved, it allows the storage array to do more than it has traditionally done. Now, many organizations have storage arrays that are under-utilized and others have hyperactive ones. If your arrays are already struggling to keep up with existing performance constraints, I would ask that you tread carefully with your VAAI implementation. Offloading these tasks puts a little more load on the storage controllers than you might anticipate, and it is important to understand what that extra load can do to your existing performance.

If, however, the trade-off is that you need the activity done quickly (and your storage controllers are not already running at 90% utilization), then it makes sense to have Storage vMotion copies performed on the array instead of transferring the data over the network or through the hypervisor. There are certain requirements your infrastructure must meet to support VAAI; primarily, the storage array must support storage-based hardware acceleration. Most newer arrays do, but some legacy arrays might not. VMware offers a hardware compatibility list that should be referenced.

From ESXi 5.x onwards VMware made changes to further improve VAAI. The goal was to encourage customers to use larger datastores, and fewer of them. Note that fewer datastores also reduces processing overhead on the storage array, so general storage performance will improve anyway. As for the new improvements, NAS hardware acceleration is now included, with support for Full File Clone, Native Snapshot Support, Extended Statistics, and Reserve Space. Previously only thin-provisioned VMDKs were supported on NFS; thick disks are now supported as well.
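
One aside worth noting: the NAS offloads require a vendor-supplied plugin (a VIB) on each host, unlike the block primitives which are native. A rough way to check whether one is installed, with the vendor string as a placeholder:

# esxcli software vib list | grep -i <vendor>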

A cool feature is the support for Block Delete (SCSI UNMAP) to reclaim dead space on thinly provisioned LUNs. KB#2014849 offers more insight into reclaiming VMFS deleted blocks on thin-provisioned storage. To confirm whether SCSI UNMAP is supported on a LUN, use the following command:

# esxcli storage core device vaai status get -d <naa>

If 'Delete Status' shows as supported in the output, the host can issue SCSI UNMAP commands against that LUN.
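
For reference, per the KB above the reclaim itself is triggered with vmkfstools on ESXi 5.0 U1/5.1; this is a sketch only, with the datastore name and percentage as placeholders. The operation creates a temporary balloon file in the datastore, so run it in a quiet window:

# cd /vmfs/volumes/<datastore>
# vmkfstools -y 60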

To get the VAAI status from the command line, use the following command on ESXi 5.x:

# esxcli storage core device vaai status get

On ESXi 4.x, run the following command:

# esxcfg-scsidevs -l | egrep "Display Name: |VAAI Status:"

Any changes to the VAAI settings can be seen in /var/log/vmkernel (or /var/log/vmkernel.log on ESXi 5.x) and /var/log/messages.
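
The host-level VAAI toggles themselves can also be inspected (or disabled for troubleshooting) through the advanced settings discussed in KB#1021976 referenced below; a quick way to view them from the ESXi 5.x shell:

# esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove
# esxcli system settings advanced list -o /DataMover/HardwareAcceleratedInit
# esxcli system settings advanced list -o /VMFS3/HardwareAcceleratedLocking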

In conclusion, I do want to mention that VAAI offload cannot be used when the source and destination VMFS volumes have different block sizes, or when the source is an RDM and the destination is a regular (non-RDM) disk. The source and target disks should also be of the same type, for example both eagerzeroedthick.
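
Since a block-size mismatch is one of the easiest of these conditions to rule out, the block size of an existing volume can be checked quickly from the shell; the datastore name here is a placeholder:

# vmkfstools -Ph /vmfs/volumes/<datastore>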

For more details, refer to KB#1021976 or visit the link here: bitly.com/nVkRFW

Filed Under: General Tagged With: api, eagerzeroed, RDM, thin provisioning, vaai, vmware, vmware kb, vstorage

Performance tuning on VMware

November 5, 2012 By asceticadmin

Key issues concerning performance:

  • Bad performance negatively affects revenue
  • The revenue impact negatively affects technology decisions
  • Technology decisions made in reaction to bad performance affect IT architecture

It’s a cyclical situation that many of us find ourselves in from time to time. We run into a performance problem and, as a result, spend a significant amount of time resolving it. Our users keep complaining and the business starts losing trust in the IT department. When you then come up with new technology, it takes a lot of time to convince the business and user community to move forward. Hence it is better to nip the problem in the bud before it starts affecting your work.

Performance tuning in a virtual environment has to be considered from various angles. While the virtualization layer itself is owned by VMware, there are other components at play when investigating performance issues. This is a topic very close to my heart, so I am going to get into a few details here; for clarifications or further questions, feel free to comment at the end of this post. The components I am referring to are quite a few in number: VM environment design, hypervisor and VM setup, the OS, system and I/O configuration, HA and DRS rules, network constraints, and so on.

VM Environment Design – Depending on the industry or line of business, organizations finalize a design that suits certain requirements. Those that run mission-critical workloads and have the resources will devote a sufficient number of servers to process the workload. Their design might involve load balancers or traffic redirectors that can affect performance. The selection of the storage array and the type of disk is also part of the environment design. For example, if you are running a random or a sequential workload, it is important that the correct infrastructure is in place to handle that type of workload; if the workload is unremarkable, many people don't pay attention to this aspect. Office politics is another reason your environment design may not be optimal, and the only way to work through it is to review the entire design end to end. I would recommend starting with a Visio diagram of your architecture and then laying it out in an Excel sheet or a mind map. Then slowly build a before-and-after picture in which you can compare the current metrics with the desired end state. Once you have understood the gaps in design or capability, re-design for future scale and growth. This final step is very important, otherwise you'll run into performance and flexibility issues again within a short while (roughly one to two years).

Hypervisor and VM Setup – There is usually a simple answer to every question about setting up a VM, but there are intricacies that are often missed. When you set up a VM, decide what the optimum configuration for it is. Know what you can and cannot do: you cannot create more datastores than are allowed, you should not run thin-provisioned disks for I/O-hungry applications, and so on. VM setup is not just the nuts and bolts of creating a virtual machine, assigning processor and memory, and finally protecting it with SRM. It also involves understanding the VMFS block size (luckily, from ESXi 5.1 onwards that is no longer a concern), paravirtualized SCSI adapters, the bus type, Storage I/O Control, multipathing, the DRS impact of the way you set things up, and a whole lot more. Don't be concerned about understanding all of these aspects up front, because your application or build testing can guide you to the correct configuration. The problem is that not many administrators go back to the drawing board and review the configuration unless the testing itself failed. I once resolved long backup times and multiple failed backups by increasing the default CPU and memory allocation for the service console. With ESXi 5.1 there is no service console, so this is no longer a worry, but if you are running older ESX releases look at bumping up the service console CPU and memory reservations.
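
As a small example of the kind of detail worth revisiting after build testing, the path selection policy actually in effect for each device can be checked from the ESXi 5.x shell rather than assumed; a sketch:

# esxcli storage nmp device list

The output shows the SATP and path selection policy (for example Round Robin versus Fixed) per device, which is worth comparing against what the array vendor recommends.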

OS – The type of operating system you selected, and how frequently you apply updates, matters. The multi-threading capabilities of most Linux/UNIX operating systems are well known and well proven. That is not to say that Windows Server 2008 onwards does not perform well, but there are a lot of constraints around the Windows Server OS specifically when you are troubleshooting issues; it is far easier to audit and establish a clear troubleshooting path when working with Linux. When you work with Windows-specific databases and applications, it is important to set up some level of logging and auditing; otherwise, during a performance problem it will be difficult to tell a resource problem from an application issue. As usually happens, most performance problems are blamed on the infrastructure, but I have had the opportunity in the past to rebut every one of those claims (except one) and show that the infrastructure was not at fault. The one exception was related to disk realignment, which again ended up improving performance; I will cover it in another post since there are other details I want to go into.

System and I/O configuration – I always ask application owners/vendors about their preferred hardware configuration, and then ask for their TPM and TPC requirements. Chances are that 90% of the time the application owner/vendor has no clue. Then ask the same of your hardware vendor, or read the tech specs to see what is supported. VMs run on physical hardware at the end of the day, and if you don't know how your hardware is capable of performing, you are working with limited knowledge. Knowledge of TPM and TPC is usually only required in complex environments, but those are also the environments where performance issues are the most complex to resolve, so every bit helps and every bit makes the business more confident in the services you provide. BIOS upgrades, hard drive firmware upgrades, and hyperthreading support are the kinds of items often overlooked by those who do not focus on design considerations, primarily through lack of experience. If you are using iSCSI, think about how much throughput you are getting to how many ESXi hosts. Maybe you need more data ports, or the type of network connection you use may not be sufficient; I once came across a server whose network adapter speed was set to 'Auto', and setting it manually to full duplex resolved a major problem with backup performance. Jumbo frames on the data transfer network need to be reviewed, along with disk alignment for faster I/O, multipathing, trunking for bandwidth aggregation, and a whole host of other things. In the OS, run I/O stats to see what you observe from inside the VM, then run the I/O stats on ESXi, and finally view the storage array stats. Everything should correlate, or else you have a problem at one of the layers. Memory leaks occur within the OS but are not clearly visible as performance-impacting factors; it is important to troubleshoot for memory leaks in application code when you are not seeing anything at the infrastructure level.
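
A minimal sketch of that layered check, assuming a Linux guest with the sysstat package and shell access to the host: run the first command inside the guest and the second on the ESXi host.

# iostat -x 5
# esxtop

Compare the await and %util numbers from iostat with the DAVG/KAVG latencies in esxtop's device view, and then with the array-side statistics; all three layers should tell the same story.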

HA and DRS – How you have set up HA and DRS matters. I have had numerous questions from peers who wanted my take on their DRS setup. One thing I always advise is to turn on HA and DRS early, because it lets you understand how load patterns are changing and make tweaks to your setup accordingly. If you don't have that luxury because you are adopting DRS quite late, start with a smaller cluster and advance it slowly, watching the impact on the servers/VMs that are not part of the DRS resource pools. Understand how shares, reservations, and limits work and then make changes gradually. Review the number of vMotions for each VM and see whether its resource allocation needs to be tweaked. Use vCOps to understand how your environment is performing and then review whether overprovisioning or underprovisioning is occurring.

I could go on and on about the various performance factors and criteria, but it is better to treat each aspect on its own in future blog posts, so look forward to more content around this. If you have questions or would like to share your own experience or views, please leave a comment; I would be happy to hear from you.

Filed Under: General Tagged With: ascetic, backup, compute, hypervisor, I/O impact, iSCSI, performance, performance impact, performance tuning, processing, storage, virtualization, vmware

RTFM

November 4, 2012 By asceticadmin

I have liked Mike Laverick’s RTFM site for a long time. Many congratulations to him on his blog being acquired by TechTarget and on his recent hiring by VMware. It takes a lot of learning and effort to get to where Mike is, and it is entirely down to his own efforts.

Beyond the phrase ‘RTFM’ itself, it is worth understanding why there is so much focus on it. With virtualization technologies like VMware it is very easy to install and configure environments, and to be honest you don’t really need the manual. But when you do design and architecture work in an IT environment, it is very important to follow best practices. Again, there is a qualification here: smaller environments can sometimes remain incorrectly designed and the flaws won’t show up because the environment is not complex, but as complexity increases the gaps appear and grow wider and wider. That’s when specialists or consultants come in and shout (in a hushed voice, to themselves) RTFM. Best practices are nothing but a set of values and processes that need to be taken into consideration.

That’s enough talking, so let’s get right down to the point with some practical examples.

I recently got a question from someone about multipathing configuration for software iSCSI. They were unsure how it worked and were not able to give a confident answer back to their management. I pointed them to the manual at this URL, which on page 9 clearly states that there is now a UI for a feature that was only available via CLI in versions earlier than ESXi 5.0. This was not a complex effort; I did not discover it on my own but stumbled across the information in a VMware whitepaper (linked above) during one of my own iSCSI storage queries.
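
For anyone still on the CLI path, the port binding that the new UI exposes looks roughly like this from the ESXi 5.x shell; a sketch only, with the software iSCSI adapter and VMkernel port names as placeholders for your own environment:

# esxcli iscsi networkportal add -A vmhba33 -n vmk1
# esxcli iscsi networkportal add -A vmhba33 -n vmk2
# esxcli iscsi networkportal list -A vmhba33

The last command simply confirms which VMkernel ports are bound to the software iSCSI adapter.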

Similarly, I have had to resolve countless situations like this. One that comes to mind is the snapshot CID chain issue, which was very common in environments that ran their datastores almost to maximum capacity and had problems with snapshot chains, mainly because earlier versions of VMware ESX did not handle snapshots as robustly.

All in all, the point is simply to encourage all VMware professionals to read the technical requirements and make themselves aware before deploying system architectures or making changes to something that is already stable.

Filed Under: General Tagged With: ascetic, cloud, iSCSI, mike laverick, rtfm, storage, technology, VCP5, virtualization, vmware
