
VCE-101 Thin Provisioning Part 1 – The Basics


This week’s VMware Communities Roundtable featured one of my favorite people at VMware, Paul Manning, who spoke on thin provisioning. That topic is the inspiration for today’s VCE post. Reducing storage costs is top of mind, and it’s going to take some time to cover thin provisioning, so let’s begin!

Virtualizing a Datacenter

I am not exaggerating when I say that every customer I meet has elaborate plans to virtualize most, if not all, of the Intel-based servers in their data center, yet the rollout struggles to move from a portion of the footprint to a majority because of storage costs (both CapEx and OpEx). The message from these meetings goes something like this…

“We love server virtualization! With VMware we’ve reduced our server footprint and are targeting more systems to virtualize.”

virtservers.jpg

“We love converged Ethernet! With Cisco we’ve reduced our port counts and are targeting to have a single platform for voice, data, and user access.”

virtnetwork.jpg

“Storage costs are out of control, can you help? Prior to virtualization we roughly ran 20%-30% of our servers on shared storage arrays. Virtualization is forcing every VM onto shared storage. What do we do?”

virtstorage.jpg



Colour me weird here, but consistently hearing customers facing the same challenge in meeting after meeting inevitably gets an old Sesame Street song stuck playing in my head…

One of these things is not like the others,
One of these things just doesn’t belong,
Can you tell which thing is not like the others
By the time I finish my song?

SSBook&Record.jpg

In Order to Get Off the Ground this Cloud Needs to Lose Some Weight

It’s no secret: these escalating storage costs are stalling the deployment of fully (or mostly) virtualized data centers. Don’t take my word for it; look at the number of technologies VMware has developed and supports to help customers reduce the costs of storage connectivity (iSCSI and NFS) and storage consumption (linked clones and VMDK thin provisioning).

As one of VMware’s key storage partners, NetApp is ‘all in’ when it comes to reducing the storage footprint for our customers. Today’s post focuses on the technical details of deploying thin provisioning so you can feel confident in your decisions about when and where to leverage this technology.

Anatomy of a Virtual Disk

I can’t think of a better place to begin this post than by reviewing the three types of virtual disks available in vSphere, including their similarities, differences, weaknesses, and strengths.

– The Thick Virtual Disk



This is the traditional virtual disk format most of us have deployed with most of our VMs. It preallocates the capacity of the virtual disk from the datastore at the time the disk is created, but it does not format (zero) that capacity at deployment time. As a result, a write must pause while the blocks required to store the data are zeroed out. This operation occurs on demand, any time an area of the virtual disk that has never been written to is needed to store data.

thick.jpg

– The Thin Virtual Disk



This virtual disk format is very similar to the thick format, except that it does not preallocate the capacity of the virtual disk from the datastore when it is created. When storage capacity is required, the VMDK allocates storage in chunks equal to the size of the file system block: for VMFS this may be between 1 MB and 8 MB, and for NFS it will be equal to the block size on the NFS array. Allocating blocks on a shared VMFS datastore is a metadata operation, and as such it issues SCSI locks on the datastore while the allocation is executed. While this process is very brief, it does suspend the write operations of the VMs on the datastore.

thin.jpg

Like the thick format, thin VMDKs are not formatted at the time of deployment, so data that needs to be written must pause while the blocks required to store it are zeroed out. This operation occurs on demand, any time an area of the virtual disk that has never been written to is needed to store data.

To summarize the zeroing and allocation differences between thick and thin virtual disks: both suspend I/O when writing to new areas of the disk that need to be zeroed, but a thin virtual disk may first also have to obtain additional capacity from the datastore.

– The Eager Zeroed Thick Virtual Disk



This virtual disk format is similar to the thick format in that it preallocates the capacity of the virtual disk from the datastore when it is created; however, unlike the thick and thin formats, an eager zeroed thick virtual disk formats (zeroes) all of its data blocks at the time of deployment. This format therefore does not require the on-demand allocation and zeroing processes.

ezthick.jpg
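To make those allocation and zeroing differences concrete, here is a toy Python model. It is purely illustrative: the 40 GB size, the one-GB "blocks," and the method names are all made up and have nothing to do with any VMware API.

```python
class VirtualDisk:
    """Toy model of the three vSphere virtual disk formats; sizes are hypothetical."""

    def __init__(self, size_gb, fmt):
        self.fmt = fmt                                    # 'thin', 'thick', or 'eagerzeroedthick'
        # thick and eager zeroed thick reserve the full size from the datastore up front
        self.allocated_gb = 0 if fmt == 'thin' else size_gb
        # only eager zeroed thick zeroes every block at creation time
        self.zeroed_blocks = set(range(size_gb)) if fmt == 'eagerzeroedthick' else set()

    def write(self, block):
        """Return the pauses a first write to an untouched block incurs."""
        pauses = []
        if self.fmt == 'thin' and block >= self.allocated_gb:
            self.allocated_gb = block + 1                 # allocate another chunk (VMFS metadata update)
            pauses.append('allocate from datastore')
        if block not in self.zeroed_blocks:
            self.zeroed_blocks.add(block)                 # zero-on-first-write
            pauses.append('zero the block')
        return pauses or ['none']

for fmt in ('thin', 'thick', 'eagerzeroedthick'):
    disk = VirtualDisk(size_gb=40, fmt=fmt)
    allocated_at_creation = disk.allocated_gb
    print(f"{fmt:17} allocated at creation: {allocated_at_creation:2} GB, "
          f"pauses on first write: {disk.write(block=5)}")
```

Running it shows the pattern described above: thin pays both penalties on a first write, thick pays only the zeroing penalty, and eager zeroed thick pays neither.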

Characteristics of Virtual Disks

– Thick and Thin Virtual Disks consume the same amount of storage on a storage array



This point catches most people by surprise, but it is true. If you deploy VMFS datastores on thin provisioned LUNs, the array-side capacity consumed by thin and thick VMDKs is the same. Remember, neither format zeroes its blocks until data needs to be stored.

Below is an example, taken from an EMC presentation, of an empty thick VMDK stored on a thin provisioned EMC LUN. I share it to show that this behavior is identical on any array.

ThickonThinEMC.jpg

Note: EMC labels the disk type as ‘zeroedthick’, which is simply another name for ‘thick’ (as reflected in the storage consumption shown). I highlight this because I don’t want readers to confuse it with eager zeroed thick.

– Application and Feature Support Considerations



There are several use cases that require virtual disks to be converted to either the thick or eager zeroed thick format. For example, VMs configured for MSCS or Fault Tolerance (note that you cannot run MSCS under FT) must use eager zeroed thick disks. Don’t fret over this one: thick and thin virtual disks can be ‘inflated’ to the eager zeroed thick format at any time.

Note that the inflation process is performed directly on the VMDK from within the datastore browser, not from the VM’s properties.

Understanding Thin – From a Day to Day Perspective

– Thin is only thin on day one



As discussed, thin provisioned virtual disks reduce consumed storage capacity by not preallocating storage from the datastore and storage array. As one would expect, the size of a thin VMDK increases over time; what surprises many is that the VMDK will grow larger than the amount of data measured from within the guest operating system (GOS).

The reason for this phenomenon is deleted data. When data is deleted in the GOS, the file system merely removes the entry from the active file system table and marks the blocks as available to be overwritten. The data itself still resides in the file system, and thus in the virtual disk; this is why undelete tools like WinUndelete exist.

– Avoid Running Defrag Utilities inside of VMs



Many of us have operational processes that include running file system defragmentation utilities on a regular basis. Speaking for NetApp, this process should not be required once you have migrated a system from direct-attached storage to a shared storage array.

The recommendation to avoid defrag utilities matters even more with thin provisioned virtual disks, as the defragmentation process rewrites data throughout the VMDK. This operation can considerably expand the virtual disk, costing you your storage savings.

A visualization of the defragmentation process (from Wikipedia)

FragmentationDefragmentation.gif

Understanding Thin – From an Availability Perspective

– The Goal of Thin Provisioning is Datastore Oversubscription



As we discussed, only thin provisioned virtual disks reduce the capacity consumed within a datastore, and as such thin is the only format that allows you to oversubscribe the storage capacity of the datastore. On the surface this may sound like a very attractive option; however, it has a few limitations you must understand before you implement it.

VM TP-crop.jpg

The challenge with oversubscribing a datastore is that the datastore and all of its components (VMFS, LUNs, etc.) are static in terms of storage capacity. While the capacity of a datastore can be increased on the fly, this process is not automated or policy driven. Should an oversubscribed datastore encounter an out-of-space condition, all of the running VMs become unavailable to the end user. In these scenarios the VMs don’t ‘crash’, they ‘pause’; however, applications running inside the VMs may fail if the out-of-space condition isn’t addressed quickly. For example, an Oracle database will remain active for 180 seconds; after that time has elapsed, the database will fail.

Note: The out-of-space condition can occur even when there is free (previously used) space inside the VMDK, because the write may be attempting to allocate and zero a brand-new block from the datastore. To be clear, this condition can occur with virtual disks, LUNs, and network file systems alike.
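For a back-of-the-envelope feel for oversubscription, here is a tiny Python sketch with hypothetical numbers; it simply compares the capacity promised to thin VMDKs against the capacity the datastore actually has.

```python
# Hypothetical numbers: how oversubscribed is this datastore?
datastore_capacity_gb = 500

# Provisioned (maximum) size of each thin VMDK living on the datastore.
provisioned_vmdks_gb = [100, 100, 80, 80, 60, 60, 60]

provisioned_total = sum(provisioned_vmdks_gb)             # 540 GB promised
ratio = provisioned_total / datastore_capacity_gb

print(f"Provisioned {provisioned_total} GB against a {datastore_capacity_gb} GB datastore")
print(f"Oversubscription ratio: {ratio:.2f}:1")
if ratio > 1:
    print("If these VMDKs fill, new allocations will fail and the VMs will pause.")
```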

Ensuring Success with Thin Provisioned Virtual Disks

– Ensuring Storage Efficiency



Have I scared you away from deploying thin provisioned virtual disks? My apologies if I have; that isn’t my intention. With any technology there are trade-offs, and I wanted to ensure you were well informed. Now let’s focus on tackling some of the scary scenarios I have covered.

As we covered earlier, GOS file systems hold onto deleted data, and this behavior unintentionally expands a thin VMDK. It occurs naturally as file systems age, and at an accelerated rate when defrag utilities run. You may be surprised to learn that with a little effort you can remove the deleted data from the VM’s file system (e.g. NTFS or EXT3) and reduce the size of the virtual disk.

RTFMShrink.jpg
Accomplishing this feat requires two phases. The first is to zero out the ‘free’ blocks within the GOS file system; this can be done with the ‘shrink disk’ feature in VMware Tools or with tools like sdelete from Microsoft. The second phase is to use Storage VMotion to migrate the VMDK to a new datastore.
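For the first phase, tools like sdelete do the work for you, but the idea is simple enough to sketch in a few lines of Python to be run inside the guest. The file path is hypothetical, and the script briefly consumes nearly all free space on the volume, so treat it as an illustration rather than a production tool.

```python
import os

# Fill the guest file system's free space with zeros, then delete the file --
# the same idea as 'sdelete -z'. Once the deleted blocks contain zeros, the
# follow-up Storage VMotion can drop them from the destination VMDK.
# WARNING: this briefly consumes nearly all free space on the target volume.

target = r"C:\zerofill.tmp"                  # hypothetical path inside the guest
chunk = b"\0" * (1024 * 1024)                # write zeros 1 MB at a time

try:
    with open(target, "wb") as f:
        while True:
            f.write(chunk)                   # loop until the volume reports it is full
except OSError:
    pass                                     # out of space: the free blocks are now zeroed
finally:
    if os.path.exists(target):
        os.remove(target)                    # hand the space back to the file system
```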

Note that this process is manual; however, Mike Laverick has posted a guide that covers how to automate some of its components, and Duncan Epping has also covered automating parts of the process.

– Ensuring VM Availability



Before placing an oversubscribed datastore into production, one needs to deploy a mechanism that ensures the datastore never fills. By leveraging VMware alarms, Storage VMotion, and a little bit of scripting knowledge, we can create a datastore capacity monitor and automated migration tool that ensures the availability of our VMs.
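As a rough illustration of the monitoring half of such a tool, here is a sketch using the open source pyVmomi bindings. The vCenter name, credentials, and threshold are placeholders, newer pyVmomi releases may also require an SSL context on connect, and the migration action itself is left as a comment.

```python
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details -- adjust for your environment.
si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="***")
content = si.RetrieveContent()

# Walk every datastore and flag the ones running low on free space.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.Datastore], True)

THRESHOLD = 0.15                              # alert when less than 15% free
for ds in view.view:
    free_pct = ds.summary.freeSpace / ds.summary.capacity
    if free_pct < THRESHOLD:
        # A real tool would fire an alarm and kick off a Storage VMotion of
        # the fastest-growing VM here; we simply report the condition.
        print(f"{ds.summary.name}: only {free_pct:.0%} free -- time to migrate a VM")

view.DestroyView()
Disconnect(si)
```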

Again we luck out here, as Eric Gray at vCritical has shared his expertise by documenting a process to implement such a solution.

datastore-emergency-events.png

Remember There’s No Such Thing as a Free Lunch



In this section I have shared how the community of VMware experts has begun addressing the hurdles around deploying thin provisioned VMDKs. While these solutions are very solid, there is an aspect that appears to be overlooked: is there an impact on the storage array and its related activities?

As you have seen, we can overcome some hurdles by implementing scripts that primarily leverage the capabilities of Storage VMotion. This is an area we need to dig into. When Storage VMotion copies data from one datastore to another, the original datastore still contains the original data of the VMDK. Remember, deleted data is still data. As with a GOS file system, the blocks are free but the storage array is still storing the deleted data. These blocks can be reused, but only as they are overwritten by another VM in the original datastore; they are not returned to a global pool for reallocation in another datastore.

Also, these Storage VMotion processes place additional load on the storage array, storage network, and ESX/ESXi hosts. Please schedule this activity according to business demands. I realize this may not be possible if failing to immediately migrate a VM will result in multiple VMs becoming unavailable.

My final precaution: should any of the migrated VMs be part of an SRM recovery plan, their move to a new datastore will require them to be replicated again, in their entirety, to your DR location. So, as in the last paragraph, please plan accordingly and ensure you have the bandwidth to complete these ‘re-baseline’ operations within whatever windows you are assigned for these tasks.

In Closing

The storage savings technologies made available by VMware are truly revolutionary, as they deliver savings that were unheard of with physical servers and add capabilities not found in most storage devices. As with any technology there are deployment considerations to weigh before use, and I hope we have covered most of those around thin provisioning.

As I stated at the beginning of this post, this is part 1 of 2. What I have covered today are the basics of how VMware’s thin provisioning works on any storage array from any vendor. Tomorrow I will post part 2, where I will cover technical enhancements, some in the form of the VAAI program and some proprietary to NetApp’s Data ONTAP, and share how these offerings enhance the storage savings of thin provisioning by eliminating or greatly simplifying a number of the listed hurdles.

Until tomorrow, good night, and remember, Virtualization Changes Everything!



VCE-101 Thin Provisioning Part 2 – Going Beyond


This is the second installment of a two-part post focusing on the thin provisioning capabilities native to vSphere. If you haven’t read Part I – The Basics, I suggest you do so first; it covers a lot of content you should have a solid grasp of before proceeding.

In part I, I covered the basics of thin provisioning and how it operates on any storage platform. In part II we will expand on a number of those points, focusing on how the storage virtualization functionality in Data ONTAP enhances VMware’s thin provisioning and advances the overarching goal of reducing both the CapEx and OpEx associated with storage in the virtualization space.

Anatomy of a Virtual Disk

Let’s begin where we started in part I: by reviewing the different disk formats. This time let’s consider them with the virtual disks at 70% of their storage capacity.

– The Thick Virtual Disk

 

Again, this is the traditional virtual disk format we all know rather well. From this image you will notice that we do not receive any storage savings in the datastore, yet we do receive 30% savings on the storage array.

thickwdata.jpg

If you’re asking, “How can a thick VMDK consume less storage on the array?” you’ll need to go back and read Part I.

– The Thin Virtual Disk

 

In this image we have the thin provisioned virtual disk, and as you can see we receive 30% storage savings both in the datastore and on the array.

thinwdata.jpg.jpg

– The Eager Zeroed Thick Virtual Disk

 

In this last image we have an eager zeroed thick virtual disk, which provides no savings in either the datastore or on the array.

ezthickwdata.jpg.jpg

 

Taking Thin Provisioning Further…

I’d like to build on the examples in the previous section and introduce data deduplication into our discussion. This should not be perceived as a comparison of competing technologies; far from it! My intention is to demonstrate that by leveraging VMware technologies along with Data ONTAP, customers can achieve far greater reductions in storage and storage management.

In order to demonstrate the effects of running VMware on deduplicated storage we will need to look at the storage consumption and savings of multiple VMs contained in a shared datastore.

– Three Thin Provisioned Virtual Disks

 

There’s not much to comment on with this image. We have three virtual disks, and the 30% savings in both the datastore and the array remains the same with these three as it was with a single VMDK. It’s pretty straightforward and simple (btw – I like simple).

mulitthin.jpg

– Three Thin Provisioned Virtual Disks plus Data Deduplication

 

Now we’ve got something to talk about! The first thing I need you to consider is that the observable savings vary between the datastore and the storage array. While our datastore remained steady at 30%, our array is demonstrating 70% storage savings. This is possible because the array stores only the unique blocks for all of the data within the datastore.

btw – achieving roughly 70% savings is pretty common with our deployments

Just like CPU or memory virtualization in ESX/ESXi, each VM has its own file system and virtual disk, but through Data ONTAP we are able to share the underlying data blocks for greater efficiency and scaling of the physical hardware (or disk).

thinanddedupeonLUN.jpg
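To picture what the array is doing, here is a purely illustrative Python sketch: three ‘virtual disks’ that share a common guest OS image are carved into fixed-size blocks and hashed, and only the unique blocks count against physical capacity. The block size and data volumes are made up, and real dedupe engines are far more sophisticated.

```python
import hashlib
import os

BLOCK = 4096                                   # hypothetical array block size

# Three "virtual disks": a common guest OS image plus some unique data each.
os_image = os.urandom(BLOCK * 90)              # blocks shared by every VM
vmdks = [os_image + os.urandom(BLOCK * 10) for _ in range(3)]

def blocks(data):
    return (data[i:i + BLOCK] for i in range(0, len(data), BLOCK))

logical_blocks = sum(len(v) // BLOCK for v in vmdks)
unique_blocks = {hashlib.sha256(b).digest() for v in vmdks for b in blocks(v)}

print(f"Logical blocks written : {logical_blocks}")
print(f"Unique blocks stored   : {len(unique_blocks)}")
print(f"Dedupe savings         : {1 - len(unique_blocks) / logical_blocks:.0%}")
```

With three VMs sharing most of their content, the unique-block count (and thus physical consumption) is a fraction of what the hosts believe they have written.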

I need to clarify a point: in this example we are storing our VMDKs on LUNs (FC, FCoE, iSCSI). Because LUNs are formatted with VMFS, the VMware admin natively has no view into the storage array, so any storage virtualization requires tools that bridge this ‘gap’. This statement is true for every array serving LUNs, regardless of vendor.

If you deploy with LUNs, I suggest you take a peek at the NetApp Virtual Storage Console. It is a vCenter 4 plug-in which, among its many functions, provides reporting on storage utilization through all layers, from datastore to physical disks. VMware admins now have direct insight into their storage efficiency.

VSC-SANReport.JPG

– Three Thin Provisioned Virtual Disks plus Data Deduplication on NFS

 

I’m sure those who know me are saying, ‘I knew Vaughn was eventually going to bring up NAS.’

I do so to demonstrate a key point: in this image there is something very different. The datastore reports the same storage savings as the storage array… 63% savings!

thinonNFS.jpg

Do you recall when I said I like simplicity? Here’s one of the many reasons… NFS provides transparent storage virtualization between the array and the hypervisor. This architecture, combined with storage savings technologies, delivers value to both the VI and storage admin teams. It’s that simple!

In fact, NetApp dedupe provides the same storage savings for any type of virtual disk: thick, thin, or eager zeroed thick. Need to inflate a thin virtual disk to eager zeroed thick? No problem; it has no impact on the storage utilization and reporting in the datastore.

Characteristics of Virtual Disks

 

– Application and Feature Support Considerations

In part I we learned that thin provisioned VMDKs aren’t supported with MSCS or FT. In addition, I noted that most integrators shy away from thin provisioning for business critical applications. These configurations are perfect scenarios where data deduplication can provide the storage savings instead.

This is possible because the presence of data deduplication is unknown to the hypervisor, the VM, the application, etc. As an example, imagine the cost benefits of virtualizing Exchange 2010 and running it on deduplicated storage.

exchange2010.gif

Hint – With Exchange 2010, Microsoft removed the single instance storage model, which has existed as far back as I can recall (which is Exchange 5.5).

– Thin is only thin on day one

In part I we covered the concept that deleted data is still data, and how this deleted data will eventually expand a thin provisioned virtual disk, wasting storage capacity. We also covered how to leverage a number of tools and processes to return the VMDK to its minimal, optimal size; however, one has to balance these processes against the impact they impose on the rest of the data center, specifically resources like replication and replication bandwidth.

This is another area where running VMware on NetApp provides unique value unavailable with traditional storage arrays.

To begin, data deduplication addresses part of this issue by eliminating redundant content on the array, which eliminates the traditional cost of deleted file system content being stored there. For example, open a saved Word doc, edit it, save it, and close it. The VM’s file system creates an entirely new copy of the document and deletes the original; dedupe reduces the block redundancy between these two copies.

NetApp already ships ‘hole punching’ technology today for FC, FCoE, and iSCSI LUNs managed by SnapDrive. This technology is available for RDMs and guest-connected storage in VMs. This isn’t the end game, though; we’d still like to ensure that virtual disks remain space efficient in an automated fashion.

Our current plans are to expand this functionality to run as a policy on datastores. This is an example of NetApp engineering working to make your life easier. Does this type of simplicity and optimization sound appealing?

This feature was discussed at VMworld 2009 in session TA4881, featuring Arthur Lent and Scott Davis. I’d encourage anyone with access to watch the session in its entirety; the ‘hole punching’ (or ‘space reclamation’) demo occurs around 52:30 and runs for approximately 1 minute. Note: you must have access to the VMworld 2009 online content in order to view this session.

If you are constrained on time and would prefer to view only the demo (without audio), you can find it here (again, VMworld access required).

– Avoid Running Defragmentation Utilities Inside of VMs

Our position remains unchanged since part I: you should still consider discontinuing the use of defrag utilities within VMs. However, if you can’t, we can help undo the havoc these utilities wreak. Defrag utilities rewrite data, and their use will expand a thin provisioned virtual disk. While we can’t stop the expansion of the thin VMDK, we can prevent storage growth on the array by simply running a dedupe update following the defrag. This post-defrag process returns storage capacity and replication bandwidth requirements to an optimized state.

Understanding Thin – From an Availability Perspective

– The Goal of Thin Provisioning is Datastore Oversubscription

In part I, I cautioned against oversubscribing datastores because of the lack of automated storage capacity management. Combined with oversubscription, this rigidity establishes an environment where, should the datastore fill, all of its VMs become inaccessible. To overcome this type of failure we discussed implementing a script that monitors datastore capacity and migrates VMs when free space becomes scarce.

What if you could place policies on the datastore and have it resize without human intervention or scripts? This functionality is available today with VMware on NetApp NFS. It really is an elegant solution, and ideal when considering oversubscription. This technology lets customers monitor large global storage pools (NetApp aggregates) rather than individual datastores. By definition, oversubscribing a physical resource requires monitoring that resource’s capacity.

Why would anyone not let the storage array manage the size of datastores rather than monitoring and migrating individual VMs? While our goal is efficiency, we mustn’t sacrifice simplicity.

Sure, this is a very ‘storage centric’ view, but I’d argue that from a storage capacity management standpoint it is probably easier to manage datastores than individual VMs.

As NetApp and VMware support the hot expansion of LUNs and VMFS, we should be able to add this process to the datastore monitoring we highlighted in part I.

I’d love to publish a post highlighting another blog once someone whips up a script to handle LUNs – hint, hint.

A Few Additional Enhancements for Thin Provisioned VMDKs

 

– On-Demand Block Allocation

You may recall that thin virtual disks allocate storage from the datastore on demand and that this process triggers SCSI locks on the datastore, as these operations are metadata updates. SCSI locks are very short lived; however, they do impact the I/O of the datastore. I just want to share one point: block allocation events cannot invoke SCSI locks on NFS datastores, because SCSI locks don’t exist there. Deploy all of the thin provisioned virtual disks you’d like; you’ll never incur an issue related to metadata updates.

– Default VMDK Formats with NFS

When running VMware over NFS, the storage array determines whether the virtual disks are thick or thin. I’m happy to share that with NetApp storage arrays all VMDKs are thin provisioned by default. Note that the disk format options are grayed out in the wizard.

ProvVMDKonNFS.jpg
I guess you could say that, as the storage vendor with the largest VMware on NFS install base, NetApp also has the largest install base of customers running thin provisioned VMDKs.

In Closing

In this two part post I’ve attempted to communicate the following points:

• Storage and the associated operational costs aren’t in line with server and network virtualization efforts

• VMware has delivered a wealth of technologies to help address these costs, of which we have highlighted thin provisioning in these posts

• NetApp provides storage virtualization technologies that enhance VMware’s storage savings technologies, resulting in unmatched CapEx and OpEx savings

Virtualization is truly changing everything – heck, look at how long we’ve been discussing virtual disk types! For those of you with NetApp arrays, please implement all of the technologies we have covered in parts I & II of this post (just remember to follow the best practices in TR-3749 or TR-3428). For those of you who aren’t running VMware on NetApp, I’d ask you to leverage VMware’s storage savings technologies to their fullest. I trust that between these posts and others like this one, by my good friend and friendly adversary Chad Sakac, you will be well armed on the subject of thin provisioned VMDKs. When you’re ready to go further, please check out our gear or allow us to virtualize your existing arrays with Data ONTAP by introducing a vSeries in front of them.

I fear this post will be too technical for some and too sales-y for others; disseminating information to a broad audience is always a difficult task. Let me know whether you found this information useful by sharing your feedback.


Data Deduplication and Compression Means Double Savings


In many ways data deduplication and compression are a lot like salt and pepper. Both of these seasonings enhance the taste of food, each has a distinct flavor and is used in varying quantities depending on the dish being prepared; however, most of the time food tastes better when the two are used together. Similarly, dedupe and compression both reduce the capacity data consumes, yet they are rarely offered together in most storage arrays.

Salt and Pepper Shakers

Maybe you’re wondering why I wrote this post. Surely everyone knows about dedupe and compression. Heck, these technologies seem to be the premier feature of every modern storage platform, right?

I’m not so sure about that. I’m consistently surprised by how often these technologies are incorrectly referenced in blog posts or vendor marketing collateral. While similar in purpose, these technologies provide data reduction for dissimilar data sets. It is critical to understand how the two operate and which application types benefit from their use, but most importantly, how the combination can provide unmatched storage savings across the broadest set of use cases.

A Word on Thin Provisioning

Surely at this point in the post someone is asking, ‘What about Thin Provisioning? It reduces storage capacity.’

That’s simply incorrect; Thin Provisioning is not a data reduction technology. It is a provisioning model that allows one to consume storage on demand by eliminating the preallocation of storage capacity. It increases utilization of the storage media, but it does not reduce the capacity of the data written to that media. I’ll cover T.P. in my next post.

Data Deduplication (A Primer in Layman’s Terms)

Data compression provides savings by eliminating redundancy at the binary level within a block. Data Deduplication (aka dedupe) provides storage savings by eliminating redundant blocks of data.

With dedupe, storage capacity reduction is achieved only when there is redundancy in the data set: it must contain multiple identical files, or files with portions of data identical to content found in other files.

Examples of file redundancy include home directories and cloud file sharing applications like Citrix ShareFile and VMware Horizon. Block redundancy is rampant in data sets like test & development, QA, virtual machines, and virtual desktops; just think of the number of copies of operating system and application binaries that exist in these virtualized environments.

Tech Tip: The smaller the storage block size, the greater the ability to identify and dedupe data. For example, misaligned VMs can be deduped with a 512 byte block size but not with a 4 KB block size.
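A quick way to convince yourself of this tip (illustrative only, with random stand-in data): take two copies of the same content, shift one by 512 bytes to mimic a misaligned VMDK, and count how many duplicate blocks a 512-byte versus a 4 KB granularity can find.

```python
import hashlib
import os

payload = os.urandom(1024 * 1024)              # 1 MB of "guest data"
aligned = payload
misaligned = os.urandom(512) + payload         # same data, shifted by 512 bytes

def duplicate_blocks(a, b, block):
    """How many of b's blocks already exist in a at this block size?"""
    hashes_a = {hashlib.sha256(a[i:i + block]).digest()
                for i in range(0, len(a) - block + 1, block)}
    blocks_b = [b[i:i + block] for i in range(0, len(b) - block + 1, block)]
    return sum(1 for blk in blocks_b if hashlib.sha256(blk).digest() in hashes_a)

for block in (512, 4096):
    dupes = duplicate_blocks(aligned, misaligned, block)
    print(f"{block:5}-byte blocks: {dupes} duplicate blocks found")
```

At 512-byte granularity nearly every block of the shifted copy matches the original; at 4 KB granularity, essentially none do.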

Data Compression

Data compression provides storage savings by eliminating binary-level redundancy within a block of data. Unlike dedupe, compression is not concerned with whether a second copy of the same block exists; it simply wants to store the densest possible block on flash. Compression algorithms ‘deflate’ data as it is written and ‘inflate’ it as it is read, storing it in a format denser than its native form. Examples of common file-level compression we use in our day-to-day lives include MP3 audio and JPG image files.

Compression at the application layer, as in a SQL Server or Oracle database, is somewhat of a balancing act: faster compression and decompression usually come at the expense of smaller space savings. To cite a lesser known example, Hadoop commonly offers the following five compression formats:

  • DEFLATE
  • gzip
  • bzip2
  • LZO
  • Snappy

Tech Tip: Data sets already compressed by an application can often be compressed further on a storage array. This is possible because most admins tend to choose optimal application performance over optimal storage savings, leaving reduction on the table for the array to capture.
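Both points — the speed-versus-savings trade-off and the fact that aggressively pre-compressed data leaves little behind — are easy to see with Python’s built-in zlib (a DEFLATE implementation); the sample data is obviously contrived.

```python
import zlib

# Highly redundant "application data" -- compresses extremely well.
data = b"INSERT INTO orders VALUES (42, 'widget', 19.99);\n" * 20000

for level in (1, 9):                       # 1 = fastest, 9 = best ratio
    out = zlib.compress(data, level)
    print(f"level {level}: {len(data):,} -> {len(out):,} bytes "
          f"({1 - len(out) / len(data):.0%} savings)")

# Recompressing data that is already tightly compressed buys almost nothing,
# which is why MP3s and JPGs see little benefit from further compression.
tight = zlib.compress(data, 9)
print(f"recompress: {len(tight):,} -> {len(zlib.compress(tight, 9)):,} bytes")
```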

Double Your Savings

From our inception, Pure Storage has focused on enabling all-flash storage for mass adoption. By implementing multiple data reduction technologies, Pure Storage is able to significantly reduce the capacity data consumes and in turn reduce the price of flash, making flash affordable for all workloads. The savings are universal.

The chart below shows the actual storage savings from the entire Pure Storage customer base. Notice how data deduplication and compression combine to double the storage savings we deliver to our customers. No tricks. No thin provisioning. No limits.

2X Results

(click image to view at full size)

Storage vendors that provide only a single data reduction technology are inevitably limited in the applications they can affordably serve with flash. Customers are learning to avoid such limited platforms in favor of more universal architectures like Pure Storage that address a larger number of applications and solutions.

2X banner


Understanding Thin Provisioning: Does it Reduce Storage Capacity?


I’d like to continue the discussion on storage savings technologies. In my recent post on Dedupe and Compression I attempted to clarify the differences in the data reduction capabilities between each technology and demonstrate how the two deliver even greater savings when used in conjunction. Let’s expand the conversation by digging into Thin Provisioning, likely the most widely available storage savings technology.

Provisioning storage in the traditional sense follows the “if you build it, they will come” (aka Field of Dreams) model: storage capacity is preallocated in anticipation of eventually, someday, storing data. As a result, this model requires storage and application administrators to develop data growth forecasting skills which, with the benefit of hindsight, we’ve learned are inaccurate more often than not.

At a high level, Thin Provisioning (T.P.) is a virtual provisioning mechanism that allows addressable storage capacity to be provisioned without consuming or reserving physical capacity. The latter is allocated on-demand, as data is being written. This dynamic capability eliminates physical capacity lost as free space within traditional, thick provisioned LUNs, volumes and virtual disks.

If you’d like a deep dive into the details of thin and thick provisioned virtual disks see these posts: Thin Provisioning Part 1: The Basics & Part 2: Going Beyond.

Clarifying the Role of Thin Provisioning

Data Deduplication, Compression and Thin Provisioning take different paths to delivering storage efficiencies. Here’s the simple way I view these technologies:

  • By removing unused reserve space, Thin Provisioning maximizes physical storage capacity
  • By reducing data, Dedupe and Compression increase physical storage capacity

Leveraging multiple storage savings technologies can provide unprecedented gains in usable storage capacity, resulting in significant reductions in storage costs. Pure Storage publishes the Dedupe Ticker, a publicly available, real-time display of the actual data reduction and thin provisioning savings realized by our customer base. This data spans all applications and highlights the 2X multiplier I’ve previously referred to. As you can see, Dedupe & Compression combine to drive an average 6:1 data reduction.

Stock Ticker 2013

When you add Thin Provisioning to the data reduction technologies, reported storage savings increase to nearly 13:1. Now I must admit, measuring the savings from Thin Provisioning is somewhat nuanced. While T.P. makes it easier to fill a drive or array to capacity, it doesn’t actually increase the amount of addressable storage on the drive or array. As a result, the ‘benefits’ of thin provisioning can be overstated. Allow me to demonstrate how easy it is to manipulate the ‘thin provisioning effect on savings’.

Let’s compare 8 TB of data stored on a 10 TB and on a 100 TB thin provisioned LUN. You’ll notice the former shows 20% savings whereas the latter shows 92%.

Thin provisioning effect on savings

The logical data set remained 8 TB and the deduped capacity 4 TB, yet by simply adjusting the capacity of the T.P. LUN we see an increase in T.P. savings. There was no impact on the volume of data that is or can be stored.
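The arithmetic behind that chart is worth spelling out; here it is as a tiny Python sketch using the same hypothetical figures.

```python
def tp_savings(provisioned_tb, written_tb):
    """'T.P. savings' = the share of the provisioned LUN never written to."""
    return 1 - written_tb / provisioned_tb

written = 8                                   # TB of data written by the hosts
for lun_tb in (10, 100):
    print(f"{written} TB on a {lun_tb:3} TB thin LUN -> "
          f"{tp_savings(lun_tb, written):.0%} 'T.P. savings'")

# 8 TB on a  10 TB thin LUN -> 20% 'T.P. savings'
# 8 TB on a 100 TB thin LUN -> 92% 'T.P. savings'
# Same data, same drives -- only the size of the promise changed.
```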

In Closing

You should absolutely use every storage savings technology available on your array. The combination of Dedupe, Compression, and Thin Provisioning, along with additional technologies including SCSI UNMAP, array clones, and the removal of patterns and zeros, is critical to scaling storage in a world of fixed physical data center capacity. This is the new norm – embrace it!

With that said… be cautious around any emphasis on Thin Provisioning. While I’m not suggesting there’s no value to T.P., you need to watch out for the ‘thin provisioning effect on savings’, as it can set false expectations. Don’t take my word for it; see the example below from Big Storage.

Screen+Shot+2013 09 01+at+9 46 55+AM

 

 


Storage Field Day 6 visits Pure Storage


Last week Stephen Foskett and a host of delegates visited the Bay Area for Storage Field Day 6. The crew stopped by Pure Storage to catch up on our vision and direction, whiteboard the architecture of Purity OE and FlashArray, and hear our perspective on asking the right questions when considering an all-flash array. Videos of the visit are online and available for anyone who missed the live stream.

Five Years Driving the Flash Revolution (17:48)
Matt KixMoeller, VP Products and Marketing


FlashArray and Purity OE Architecture Whiteboard Session
(56:36)
Neil Vachharajani, Principal Architect


Asking the Right All-Flash Array Questions
(47:00)
Vaughn Stewart, Chief Evangelist

If you’re looking for additional SFD6 coverage you can find posts from the delegates here.


Experts Round Table: Flash Mythbusters


The Spiceworks ‘Flash Mythbusters‘ experts round table is now available online for those who missed the live event. Flash technologists from EMC, Nimble Storage, Pure Storage, and SolidFire set aside their competitive differences to debunk some of the top myths surrounding flash storage.

I find community events like this one compelling as the participants (mostly) drop pitching products in favor of deep discussions around various aspects of a technology.

According to Spiceworks, ‘Flash Mythbusters‘ was hot – it set records for event registration and audience attendance. Check it out!

(update: there seems to be a bug when attempting to play the Spiceworks video in Safari on Mac & iOS)

(update #2: I’ve updated the video link to YouTube)

I’d like to thank Justin Ong of Spiceworks for hosting, and my co-panelists Sam Marraccini, Devin Hamilton, and Jeramiah Dooley for sharing their perspectives and insights.
