Virtual SAN 6.2 introduced several highly anticipated product features, and in this blog we'll focus on some of the coolest ones: Dedupe & Compression.

When talking about Dedupe and Compression, one first needs to determine why an organization would want to use these features and what they actually do. These features were requested by VMware customers, and I am glad that we listened to the customer. One of the many reasons for using Dedupe and Compression is to lower TCO: customers benefit from space efficiency, as the Virtual SAN cluster will not utilize as much storage as it would without Dedupe and Compression, hence saving dollars. It is also important to note that Dedupe and Compression are supported on All-Flash Virtual SAN configurations only.

The Deduplication and Compression operation happens during the destage from the cache tier to the capacity tier. Blocks of data stay in the cache tier while they are being accessed regularly; once this trend stops, the deduplication engine checks whether the block of data in the cache tier has already been stored on the capacity tier. Imagine a customer with lots of VMs sharing a datastore, where those VMs keep producing the same block of data because a certain file is written to often. These blocks should be stored only once to ensure data is stored efficiently; each time a duplicate copy is stored, space is wasted, so the engine stores only unique chunks of data. In case you are wondering how all these blocks of data are tracked, hashing is used: hashing is the process of creating a short, fixed-length data string from a large block of data. The basics of deduplication are illustrated below.
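To make those basics concrete, here is a minimal sketch of hash-based deduplication on destage. This is illustrative only, not VMware's actual implementation: the 4 KB block size, the use of SHA-256 fingerprints, and the in-memory dictionary standing in for the capacity tier are all assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for the sketch

class CapacityTier:
    """Toy content-addressed store standing in for the capacity tier."""

    def __init__(self):
        self.blocks = {}  # hash -> block data, each unique block stored once
        self.refs = {}    # hash -> how many logical blocks point at it

    def destage(self, block: bytes) -> str:
        """Accept a block destaged from the cache tier, deduplicating by hash."""
        digest = hashlib.sha256(block).hexdigest()
        if digest not in self.blocks:      # unseen content: store it
            self.blocks[digest] = block
        self.refs[digest] = self.refs.get(digest, 0) + 1
        return digest                      # callers keep the short hash, not the data

tier = CapacityTier()
# Three identical blocks (the frequently written file) plus one unique block.
data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
for i in range(0, len(data), BLOCK_SIZE):
    tier.destage(data[i:i + BLOCK_SIZE])

logical = sum(tier.refs.values())   # 4 blocks written by the workload
physical = len(tier.blocks)         # 2 blocks actually stored
print(f"{logical} logical blocks, {physical} physical blocks")
```

Comparing two short, fixed-length hashes is far cheaper than comparing the blocks themselves, which is why the engine tracks blocks this way rather than byte-by-byte.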
Comparing the deduplication ratios of different backup vendors is an incredibly invalid way of comparing a product's ability to deduplicate data. This fact has been true since the very beginning of deduplication, but it is now more true than ever. The reason deduplication ratios from different vendors cannot be compared directly with one another is that inputs, methods, and reporting are all different from product to product. In this blog, we'll take a deeper look into why that is and describe the proper way to compare vendors' deduplication capabilities.

The concept of deduplication ratios was born in the early days of target deduplication. You purchased a product like a Data Domain appliance and sent hundreds of terabytes of backups to an NFS mount, after which the appliance would deduplicate the data. You compared the volume of data sent by the backup product to the amount of disk used on the appliance, and that ratio was used to justify this new type of product.

Even in those early days, however, you couldn't compare the advertised deduplication ratios of different products, because you had no idea how they created that number. The biggest reason was that you had no idea what type of backups each vendor sent to their appliance, or the change rates they introduced after each backup, if any. If a vendor wanted to make its deduplication ratio look better, it would simply perform a full backup every time with no change rate: perform 100 full backups with no change and you have a 100:1 dedupe ratio! Even if a vendor attempts to mimic a real production environment, with a mix of structured and unstructured data and a reasonable amount of change, it will not match the ratios and change rate of your environment. You have a different mixture of structured and unstructured data and a different change rate.
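To see how easy that 100:1 number is to manufacture, here is a minimal sketch of the ratio arithmetic. The 1 TB full-backup size and the 2% change rate are assumed figures, and the model assumes a perfect deduplicator that stores only the first full plus whatever changed afterward.

```python
def dedupe_ratio(num_fulls: int, full_size_tb: float, change_rate: float) -> float:
    """Logical bytes ingested divided by physical bytes stored."""
    logical = num_fulls * full_size_tb
    # Only the first full plus the changed data in each later full lands on disk.
    physical = full_size_tb + (num_fulls - 1) * full_size_tb * change_rate
    return logical / physical

# 100 full backups of the same 1 TB with zero change: the 100:1 ratio above.
print(round(dedupe_ratio(100, 1.0, 0.00), 1))  # 100.0
# The identical workload with a modest 2% change rate per backup.
print(round(dedupe_ratio(100, 1.0, 0.02), 1))  # 33.6
```

The same product drops from 100:1 to roughly 34:1 just by changing the input, which is exactly why advertised ratios, each produced from a different vendor's chosen workload, tell you nothing comparable about the products themselves.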