Byte-Level Versus Block-Level Deduplication and Backup

Unitrends is supporting byte-level deduplication on its backup appliances with release 6, due this month (March 2010).  Since there’s some confusion concerning byte-level versus block-level deduplication, I thought I’d take the opportunity to explain the differences between file-level deduplication (what Unitrends supported prior to release 6), block-level deduplication, and byte-level deduplication.

File-level deduplication operates by eliminating redundant files.  Despite what many pundits state, file deduplication is very efficient (and note that I’m stating this even though Unitrends is bringing out byte-level deduplication with release 6).  The reason is two-fold: the concept of temporal data access locality and the concept of data getting “colder.”  The bottom line to both of these concepts is that, statistically, data that has been recently used is more likely to be used again, while data that hasn’t been recently used is less likely to be used again.  This behavior is also the reason that master/differential backup policies have held up so well over time: data usage tends to be “clumped” together temporally.

The downside of file-level deduplication is poor data reduction on what is typically called “structured data.”  Structured data includes things like databases, e-mail repositories, virtual machine image backups, image-based dissimilar bare metal backups, and the like.  Since there are no individual files per se inside such data, file-level deduplication can’t eliminate the redundant data within it.
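To make the trade-off concrete, here is a minimal sketch of file-level deduplication, assuming an in-memory store keyed by a SHA-256 digest of each file's full contents. The function names, paths, and data are hypothetical and purely illustrative; this is not a description of how any Unitrends release works.

```python
import hashlib

def file_level_dedup(files):
    """Store each distinct file content once, keyed by its SHA-256 digest.

    `files` maps a (hypothetical) path to that file's bytes.
    Returns (store, catalog): store maps digest -> bytes,
    catalog maps path -> digest.
    """
    store = {}
    catalog = {}
    for path, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # keep only the first copy of each content
        catalog[path] = digest
    return store, catalog

def stored_bytes(store):
    """Total bytes actually kept after deduplication."""
    return sum(len(data) for data in store.values())

# Two identical documents collapse to a single stored copy.
docs = {"a/report.doc": b"x" * 1000, "b/report.doc": b"x" * 1000}
doc_store, _ = file_level_dedup(docs)

# Two backups of a database image that differ by one byte do not:
# the whole-file digests differ, so both images are stored in full.
monday = b"r" * 4096
tuesday = b"r" * 4095 + b"s"
db_store, _ = file_level_dedup({"mon/db.img": monday, "tue/db.img": tuesday})
```

In this sketch the identical documents cost 1,000 bytes instead of 2,000, while the two nearly identical database images still cost the full 8,192 bytes.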

Block-level deduplication has higher overhead than file-level deduplication but has the tremendous advantage of deduplicating structured data.  In addition, block-level deduplication can deduplicate at a sub-file level, i.e., when only a section of a file changes, block-level deduplication can often enable the unchanged section or sections of the file to continue to be deduplicated.
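The sub-file behavior can be sketched as follows, assuming fixed-size blocks and an in-memory block store. The 512-byte block size, the function names, and the sample data are assumptions chosen for illustration; real devices may use different block sizes or variable-size chunking.

```python
import hashlib

BLOCK_SIZE = 512  # assumed block size, chosen only for illustration

def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def block_level_dedup(streams, block_size=BLOCK_SIZE):
    """Store each distinct block once.

    Returns (store, recipes): store maps digest -> block bytes,
    recipes maps stream name -> ordered list of digests.
    """
    store = {}
    recipes = {}
    for name, data in streams.items():
        recipe = []
        for block in split_blocks(data, block_size):
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # unchanged blocks are not stored again
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes

def restore(recipe, store):
    """Rebuild a stream from its ordered list of block digests."""
    return b"".join(store[digest] for digest in recipe)

# An 8-block image, then the same image with only its fourth block changed.
monday = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
tuesday = monday[:BLOCK_SIZE * 3] + b"\xff" * BLOCK_SIZE + monday[BLOCK_SIZE * 4:]

store, recipes = block_level_dedup({"mon": monday, "tue": tuesday})
```

Here the two images share seven of their eight blocks, so the store holds nine blocks rather than sixteen, and either image can still be rebuilt from its recipe. The per-block hashing and the recipe bookkeeping are a small picture of the higher overhead mentioned above.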

Byte-level deduplication is a form of block-level deduplication that understands the content, or “semantics”, of the data.  These systems are sometimes called CAS (Content Aware Systems).  Typically, deduplication devices perform block-level deduplication that is content-agnostic: blocks are blocks.  The problem, of course, is that certain blocks of data are much more likely to change than other blocks of data.  For backup systems, the “metadata” (data about data) that contains information about the actual backup tends to change continuously, while the backup data itself statistically changes much less often.  The advantage of byte-level deduplication is that, by understanding the content of the data, the system can more efficiently deduplicate the bytes within the data stream that is being deduplicated.
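A toy example may help show why content awareness matters. The sketch below assumes a made-up backup stream format in which every 512-byte record starts with a 16-byte metadata header (a backup timestamp and a record number) followed by 496 bytes of backup data. The format, names, and sizes are all hypothetical and are not the Unitrends format or algorithm.

```python
import hashlib
import struct

# Hypothetical record layout: 16-byte header + 496-byte payload = 512 bytes.
HEADER = struct.Struct(">QQ")  # backup timestamp, record number
PAYLOAD_SIZE = 496
RECORD_SIZE = HEADER.size + PAYLOAD_SIZE

def make_stream(timestamp, payloads):
    """Build a backup stream: each payload is prefixed with its metadata header."""
    return b"".join(HEADER.pack(timestamp, n) + p for n, p in enumerate(payloads))

def agnostic_unique_blocks(streams, block_size=RECORD_SIZE):
    """Content-agnostic: hash raw fixed-size blocks, headers and all."""
    seen = set()
    for data in streams:
        for i in range(0, len(data), block_size):
            seen.add(hashlib.sha256(data[i:i + block_size]).digest())
    return len(seen)

def aware_unique_payloads(streams):
    """Content-aware: skip the volatile header and hash only the backup data.

    A real system would still have to keep the (small) headers somewhere.
    """
    seen = set()
    for data in streams:
        for i in range(0, len(data), RECORD_SIZE):
            payload = data[i + HEADER.size:i + RECORD_SIZE]
            seen.add(hashlib.sha256(payload).digest())
    return len(seen)

# The same four payloads backed up on two days; only the timestamp differs.
payloads = [bytes([n]) * PAYLOAD_SIZE for n in range(4)]
monday = make_stream(1_000, payloads)
tuesday = make_stream(2_000, payloads)

agnostic = agnostic_unique_blocks([monday, tuesday])
aware = aware_unique_payloads([monday, tuesday])
```

Because the changing timestamp lands inside every raw block, the content-agnostic pass finds eight unique blocks and deduplicates nothing, while the content-aware pass finds only four unique payloads across both backups.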

Ironically, file-level deduplication is a form of byte-level deduplication, since there must be some degree of content-awareness in order to distinguish a file from some other form of data.  But of course the problem, as described above, is that file-level deduplication can’t handle structured data and can’t handle changes at the sub-file level.

What Unitrends has done with its release 6 is to create a byte-level deduplication system that is integrated with the backup appliance, so that the appliance inherently deduplicates at the byte level without the delays between the backup server and backup storage associated with dedicated deduplication devices.  And of course an integrated all-in-one backup appliance doesn’t force the customer to integrate products from different vendors (e.g., the server vendor, the backup software vendor, and the deduplication device vendor).

More Stories By Mark Campbell

Mark Campbell is the COO of Unitrends. He originally joined Unitrends as its CTO in 2008. Unitrends gives its customers the freedom to focus on their business instead of backup. The company achieves this through scalable, all-in-one backup solutions that no other data protection vendor can provide.  Unitrends’ integrated backup appliance simply protects businesses’ IT infrastructures at the lowest total cost of ownership in the industry. More companies every day join those who have discovered the customer-obsessed, enterprise-level data protection only Unitrends can offer.
