Related Topics: Microservices Expo

Blog Feed Post

Data Footprint Reduction (Part 1)

Life beyond dedupe and changing data lifecycles

Over the past couple of weeks there has been a flurry of IT industry activity around data footprint impact reduction with Dell buying Ocarina and IBM acquiring Storwize. For those who want the quick (compacted, reduced) synopsis of what Dell buying Ocarina as well as IBM acquiring Storwize means read this post here along with some of my comments here and here.

Now, before any Drs or Divas of Dedupe get concerned and feel the need to debate dedupes expanding role, success or applicability, relax, take a deep breath, then read on and take another breath before responding if so inclined.

The reason I mention this is that some may mistake this as a piece against or not in favor of dedupe as it talks about life beyond dedupe which could be mistaken as indicating dedupes diminished role which is not the case (read ahead and see figure 5 to see the bigger picture).

Likewise some might feel that since this piece talks about archiving for compliance and non regulatory situations along with compression, data management and other forms of data footprint reduction they may be compelled to defend dedupes honor and future role.

Again, relax, take a deep breath and read on, this is not about the death of dedupe.

Now for others, you might wonder why the dedupe tongue in check humor mentioned above (which is what it is) and the answer is quite simple. The industry in general is drunk on dedupe and in some cases thus having numbed its senses not to mention having blurred its vision of the even bigger opportunities for the business benefits of data footprint reduction beyond todays backup centric or vmware server virtualization dedupe discussions.

Likewise, it is time for the industry to wake (or sober) up and instead of trying to stuff everything under or into the narrowly focused dedupe bottle. Instead, realize that there is a broader umbrella called data footprint impact reduction which includes among other techniques, dedupe, archive, compression, data management, data deletion and thin provisioning across all types of data and applications. What this means is a broader opportunity or market than what exists or being discussed today leveraging different techniques, technologies and best practices.

Consequently this piece is about expanding the discussion to the larger opportunity for vendors or vars to extend their focus to the bigger world of overall data footprint impact reduction beyond where currently focused. Likewise, this is about IT customers realizing that there are more opportunities to address data and storage optimization across your entire organization using various techniques instead of just focusing on backup.

In other words, there is a very bright future for dedupe as well as other techniques and technologies that fall under the data footprint reduction umbrella including data stored online, offline, near line, primary, secondary, tertiary, virtual and in a public or private cloud..

Before going further however lets take a step back and look at some business along with IT issues, challenges and opportunities.

What is the business and IT issue or challenge?
Given that there is no such thing as a data or information recession shown in figure 1, IT organizations of all size are faced with the constant demand to store more data, including multiple copies of the same or similar data, for longer periods of time.

Figure 1: IT resource demand growth continues

The result is an expanding data footprint, increased IT expenses, both capital and operational, due to additional Infrastructure Resource Management (IRM) activities to sustain given levels of application Quality of Service (QoS) delivery shown in figure 2.

Some common IT costs associated with supporting an increased data footprint include among others:

  • Data storage hardware and management software tools acquisition
  • Associated networking or IO connectivity hardware, software and services
  • Recurring maintenance and software renewal fees
  • Facilities fees for floor space, power and cooling along with IT staffing
  • Physical and logical security for data and IT resources
  • Data protection for HA, BC or DR including backup, replication and archiving

Figure 2: IT Resources and cost balancing conflicts and opportunities

Figure 2 shows the result is that IT organizations of all size are faced with having to do more with what they have or with less including maximizing available resources. In addition, IT organizations often have to overcome common footprint constraints (available power, cooling, floor space, server, storage and networking resources, management, budgets, and IT staffing) while supporting business growth.

Figure 2 also shows that to support demand, more resources are needed (real or virtual) in a denser footprint, while maintaining or enhancing QoS plus lowering per unit resource cost. The trick is improving on available resources while maintaining QoS in a cost effective manner. By comparison, traditionally if costs are reduced, one of the other curves (amount of resources or QoS) are often negatively impacted and vice versa. Meanwhile in other situations the result can be moving problems around that later resurface elsewhere. Instead, find, identify, diagnose and prescribe the applicable treatment or form of data footprint reduction or other IT IRM technology, technique or best practices to cure the ailment.

What is driving the expanding data footprint?
Granted more data can be stored in the same or smaller physical footprint than in the past, thus requiring less power and cooling per Gbyte, Tbyte or PByte. Data growth rates necessary to sustain business activity, enhanced IT service delivery and enable new applications are placing continued demands to move, protect, preserve, store and serve data for longer periods of time.

The popularity of rich media and Internet based applications has resulted in explosive growth of unstructured file data requiring new and more scalable storage solutions. Unstructured data includes spreadsheets, Power Point, slide decks, Adobe PDF and word documents, web pages, video and audio JPEG, MP3 and MP4 files. This trend towards increasing data storage requirements does not appear to be slowing anytime soon for organizations of all sizes.

After all, there is no such thing as a data or information recession!

Changing data access lifecycles
Many strategies or marketing stories are built around the premise that shortly after data is created data is seldom, if ever accessed again. The traditional transactional model lends itself to what has become known as information lifecycle management (ILM) where data can and should be archived or moved to lower cost, lower performing, and high density storage or even deleted where possible.

Figure 3 shows as an example on the left side of the diagram the traditional transactional data lifecycle with data being created and then going dormant. The amount of dormant data will vary by the type and size of an organization along with application mix.

Figure 3: Changing access and data lifecycle patterns

However, unlike the transactional data lifecycle models where data can be removed after a period of time, Web 2.0 and related data needs to remain online and readily accessible. Unlike traditional data lifecycles where data goes dormant after a period of time, on the right side of figure 3, data is created and then accessed on an intermittent basis with variable frequency. The frequency between periods of inactivity could be hours, days, weeks or months and, in some cases, there may be sustained periods of activity.

A common example is a video or some other content that gets created and posted to a web site or social networking site such as Face book, Linked in, or You Tube among others. Once the content is discussed, while it may not change, additional comment and collaborative data can be wrapped around the data as additional viewers discover and comment on the content. Solution approaches for the new category and data lifecycle model include low cost, relative good performing high capacity storage such as clustered bulk storage as well as leveraging different forms of data footprint reduction techniques.

Given that a large (and growing) percentage of new data is unstructured, NAS based storage solutions including clustered, bulk, cloud and managed service offerings with file based access are gaining in popularity. To reduce cost along with support increased business demands (figure 2), a growing trend is to utilize clustered, scale out and bulk NAS file systems that support NFS, CIFS for concurrent large and small IOs as well as optionally pNFS for large parallel access of files. These solutions are also increasingly being deployed with either built in or add on accessorized data footprint reduction techniques including archive, policy management, dedupe and compression among others.

What is your data footprint impact?
Your data footprint impact is the total data storage needed to support your various business application and information needs. Your data footprint may be larger than how much actual data storage you have as seen in figure 4. In Figure 4, an example is an organization that has 20TBytes of storage space allocated and being used for databases, email, home directories, shared documents, engineering documents, financial and other data in different formats (structured and unstructured) not to mention varying access patterns.

Figure 4: Expanding data footprint due to data proliferation and copies being retained

Of the 20TBytes of data allocated and used, it is very likely that the consumed storage space is not 100 percent used. Database tables may be sparsely (empty or not fully) allocated and there is likely duplicate data in email and other shared documents or folders. Additionally, of the 20TBytes, 10TBytes are duplicated to three different areas on a regular basis for application testing, training and business analysis and reporting purposes.

The overall data footprint is the total amount of data including all copies plus the additional storage required for supporting that data such as extra disks for Redundant Array of Independent Disks (RAID) protection or remote mirroring.

In this overly simplified example, the data footprint and subsequent storage requirement are several times that of the 20TBytes of data. Consequently, the larger the data footprint the more data storage capacity and performance bandwidth needed, not to mention being managed, protected and housed (powered, cooled, situated in a rack or cabinet on a floor somewhere).

Read the original blog entry...

More Stories By Greg Schulz

Greg Schulz is founder of the Server and StorageIO (StorageIO) Group, an IT industry analyst and consultancy firm. Greg has worked with various server operating systems along with storage and networking software tools, hardware and services. Greg has worked as a programmer, systems administrator, disaster recovery consultant, and storage and capacity planner for various IT organizations. He has worked for various vendors before joining an industry analyst firm and later forming StorageIO.

In addition to his analyst and consulting research duties, Schulz has published over a thousand articles, tips, reports and white papers and is a sought after popular speaker at events around the world. Greg is also author of the books Resilient Storage Network (Elsevier) and The Green and Virtual Data Center (CRC). His blog is at www.storageioblog.com and he can also be found on twitter @storageio.

Latest Stories
"There's plenty of bandwidth out there but it's never in the right place. So what Cedexis does is uses data to work out the best pathways to get data from the origin to the person who wants to get it," explained Simon Jones, Evangelist and Head of Marketing at Cedexis, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Digital Transformation and Disruption, Amazon Style - What You Can Learn. Chris Kocher is a co-founder of Grey Heron, a management and strategic marketing consulting firm. He has 25+ years in both strategic and hands-on operating experience helping executives and investors build revenues and shareholder value. He has consulted with over 130 companies on innovating with new business models, product strategies and monetization. Chris has held management positions at HP and Symantec in addition to ...
Enterprises have taken advantage of IoT to achieve important revenue and cost advantages. What is less apparent is how incumbent enterprises operating at scale have, following success with IoT, built analytic, operations management and software development capabilities - ranging from autonomous vehicles to manageable robotics installations. They have embraced these capabilities as if they were Silicon Valley startups.
In their session at @ThingsExpo, Shyam Varan Nath, Principal Architect at GE, and Ibrahim Gokcen, who leads GE's advanced IoT analytics, focused on the Internet of Things / Industrial Internet and how to make it operational for business end-users. Learn about the challenges posed by machine and sensor data and how to marry it with enterprise data. They also discussed the tips and tricks to provide the Industrial Internet as an end-user consumable service using Big Data Analytics and Industrial C...
René Bostic is the Technical VP of the IBM Cloud Unit in North America. Enjoying her career with IBM during the modern millennial technological era, she is an expert in cloud computing, DevOps and emerging cloud technologies such as Blockchain. Her strengths and core competencies include a proven record of accomplishments in consensus building at all levels to assess, plan, and implement enterprise and cloud computing solutions. René is a member of the Society of Women Engineers (SWE) and a m...
When talking IoT we often focus on the devices, the sensors, the hardware itself. The new smart appliances, the new smart or self-driving cars (which are amalgamations of many ‘things'). When we are looking at the world of IoT, we should take a step back, look at the big picture. What value are these devices providing. IoT is not about the devices, its about the data consumed and generated. The devices are tools, mechanisms, conduits. This paper discusses the considerations when dealing with the...
DXWordEXPO New York 2018, colocated with CloudEXPO New York 2018 will be held November 11-13, 2018, in New York City. Digital Transformation (DX) is a major focus with the introduction of DXWorldEXPO within the program. Successful transformation requires a laser focus on being data-driven and on using all the tools available that enable transformation if they plan to survive over the long term.
To Really Work for Enterprises, MultiCloud Adoption Requires Far Better and Inclusive Cloud Monitoring and Cost Management … But How? Overwhelmingly, even as enterprises have adopted cloud computing and are expanding to multi-cloud computing, IT leaders remain concerned about how to monitor, manage and control costs across hybrid and multi-cloud deployments. It’s clear that traditional IT monitoring and management approaches, designed after all for on-premises data centers, are falling short in ...
With privacy often voiced as the primary concern when using cloud based services, SyncriBox was designed to ensure that the software remains completely under the customer's control. Having both the source and destination files remain under the user?s control, there are no privacy or security issues. Since files are synchronized using Syncrify Server, no third party ever sees these files.
Mobile device usage has increased exponentially during the past several years, as consumers rely on handhelds for everything from news and weather to banking and purchases. What can we expect in the next few years? The way in which we interact with our devices will fundamentally change, as businesses leverage Artificial Intelligence. We already see this taking shape as businesses leverage AI for cost savings and customer responsiveness. This trend will continue, as AI is used for more sophistica...
"We are an integrator of carrier ethernet and bandwidth to get people to connect to the cloud, to the SaaS providers, and the IaaS providers all on ethernet," explained Paul Mako, CEO & CTO of Massive Networks, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
I believe that this may finally be the year that the CIO role ‘crosses the Rubicon,' leaving behind its traditional, IT-focused orientation. But I don't believe that either of the previous predictions of this outcome — fading into oblivion or rising to a business executive level — is correct. Instead, I think this is the year that we will see the role of the CIO transformed into something altogether different.
Cloud-enabled transformation has evolved from cost saving measure to business innovation strategy -- one that combines the cloud with cognitive capabilities to drive market disruption. Learn how you can achieve the insight and agility you need to gain a competitive advantage. Industry-acclaimed CTO and cloud expert, Shankar Kalyana presents. Only the most exceptional IBMers are appointed with the rare distinction of IBM Fellow, the highest technical honor in the company. Shankar has also receive...
"Calligo is a cloud service provider with data privacy at the heart of what we do. We are a typical Infrastructure as a Service cloud provider but it's been designed around data privacy," explained Julian Box, CEO and co-founder of Calligo, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
"NetApp is known as a data management leader but we do a lot more than just data management on-prem with the data centers of our customers. We're also big in the hybrid cloud," explained Wes Talbert, Principal Architect at NetApp, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.