Welcome!

Related Topics: @BigDataExpo, @CloudExpo, @ThingsExpo

@BigDataExpo: Blog Feed Post

Aggregated Data Dilemma | @BigDataExpo #BigData #Analytics #DataScience

Valuable performance and behavioral nuances can be buried in the aggregated data

Okay, I am weird (tell me something that I don’t know, say most of my friends).  For Christmas I wanted a Nike Apple Watch to go with my existing FitBit and Garmin fitness trackers (I look sort of like a cyborg in the photo below…which is always cool).

While I was intrigued by the ability to do all sorts of cool things on the Apple Watch (like take a phone call and talk into my wrist watch like Dick Tracy), the thing that most intrigued me was the ability to buy third-party apps that could yield detailed exercise and health data.  I was hoping that this detailed exercise and health data could help me understand what effect particular behaviors or activities (or lack of particular behaviors and activities) were having on my overall health.

Why is this important to me?  You can thank articles like “Unexpected Heart Attack Triggers” for my health and exercise anxiety.  The article highlighted several things that can trigger a heart attack including:

  • Lack of sleep (definitely an issue, especially when I’m traveling so much)
  • Migraine Headaches (how can you work in technology and not have headaches)
  • Cold Weather (need to find more clients in warmer weather)
  • Big, Heavy Meals (with the exception of Chipotle, right?)
  • Getting Out of Bed in the Morning (see, I knew that was a big danger!!)
  • Alcohol (just like to drink a beer now and then)
  • Coffee (I drink Chai Tea Lattes, that’s technically not coffee, and I know that I shouldn’t admit that I drink Chai Tea Lattes)

So there are many items on that above list that could trigger a heart attack, and I enjoy many of the things on that list (like sleeping and eating and the occasional beer).  Consequently, I thought I’d put my data science experience to work to monitor my exercise and diet behaviors and predict potential health outcomes.

Personal Fitness Analytics
I tested the downloadable data from each of the three devices. The Fitbit offered the easiest way to download my fitness data (and I have TONS of useful fitness and diet tracking suggestions if anyone at Fitbit, Garmin or Apple ever read this blog!!). The problem with the fitness data is that I can only get daily level data (see Table 1).

Table 1:  Daily Fitness Tracking Data

I can add more external data to the aggregated fitness data (e.g., days of the week, days when I travel, how much I travel on those travel days) to come up with some simple plots.

For example, Figure 2 shows a visual correlation between the calories that I burn per step and the days that I travel.  My assumption is that I burn more calories per step when I am doing something that requires more exertion (like running or climbing steps), so it makes sense that on days when I am traveling, I have less opportunities for highly exertive activities.

Figure 2:  How Many Calories I Burn Per Step When Traveling

While this information is “interesting,” unfortunately, data at the aggregated daily level is not actionable.  If I had more detailed or granular fitness data, I’d like to chart what happens to my heart rate (and related stress levels):

  • During an airplane flight
  • When racing through an airport to catch a connecting flight
  • Waking up very early in the morning while traveling
  • Immediately after eating a large meal
  • While I’m doing my taxes (I hate doing my taxes)

The problem is that the data provided by my fitness band is aggregated to a level that is not actionable.  If I had my fitness data at 5 or 10-minute intervals, then I could more easily spot unusual health outcomes and determine (and eventually predict?) what behaviors (e.g., flying in an airplane, eating large meals, heavy exercise exertion, waking up extremely early) might be causing health concerns.

Power of Granular Data
Big Data and data science are all about granular data because valuable performance and behavioral nuances can be buried in the aggregated data.  For example, the chart in Figure 3 shows how additional performance nuances are being uncovered as we transition from a 5-minute to a 1-minute and finally to a 5-second interval in the capture of the performance data.

Figure 3:  Performance Nuances Uncovered in Granular Data

As the data gets more granular, the behavioral and performance nuances buried in the data start to surface. Data at the 5 minute and 1 minute intervals in Figure 3 tell you very little. Aggregated data is the anti-data science. Data at the 5-second interval highlights some potential performance concerns.  In this example, data at the 5-second interval starts to become actionable.

For example, I might notice too sedentary of a heart rate whenever I sit too long on a cross-country flight or my stress level jumping whenever I get another “flight delayed” message while trying to catch a connecting flight. I might then learn to perform some in-seat exercises and walking around during those long flights, or practicing controlled breathing and some simple yoga when enduring yet another flight delay (SFO airport does have a yoga room, and now I know why).

Preparing for an IoT World of Granular Data
Understanding the challenges of capturing and analyzing real-time granular machine and device-generated data will become even more critical as we move into the Internet of Things (IOT), where hundreds of sensors are kicking off tens, hundreds or even thousands of data points per minute.  This will force two specific challenges upon those of us coming from the more traditional human-generated big data world:

  • Real-time data capture and compression
  • Real-time analytics at the edge

For my fitness focus, I might need to expand my Personal Fitness Analysis to capture and analyze more of this detailed data in (near) real-time so that I can become aware of behaviors that are hurting or improving my health and fitness.  Ultimately, my goal is to change my behaviors, but I need to understand (and quantify?) what behaviors lead to desirable health and fitness outcomes (e.g., improved blood pressure, lower weight, less stress).

The post Aggregated Data Dilemma appeared first on InFocus Blog | Dell EMC Services.

Read the original blog entry...

More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business”, is responsible for setting the strategy and defining the Big Data service line offerings and capabilities for the EMC Global Services organization. As part of Bill’s CTO charter, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He’s written several white papers, avid blogger and is a frequent speaker on the use of Big Data and advanced analytics to power organization’s key business initiatives. He also teaches the “Big Data MBA” at the University of San Francisco School of Management.

Bill has nearly three decades of experience in data warehousing, BI and analytics. Bill authored EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the Vice President of Advertiser Analytics at Yahoo and the Vice President of Analytic Applications at Business Objects.

Latest Stories
SYS-CON Events announced today that Outscale, a global pure play Infrastructure as a Service provider and strategic partner of Dassault Systèmes, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Founded in 2010, Outscale simplifies infrastructure complexities and boosts the business agility of its customers. Outscale delivers a secure, reliable and industrial strength solution for its customers, which in...
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend @CloudExpo | @ThingsExpo, June 6-8, 2017, at the Javits Center in New York City, NY and October 31 - November 2, 2017, Santa Clara Convention Center, CA. Learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
SYS-CON Events announced today that EARP Integration will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. EARP Integration is a passionate software house. Since its inception in 2009 the company successfully delivers smart solutions for cities and factories that start their digital transformation. EARP provides bespoke solutions like, for example, advanced enterprise portals, business intelligence systems an...
IBM helps FinTechs and financial services companies build and monetize cognitive-enabled financial services apps quickly and at scale. Hosted on IBM Bluemix, IBM’s platform builds in customer insights, regulatory compliance analytics and security to help reduce development time and testing. In his session at 20th Cloud Expo, Tom Eck, Industry Platforms CTO at IBM Cloud, will discuss how these tools simplify the time-consuming tasks of selection, mapping and data integration, allowing developers ...
Existing Big Data solutions are mainly focused on the discovery and analysis of data. The solutions are scalable and highly available but tedious when swapping in and swapping out occurs in disarray and thrashing takes place. The resolution for thrashing through machine learning algorithms and support nomenclature is through simple techniques. Organizations that have been collecting large customer data are increasingly seeing the need to use the data for swapping in and out and thrashing occurs ...
In his session at 20th Cloud Expo, Brad Winett, Senior Technologist for DDN Storage, will present several current, end-user environments that are using object storage at scale for cloud deployments including private cloud and cloud providers. Details on the top considerations of features and functions for selecting object storage will be included. Brad will also touch on recent developments in tiering technologies that deliver single solution and an end-user view of data across files and objects...
For financial firms, the cloud is going to increasingly become a crucial part of dealing with customers over the next five years and beyond, particularly with the growing use and acceptance of virtual currencies. There are new data storage paradigms on the horizon that will deliver secure solutions for storing and moving sensitive financial data around the world without touching terrestrial networks. In his session at 20th Cloud Expo, Cliff Beek, President of Cloud Constellation Corporation, w...
Most DevOps journeys involve several phases of maturity. Research shows that the inflection point where organizations begin to see maximum value is when they implement tight integration deploying their code to their infrastructure. Success at this level is the last barrier to at-will deployment. Storage, for instance, is more capable than where we read and write data. In his session at @DevOpsSummit at 20th Cloud Expo, Josh Atwell, a Developer Advocate for NetApp, will discuss the role and value...
Amazon started as an online bookseller 20 years ago. Since then, it has evolved into a technology juggernaut that has disrupted multiple markets and industries and touches many aspects of our lives. It is a relentless technology and business model innovator driving disruption throughout numerous ecosystems. Amazon’s AWS revenues alone are approaching $16B a year making it one of the largest IT companies in the world. With dominant offerings in Cloud, IoT, eCommerce, Big Data, AI, Digital Assis...
SYS-CON Events announced today that Progress, a global leader in application development, has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Enterprises today are rapidly adopting the cloud, while continuing to retain business-critical/sensitive data inside the firewall. This is creating two separate data silos – one inside the firewall and the other outside the firewall. Cloud ISVs oft...
The 21st International Cloud Expo has announced that its Call for Papers is open. Cloud Expo, to be held October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, brings together Cloud Computing, Big Data, Internet of Things, DevOps, Digital Transformation, Machine Learning and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding busin...
Internet of @ThingsExpo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with the 21st International Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. @ThingsExpo Silicon Valley Call for Papers is now open.
DevOps is often described as a combination of technology and culture. Without both, DevOps isn't complete. However, applying the culture to outdated technology is a recipe for disaster; as response times grow and connections between teams are delayed by technology, the culture will die. A Nutanix Enterprise Cloud has many benefits that provide the needed base for a true DevOps paradigm. In his Day 3 Keynote at 20th Cloud Expo, Chris Brown, a Solutions Marketing Manager at Nutanix, will explore t...
As cloud adoption continues to transform business, today's global enterprises are challenged with managing a growing amount of information living outside of the data center. The rapid adoption of IoT and increasingly mobile workforce are exacerbating the problem. Ensuring secure data sharing and efficient backup poses capacity and bandwidth considerations as well as policy and regulatory compliance issues.
Interested in leveling up on your Cloud Foundry skills? Join IBM for Cloud Foundry Days on June 7 at Cloud Expo New York at the Javits Center in New York City. Cloud Foundry Days is a free half day educational conference and networking event. Come find out why Cloud Foundry is the industry's fastest-growing and most adopted cloud application platform.