
Do Your Big Data Analytics Measure Or Predict?

Organizations have key business processes that they are constantly trying to re-engineer. These key business processes – loan approvals, college applications, mortgage underwriting, product and component testing, credit applications, medical reviews, employee hiring, environmental testing, requests for proposals, contract bidding, etc. – go through multiple steps, usually involving multiple people with different skill sets, with a business outcome at the end (accept/reject, bid/no bid, pass/fail, retest, reapply, etc.). And while these processes typically include “analytics” that report on how well the processes worked (process effectiveness), the analytics only provide an “after the fact” view on what happened.

Instead of using analytics to measure how well the process worked, how about using predictive and prescriptive analytics to actually direct the process at the beginning?  Instead of analytics that tell you what happened, how about creating analytics at the front of the process that predict what steps in the process are necessary and in what order?  Sometimes the most effective process is the process that you don’t need to execute, or only have to execute in part.

High-Tech Manufacturing Testing Example
We had an engagement with a high-tech manufacturer to help them leverage analytics and the Internet of Things to optimize their 22-step product and component testing process.  Not only was a significant amount of capital tied up in their in-process inventory, but the lengthy testing processes also created concerns about excessive and obsolete inventory in an industry where products change constantly.

The manufacturer had lots of data coming off the testing processes, but that data was being used after the fact to tell them whether the testing was successful.  Instead, our approach was to use that data to predict which tests needed to be run for which components coming from which of the suppliers’ manufacturing facilities.  Rather than measuring what happened and identifying waste and inefficiencies after the fact, the manufacturer wanted to predict the likely quality of each component (using the extensive data they could have been capturing but that was instead being discarded on the manufacturing and testing floors) and identify which tests were needed in that particular situation.  Think dynamic, or even smart, testing.

We worked with the client to identify all the data coming out of the different testing processes.  We discovered that nearly 90% of the potential data was just “hitting the floor” because the organization had no method for capturing and subsequently analyzing it (most of it was either very detailed log files, or comments and notes generated by the testers, engineers and technicians during testing).  At a conceptual level, their processing looked like Figure 1, with traditional dashboards and reports that answered the basic operational questions about how the process was working.

Figure 1: Analytics That Tell You How Each Step Performed

However, we wanted to transform the role of analytics from simply reporting how the process was working to using predictive analytics to create a score for each component, and then using prescriptive analytics to recommend which tests had to be run, and in what order, given that predictive score.

We used a technique called the “By Analysis” to brainstorm the “variables and metrics that might be better predictors of testing performance”.  For example, when examining components with a high failure rate, we started the brainstorming process with the following question:

“Show me the percentage of component failures by…”

We asked the workshop participants to brainstorm the “by” variables and metrics that we might want to test.  Here are some of the results:

  • Show me the percentage of component failures by Tester/Technician (age, gender, years of experience, historical performance ratings, most recent training date, most recent testing certifications, months of experience by test machine, level of job satisfaction, years until retirement, etc.)
  • Show me the percentage of component failures by Test Machine (manufacturer, model, install date, last service date, ideal capacity, maximum capacity, test machine operational complexity, etc.)
  • Show me the percentage of component failures by Component (component type, supplier, supplier manufacturing facility, supplier manufacturing machine, supplier component testing results, etc.)
  • Show me the percentage of component failures by Manufacturer (years of service, historical performance, location, distance from distribution center, manufacturing location, lot number, storage location, etc.)
  • Show me the percentage of component failures by Distribution Center (location, build date, last remodel date, local temperature, local humidity, local economic conditions, etc.)
  • Show me the percentage of component failures by Time (Time of day/Day of Week, Time of year, local holidays, seasonality, etc.)
  • Show me the percentage of component failures by Weather (precipitation, temperature, humidity, airborne particles, pollution, severe storms, etc.)
  • Show me the percentage of component failures by Labor unrest (strikes, number of plant injuries, safety issues, etc.)
  • Show me the percentage of component failures by Local economic conditions (average hourly wages, paid overtime, union representation, average hours worked per week, etc.)
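
Mechanically, each “Show me the percentage of component failures by…” question is a group-by over historical test records. A minimal sketch, assuming each test record has been captured as a dictionary with a boolean `failed` field; the function name, field names and sample records are hypothetical:

```python
from collections import defaultdict

def failure_rate_by(records, dimension):
    """Return {value: percentage of failures} for one "by" dimension.

    records: iterable of dicts, each containing `dimension` and a boolean "failed".
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for rec in records:
        key = rec[dimension]
        totals[key] += 1
        if rec["failed"]:
            failures[key] += 1
    return {key: 100.0 * failures[key] / totals[key] for key in totals}

# Hypothetical test records; real ones would carry many more "by" variables.
records = [
    {"supplier": "A", "test_machine": "M1", "failed": False},
    {"supplier": "A", "test_machine": "M2", "failed": True},
    {"supplier": "B", "test_machine": "M1", "failed": False},
    {"supplier": "B", "test_machine": "M1", "failed": False},
]

for dim in ("supplier", "test_machine"):
    print(dim, failure_rate_by(records, dim))
# supplier {'A': 50.0, 'B': 0.0}
# test_machine {'M1': 0.0, 'M2': 100.0}
```

Running the same function over every brainstormed dimension is a cheap first filter for which variables look worth handing to the data science team.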

The data science team started building a score that could be used to predict the quality and reliability of a component from a particular supplier, made on a particular manufacturing machine at a particular time of day/week/year, under particular weather conditions, tested by a particular technician, etc.  Yeah, I think you can quickly see how the more detailed the data you have, the more accurate the score can be.
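
The article does not describe the model behind the score. One common way to collapse many “by” variables into a single 0-to-1 score is a logistic model, and the sketch below assumes that approach. Every feature name, weight and the bias are hypothetical placeholders, not values from the engagement:

```python
import math

# Hypothetical feature weights and bias. In practice a data science team would
# estimate these from historical test outcomes (for example with logistic
# regression); the names mirror a few of the "by" variables listed earlier.
WEIGHTS = {
    "supplier_failure_rate": -8.0,        # historical failure rate, 0-1
    "days_since_machine_service": -0.02,
    "technician_years_experience": 0.15,
    "humidity_pct": -0.01,
}
BIAS = 3.0

def quality_score(features):
    """Return a 0-1 score; higher means the component is more likely to be good.

    Features missing from the input contribute nothing to the score."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function

component = {
    "supplier_failure_rate": 0.05,
    "days_since_machine_service": 30,
    "technician_years_experience": 4,
    "humidity_pct": 60,
}
print(round(quality_score(component), 3))  # 0.881
```

Adding a new data source amounts to adding a feature and re-estimating the weights, which is one way to read the point that more detailed data can sharpen the score.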

We were able to create this “Component Quality & Reliability” score that we could use prior to the testing process to tell us what tests we needed to conduct and in what order with a reasonable level of risk (see Figure 2).

Figure 2:  Analytics That Prescribe What Tests to Run

By using the Component Quality & Reliability score, we could determine or predict ahead of time what tests we thought we would need to run and in what order.  The result was a dramatic improvement in the speed and cost of testing, with a minor but manageable increase in component performance risk.
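
Turning the score into a test plan can be sketched as a per-test skip threshold plus an ordering rule. This is a minimal illustration, assuming each test has a cost and a score above which it may be skipped; the `QualityTest` class, test names, costs and thresholds are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class QualityTest:
    name: str
    cost: float                    # hypothetical relative cost of running the test
    skip_at: float = float("inf")  # skip when score >= skip_at; inf means never skip

# Hypothetical slice of a 22-step catalog; names, costs and cut-offs are illustrative.
CATALOG = [
    QualityTest("visual_inspection", cost=1.0, skip_at=0.99),
    QualityTest("electrical_continuity", cost=5.0, skip_at=0.95),
    QualityTest("thermal_cycle", cost=40.0, skip_at=0.90),
    QualityTest("safety_certification", cost=25.0),  # mandatory: never skipped
]

def plan_tests(score, catalog=CATALOG):
    """Return the names of the tests to run for a component with this score.

    Lower scores (less trust in the component) keep more tests in the plan.
    Tests run cheapest-first so inexpensive checks can reject a bad part
    before costly ones are spent on it."""
    selected = [t for t in catalog if score < t.skip_at]
    return [t.name for t in sorted(selected, key=lambda t: t.cost)]

print(plan_tests(0.97))  # ['visual_inspection', 'safety_certification']
print(plan_tests(0.50))  # all four tests, cheapest first
```

Cheapest-first is only one ordering heuristic; a common refinement orders tests by cost relative to each test's probability of catching a defect.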

Baseball Analogy
I love sports, particularly baseball.  Baseball has always been a game played with statistics, averages and probabilities.  There are lots of analytics best practices that we can learn from the game of baseball.  And one of the ways analytics is used in baseball is to determine the likelihood of your opponent doing something.

For example, the best baseball fielders understand each individual batter’s tendencies, propensities, averages, statistics and preferences (e.g., where he is likely to hit the ball, what pitches he prefers to hit) and use that information to position themselves on the field where the ball is most likely to be hit.

Then the fielder will make in-game, pitch-by-pitch adjustments based upon:

  • Field dimensions (distance to deep center, distance down the lines, height of the grass, grass or artificial turf, how best to play the Green Monster in Fenway or the vines in Wrigley, etc.)
  • Environmental conditions (humidity, temperature, precipitation, wind, position of the sun or lights, etc.)
  • Game situation (number of outs, the inning, score, runners in scoring position, time of day/night, etc.)
  • Pitcher preferences, propensities and in-game effectiveness (getting ahead in the count, number of pitches thrown, current velocity, effectiveness of off-speed pitches, etc.)
  • And even more…

The same approach – predicting ahead of time what is likely to happen – works for many business processes, such as underwriting a loan or mortgage.  We would want to learn as much as possible about the players involved in the underwriting process – borrower, property, lenders, appraisers, underwriters – so that we could build a “score” that predicts which steps in the process are important and necessary, and which ones could be skipped without significantly increasing the risk.  For example:

  • Which borrowers have a history of on-time payments, have a sufficient base of assets, have a solid job and salary outlook, don’t pose any retirement cash flow risks, have a reasonable number of dependents (children and potentially parents), etc.
  • Which properties are over-valued given the value of similar properties or properties within the same area, or which properties reside in high storm or weather risk areas (probability or likelihood of hurricanes, tornadoes, floods, forest fires, earthquakes, zodiac killers, etc.).
  • Which lenders have a history of good loans, have a solid financial foundation, have a solid management team, aren’t in the news for any management shenanigans, don’t have questionable investments, etc.
  • Which appraisers are most effective with what types and locations of properties for what types of loans in what economic situations, have a significant track record of success, have been with the same firm for a reasonable amount of time, have a solid educational background, etc.
  • Which underwriters are most effective with which types of loans for which types of properties, are happy with their job and family situation, have been on the job a reasonable amount of time, have solid performance ratings, don’t have any HR issues, etc.

With this information in hand, we’d be better prepared to know which types of credit applications need what level of scrutiny, and what level of risk we would be willing to accept for what level of underwriting return.  Just like the best baseball shortstops and center fielders!
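
The question of what level of risk to accept can be made concrete by choosing a score cutoff from historical outcomes. A minimal sketch, assuming you have past applications' scores and whether each loan later went bad; the function name, the six-row history and the tolerances are hypothetical, and a real tolerance would be far lower and validated on held-out data:

```python
def fast_track_cutoff(history, max_bad_rate):
    """Return the lowest score cutoff whose fast-tracked group stays within tolerance.

    history: (score, went_bad) pairs from past applications. Applications scoring
    at or above the cutoff would skip optional underwriting steps; a lower cutoff
    fast-tracks more applications. Returns None if no cutoff meets the tolerance.
    """
    for cutoff in sorted({score for score, _ in history}):
        group = [went_bad for score, went_bad in history if score >= cutoff]
        if sum(group) / len(group) <= max_bad_rate:
            return cutoff
    return None

# Hypothetical history: (score, loan later went bad?)
history = [(0.95, False), (0.90, False), (0.85, True),
           (0.80, False), (0.60, True), (0.40, True)]
print(fast_track_cutoff(history, max_bad_rate=0.25))  # 0.8
print(fast_track_cutoff(history, max_bad_rate=0.20))  # 0.9
```

Tightening the tolerance raises the cutoff, so fewer applications skip steps: the same risk-versus-speed trade-off described for component testing.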

A few things to remember about using analytics to predict what process steps need to be executed and in what order, and which steps can be reduced or skipped given a reasonable increase in risk:

  • The “By Analysis” will fuel creative thinking about additional data sources that are collected (somewhere) but were previously considered unusable, such as technician comments, technician notes, work orders, product specifications, etc. It is important to remember that all ideas are worthy of consideration.  Let the data science team determine which variables and metrics are actually worthy of inclusion in the score or model.
  • Through the use of instrumentation or tagging of each step, click and action taken by someone in the process, organizations can start to capture even more detailed and granular data about the testing or application processes. Remember:  more detailed data cannot hurt!
  • Another challenge is the coarse, categorical nature of the results (pass/fail/retest). Instead of just three states, consider capturing the results along a continuum (“the test was 96.5% successful” versus “the test was successful”).  That additional granularity in the results could prove invaluable in building and fine-tuning your scores and analytic models.
  • Be sure to consider Type I and Type II errors in the planning process to determine the criticality and importance of each component or application. Not all components deserve or require the same level of testing.  For example, the components that keep the seat cushion in place on an airplane are not nearly as important as the components that keep the engine attached to the wing (see blog “Understanding Type I and Type II Errors” for more on Type I and Type II Errors).
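
One way to fold criticality into the skip-or-test decision is an expected-cost rule: skip a test only when the expected cost of letting a defect through is lower than the cost of running the test. A minimal sketch, assuming the defect probability comes from the Component Quality & Reliability score (for example, one minus the score) and that the test always catches a defect when present; the function name and all cost figures are hypothetical:

```python
def should_run_test(p_defect, test_cost, escape_cost):
    """Return True when running the test is expected to be cheaper than skipping it.

    Skipping risks letting a defective part through (expected cost
    p_defect * escape_cost); running costs test_cost. Simplifying assumption:
    the test always catches a defect when one is present."""
    return p_defect * escape_cost > test_cost

# Same defect probability and test cost, very different criticality (hypothetical costs).
p_defect = 0.01  # e.g., 1 minus a component's quality score
print(should_run_test(p_defect, test_cost=50, escape_cost=1_000))       # False: seat-cushion part
print(should_run_test(p_defect, test_cost=50, escape_cost=10_000_000))  # True: engine-mount part
```

With identical scores, the seat-cushion part can skip the test while the engine-mount part cannot, because the cost of a missed defect differs by orders of magnitude.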

Using analytics to predict which components or applications need to be tested, rather than only using analytics to measure process effectiveness, can provide an order-of-magnitude improvement in your key business processes.  In the long term, it’s the analytics emitted from your key business processes (yielding superior customer, product, operational and market insights) that will differentiate your business.

The post Do Your Big Data Analytics Measure Or Predict? appeared first on InFocus.

More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business”, is responsible for setting the strategy and defining the Big Data service line offerings and capabilities for the EMC Global Services organization. As part of his CTO charter, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He has written several white papers, is an avid blogger and is a frequent speaker on the use of Big Data and advanced analytics to power organizations’ key business initiatives. He also teaches the “Big Data MBA” at the University of San Francisco School of Management.

Bill has nearly three decades of experience in data warehousing, BI and analytics. Bill authored EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the Vice President of Advertiser Analytics at Yahoo and the Vice President of Analytic Applications at Business Objects.
