
Do Your Big Data Analytics Measure Or Predict?


Organizations have key business processes that they are constantly trying to re-engineer. These key business processes – loan approvals, college applications, mortgage underwriting, product and component testing, credit applications, medical reviews, employee hiring, environmental testing, requests for proposals, contract bidding, etc. – go through multiple steps, usually involving multiple people with different skill sets, with a business outcome at the end (accept/reject, bid/no bid, pass/fail, retest, reapply, etc.). And while these processes typically include “analytics” that report on how well the processes worked (process effectiveness), the analytics only provide an “after the fact” view on what happened.

Instead of using analytics to measure how well the process worked, how about using predictive and prescriptive analytics to actually direct the process at the beginning?  Instead of analytics that tell you what happened, how about creating analytics at the front of the process that predict what steps in the process are necessary and in what order?  Sometimes the most effective process is the process that you don’t need to execute, or only have to execute in part.

High-Tech Manufacturing Testing Example
We had an engagement with a high-tech manufacturer to help them leverage analytics and the Internet of Things to optimize their 22-step product and component testing process.  Not only was there a significant amount of capital tied up in their in-process inventory, but the lengthy testing process also created concerns about excess and obsolete inventory in an industry where product changes happen constantly.

The manufacturer had lots of data coming off the testing processes, but that data was only being used after the fact to tell them where testing had or had not been successful.  Instead, our approach was to use that data to predict which tests needed to be run for which components coming from which of the suppliers’ manufacturing facilities.  Rather than measuring what happened and identifying waste and inefficiencies after the fact, the manufacturer wanted to predict the likely quality of each component (given the extensive amount of data that could be captured but was instead hitting the manufacturing and testing floors) and identify which tests were needed in that particular situation.  Think dynamic, or even smart, testing.

We worked with the client to identify all the data coming out of the different testing processes.  We discovered that nearly 90% of the potential data was just “hitting the floor” because the organization did not have a method for capturing and subsequently analyzing it (most of it was either very detailed log files, or comments and notes generated by the testers, engineers and technicians during the testing processes).  At a conceptual level, their process looked like Figure 1, with traditional dashboards and reports that answered the basic operational questions about how the process was working.

Figure 1: Analytics That Tell You How Each Step Performed

However, we wanted to transform the role of analytics from merely reporting how the process was working to actually directing it: employing predictive analytics to create a score for each component, and then using prescriptive analytics to recommend which tests had to be run, and in what order, given that predictive score.

We used a technique called the “By Analysis” to brainstorm the “variables and metrics that might be better predictors of testing performance”.  For example, when examining components with a high failure rate, we started the brainstorming process with the following question:

“Show me the percentage of component failures by…”

We asked the workshop participants to brainstorm the “by” variables and metrics that we might want to test.  Here are some of the results:

  • Show me the percentage of component failures by Tester/Technician (age, gender, years of experience, historical performance ratings, most recent training date, most recent testing certifications, months of experience by test machine, level of job satisfaction, years until retirement, etc.)
  • Show me the percentage of component failures by Test Machine (manufacturer, model, install date, last service date, ideal capacity, maximum capacity, test machine operational complexity, etc.)
  • Show me the percentage of component failures by Component (component type, supplier, supplier manufacturing facility, supplier manufacturing machine, supplier component testing results, etc.)
  • Show me the percentage of component failures by Manufacturer (years of service, historical performance, location, distance from distribution center, manufacturing location, lot number, storage location, etc.)
  • Show me the percentage of component failures by Distribution Center (location, build date, last remodel date, local temperature, local humidity, local economic conditions, etc.)
  • Show me the percentage of component failures by Time (Time of day/Day of Week, Time of year, local holidays, seasonality, etc.)
  • Show me the percentage of component failures by Weather (precipitation, temperature, humidity, airborne particles, pollution, severe storms, etc.)
  • Show me the percentage of component failures by Labor unrest (strikes, number of plant injuries, safety issues, etc.)
  • Show me the percentage of component failures by Local economic conditions (average hourly wages, paid overtime, union representation, average hours worked per week, etc.)
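As a rough illustration of how these brainstormed “by” variables can be screened once the data is actually captured, here is a minimal pandas sketch that computes failure percentages grouped by a few hypothetical columns; the column names and values are invented for illustration and are not the client’s actual schema.

```python
import pandas as pd

# Hypothetical test-result records; column names and values are illustrative only.
results = pd.DataFrame({
    "tester_id":         ["T01", "T01", "T02", "T03", "T02", "T03"],
    "machine_model":     ["MX-9", "MX-9", "MX-7", "MX-7", "MX-9", "MX-7"],
    "supplier_facility": ["FAB-A", "FAB-B", "FAB-A", "FAB-B", "FAB-A", "FAB-B"],
    "failed":            [0, 1, 0, 1, 0, 1],
})

# "Show me the percentage of component failures by ..." for each candidate variable.
for by_col in ["tester_id", "machine_model", "supplier_facility"]:
    pct_failures = (
        results.groupby(by_col)["failed"]
        .mean()                    # failure rate per group
        .mul(100)                  # expressed as a percentage
        .sort_values(ascending=False)
    )
    print(f"\nPercentage of component failures by {by_col}:")
    print(pct_failures.round(1))
```

Variables whose failure percentages vary sharply across groups are the first candidates to hand to the data science team for inclusion in the score.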

The data science team started building a score that could be used to predict the quality and reliability of a component from a particular supplier, created on a particular manufacturing machine, at a particular time of day/week/year, under particular weather conditions, tested by a particular technician, and so on.  And yes, you can quickly see that the more detailed the data you have, the more accurate the score.
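To make that concrete, here is a minimal sketch of how such a score might be trained, assuming the “by” variables have been assembled into a historical feature table with a pass/fail label; the feature names, the toy data and the choice of a gradient-boosted classifier are all assumptions for illustration, not the client’s actual model.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical historical feature table: one row per previously tested component.
history = pd.DataFrame({
    "supplier_facility":        ["FAB-A", "FAB-B", "FAB-A", "FAB-B"] * 50,
    "machine_model":            ["MX-9", "MX-7", "MX-7", "MX-9"] * 50,
    "tester_experience_months": [48, 6, 24, 12] * 50,
    "humidity_pct":             [35, 70, 45, 60] * 50,
    "failed":                   [0, 1, 0, 1] * 50,
})

X = pd.get_dummies(history.drop(columns="failed"))  # one-hot encode the categoricals
y = history["failed"]

model = GradientBoostingClassifier().fit(X, y)

# A "Component Quality & Reliability"-style score: predicted probability of
# passing (1 = almost certainly good, 0 = almost certainly going to fail).
incoming = X.head(3)  # stand-in for components about to enter testing
quality_score = 1.0 - model.predict_proba(incoming)[:, 1]
print(quality_score)
```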

We were able to create this “Component Quality & Reliability” score that we could use prior to the testing process to tell us what tests we needed to conduct and in what order with a reasonable level of risk (see Figure 2).

Figure 2:  Analytics That Prescribe What Tests to Run

By using the Component Quality & Reliability score, we could predict ahead of time which tests we would need to run and in what order.  The result was a dramatic improvement in the speed and cost of testing, with a minor but manageable increase in component performance risk.
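A hedged sketch of what that prescriptive layer could look like: the score gates which subset of the 22 tests to run, with the thresholds and test groupings below invented purely for illustration.

```python
# Illustrative mapping from the quality/reliability score to a test plan.
# The thresholds and test groupings are assumptions, not the manufacturer's rules.
CRITICAL_TESTS = ["electrical_short", "thermal_cycle", "vibration"]
STANDARD_TESTS = CRITICAL_TESTS + ["burn_in", "humidity_soak", "drop_test"]
FULL_SUITE     = STANDARD_TESTS + [f"extended_test_{i}" for i in range(1, 17)]  # 22 tests

def prescribe_tests(quality_score: float) -> list[str]:
    """Return the ordered list of tests to run for a component, given its score."""
    if quality_score >= 0.95:   # very likely good: run only the critical checks
        return CRITICAL_TESTS
    if quality_score >= 0.80:   # moderately confident: run the standard subset
        return STANDARD_TESTS
    return FULL_SUITE           # low confidence: run the full 22-step suite

print(prescribe_tests(0.97))        # critical tests only
print(len(prescribe_tests(0.50)))   # 22, the full suite
```

A production version would presumably also weigh test cost, sequencing and machine availability, but the shape of the decision is the same: the score decides how much of the process to run.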

Baseball Analogy
I love sports, particularly baseball.  Baseball has always been a game played with statistics, averages and probabilities.  There are lots of analytics best practices that we can learn from the game of baseball.  And one of the ways analytics is used in baseball is to determine the likelihood of your opponent doing something.

For example, the best baseball fielders understand each individual batter’s tendencies, propensities, averages, statistics and preferences (e.g., where the batter is likely to hit the ball, what pitches he prefers to hit) and use that information to position themselves on the field where the ball is most likely to be hit.

Then the fielder will make in-game, pitch-by-pitch adjustments based upon:

  • Field dimensions (distance to deep center, distance down the lines, height of the grass, grass or artificial turf, how best to play the Green Monster in Fenway or the vines in Wrigley, etc.)
  • Environmental conditions (humidity, temperature, precipitation, wind, position of the sun or lights, etc.)
  • Game situation (number of outs, the inning, score, runners in scoring position, time of day/night, etc.)
  • Pitcher preferences and propensities and in-game effectiveness (getting ahead in the count, number of pitches thrown, current velocity, effectiveness of off speed pitches, etc.)
  • And even more…

The same approach – predicting ahead of time what is likely to happen – works for many business processes, such as underwriting a loan or mortgage.  We would want to learn as much as possible about the players involved in the underwriting process – borrower, property, lenders, appraisers, underwriters – so that we could build a “score” that predicts which steps in the process are important and necessary, and which ones could be skipped without significantly increasing the risk.  For example:

  • Which borrowers have a history of on-time payments, have a sufficient base of assets, have a solid job and salary outlook, don’t pose any retirement cash flow risks, have a reasonable number of dependents (children and potentially parents), etc.
  • Which properties are over-valued given the value of similar properties or properties within the same area, or which properties reside in high storm- or weather-risk areas (probability or likelihood of hurricanes, tornadoes, floods, forest fires, earthquakes, zodiac killers, etc.)
  • Which lenders have a history of good loans, have a solid financial foundation, have a solid management team, aren’t in the news for any management shenanigans, don’t have questionable investments, etc.
  • Which appraisers are most effective with what types and locations of properties for what types of loans in what economic situations, have a significant track record of success, have been with the same firm for a reasonable amount of time, have a solid educational background, etc.
  • Which underwriters are most effective with which types of loans for which types of properties, are happy with their job and family situation, have been on the job a reasonable amount of time, have solid performance ratings, don’t have any HR issues, etc.

With this information in hand, we’d be better prepared to know which types of credit applications need what level of scrutiny and what level of risks we would be willing to accept for what level of underwriting return.  Just like the best baseball shortstops and center fielders!
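To make that concrete in the underwriting setting, here is a hedged sketch that blends hypothetical borrower, property, lender, appraiser and underwriter sub-scores into one composite score and maps it to a level of scrutiny; the weights and tiers are invented for illustration, not an actual underwriting policy.

```python
# Illustrative composite underwriting score; weights and tiers are assumptions.
WEIGHTS = {
    "borrower":    0.35,
    "property":    0.25,
    "lender":      0.15,
    "appraiser":   0.15,
    "underwriter": 0.10,
}

def underwriting_score(sub_scores: dict[str, float]) -> float:
    """Weighted blend of sub-scores, each on a 0-to-1 scale (1 = lowest risk)."""
    return sum(WEIGHTS[k] * sub_scores[k] for k in WEIGHTS)

def scrutiny_level(score: float) -> str:
    """Map the composite score to how much of the underwriting process to run."""
    if score >= 0.90:
        return "streamlined review: skip the manual appraisal re-check"
    if score >= 0.75:
        return "standard review"
    return "full review: every step, plus senior underwriter sign-off"

application = {"borrower": 0.95, "property": 0.90, "lender": 0.85,
               "appraiser": 0.80, "underwriter": 0.90}
print(scrutiny_level(underwriting_score(application)))  # composite ~0.895 -> standard review
```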

Summary
A few things to remember about using analytics to predict what process steps need to be executed and in what order, and which steps can be reduced or skipped given a reasonable increase in risk:

  • The “By Analysis” will fuel creative thinking about additional data sources that are collected (somewhere) but were previously considered unusable, such as technician comments, technician notes, work orders, product specifications, etc. It is important to remember that all ideas are worthy of consideration.  Let the data science team determine which variables and metrics are actually worthy of inclusion in the score or model.
  • Through the use of instrumentation or tagging of each step, click and action taken by someone in the process, organizations can start to capture even more detailed and granular data about the testing or application processes. Remember:  more detailed data cannot hurt!
  • Another challenge is the categorical nature of the results: pass/fail/retest. Instead of just three states, start contemplating capturing the results along a continuum (“the test was 96.5% successful” versus “the test was successful”).  That additional granularity in the results could prove invaluable in building and fine-tuning your scores and analytic models.
  • Be sure to consider Type I and Type II errors in the planning process to determine the criticality and importance of each component or application. Not all components deserve or require the same level of testing.  For example, the components that keep the cushion of an airplane seat in place are not nearly as important as the components that keep the engine on the airplane wing (see the blog “Understanding Type I and Type II Errors” for more on Type I and Type II errors). A brief sketch of weighing these error costs against each other follows this list.
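One hedged way to act on that asymmetry is to choose the score threshold that minimizes expected cost, where letting a bad part slip through is penalized far more heavily for a critical component than for a non-critical one; the scores, labels and cost figures below are made up for illustration.

```python
# Illustrative cost-weighted threshold selection; all numbers are invented.
# A "false alarm" sends a good component through extra testing; a "missed defect"
# lets a bad component slip through untested.

def expected_cost(threshold, scores, labels, cost_false_alarm, cost_missed_defect):
    """Average cost of flagging every component whose quality score is below threshold."""
    cost = 0.0
    for score, is_defective in zip(scores, labels):
        flagged = score < threshold
        if flagged and not is_defective:
            cost += cost_false_alarm       # good part, extra testing anyway
        elif not flagged and is_defective:
            cost += cost_missed_defect     # bad part slipped through
    return cost / len(scores)

# Hypothetical validation scores and their true defect labels.
scores = [0.98, 0.91, 0.85, 0.80, 0.40, 0.30]
labels = [False, False, False, True, True, True]

candidate_thresholds = [0.50, 0.75, 0.90]
for part, fa_cost, md_cost in [("seat-cushion clip", 5, 1), ("engine mount", 1, 500)]:
    best = min(candidate_thresholds,
               key=lambda t: expected_cost(t, scores, labels, fa_cost, md_cost))
    print(f"{part}: flag (and fully test) components scoring below {best}")
```

With these invented costs, the non-critical clip ends up with a lenient threshold while the engine mount gets an aggressive one, which is exactly the asymmetry the Type I/Type II discussion is meant to surface.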

Using analytics to predict which components or applications need to be tested, rather than only using analytics to measure process effectiveness after the fact, can provide an order-of-magnitude improvement in your key business processes.  In the long term, it’s the analytics emitted from your key business processes (yielding superior customer, product, operational and market insights) that will differentiate your business.


More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business”, is responsible for setting the strategy and defining the Big Data service line offerings and capabilities for the EMC Global Services organization. As part of Bill’s CTO charter, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He has written several white papers, is an avid blogger, and is a frequent speaker on the use of Big Data and advanced analytics to power organizations’ key business initiatives. He also teaches the “Big Data MBA” at the University of San Francisco School of Management.

Bill has nearly three decades of experience in data warehousing, BI and analytics. Bill authored EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the Vice President of Advertiser Analytics at Yahoo and the Vice President of Analytic Applications at Business Objects.
