
Refined Thinking like a Data Scientist Series

We don’t need “citizen data scientists”; we need “citizens of data science.”

I’ve written several blogs and conducted numerous student and executive training sessions associated with getting business stakeholders to “think like a data scientist.” We are not trying to turn the business stakeholders into data scientists. Instead we want to train the business stakeholders in “thinking like a data scientist” – to become citizens of data science – by understanding where and how data science can impact their business in order to accelerate organizational adoption.

The Thinking like a Data Scientist process has evolved over time. So I’m going to use this blog as an opportunity to document the refined (and hopefully simplified) process.

The business stakeholder objectives for the “Thinking like a Data Scientist” methodology are to:

  • Identify the right decisions to make, predictions to create, and hypotheses to test
  • Evolve from descriptive questions about what happened, to predictive questions about what is likely to happen and prescriptive questions about what actions to take
  • Brainstorm different variables and metrics (data sources) that might yield better predictors of business performance
  • Blend metrics and variables to create actionable scores
  • Identify where and how analytics can optimize key business and operational processes, reduce compliance and security risks, optimize product performance, uncover new business opportunities and create a more compelling user engagement

The “Thinking like a Data Scientist” methodology has evolved as we’ve applied it across client engagements, and have learned what works and what doesn’t work.  So I will use this blog to update the methodology and supporting materials (see flow below).

Figure: The “Thinking like a Data Scientist” methodology flow

I’ll also use this blog as an opportunity to pull together all the “Thinking like a Data Scientist” blogs into a single location. Besides, pulling all of these blogs into a single location makes it easier for me when assigning reading to my University of San Francisco business students.

Some Classroom Prerequisites

Before we dive into the methodology, let’s start by defining data science:

Data science is about identifying variables and metrics that might be better predictors of performance.

It is the word “might” that is key to the “Thinking like a Data Scientist” process. The word “might” is a license to be wrong, a license to think outside the box when trying to identify variables and metrics that might be better predictors of performance.

In order to create the “right” analytic models, the Data Science team will embrace a highly iterative, “fail fast / learn faster” environment. The data science team will test different variables, different data transformations, different data enrichments and different analytic algorithms until they have failed enough times to feel “comfortable” with the variables that have been selected. See the blog “Demystifying Data Science” to better understand the role of “might” in the data science process.
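The “fail fast / learn faster” loop above can be sketched in a few lines of Python. This is a deliberately crude sketch under invented assumptions: the customer variables, the toy data, and the stand-in scoring function are all illustrative placeholders, not a real model fit.

```python
# Toy historical data: candidate variables and an outcome to predict.
# All names and values here are illustrative placeholders.
observations = [
    {"tenure": 5, "support_calls": 1, "discount_used": 0, "churned": 0},
    {"tenure": 1, "support_calls": 4, "discount_used": 1, "churned": 1},
    {"tenure": 3, "support_calls": 3, "discount_used": 1, "churned": 1},
    {"tenure": 7, "support_calls": 0, "discount_used": 0, "churned": 0},
]
candidate_variables = ["tenure", "support_calls", "discount_used"]

def predictive_strength(variable):
    """Crude stand-in for a real model fit: the gap between the
    variable's mean for churned vs. retained observations."""
    churned = [o[variable] for o in observations if o["churned"]]
    retained = [o[variable] for o in observations if not o["churned"]]
    return abs(sum(churned) / len(churned) - sum(retained) / len(retained))

# "Fail fast / learn faster": score every candidate, keep what works,
# discard the rest, and repeat with new variables and transformations.
ranked = sorted(candidate_variables, key=predictive_strength, reverse=True)
print(ranked)
```

In practice the scoring function would be a proper model-evaluation metric, and the loop would also iterate over data transformations, enrichments and algorithms, as described above.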

Step 1: Identify Target Business Initiative

If you want your data science effort to be relevant and meaningful to the business, start with a key business initiative. A key business initiative is characterized as:

  • Critical to near-term business performance (12 to 18 months)
  • Documented (either internally or publicly)
  • Cross-functional (involves more than one business function)
  • Owned and/or championed by a senior business executive
  • Has a measurable financial or Return on Investment goal
  • Has a defined delivery timeframe (9 to 12 months)
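The criteria above work well as a simple screening checklist. Here is a minimal sketch of that idea in Python; the criterion names and the sample initiative are hypothetical, chosen only to mirror the bullets and examples in this post.

```python
# Hypothetical screening checklist for a candidate business initiative;
# the criteria mirror the bullet list above.
CRITERIA = [
    "critical_to_near_term_performance",
    "documented",
    "cross_functional",
    "executive_champion",
    "measurable_financial_goal",
    "defined_delivery_timeframe",
]

def qualifies(initiative: dict) -> bool:
    """An initiative qualifies only if every criterion is satisfied."""
    return all(initiative.get(criterion, False) for criterion in CRITERIA)

candidate = {
    "name": "Improve customer retention by 10%",
    "critical_to_near_term_performance": True,
    "documented": True,
    "cross_functional": True,
    "executive_champion": True,
    "measurable_financial_goal": True,   # e.g. worth ~$25M over 12 months
    "defined_delivery_timeframe": True,  # e.g. 9-to-12-month delivery window
}
print(qualifies(candidate))
```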

Examples of key business initiatives could include:

  • Improving customer retention by 10% may be worth $25M over the next 12 months
  • Reducing obsolete and excess inventory by 10% may be worth $45M over the next 12 months
  • Improving on-time deliveries by 5% may be worth $85M over the next 12 months
  • Reducing unplanned network down-time by 5% may be worth $70M over the next 12 months

These key business initiatives can be found in annual reports, analyst briefings, executive conference presentations, press releases, or maybe just ask your executives what are the organization’s most important business initiatives over the next 12 to 18 months (see Figure 1).

Figure 1: Chipotle’s Annual Report and Their Key Business Initiatives

Step 2:  Identify Business Stakeholders

Step 2 identifies the business stakeholders and constituents: the functions that either impact or are impacted by the targeted business initiative. These stakeholders and constituents are the targets for your “Thinking like a Data Scientist” training, as they have the domain knowledge necessary to improve analytic model effectiveness and drive organizational adoption (see Figure 2).

Figure 2: Identify Business Stakeholders or Constituents


Ideally for each stakeholder or constituent, you would create a single-slide persona that outlines that stakeholder’s or constituent’s roles, responsibilities, decisions and pain points (see Figure 3).

Figure 3: Business Stakeholder Persona


Step 3:  Identify Business Entities

Step 3 identifies the business entities (sometimes called “strategic nouns”) around which we will create and capture analytic insights. Business entities include customers, patients, students, physicians, store managers, engineers, and agents. But business entities can also include “things” such as jet engines, wind turbines, trucks, cars, medical devices and even buildings (see Figure 4).

Figure 4: Identify Key Business Entities (or Strategic Nouns)



Ideally the data science team will create an analytic profile for each individual business entity to help in the capture, refinement and re-use of the organization’s analytic insights. Analytic Profiles capture the organization’s analytic assets in a way that facilitates the refinement and sharing of those analytic assets across multiple use cases (see Figure 5).

Figure 5: Analytic Profiles



An Analytic Profile consists of metrics, predictive indicators, segments, scores, and business rules that codify the behaviors, preferences, propensities, inclinations, tendencies, interests, associations and affiliations for the organization’s key business entities. See the blog “Analytic Profiles: Key to Data Monetization” for more details on the workings of an Analytic Profile.
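As a rough sketch, an Analytic Profile can be thought of as a per-entity record holding those categories of analytic assets. The structure below is an assumption of mine, not a definitive schema; the field names follow the categories in the text, and the sample values are invented.

```python
from dataclasses import dataclass, field

# A hypothetical, simplified Analytic Profile for one business entity.
@dataclass
class AnalyticProfile:
    entity_id: str
    metrics: dict = field(default_factory=dict)                # e.g. visits per month
    predictive_indicators: dict = field(default_factory=dict)  # behaviors, propensities
    segments: list = field(default_factory=list)
    scores: dict = field(default_factory=dict)                 # named analytic scores
    business_rules: list = field(default_factory=list)

profile = AnalyticProfile(
    entity_id="customer-12345",
    metrics={"visits_per_month": 6.2},
    predictive_indicators={"churn_risk_trend": "rising"},
    segments=["frequent_buyer"],
    scores={"retention_score": 72},
    business_rules=["offer_discount_if_retention_score < 60"],
)
print(profile.scores["retention_score"])
```

The point of keying everything by `entity_id` is re-use: an insight captured for one use case (say, a retention score) stays attached to the entity and is available to the next use case.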

Step 4:  Brainstorm Data Sources

Step 4 is focused on leveraging the domain expertise of the business stakeholders to identify those variables and metrics (data sources) that might be better predictors of performance.

To facilitate the brainstorming of data sources, we will take the business stakeholders through an exercise to convert some of their descriptive questions into predictive questions that support the targeted business initiative. That is, we will transition the stakeholders from asking descriptive questions about what happened to asking predictive questions about what is likely to happen. Figure 6 shows an example of the “descriptive to predictive” question conversion.

Figure 6: Converting Descriptive Questions to Predictive Questions



We then take a couple of the most important predictive questions and append the following phrase to each in order to facilitate the data source brainstorming process: “…and what data sources might we need to make that prediction?”

For example:

  • What will revenues be next month and what data sources might we need to make that prediction?
  • How many new customers are we likely to acquire next quarter and what data sources might we need to make that prediction?

Then ask the stakeholders to work in small groups to identify and capture the potential data sources on Post-it notes (one variable or data source per note). We then bring all the stakeholders back together to create an aggregated list of potential variables and metrics (data sources) that the data science team might want to test (see Figure 7).

Figure 7: Brainstorming Data Sources (Variables and Metrics)



After brainstorming the data sources, the business stakeholders rank the data sources for each use case based upon that data source’s likely predictive value to that use case (we use a range of 1 to 5 in Figure 8). While this process is highly subjective, it’s surprising how accurate the business stakeholders will be in judging which data sources are likely to be the most relevant (see the final result in Figure 8).

Figure 8: Ranking Data Sources vis-à-vis Use Cases

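The ranking exercise produces a small matrix of data sources versus use cases, which can then be aggregated to decide what the data science team should test first. Here is a minimal sketch of that aggregation; the data sources, use cases and 1-to-5 ranks are all hypothetical.

```python
# Hypothetical 1-to-5 stakeholder rankings of candidate data sources
# against two made-up use cases.
rankings = {
    "local weather":         {"increase store traffic": 4, "reduce waste": 2},
    "local events":          {"increase store traffic": 5, "reduce waste": 1},
    "point-of-sale history": {"increase store traffic": 3, "reduce waste": 5},
}

def overall_priority(source: str) -> float:
    """Average a data source's rank across all use cases."""
    use_case_ranks = rankings[source].values()
    return sum(use_case_ranks) / len(use_case_ranks)

# Order the data sources so the data science team tests the most
# promising candidates first.
test_order = sorted(rankings, key=overall_priority, reverse=True)
print(test_order)
```

A simple average is only one possible aggregation; weighting the ranks by the priority of each use case would be a natural refinement.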

Step 5:  Capture and Prioritize Analytic Use Cases

Step 5 brainstorms the decisions necessary to support the targeted business initiative, groups the decisions into similar clusters (use cases), and then prioritizes the use cases based upon business value and implementation feasibility over the next 12 to 18 months.

The decisions are gathered via a series of interviews and facilitated brainstorming sessions with the business stakeholders and constituents (see Figure 9).

Figure 9: Brainstorm Decisions by Stakeholder or Key Constituent


NOTE: During the facilitated brainstorming sessions, it is critical to remember facilitation rule #1:  All ideas are worthy of consideration!

Next via a facilitated group exercise with the key business stakeholders and constituents, the decisions are grouped together in similar subject areas (see Figure 10).

Figure 10: Group Decisions into Common Subject Areas


NOTE:  During this facilitated grouping exercise, there will be much discussion to clarify the decisions and the grouping of those decisions into similar use cases.  Capture these conversations, as the insights from these conversations might be instrumental in the data science execution process.

Finally, you want to prioritize the use cases (on the axes of business value and implementation feasibility) to create an Analytic Use Case Roadmap (see Figure 11).

Figure 11: Prioritize Analytics Use Cases


NOTE: During the prioritization process, there will again be much discussion about why certain use cases are positioned vis-à-vis other use cases from both a business value and implementation feasibility perspective. Capture these conversations, as they might yield critical insights that impact the ultimate funding of the data science project.
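The value-versus-feasibility prioritization can be sketched as a simple scoring pass. The use cases, their 1-to-10 scores, and the product-based heuristic below are all my own illustrative assumptions, not part of the methodology itself.

```python
# Hypothetical use cases scored 1-10 on business value and
# implementation feasibility, as in the prioritization matrix.
use_cases = [
    {"name": "demand forecasting", "value": 9, "feasibility": 7},
    {"name": "dynamic staffing",   "value": 6, "feasibility": 9},
    {"name": "new site selection", "value": 8, "feasibility": 3},
]

def roadmap(cases):
    """Simple roadmap heuristic: favor use cases that are both valuable
    and feasible (the product rewards balance more than a plain sum would)."""
    return sorted(cases, key=lambda c: c["value"] * c["feasibility"], reverse=True)

for case in roadmap(use_cases):
    print(case["name"])
```

In a facilitated session the scores come from group discussion rather than a formula, but a sketch like this makes the trade-off explicit: a high-value, low-feasibility use case ("new site selection" here) drops to the bottom of the roadmap.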

Step 6: Identify Potential Analytic Scores

Step 6 focuses on grouping variables and metrics into similar clusters that the data science team can then explore as the basis for creating analytic “scores” or recommendations. Scores are analytic models comprised of a variety of weighted variables that can be used to support key operational decisions. Maybe the most familiar score is the FICO score, which combines a variety of weighted metrics about a loan applicant’s financial and credit history into a single value, or score, that lenders use to determine a borrower’s likelihood to repay a loan.
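Mechanically, a score of this kind is just a weighted combination of normalized inputs. The sketch below is a FICO-style illustration, not the actual FICO model: the input values are invented, and the weights only approximate the publicly described FICO category weightings.

```python
# Illustrative FICO-style category weights (approximate, for example only).
WEIGHTS = {
    "payment_history": 0.35,
    "amounts_owed": 0.30,
    "credit_history_length": 0.15,
    "new_credit": 0.10,
    "credit_mix": 0.10,
}

def composite_score(normalized_inputs: dict) -> float:
    """Weighted sum of inputs, each already normalized to a 0-100 scale."""
    return sum(WEIGHTS[name] * value for name, value in normalized_inputs.items())

applicant = {
    "payment_history": 90,
    "amounts_owed": 70,
    "credit_history_length": 60,
    "new_credit": 80,
    "credit_mix": 50,
}
print(composite_score(applicant))  # -> 74.5
```

The data science team’s real work is in choosing the variables and learning the weights from data; the business stakeholders’ contribution is nominating which variables belong in each score.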

For our example, we can start to see two groupings of variables around two potential scores: “Local Economic Potential” and “Local Vitality” (see Figure 12).

Figure 12: Grouping Variables into Potential Scores

Scores are critical components in the “Thinking Like a Data Scientist” process. They are the collaboration point between the business stakeholders and the data science team in developing analytics to support the decisions and the key business initiative. Scores support the key operational decisions that the business stakeholders make in support of the targeted business initiative.

Step 7:  Identify Recommendations

Step 7 ties everything together: the scores that support the recommendations to the key operational decisions that support our business initiative. The worksheet in Step 7 is best created in collaboration with the business stakeholders (who understand the decisions and can envision the potential recommendations) and the data science team (who understands how to convert the scores into analytic models). See Figure 13.

Figure 13: Linking Decisions to Recommendations to Analytic Scores


Thinking like a Data Scientist Summary

I expect that this process will continue to evolve as we execute more data science projects and collaborate with the business stakeholders to ensure that the data and the data science work is delivering quantifiable and measurable business value.

As they famously say: Watch this space!

Additional Sources:

“Thinking like a Data Scientist Part I: Understanding Where to Start”

“Thinking like a Data Scientist Part II: Predicting Business Performance”

“Thinking like a Data Scientist Part III: The Role of Scores”

“Thinking like a Data Scientist Part IV: Attitudinal Approach”

“The ‘Thinking’ Part of “Thinking like a Data Scientist”

“Data Science: Identifying Variables that Might Be Better Predictors”

The post Refined Thinking like a Data Scientist Series appeared first on InFocus Blog | Dell EMC Services.


More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”, is responsible for setting strategy and defining the Big Data service offerings for Hitachi Vantara as CTO, IoT and Analytics.

Previously, as a CTO within Dell EMC’s 2,000+ person consulting organization, he worked with organizations to identify where and how to start their big data journeys. He’s written white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power an organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow where he teaches the “Big Data MBA” course. Bill also just completed a research paper on “Determining The Economic Value of Data”. Onalytica recently ranked Bill as the #4 Big Data Influencer worldwide.

Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored the Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements. Bill serves on the City of San Jose’s Technology Innovation Board, and on the faculties of The Data Warehouse Institute and Strata.

Previously, Bill was vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications.

Bill holds a Master of Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.
