Refined Thinking like a Data Scientist Series

We don’t need “citizen data scientists”; we need “citizens of data science.”

I’ve written several blogs and conducted numerous student and executive training sessions associated with getting business stakeholders to “think like a data scientist.” We are not trying to turn the business stakeholders into data scientists. Instead we want to train the business stakeholders in “thinking like a data scientist” – to become citizens of data science – by understanding where and how data science can impact their business in order to accelerate organizational adoption.

The Thinking like a Data Scientist process has evolved over time. So I’m going to use this blog as an opportunity to document the refined (and hopefully simplified) process.

The business stakeholder objectives for the “Thinking like a Data Scientist” methodology are to:

  • Identify the right decisions to make, predictions to create, and hypotheses to test
  • Evolve from descriptive questions about what happened, to predictive questions about what is likely to happen and prescriptive questions about what actions to take
  • Brainstorm different variables and metrics (data sources) that might yield better predictors of business performance
  • Blend metrics and variables to create actionable scores
  • Identify where and how analytics can optimize key business and operational processes, reduce compliance and security risks, optimize product performance, uncover new business opportunities and create a more compelling user engagement

The “Thinking like a Data Scientist” methodology has evolved as we’ve applied it across client engagements and learned what works and what doesn’t. So I will use this blog to update the methodology and supporting materials (see the flow below).

Figure: The “Thinking like a Data Scientist” methodology flow

I’ll also use this blog as an opportunity to pull together all the “Thinking like a Data Scientist” blogs into a single location. Besides, pulling all of these blogs in a single blog makes it easier for me when assigning reading to my University of San Francisco business students.

Some Classroom Prerequisites

Before we dive into the methodology, let’s start by defining data science:

Data science is about identifying variables and metrics that might be better predictors of performance.

It is the word “might” that is key to the “Thinking like a Data Scientist” process. The word “might” is a license to be wrong, a license to think outside the box when trying to identify variables and metrics that might be better predictors of performance.

In order to create the “right” analytic models, the Data Science team will embrace a highly iterative, “fail fast / learn faster” environment. The data science team will test different variables, different data transformations, different data enrichments and different analytic algorithms until they have failed enough times to feel “comfortable” with the variables that have been selected. See the blog “Demystifying Data Science” to better understand the role of “might” in the data science process.
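As a toy illustration of that iterative loop, the sketch below sweeps pairs of candidate variables and keeps whichever pair predicts best. The variable names, synthetic data, and choice of scikit-learn model are my own assumptions for illustration, not part of the methodology:

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real dataset; the candidate variable names
# are hypothetical.
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=42)
candidates = ["tenure", "visits", "spend", "returns", "tickets", "referrals"]

# "Fail fast / learn faster": test every pair of variables and record
# how well each pair predicts the outcome.
results = {}
for combo in combinations(range(len(candidates)), 2):
    scores = cross_val_score(LogisticRegression(max_iter=200),
                             X[:, list(combo)], y, cv=5)
    results[tuple(candidates[i] for i in combo)] = scores.mean()

best = max(results, key=results.get)
print(best, round(results[best], 3))
```

In practice the team would also sweep data transformations, enrichments, and algorithms, not just variable pairs, iterating until the selected variables feel trustworthy.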

Step 1: Identify Target Business Initiative

If you want your data science effort to be relevant and meaningful to the business, start with a key business initiative. A key business initiative is characterized as:

  • Critical to immediate-term business performance (12 to 18 months)
  • Documented (either internally or publicly)
  • Cross-functional (involves more than one business function)
  • Owned and/or championed by a senior business executive
  • Has a measurable financial or Return on Investment goal
  • Has a defined delivery timeframe (9 to 12 months)

Examples of key business initiatives could include:

  • Improving customer retention by 10% may be worth $25M over the next 12 months
  • Reducing obsolete and excess inventory by 10% may be worth $45M over the next 12 months
  • Improving on-time deliveries by 5% may be worth $85M over the next 12 months
  • Reducing unplanned network down-time by 5% may be worth $70M over the next 12 months

These key business initiatives can be found in annual reports, analyst briefings, executive conference presentations, and press releases, or you can simply ask your executives which business initiatives matter most over the next 12 to 18 months (see Figure 1).

Figure 1: Chipotle’s Annual Report and Their Key Business Initiatives

Step 2:  Identify Business Stakeholders

Step 2 identifies the business stakeholders and constituents: the functions that either impact or are impacted by the targeted business initiative. These stakeholders and constituents are the targets for your “Thinking like a Data Scientist” training, as they have the domain knowledge necessary to improve analytic model effectiveness and drive organizational adoption (see Figure 2).

Figure 2: Identify Business Stakeholders or Constituents

Ideally for each stakeholder or constituent, you would create a single-slide persona that outlines that stakeholder’s or constituent’s roles, responsibilities, decisions and pain points (see Figure 3).

Figure 3: Business Stakeholder Persona

Step 3:  Identify Business Entities

Step 3 identifies the business entities (sometimes called “strategic nouns”) around which we will create and capture analytic insights. Business entities include customers, patients, students, physicians, store managers, engineers, and agents. But business entities can also include “things” such as jet engines, wind turbines, trucks, cars, medical devices and even buildings (see Figure 4).

Figure 4: Identify Key Business Entities (or Strategic Nouns)

Ideally the data science team will create an analytic profile for each individual business entity to help in the capture, refinement and re-use of the organization’s analytic insights. Analytic Profiles capture the organization’s analytic assets in a way that facilitates the refinement and sharing of those analytic assets across multiple use cases (see Figure 5).

Figure 5: Analytic Profiles

An Analytic Profile consists of metrics, predictive indicators, segments, scores, and business rules that codify the behaviors, preferences, propensities, inclinations, tendencies, interests, associations and affiliations for the organization’s key business entities. See the blog “Analytic Profiles: Key to Data Monetization” for more details on the workings of an Analytic Profile.
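A minimal sketch of what an Analytic Profile might look like as a data structure; the schema and field values are my own illustration, not a prescribed format:

```python
from dataclasses import dataclass, field

@dataclass
class AnalyticProfile:
    """Analytic assets captured for one business entity (e.g. a customer)."""
    entity_id: str
    metrics: dict = field(default_factory=dict)        # e.g. monthly_spend
    predictive_indicators: dict = field(default_factory=dict)
    segments: list = field(default_factory=list)       # e.g. "frequent buyer"
    scores: dict = field(default_factory=dict)         # e.g. churn_score
    business_rules: list = field(default_factory=list)

# Build and enrich a profile for a single (hypothetical) customer.
profile = AnalyticProfile(entity_id="customer-1234")
profile.metrics["monthly_spend"] = 182.50
profile.scores["churn_score"] = 0.27
profile.segments.append("frequent buyer")
```

Keeping all of an entity’s insights in one structure is what makes them easy to refine and re-use across use cases.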

Step 4:  Brainstorm Data Sources

Step 4 is focused on leveraging the domain expertise of the business stakeholders to identify those variables and metrics (data sources) that might be better predictors of performance.

To facilitate the brainstorming of data sources, we take the business stakeholders through an exercise to convert some of their descriptive questions into predictive questions that support the targeted business initiative. That is, we transition the stakeholders from asking descriptive questions about what happened to asking predictive questions about what is likely to happen. Figure 6 shows an example of this descriptive-to-predictive question conversion.

Figure 6: Converting Descriptive Questions to Predictive Questions

We then take a couple of the most important predictive questions and append the following phrase to each in order to facilitate the data source brainstorming process: “…and what data sources might we need to make that prediction?”

For example:

  • What will revenues be next month and what data sources might we need to make that prediction?
  • How many new customers are we likely to acquire next quarter and what data sources might we need to make that prediction?

Then ask the stakeholders to work in small groups to identify and capture the potential data sources on Post-it notes (one variable or data source per note). We then bring all the stakeholders back together to create an aggregated list of potential variables and metrics (data sources) that the data science team might want to test (see Figure 7).

Figure 7: Brainstorming Data Sources (Variables and Metrics)

After brainstorming the data sources, the business stakeholders rank each data source against each use case based upon that data source’s likely predictive value to that use case (we use a 1-to-5 scale). While this process is highly subjective, it’s surprising how accurate the business stakeholders are in judging which data sources are likely to be the most relevant (see the final result in Figure 8).
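The ranking matrix can be aggregated in a few lines; the data sources, use cases, and 1-to-5 rankings below are invented for illustration:

```python
# Stakeholder rankings (1-5) of each data source against each use case.
rankings = {
    "point-of-sale":   {"retention": 5, "inventory": 4, "on-time": 2},
    "social media":    {"retention": 4, "inventory": 1, "on-time": 1},
    "weather":         {"retention": 1, "inventory": 3, "on-time": 5},
    "loyalty program": {"retention": 5, "inventory": 2, "on-time": 1},
}

# Sum across use cases to surface which sources to test first.
totals = {source: sum(by_use_case.values())
          for source, by_use_case in rankings.items()}
for source, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{source:15s} {total}")
```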

Figure 8: Ranking Data Sources vis-à-vis Use Cases

Step 5:  Capture and Prioritize Analytic Use Cases

Step 5 brainstorms the decisions necessary to support the targeted business initiative, groups the decisions into similar clusters (use cases), and then prioritizes the use cases based upon business value and implementation feasibility over the next 12 to 18 months.

The decisions are gathered via a series of interviews and facilitated brainstorming sessions with the business stakeholders and constituents (see Figure 9).

Figure 9: Brainstorm Decisions by Stakeholder or Key Constituent

NOTE: During the facilitated brainstorming sessions, it is critical to remember facilitation rule #1:  All ideas are worthy of consideration!

Next via a facilitated group exercise with the key business stakeholders and constituents, the decisions are grouped together in similar subject areas (see Figure 10).

Figure 10: Group Decisions into Common Subject Areas

NOTE:  During this facilitated grouping exercise, there will be much discussion to clarify the decisions and the grouping of those decisions into similar use cases.  Capture these conversations, as the insights from these conversations might be instrumental in the data science execution process.

Finally, you prioritize the use cases on the axes of business value and implementation feasibility to create an Analytic Use Case Roadmap (see Figure 11).

Figure 11: Prioritize Analytics Use Cases

NOTE: During the prioritization process, there will again be much discussion about why certain use cases are positioned where they are vis-à-vis other use cases from both a business value and implementation feasibility perspective. Capture these conversations, as they might yield critical insights that impact the ultimate funding of the data science project.
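The prioritization itself is done visually in the workshop, but the idea can be sketched numerically; the use cases, 1-to-10 scores, and the simple value-times-feasibility composite below are my own illustration:

```python
# Hypothetical use cases scored 1-10 on each axis by the stakeholders.
use_cases = {
    "predict customer churn":    {"value": 9, "feasibility": 7},
    "optimize inventory levels": {"value": 8, "feasibility": 4},
    "personalize promotions":    {"value": 6, "feasibility": 8},
}

# One possible composite: favor use cases that are strong on BOTH axes.
roadmap = sorted(use_cases.items(),
                 key=lambda kv: kv[1]["value"] * kv[1]["feasibility"],
                 reverse=True)
for name, axes in roadmap:
    print(f"{name:28s} {axes['value'] * axes['feasibility']}")
```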

Step 6: Identify Potential Analytic Scores

Step 6 focuses on grouping variables and metrics into similar clusters that the data science team can then explore as the basis for creating analytic “scores” or recommendations. Scores are analytic models comprised of a variety of weighted variables that can be used to support key operational decisions. Maybe the most familiar score is the FICO score, which combines a variety of weighted metrics about a loan applicant’s financial and credit history into a single value, or score, that lenders use to estimate a borrower’s likelihood of repaying a loan.

For our example, we can start to see two groupings of variables around two potential scores: “Local Economic Potential” and “Local Vitality” (see Figure 12).

Figure 12: Grouping Variables into Potential Scores

Scores are critical components of the “Thinking Like a Data Scientist” process. They are the collaboration point between the business stakeholders and the data science team in developing analytics to support the decisions and the key business initiative. Scores support the key operational decisions that the business stakeholders make in support of the targeted business initiative.
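A minimal sketch of how a FICO-style score might be computed: normalize each variable to 0-1, take a weighted sum, and rescale onto the score range. The weights and applicant values are illustrative (loosely inspired by FICO’s published category weights); real scoring models are proprietary:

```python
def score(variables: dict, weights: dict,
          lo: int = 300, hi: int = 850) -> float:
    """Map a weighted 0-1 composite of variables onto a score range."""
    composite = sum(weights[name] * value for name, value in variables.items())
    return lo + composite * (hi - lo)

# Illustrative weights and a hypothetical applicant whose variables
# have already been normalized to the 0-1 range.
weights = {"payment_history": 0.35, "utilization": 0.30,
           "history_length": 0.15, "credit_mix": 0.10, "new_credit": 0.10}
applicant = {"payment_history": 0.95, "utilization": 0.80,
             "history_length": 0.60, "credit_mix": 0.70, "new_credit": 0.90}

print(round(score(applicant, weights)))   # prints 752
```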

Step 7:  Identify Recommendations

Step 7 ties everything together: the scores that support the recommendations to the key operational decisions that support the targeted business initiative. The worksheet in this step is best created in collaboration with the business stakeholders (who understand the decisions and can envision the potential recommendations) and the data science team (who understand how to convert the scores into analytic models). See Figure 13.

Figure 13: Linking Decisions to Recommendations to Analytic Scores

Thinking like a Data Scientist Summary

I expect that this process will continue to evolve as we execute more data science projects and collaborate with the business stakeholders to ensure that the data and the data science work is delivering quantifiable and measurable business value.

As they famously say: Watch this space!

Additional Sources:

“Thinking like a Data Scientist Part I: Understanding Where to Start”

“Thinking like a Data Scientist Part II: Predicting Business Performance”

“Thinking like a Data Scientist Part III: The Role of Scores”

“Thinking like a Data Scientist Part IV: Attitudinal Approach”

“The ‘Thinking’ Part of ‘Thinking like a Data Scientist’”

“Data Science: Identifying Variables that Might Be Better Predictors”

The post Refined Thinking like a Data Scientist Series appeared first on InFocus Blog | Dell EMC Services.

More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”, is responsible for setting strategy and defining the Big Data service offerings for Dell EMC’s Big Data Practice. As a CTO within Dell EMC’s 2,000+ person consulting organization, he works with organizations to identify where and how to start their big data journeys. He’s written white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power an organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow where he teaches the “Big Data MBA” course. Bill also recently completed a research paper on “Determining The Economic Value of Data”. Onalytica recently ranked Bill as the #4 Big Data Influencer worldwide. Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored the Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements. Bill serves on the City of San Jose’s Technology Innovation Board, and on the faculties of The Data Warehouse Institute and Strata. Previously, Bill was vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications. Bill holds a Master of Business Administration from the University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.
