|By William Schmarzo||
|January 15, 2017 10:00 AM EST||
The U.S. Presidential election is finally over. The protests are winding down, they’ve stopped burning cars in Oakland (for now), and the talk of California secession is waning. But I am struggling to return to “normal” because in this election, truth got hammered.
Many candidates treated opinions as “truth,” and a large portion of the American public grabbed hold of these “truths” as gospel. It may have been a good time to be in the “fact checking” business, but I’m not sure how effective even the fact checkers could be given the spontaneous nature of the “opinions as facts” being thrown around, not to mention the people who create fake news intentionally.
So let’s play a game! Let’s call this game “Separate the Truth from the Myths.” Let’s see how you do.
- Bat Boy Sighted in NYC Subway (probably too expensive to get a condo in Manhattan)
- Obama Appoints Martian Ambassador (but the Senate will request Matt Damon since he’s already lived and farmed on Mars)
- Skynet is a Reality (Hey, even Iron Man showed up at the Senate to tell them so!)
- Ted Cruz Shot JFK (okay, so it actually was his dad, but accusing Ted Cruz is funnier)
All but one of these stories appeared in the highly credible “National Enquirer” or “Weekly World News.” That’s like buying a copy of “Mad Magazine” (for you old timers) or reading “The Onion” (for you young whippersnappers) and expecting the “truth” from these satirical publications (see Figure 1).
Figure 1: Real Headlines from “Less Than Credible” Sources
However, the stories below in Figure 2 were plastered across social media sites as if they were the truth, and as the engagement numbers show, lots of people took the time to read these “truths.”
Figure 2: Social Media Fake News and Number of Views
Data Science And Common Sense
As data scientists, we need to know better than to accept the “truth” without applying some common sense. For all the fancy training in neural networks, artificial intelligence and machine learning, it’s hard to replace “common sense” as a necessary data scientist characteristic. Let’s walk through an example of how a data scientist might approach one of the sensational stories that recently popped up on social media (see Figure 3).
Figure 3: The Guardian, September 26, 2016
OMG, murders are up 10.8% in the biggest percentage increase since 1971, according to a highly credible source like the FBI. It’s become the “Walking Dead” out there!
Sensational headlines grab attention and incite fear and dread. “Dirty Laundry” sells. But the problem with data at the aggregate level is that it:
- Distorts the real truth (or root cause) of the problem, and
- Is not actionable
The above headline could lead to the conclusion that the current criminal and rehabilitation policies have failed and everything should be thrown out. But there are no details as to what aspects of these programs are broken, and no triage of the root causes to explore what might be done to fix the problem. As data scientists, we must demand the granular details so that we can turn the data into insights and make the information actionable, such as:
- Note: The homicide numbers were only available through 2015 since we are still in 2016, but for select cities the numbers are only getting worse in 2016. For example, through November 2016 Chicago had already seen a 56 percent increase, with 251 more murders in 2016 than in 2015. http://www.chicagotribune.com/news/local/breaking/ct-chicago-violence-700-homicides-met-20161201-story.html
- Just ten large cities accounted for 524 additional murders, or ~33% of last year’s nationwide increase of 1,532 murders (https://www.theguardian.com/us-news/2016/sep/30/us-murder-rate-chicago-fbi-data-police). Here is the breakdown from this article:
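The arithmetic behind that kind of breakdown is simple but worth making explicit: sum the city-level deltas and express them as a share of the national increase. The sketch below does exactly that; the per-city figures are illustrative placeholders, not the actual numbers from the Guardian article.

```python
# Hypothetical per-city increases in murders, 2014 -> 2015.
# These values are illustrative only -- the Guardian article
# linked above contains the real city-by-city breakdown.
city_increases = {
    "Chicago": 90,
    "Baltimore": 133,
    "Houston": 61,
    "Milwaukee": 55,
    "Washington DC": 57,
}

nationwide_increase = 1532  # additional murders nationwide in 2015

# Aggregate the city-level deltas and express them as a share of
# the national increase -- the first step in localizing the problem.
subtotal = sum(city_increases.values())
share = subtotal / nationwide_increase
print(f"{subtotal} of {nationwide_increase} additional murders "
      f"({share:.0%}) came from just {len(city_increases)} cities")
```

Even with made-up inputs, the structure of the calculation shows why granularity matters: a handful of cities can dominate a national statistic.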
This is a good starting point. If we want to address the increase in murders, we need to drill into each individual murder (and attempted murder) in those 10 cities. We need to keep drilling into the granular details in order to identify those variables and metrics that might be predictors of murders and attempted murders.
For example, we could identify the specific blocks of these cities where the murders are occurring, or the time of day and day of week, or the time of the year, or any special events that occurred right before the murders, etc. We could explore other variables that might be indicative of an increase in murder (e.g., % of broken homes, % of children born out of wedlock, % of high school dropouts, % of drug addicts, unemployment rate among male adults, increase in graffiti).
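A first-pass screen of candidate variables like these often starts with something as simple as a correlation check against the outcome. Here is a minimal sketch using made-up block-level numbers (the dropout rates and murder counts are purely hypothetical); a real analysis would use many more observations and control for confounders, since correlation alone does not establish a predictor.

```python
import statistics

# Hypothetical block-level data: high school dropout rate (%) and
# murders per year for a handful of city blocks. Illustrative only.
dropout_rate = [5, 12, 18, 25, 31, 40]
murders      = [0,  1,  2,  4,  6,  9]

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length series."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(dropout_rate, murders)
print(f"correlation between dropout rate and murders: r = {r:.2f}")
```

A strong correlation only nominates a variable for deeper validation; it does not prove the variable is predictive, let alone causal.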
Once we know which variables are predictive of murders, we have a focus for fixing the problem: taking corrective actions such as adding more police or community outreach, reducing high school dropouts, increasing drug arrests, testing different programs and approaches, measuring program effectiveness, and learning and improving. Now that’s thinking like a data scientist.
Data Scientist Lessons Learned
What are the lessons that we can take away from this “opinions as facts” syndrome?
- Common sense is critical. Don’t accept “truths” at face value. Demand more details in order to identify and quantify those variables and metrics that might be predictive or indicative of the researched problem.
- You can’t fix the business – or the country – without drilling into the details and the potential causal factors. We need insights that are drawn from facts that are supported by granular data so that we know what actions to take. With these detailed insights in hand, we now know where to invest our scarce financial and human resources.
- Details matter. At the aggregate level, the headlines may be sensational, but they are not insightful or actionable until you get into the details. Remember Simpson’s Paradox.
- Data quality, accuracy and reasonableness are important, especially if you are trying to make business-impactful decisions based upon that data. Business users, if they are expected to use the data to support decisions, must have confidence in the data. “Facts as facts” are critical if we want to overcome decisions made on the traditional basis of gut, hearsay and history.
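Simpson’s Paradox, mentioned above, is easy to demonstrate with a toy example: a trend can hold in every subgroup yet reverse in the aggregate. The numbers below are invented for illustration, but the mechanism is exactly why aggregate crime statistics can mislead.

```python
# Two hypothetical districts over two years. In EACH district the
# murder rate falls, yet the citywide aggregate rate rises, because
# population shifts toward the higher-rate district.
# Format: {year: {district: (murders, population)}}
data = {
    "2014": {"A": (10, 100_000), "B": (90, 100_000)},
    "2015": {"A": (4, 50_000),   "B": (120, 150_000)},
}

def rate(murders, pop):
    """Murder rate per 100,000 residents."""
    return murders / pop * 100_000

for district in ("A", "B"):
    r14 = rate(*data["2014"][district])
    r15 = rate(*data["2015"][district])
    print(f"District {district}: {r14:.0f} -> {r15:.0f} per 100k")

# Citywide aggregate: sum murders and populations across districts.
agg14 = rate(*map(sum, zip(*data["2014"].values())))
agg15 = rate(*map(sum, zip(*data["2015"].values())))
print(f"Citywide:   {agg14:.0f} -> {agg15:.0f} per 100k")
```

Both districts improve (10 → 8 and 90 → 80 per 100k), yet the citywide rate climbs from 50 to 62 per 100k. Only the granular view reveals what is actually happening.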
The good data scientist learns not to trust anything at first blush: while opinions might suggest variables and metrics that could be better predictors of performance, in the end the data scientist needs to validate each of those variables and metrics to quantify whether they really are better predictors of performance.
In the movie “Star Wars: A New Hope,” the weak-minded stormtroopers were easily dissuaded from pursuing the truth about the droids by Obi-Wan Kenobi’s use of the Jedi mind trick to plant the “truth” in their weak minds.
Don’t be weak-minded about seeking the truth. Use your common sense to challenge the “truth,” and get into the granular details so that you can identify and quantify those variables and metrics that are better predictors or indicators of the problem.
And beware the “These aren’t the Droids you’re looking for” syndrome. That’s for the weak-minded.