Using Machine Learning to Stop Fake News

Given all the brilliant things that are happening today with machine learning and artificial intelligence, I just don’t understand why “fake news” is still an issue. I think the solution is right in front of us; that is, if social media networks are really serious about addressing this problem.

Facebook is one of the biggest culprits in tolerating fake news, and that probably has a lot to do with the “economics of social engagement.” An article titled “Future of Social Media” summarizes the challenge nicely:

“While it’s great that everyone and her brother has access to create content online, offering a more diverse and thriving online market, this also generates stronger competition for your content to break through the clutter and be seen.

In fact, there will be a time in which the amount of content internet users can consume will be outweighed by the amount of content produced. Schaefer calls this “Content Shock” which, unfortunately, is uneconomical.”

Figure 1 shows the area of “Content Shock,” when the ability to create content outstrips the ability for humans to consume it.

Figure 1: Economics of Content and “Content Shock”

The article recommends that you “create content that will stand out” in order to draw attention and create engagement. Well, nothing draws attention and creates engagement like “fake news.” Here are some examples of fake news articles and the number of Facebook engagements each one drove[1]:

  • “Pope Francis shocks world, endorses Donald Trump for president” – 960,000 Facebook engagements
  • “WikiLeaks confirms Hillary sold weapons to ISIS … Then drops another bombshell” – 789,000 Facebook engagements
  • “FBI agent suspected in Hillary email leaks found dead in apartment murder-suicide” – 567,000 Facebook engagements

That’s an awful lot of Facebook engagements with news that isn’t true, but the “news” certainly does “stand out” in the crowded content space and it certainly does drive engagement.

Solving the Fake News Problem

So assuming that the social media networks truly are motivated to solve the “fake news” problem, here is how I would do it.

  • Step 1: Leverage crowdsourcing to flag potential fake news articles. Social media networks could create a “Fake News” button that flags potential fake news, like Yahoo Mail does today to flag potential spam (see Figure 2).
Figure 2:  Flagging Potential Email Spam in Yahoo Mail

  • Step 2: Human reviewers would need to review the flagged “Fake News” articles to determine which ones are fake and which are not. The reviewers could even add metadata that captures information such as “degree of fakeness” (i.e., is it an outright lie or just a slight twisting of the facts) and “severity of fakeness” (e.g., fake news about a celebrity isn’t nearly as severe as fake news about a political candidate. Heck, there are certain celebrities whose fame seems to be based entirely upon fake news… the Kardashians?).
  • Step 3: Apply Supervised Machine Learning algorithms against the flagged and reviewed articles to find (quantify) correlations and predictors (i.e., combinations of words, phrases and topics) of “fake news” outcomes. Then use the resulting “fake news” models on new articles to score each article’s “level of fakeness.” Remember, Supervised Machine Learning algorithms identify and quantify relationships between potential predictive variables and known outcomes (e.g., spam, fraudulent transaction, machine failure, web click, purchase transaction) gathered from historical (training) data sets, and then apply the resulting models to new data sets.
  • Step 4: Create “Reader Credibility Scores” to rank the credibility of people flagging fake news articles. It is critical to create reader credibility scores (think FICO score or Uber driver and passenger scores) to measure the integrity of folks who are flagging potential fake news (as well as those who are promoting fake news). That will help to identify “trolls[2]” who are just trying to perpetuate the fake stories or cast doubt on real news.
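
To make Step 3 a little more concrete, here is a minimal sketch of a supervised “fakeness” scorer. It is only an illustration under stated assumptions: the six training headlines are made-up toy data standing in for articles that human reviewers have already labeled, the class name NaiveBayesFakeness is hypothetical, and Naive Bayes is just one simple supervised technique (the same family long used for spam filtering), not necessarily what a social network would choose.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesFakeness:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, headlines, labels):
        # Count how often each word appears in "fake" vs. "real" training texts
        self.word_counts = {"fake": Counter(), "real": Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(headlines, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts["fake"]) | set(self.word_counts["real"])
        return self

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        total = sum(counts.values()) + len(self.vocab)
        # Start from the log prior: share of training texts with this label
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokens:
            if tok in self.vocab:  # ignore words never seen in training
                score += math.log((counts[tok] + 1) / total)
        return score

    def fakeness(self, text):
        """Return the model's estimate of P(fake | text), between 0 and 1."""
        tokens = tokenize(text)
        log_fake = self._log_score(tokens, "fake")
        log_real = self._log_score(tokens, "real")
        # Normalize the two log scores into a probability
        m = max(log_fake, log_real)
        p_fake = math.exp(log_fake - m)
        p_real = math.exp(log_real - m)
        return p_fake / (p_fake + p_real)


# Hypothetical, hand-written training data (labels would come from Step 2)
training = [
    ("shocking bombshell celebrity secretly endorses candidate", "fake"),
    ("you won't believe this shocking secret bombshell", "fake"),
    ("agent found dead after shocking leak", "fake"),
    ("city council approves budget for road repairs", "real"),
    ("central bank holds interest rates steady", "real"),
    ("council publishes annual budget report", "real"),
]
model = NaiveBayesFakeness().fit(*zip(*training))

print(model.fakeness("shocking bombshell secret"))
print(model.fakeness("council budget report"))
```

With this toy data, the first headline scores well above 0.5 and the second well below it. A real system would need far more labeled articles and richer predictors (phrases, topics, source) than single words.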

Amazon already supports the flagging of potential “Trolls” and “fake reviews” in their customer reviews (see Figure 3).

Figure 3:  Flagging Fake Reviews
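
One simple way to sketch the reader credibility score from Step 4 is as the share of a reader's past flags that human reviewers later confirmed as fake, smoothed so that a brand-new reader starts at a neutral value. The function names, the neutral starting value of 0.5, and the prior weight of 10 are all assumptions chosen for illustration, not a description of how any network scores its users.

```python
def reader_credibility(confirmed_flags, rejected_flags,
                       prior_mean=0.5, prior_weight=10):
    """Smoothed share of a reader's flags that reviewers confirmed as fake.

    prior_mean and prior_weight are hypothetical tuning values: a reader
    with no history gets prior_mean, and the score moves toward the
    reader's actual track record as more flags are reviewed.
    """
    total = confirmed_flags + rejected_flags
    return (confirmed_flags + prior_mean * prior_weight) / (total + prior_weight)


def weighted_flag_score(flagger_scores):
    """Weigh each flag on an article by the credibility of the reader who raised it."""
    return sum(flagger_scores)


new_reader = reader_credibility(0, 0)        # no history yet
reliable_reader = reader_credibility(45, 5)  # 45 of 50 flags confirmed
likely_troll = reader_credibility(2, 48)     # 2 of 50 flags confirmed

print(new_reader, reliable_reader, likely_troll)
print(weighted_flag_score([reliable_reader, reliable_reader]))
print(weighted_flag_score([likely_troll, likely_troll, likely_troll]))
```

Summing credibility instead of counting raw flags means two flags from reliable readers outweigh three flags from a likely troll, which is the point of scoring the flaggers in the first place.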

  • Step 5: Create “Publisher Credibility Scores” that measure the credibility and reliability of each publisher or source of the article. This score would be composed of the results of the fakeness analysis (how many fake articles that publisher is responsible for) but could also include other variables such as the number of employees working for the publisher and tenure in the business (e.g., Wall Street Journal has around 3,600 employees and has been publishing since 1851 versus Liberty Writers News which has 2 employees and has been publishing since only 2015). Heck, there is even a Wikipedia page “List of fake news websites” that lists known fake news sites, such as Liberty Writers News, American News, Disclose TV, Drudgereport.com and World Truth TV.
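
As a rough sketch of how the Step 5 variables might be blended into one number, the function below combines a smoothed fake-article rate with tenure and staff size. Everything specific here is an assumption made for illustration: the function name, the weights (0.7, 0.2, 0.1), the 50-year tenure cap, the log scaling of employee counts, and the 2017 reference year. The two example publishers are hypothetical; their staffing and founding years only mirror the scale of the examples above, and their article counts are invented.

```python
import math


def publisher_credibility(fake_articles, total_articles, employees, founded_year,
                          as_of_year=2017, weights=(0.7, 0.2, 0.1)):
    """Blend fake-article rate, tenure and staff size into a 0-to-1 score.

    All weights and scaling constants are hypothetical illustration values.
    """
    # Add-one smoothing keeps tiny article counts from producing rates of 0 or 1
    fake_rate = (fake_articles + 1) / (total_articles + 2)
    # Tenure saturates at 50 years in business
    tenure = min((as_of_year - founded_year) / 50, 1.0)
    # Staff size on a log scale, saturating near 10,000 employees
    size = min(math.log10(employees + 1) / 4, 1.0)
    w_fake, w_tenure, w_size = weights
    return w_fake * (1 - fake_rate) + w_tenure * tenure + w_size * size


# Hypothetical publishers with made-up article counts
publisher_a = publisher_credibility(fake_articles=0, total_articles=1000,
                                    employees=3600, founded_year=1851)
publisher_b = publisher_credibility(fake_articles=40, total_articles=50,
                                    employees=2, founded_year=2015)

print(publisher_a, publisher_b)
```

The fakeness results carry most of the weight in this sketch because size and tenure alone say little about whether a given publisher's articles are true.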

Freedom of Speech and Type I/Type II Errors

Machine Learning could certainly help to mitigate and flag fake news, but probably cannot and should not even try to eliminate it entirely. Why? It’s the First Amendment of the Constitution and it’s called Freedom of Speech.

One important consideration as social media organizations look to squelch fake news is to not violate Freedom of Speech. So instead of an outright deletion of questionable publications (other than for pornographic, libel or hate crime reasons), it might be better for the social media sites to use some sort of “Degrees of Truth” indicator that could accompany each publication or article. These indicators might look something like Figure 4.

Figure 4:  Degrees of Truthfulness Indicators

The cost to society of blocking potentially valid news (a false positive, where a real article is wrongly flagged as fake) greatly outweighs the cost of letting a few fake news articles get published (a false negative, where a fake article slips through). So one will need to err on the side of allowing some level of fake news to ensure that one is not blocking real (though maybe controversial) news. See my blog “Understanding Type I and Type II Errors” to learn more about the potential costs and liabilities associated with Type I and Type II errors.
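
The idea of erring on the side of allowing some fake news can be expressed as a decision threshold. If wrongly blocking a real article is judged to cost more than missing a fake one, then an article should only be acted on when the model's fakeness probability is high enough that the expected cost of acting is lower than the expected cost of letting it through. The sketch below assumes hypothetical cost values (blocking real news counted as 9 times worse than missing fake news); the function names and numbers are illustrative only.

```python
def flag_threshold(cost_block_real, cost_miss_fake):
    """Minimum fakeness probability at which acting on an article pays off.

    Acting costs (1 - p) * cost_block_real in expectation; not acting costs
    p * cost_miss_fake. Acting is cheaper only when p exceeds this ratio.
    """
    return cost_block_real / (cost_block_real + cost_miss_fake)


def should_flag(p_fake, cost_block_real, cost_miss_fake):
    """Act on the article only if its fakeness probability clears the threshold."""
    return p_fake > flag_threshold(cost_block_real, cost_miss_fake)


# Hypothetical costs: blocking real news is 9x worse than missing fake news
print(flag_threshold(9, 1))
print(should_flag(0.80, 9, 1))
print(should_flag(0.95, 9, 1))
```

With equal costs the threshold would sit at 0.5. Raising the cost of blocking real news pushes the threshold up, so an article that is “probably fake” at 0.80 is still left alone while one at 0.95 is acted on.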

Machine Learning to End Fake News

Ending Fake News seems like the perfect application of machine learning. Organizations like Yahoo, Google and Microsoft have been using machine learning for years now to catch spam (see the article “Google Says Its AI Catches 99.9 Percent Of Gmail Spam”). And companies like McAfee and Symantec employ machine learning to catch viruses (see the article “Malware Detection with Machine Learning Methods”).

Fake news looks a lot like spam and a virus to me. Should be an easy problem to solve, if one really wants to.

[1] http://www.cnbc.com/2016/12/30/read-all-about-it-the-biggest-fake-news-stories-of-2016.html

[2] A troll is a person who sows discord on the Internet by starting arguments or upsetting people, by posting inflammatory, extraneous, or off-topic messages with the intent of provoking readers into an emotional response or of otherwise disrupting normal, on-topic discussion. https://en.wikipedia.org/wiki/Internet_troll

The post Using Machine Learning to Stop Fake News appeared first on InFocus Blog | Dell EMC Services.

More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business”, is responsible for setting the strategy and defining the Big Data service line offerings and capabilities for the EMC Global Services organization. As part of Bill’s CTO charter, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He has written several white papers, is an avid blogger and is a frequent speaker on the use of Big Data and advanced analytics to power organizations’ key business initiatives. He also teaches the “Big Data MBA” at the University of San Francisco School of Management.

Bill has nearly three decades of experience in data warehousing, BI and analytics. Bill authored EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the Vice President of Advertiser Analytics at Yahoo and the Vice President of Analytic Applications at Business Objects.
