Welcome!

Blog Feed Post

Data Journalism at The Guardian

UK newspaper, The Guardian, has done some pioneering work to use data, and to engage readers in exploring data to share their own insights. The paper’s Simon Rogers and Google’s Kathryn Hurley shared some of the lessons at Strata this morning.

Rough notes follow.

Not going to talk about big projects like riots and Wikileaks and MP’s expenses… Going to talk about the day-to-day process of hacking around with data.

Open data journalism – more than just Google spreadsheets. Much more of a two-way process than simply writing and disseminating stories.

Numbers need context. Journalists need the skills to interpret, probe, and tell a data-backed story.

First rule of what we do – find the key data behind a story and make it public. Guardian Datablog and Data Store used to push out data relevant to the main news stories of the week.

Lots of data is available, but it’s locked up in a wide range of data sets. A lot of the Data team’s work is involved with pulling freely available data together in one place – making it comparable and useful.

Get past raw numbers, and show how they have changed over time. Measures and units and groupings change, so how do you actually compare like with like?

Don’t always just rely upon the algorithm… Need the knowledge and the question-asking capabilities to wonder whether or not the result is too good to be true. Often, it will be wrong.

Olympics… lots of data, but very little was open. IOC sold the data, and refused to allow it to be shared.

Kathryn Hurley at Google… spent the last week working directly with the Guardian team… Learned…

  • News drives the stories
  • Data journalism moves fast
  • Quick and easy tools reign supreme

What does this mean for other businesses?

  • Know what matters
  • Find the data to back it up (internally, from government, from public data sites, from data markets, etc)
  • Clean the data (a lot! Sometimes just normalisation, sometimes more serious)
    • plugging tools like Google Refine
  • Sometimes the data you have isn’t enough – find more
  • Tell the story – visualisation matters, interactivity helps
  • Sharing the data to support your story – make it available for download, or offer an api

Tools need to get easier to use and richer, to let data journalists (and others) get the results they need more quickly, and with less coding.

Published data needs to be more logically formatted… PDFs derived from printed documents are designed for human reading, not for machine processing.

Image of The Guardian‘s offices by Flickr user Mark Hillary.

Read the original blog entry...

More Stories By Paul Miller

Paul Miller works at the interface between the worlds of Cloud Computing and the Semantic Web, providing the insights that enable you to exploit the next wave as we approach the World Wide Database.

He blogs at www.cloudofdata.com.

Latest Stories
DX World EXPO, LLC., a Lighthouse Point, Florida-based startup trade show producer and the creator of "DXWorldEXPO® - Digital Transformation Conference & Expo" has announced its executive management team. The team is headed by Levent Selamoglu, who has been named CEO. "Now is the time for a truly global DX event, to bring together the leading minds from the technology world in a conversation about Digital Transformation," he said in making the announcement.
"We've been engaging with a lot of customers including Panasonic, we've been involved with Cisco and now we're working with the U.S. government - the Department of Homeland Security," explained Peter Jung, Chief Product Officer at Pulzze Systems, in this SYS-CON.tv interview at @ThingsExpo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We're here to tell the world about our cloud-scale infrastructure that we have at Juniper combined with the world-class security that we put into the cloud," explained Lisa Guess, VP of Systems Engineering at Juniper Networks, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"Peak 10 is a hybrid infrastructure provider across the nation. We are in the thick of things when it comes to hybrid IT," explained , Chief Technology Officer at Peak 10, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"I will be talking about ChatOps and ChatOps as a way to solve some problems in the DevOps space," explained Himanshu Chhetri, CTO of Addteq, in this SYS-CON.tv interview at @DevOpsSummit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We are focused on SAP running in the clouds, to make this super easy because we believe in the tremendous value of those powerful worlds - SAP and the cloud," explained Frank Stienhans, CTO of Ocean9, Inc., in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
While the focus and objectives of IoT initiatives are many and diverse, they all share a few common attributes, and one of those is the network. Commonly, that network includes the Internet, over which there isn't any real control for performance and availability. Or is there? The current state of the art for Big Data analytics, as applied to network telemetry, offers new opportunities for improving and assuring operational integrity. In his session at @ThingsExpo, Jim Frey, Vice President of S...
"As we've gone out into the public cloud we've seen that over time we may have lost a few things - we've lost control, we've given up cost to a certain extent, and then security, flexibility," explained Steve Conner, VP of Sales at Cloudistics,in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
Your homes and cars can be automated and self-serviced. Why can't your storage? From simply asking questions to analyze and troubleshoot your infrastructure, to provisioning storage with snapshots, recovery and replication, your wildest sci-fi dream has come true. In his session at @DevOpsSummit at 20th Cloud Expo, Dan Florea, Director of Product Management at Tintri, provided a ChatOps demo where you can talk to your storage and manage it from anywhere, through Slack and similar services with...
The financial services market is one of the most data-driven industries in the world, yet it’s bogged down by legacy CPU technologies that simply can’t keep up with the task of querying and visualizing billions of records. In his session at 20th Cloud Expo, Karthik Lalithraj, a Principal Solutions Architect at Kinetica, discussed how the advent of advanced in-database analytics on the GPU makes it possible to run sophisticated data science workloads on the same database that is housing the rich...
"At the keynote this morning we spoke about the value proposition of Nutanix, of having a DevOps culture and a mindset, and the business outcomes of achieving agility and scale, which everybody here is trying to accomplish," noted Mark Lavi, DevOps Solution Architect at Nutanix, in this SYS-CON.tv interview at @DevOpsSummit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We are an IT services solution provider and we sell software to support those solutions. Our focus and key areas are around security, enterprise monitoring, and continuous delivery optimization," noted John Balsavage, President of A&I Solutions, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
"We want to show that our solution is far less expensive with a much better total cost of ownership so we announced several key features. One is called geo-distributed erasure coding, another is support for KVM and we introduced a new capability called Multi-Part," explained Tim Desai, Senior Product Marketing Manager at Hitachi Data Systems, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
There is a huge demand for responsive, real-time mobile and web experiences, but current architectural patterns do not easily accommodate applications that respond to events in real time. Common solutions using message queues or HTTP long-polling quickly lead to resiliency, scalability and development velocity challenges. In his session at 21st Cloud Expo, Ryland Degnan, a Senior Software Engineer on the Netflix Edge Platform team, will discuss how by leveraging a reactive stream-based protocol,...
DevOps at Cloud Expo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 21st Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to w...