Measuring Editorial Quality for Digital Journalism (Beyond Performance Metrics)

“Quality journalism” can mean different things, depending on a publication’s audience, business objectives, industry focus, and other factors. Photo: Getty Images.

Granite Media was born out of the desire to create a new kind of digital media company, one that could find financial stability and growth in today’s increasingly programmatic media marketplace while still publishing high-quality reads. As the company passes its second birthday, we have much to celebrate, having built a model that supports four thriving sites (Work + Money, Far & Wide, FamilyMinded, and Stadium Talk), publishing daily original stories in finance, travel, family and sports.

Measuring the performance of stories is part of the DNA of any modern media company, and like many, we keep a close watch on the vital statistics of how each story performs in the digital ecosystem, tallying various metrics of how both readers and advertisers are interacting with each story. In fact, we’ve built a custom data warehouse, internally called ‘Jupiter’, that we use to collect and manage the real-time financial and statistical data that helps us run our business.

Granite aims to do even more. Yes, we need to measure how “well” a story does, but we also want to measure how “good” a story is. One of our founding principles is to promote high-quality journalism to our audiences. We believe that quality storytelling will be rewarded by higher reader satisfaction and engagement. That’s good for business, and it makes us like the work that we do.

But how do we know if a story we publish is a “quality read”? And what do we really mean by “quality,” anyway? Can we list the criteria, and should they be the same for various kinds of stories? We are well aware that not all stories that go viral are what most would consider “quality journalism,” and likewise, not all the stories we consider among our best work will attract a mass-market readership.

What gets measured gets done

At Granite, we’ve now worked with hundreds of professional authors, both staff and freelance writers. The diverse array of writing styles and experience that these writers have brought was an impetus for laying down guidelines and criteria for what we consider quality writing for our publications. Defining quality writing is, at its heart, a subjective task, but we thought that by breaking it down into component parts, we could better orient our company towards a common goal. In the spirit of “what gets measured gets done,” we hoped that this effort would allow us to keep our focus on improving the kind of journalism we most value. We wanted to devise a way for “graders” to assess our stories and give us actionable data about our journalistic quality.

Granted, this endeavor might not be as clear-cut as measuring bounce rate, time on story, CPM rates or SEO distribution, but we felt we could identify key elements of what editorial excellence means to us at Granite and measure the degree to which they are present in a given story. The first challenge was to articulate what kinds of stories we were trying to tell, and by extension not tell.

The Granite approach to storytelling is geared toward informing, entertaining, stoking curiosity about the world, and connecting people with their passions. A Granite story can be ‘great’ by doing only one of those things brilliantly, or by serving multiple purposes at once. Since there is no one formula for a great story, we started by assessing how well a writer delivered against the specific assignment that they were given. In other words, “For this topic, and for what the story was trying to achieve, how well did the author do?”

We began by identifying the editorial elements, the component parts of a story, that we most valued as an editorial team. We also discussed previous attempts to define and grade editorial quality and how they could help us think about our own challenge, which was to develop a grading rubric robust enough to capture the essence of what we mean by quality, but also simple enough to be practical in a business setting. “Quality journalism” can mean different things, depending on a publication’s audience, business objectives, industry focus, and other factors. Publications don’t exist in a vacuum; they aim to serve their chosen audience and business model. An academic journal is very different from a humor website, an investigative human rights news desk is very different from a travel magazine, and so on.

A rubric to fit our vision

As an editorial team, we tested our initial rubric on a two-week period of stories and found wide variability among the graders, which told us we needed to further hone and describe what we wanted to measure. The first draft of the rubric, and thus the results, suffered from being too general in scope. We simplified the rubric, reducing the number of overall categories and making the remaining list more specific and focused. A second staff grading with the more focused rubric showed tighter grouping in the results (in other words, we agreed on which stories got high vs. low marks). We opted for a 5-star grading system for each criterion since it captures some granularity but keeps the grading experience fast and simple.
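
One way to put a number on "variability among graders" is to look at how far apart the scores for each story are. The sketch below is only an illustration: the article does not say how agreement was measured, so the use of standard deviation, the function names, and the sample scores are all assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List


def grader_spread(scores_by_story: Dict[str, List[float]]) -> Dict[str, float]:
    """Population standard deviation of the graders' scores for each story."""
    return {story: pstdev(scores) for story, scores in scores_by_story.items()}


def average_spread(scores_by_story: Dict[str, List[float]]) -> float:
    """Mean of the per-story spreads; lower means the graders agree more."""
    return mean(list(grader_spread(scores_by_story).values()))


# Made-up scores from three hypothetical graders, for illustration only.
sample = {
    "story_a": [4, 4, 5],  # graders largely agree
    "story_b": [1, 3, 5],  # graders disagree widely
}
spreads = grader_spread(sample)
```

A drop in the average spread between two grading rounds would be one sign that a revised rubric is being read more consistently.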

The current rubric consists of the following seven categories, our editorial values:

  • Organization/ Framing, measuring how well the story framework carries the reader along and gives the article a sense of flow.

  • Content/ Development, measuring how well a story uses supporting details and insight to develop its theme.

  • Presentation/ Art, measuring how well the use of visuals adds to the meaning of the story and enhances its storytelling.

  • Sourcing/ Reporting/ Data, measuring the depth and diversity of sources and/or data to support a story’s claims, including level of attribution and original reporting.

  • Voice/ Style/ Narrative, measuring how well a story uses style or voice to enhance its force, create momentum, and deepen the reading experience.

  • Emotional Experience/ Surprise/ Joy, measuring how well the story stokes curiosity in the topic, invokes an emotional response, and is able to move the reader in some way.

  • Grammar/ Mechanics/ Accuracy, measuring the impact of errors (spelling, grammatical or factual) in a story.

This rubric is intended to judge how well each story serves the general reader, within the context of the work Granite publishes, and, as such, may or may not be applicable to other publications. In addition, Granite publishes various kinds of stories that may lend themselves in different ways to the criteria above. For instance, a pure list-based story may lack a traditional sense of narrative development, but the choice and order of the list items hopefully will create an internal rhythm and sense of progress that satisfies the story’s aim. Likewise, a first-person essay might not contain the same sourcing structure that a reporting-based article may contain, but the author may convey authority over the story and introduce evidence and outside sources that enhance the story.

The five-point scale is designed to cast the middle grade (3) as a relatively neutral grade, or in other words, the assessment of that category is OK but neither great nor bad. The grades lower than that middle score indicate problems that detract from the overall quality of the story. Likewise, the grades over the middle indicate the story has characteristics that show strong or outstanding quality. 

Recognizing that factual and grammatical errors detract from a story’s quality but an error-free story should be merely the baseline for professional work, the last category, Grammar/ Mechanics/ Accuracy, has only two scores: a negative one (1) for factual or grammatical mistakes, and a neutral one (3) for a story that is free of errors. 

The first category, “Organization/ Framing,” is the starting point because it assesses how well an article sets up and follows its own internal logic, and it establishes the context in which the other elements will be seen and assessed. The penultimate category, “Emotional Experience/ Surprise/ Joy,” gets to the heart of the editorial experience we are trying to create, in essence: does this story work, does it move you, are you glad you read it? For that reason, we decided this category should carry double the weight of each of the other categories.
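
The scoring rules described above can be sketched in a few lines of Python. The article does not say how the seven grades are combined into one number, so the weighted mean here is an assumption, as are the short category keys and the function name `story_score`. What comes from the rubric is the 1-to-5 scale, the double weight on Emotional Experience, and the restriction of Grammar/ Mechanics/ Accuracy to a 1 or a 3.

```python
from typing import Dict

# Short keys are hypothetical stand-ins for the seven rubric categories.
WEIGHTS: Dict[str, float] = {
    "organization": 1.0,
    "content": 1.0,
    "presentation": 1.0,
    "sourcing": 1.0,
    "voice": 1.0,
    "emotional_experience": 2.0,  # weighted double, per the rubric
    "grammar": 1.0,
}


def story_score(ratings: Dict[str, int]) -> float:
    """Weighted mean of one grader's ratings (assumed aggregation method)."""
    if set(ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover exactly the seven rubric categories")
    for category, stars in ratings.items():
        # Grammar has only a negative (1) and a neutral (3) score.
        allowed = (1, 3) if category == "grammar" else (1, 2, 3, 4, 5)
        if stars not in allowed:
            raise ValueError(f"{category}: {stars} is not an allowed score")
    total = sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS)
    return total / sum(WEIGHTS.values())


# A story that is neutral everywhere except a top mark for emotional experience.
example = {c: 3 for c in WEIGHTS}
example["emotional_experience"] = 5
print(story_score(example))  # 3.5
```

Because the emotional category counts twice, a single top mark there lifts an otherwise neutral story from 3.0 to 3.5, where the same mark in any other category would lift it to 3.25.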

A striking correlation

One of the clear benefits of establishing the rubric and working to better articulate our editorial values has been that it has given us a common vocabulary and point of reference at the company. This has eased communication between departments and has helped the editorial team hone its vision and brainstorming. 

There is of course the practical challenge of putting this rubric into operational practice. Grading every story is time-consuming and potentially expensive. Writers can’t grade themselves objectively, and asking staff and/or third-party consultants to grade every story would take too much time and budget. So instead, we have proceeded by sampling stories, for instance taking a two-week block, and grading those as a way to track our general progress over time. On a day-to-day basis, we rely on the senior editors’ judgement and try to find other (easier) metrics as proxies.

One interesting discovery of these early tests was a striking correlation between a subjectively reviewed “high quality” story, meaning a story that earned consistently high marks from graders, and the “time-on-story” performance metric. No other objective metric (such as bounce rate or revenue per story view) correlated as well with a “quality read.” We ran a simple linear regression for each performance metric on 40 stories to see how well it accounted for differences in the subjective grades, then compared the metrics by their R-squared values. In statistics, a high R-squared means that the metric goes a long way towards explaining or predicting the quality grade (though the measure has plenty of limitations). Here are the results:

Granite Media internal study based on 40 stories published in July 2019. Values based on simple linear regressions.

Granite Media internal study based on 40 stories published in July 2019. Resulting R-squared value for average time on story to the subjective quality grades was 64.50%.
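
For readers who want to see what an R-squared figure like this represents, here is a minimal sketch of a simple linear regression of quality grade on one performance metric. It is not Granite's actual analysis code: the function name is hypothetical and the data points are invented toy values, not the study's data.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R-squared of the least-squares line predicting y from x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain some variation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: the variation the fitted line fails to explain.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_res / syy


# Toy values only: minutes on story (x) and average quality grade (y).
time_on_story = [1.0, 2.0, 3.0]
quality_grade = [1.0, 3.0, 2.0]
print(r_squared(time_on_story, quality_grade))  # 0.25
```

An R-squared of 0.25 means the fitted line accounts for a quarter of the variation in the grades. Running the same function once per metric gives the values to compare.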

Writer Matthew Crawford has said, “Attention is a resource — a person has only so much of it,” and it is clear that attention economics was very much in play throughout our study. The stories our graders determined were the best, those with consistent scores of fours and fives, were also the stories that our audience spent the most time with. The correlation made sense to us instinctively, but it was terrific to see the numbers match our assumption. As a result, we have begun to think of the “time on story” data as a rough shorthand for “quality.” Granted, this finding is preliminary and the shorthand may be premature, but it is a powerful thread that bears following up on and sussing out.

Our intention is to grow this project to include third-party graders, combining the judgement of people inside and outside the company, to bring the picture of how well we are doing, our successes and failures, into sharper and more precise focus. As we continue to build Granite, we believe that testing and measuring quality, in essence holding ourselves to our promise to produce great storytelling, is the path to becoming a strong and sustainable media company.