Modern Data Literacy Guide
Published:
What is data journalism?
Data journalism is a field of journalism and media studies that uses datasets, figures, and graphics to show readers trends, populations, and more, so that readers can “see” the view the data implies and, crucially, draw their own conclusions from the datasets and interpretations the publisher provides.
How does data affect our daily lives?
And how does this relate to attribution?
Well, glass half full: it mostly doesn’t. Most media outlets and journals do not overwhelm the public with graphics, trends, and graphs, largely because those don’t test well with audiences! News programs, financial media, weather reports, and political news, however, often display simplified statistics drawn from outside sources. Those sources may be “official” government datasets, in which case the information is very likely to be good. Unofficial sources are the wild west. They could be unattributed, profit-driven and industry-funded, propaganda, partisan, and/or deceptive. This is the key distinction we are not taught to see in modern data journalism. Attribution, naming the “official” data sources, models, and statistics behind a claim, is one key component of that distinction.
But what about my intuition that official sources could be biased?
It’s not that they have to be biased. Teams that give equal weight and time to diverse opinions tend, statistically, to reach better outcomes. Hearing multiple opinions is at the core of mutualism because of specialization in labor markets; hearing everyone’s opinion on the team leads to stronger organizations and more responsive teams.
What about industry-influenced journalism is inherently bad?
Industry-influenced media is different from industry opinions in media. In economic news, in lawsuits involving private companies or corporate conglomerates, or whenever media and industry celebrities hand canned statistics to journalists, industry opinions, guidance, models, and statistics are bound to find their way into coverage. So why is industry influence a bad thing?
It’s not just that it usually leads to bias and misinformation; it’s that there is a history of experts making claims without attributing the data or figures behind them (attribution being a core component of liability), even when those figures influence readers’ opinions, sometimes in dramatic ways.
But journalism has changed. Anyone with a phone can be a journalist. Who do I trust?
That’s true; there’s more independent media than ever. Trust is a you issue. Who do you trust? Do you notice that you trust certain media sources over others? Do you dislike certain media outlets, or repeatedly dismiss all of “their” arguments as lies or propaganda? Well, that’s a good thing in a way, because it means you’re already critical of government and media. That’s absolutely the first step.
A good rule of thumb is to listen for mentions of data sources, or to look for them written at the bottom of data graphics or articles as references. This tells you who provided the information behind a graphic and gives you a basis for judging its quality and validity.
What are primary sources?
Primary sources form the backbone of quality data journalism, but it’s not a term you hear regularly outside academia. Primary sources may be government agencies, academics, non-profits, or even industry! But how does this differ from the low-quality, biased, or unattributed sources mentioned earlier? Well, for one, the list includes government and non-profit sources. More importantly, primary sources come with attribution, which preserves the chain of custody of the data and analysis back to an official source on the issue, such as a government agency or a reputable non-profit. Reputable is the key word there, of course, but again, reputable media attribute the sources of the data behind their interpretations to a professional standard.
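To make the idea of a chain of custody concrete, here is a minimal Python sketch that models each published figure as a record naming the source it credits, then follows those credits upstream. The `Claim` class, the `trace_to_primary` function, and every publisher name below are hypothetical, invented for illustration; in practice, attribution lives in footnotes and captions, not in code. Where the chain ends is either a primary source or the point at which attribution was dropped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """Hypothetical record of a published figure and the source it credits."""
    publisher: str
    cites: Optional["Claim"] = None  # upstream source named in the attribution, if any


def trace_to_primary(claim: Claim) -> list[str]:
    """Follow attributions upstream and return every publisher along the way.

    The last entry is where the chain stops: ideally a primary source,
    otherwise the point where attribution was dropped.
    """
    chain = [claim.publisher]
    while claim.cites is not None:
        claim = claim.cites
        chain.append(claim.publisher)
    return chain


# Hypothetical example chains.
agency = Claim("Government statistics agency (primary source)")
wire = Claim("News wire article", cites=agency)
repost = Claim("Social media chart", cites=wire)
viral = Claim("Viral chart with no source line")

print(" -> ".join(trace_to_primary(repost)))
print(" -> ".join(trace_to_primary(viral)))
```

The second chain stops immediately: with no attribution, there is nothing upstream for a reader to check.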
What is data literacy?
Data literacy is a skill everyone should have! Not kidding or exaggerating. Data is used in media and official communications to inform. News media and government organizations want the public to know key statistics, trends, and interpretations drawn from official sources, including the statistics used to measure key indicators of our country’s financial health, economic bottlenecks, new industry trends or market changes, and environmental and agricultural conditions. Well, what is an indicator? An indicator may be a key market statistic, such as GDP per capita. Not all GDP figures are per capita; sometimes the number reported is simply a country’s total GDP. Or it may be an indicator of ocean acidification, algal bloom risk, or temperature swings that could threaten fisheries, such as those found in NOAA forecasts. With this in mind, there is a whole world of information about things that can, and often do, affect us.
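The difference between total GDP and GDP per capita is simple arithmetic: per-capita GDP divides a country’s total GDP by its population. The minimal Python sketch below uses two hypothetical countries with invented figures to show how the two indicators can rank the same countries in opposite orders.

```python
def gdp_per_capita(gdp_usd: float, population: float) -> float:
    """GDP per capita = total GDP / population."""
    return gdp_usd / population


# Invented figures for two hypothetical countries (not real data).
countries = {
    "Country A": {"gdp_usd": 2_000e9, "population": 200e6},
    "Country B": {"gdp_usd": 500e9, "population": 10e6},
}

by_total = sorted(countries, key=lambda c: countries[c]["gdp_usd"], reverse=True)
by_per_capita = sorted(
    countries,
    key=lambda c: gdp_per_capita(countries[c]["gdp_usd"], countries[c]["population"]),
    reverse=True,
)

for name, d in countries.items():
    print(f"{name}: total GDP ${d['gdp_usd'] / 1e9:,.0f}B, "
          f"per capita ${gdp_per_capita(d['gdp_usd'], d['population']):,.0f}")
print("Ranked by total GDP:     ", by_total)
print("Ranked by GDP per capita:", by_per_capita)
```

Country A has the larger economy, but Country B produces five times as much per person, so a headline that says “GDP” without specifying which figure can point readers in opposite directions.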
The key is learning how to interpret it.
How should you interpret line graphs?
Line graphs, in this definition, include bar charts showing trends, scatterplots with fitted lines, boxplots or plots showing a moving average, or nearly any 2D graph that captures a trend. Important components are error bars, a description of what those error bars mean (for example, a standard deviation, a standard error, or a 95% confidence interval), and a description of the model fit, if any.
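Because an error bar can mean several different things, the same data can produce very different-looking bars. The Python sketch below, using a handful of invented readings, computes a simple trailing moving average and three common error-bar half-widths: the standard deviation (SD), the standard error of the mean (SE), and an approximate 95% confidence interval. The function names are illustrative, and the 1.96 multiplier is a normal approximation assumed here for simplicity; small samples are usually better served by a t-distribution multiplier.

```python
import math
from statistics import fmean, stdev


def moving_average(values: list[float], window: int) -> list[float]:
    """Trailing moving average; the first window - 1 points have no average."""
    return [fmean(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]


def error_bar_half_widths(sample: list[float]) -> dict[str, float]:
    """Three common meanings of an error bar, computed from the same sample."""
    sd = stdev(sample)                # spread of individual observations
    se = sd / math.sqrt(len(sample))  # uncertainty in the sample mean
    return {
        "SD": sd,
        "SE": se,
        "95% CI (normal approx.)": 1.96 * se,  # assumed normal approximation
    }


readings = [12, 15, 11, 14, 18, 16, 13, 17]  # invented data for illustration

print("3-point moving average:", [round(x, 2) for x in moving_average(readings, 3)])
print(f"mean = {fmean(readings):.2f}")
for label, half_width in error_bar_half_widths(readings).items():
    print(f"  error bar as {label}: ± {half_width:.2f}")
```

For these eight readings, the SD bar is √8 ≈ 2.8 times wider than the SE bar, which is why a caption that doesn’t say which one is shown leaves the reader guessing.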
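Line graphs should display the trend accurately, but they are often biased. For example, one may exaggerate the difference between two points, and thus the trend, with oddly “scaled” axes. Consider 400 and 600: 600 is 50% more than 400. On a y-axis starting at zero, the 600 point sits 1.5 times as high as the 400 point; on a truncated y-axis running only from 400 to 600, the 400 point sits on the floor of the chart and the 600 point at the very top, so the same 50% increase looks enormous.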
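The arithmetic behind the truncated-axis trick is easy to check. In the minimal Python sketch below, the helper `apparent_ratio` (a hypothetical name) measures bar heights from wherever the y-axis starts, so it reports how many times taller the 600 bar looks than the 400 bar for different axis floors, while the true ratio stays at 1.5.

```python
import math


def apparent_ratio(low: float, high: float, axis_min: float = 0.0) -> float:
    """How many times taller the `high` bar looks than the `low` bar
    when bars are drawn from axis_min instead of from zero."""
    if low < axis_min:
        raise ValueError("the lower value sits below the axis floor and would not be drawn")
    low_height = low - axis_min
    if low_height == 0:
        return math.inf  # the smaller bar disappears entirely
    return (high - axis_min) / low_height


print(f"600 is {600 / 400 - 1:.0%} more than 400")
for floor in (0, 200, 300, 350, 390):
    print(f"y-axis starting at {floor}: the 600 bar looks {apparent_ratio(400, 600, floor):.1f}x as tall")
print("y-axis starting at 400:", apparent_ratio(400, 600, 400))
```

Nothing about the data changes between lines of output; only the axis floor moves, and the apparent gap grows from 1.5x to 21x and finally to infinity.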
How should you interpret scatterplots?
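Scatterplots often show a trend, a possible correlation, and a sense of the distribution of paired points in a 2D, x-y coordinate system. The key statistic is usually the coefficient of determination, R^2 (or adjusted R^2, which penalizes a model for extra predictors), or the square of Pearson’s correlation coefficient, r^2, so check which one is reported and how it was formulated. Both typically fall on a scale of 0 to 1 and describe how much of the variation in y the fitted line accounts for, that is, how small the residuals are relative to the overall spread of the data. A value near 1, such as 0.99, suggests a high “correlation.”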
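For a straight line fitted by ordinary least squares, these statistics are linked: R^2 = 1 - SS_res / SS_tot, and with a single predictor it equals the square of Pearson’s r. The Python sketch below computes both from scratch on invented data, plus the standard adjusted R^2 formula, 1 - (1 - R^2)(n - 1)/(n - p - 1), where p is the number of predictors. The function names are illustrative, not taken from any particular library.

```python
from statistics import fmean


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient, between -1 and 1."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def r_squared(xs: list[float], ys: list[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot for an ordinary least-squares line y = a + b*x."""
    mx, my = fmean(xs), fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # leftover scatter (residuals)
    ss_tot = sum((y - my) ** 2 for y in ys)                       # total scatter around the mean
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Penalizes R^2 for the number of predictors p; can fall below 0."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]  # invented, roughly linear data

r = pearson_r(xs, ys)
r2 = r_squared(xs, ys)
print(f"Pearson r = {r:.4f}, r^2 = {r ** 2:.4f}")
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adjusted_r_squared(r2, n=len(xs), p=1):.4f}")
```

Note that r itself can be negative for a downward trend; squaring it discards the direction, so an r^2 near 1 says the points hug a line but not which way it slopes.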
How should you interpret statistics?
Statistics are the trickiest of the three data or graphical elements discussed here to evaluate carefully, whether in models or in graphical presentations of data in media or academic journals. A statistic is any single number computed from data, be it a simple count or a more advanced formulated metric. Statistics, the field, concerns the design and interpretation of such numbers so that they inform researchers about the population under study.
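To illustrate the range from “a simple count” to “a more advanced formulated metric,” the Python sketch below computes three statistics from the same small, invented sample of household incomes. It is only an illustration: the point is that different statistics summarize the same data differently, so it matters which one a report chose.

```python
from statistics import fmean, median

# Invented sample of household incomes (USD); one very high earner is included.
incomes = [32_000, 35_000, 38_000, 41_000, 45_000, 48_000, 52_000, 950_000]

count = len(incomes)      # simplest statistic: how many observations
average = fmean(incomes)  # arithmetic mean, pulled upward by the outlier
middle = median(incomes)  # middle value, barely affected by the outlier

print(f"n = {count}")
print(f"mean income   = ${average:,.0f}")
print(f"median income = ${middle:,.0f}")
```

Seven of the eight households earn less than the mean of $155,125, while the median of $43,000 sits near a typical household; a headline could truthfully quote either number.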