As news people, we tell the truth all the time.
Or do we?!
More and more news stories use data as hard evidence to quantify objective facts, so they look firm, professional and utterly true.
I hate to break it to you, but this is NOT the case! How? Here’s a common example suggested by Dr. Tong Tiejun during the monthly colloquium held by the Data & News Society.
In China, students who want to attend university must take a nationwide entrance exam. Every few years, news pieces appear about how past champions of the exam from different provinces are doing now. The result is always that they are not the best of their generation. A journalist with no knowledge of statistics could easily draw this conclusion from genuine data. But we were wrong!
Here’s why. In the world of statistics, conducting a survey involves many steps, and each one of them might lead to a false result, whether intentionally or unintentionally. In the case we mentioned above, it went wrong from the very beginning: sampling. The exam champions are competing against all the rest of the students who took the exam in the same year, so the odds are certainly against them: a handful of champions is vastly outnumbered by everyone else.
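To get a feel for those odds, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration, not data from any real exam: each student has a hidden "ability", and both the exam score and later achievement are that ability plus independent luck. The function names and cohort sizes are hypothetical.

```python
import random


def champion_is_later_best(n_students, rng):
    """Simulate one cohort; return True if the exam champion also ends up on top later."""
    best_exam = best_later = float("-inf")
    champion = top_achiever = -1
    for i in range(n_students):
        ability = rng.gauss(0, 1)          # hidden trait (assumed model)
        exam = ability + rng.gauss(0, 1)   # exam score = ability + luck on the day
        later = ability + rng.gauss(0, 1)  # later achievement = ability + other luck
        if exam > best_exam:
            best_exam, champion = exam, i
        if later > best_later:
            best_later, top_achiever = later, i
    return champion == top_achiever


def simulate(n_cohorts=300, n_students=1000, seed=0):
    """Share of simulated cohorts in which the champion is also the top later achiever."""
    rng = random.Random(seed)
    hits = sum(champion_is_later_best(n_students, rng) for _ in range(n_cohorts))
    return hits / n_cohorts


share = simulate()
print(f"Champion was also the later best in {share:.1%} of simulated cohorts")
```

Under these assumptions the champion is still a strong student, yet in most simulated cohorts someone else from the huge remaining group ends up on top. That is a matter of numbers, not proof that champions "fail".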
As Darrell Huff stated in his book, How to Lie with Statistics (1954), “The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify. Statistical methods and statistical terms are necessary in reporting the mass data of social and economic trends, business conditions, opinion polls, the census. But without writers who use the words with honesty and understanding and readers who know what they mean, the result can only be semantic nonsense.”
“So, are you calling me a liar now?” Absolutely not! As a matter of fact, the data or survey behind a story can be entirely genuine; it’s just a matter of presentation. To avoid common misunderstandings, we first have to get to know some common ways of misleading.
Besides the sampling errors above, there are many more types of errors:
Coverage Error: This arises from the discrepancy between the target population and the population actually sampled from the sampling frame. The survey result will be invalid if the discrepancy is large.
Non-response Error: Non-respondents often differ systematically from respondents. Such biases are difficult to eliminate since the precise reasons for non-response are unknown. If the reasons for non-response are correlated with some unmeasured background variables, completely ignoring the missing data may lead to seriously biased results.
Interviewer Error: Interviewers may misinterpret questions because of ambiguous definitions. In addition, friendly interviewers may obtain a higher response rate.
Respondent Error: Untruthful answers, memory errors, wrong answers caused by misunderstanding the questions, etc.
Instrumental Error: Words in questionnaires may be ambiguous or have more than one meaning. This kind of inconsistent measurement is difficult to avoid completely.
Data Collection Error: Different data collection methods place very different demands on respondents and hence affect how they respond, especially when the questions deal with abstract ideas or sensitive issues.
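Non-response error in particular is easy to underestimate, so here is a small sketch of it. The numbers are invented for illustration: I assume that exactly half of a population would answer "yes" to some question, but that "yes" people are twice as likely to return the questionnaire. The function name and all parameters are hypothetical.

```python
import random


def observed_yes_share(n_people=100_000, true_share=0.5,
                       p_respond_yes=0.6, p_respond_no=0.3, seed=1):
    """Share of 'yes' among people who actually respond (all rates are assumed)."""
    rng = random.Random(seed)
    yes_responses = 0
    total_responses = 0
    for _ in range(n_people):
        says_yes = rng.random() < true_share
        # Response probability depends on the answer itself: this is the bias.
        p_respond = p_respond_yes if says_yes else p_respond_no
        if rng.random() < p_respond:
            total_responses += 1
            yes_responses += says_yes
    return yes_responses / total_responses


observed = observed_yes_share()
print(f"True share of 'yes': 50.0%, share among respondents: {observed:.1%}")
```

With these assumed rates the respondents' "yes" share settles near two thirds (0.5 × 0.6 divided by 0.5 × 0.6 + 0.5 × 0.3), even though every single answer is honest and the true share is one half.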
And there’s one more error we run into constantly nowadays; I like to call it Visual Error.
Years back, as Huff states in his book, bar charts with accurate data were often turned into something more eye-catching in order to draw readers’ attention, like one sack of money vs. two sacks of money. This gives a distorted impression, but readers want to know the truth, so bar charts are back in the game, only now with heavy make-up.
We like 3D images now; statisticians do not, because the vanishing point in a 3D image can create illusions. In the one shown below, there is no doubt that item A is about the same as item C, right?
So let’s take another angle and look from above. Now we get a real idea of the different proportions of items A and C: A is actually more than twice as large as C. This trick is often used in marketing, for example by the best-known tech genius and marketing mastermind, Steve Jobs. (See here: http://initiumlab.com/blog/20160316-eight-myths-about-data-journalism/)
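Why can perspective hide a difference that large? A rough way to see it is the simple pinhole-camera rule that apparent size shrinks in proportion to distance. This is only an assumed toy model with made-up heights and distances, not a reconstruction of how any particular chart was drawn.

```python
def apparent_height(true_height, distance, focal_length=1.0):
    """Pinhole projection: on-screen height shrinks as distance grows (assumed model)."""
    return focal_length * true_height / distance


# Hypothetical values: item A is twice as tall as item C, but sits twice as far back.
item_a = apparent_height(true_height=2.0, distance=4.0)
item_c = apparent_height(true_height=1.0, distance=2.0)

print(f"A looks {item_a:.2f} tall, C looks {item_c:.2f} tall")
```

In this toy setup the two items look identical on screen, although one is really double the other. A flat chart viewed head-on keeps every item at the same distance, so the problem disappears.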
So take it from here: if you are a reader, be alert to the data presented in the news, and think before you take it in.
If you are a news guy/gal, equip yourself with enough statistical knowledge and analyse the data fairly and truthfully.
(Or you can always ask a pro, of course.)