How (not) to deal with missing data: An economist’s take on a controversial study

Link: https://retractionwatch.com/2024/02/21/how-not-to-deal-with-missing-data-an-economists-take-on-a-controversial-study/

Excerpt:

I was reminded of this student’s clever ploy when Frederik Joelving, a journalist with Retraction Watch, recently contacted me about a published paper written by two prominent economists, Almas Heshmati and Mike Tsionas, on green innovations in 27 countries during the years 1990 through 2018. Joelving had been contacted by a PhD student who had been working with the same data used by Heshmati and Tsionas. The student knew the data in the article had large gaps and was “dumbstruck” by the paper’s assertion these data came from a “balanced panel.” Panel data are cross-sectional data for, say, individuals, businesses, or countries at different points in time. A “balanced panel” has complete cross-section data at every point in time; an unbalanced panel has missing observations. This student knew firsthand there were lots of missing observations in these data.
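
To make the balanced/unbalanced distinction concrete, here is a minimal Python sketch of a balance check. The function names and the toy rows are hypothetical and invented for illustration; they are not taken from the paper or its data.

```python
from itertools import product

def missing_cells(rows, units, periods):
    """Return (unit, period) pairs that have no observed value.

    rows: iterable of (unit, period, value); a value of None counts as missing.
    """
    observed = {(u, t) for u, t, v in rows if v is not None}
    return [cell for cell in product(units, periods) if cell not in observed]

def is_balanced(rows, units, periods):
    """A panel is balanced when every unit is observed in every period."""
    return not missing_cells(rows, units, periods)

# Hypothetical toy data: two countries, three years, one gap.
rows = [
    ("A", 1990, 1.2), ("A", 1991, 1.4), ("A", 1992, 1.5),
    ("B", 1990, 0.7), ("B", 1991, None), ("B", 1992, 0.9),
]
units, periods = ["A", "B"], [1990, 1991, 1992]
gaps = missing_cells(rows, units, periods)
print(is_balanced(rows, units, periods), gaps)
```

On the toy data this prints False and the single gap ('B', 1991). A check of this kind, run separately for each variable, is one way to see how far a data set is from the balanced panel a paper describes.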

The student contacted Heshmati and eventually obtained spreadsheets of the data he had used in the paper. Heshmati acknowledged that, although he and his coauthor had not mentioned this fact in the paper, the data had gaps. He revealed in an email that these gaps had been filled by using Excel’s autofill function: “We used (forward and) backward trend imputations to replace the few missing unit values….using 2, 3, or 4 observed units before or after the missing units.”  

That statement is striking for two reasons. First, far from being a “few” missing values, nearly 2,000 observations for the 19 variables that appear in their paper are missing (13% of the data set). Second, the flexibility of using two, three, or four adjacent values is concerning. Joelving played around with Excel’s autofill function and found that changing the number of adjacent units had a large effect on the estimates of missing values.

Joelving also found that Excel’s autofill function sometimes generated negative values, which were, in theory, impossible for some data. For example, Korea is missing R&Dinv (green R&D investments) data for 1990-1998. Heshmati and Tsionas used Excel’s autofill with three years of data (1999, 2000, and 2001) to create data for the nine missing years. The imputed values for 1990-1996 were negative, so the authors set these equal to the positive 1997 value.
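
The sensitivity described above can be illustrated with a small Python sketch. It assumes that trend autofill behaves like a least-squares straight line fitted to the selected cells and extended backward; that is an assumption about the tool, not a description of the authors' exact spreadsheet steps. The series below is invented for illustration and is NOT the R&Dinv data for Korea.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def backcast(years, values, k, target_years):
    """Extend a line fitted to the first k observations back to target_years."""
    slope, intercept = linear_fit(years[:k], values[:k])
    return {t: slope * t + intercept for t in target_years}

# Hypothetical series (not the paper's data): observed 1999-2002, missing 1990-1998.
years = [1999, 2000, 2001, 2002]
values = [10.0, 14.0, 19.0, 21.0]
missing = list(range(1990, 1999))

for k in (2, 3, 4):
    est = backcast(years, values, k, missing)
    negatives = [t for t, v in est.items() if v < 0]
    print(k, round(est[1998], 2), round(est[1990], 2), negatives)
```

With these made-up numbers the 1990 estimate moves between roughly -24 and -31 depending on whether two, three, or four observed years are used, and every choice yields negative values for the earliest years. A rising series extrapolated backward far enough will always cross zero, which is why a quantity that cannot be negative is a poor fit for this kind of imputation.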

Author(s): Gary Smith

Publication Date: 21 Feb 2024

Publication Site: Retraction Watch

Exclusive: Elsevier to retract paper by economist who failed to disclose data tinkering

Link: https://retractionwatch.com/2024/02/22/exclusive-elsevier-to-retract-paper-by-economist-who-failed-to-disclose-data-tinkering/

Excerpt:

A paper on green innovation that drew sharp rebuke for using questionable and undisclosed methods to replace missing data will be retracted, its publisher told Retraction Watch.

Previous work by one of the authors, a professor of economics in Sweden, is also facing scrutiny, according to another publisher. 

As we reported earlier this month, Almas Heshmati of Jönköping University mended a dataset full of gaps by liberally applying Excel’s autofill function and copying data between countries – operations other experts described as “horrendous” and “beyond concern.”

Heshmati and his coauthor, Mike Tsionas, a professor of economics at Lancaster University in the UK who died recently, made no mention of missing data or how they dealt with them in their 2023 article, “Green innovations and patents in OECD countries.” Instead, the paper gave the impression of a complete dataset. One economist argued in a guest post on our site that there was “no justification” for such lack of disclosure.

Elsevier, in whose Journal of Cleaner Production the study appeared, moved quickly on the new information. A spokesperson for the publisher told us yesterday: “We have investigated the paper and can confirm that it will be retracted.”

Author(s): Frederik Joelving

Publication Date: 22 Feb 2024

Publication Site: Retraction Watch

[109] Data Falsificada (Part 1): “Clusterfake”

Link: https://datacolada.org/109

Excerpt:

Two summers ago, we published a post (Colada 98) about a study reported within a famous article on dishonesty. That study was a field experiment conducted at an auto insurance company (The Hartford). It was supervised by Dan Ariely, and it contains data that were fabricated. We don’t know for sure who fabricated those data, but we know for sure that none of Ariely’s co-authors – Shu, Gino, Mazar, or Bazerman – did it [1]. The paper has since been retracted.

That auto insurance field experiment was Study 3 in the paper.

It turns out that Study 1’s data were also tampered with…but by a different person.

That’s right:
Two different people independently faked data for two different studies in a paper about dishonesty.

The paper’s three studies allegedly show that people are less likely to act dishonestly when they sign an honesty pledge at the top of a form rather than at the bottom of a form. Study 1 was run at the University of North Carolina (UNC) in 2010. Gino, who was a professor at UNC prior to joining Harvard in 2010, was the only author involved in the data collection and analysis of Study 1 [2].

Author(s): Uri Simonsohn, Leif Nelson, and Joseph Simmons

Publication Date: 17 Jun 2023

Publication Site: Data Colada

Underdispersion in the reported Covid-19 case and death numbers may suggest data manipulations

Link: https://www.medrxiv.org/content/10.1101/2022.02.11.22270841v1

doi: https://doi.org/10.1101/2022.02.11.22270841

Abstract:

We suggest a statistical test for underdispersion in the reported Covid-19 case and death numbers, compared to the variance expected under the Poisson distribution. Screening all countries in the World Health Organization (WHO) dataset for evidence of underdispersion yields 21 countries with statistically significant underdispersion. Most of the countries in this list are known, based on the excess mortality data, to strongly undercount Covid deaths. We argue that Poisson underdispersion provides a simple and useful test to detect reporting anomalies and highlight unreliable data.
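
As a rough illustration of what a Poisson underdispersion check can look like, the Python sketch below uses the classical index-of-dispersion statistic with a Wilson-Hilferty normal approximation to the chi-square distribution. This is a generic textbook approach and not necessarily the procedure used in the preprint; the function names and both series of counts are invented for illustration.

```python
from statistics import NormalDist, mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson counts."""
    return variance(counts) / mean(counts)

def underdispersion_p_value(counts):
    """Approximate lower-tail p-value for H0: counts are i.i.d. Poisson.

    The statistic (n - 1) * s^2 / mean is roughly chi-square with n - 1
    degrees of freedom under H0. The chi-square CDF is approximated with
    the Wilson-Hilferty cube-root transformation.
    """
    df = len(counts) - 1
    ratio = dispersion_index(counts)  # equals statistic / df
    z = (ratio ** (1 / 3) - (1 - 2 / (9 * df))) / (2 / (9 * df)) ** 0.5
    return NormalDist().cdf(z)

# Hypothetical daily counts, invented for illustration only.
too_smooth = [796, 799, 795, 798, 792, 797, 794, 799, 793, 796]
plausible = [780, 820, 760, 835, 799, 770, 810, 790, 845, 750]

for name, series in (("too_smooth", too_smooth), ("plausible", plausible)):
    print(name, dispersion_index(series), underdispersion_p_value(series))
```

A small p-value means the counts vary less than independent Poisson counts with a constant rate would. The approximation is crude far out in the tail, so very small p-values should be read as orders of magnitude only. A real trend in the data inflates the variance, which makes this simple version conservative as a screen for underdispersion.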

Author(s): Dmitry Kobak

Publication Date: 13 Feb 2022

Publication Site: medRxiv

Are some countries faking their covid-19 death counts?

Link: https://www.economist.com/graphic-detail/2022/02/25/are-some-countries-faking-their-covid-19-death-counts

Excerpt:

Irregular statistical variation has proven a powerful forensic tool for detecting possible fraud in academic research, accounting statements and election tallies. Now similar techniques are helping to find a new subgenre of faked numbers: covid-19 death tolls.

That is the conclusion of a new study to be published in Significance, a statistics magazine, by the researcher Dmitry Kobak. Mr Kobak has a penchant for such studies—he previously demonstrated fraud in Russian elections based on anomalous tallies from polling stations. His latest study examines how reported death tolls vary over time. He finds that this variance is suspiciously low in a clutch of countries—almost exclusively those without a functioning democracy or a free press.

Mr Kobak uses a test based on the “Poisson distribution”. This is named after a French statistician who first noticed that when modelling certain kinds of counts, such as the number of people who enter a railway station in an hour, the distribution takes on a specific shape with one mathematically pleasing property: the mean of the distribution is equal to its variance.
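
The mean-equals-variance property can be checked numerically. The Python sketch below sums the Poisson probability mass function directly; the rate of 40 and the truncation point are arbitrary illustrative choices, not figures from the article.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_mean_and_variance(lam, upper):
    """Mean and variance from summing the pmf over 0..upper (a truncated sum)."""
    probs = [poisson_pmf(k, lam) for k in range(upper + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

# lam = 40 is an arbitrary illustrative rate, e.g. arrivals per hour.
m, v = poisson_mean_and_variance(40.0, 400)
print(round(m, 6), round(v, 6))
```

Both numbers come out at the chosen rate, up to rounding. That equality is the yardstick the test relies on: counts whose variance sits far below their mean are hard to reconcile with a Poisson process.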

This idea can be useful in modelling the number of covid deaths, but requires one extension. Unlike a typical Poisson process, the number of people who die of covid can be correlated from one day to the next—superspreader events, for example, lead to spikes in deaths. As a result, the distribution of deaths should be what statisticians call “overdispersed”—the variance should be greater than the mean. Jonas Schöley, a demographer not involved with Mr Kobak’s research, says he has never in his career encountered death tallies that would fail this test.

….

The Russian numbers offer an example of abnormal neatness. In August 2021 daily death tallies went no lower than 746 and no higher than 799. Russia’s invariant numbers continued into the first week of September, ranging from 792 to 799. A back-of-the-envelope calculation shows that such a low-variation week would occur by chance once every 2,747 years.
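
One way to set up a back-of-the-envelope calculation of this kind is sketched below in Python. It takes the 792-to-799 range from the excerpt and assumes independent Poisson daily counts with a rate at the midpoint of that range. Both the independence and the rate are assumptions made here; the excerpt does not say how the 2,747-year figure was derived, so this sketch should not be expected to reproduce it.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def prob_all_days_in_range(lam, low, high, days):
    """P(each of `days` independent Poisson(lam) counts lies in [low, high])."""
    p_one_day = sum(poisson_pmf(k, lam) for k in range(low, high + 1))
    return p_one_day ** days

# Range from the excerpt; the rate is an ASSUMPTION (midpoint of the range).
lam = 795.5
p_day = prob_all_days_in_range(lam, 792, 799, 1)
p_week = prob_all_days_in_range(lam, 792, 799, 7)
print(p_day, p_week)
```

Under these assumptions a single day lands in the eight-value window only about one time in nine, so seven such days in a row is very unlikely. Allowing for overdispersion, as the article says real death counts show, would make such a week rarer still.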

Publication Date: 25 Feb 2022

Publication Site: The Economist

F.B.I. Investigating Whether Cuomo Aides Gave False Data on Nursing Homes

Link: https://www.nytimes.com/2021/03/19/nyregion/cuomo-nursing-homes-covid.html

Excerpt:

The state revealed the full count — which added thousands of additional deaths — only in January, after a report by the state attorney general suggested an undercount, and after a state court ordered the data be made public in response to a lawsuit filed by the Empire Center, a conservative think tank. As of this month, New York has recorded the deaths of more than 15,000 nursing home residents with Covid-19.

Melissa DeRosa, Mr. Cuomo’s top aide, tried to explain why the administration had withheld the data last year to state lawmakers in a conference call, saying she and others “froze” because of the federal request for data, which came in late August as the governor faced criticism over nursing homes.

But more than two months earlier, in June, Ms. DeRosa and other aides removed such data from a report prepared by the Health Department, an investigation by The New York Times found.

Author(s): J. David Goodman, Nicole Hong and Luis Ferré-Sadurní

Publication Date: 19 March 2021

Publication Site: New York Times

Federal probe into nursing home COVID-19 death coverup circles closer to Cuomo

Link: https://nypost.com/2021/03/19/federal-probe-into-nursing-home-covid-death-coverup-gets-closer-to-cuomo/

Excerpt:

Investigators have contacted lawyers for Cuomo’s aides, interviewed senior state Health Department officials and subpoenaed the governor’s office for documents relating to the alleged data coverup, the sources said.

The New York Times first reported the probe earlier Friday, citing four unnamed sources with knowledge of the investigation.

Health officials are being grilled about nursing home-related COVID-19 case and death data the state submitted last year to the Justice Department, sources told The Post.

Author(s): Larry Celona

Publication Date: 19 March 2021

Publication Site: NY Post

Cuomo Administration Questioned CDC Official About Covid-19 Nursing-Home Death Data

Link: https://www.wsj.com/articles/cuomo-administration-questioned-u-s-health-officials-about-nursing-home-data-11615408672

Excerpt:

When New York Gov. Andrew Cuomo’s administration learned last year that the federal government was about to release data on Covid-19 deaths in nursing homes, state officials were concerned: Would the federal numbers tell the public a different story than the state’s own?

…..
No other state requested details about the data release, according to the official. The federal health officials on the call described the scope of the data, how it was collected and how it compared with the state’s data.

When it was released, the federal data included fatalities inside both nursing homes and hospitals. But the federal tally of 3,525 deaths was lower than the state’s total of 5,944 because nursing homes weren’t required to report deaths to the federal government from March and April, the deadliest period for the state in the pandemic.

Author(s): Joe Palazzolo, Jimmy Vielkind

Publication Date: 10 March 2021

Publication Site: Wall Street Journal

Why Cuomo Cooked the Books on Nursing-Home Deaths

Link: https://www.nationalreview.com/2021/03/why-cuomo-cooked-the-books-on-nursing-home-deaths/

Excerpt:

What DeRosa told lawmakers had them aghast. Not only had Cuomo misled them; he had, in DeRosa’s telling, done it in order to keep relevant information hidden from U.S. investigators. If the latter were true, Cuomo administration officials could well be guilty of federal-obstruction and false-statements crimes. In other words, so shameful was their actual reason for covering up nursing-home deaths — namely, to make a wayward governor look like a fantasy hero — that Cuomo administration officials figured it was better to be seen as potentially felonious than to admit their crude political motivation.

As the New York Times reported on Thursday night, in the spring of 2020, DeRosa and other members of Cuomo’s inner circle, who have no public-health background, studiously purged the nursing-home death data from a report compiled by state health officials. The Justice Department was not eyeing them at the time. That happened months later, in August, when the feds began seeking information about the treatment of, and record-keeping about, COVID-stricken nursing-home residents by New York and three other states.

So what was going on at the time of the purge? Well — whaddya know! — it turns out that was just when Cuomo was quietly securing the state ethics approvals that would permit him to earn outside income from a book he’d decided to write. The book would inform the world about his unparalleled mastery of the COVID crisis — which, oddly enough, he contemplated as a work of nonfiction.

Author(s): Andrew McCarthy

Publication Date: 6 March 2021

Publication Site: National Review