It never hurt to keep something of a cumulative record of one's activities when investigating any phenomenon, including secondary analyses.
In the case of the work produced in the lab of Qian Zhang, I have been trying to understand their work, and what appears to have gone wrong with their reporting, for some time. Unbeknownst to me at the time in 2014, I was already encountering one of the lab's papers when by the luck of the draw I was asked to review a manuscript that I would later find was coauthored by Zhang. As I have previously noticed, that paper had a lot of problems and I recommended as constructively as I could that the paper not be published. It was published anyway.
More explicitly, I found a weapons priming article published in Personality and Individual Differences at the start of 2016. It was an empirical study and one that fit the inclusion criteria for a meta-analysis that I was working on at the time. However, I ran into some really odd statistical reporting, leaving me unsure as to what I should use to estimate an effect size. So I sent what I thought was a very polite email to the corresponding author and heard nothing. After a lot of head-scratching, I figured out a way to extract effect size estimates that I felt semi-comfortable with. In essence the authors had no main effect for weapon primes on aggressive thoughts - and it showed in the effect size estimate and confidence intervals. That study really had a minimal impact on the overall mean effect size for weapon primes on aggressive cognitive outcomes in my meta-analysis. I ran analyses and later re-ran analyses and went on with my life.
I probably saw a tweet by Joe Hilgard who was reporting some oddities in another Zhang et al paper sometime in the spring of 2018. That got me wondering what all I was missing. I made a few notes, bookmarked what I needed to bookmark, and came back to that question a bit later in the summer of 2018 when I had a bit of time and breathing room. By this point I could comb through the usual archives, EBSCO databases, ResearchGate, and Google Scholar, and was able to hone in on a fairly small set of English-language empirical articles coauthored by Qian Zhang of Southwest University. I saved all the PDF files, and did something that I am unsure if anyone had done already: I ran the articles through statcheck. With one exception at the time, all the papers I ran through statcheck that had the necessary elements reported (test stat value, p-value, degrees of freedom) showed serious decision errors. In other words, the conclusions the authors were drawing in these articles were patently false based on what they had reported. I was also able to document that the reported degrees of freedom were inconsistent within articles, and often much smaller than the reported sample sizes. There were some very strange tables in many of these articles that presumably reported means and standard deviations but looked more like poorly constructed ANOVA summary tables.
I first began tweeting about what I was finding in mid-to-late September 2018. I think between some conversations via Twitter and email, I at least was convinced that I had spotted something odd, and that my conclusions so far as they went were accurate. Joe Hilgard was especially helpful in confirming what I had found, and then going well beyond that. Someone else honed in on inaccuracies in the reporting of the number of reaction time trials reported in this body of articles. So that went on throughout the fall of 2018. By this juncture, there were a few folks tweeting and retweeting about this lab's troubling body of work, some of these issues were documented by individuals in PubPeer, and editors were being contacted, with varying degrees of success.
By spring of this year, the first corrections were published - one in Youth and Society and a corrigendum in Personality and Individual Differences. To what extent those corrections can be trusted is still an open question. At that point, I began blogging my findings and concerns here, in addition to the occasion tweet.
This summer, a new batch of errata were made public concerning articles published in journals hosted by a publisher called Scientific Research. Needless to say, once I became aware of these errata, I downloaded those and examined them. That has consumed a lot of space on this blog since. As you are now well aware, these errata themselves require errata.
I think I have been clear about my motivation throughout. Something looked wrong. I used some tools now at my disposal to test my hunch and found that my hunch appeared to be correct. I then communicated with others who are stakeholders in aggression research, as we depend on the accuracy of the work of our fellow researchers in order to get to as close an approximation of the truth as is humanly possible. At the end of the day, that is the bottom line - to be able to trust that the results in front of me are a close approximation of the truth. If they are not, then something has to be done. If authors won't cooperate, maybe editors will. If editors don't cooperate, then there is always a bit of public agitation to try to shake things up. In a sense, maybe my role in this unfolding series of events is to have started a conversation by documenting what I could about some articles that appeared to be problematic. If the published record is made more accurate - however that must occur - I will be satisfied with the small part I was able to play in the process. Data sleuthing, and the follow-up work required in the process, is time-consuming and really cannot be done alone.
One other thing to note - I have only searched for English-language articles published by Qian Zhang's lab. I do not read or speak Mandarin, so I may well be missing out on a number of potentially problematic articles in Chinese-language psychological journals. If someone who does know of such articles wishes to contact me please do. I leave my DM open on Twitter for a reason. I would especially be curious to know if there are any duplicate publications of data that we are not detecting.
As noted before, how all this landed on my radar was really just the luck of the draw. A simple peer review roughly five years ago, and a weird weapons priming article that I read almost four years ago were what set these events in motion. Maybe I would have noticed something was off regardless. After all, this lab's work is in my particular wheelhouse. Maybe I would not have. Hard to say. All water under the bridge now. What is left is what I suspect will be a collective effort to get these articles properly corrected or retracted.
No comments:
Post a Comment