I don't think I need to re-litigate my reasoning for recommending rejection when I was a peer reviewer on that particular article, nor my disappointment that the article was published anyway. Water under the bridge. What I want to do instead is share some screenshots of the analyses in question and note a few other odds and ends that always bugged me about that particular paper.
I am keeping my focus on Study 2, as that seems to be the most problematic portion of the paper. Keep in mind that 240 children participated in the experiment. One of the burning questions is why the denominator degrees of freedom for so many of the analyses were so low. As the authors provided no descriptive statistics (including n's), it is often difficult to know exactly what happened, but I have a guess. If you follow the Zhang lab's output since near the start of this decade, sample sizes have increased in their published work. My hunch is that the authors copied and pasted text from prior articles and did not adequately update the reported degrees of freedom. The df for the simple effects analyses may actually be correct, but there is no real way of knowing given the lack of descriptive statistics.
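To make the df concern concrete, here is a minimal sketch of the sanity check involved. I am assuming a 2 (Game Type) x 2 (Gender) between-subjects ANOVA with all 240 children included; the paper reports no descriptive statistics that would confirm the cell structure, so the factor levels here are my assumption, not the authors' stated design.

```python
def anova_error_df(n_total: int, n_cells: int) -> int:
    """Denominator (error) df for a fully between-subjects ANOVA:
    total N minus the number of between-subjects cells."""
    return n_total - n_cells

# Assumed 2 x 2 design with N = 240 (my assumption, not confirmed by the paper):
print(anova_error_df(240, 2 * 2))  # 236
```

Denominator df far below a value in this neighborhood would suggest either substantial unexplained exclusions or, as I suspect, df carried over from an earlier, smaller study.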
One problem is the apparently shifting dependent variable (DV). In the first analysis, where the authors attempted to establish a main effect, they used only the mean reaction times (RT) for aggressive words as the DV. In subsequent analyses, they used a mean difference in reaction times (neutral RT minus aggressive RT) as the DV. That alone creates confusion.
So let's start with the main analysis, as I do have a screen shot I used in a tweet a while back:
Let's look at the rest of the results. The Game Type by Gender interaction analyses were, um, a bit unusual.
How about the analyses examining a potential interaction of game type and trait aggressiveness? That doesn't exactly look great:
It also helps to go back and look at the Method section and see how the authors determined how many trials each participant would experience in the experiment:
As I stated previously:
The authors selected 60 goal words for their reaction time task: 30 aggressive and 30 non-aggressive. These goal words were presented individually across four blocks of trials. The authors claim that their participants completed 120 trials in total, when the actual total would appear to be 240 trials. I used fewer trials with adult participants in an experiment I ran a couple of decades ago, and that was a nearly hour-long ordeal for my participants. I can only imagine the heroic level of attention and perseverance required of these children to complete this particular experiment. I also have to wonder whether the authors tested for potential fatigue or practice effects across blocks of trials. Doing so was standard operating procedure in the Aggression Lab at Mizzou back in the 1990s, and those findings would have been reported - at least in a footnote - when submitted for publication.
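The arithmetic here is simple enough to spell out. Taking the Method section at its word - 30 aggressive plus 30 non-aggressive goal words, each presented once per block across four blocks - the trial count works out as follows:

```python
# Trial count implied by the Method section as described:
aggressive_words = 30
neutral_words = 30
blocks = 4

total_trials = (aggressive_words + neutral_words) * blocks
print(total_trials)  # 240, double the 120 trials the authors claim
```

The only way to reach the authors' figure of 120 is if each word appeared in only two of the four blocks, which is not what the Method section describes.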
Finally, I want to say something about the way the authors described the personality measure they used. The authors appeared to be interested in obtaining an overall assessment of aggressiveness, and the Buss & Perry AQ is arguably defensible for such an endeavor. However, the authors have a tendency to repeat the original reliability coefficients reported by Buss and Perry (1992). Given that they only examined overall trait aggressiveness, and given that they presumably had to translate the instrument into Chinese, the authors would have been better served by reporting the reliability coefficient(s) they actually obtained, rather than copying and pasting the same basic statement that appears in other papers published by the Zhang lab. It is not until the General Discussion that the authors even obliquely mention that the instrument was translated, or more specifically recommend an adaptation of the BPAQ for Chinese-speaking and -reading individuals.
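For readers unfamiliar with why sample-specific reliability matters: coefficient alpha is computed directly from the item responses a study actually collects, so it can and does differ across translations and samples. A minimal sketch of the standard computation, run here on made-up toy responses (not the study's data; the numbers are purely illustrative):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents, n_items) score matrix:
    (k / (k - 1)) * (1 - sum of item variances / variance of total scores)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy illustration only: 50 simulated respondents, 4 correlated items
# driven by one latent trait plus noise. Fabricated for demonstration.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
scores = latent + rng.normal(scale=0.8, size=(50, 4))
print(round(cronbach_alpha(scores), 2))
```

The point is that this is a trivial computation once you have the raw item responses; there is no good reason to report a previous sample's alpha from a different language version in its place.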
This was a paper that had so many question marks that it should never have been published in the first place. That it did somehow slip through the peer review system is indeed unfortunate. If the authors are unable or unwilling to make the necessary corrections, it is up to the editorial team at the journal to do so. I hope that they will in due time. I know that I have asked.
If not retracted, any correction would have to report the necessary descriptive statistics upon which the Study 2 analyses were based, as well as the correct inferential statistics: accurate F-tests, df, and p-values. Yes, that means tables would be necessary. That is not a bad thing. The coefficient alphas actually obtained in this specific study for their version of the BPAQ should be reported, instead of simply repeating what Buss and Perry reported for the original English-language version of the instrument in the previous century. The editorial team should insist on examining the original data themselves, so that they can either confirm that any corrections made are indeed correct, or determine that the data set is so hopelessly botched that the reported findings cannot be trusted, necessitating a retraction.
How all this landed on my radar is really just the luck of the draw. I was asked to review a paper in 2014; the topic was in my wheelhouse and I had the time and interest, so I agreed. I recommended rejection, which in hindsight was sound, and I moved on with my life. A couple of years later I read a weapons priming effect paper that was really odd, with reported analyses that were difficult to trust. I didn't make the connection until an ex-coauthor of mine turned up on a paper that appeared to have originated from this lab. At that point I scoured the databases until I could locate every English-language paper published by this lab, and discovered that this specific paper - the one I had recommended rejecting - had been published as well. In the process, I noticed a distinct similarity among all the papers: how they were formatted, the types of analyses, and the types of data analytic errors. I realized pretty quickly that "holy forking shirtballs, this is awful." I honestly don't know whether what I have read in this series of papers amounts to gross incompetence or fraud. I do know that it does not belong in the published record.
To be continued (unfortunately)....
Zheng, J., & Zhang, Q. (2016). Priming effect of computer game violence on children’s aggression levels. Social Behavior and Personality: An International Journal, 44(10), 1747–1759. doi:10.2224/sbp.2016.44.10.1747
Footnote: The lyric comes from the chorus of "German Shepherds" by Wire. Near the end of the last paragraph of this post, I make a reference to some common expressions used in the TV series The Good Place.