Let's focus on Study 1 of Zheng and Zhang (2016). It should have been fairly simple, at least in terms of data reporting. However it happened, the authors homed in on two video games that they believed were equivalent on every potential confound aside from violent content, and merely needed to run a pilot study to demonstrate that they could back that claim up with solid evidence. It should have been a slam dunk.
Not so fast.
The good news is that, unlike Study 2, the analyses of the age data are actually mathematically plausible. That's swell. I noticed that the authors had some Likert-style questions to rate the games on a variety of dimensions, which makes sense. The scaling was reported to be 1 to 5, where 1 meant very low and 5 meant very high on each dimension. My intention was to focus on Tables 1 and 2. If a 1-to-5 Likert scale was used for each of the items rating the games, there were some problems. One glaring problem is that there is no way a mean could exceed 5. And yet, for the Violent Content and Violent Images dimensions, the mean was definitely above 5 in each case. That does not compute. I have no idea what scaling was actually used on the questionnaires. I can perhaps assume a 1-to-7 Likert scale; doing so would at least bring some means and standard deviations that seemed mathematically impossible back within the realm of plausibility. But there is no way to know. We do not have the data. We do not have any of the materials or protocols. We have to take everything on faith. I had intended to include a set of images of SPRITE analyses on Tables 1 and 2, but didn't see the point.
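For anyone who wants to run the same sanity checks, here is a minimal sketch in Python. It only covers the simplest constraints (a mean has to sit within the scale bounds, and bounded scores put a hard ceiling on the standard deviation for any given mean); a full SPRITE analysis goes further by searching for integer response distributions that could reproduce a reported mean and SD. The numbers in the example are made up for illustration and are not the paper's reported values.

```python
import math

def mean_is_possible(mean, lo=1, hi=5):
    """A reported mean of bounded (e.g., Likert) responses must fall within
    the scale bounds; anything outside them is mathematically impossible."""
    return lo <= mean <= hi

def max_possible_sd(mean, lo=1, hi=5):
    """Upper bound on the SD of scores bounded in [lo, hi] with a given mean,
    via the Bhatia-Davis inequality: variance <= (hi - mean) * (mean - lo)."""
    return math.sqrt(max((hi - mean) * (mean - lo), 0.0))

# Hypothetical example values, purely for illustration.
for label, m, sd in [("Violent Content", 5.4, 1.1), ("Humor", 2.1, 0.8)]:
    print(f"{label}: mean feasible on 1-5 scale? {mean_is_possible(m)}; "
          f"on 1-7 scale? {mean_is_possible(m, 1, 7)}; "
          f"max SD on 1-5 given that mean: {max_possible_sd(m):.2f} "
          f"(reported {sd})")
```

Nothing sophisticated there, but that is the point: a mean above 5 on a 1-to-5 item fails the very first check, before SPRITE ever enters the picture.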
Then we have the usual problem with degrees of freedom. With a 2x2 mixed ANOVA, with game type as a repeated measure and "gender" as a between-subjects factor, the degrees of freedom would not have deviated much from the sample size of 220; with complete data, the error term for each effect would sit at 218. I think we can all agree on that. Degrees of freedom below 100 would be impossible. And yet the reported analyses show exactly that. It does not help that Table 1 is mislabeled as t-test results. If we assumed paired-samples t-tests, the degrees of freedom for each item would have been 219. Again, the reported degrees of freedom do not compute.
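To make the expectation concrete, here is a small sketch of the degrees of freedom one would expect under that design, assuming complete data and the standard univariate partitioning. The function and its labels are mine, not anything taken from the paper.

```python
def expected_dfs(n_subjects, between_levels=2, within_levels=2):
    """Expected degrees of freedom for a mixed ANOVA with one between-subjects
    factor and one repeated (within-subjects) factor, assuming complete data."""
    g, k = between_levels, within_levels
    return {
        "between effect (gender)":   (g - 1, n_subjects - g),
        "within effect (game type)": (k - 1, (k - 1) * (n_subjects - g)),
        "interaction":               ((g - 1) * (k - 1), (k - 1) * (n_subjects - g)),
        "paired t-test df":          n_subjects - 1,
    }

print(expected_dfs(220))
# -> every ANOVA error term sits at 218 and a paired t-test has 219 df;
#    nothing in this design gets anywhere near degrees of freedom below 100.
```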
What I can say with some certainty is that Zheng and Zhang (2016) should not be included in any meta-analysis addressing violent video games and aggression, or media violence and aggression more broadly. My efforts to raise some of these issues with the editorial staff never went very far. It's so funny how problems with a published paper lead editorial staff to go on vacation. I get it. I'd rather be out of town and away from email when someone writes in (with evidence) about concerns with a published paper. Unfortunately, if the data and analyses cannot be trusted, we have a problem. This is precisely the sort of paper that, once published, ends up included in meta-analyses. Meta-analysts who would rather exclude findings that are, at best, questionable will be pressured to include such papers anyway. How much that biases the overall findings is clearly a concern. And yet the attitude seems to be to let it go, that the status quo is sufficient. One flawed study surely could not hurt that much? We simply don't know. The same lab went on publishing research relevant to media violence researchers, with samples of over 3,000. Several of those papers ended up retracted. Others probably should have been, but probably won't be, for whatever political reasons one might imagine.
All I can say is that the truth is there. I've tried to lay it out. If someone wants to run with it and help make our science a bit better, I welcome you and your efforts.