Thursday, April 25, 2019

"And bad mistakes/I've made a few"*: another media violence experiment gone wrong

I have covered facets of Tian, Zhang, Cao, and Rodkin (2016) in two previous posts. One post covered an allegedly non-significant 3-way interaction that, based on what was reported, would have been significant and that turned out to be identical to one reported in another paper authored by this same set of researchers. You can read what I had to say here. I also questioned the reporting of the Stroop task used by Tian et al. (2016). You can read what I had to say about that task here.

Now I just want to concentrate on some data reporting errors that really anyone who can download a PDF file and upload it to Statcheck would be able to detect.

Out of the eight statistical tests Statcheck could analyze (remember that the ninth did not have the necessary degrees of freedom to allow Statcheck to do its thing), three of those statistical tests were shown to have not only errors, but decision errors. That means that the authors made a conclusion about statistical significance that was wrong based upon the numbers they reported. Here are the three decision errors that Statcheck detected:

1. The Stimulus by Gender interaction was reported as F(1, 157) = 1.67, p < 0.01. In actuality, if this F is correct, then p = 0.19816. In other words, there would be no significant interaction to report. Any further subsample analyses are arguably beside the point if that is the case.

2. In fact, to bolster my argument about point 1, let's look at the next decision error. The effect of the stimulus on males was reported as significant, F(1, 210) = 3.41, p < 0.01. Not so fast. According to Statcheck, the actual p = 0.06621. The effect of Stimulus on the male subsample was nonsignificant.

3. Finally, let's note that the authors report a significant Stimulus by Aggressive Personality Type interaction, F(1, 227) = 1.78, p < 0.01. Wrong. According to the Statcheck analysis, the actual p = 0.18349. That interaction was not significant.
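For readers who want to check these three p values by hand, here is a minimal sketch in Python. It is not Statcheck's own code (Statcheck is an R package); it is only an illustration of the underlying arithmetic, using the standard identity that the upper-tail probability of an F statistic equals a regularized incomplete beta function. The function names (`reg_inc_beta`, `f_test_p`) are my own and hypothetical, and the continued-fraction routine is the textbook approach rather than anything taken from the paper or from Statcheck. The results should land very close to the recomputed p values listed above.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        h *= d * c
        # Odd step of the recurrence
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        if abs(d) < tiny:
            d = tiny
        c = 1.0 + aa / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_test_p(f_value, df1, df2):
    """Upper-tail p value for an F statistic with (df1, df2) degrees of freedom."""
    x = df2 / (df2 + df1 * f_value)
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, x)


# The three tests flagged above, as reported in the paper: (label, F, df1, df2)
reported = [
    ("Stimulus x Gender", 1.67, 1, 157),
    ("Stimulus, males only", 3.41, 1, 210),
    ("Stimulus x Aggressive Personality Type", 1.78, 1, 227),
]

for label, f_value, df1, df2 in reported:
    p = f_test_p(f_value, df1, df2)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: F({df1}, {df2}) = {f_value}, p = {p:.5f} ({verdict} at .05)")
```

None of the three recomputed values falls below .05, let alone the reported .01, which is exactly what makes these decision errors rather than mere rounding slips.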

As noted earlier, the authors reported a 3-way interaction as nonsignificant when, based on the numbers they reported, it would have to have been significant. That means that a very subtle and nuanced analysis of the results never happened, leading us to question the validity of the authors' conclusions.

Statcheck is a wonderful tool for post-peer-review. It is of course limited in what it can do, and ultimately is dependent upon what the authors report. It is no substitute for having an existing data set available for those wishing to reproduce the authors' findings. However, in a pinch, Statcheck comes in handy as a preliminary indicator of what may have gone right and what may have gone wrong.

I am sure there is plenty more that could be asked about this article. The choice to divide up the AQ total score in the way the authors did was arguably arbitrary. I have to wonder if it would have been better to test for an interaction of Stimulus by Aggressive Personality Type using regression analysis instead. That is obviously a more complicated set of analyses, but one can gain some information that might be lost when splitting the scores of the participants on the Buss and Perry AQ as the authors did here and in other articles they have published. Perhaps I am splitting hairs. The lack of development of an aggressive personality instrument that would take into consideration the nuances of the Chinese language (at least Mandarin), including validation, is perhaps more troubling. The authors seem somewhat cognizant that merely translating a pre-existing instrument into their native language is not ideal. It would have been helpful if the authors had at least reported their own reliability numbers, rather than continuing to rely on those published in the original Buss and Perry paper. Given that these authors had no interest in any of the subscales for the purposes of their research, maybe a Coefficient Alpha for the total AQ score alone would have sufficed. Referring to the dependent variable as aggression when it really is nothing more than a measure of aggressive cognition is certainly confusing, if not a bit misleading. Having seen that terminology botched in enough articles over the years, I suppose that has become one of my pet peeves.

It is what it is. Once more, this is a paper from this particular lab that I would not trust and would be very hesitant to cite.


Tian, J., Zhang, Q., Cao, J., and Rodkin, P. (2016). The Short-Term Effect of Online Violent Stimuli on Aggression. Open Journal of Medical Psychology, 5, 35-42. doi: 10.4236/ojmp.2016.52005

*The quote in the title comes from the song, We Are The Champions by Queen. It is one of my favorite songs.

