Sunday, July 31, 2022

The NeverEnding Story: Zheng and Zhang (2016) Pt. 3

Whenever I have a few seconds of spare time and feel like torturing myself, I go back to reading a paper I have blogged about previously (see here and here). Each reading reveals more errors, and my remarks over the previous blog posts reflect that. Initially I thought Study 1 was probably okay, or at least less problematic than Study 2. However, Study 1 is every bit as problematic as Study 2. I think I was so overwhelmed by the insane number of errors in Study 2 that I had no energy left to devote to Study 1. I do want to circle around to Study 1, but first I want to add one more remark about Study 2.

With regard to Study 2, I focused on the very odd reporting of degrees of freedom (df) for each statistical analysis, given that the experiment had 240 participants. I showed that if we were to believe those df to be correct (hint: we shouldn't), there were several decision errors. And to top it off, the authors try to write off what appears to be a statistically significant 3-way interaction as non-significant; that interaction would remain significant even if the appropriate df were reported. The analysis of the so-called main effect of violent video games on reaction time to aggressive versus non-aggressive goal words was also inadequate. As noted before, not only were the df undoubtedly wrong, but the analysis does not compare the difference in reaction times between the treatment and control conditions. I would have expected either a 2x2 ANOVA demonstrating the interaction, or for the authors to compute the differences (in milliseconds) between aggressive and non-aggressive goal words for the treatment and control groups and then run the appropriate one-way ANOVA or t-test on those differences. Anderson et al. (1998) took this latter approach and were quite successful. At least the authors offered means for that main analysis. In subsequent analyses, the authors quickly dispense with reporting means at all, and in no case do they report standard deviations. That's the capsule summary of my critique up to this point. Now for the proverbial cherry on top: the one time the authors do report a mean and standard deviation together is when describing the age of the participants, and even there they manage to make a mess of things.
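To make concrete what that difference-score comparison might look like, here is a minimal sketch in Python. The file and column names are hypothetical (I obviously do not have the authors' data), and this is just one way to operationalize the analysis described above, not the authors' method:

```python
# A minimal sketch of the analysis I would have expected (hypothetical file and
# column names; this is NOT the authors' code or data). Each participant gets a
# difference score: RT to aggressive words minus RT to non-aggressive words.
# The treatment vs. control comparison is then a simple two-group test.
import pandas as pd
from scipy import stats

df = pd.read_csv("study2_reaction_times.csv")  # hypothetical file
df["rt_diff_ms"] = df["rt_aggressive_ms"] - df["rt_nonaggressive_ms"]

violent = df.loc[df["condition"] == "violent_game", "rt_diff_ms"]
control = df.loc[df["condition"] == "nonviolent_game", "rt_diff_ms"]

# Welch's t-test on the difference scores; with two groups this is essentially
# the one-way ANOVA mentioned above (F = t**2 when variances are pooled).
t, p = stats.ttest_ind(violent, control, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}, n = {len(violent)} vs. {len(control)}")
```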

Recall that the authors had a sample of 240 children ranging in age from 9 to 12 years for Study 2. The mean age for the participants was 11.66 with a standard deviation of 1.23. Since age can be treated as integer data, I used a post-peer-review tool called SPRITE to make sure that the mean age and standard deviation were mathematically possible. To do so, I entered the range of possible ages (as provided by the authors), the target mean and standard deviation, and the maximum number of distributions to generate. To my chagrin, I got an error message. Specifically, I was informed by SPRITE that the target standard deviation I had provided, based on what the authors reported, was too large. The largest mathematically plausible standard deviation was 1.17. So even something as elementary as the mean and standard deviation of participants' ages gets messed up. You can try SPRITE yourself and determine whether what I am finding is correct. My guess is you will. I prefer to show my work, so below is a quick independent check of the arithmetic, followed by the SPRITE result I obtained.
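With integer ages bounded between 9 and 12, the most spread-out samples put everyone at the two endpoints, which puts a hard ceiling on how large the standard deviation can be for any given mean. Here is a minimal Python sketch of that ceiling (my own back-of-the-envelope calculation, not SPRITE itself, which works by sampling candidate distributions; the exact bound depends on how you treat rounding of the reported mean and may differ a bit from the figure SPRITE gives, but under any reasonable assumption it falls well short of 1.23):

```python
# Upper bound on the sample SD of n integer ages in [lo, hi] whose mean rounds
# to the reported value. Not SPRITE itself; just a direct check that an SD of
# 1.23 is out of reach for 240 integer ages between 9 and 12 with mean 11.66.
import math

def max_sd(n, lo, hi, reported_mean, decimals=2):
    best = 0.0
    half = 0.5 * 10 ** -decimals
    lo_sum = max(math.ceil((reported_mean - half) * n), n * lo)
    hi_sum = min(math.floor((reported_mean + half) * n), n * hi)
    for total in range(lo_sum, hi_sum + 1):   # integer sums consistent with the rounded mean
        for mid in range(lo, hi + 1):          # at most one non-extreme value is needed
            b, r = divmod(total - mid - (n - 1) * lo, hi - lo)
            if r != 0 or not (0 <= b <= n - 1):
                continue                        # this sum can't be hit with lo/hi plus `mid`
            a = n - 1 - b                       # a values at lo, b values at hi, one at mid
            m = total / n
            ss = a * (lo - m) ** 2 + b * (hi - m) ** 2 + (mid - m) ** 2
            best = max(best, math.sqrt(ss / (n - 1)))
    return best

print(round(max_sd(240, 9, 12, 11.66), 3))     # largest achievable sample SD
```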





So Study 2 is not to be trusted at all. What about Study 1? It's a mess for its own reasons. I'll circle back to that in a future post.

Friday, July 29, 2022

A Blast From the Past: Retractions and Meta-Analysis Edition

I stumbled across this article, Media and aggression research retracted under scrutiny, and found it to be an interesting short read. The article's author chronicles some recent retractions, as well as what had been an ongoing investigation of several papers coauthored by Qian Zhang of Southwest University. I've written enough about his work over the last few years. I think saying that many of Zhang's papers have "been called into question" is a fair assessment. 

Part of the story chronicles Samuel West, who included one of Zhang's papers in his meta-analysis at the request of a reviewer. His meta-analysis would undergo another round of peer review around the time he learned that that particular Zhang paper was under investigation at the same journal. Ouch. West certainly has legitimate concerns about including a potentially dodgy finding in his meta-analysis. In this case, the paper by Zhang and colleagues was not retracted, but I am sure West has his misgivings about including it in his database in the first place. I can certainly empathize. My most recent published meta-analysis included one of Zhang's papers that would eventually get retracted early this year. That said, there are plenty of papers from Zhang's lab with obvious problems, or, in the case of his more recent work, with problems that are more cleverly hidden. I agree with Amy Orben that the fact that problematic studies continue to remain in journals and meta-analyses is "a major problem" when we think about how politicized media violence research is. Requiring the archiving of data, data analyses, and research protocols probably helps, to the extent that it is actually required - at least anything that might be incorrect or fraudulent can more easily be sniffed out. Otherwise, one can only hope for sleuths with enough time on their hands and no concerns about the career repercussions of blowing the whistle on published papers that should never have seen the light of day. Good luck with that.

I do take issue with Zhang's characterization of Hilgard as someone who is "just trying to make his name based just on claiming that everyone else does bad research." I get that Zhang is a bit sore about the retractions, and Hilgard was the person who contacted Zhang and a plethora of journal editors regarding the papers in question. That said, there was plenty of chatter about Zhang's work from 2018 onward, and there were probably several of us who just wanted to know that we hadn't gone insane - that the obvious data errors, including inaccurate degrees of freedom, means and standard deviations that were mathematically impossible, and tables that made no sense, really were what we thought they were. Hilgard was far and away better connected to the sphere of media violence research as an active researcher himself, and he had the data-analytic know-how and the connections that come with being at an R1 university to do what needed to be done. Aside from that, Hilgard has made plenty of positive contributions to the methodology side of psychological science, and from interacting with him online and in person over the years, I'll simply say he's a good person to know. 

I think this article is somewhat helpful in pointing out that even those who believe there is a link between violent content in media (such as video games) and aggression can look at Zhang's work, see it for what it is, and express an appropriate level of skepticism. One can take the philosophical perspective that there is "no one right way to look at the data," and that's all well and good. But at the end of the day, if the analyses show decision errors, and the means and standard deviations forming the basis for those analyses are simply mathematically impossible, the only reasonable conclusion is that the data and analyses in their present form cannot be accepted as valid. 

The only bone I really have to pick is that the author characterizes the body of media violence research as asking the question of whether or not "violent entertainment causes violence". Although I am aware that there are researchers in this area of inquiry who would draw that conclusion, there are plenty of other investigators who view what we can learn based on our available methods much more cautiously (a lot of aggression is mild, after all). There are also plenty of skeptics who doubt that there is any link between media violence and even the mild forms of aggression that we can measure. As far as I am aware, there is no link between exposure to violent content in mass media and violent behavior in everyday life. All that said, this is a useful article that captures a series of events that I know quite intimately. 

Suddenly, I am in the mood for some cartoon violence. I think I'll watch some early episodes of Rick and Morty. Goodnight.

Monday, July 25, 2022

The 50th anniversary of the article that brought an end to the Tuskegee Syphilis Study

Here's an article I strongly recommend reading. The study itself is something my colleagues and I discuss in our department's methods courses as an example of flagrantly unethical research. Although not a psychology study by any stretch, it is a cautionary tale of the abuses that have occurred (and can potentially occur) when research exploits marginalized people.

Friday, July 22, 2022

Food for thought

Read Academe is suffering from foreign occupiers: Lessons from Vaclav Havel for a profession in decline. In this case, the problem is one of how the academy is run, which is very much top-down, with an emphasis on branding that trumps pretty much everything else. In some senses, it is reminiscent of existence in Warsaw Pact-era Eastern Europe, as the author sees it. And we are suffering a brain drain of faculty and students as a consequence.

Friday, February 4, 2022

A long-overdue retraction

After I sounded the alarm bells several years ago, a paper that I had failed to get retracted (the editor of PAID at the time offered a superficially "better" Corrigendum in 2019 instead) is now officially retracted. Dr. Joe Hilgard really put in the work to make it happen. Here is his story:

The saga of this weapons priming article is over. There are plenty of articles remaining that have yet to be adequately scrutinized.

Sunday, November 7, 2021

The interaction of mere exposure to weapons and provocation: A preliminary p-curve analysis

The primary hypothesis of interest in Berkowitz and LePage (1967) was an interaction of exposure to weapons (rifles vs. badminton racquets/no stimuli) and provocation (high vs. low) on aggression, which was measured as the number of electric shocks participants believed they were giving to the person who had just given them feedback. The authors predicted that the interaction would be statistically significant, with participants who had been highly provoked and had short-term exposure to rifles showing by far the highest level of aggression. Their hypothesis was confirmed. However, subsequent efforts to replicate that interaction were rather inconsistent. This space is not the place to repeat that history, as I have discussed it elsewhere on this blog and in recent literature reviews published in the National Social Science Journal (Benjamin, 2019) and in an encyclopedia chapter (Benjamin, 2021). In the Benjamin et al. (2018) meta-analysis, we did look very broadly at whether the weapons effect appeared stronger under highly provoking conditions than under neutral/low-provocation conditions. There appeared to be a trend, although there were some problems with our approach. We examined all DVs (cognitive, affective, appraisal, and behavioral) rather than focusing on behavioral outcomes alone. That probably inflated the naive effect size for the neutral/low-provocation condition and deflated the naive effect size for the high-provocation condition. That said, depending on the method used to correct for publication bias, there is some reason to doubt that the notably strong effect under highly provoking conditions in which weapons were present was particularly robust. 

Another way of testing the robustness of the weapons effect hypothesis as presented by Berkowitz and LePage (1967), which is central to the viability of Weapons Effect Theory (yes, as I have noted, that is a thing), is to use a method called p-curve analysis. This is fairly straightforward to do. All I need is a collection of studies that test an interaction between the mere presence or absence of weapons and some manipulation of provocation as independent variables, with an aggressive behavioral outcome as the dependent measure. There really isn't an overwhelming number of published articles, so finding the relevant studies was fairly simple. I just had to look through my old collection of articles and make sure no one had published anything new that would satisfy my criteria. That turned out not to be a problem. 

So to be clear, my inclusion criteria for articles (or studies within articles) were:

1. manipulation of short-term exposure to weapon (usually weapons vs. neutral objects)

2. manipulation of provocation (high vs. low/none)

3. explicit test of interaction of weapon exposure and provocation level

4. behavioral measure of aggression

Of the studies I am aware of, that left me with 11 potential studies to examine. Unfortunately, five were excluded for failing to report the necessary 2-way interaction or any simple effects analyses: Buss et al. (1972, Exp. 5), Page and O'Neal (1977), Caprara et al. (1984), and Cahoon and Edmonds (1984, 1985). The remaining six were entered into a p-curve disclosure table: Berkowitz and LePage (1967), Fischer et al. (1969), Ellis et al. (1971), Page and Scheidt (1971, Exp. 1), Frodi (1975), and Leyens and Parke (1975). Once I completed the table, I entered the relevant test statistics, degrees of freedom, and p-values into the p-curve app 4.06. Three studies were excluded from the analysis because the 2-way interaction was non-significant. Here is the graph based on the remaining three studies:



As you might guess, there isn't a lot of diagnostic information to go on. According to the summary that was printed:

P-Curve analysis combines the half and full p-curve to make inferences about evidential value. In particular, if the half p-curve test is right-skewed with p<.05 or both the half and full test are right-skewed with p<.1, then p-curve analysis indicates the presence of evidential value. This combination test, introduced in Simonsohn, Simmons and Nelson's (2015) 'Better P-Curves' paper, is much more robust to ambitious p-hacking than the simple full p-curve test is.

Here neither condition is met; hence p-curve does not indicate evidential value.

Similarly, p-curve analysis indicates that evidential value is inadequate or absent if the 33% power test is p<.05 for the full p-curve or both the half p-curve and binomial 33% power test are p<.1. Here neither condition is met; so p-curve does not indicate evidential value is inadequate nor absent.
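For readers who want a sense of what that combination test is doing under the hood, here is a minimal Python sketch of the right-skew tests (full and half p-curve) as I understand them from the 'Better P-Curves' paper: each significant p-value is converted to a pp-value (its probability conditional on being significant), and the pp-values are combined with Stouffer's method. This is an illustration only, not the p-curve app's own code, and the p-values in the example are placeholders rather than the ones from my disclosure table:

```python
# Minimal sketch of p-curve's right-skew test (Stouffer's method on pp-values).
# Illustration only; the actual app (v4.06) also runs binomial and 33%-power tests.
from scipy.stats import norm

def right_skew_test(p_values, crit=0.05):
    """Full p-curve uses crit=.05; half p-curve uses crit=.025.
    Returns (Stouffer Z, p-value); a small p-value indicates right skew,
    i.e., evidential value."""
    pps = [p / crit for p in p_values if p < crit]   # uniform on (0, 1) under the null
    if not pps:
        return None
    z = sum(norm.ppf(pp) for pp in pps) / len(pps) ** 0.5
    return z, norm.cdf(z)

# Placeholder p-values for the significant interactions (NOT the real ones):
ps = [0.012, 0.030, 0.044]
print("full p-curve:", right_skew_test(ps, crit=0.05))
print("half p-curve:", right_skew_test(ps, crit=0.025))
```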

From the available evidence, there is not much to suggest that the Weapons Effect as initially proposed by Berkowitz and LePage (1967) was replicable, or that the original finding provided adequate evidence that the effect was worth further exploration. There might have been some alternatives worth pursuing (including Ellis et al., 1971, who wanted to explore the associations we might form via operant conditioning), but that work seemed to fizzle out. We know what happened next. Around the mid-1970s, researchers interested in this area of inquiry concentrated their efforts on exploring the short-term effect of weapons only under highly provoking conditions (with a few exceptions in field research) and took the original finding as gospel. By 1990, the Carlson et al. (1990) meta-analysis seemed to make it official, and research shifted to cognitive priming effects. I will try a p-curve analysis on that line of weapons effect research next, as the cognitive route appears to be the primary mechanism for a behavioral effect to occur.


Saturday, November 6, 2021

An interesting take on the 2021 Virginia Gubernatorial election

The Virginia gubernatorial election was a predictably close one, and it ended with the GOP candidate winning this particular off-year race. I suppose there are any number of takes to be had. One factor that had my attention was the GOP candidate's (Youngkin's) focus on Critical Race Theory (CRT), which is probably part of the curriculum in advanced Legal Studies coursework, but is not a factor in the K-12 system. Youngkin made it a point to advocate for parents having "more of a say" in their kids' education, in the context of the moral panic over CRT that has developed over the past year. Did Youngkin's strategy work? The answer turns out to be complicated. Those who enjoy poring over the cross-tabs in public opinion polls found that it succeeded, but not in the way that it has been spun in the media:


The network exit poll, released on Nov. 2, showed the same pattern. Youngkin got 62 percent of the white vote and 13 percent of the Black vote, a gap of 49 points. But among voters who said parents should have a lot of “say in what schools teach”—about half the electorate—he got 90 percent of the white vote and only 19 percent of the Black vote, a gap of 71 points. The idea that parents should have more say in the curriculum—Youngkin’s central message—had become racially loaded. And the loading was specific to race: Other demographic gaps for which data were reported in the exit poll—between men and women, and between white college graduates and whites who hadn’t graduated from college—get smaller, not bigger, when you narrow your focus from the entire sample to the subset of voters who said parents should have a lot of say in what schools taught. Only the racial gap increases.

The exit poll didn’t ask voters about CRT, but it did ask about confederate monuments on government property. Sixty percent of white voters said the monuments should be left in place, not removed, and 87 percent of those voters went to Youngkin. That was 25 points higher than his overall share of white voters. The election had become demonstrably polarized, not just by race but by attitudes toward the history of racism. All the evidence indicates that Youngkin’s attacks on CRT played a role in this polarization. 

So, in a way, the strategy of homing in on this latest moral panic did work in gaining the favor of white voters, but that's it. As a newer Southern Strategy tactic, focusing on CRT, demonizing it, and tying it (inaccurately) to the public school system is considerably more sophisticated than earlier efforts. The end result appears to be to further sow divisions among white voters, and between subsets of white voters and the rest of the voting population, in order to maintain hegemony. As a strategy, it may just work to an extent. School boards and more localized policymakers are ill-prepared for what awaits them in the upcoming months and years.