The primary hypothesis of interest in Berkowitz and LePage (1967) was an interaction of exposure to weapons (rifles vs. badminton racquets/no stimuli) and provocation (high vs. low) on aggression, which was measured as the number of electric shocks participants believed they were giving to the person who had just given them feedback. The authors predicted that the interaction would be statistically significant, and that participants who had been highly provoked and had short-term exposure to rifles would show by far the highest level of aggression. Their hypothesis was confirmed. However, subsequent efforts to replicate that interaction were rather inconsistent. This space is not the place to repeat that history, as I have discussed it elsewhere on this blog, in a recent literature review published in the National Social Science Journal (Benjamin, 2019), and in an encyclopedia chapter (Benjamin, 2021). In the Benjamin et al. (2018) meta-analysis, we did look very broadly at whether or not the weapons effect appeared stronger in studies with highly provoking conditions than in studies with neutral/low-provoking conditions. There appeared to be a trend, although there were some problems with our approach. We examined all DVs (cognitive, affective, appraisal, and behavioral) rather than focusing only on behavioral ones. That probably inflated the naive effect size for the neutral/low-provoking condition and deflated the naive effect size for the high-provoking condition. That said, depending on which publication bias analysis was used, correcting for publication bias gives some reason to doubt that the noticeably strong effect of weapons under highly provoking conditions was particularly robust.
Another way of testing the robustness of the weapons effect hypothesis as presented by Berkowitz and LePage (1967), which is central to the viability of Weapons Effect Theory (yes, as I have noted, that is a thing), is to use a method called p-curve analysis. This is fairly straightforward to do. All I need is a collection of studies that test an interaction between the mere presence or absence of weapons and some manipulation of provocation as independent variables, with an aggressive behavioral outcome as the dependent measure. There really isn't an overwhelming number of published articles, so finding a collection of published research was fairly simple. I just had to look through my old collection of articles and check that no one had published anything new that would satisfy my criteria. That turned out not to be a problem.
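The logic behind p-curve can be illustrated with a short simulation. If there is no true effect, the p-values of significant tests are spread evenly between 0 and .05, so about half of them fall below .025. If there is a true effect, very small p-values become more common and the curve is right-skewed. The sketch below is only an illustration under simplifying assumptions: every study is treated as a two-sided z-test, the true shift of 2.5 standard errors is arbitrary, and the function names are my own rather than anything from the p-curve app.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def share_below_025(true_shift: float, n_studies: int = 200_000, seed: int = 1) -> float:
    """Simulate z-tests and return the share of significant results (p < .05)
    that also have p < .025. A flat p-curve gives roughly 0.5; a right-skewed
    p-curve gives more than 0.5."""
    rng = random.Random(seed)
    significant = below_025 = 0
    for _ in range(n_studies):
        p = two_sided_p(rng.gauss(true_shift, 1.0))
        if p < 0.05:
            significant += 1
            if p < 0.025:
                below_025 += 1
    return below_025 / significant


print(f"no true effect: {share_below_025(0.0):.2f}")
print(f"true effect:    {share_below_025(2.5):.2f}")
```

With no true effect the printed share should come out close to 0.5; with the hypothetical shift of 2.5 it should be clearly higher. That asymmetry is what the p-curve tests look for.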
So to be clear, my inclusion criteria for articles (or studies within articles) were:
1. manipulation of short-term exposure to weapons (usually weapons vs. neutral objects)
2. manipulation of provocation (high vs. low/none)
3. explicit test of the interaction between weapon exposure and provocation level
4. behavioral measure of aggression
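Criterion 3 matters because p-curve works from the test statistic for the interaction itself, not from a summary effect size: the p-curve app takes each reported statistic with its degrees of freedom, recomputes the p-value, and uses only results below .05. Below is a minimal plain-Python sketch of that step, assuming the interactions are reported as F tests. The regularized incomplete beta follows the standard continued-fraction evaluation, and the function names and example F values are hypothetical, not figures from any of the studies discussed here.

```python
from math import exp, lgamma, log

TINY = 1e-300  # guard against division by zero in the continued fraction


def _beta_cf(a: float, b: float, x: float) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, 301):
        m2 = 2 * m
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),  # even term
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),  # odd term
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > TINY else TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > TINY else TINY
            h *= d * c
        if abs(d * c - 1.0) < 3e-14:  # last factor is ~1: converged
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_test_p(f: float, df1: float, df2: float) -> float:
    """Upper-tail p-value P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 * f)."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def significant_interactions(reported):
    """Recompute p from (label, F, df1, df2) tuples and keep tests with p < .05."""
    kept = []
    for label, f, df1, df2 in reported:
        p = f_test_p(f, df1, df2)
        if p < 0.05:
            kept.append((label, p))
    return kept


# Hypothetical interaction tests, NOT values from the studies discussed in this post.
example = [("Study A", 6.2, 1, 40), ("Study B", 1.3, 1, 60), ("Study C", 9.8, 1, 36)]
for label, p in significant_interactions(example):
    print(f"{label}: F-test p = {p:.4f}")
```

A non-significant interaction (like hypothetical Study B) simply drops out, which mirrors how studies with a non-significant two-way interaction contribute nothing to the p-curve.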
Of the studies I am aware of, that left me with 11 potential studies to examine. Unfortunately, five were excluded for failing to report the necessary two-way interaction or any simple effects analyses: Buss et al. (1972, Exp. 5), Page and O'Neal (1977), Caprara et al. (1984), and Cahoon and Edmonds (1984, 1985). The remaining six were entered into a p-curve disclosure table: Berkowitz and LePage (1967), Fischer et al. (1969), Ellis et al. (1971), Page and Scheidt (1971, Exp. 1), Frodi (1975), and Leyens and Parke (1975). Once I completed the table, I entered the relevant test statistics, degrees of freedom, and p-values into the p-curve app (version 4.06). Three studies were excluded from the analysis because their two-way interactions were non-significant. Here is the graph for the remaining three studies:
As you might guess, there isn't a lot of diagnostic information to go on. According to the summary the app produced:
P-Curve analysis combines the half and full p-curve to make inferences about evidential value. In particular, if the half p-curve test is right-skewed with p<.05 or both the half and full test are right-skewed with p<.1, then p-curve analysis indicates the presence of evidential value. This combination test, introduced in Simonsohn, Simmons and Nelson (2015) 'Better P-Curves' paper, is much more robust to ambitious p-hacking than the simple full p-curve test is.
Here neither condition is met; hence p-curve does not indicate evidential value.
Similarly, p-curve analysis indicates that evidential value is inadequate or absent if the 33% power test is p<.05 for the full p-curve or both the half p-curve and binomial 33% power test are p<.1. Here neither condition is met; so p-curve does not indicate evidential value is inadequate nor absent.
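The decision rules in that summary can be sketched in a few lines. This is a simplified approximation, not the app's code: it treats every test as a two-sided z-test, whereas the app works from each test statistic's own distribution (noncentral t or F for the 33% power benchmark). Under no true effect, the pp-value of a significant result is p/.05 for the full curve and p/.025 for the half curve, and the pp-values are combined with Stouffer's method; the 33% power tests ask whether results are less extreme than a study with 33% power would produce. The function names, the clamping, and treating a missing half curve as "condition not met" are my own assumptions. Inputs are assumed to satisfy 0 < p.

```python
from math import comb, sqrt
from statistics import NormalDist

N = NormalDist()
Z_05 = N.inv_cdf(1 - 0.05 / 2)  # two-sided critical z for p < .05
Z_025 = N.inv_cdf(1 - 0.025 / 2)  # two-sided critical z for p < .025
# True shift giving a z-test 33% power at alpha = .05 (opposite tail ignored).
LAMBDA_33 = Z_05 - N.inv_cdf(2 / 3)


def power_at(crit: float) -> float:
    """Power to exceed a critical z value when the true shift is LAMBDA_33."""
    return 1 - N.cdf(crit - LAMBDA_33)


def stouffer(pp: list[float]) -> float:
    """Stouffer's method: Z = sum(inv_cdf(pp)) / sqrt(k); returns the left-tail p-value."""
    pp = [min(max(x, 1e-12), 1 - 1e-12) for x in pp]  # keep inv_cdf away from 0 and 1
    return N.cdf(sum(N.inv_cdf(x) for x in pp) / sqrt(len(pp)))


def right_skew_tests(ps):
    """Full and half p-curve tests against a flat curve (no true effect)."""
    sig = [p for p in ps if p < 0.05]
    half_ps = [p for p in sig if p < 0.025]
    full = stouffer([p / 0.05 for p in sig])  # p is uniform on (0, .05) under H0
    half = stouffer([p / 0.025 for p in half_ps]) if half_ps else None
    return full, half


def pp_33(p: float, crit: float) -> float:
    """P(a less extreme result | significant at crit) if true power were 33%."""
    z = -N.inv_cdf(p / 2)  # convert two-sided p back to |z|
    return (N.cdf(z - LAMBDA_33) - (1 - power_at(crit))) / power_at(crit)


def flatness_tests(ps):
    """Full, half, and binomial tests of 'flatter than 33% power'."""
    sig = [p for p in ps if p < 0.05]
    half_ps = [p for p in sig if p < 0.025]
    full = stouffer([pp_33(p, Z_05) for p in sig])
    half = stouffer([pp_33(p, Z_025) for p in half_ps]) if half_ps else None
    # Expected share of significant results with p < .025 under 33% power.
    q = power_at(Z_025) / power_at(Z_05)
    k, hits = len(sig), len(half_ps)
    # Binomial P(X <= hits): too few very small p-values points to a flat curve.
    binom = sum(comb(k, i) * q**i * (1 - q) ** (k - i) for i in range(hits + 1))
    return full, half, binom


def p_curve_verdict(ps):
    """Return (evidential_value, inadequate_or_absent) using the quoted rules."""
    if not any(p < 0.05 for p in ps):
        raise ValueError("p-curve needs at least one p-value below .05")
    full_rs, half_rs = right_skew_tests(ps)
    full_33, half_33, binom_33 = flatness_tests(ps)
    evidential = half_rs is not None and (
        half_rs < 0.05 or (half_rs < 0.1 and full_rs < 0.1)
    )
    inadequate = full_33 < 0.05 or (
        half_33 is not None and half_33 < 0.1 and binom_33 < 0.1
    )
    return evidential, inadequate


# Hypothetical p-values for illustration only, not the three studies analyzed above.
print(p_curve_verdict([0.012, 0.031, 0.044]))
```

With only three significant results, every one of these tests has very little power, which is one way to see why the summary can conclude neither that evidential value is present nor that it is inadequate or absent.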
From the available evidence, there is little to suggest that the Weapons Effect as initially proposed by Berkowitz and LePage (1967) was replicable, or that the original evidence was adequate to justify further exploration of the effect as first proposed. There might have been some alternatives worth exploring further (including the work of Ellis et al. (1971), who wanted to explore associations we might form via operant conditioning), but that work seemed to fizzle out. We know what happened next. Around the mid-1970s, researchers interested in this area of inquiry concentrated on exploring the short-term effect of weapons only under highly provoking conditions (with a few exceptions in field research) and took the original finding as gospel. By 1990, the Carlson et al. (1990) meta-analysis seemed to make it official, and research shifted to cognitive priming effects. I will try a p-curve analysis of that line of weapons effect research next, as the cognitive route appears to be the primary mechanism through which a behavioral effect would occur.