Tuesday, December 25, 2018

Rage Becomes Her

I noticed one of my articles got cited in a book by Soraya Chemaly. I haven't yet had the opportunity to read it, although the odds are pretty good that it would be of interest to me (and members of my family). It's nice to see academic work used in works that are not explicitly scholarship.

Monday, December 17, 2018

Something I hope to talk about soon

I got involved in a bit of data sleuthing a few months ago. We'll say this one is truly a team effort. I hope to talk about what we found and what resulted once we contacted journal editors, etc. For now, I can safely say that our efforts are bearing some fruit. Seeing a corrigendum to one of the problematic articles (one which was technically still in press, although published online) made my weekend. It is a start. Given that there is a bit of a pattern to the lab involved, and that we are not talking about isolated mistakes, I really hope the remaining editors act responsibly and in a timely manner. The problems with one article are relatively minor. With others, there are serious problems with data analyses as reported, poorly constructed tables, miscalculations about the number of trials in cognitive experiments, degrees of freedom that don't match the reported sample sizes, and potential self-plagiarism (the latter of which I know all too well).
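
One of the simpler checks in this kind of sleuthing is comparing reported degrees of freedom against reported sample sizes. Here is a minimal sketch in Python, assuming a fully between-subjects ANOVA in which the error degrees of freedom should equal the sample size minus the number of cells. The function names and the numbers are made up for illustration and are not taken from the articles in question:

```python
def expected_error_df(n_subjects, n_cells):
    """Error df for a fully between-subjects ANOVA: N minus the number of cells."""
    return n_subjects - n_cells


def df_matches(n_subjects, n_cells, reported_error_df):
    """Return True when the reported error df is consistent with N and the design."""
    return expected_error_df(n_subjects, n_cells) == reported_error_df


# Hypothetical report: 120 participants in a 2 x 3 design (6 cells).
print(df_matches(120, 6, 114))  # consistent: 120 - 6 = 114
print(df_matches(120, 6, 98))   # inconsistent: worth a closer look
```

A mismatch is not proof of an error (exclusions or missing data can legitimately lower the degrees of freedom), but it is a reason to start asking questions.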

For a long time I have told my undergrad methods students that peer review is a first line of defense, but it is far from perfect. Increasingly I am advocating for post-peer review, both where I have a public presence and in the classroom. I don't view this as an adversarial process - and in fact I hold nothing personally against any of the individuals authoring the articles in question. My concern, and I think anyone's concern, should be that we do our best to get it right. If independent individuals spot serious problems, we have an obligation to take those concerns seriously and work with those individuals and with our respective editors to correct whatever errors were made - for the sake of the psychological sciences.

Update: Just noticed a second corrigendum. There are easily a half dozen more to go. We'll see what happens, but it looks like journal editors are at least taking our concerns seriously.

Tuesday, December 11, 2018

Just a quick thought

A few months ago I shared some analyses showing that there appeared to be an interesting moderator of the influence of weapons on behavioral outcomes: an allegiance effect. The effect size for studies run by former students, post-docs, and coauthors of Berkowitz was moderate, but the effect size for independent researchers was essentially negligible. What to make of that is ultimately going to be speculation. I think it is worthwhile simply to note that this moderator appears to be an important one for making some sense of what was done in this line of research, especially from the late 1960s through the early 1980s. I'll also reiterate something else that seems to be strikingly obvious: this appears to be a highly politicized area of research. Just re-reading literature reviews from proponents and skeptics, it is very apparent that the players involved at the time largely talked past each other. I am now more convinced than ever that we really need to see work by truly independent third parties, and all the better if they're running multi-lab, large-N registered replication reports (RRRs). The latter especially have been helpful in shedding some light on the likely magnitude of other social psychology effects, and could do so here as well. I think Table 2 in my recently published meta-analysis provides something of a roadmap as to what might be happening with behavioral outcomes, and the evidence is far from comforting for anyone who is a proponent. That table is not necessarily a nail in the coffin either. I believe some properly powered behavioral research is underway, and I am eager for those findings to be made public. The evidence from that work will be critical to how I approach this line of research going forward.

Sunday, December 9, 2018

Rethinking Turner, Layton, and Simons (1975)

Let's revisit what is a frequently cited set of field experiments purporting to support the notion that the mere presence of a weapon influences aggressive behavior:
The weapons effect occurs outside of the lab too. In one field experiment,[2] a confederate driving a pickup truck purposely remained stalled at a traffic light for 12 seconds to see whether the motorists trapped behind him would honk their horns (the measure of aggression). The truck contained either a .303-calibre military rifle in a gun rack mounted to the rear window, or no rifle. The results showed that motorists were more likely to honk their horns if the confederate was driving a truck with a gun visible in the rear window than if the confederate was driving the same truck but with no gun. What is amazing about this study is that you would have to be pretty stupid to honk your horn at a driver with a military rifle in his truck—if you were thinking, that is! But people were not thinking—they just naturally honked their horns after seeing the gun. The mere presence of a weapon automatically triggered aggression.

The above description could come from practically any social psychology textbook describing the weapons effect, and probably serves as an exemplar for why I increasingly hate teaching classic experiments in my own field, except perhaps as cautionary tales. As the title suggests, this is a typical description of a series of experiments reported by Turner, Layton, and Simons (1975). Joe Hilgard aptly sums up what appeared to have happened:

Turner, Layton, and Simons (1975) report a bizarre experiment in which an experimenter driving a pickup truck loitered at a traffic light. When the light turned green, the experimenter idled for a further 12 seconds, waiting to see if the driver trapped behind would honk. Honking, the researchers argued, would constitute a form of aggressive behavior.

The design was a 3 (Prime) × 2 (Visibility) design. For the Prime factor, the experimenter's truck featured either an empty gun rack (control), a gun rack with a fully-visible .303-caliber military rifle and a bumper sticker with the word "Friend" (Friendly Rifle), or a gun rack with a .303 rifle and a bumper sticker with the word "Vengeance" (Aggressive Rifle). The experimenter driving the pickup was made visible or invisible by the use of a curtain in the rear window.

There were 92 subjects, about 15/cell. The sample is restricted to males driving late-model privately-owned vehicles for some reason.

The authors reasoned that seeing the rifle would prime aggressive thoughts, which would inspire aggressive behavior, leading to more honking. They run five different planned complex contrasts and find that the Rifle/Vengeance combination inspired honking relative to the No Rifle and Rifle/Friend combo, but only when the curtain was closed, F(1, 86) = 5.98, p = .017. That seems like a very suspiciously post-hoc subgroup analysis to me.

A second study in Turner, Layton, and Simons (1975) collects a larger sample of men and women driving vehicles of all years. The design was a 2 (Rifle: present, absent) × 2 (Bumper Sticker: "Vengeance", absent) design with 200 subjects. They divide this further by driver's sex and by a median split on vehicle year. They find that the Rifle/Vengeance condition increased honking relative to the other three, but only among newer-vehicle male drivers, F(1, 129) = 4.03, p = .047. But then they report that the Rifle/Vengeance condition decreased honking among older-vehicle male drivers, F(1, 129) = 5.23, p = .024! No results were found among female drivers.

In summary, outside of perhaps one subgroup, and assuming one believes the findings, there appears to be not only no priming effect of a weapon on aggressive behavior, but arguably the opposite: seeing a weapon in a vehicle suppressed horn-honking. When I was computing effect sizes for my recently published weapons effect meta-analysis, I noticed that overall, the Cohen's d was negative. That actually makes more sense to me.
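
The F statistics in the passage quoted above can be sanity-checked against their reported p-values. What follows is a rough sketch, not anything the original authors or Hilgard ran: it relies on the fact that an F test with one numerator degree of freedom is the square of a t test, and it integrates the t density numerically with Simpson's rule so that only the Python standard library is needed. The function names and the number of integration steps are my own choices.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))


def p_from_f_1df(f_value, df_error, steps=2000):
    """Two-tailed p for F(1, df_error), using F = t**2 and Simpson's rule."""
    t = math.sqrt(f_value)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df_error) + t_pdf(t, df_error)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df_error)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t


# F statistics as quoted above: (F, error df, reported p)
quoted = [(5.98, 86, .017), (4.03, 129, .047), (5.23, 129, .024)]
for f_value, df_error, reported in quoted:
    p = p_from_f_1df(f_value, df_error)
    print(f"F(1, {df_error}) = {f_value}: p = {p:.3f} (reported {reported})")
```

If the recomputed values land close to the reported ones, the statistics are at least internally consistent, which says nothing about whether the subgroup analyses were a good idea. For what it is worth, the error df of 86 in the first study is what one would expect from 92 subjects in a 3 × 2 between-subjects design (92 − 6 = 86).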

Here is a screen shot of Table 3, which summarizes Study 3:

At bare minimum we might be able to make a case that privileged males (based on the cars they drove) are the one subsample that would honk their horns even when it seemed irrational. Otherwise, it appears that non-privileged males and females overall (no distinction is made on whether female subjects drove new cars or older cars) showed either no effect or a suppression effect!

Late last decade, a student and I attempted a replication of the old Turner et al. (1975) research. In our case, we used a different DV, latency of horn-honking: in other words how long it took the driver behind the truck to start honking, measured in seconds (admittedly, my student's measure was crude: seconds were measured based on a confederate's wristwatch, when a stopwatch might have been more appropriate). The prime stimulus used was a bumper sticker of an AK-47 that was placed conspicuously on the rear window of the truck in the treatment condition. There was no sticker in the control condition. We ended up with null findings. If anything the presence of the AK-47 sticker trended (although nonsignificantly) in a negative direction. Admittedly our sample was small (cell sizes of 10 in each condition), and so my student merely wrote up the results to complete the requirement of a methods course he was in. It is possible that with a large enough sample, we would have been able to show fairly conclusively that drivers generally have the good sense not to try to provoke those who drive with weapons or even images of weapons. Or we may have ended up with a simple null finding, and given the low power of our study, that is a fair enough assessment.
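
To put a rough number on "low power": below is a back-of-the-envelope sketch using a normal approximation to a two-tailed, two-group comparison. The effect sizes plugged in are Cohen's conventional small, medium, and large benchmarks, chosen purely for illustration; they are not estimates from our data, and the function name is my own.

```python
from statistics import NormalDist


def approx_power_two_groups(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-tailed, two-group comparison."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = d * (n_per_group / 2) ** 0.5
    # Probability of landing beyond either critical value under the alternative.
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


# Hypothetical effect sizes (Cohen's benchmarks), with 10 drivers per cell.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: power ~ {approx_power_two_groups(d, 10):.2f}")
```

The normal approximation runs a little optimistic at cell sizes this small compared with an exact calculation based on the noncentral t distribution, so the real picture is, if anything, slightly worse.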

I've often wondered what to make of this set of experiments, beyond the obvious conclusion that Turner et al. (1975) did not actually replicate the classic Berkowitz and LePage (1967) lab experiment. I am now wondering if there may be another plausible explanation. There is a body of research showing that individuals who are exposed to images of guns and knives embedded within an array of images are pretty good at primary threat appraisal. That is, they notice the images faster (based on reaction time) and they tend to show more caution (again based on reaction time) when primed with these images (see Sulikowski & Burke, 2014, for a recent set of experiments). Bottom line is that we may want to reinterpret the horn-honking experiments of Turner et al. (1975) and the work my student did with me as follows: weapons do not increase horn-honking behavior, to the extent we have used it as a proxy for aggression. Rather, it is likely that weapons either have no impact on horn-honking, or suppress the impulse to engage in horn-honking. This latter conclusion is consistent with the findings of some evolutionary psychologists who study threat appraisal. Individuals who encounter a potentially threatening stimulus are probably going to be more cautious around those who display such stimuli to the extent that they are motivated toward self-preservation. The adaptive response to seeing someone driving a vehicle with a gun on a gun-rack or a sticker of a weapons-grade firearm is to refrain from horn-honking, and if that is not possible, to at least delay horn-honking for as long as possible. At least that is an explanation that strikes me as sensible. Beyond that very tentative conclusion, I would suggest a lot of caution when interpreting not only field experiments purporting to demonstrate a link between short-term exposure to weapons or weapon images and aggressive behavioral outcomes, but lab experiments as well. In the meantime, stay skeptical.

Saturday, December 8, 2018

When is a replication not a replication?

Let's imagine a scenario. A researcher several years ago designs a study with five treatment conditions and is mainly interested in a planned contrast between condition 1 and the remaining four conditions. That finding appears statistically significant. A few years later, the same researcher runs a second experiment that appears to be based on the same protocols, but with a larger sample (both good ideas) and finds the same planned contrast is no longer significant. That is problematic for the researcher. So, what to do? Here is where we meet some forking paths. One choice is to report the findings as they appear and acknowledge that the original finding did not replicate. Admittedly finding journals to publish non-replications is still a bit of a challenge (too much so in my professional opinion), so that option may seem a bit unsavory. Perhaps another theory-driven path is available. The researcher could note that other than the controller used in condition 1 and condition 2 (and the same for condition 3 and condition 4), the stimulus is identical. So, taking a different path, the researcher combines conditions 1 and 2 to form a new category and does the same with conditions 3 and 4. Condition 5 remains the same. Now, a significant ANOVA is obtainable and the researcher can plausibly argue that the findings show that this new category (conditions 1 and 2 combined) really is distinct from the neutral condition, thus supporting a theoretical model. The reported findings now look good for publication in a higher impact journal. The researcher did not find what she/he initially set out to find, but did find something.

But did the researcher really replicate the original findings? Judged against the prior published work, the answer appears to be no. The original planned contrast between condition 1 and the other conditions does not replicate. Does the researcher have a finding that tells us something possibly interesting or useful? Maybe. Maybe not. Does the revised analysis appear to be consistent with an established theoretical model? Apparently. Does the new finding tell us something about everyday life that the original would not have already told us had it successfully replicated? That's highly questionable. At bare minimum, in the strict sense of how we define a replication (i.e., a study that finds similar results to the original and/or to similar other studies), the study in question fails to qualify. That happens with many psychological phenomena, especially ones that are quite novel and counter-intuitive.

This is a real scenario. I am remaining deliberately vague so as to avoid picking on someone (usually not productive and likely only to result in defensiveness, which is the last thing we need as we move to a more open science) but also to point out that the process of following research from start to finish is one in which we find ourselves faced with many forking paths, and sometimes the ones we choose take us far from our intended destination (far more productive). Someone else noted that what we sometimes call p-hacking or HARKing (the latter seems to have occurred in this scenario) is akin to experimenter bias, and should be treated as such. We as researchers make a number of decisions - often outside of conscious awareness - that influence the outcome of our work. That includes the statistical side of our work as well. I like the idea as it avoids unnecessary shaming while allowing skeptics the space needed to point out potential problems that appear to have occurred. That seems healthier. As far as the real scenario above, it did not take me long once the presumed replication report was published online to realize that the findings were not actually a replication. Poking around in the data set (it was helpful that the author made it available) was crucial, and what I was able to reproduce coincided with what others had noticed before me. The bottom line was that having a set of registered protocols, the data, and the research report were all very helpful in determining that the conclusion the researcher wished to draw was apparently erroneous. It happens. In the process, I gained some insight into the paths the researcher chose in arriving at her/his analysis decisions and conclusions. The insights I was able to arrive at were hardly novel, and others have drawn similar insights prior to me.

Here's the thing I have to keep in mind going forward. My little corner of the psychological sciences is going through some pretty massive changes. It sometimes feels like walking through a virtual minefield in a video game, given the names, egos, and reputations involved. I am optimistic that if we can place our focus where it belongs - on the methodology and the data themselves - rather than focusing on the personalities, we'll wind up with the sort of open science worthy of the name. Getting there will not be easy.

In the meantime, we will still deal with reported replications that really are not replications.

Friday, December 7, 2018

Better late than never, I suppose.

A few years ago, a student and I submitted a paper for consideration in a journal, based on a talk given at a conference honoring George Gerbner. Even after acceptance, it took a good couple of years for it to go into print. Rather belatedly, I'd find out it was actually published in 2016. As I learned, academic publishing in Hungary is far from seamless. But it did come out. Sara Oelke and I were grateful for that. It is nice when an undergraduate student at a relatively obscure institution like mine can coauthor a published article based on a student project.

Update: A later replication of Experiment 2 (using identical protocols) actually fits in with a pattern the Open Science Collaboration reported in 2015. The finding in this case was still statistically significant, but the effect size was noticeably smaller in the replication study than in the original. So there is some potential for a decline effect there. I really need to write up the replication attempt, noting that it was not a faithful replication in terms of effect size, but still appeared to be statistically significant. I am a bit more cautious about the effectiveness framing approach we used to influence attitudes toward torture as a result.

Saturday, December 1, 2018

Is fragile masculinity related to voting preferences in the US?

The tentative answer appears to be that it might be. Regions where search terms associated with fragile masculinity (e.g., "erectile dysfunction", "how to get girls") were more popular appeared to be ones that also voted for Trump by higher margins in 2016. The pattern does not seem to apply to prior Presidential cycles (2008, 2012). The authors also conducted a preregistered study examining whether this pattern could be extended to Congressional electoral cycles, and it appears that at least for the 2018 cycle the answer is tentatively yes. This is correlational research, and the authors don't try to make causal claims, which is to their credit. What to make of this set of findings? I am not really sure yet. File this one under curiosities.