Tuesday, December 25, 2018

Rage Becomes Her

I noticed one of my articles got cited in a book by Soraya Chemaly. I haven't yet had the opportunity to read it, although the odds are pretty good that it would be of interest to me (and members of my family). It's nice to see academic work used in books that are not explicitly scholarship.

Monday, December 17, 2018

Something I hope to talk about soon

I got involved in a bit of data sleuthing a few months ago. We'll say this one is truly a team effort. I hope to talk about what we found and what resulted once we contacted journal editors, etc. For now, I can safely say that our efforts are bearing some fruit. Seeing a corrigendum to one of the problematic articles (one which was technically still in press, although published online) made my weekend. It is a start. Given that there is a bit of a pattern to the lab involved, and that we are not talking about isolated mistakes, I really hope the remaining editors act responsibly and in a timely manner. The problems with one article are relatively minor. With others, there are serious problems with data analyses as reported, poorly constructed tables, miscalculations about the number of trials in cognitive experiments, degrees of freedom that don't match the reported sample sizes, and potential self-plagiarism (the latter of which I know all too well).

For a long time I have told my undergrad methods students that peer review is a first line of defense, but it is far from perfect. Increasingly I am advocating for post-peer review, both where I have a public presence and in the classroom. I don't view this as an adversarial process - and in fact I hold nothing personally against any of the individuals authoring the articles in question. My concern, and I think anyone's concern, should be that we do our best to get it right. If independent individuals spot serious problems, we have an obligation to take those concerns seriously and work with those individuals and with our respective editors to correct whatever errors were made - for the sake of the psychological sciences.

Update: Just noticed a second corrigendum. There are easily a half dozen more to go. We'll see what happens, but it looks like journal editors are at least taking our concerns seriously.

Tuesday, December 11, 2018

Just a quick thought

A few months ago I shared some analyses showing that there appeared to be an interesting moderator of the influence of weapons on behavioral outcomes: an allegiance effect. The effect size for studies run by former students, post-docs, and coauthors of Berkowitz was moderate, but the effect size for independent researchers was essentially negligible. What to make of that is ultimately going to be speculation. I think it is worthwhile simply to note that this moderator appears to be an important one for making some sense of what was done in this line of research, especially from the late 1960s through the early 1980s. I'll also reiterate something else that seems to be strikingly obvious: this appears to be a highly politicized area of research. Just re-reading literature reviews from proponents and skeptics, it is very apparent that the players involved at the time largely talked past each other. I am now more convinced than ever that we really need to see work by truly independent third parties, and all the better if they're running multi-lab large N registered replication reports (RRRs). The latter especially have been helpful in shedding some insight into the likely magnitude of other social psychology effects, and could do so here as well. I think Table 2 in my recently published meta-analysis provides something of a roadmap as to what might be happening with behavioral outcomes, and the evidence is far from comforting for anyone who is a proponent. That table is not necessarily a nail in the coffin either. Some properly powered behavioral research I believe is underway, and I am going to be eager for those findings to be made public. The evidence from that work will be critical to how I approach this line of research going forward.
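For readers unfamiliar with moderator analyses, the basic arithmetic is straightforward: pool the effect sizes within each subgroup by inverse-variance weighting and test whether the pooled estimates differ. The sketch below is mine, uses made-up placeholder values rather than the actual effect sizes from my meta-analysis, and uses a simple fixed-effect model for brevity (a real analysis would use a random-effects model).

```python
# Sketch of a subgroup (moderator) comparison with inverse-variance pooling.
# The d values and variances are made-up placeholders, NOT published estimates.
import numpy as np

def pool(ds, vs):
    """Fixed-effect pooled estimate and its variance."""
    w = 1.0 / np.asarray(vs)
    return float(np.sum(w * np.asarray(ds)) / np.sum(w)), float(1.0 / np.sum(w))

allegiance_d, allegiance_v = [0.45, 0.60, 0.35], [0.04, 0.05, 0.03]      # placeholder studies
independent_d, independent_v = [0.05, -0.10, 0.08], [0.04, 0.06, 0.05]   # placeholder studies

d_a, v_a = pool(allegiance_d, allegiance_v)
d_i, v_i = pool(independent_d, independent_v)
z = (d_a - d_i) / np.sqrt(v_a + v_i)   # test of the subgroup difference
print(f"Allegiance labs: d = {d_a:.2f}; independent labs: d = {d_i:.2f}; z = {z:.2f}")
```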

Sunday, December 9, 2018

Rethinking Turner, Layton, and Simons (1975)

Let's revisit what is a frequently cited set of field experiments purporting to support the notion that the mere presence of a weapon influences aggressive behavior:
The weapons effect occurs outside of the lab too. In one field experiment,[2] a confederate driving a pickup truck purposely remained stalled at a traffic light for 12 seconds to see whether the motorists trapped behind him would honk their horns (the measure of aggression). The truck contained either a .303-calibre military rifle in a gun rack mounted to the rear window, or no rifle. The results showed that motorists were more likely to honk their horns if the confederate was driving a truck with a gun visible in the rear window than if the confederate was driving the same truck but with no gun. What is amazing about this study is that you would have to be pretty stupid to honk your horn at a driver with a military rifle in his truck—if you were thinking, that is! But people were not thinking—they just naturally honked their horns after seeing the gun. The mere presence of a weapon automatically triggered aggression.

The above description could come from practically any social psychology textbook describing the weapons effect, and probably serves as an exemplar for why I increasingly hate teaching classic experiments in my own field, except perhaps as cautionary tales. As the title suggests, this is a typical description of a series of experiments reported by Turner, Layton, and Simons (1975). Joe Hilgard aptly sums up what appeared to have happened:

Turner, Layton, and Simons (1975) report a bizarre experiment in which an experimenter driving a pickup truck loitered at a traffic light. When the light turned green, the experimenter idled for a further 12 seconds, waiting to see if the driver trapped behind would honk. Honking, the researchers argued, would constitute a form of aggressive behavior.

The design was a 3 (Prime) × 2 (Visibility) design. For the Prime factor, the experimenter's truck featured either an empty gun rack (control), a gun rack with a fully-visible .303-caliber military rifle and a bumper sticker with the word "Friend" (Friendly Rifle), or a gun rack with a .303 rifle and a bumper sticker with the word "Vengeance" (Aggressive Rifle). The experimenter driving the pickup was made visible or invisible by the use of a curtain in the rear window.

There were 92 subjects, about 15/cell. The sample is restricted to males driving late-model privately-owned vehicles for some reason.

The authors reasoned that seeing the rifle would prime aggressive thoughts, which would inspire aggressive behavior, leading to more honking. They run five different planned complex contrasts and find that the Rifle/Vengeance combination inspired honking relative to the No Rifle and Rifle/Friend combo, but only when the curtain was closed, F(1, 86) = 5.98, p = .017. That seems like a very suspiciously post-hoc subgroup analysis to me.

A second study in Turner, Layton, and Simons (1975) collects a larger sample of men and women driving vehicles of all years. The design was a 2 (Rifle: present, absent) × 2 (Bumper Sticker: "Vengeance", absent) design with 200 subjects. They divide this further by driver's sex and by a median split on vehicle year. They find that the Rifle/Vengeance condition increased honking relative to the other three, but only among newer-vehicle male drivers, F(1, 129) = 4.03, p = .047. But then they report that the Rifle/Vengeance condition decreased honking among older-vehicle male drivers, F(1, 129) = 5.23, p = .024! No results were found among female drivers.
In summary, outside of perhaps one subgroup, and assuming one believes the findings, there appears to be not only no priming effect of a weapon on aggressive behavior, but arguably the opposite: seeing a weapon in a vehicle suppressed horn-honking. When I was computing effect sizes for my recently published weapons effect meta-analysis, I noticed that the overall Cohen's d was negative. That actually makes more sense to me.
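As an aside, reported statistics like these are easy to spot-check. Below is a minimal, statcheck-style sketch (a few lines of my own, not the actual statcheck package) that recomputes each p-value from the F ratio and degrees of freedom quoted above, assuming scipy is available. A check like this says nothing about whether the analysis was appropriate in the first place; it only asks whether the reported numbers are internally consistent.

```python
# Recompute p-values from the F ratios and dfs reported by Turner et al. (1975).
from scipy import stats

reported = [
    ("Study 1: Rifle/Vengeance vs. others, curtain closed", 5.98, 1, 86, .017),
    ("Study 2: newer-vehicle male drivers", 4.03, 1, 129, .047),
    ("Study 2: older-vehicle male drivers", 5.23, 1, 129, .024),
]

for label, f_value, df1, df2, p_reported in reported:
    # Upper-tail probability of the F distribution is the p-value for the test
    p_recomputed = stats.f.sf(f_value, df1, df2)
    consistent = abs(p_recomputed - p_reported) <= 0.001  # within rounding error
    print(f"{label}: F({df1}, {df2}) = {f_value}, reported p = {p_reported}, "
          f"recomputed p = {p_recomputed:.3f}, consistent: {consistent}")
```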

Here is a screen shot of Table 3, which summarizes Study 3:

[Screenshot of the table not reproduced here.]

At bare minimum we might be able to make a case that privileged males (based on the cars they drove) are the one subsample that would honk their horns even when it seemed irrational. Otherwise, it appears that non-privileged males and females overall (no distinction is made as to whether female subjects drove newer or older cars) showed either no effect or a suppression effect!

Late last decade, a student and I attempted a replication of the old Turner et al. (1975) research. In our case, we used a different DV, latency of horn-honking: in other words, how long it took the driver behind the truck to start honking, measured in seconds (admittedly, my student's measure was crude: seconds were counted using a confederate's wristwatch, when a stopwatch would have been more appropriate). The prime stimulus was a bumper sticker of an AK-47 placed conspicuously on the rear window of the truck in the treatment condition. There was no sticker in the control condition. We ended up with null findings. If anything, the presence of the AK-47 sticker trended (although nonsignificantly) in a negative direction. Admittedly our sample was small (10 per cell), and so my student merely wrote up the results to complete the requirement of a methods course he was in. It is possible that with a large enough sample, we would have been able to show fairly conclusively that drivers generally have the good sense not to try to provoke those who drive with weapons or even images of weapons. Or we may have ended up with a simple null finding, and given the low power of our study, that is a fair enough assessment.
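To put "low power" in perspective, here is a back-of-the-envelope power calculation for a two-cell design like ours. The effect size is an illustrative assumption (a small effect of d = 0.2), not an estimate from our data, and statsmodels is assumed to be available.

```python
# Rough power calculation for a two-condition field study with a small assumed effect.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of 10 drivers per cell to detect an assumed small effect (Cohen's d = 0.2)
power_at_10 = analysis.power(effect_size=0.2, nobs1=10, alpha=0.05)
print(f"Power with 10 per cell: {power_at_10:.2f}")

# Per-cell sample size needed for 80% power at the same assumed effect size
n_per_cell = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.80)
print(f"Per-cell n for 80% power: {n_per_cell:.0f}")
```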

I've often wondered what to make of this set of experiments, beyond the obvious conclusion that Turner et al. (1975) did not actually replicate the classic Berkowitz and LePage (1967) lab experiment. I am now wondering if there may be another plausible explanation. There is a body of research showing that individuals who are exposed to images of guns and knives embedded within an array of images are pretty good at primary threat appraisal. That is, they notice the images faster (based on reaction time) and they tend to show more caution (again based on reaction time) when primed with these images (see Sulikowski & Burke, 2014, for a recent set of experiments).

The bottom line is that we may want to reinterpret the horn-honking experiments of Turner et al. (1975), and the work my student did with me, as follows: weapons do not increase horn-honking behavior, to the extent we have used it as a proxy for aggression. Rather, it is likely that weapons either have no impact on horn-honking or suppress the impulse to engage in it. This latter conclusion is consistent with the findings of some evolutionary psychologists who study threat appraisal. Individuals who encounter a potentially threatening stimulus are probably going to be more cautious around those who display such stimuli, to the extent that they are motivated toward self-preservation. The adaptive response to seeing someone driving a vehicle with a gun on a gun rack or a sticker of a weapons-grade firearm is to refrain from horn-honking, and if that is not possible, to at least delay horn-honking for as long as possible. At least that is an explanation that strikes me as sensible. Beyond that very tentative conclusion, I would suggest a lot of caution when interpreting not only field experiments purporting to demonstrate a link between short-term exposure to weapons or weapon images and aggressive behavioral outcomes, but lab experiments as well. In the meantime, stay skeptical.

Saturday, December 8, 2018

When is a replication not a replication?

Let's imagine a scenario. A researcher several years ago designs a study with five treatment conditions and is mainly interested in a planned contrast between condition 1 and the remaining four conditions. That finding appears statistically significant. A few years later, the same researcher runs a second experiment that appears to be based on the same protocols, but with a larger sample (both good ideas), and finds the same planned contrast is no longer significant. That is problematic for the researcher. So, what to do? Here is where we meet some forking paths. One choice is to report the findings as they appear and acknowledge that the original finding did not replicate. Admittedly, finding journals to publish non-replications is still a bit of a challenge (too much so in my professional opinion), so that option may seem a bit unsavory. Perhaps another theory-driven path is available. The researcher could note that other than the controller used in conditions 1 and 2 (and the same for conditions 3 and 4), the stimulus is identical. So, taking a different path, the researcher combines conditions 1 and 2 to form a new category and does the same with conditions 3 and 4. Condition 5 remains the same. Now a significant ANOVA is obtainable, and the researcher can plausibly argue that the findings show that this new category (conditions 1 and 2 combined) really is distinct from the neutral condition, thus supporting a theoretical model. The reported findings now look good for publication in a higher impact journal. The researcher did not find what she/he initially set out to find, but did find something.

But did the researcher really replicate the original findings? Based on the prior published work, the answer appears to be no. The original planned contrast between condition 1 and the other conditions does not replicate. Does the researcher have a finding that tells us something possibly interesting or useful? Maybe. Maybe not. Does the revised analysis appear to be consistent with an established theoretical model? Apparently. Does the new finding tell us something about everyday life that the original would not have already told us had it successfully replicated? That's highly questionable. At bare minimum, in the strict sense of how we define a replication (i.e., a study that finds similar results to the original and/or to similar other studies), the study in question fails to do so. That happens with many psychological phenomena, especially ones that are quite novel and counter-intuitive.
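To make the fork concrete, here is a hedged sketch with simulated, made-up data (none of it from the actual study) that runs one data set down both paths: the original planned comparison of condition 1 against the rest, and the collapsed-condition ANOVA. The planned comparison is approximated with a simple two-sample t-test rather than a contrast with a pooled error term.

```python
# Simulated illustration of two analysis paths on the same data set.
# The data are randomly generated placeholders; the point is the fork, not the p-values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
conditions = {k: rng.normal(loc=m, scale=1.0, size=30)
              for k, m in zip("12345", [0.4, 0.4, 0.0, 0.0, 0.0])}

# Path 1: the original planned comparison, condition 1 against the other four
others = np.concatenate([conditions[k] for k in "2345"])
t, p_contrast = stats.ttest_ind(conditions["1"], others)
print(f"Planned comparison, condition 1 vs. rest: p = {p_contrast:.3f}")

# Path 2: collapse conditions 1+2 and 3+4, keep condition 5, then run a one-way ANOVA
collapsed = [np.concatenate([conditions["1"], conditions["2"]]),
             np.concatenate([conditions["3"], conditions["4"]]),
             conditions["5"]]
f, p_anova = stats.f_oneway(*collapsed)
print(f"Collapsed-condition ANOVA: p = {p_anova:.3f}")
```

The point is not which p-value comes out smaller on any given run, but that both paths remain available once the data are in, which is exactly why a preregistered analysis plan matters.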

This is a real scenario. I am remaining deliberately vague so as to avoid picking on someone (usually not productive, and likely only to result in defensiveness, which is the last thing we need as we move to a more open science), and also because the more productive point is that the process of following research from start to finish is one of many forking paths, and sometimes the ones we choose take us far from our intended destination. Someone else noted that what we sometimes call p-hacking or HARKing (the latter seems to have occurred in this scenario) is akin to experimenter bias, and should be treated as such. We as researchers make a number of decisions - often outside of conscious awareness - that influence the outcome of our work. That includes the statistical side of our work as well. I like the idea as it avoids unnecessary shaming while allowing skeptics the space needed to point out potential problems that appear to have occurred. That seems healthier. As far as the real scenario above, it did not take me long once the presumed replication report was published online to realize that the findings were not actually a replication. Poking around at the data set (it was helpful that the author made it available) was crucial, and what I was able to reproduce coincided with what others were noticing before me. The bottom line was that having a set of registered protocols, the data, and the research report were all very helpful in determining that the conclusion the researcher wished to draw was apparently erroneous. It happens. In the process, I gained some insight into the paths the researcher chose in making the analysis decisions and drawing the conclusions that she/he did. The insights I was able to arrive at were hardly novel, and others have drawn similar ones before me.

Here's the thing I have to keep in mind going forward. My little corner of the psychological sciences is going through some pretty massive changes. It sometimes feels like walking through a virtual minefield in a video game, given the names, egos, and reputations involved. I am optimistic that if we can place our focus where it belongs - on the methodology and the data themselves - rather than focusing on the personalities, we'll wind up with the sort of open science worthy of the name. Getting there will not be easy.

In the meantime, we will still deal with reported replications that really are not replications.

Friday, December 7, 2018

Better late than never, I suppose.

A few years ago, a student and I submitted a paper for consideration in a journal, based on a talk given at a conference honoring George Gerbner. Even after acceptance, it took a good couple of years for it to go into print. Rather belatedly, I found out it was actually published in 2016. As I learned, academic publishing in Hungary is far from seamless. But it did come out. Sara Oelke and I were grateful for that. It is nice when an undergraduate student project at a relatively obscure institution like mine can lead to a coauthored published article.



Update: A later replication of Experiment 2 (using identical protocols) actually fits the pattern the Open Science Collaboration reported in 2015. The finding in this case was still statistically significant, but the effect size was noticeably smaller in the replication study than in the original. So there is some potential for a decline effect there. I really need to write up the replication attempt, noting that it was not a faithful replication in terms of effect size, but still appeared to be statistically significant. I am a bit more cautious about the effectiveness framing approach we used to influence attitudes toward torture as a result.

Saturday, December 1, 2018

Is fragile masculinity related to voting preferences in the US?

The tentative answer appears to be that it might be. Regions where search terms associated with fragile masculinity (e.g., "erectile dysfunction", "how to get girls") were more common also appeared to be ones that voted for Trump by higher margins in 2016. The pattern does not seem to apply to prior Presidential cycles (2008, 2012). The authors also conducted a preregistered study examining whether this pattern could be extended to Congressional electoral cycles, and at least for the 2018 cycle the answer is tentatively yes. This is correlational research, and the authors don't try to make causal claims, which is to their credit. What to make of this set of findings? I am not really sure yet. File this one under curiosities.

Thursday, November 29, 2018

This Time Could Be Different

Here's a link to the podcast at The Black Goat. Give it a listen. There was certainly still at least some talk of reform back when I was in grad school. Obviously, that went nowhere in a hurry. So here we are. Maybe we'll get it right this time.

Wednesday, November 28, 2018

Motivation

No matter our background, no matter our vocation, there has to be something that gets us up in the morning. For me, lately, that is anger.

At what? Let's just say that the crisis which goes by many names (replication crisis, replicability crisis, methodological crisis) felt like a punch to the gut - and one I just did not see coming. As I digested what had happened and what was happening, I had to change my perspective about a field that defines a significant part of my identity. Initially I was a bit sanguine. Then, as reality sank in, I got pissed off. After all, to maintain any semblance of integrity, I had to alert students in many of my classes that there were whole sections of textbooks that they were probably best ignoring, or viewing only as cautionary tales. That meant accepting that students would ask me what was real, and that I would not necessarily have a satisfactory answer. A substantial chunk of work in my corner of our aching science seems to needlessly scare the hell out of people, and that work is not aging well. In fact, the moral panics over video games and violence, or screen time and any of a number of purported negative psychological health outcomes, remind me of the moral panics that I grew up with: Dungeons and Dragons was supposed to damage teens psychologically, as were the lyrics of songs from many of my favorite bands of the time (remember that I enjoyed and still enjoy punk and punk-derived music from the late 1970s to mid 1980s). At the time I would see people make causal claims from correlational data (or merely out of thin air) and I would just think, "bullshit." One could say that I did become an educator, and maybe that questionable life choice is an outcome of the questionable life choices I made in my youth, including my pop culture interests and activities of the day. I am also pissed at a system for disseminating our work that relies (at least indirectly) on the funds of our citizens, but in which the published work becomes the property of some conglomerate that then sells the content back to those same citizens at an insane profit, sometimes with peer review and editorial standards that differ little from what most of us rightfully deride as predatory journals.

Thankfully, from punk I got both the attitude and the politics. The attitude is the easy part. The politics actually took a good deal of thought. And so here I am again. It would be easy to adopt a pose of casual contempt or indifference and merely sneer as I preview a textbook or read the latest journal article. That's not me. I actually care. So maybe a little anarchy (not in the sense of chaos!) will do us some good about now. Things get shaken up a bit, and if that leads to the sort of changes we need (more open communication and archiving of our work, more equality and equity in the profession as opposed to rigid hierarchies), I'm in. Reading much of what is coming out of the open science proponents is the equivalent of putting on an old familiar Black Flag or Dead Kennedys LP. Hell, sometimes I do both, especially if I am at the office on a weekend and can crank up the volume. The punks at their best were angry and thoughtful. They wanted to knock down stuff, but they also wanted to replace whatever was knocked down with something better (though what that would entail was, of course, always an open-ended question). Whatever form that something better takes, I hope for a science that truly gives itself away in the public interest, rather than getting co-opted into some neoliberal facsimile of open science that merely repeats the mistakes of the past. Doing what I can, as an educator and scholar who has little privilege or leverage to offer other than adding to the voices in the proverbial wilderness, is enough for now. That gets me up in the morning, like clockwork.

Tuesday, November 27, 2018

Everything Went Black

I nicked that title from an old Black Flag album, from when they were temporarily not Black Flag due to a legal battle with their former label. Apparently issuing Damaged under Black Flag's new label really pissed off the suits at MCA. Eventually the label owned by MCA went under and Black Flag returned with a vengeance. Among bands in the hardcore scene circa the mid-1980s, Black Flag did not fit comfortably. It is not clear that they were even punk by the time they released some of my favorite recordings in 1984 and 1985. The band had moved into much more experimental territory, with elements of metal and, more importantly, free jazz thrown into the mix. Add to that a very confrontational set of artists who clearly did not relish the ever-present threat of violence at shows where their new sounds were increasingly alienating their core audience.

The way I saw it at the time, although I may not have worded it as such, is that there was a crisis in the punk scene. The old formulas just did not seem to work any more, and openly admitting so was a good way to get sucker-punched, or stomped. One would certainly be shunned even if a punch were never thrown. So the old formulas remained in place, and punk became "another meaningless fad" (to nick a line from Dead Kennedys). What to do when what appeared to work before no longer does? One answer is to ignore it or wish it away. I certainly watched enough people come and go who did that back in the day. Another approach was to abandon what no longer worked and move in a different direction - ideally still embodying the ideals of the movement. Black Flag were quite adept at doing so for a few years. So was Flux (formerly Flux of Pink Indians), whose last album, Uncarved Block, was unlike any UK anarchopunk LP at the time. I should probably mention Chumbawamba while I am at it. There is something refreshing about searching for a new path when the old one has turned into a dead end. It happens in the arts, the sciences, and in life. As someone who was never more than one of the scenesters during the 1980s punk era, I knew it was time to follow some different muses when it became obvious that all that was left at the clubs and parties were folks who had the style and the attitude down, but who never really understood the ideas or the politics.

On some level, what I recall from a formative part of my early years serves as an allegory for what has gone on in my aching corner of the sciences as a methodological crisis has continued to unfold. There is so much I would love to write about. Problems in my little corner of the psychological sciences are the same ones affecting the rest of our aching field. Unfortunately when I am passionate about something that actually matters to me, I write with the heat of a thousand suns. Although that heat may not be aimed at one specific person or group, there is the chance it will be treated as such, placing me in a position that I find uncomfortable. Having to scrub this blog of content in order to prevent a situation from escalating is something I will not go through again. That is simply not tenable given the time it takes me to write, along with my numerous other commitments. When you are not a person of privilege (in the academic world, I and the institutions where I work are truly among the unprivileged), consequences hit twice as hard as for anyone else. Don't feel bad. I don't. Just the way it is. If you want to feel anything, feel anger. Then do something to make academic life more equitable. I guess I never really left my punk roots, and perhaps there is a reason I do have a good deal of empathy for those among psychology's reformers who advocate burning everything to the ground.

I am honestly not sure what I am going to do with this blog. I considered just deleting it altogether and looking for other avenues to work out ideas, look at some problems that desperately need to be looked at, etc. Maybe that's the way to go. Maybe I will figure out a way to write as I wish. Time will tell.

Monday, October 29, 2018

Harbingers of the replication crisis

I am going to post a series of tweets by James Heathers (if you are not following him on Twitter, you are really missing out) highlighting a set of passages dating back over half a century. They serve as warnings that, had they been heeded, would have left many of my peers in my corner of the sciences feeling very differently about the soundness of the work we cite in our own research and teach to students:


[Embedded tweets not reproduced here.]

Let's not only heed those who tried to warn us over a half century ago, but heed those who are warning us now. I will not be around in a half century, but I will be around long enough to suss out if we're going to be actually progressing as a science or if we are just going to run over the same old ground.

Wednesday, October 24, 2018

Getting back to flexible measures:

The Buss Teacher-Learner approach seems to fit. Thankfully one set of authors was unwittingly honest enough to demonstrate.

[Screenshot not reproduced here.]

We're more likely to see the CRTT (competitive reaction time task) used these days, but that more modern approach has been shown to be problematic as well. Those of us who genuinely care about aggression research need to do better. We need to know that our techniques are reliable, that they can be validated, and that their administration and analysis can be standardized. Until then, we have no idea what we are really measuring, and simply finding p < .05 isn't cutting it.

Saturday, October 13, 2018

RIP Bernardo "Bernie" Carducci

Bernardo Carducci passed away late in September. Both of us are alums of California State University, Fullerton. He attended and graduated from CSUF a number of years before I did, but he was a friend of several of the faculty there, and he would often talk to his former mentors and to those of us presenting as students back in the day. I did not see him in person very often, but I recall a man who was extremely outgoing and full of life. His work on shyness, while outside my normal specialty area, was of some intrinsic interest to me. My last contact with him had been over composing a couple of chapters for an encyclopedia he was in the process of editing. He was one of those rare individuals who was adept at coming across via electronic media much as he might in person. Hopefully the encyclopedia will serve as a part of his legacy. He will be missed.

Sunday, September 30, 2018

How France Created the Metric System

Although we mostly ignore the metric system in the US, it is a standard of measurement that influences so much of our lives, and even more so the lives of our fellow human beings. This quick BBC article highlights how the metric system came into being, as well as the difficulty of mainstreaming it - as the article notes, it took about a century. It was an accomplishment with revolutionary origins, and one that truly changed the world.

Friday, September 28, 2018

Following up on Wansink

Andrew Gelman is on point in this post. I will give you this clip as a starting point:
I particularly liked this article by David Randall—not because he quoted me, but because he crisply laid out the key issues:
The irreproducibility crisis cost Brian Wansink his job. Over a 25-year career, Mr. Wansink developed an international reputation as an expert on eating behavior. He was the main popularizer of the notion that large portions lead inevitably to overeating. But Mr. Wansink resigned last week . . . after an investigative faculty committee found he had committed a litany of academic breaches: “misreporting of research data, problematic statistical techniques, failure to properly document and preserve research results” and more. . . . Mr. Wansink’s fall from grace began with a 2016 blog post . . . [which] prompted a small group of skeptics to take a hard look at Mr. Wansink’s past scholarship. Their analysis, published in January 2017, turned up an astonishing variety and quantity of errors in his statistical procedures and data. . . . A generation of Mr. Wansink’s journal editors and fellow scientists failed to notice anything wrong with his research—a powerful indictment of the current system of academic peer review, in which only subject-matter experts are invited to comment on a paper before publication. . . . P-hacking, cherry-picking data and other arbitrary techniques have sadly become standard practices for scientists seeking publishable results. Many scientists do these things inadvertently [emphasis added], not realizing that the way they work is likely to lead to irreplicable results. Let something good come from Mr. Wansink’s downfall.
But some other reports missed the point, in a way that I’ve discussed before: they’re focusing on “p-hacking” and bad behavior rather than the larger problem of researchers expecting routine discovery.
That, I think, is where our focus should be. This is partially about scientists engaging in questionable behavior, but the focus should not be to pillory them. Rather, we should ask ourselves about a research culture that demands we find positive results each time we run a study. News flash: we're going to get a lot of findings that are at best inconclusive if we run enough studies. We should also focus on the fundamentals of research design, along with making sure that any instruments used for measurement (whether behavioral, cognitive, attitudinal, etc.) are sufficiently reliable and have been validated. When I asked in an earlier post how many Wansinks there are, I think I would want to clarify that question with a statement: the bulk of the scientists who could be the potential next Wansink are often well-intentioned individuals who are attempting to adapt to a particular set of environmental contingencies [1] (ones that reinforce positive results, or what Gelman calls routine discovery), and who are using measures that are quite frankly barely warmed-over crap. In my area of social psychology, I would further urge making sure that the theoretical models we rely on in our particular specialty areas really are measuring up. In aggression research, it is increasingly obvious to me that one model I have relied on since my grad school days really needs to be rethought or altogether abandoned.

As we move forward, we do need to figure out what we can learn from the case of Brian Wansink, or anyone else for whom we might encounter a checkered history of questionable findings. I would recommend focusing less on the shortcomings of the individual (there is no need to create monsters) and focus instead on the behaviors, and how to change those behaviors (both individually and collectively).

[1] I am no Skinnerian, but I do teach Conditioning and Learning from time to time. I always loved that term, environmental contingencies.

Wednesday, September 26, 2018

One of my projects...

is not a project in the normal sense of the term. I have been interested in the work of a specific lab for a while. Some of the findings reported in an article I stumbled upon a couple years back did not make sense to me, and I found extracting the effect sizes I needed for a project I was then in the midst of updating to be rather frustrating and annoying. More recently, I stumbled upon some newer work by this same lab (I was looking for fresh articles relevant to my primary research areas) and noticed the same basic pattern of apparent reporting mistakes. I've been sharing that publicly on Twitter, and so have others. A number of us stumbled onto a pattern of poor reporting in articles produced by this lab in both predatory journals and legitimate journals. Thankfully there are tools one can use post-publication to examine findings (I've been partial to statcheck), and those have been, shall we say, illuminating. This is not the sort of stuff I'll put on a CV. It won't count towards research as my institution defines it, nor will most folks end up really caring. What can be done is to clean up a portion of a literature that is in desperate need of cleaning up - to correct the record wherever possible. I am often astounded and appalled at what manages to slip through peer review. We really need to do better.

Friday, September 21, 2018

Update

Hi! I am in the process of adding a few links that may be of use to you. I have added a widget with links to statistical tools you can use to double check reported findings for yourself. Think of these as helpful for post-peer review. I will definitely vouch for statcheck. It works quite well. The others are newer, but look promising. SPRITE was one of the techniques used in the process of successfully scrutinizing Wansink's research (leading to over a dozen retractions and counting), which in and of itself makes it worth working with, in my humble opinion. Basically, we need to be able to look at findings and ask ourselves if they are genuinely plausible, or if some error or foul play may have been involved. I have also added some podcasts that I have found especially useful over the last several months, and hope to entice you all to give those a listen. Each is hosted by psychologists who are genuinely concerned with the current state of our science, and each will provoke a good deal of thought. If you are not listening to these podcasts, you are missing out. I will keep adding resources as time permits.
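The logic behind these tools is easy to demonstrate. The few lines below implement the GRIM test (a simpler sibling of SPRITE, developed by some of the same sleuths), not SPRITE itself: it checks whether a reported mean is even arithmetically possible when N participants give whole-number responses. The example values are invented.

```python
# Toy GRIM check: can a sample of n whole-number responses produce this rounded mean?
def grim_consistent(mean, n, decimals=2):
    nearest_total = round(mean * n)  # closest achievable sum of integer responses
    return round(nearest_total / n, decimals) == round(mean, decimals)

print(grim_consistent(3.45, 20))  # True:  69 / 20 = 3.45 exactly
print(grim_consistent(3.44, 20))  # False: no whole-number total divided by 20 rounds to 3.44
```

SPRITE goes further, attempting to reconstruct entire samples consistent with a reported mean and standard deviation, but the underlying principle is the same: some combinations of summary statistics simply cannot occur, and when they show up in print, someone should take a closer look.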

Thursday, September 20, 2018

Data sleuths - a positive article

This article, The Truth Squad, is well worth your time. Here is a clip:
For scientists who find themselves in the crosshairs, the experience can feel bruising. Several years ago, the Tilburg group—now more than a dozen faculty members and students—unveiled an algorithm, dubbed statcheck, to spot potential statistical problems in psychology studies. They ran it on tens of thousands of papers and posted the troubling results on PubPeer, a website for discussion of published papers. Some researchers felt unfairly attacked; one eminent psychologist insinuated that the group was part of a “self-appointed data police” harassing members of the research community.

Van Assen and Wicherts say it was worth stepping on some toes to get the message across, and to flag mistakes in the literature. Members of the group have become outspoken advocates for statistical honesty, publishing editorials and papers with tips for how to avoid biases, and they have won fans. “I'm amazed that they were able to build that group. It feels very progressive to me,” says psychologist Simine Vazire of the University of California, Davis, a past chair of the executive committee of the Society for the Improvement of Psychological Science (SIPS).

The work by the Tilburg center and others, including SIPS and COS, is beginning to have an impact. The practice of preregistering studies—declaring a plan for the research in advance, which can lessen the chance of dodgy analyses—is growing rapidly (see story, p. 1192), as is making the data behind research papers immediately available so others can check the findings. Wicherts and others are optimistic that the perverse incentives of careerist academia, to hoard data and sacrifice rigor for headline-generating findings, will ultimately be fixed. “We created the culture,” Nosek says. “We can change the culture.”

Read the rest. One of the really cool things is finding their work on PubPeer (a website we social psychologists should utilize much more). This group's statcheck software, and what it can do, is truly amazing and necessary. Let's just say that when I see the name Nuijten in the comments for a particular article, I pay keen attention. Among the people I respect in my area, statcheck has generally found no errors or only minor errors that don't change the basic thrust of their findings. Among some others, well, that's another story.

This is a useful article, and one that makes clear that although there is a bit of a paradigm shift in our field, we're far from a warm embrace of an open science approach. I am optimistic that the expectations for open data sharing, open sharing of research protocols prior to running research, etc., will be far more favorable this time next decade, but I am girding myself for the possibility that it may take considerably longer to get to that point. I am guessing that when the paradigm truly shifts, it will seem sudden. The momentum was there already, and thankfully we've gone well beyond the mere talk of change that my cohort basically managed. So there is that. Progress of a sort.

Be smart: the main motive of the various data sleuths is to make our science better. These are people who are not trying to destroy careers or hurt feelings, but rather are simply making sure that the work we do is as close an approximation to the truth as is humanly possible. My advice for mid and late career researchers is to embrace this new paradigm rather than resist. I can guarantee that the new generation of psychological scientists will not have the patience for business as usual.

Brian Wansink has "retired"

The story began with a simple blog post. The post caught my eye initially because its author, Brian Wansink, seemed to be bragging about how he exploited an international student as he made a very ham-handed point about work ethic. Those sorts of posts will get someone on my radar very quickly. But what was equally disturbing was that he essentially copped to engaging in a number of questionable research practices as if it were all perfectly okay. I'll give him points for being brazen. I made a brief blog post of my own when his post began making the rounds on Twitter. In the interim, Wansink's methodology has been challenged, data sets have been scrutinized, and he has ended up with upwards of 13 retractions and many more corrections. He continued over the last year and a half as if all was business as usual - or at least that seemed to be the public front. But behind the scenes it was apparently anything but. His university began an extensive investigation of allegations of misconduct. Yesterday, Cornell announced that a statement about Wansink would be coming out Friday. Well, I guess Thursday is the new Friday. That announcement happened. Wansink is "retiring" at the end of the academic year, will be removed from any teaching and research responsibilities, and will be paid handsomely to cooperate as his university continues its investigation. No matter how much Wansink spins the situation, Cornell makes it abundantly clear that the reason for his "retirement" is some very shoddy research practices. Cornell is hardly acting heroically. The institution is protecting itself now that it has become apparent that one of its prized researchers was just not going to be bringing in the grant money he once did.

Wansink did not just do some hokey experiments that were somewhat eye-catching. He appeared on various morning news shows plugging his lab's findings, in the process fooling the public. His lab's reported findings were used by policymakers, and although perhaps the fact that those findings are in question is not quite life and death, they certainly did not benefit the public interest. Here is a tweet that gives you some idea of how policymakers used his research (from a plenary speech given at SPSP 2012):

[Embedded tweet not reproduced here.]

The sleuths who did the grunt work to discover problems with Wansink's work will never be thanked. They will never receive awards, nor will they be able to put those efforts on their respective CVs. But we all owe them a huge debt of gratitude. For the record, they are Nick Brown, James Heathers, Jordan Anaya, and Tim van der Zee. They have exposed some questionable research practices at great risk to their own careers. Perhaps more to the point, they have exposed Wansink's research practices as symptomatic of an academic culture that privileges quantity over quality, publication in high impact journals, statistically significant findings over nonsignificant findings, research that can be used as clickbait, and secretiveness. That broader culture is what needs to be changed. As James Heathers would no doubt argue, we change the culture by using the tools available to detect questionable practices, and by rethinking how we do peer review - and by making certain that we instruct our students to do likewise.

We need to be more open in sharing our methodology and our data (that is the point of registering or preregistering our research protocols and archiving our data so that our peers may examine them). We need to rethink what is important in doing our research. Is it about getting a flashy finding that can be easily published in high impact journals and net TED talks, or are we more interested in simply being truth seekers and truth tellers, no matter what the data are telling us? How many publications do we really need? How many citations do we really need? Those to me are questions that we need to be asking at each step in our careers. How much should we demand of editors and authors as peer reviewers? Why should we take the authors' findings as gospel? Could journals vet articles (possibly using software like SPRITE) to ascertain the plausibility of the data analyses, and if so, why are they not doing so?

There is some speculation that had Wansink not made that fateful blog post in December of 2016, he would still be going about business as usual, and he would never have faced any repercussions for his faulty research. That is a distinct possibility. A more optimistic case can be made that the truth would have caught up with him eventually, as the events that led to the replication crisis continue to unfold, and as our research culture becomes more attuned to rooting out questionable work. Maybe he would not be retiring at the end of the spring term of 2019, but a few years later - still under a cloud. I also wonder how things might have played out if Wansink had tried a different approach. When his research practices were initially challenged, he doubled down. What if he had cooperated with the sleuths who wanted to get to the truth about his findings? What if, faced with evidence of his mistakes, he had embraced those and taken an active role in correcting the record, and an active role in changing practices in his lab? He might have still ended up with a series of retractions and faced plenty of uncomfortable questions from any of a variety of stakeholders. But the story might have had a less tragic ending.

This is not a moment for celebration, although there is some comfort in knowing that at least the record in one area of the sciences is being corrected. This is a moment for reflection. How did we arrive at this point? How many more Wansinks are in our midst? What can we do as researchers, as peer reviewers, and in our capacity to do post-peer review to leave our respective areas of the psychological sciences just a bit better than they were when we started? How do we make sure that we actually earn the trust of the public? Those are the questions I am asking myself tonight.

Friday, September 14, 2018

Reforming Psychology: Who Are These People?

Let's continue just a little bit from my last post. Right now I am merely thinking out loud, so take at least some of this with a few grains of salt. The Chronicle article I linked to in that earlier post was quite adept at finding some of the more extreme statements and magnifying them, as well as at times proving to be factually incorrect (Bem's infamous ESP article in JPSP was published in 2011, not 2010!). That makes for clicks, and presumably ad revenue, but may not exactly shed light on who the stakeholders are.

Among the reformers, I suspect that this is a varied group, representing multiple specialties and various levels of prominence within the academic world. Some are grad students who probably have the idealism and zeal I once experienced when I was a grad student, and who, like me, are legitimately frustrated by their relative lack of power to change a status quo that leaves a lot to be desired. Others are post-docs and early career researchers whose fates hang in the balance based on evaluations by some of the very people whose work they may be criticizing. Hiring decisions and tenure decisions are certainly a consideration. Others may be primarily educators, who could also be caught in the cross-hairs of those who have considerably more prestige. For those of us who are a bit less prominent, it is easier for those used to getting their way to fling unfounded accusations at us, knowing full well that for now they will be taken at face value in the public sphere. At least in these early moments, the effort to reform psychological science appears to be a high-risk enterprise.

There may be a great deal of diversity in terms of how to go about reform. Given my generally cautious nature, I might want to tread carefully - test drive various approaches to making our work more transparent and see what works and what doesn't. Others may want a more immediate payoff. Some of us may disagree on methodological and statistical practices. The impression I get is that regardless of where the reformers stand, there is a consensus that the status quo no longer works, and that the system needs to be changed. The other impression I get is that there is a passion for science in all of its messiness. These are not people with vendettas, but rather people who want to do work that matters, that gets at closer approximations of the truth. If someone's work gets criticized, it has nothing to do with some need to take down someone famous, but with getting at what is real or not real about the foundations underlying the claims in specific works. I wish this were understood better. For the educators among the reformers, we just want to know that what we teach our undergrads actually is reality-based. We may want to develop and/or find guidance on how to teach open science to research methods students, or on how to show how a classic study was debunked in our content courses. Of course, keep in mind that I am basing these observations on a relatively small handful of interactions over the last few months in particular. Certainly I have not done any systematic data collection, nor am I aware of much of any. I do think it is useful to realize that SIPS is evenly split between men and women in its membership, and really does have diverse representation as far as career levels (although I think skewed toward early career), specialties, and teaching loads. I think it is also useful to realize that SIPS is likely only one part of a broader cohort of reformers, and so any article discussing reforms to psychological science needs to take that into account.

As for those defending the status quo, I suspect there is also a great deal of variation. That said, the loudest voices are clearly mid and late career scholars, many of whom perceive themselves as having a great deal to lose. There has to be some existential crisis that occurs when one realizes that the body of work making up a substantial portion of one's career was all apparently for nothing. I am under the impression that at least a subset have achieved a good deal of prestige, have leveraged that prominence to amass profits from book deals, speaking engagements, etc., and that efforts to debunk their work could be seen as a threat to all the trappings of what they might consider success. Hence the temptation to occasionally lob phrases like "methodological terrorists" at the data sleuths among the reformers. As an outsider looking in on the upper echelons of the academic world, my impression is that most of the status quo folks are generally decent, well-intentioned people who have grown accustomed to a certain way of doing things and benefit from that status quo. I wish I could tell the most worried among them that their worries about a relatively new reform movement are unfounded. I know I would not be listened to. I have a bit of personal experience in that regard. Scholars scrutinizing data sets are not "out to get you" but are interested in making sure that what you claimed in published reports checks out. I suspect that argument will fall on deaf ears.

I'd also like to add something else: I don't really think that psychology is any meaner now than it was when I started out as a grad student in the 1990s. I have certainly witnessed rather contentious debates and conversations at presentation sessions, have been told in no uncertain terms that my own posters were bullshit (I usually would try to engage those folks a bit, out of curiosity more than anything else), and have seen the work of early career scholars ripped to shreds. What has changed is the technology. The conversation now plays out on blogs (although those are pretty old-school by now) and social media (Twitter, Facebook groups, etc.). We can now publicly witness, in as close to real time as our social media allow, what used to occur only behind the relatively closed doors of academic conferences and colloquia - and in journal article rebuttals that were behind paywalls. Personally, I find the current environment refreshing. It is no more and no less "mean" than it was then. Some individuals in our field truly behave in a toxic manner - but that was true back in the day. What is also refreshing is that it is now easier to debunk findings, and easier to do so in the public sphere, than ever before. I see that not as a sign of a science in trouble, but of one that is actually in the process of figuring itself out at long last. I somehow doubt that mid-career and late-career scholars are leaving in droves because the environment now is not so comfortable. If that were the case, the job market for all the rest of us would be insanely good right now. Hint: the job market is about as bleak as it was this time last year.

A bit about where I am coming from: right now I have my sleeves rolled up as I go about my work as an educator. I am trying to figure out how to convey what is happening in psych to my students so that they know what is coming their way as they enter the workforce, graduate school, and beyond. I am trying to figure out how to engage them to think constructively about what they read in their textbooks and in various mass media outlets, and to sort out what it means when classic research turns out to be wrong. I am trying to sort out how to create a more open-science-friendly environment in my methods courses. I want to teach stats just a bit better than I currently do. When I look at those particular goals, it is clear that what I want aligns well with those working to reform our field. I can also say from experience that my conversations with reformers have been nothing short of pleasant. And even when some work I was involved in got taken to task (I am assuming that if you are reading this, you know my history), nothing was said that was in any way undeserved or untoward. Quite the contrary.

I cast my lot with the reformers - first quietly and then increasingly vocally. I decided to do so because I remember what I wanted to see changed in psychology back when I was in grad school, and I am disappointed that so little transpired in the way of reform back then. There is now hope that things will be different, and that what emerges will be a psychology that really does live up to its billing as a science whose findings matter and can be trusted. I base that on evidence in editorial leadership changes, journals at least tentatively taking steps to enforce more openness from authors, etc. It's a good start. Like I might say in other contexts, there is so much to be done.

Postscript: as time permits, I will start linking to blogs and podcasts that I think will enlighten you. I have been looking at what I have in the way of links and blogroll and realize that it needs an overhaul. Stay tuned...

Reforming Psychology: We're Not Going to Burn it Down!

This post is merely a placeholder for something I want to spend some time discussing with those of you who come here later. There has been a spirited discussion on Twitter and Facebook regarding a recent article in The Chronicle of Higher Education (hopefully this link will get you behind its paywall - if not, my apologies in advance). For the time being I will state that although I have not yet attended a SIPS conference (something I will make certain to correct in the near future), my impression of SIPS is a bit different from what is characterized in the article. I get the impression that these are essentially reformers (reform being something increasingly near and dear to me) who want to take tangible actions to improve the work we do. I also get the impression that in general these are folks who largely share some things I value:

1. An interest in fostering a psychological science that is open, cooperative, supportive, and forgiving.

2. An interest in viewing our work as researchers and reformers as a set of tangible behaviors.

I've blogged before about the replication crisis. My views on what has emerged from the fallout have certainly evolved. It is very obvious that there are some serious problems, especially in my own specialty area (generally social psychology, and more specifically in the area of aggression research), and that those serious problems need to be addressed. Those problems are ones that are fixable. There is no need to burn down anything.

I'll have more to say in a bit.