Social Psychology in the Information Age: April 2019

Tuesday, April 30, 2019

Funny, but sad

Here is something that amounts to some light relief:

Soooooo....what is the independent variable again? Please, dear authors, keep your story straight. Is it game type or movie type? Hint: the article is supposed to be about violent video games. Also, why is gender italicized? There are of course other questions. If you read the paper, it becomes obvious that the analyses do not fit the definition of a MANCOVA: there is a difference score used as the DV and no covariate identified anywhere in the paper. So we don't have multiple DVs and no covariates, hence no MANCOVA as any sane human being with basic statistics training would understand the term. The table itself makes no sense. Yes there are numbers. Wonderful. We have no idea what those numbers mean. We can be certain of something: an F value of less than 1 is not going to be statistically significant under any circumstances in the known universe. That is truly impossible.

I can't help but laugh when I read this paper. Note that this is one that is early enough in Zhang's career that he was still putting his family name first in his English-language publications. That would change not too long afterward. Yes, I do laugh when I read an error-ridden paper that made its way through peer review. But it is out of sadness. Many years ago, I had a Spanish instructor in high school who noticed that we as a class were having a bad week. We weren't studying, we were not responding to his questions in Spanish, we simply were not there. So he started making jokes. At some point he said, "I just have to laugh, but really this is sad." I got the message loud and clear. He was disappointed. We could have done so much better. As I read this and other papers from this lab, I find myself with that same sense of disappointment. Maybe productivity was more important. Maybe playing the game, which seems to consume so many of us in the academic world (including me for a while), took precedence over our charge of being truth seekers and truth tellers. Whatever the motivation, the end result is another botched paper. We collectively in the psychological sciences deserve better.

A postscript of sorts - research ethics and the Zhang lab

Although I don't have evidence that the Zhang lab was involved in any academic misconduct, and I have no intention of making accusations to that effect, I do think that some of the data reporting itself is at best indicative of incompetent reporting. All I can do is speculate, as I am unaware of anyone who has managed to actually look at this lab's data. What I can note is that there is a published record, and that there are a number of errors that appear across those published articles. Given the number of questions I think any reasonable reader of media violence research might have, Zhang and various members of his lab owe it to us to answer those questions and to provide us with the necessary data and protocols to accurately judge what went sideways.

The Ministry of Education of the People's Republic of China has its set of guidelines for what might constitute academic misconduct and the process involved in opening and conducting an investigation. Accordingly, Southwest University has its own set of policies regarding research ethics and investigations of potential misconduct. The university has a committee charged with that responsibility. What I am hoping is that enough readers who follow my blog will reach out to those in charge and demand answers, demand accountability. I think a very good case can be made that there are some problem papers authored by members of this lab in need of some form of correction, in whatever form that might take. How that gets handled is ultimately up to what Southwest University finds and to the editors themselves who are in charge of journals that have published the research in question. That will help all of us collectively, assuming Zhang's university and the editors of the journals who published his work do the right thing here.

Science is not self-correcting. It is dependent on conscious human effort to detect mistakes, reach out to others, give them a chance to respond, and failing that continue to agitate for corrections. When errors are found, it is up to those responsible for those errors to step up and take responsibility - and to do what is necessary to make things right. We should not have to hope that someone with just the right expertise is willing to make some noise in an academic environment that is essentially hostile to whistle-blowers, and keep making noise until something finally happens. That is time consuming and ultimately draining.

Once I saw Zhang and colleagues' weapons priming paper, published in PAID in 2016, I could not unsee it. There were questions about that paper that continued to gnaw at me, and led me to read more and ask more questions. Before long, I was neck-deep in a veritable swamp of error-ridden articles published out of that lab. I take some cold comfort in knowing I am not alone, and that there are others with far better methodological skills and reach than I possess who can make some noise as well. Bottom line: something is broken and I want it fixed. Make some noise.

Monday, April 29, 2019

It Doesn't Add Up: Postscript on Tian, Zhang, Cao, & Rodkin (2016)

I have been covering significant facets of this particular media violence article by Tian et al. (2016) elsewhere. You can read my previous post here. I had also noted earlier a weird apparent copy-and-paste situation regarding a reported non-significant 3-way interaction here. Finally I spent a bit of time on the Stroop task the authors purported to use, which you can read about here.

I think there is a theme that emerges if you read enough of the articles published by authors in Qian Zhang's lab. That theme can be described as follows: It doesn't add up. There are just so many oddities when you read this particular article, or any of a number of their articles, that cataloging them all can be a bit overwhelming. I should never, ever, have to make a statement like that when I read anyone's work. I certainly don't expect absolute perfection, and I have certainly made my share of blunders. Those get fixed. Just the way it is.

So what am I on about this time? Let's get to the skinny. Let's note that there are definitely some problems with the way the authors assess trait aggressiveness. I won't go into the weeds as far as some questions about the Buss and Perry Aggression Questionnaire, although I will note that there have been some questions about its psychometric properties. When I have some time, maybe that would make for an interesting blog post. We know the authors just apparently translated the items to Chinese and left it at that. We get no reliability estimates other than the ones Buss and Perry originally provided. It's weird, and something I would probably ask about in a peer review, but it is what it is now that it's published. I am not exactly a big fan of artificially dividing up scores on a questionnaire into "high" and "low" groups. At least the authors didn't use a median split. That said, we have no idea what happened to those in the middle. I want to assume that participants were pretested and then recruited to participate, but I am not exactly confident that occurred. The authors really don't provide adequate detail to know how these groups were formed, and if those were the only participants analyzed. It would be something else if a whole subset of participants provided their data and were thrown out of these analyses without any explanation as to why that was done. We're just left to guess.

Means are just plain weird throughout this article. So, sometimes, are standard deviations. Case in point:

Take a careful look. If you are anything like me, you might be looking at that and shaking your head. Perhaps you might even mutter, "Whiskey. Tango. Foxtrot." There is absolutely no way to convince me that a mean of 504.27 is going to have a standard deviation of 549.27. That is one hell of a typo, my friends. Now take a look at the means that go with Table 2. The presumed means for Table 1 are from the whole sample, yeah? That's what I am assuming. But. But. But. What is with those subsample means in Table 2? The means for the Aggressive Words column just barely include the mean for Aggressive words for the whole sample. How does that happen? And those standard deviations? Then look at the Nonaggressive Words column. The mean in Table 1 could not exist if we were to get the average of the means for Nonaggressive Words in the violent and non-violent movie conditions.
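Here is a rough sketch of why those numbers look off, offered as an illustration rather than a reanalysis. Reaction times cannot be negative, yet if scores with a mean of 504.27 and a standard deviation of 549.27 were even roughly normal (an assumption made here only for the sake of argument, since real reaction times are skewed), a sizable share of them would have to fall below zero. The second function encodes the point about Table 1 versus Table 2: a whole-sample mean is a weighted average of the subgroup means, so it has to sit between the smallest and the largest of them. The function names and the subgroup values in the usage lines are hypothetical placeholders, not numbers taken from the article's tables.

```python
import math

def share_below_zero(mean, sd):
    """Proportion of a normal distribution with this mean and SD that lies below zero."""
    z = (0.0 - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def overall_mean_is_possible(overall_mean, subgroup_means, tol=0.005):
    """A weighted average must lie between the smallest and largest subgroup mean.

    tol leaves room for values that were rounded to two decimals.
    """
    return min(subgroup_means) - tol <= overall_mean <= max(subgroup_means) + tol

# Values reported in the post: mean 504.27, SD 549.27.
print(round(share_below_zero(504.27, 549.27), 3))

# Hypothetical placeholder values, NOT taken from the article's tables.
print(overall_mean_is_possible(500.0, [480.0, 510.0]))  # inside the range
print(overall_mean_is_possible(530.0, [480.0, 510.0]))  # outside the range
```

Under the normality assumption, a bit under a fifth of the reaction times would be negative, which is impossible. The range check only says when an overall mean cannot be right. It does not say which of the reported numbers is the wrong one.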

Something does not add up. Think on this for a bit. Say you are planning a meta-analysis in which media violence (broadly defined) is the IV and you are examining any of a number of aggression-related outcomes as your DV. What are you going to use to estimate effect size, and more importantly are you going to trust the computations you obtain? If you are like me, you might pound the computer desk a couple times and then try to nicely ask the corresponding author for data. The corresponding author uses SPSS just like I do (I know this because I tracked down his website), so the process of reproducing analyses and so on would be seamless for me. Don't count on it ever happening - at least not voluntarily. I've tried that before and was stonewalled. Not even a response. Not kidding. Of course, I am just some nobody from a small university in Arkansas. Who am I to question these obviously very important findings? You know how it goes. The rabble start asking for data, and the next thing you know, it's anarchy.

What I am left with is a potential mess. I could do my best to employ whatever formulas I might use for effect size computations, but I would have to ask myself if what I was computing was merely garbage - and worse, potential garbage artificially inflating mean effect size estimates. So, what I would have is a study that should be included in a meta-analysis based on its content, but maybe should not be based on the reported analyses that appear to be at best incompetently reported (that would be the charitable take). I don't like being in that position when I work on a meta-analysis. Full disclosure: I thought an update on an old Bettencourt & Kernahan (1997) analysis on male-female differences as a moderator of aggression-inducing stimuli (such as violent media) would be a fun little project to pursue, and this article was one that would have otherwise been of genuine interest to me - especially since such differences are taken a bit more seriously than in the past, as have methods for assessing publication bias.

Look. If you read through that entire article, you will be stupefied by the sheer magnitude of the errors contained within. I have questions. How did this get past peer review? Did the editor actually read through the manuscript before sending it out to peer reviewers? Why was this manuscript not desk rejected? If these seem like leading questions, that is now my intention. This article should have never seen the light of day in its present form.

What to do? The editor has a clear ethical obligation to demand data from this lab. Those data should still be archived and analyzable. If there are privacy concerns, it is easy enough to remove identifying info from a data set prior to sharing it. No biggie. If the editor cannot get cooperation, the editor has an obligation to lean on the university that allowed this research to happen - in this case, Southwest University in China. The university does have an ethics council. I know. I looked that up as well. Questions about data accuracy should certainly be within the scope of that ethics council's responsibilities. At the end of the day, there is no question that some form of correction is required. I don't know if a corrigendum would suffice (with accurate numbers and a publicly accessible data set just to set things straight) or if a straight-up retraction is in order. What I do know is that the status quo cannot stand, man.

In the meantime, reader beware. This is not the only article from this particular lab with serious problems. It is arguably not even the most egregiously error-ridden article - and that is really saying something. Understanding how various forms of mass media influence how we think, feel, and behave strikes me as something of a worthwhile activity, regardless of what the data ultimately tell us. Cross-cultural research is absolutely crucial. But we have to be able to trust that what we are reading is work where the authors have done due diligence prior to submission, and that everyone else in the chain from ethics committees to editors has done their due diligence as well. The legitimacy of the psychological sciences hangs in the balance.

Thursday, April 25, 2019

"And bad mistakes/I've made a few"*: another media violence experiment gone wrong

I have covered facets of Tian, Zhang, Cao, and Rodkin (2016) in two previous posts. One post covered an allegedly non-significant 3-way interaction that, based on what was reported, would have been significant and that turned out to be identical to one reported in another paper authored by this same set of researchers. You can read what I had to say here. I also questioned the reporting of the Stroop task used by Tian et al. (2016). You can read what I had to say about that task here.

Now I just want to concentrate on some data reporting errors that really anyone who can download a pdf file and upload it to statcheck.io would be able to detect.

Out of the eight statistical tests Statcheck could analyze (remember that the ninth did not have the necessary degrees of freedom to allow Statcheck to do its thing), three of those statistical tests were shown to have not only errors, but decision errors. That means that the authors made a conclusion about statistical significance that was wrong based upon the numbers they reported. Here are the three decision errors that Statcheck detected:

1. The Stimulus by Gender interaction was reported as F(1, 157) = 1.67, p < 0.01. In actuality, if this F is correct, then p = 0.19816. In other words, there would be no significant interaction to report. Any further subsample analyses are arguably beside the point if that is the case.

2. In fact, to bolster my argument about point 1, let's look at the next decision error. The effect of the stimulus on males was reported as significant, F(1, 210) = 3.41, p < 0.01. Not so fast. According to Statcheck the actual p = 0.06621. The effect of Stimulus on the male subsample was nonsignificant.

3. Finally, let's note that the authors report a significant Stimulus by Aggressive Personality Type interaction, F(1, 227) = 1.78, p < 0.01. Wrong. According to the Statcheck analysis, the actual p = 0.18349. That interaction was not significant.
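For anyone who wants to check these three p values by hand, here is a minimal sketch of the kind of recomputation involved. It is not Statcheck's code. It is a standard-library Python illustration that gets the upper-tail probability of an F statistic from the regularized incomplete beta function, evaluated with a textbook continued fraction. The function names (`betainc`, `f_test_p`) are my own choices for the example.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_test_p(f_value, df1, df2):
    """Upper-tail p value for an F statistic with (df1, df2) degrees of freedom."""
    x = df2 / (df2 + df1 * f_value)
    return betainc(df2 / 2.0, df1 / 2.0, x)

# The three tests as reported in the article, each with "p < 0.01".
for f_value, df2 in [(1.67, 157), (3.41, 210), (1.78, 227)]:
    p = f_test_p(f_value, 1, df2)
    print(f"F(1, {df2}) = {f_value}: p = {p:.5f}, below .01? {p < 0.01}")
```

If the sketch is right, the recomputed values should land close to the ones Statcheck returned, and none of the three comes anywhere near p < 0.01. As with Statcheck itself, this only tests whether the reported F, degrees of freedom, and p value agree with one another. It cannot say which of the three was mistyped.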

As noted earlier, the authors reported a 3-way interaction as nonsignificant, when in all actuality it would have to have been. That means that a very subtle and nuanced analysis of the results never happened, leading us to question the validity of the authors' conclusions.

Statcheck is a wonderful tool for post-peer-review. It is of course limited in what it can do, and ultimately is dependent upon what the authors report. It is no substitute for having an existing data set available for those wishing to reproduce the authors' findings. However, in a pinch, Statcheck comes in handy as a preliminary indicator of what may have gone right and what may have gone wrong.

I am sure there is plenty more that could be asked about this article. The choice to divide up the AQ total score in the way the authors did was arguably arbitrary. I have to wonder if it would have been better to test for an interaction of Stimulus by Aggressive Personality Type using regression analysis instead. That is obviously a more complicated set of analyses, but one can gain some information that might be lost when splitting the scores of the participants on the Buss and Perry AQ as the authors did here and in other articles they have published. Perhaps I am splitting hairs. The lack of development of an aggressive personality instrument that would take into consideration the nuances of the Chinese language (at least Mandarin), including validation, is perhaps more troubling. The authors seem somewhat cognizant that merely translating a pre-existing instrument into their native language is not ideal. It would have been helpful if the authors had at least reported their own reliability numbers, rather than continuing to rely on those published in the original Buss and Perry paper. Given that these authors had no interest in any of the subscales for the purposes of their research, maybe a Coefficient Alpha for the total AQ score alone would have sufficed. Referring to the dependent variable as aggression when it really is nothing more than a measure of aggressive cognition is certainly confusing if not a bit misleading. Having seen that terminology botched in enough articles over the years, I suppose that has become one of my pet peeves.

It is what it is. Once more a paper from this particular lab is one I would not trust and would be very hesitant to cite.

Reference:

Tian, J. , Zhang, Q. , Cao, J. and Rodkin, P. (2016). The Short-Term Effect of Online Violent Stimuli on Aggression. Open Journal of Medical Psychology, 5, 35-42. doi: 10.4236/ojmp.2016.52005

*The quote in the title comes from the song, We Are The Champions by Queen. It is one of my favorite songs.

Postscript to the preceding

The reason for my most recent series of posts is to lay bare an important point: in order to understand what our peers are doing when they present their research in any venue (including peer review journals), we need to be able to verify what they are claiming to have done. If anything good emerges from the conversations that have been occurring since the dawn of our current replication crisis (one which is really about so much more than simply replication), it will be the emergence of a trust-but-verify culture. That culture will necessarily have to be international. In order to get at the sorts of approximations of the truth we are supposed to value, we need to be as transparent as we are able. That includes sharing data (cleaned of individually identifying info) and protocols - and not just via email requests. In order to move forward, we are going to have to be able to trust each other. There may be varying incentives across nationalities. That is just a harsh reality we'll have to deal with. It will have to be dealt with internationally and collectively. Some of us will have louder voices than others. Some of us will be more influential due to various forms of privilege. The point is to use our voice, not only as individual scholars but collectively.

Wednesday, April 24, 2019

A tale of two Stroop tasks

Exhibit A:

Exhibit B:

Notice the difference? In Exhibit A, there are 60 words that are matched with 5 different colors. Each word appears with one of each color over five trials. There is a total number of trials of 300. In Exhibit B, there are 100 words total. Each word is matched with each color, but over four blocks of 25 trials per block? I have worked with other sorts of reaction time tasks, and generally there is some painstaking work that goes into making sure that each stimulus-response pair gets presented once in each trial, and the total number of trials adds up logically, as is the case with Exhibit A. What I see with Exhibit B does not quite add up. What am I missing?
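The arithmetic behind my confusion is simple enough to spell out. The little sketch below just multiplies things out. The function names are mine, and because the number of colors in Exhibit B is not something I can state with confidence here, the loop treats it as an assumption and tries a few values.

```python
def full_crossing_trials(n_words, n_colors):
    """Trials needed if every word appears once in every color."""
    return n_words * n_colors

def blocked_trials(n_blocks, trials_per_block):
    """Trials actually run, given the block structure."""
    return n_blocks * trials_per_block

# Exhibit A: 60 words, 5 colors, 300 trials reported.
print(full_crossing_trials(60, 5) == 300)

# Exhibit B: 100 words, four blocks of 25 trials.
exhibit_b_run = blocked_trials(4, 25)
print(exhibit_b_run)

# The color count for Exhibit B is an ASSUMPTION, so several values are tried.
for n_colors in (2, 3, 4, 5):
    needed = full_crossing_trials(100, n_colors)
    print(n_colors, needed, needed == exhibit_b_run)
```

Four blocks of 25 trials is enough to show each of 100 words exactly once. With any more than one color, there is no way for each word to be matched with each color in that many trials.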

Sunday, April 21, 2019

Maybe replication is not always a good thing

Check out these two photos. Notice the similarity?

The top image is from here. Since you'll probably be directed to the corrected version of the article, I will recommend going here to view the original, and also taking a moment to read through the Corrigendum. The Corrigendum is hardly ideal in this case, but seems to clear at least some of the wreckage. Moving on...

The second article comes from here.

So each article reports the findings suggesting a three-way interaction is not significant. In each case, the authors are wrong. I noted that with the weapon-priming article earlier.

But wait. There's more.

Notice that although each article is testing a different prime stimulus (the passage from the top image is one where weapons were primes, and the passage from the bottom is one where level of video game violence is the prime), samples representing different populations (youth in the passage for the weapons priming article and college students in the article examining violent video games as primes), and samples that differed at least somewhat in size, these authors miraculously obtain the same test statistic. A miracle, you say? Bull, I say. This is a case where it would be very helpful for the public to have access to data from each study, as there is reason to wonder how the same finding was obtained in each with the aforementioned differences duly noted.

There is something seriously rotten in the state of aggression research, dear readers. It is past time that we took notice, as there is a pervasive pattern of problems with published articles generated from this particular lab. If it were just one bum article, I could probably write it off as "mistakes were made" and let go. We're way beyond that. The real worry is that authors from this lab are getting into collaborative relationships with other researchers in the US and EU who are generally reputable. I have to wonder how much their partners know of the problems that exist with their already existing published record. In some cases I wonder how their partners got chosen. The late Philip Rodkin was a researcher in bullying. He did not appear to have much of a presence in media violence research prior to teaming up with the Zhang lab. What expertise do more recent coauthors have with media violence research? How well do they know their new collaborators' work?

At the end of the day, I think it is safe to say that this is a case of unwanted replication. It is again a stark reminder that peer review is a porous filter.

One other thing to note. Rodkin passed away in May 2014. The weapons priming paper was first submitted in July of that year. The violent video game study on which Rodkin appears as a coauthor did not get submitted until October 2015. I have no way of knowing what Rodkin's role was on either manuscript, and although I probably will speculate in personal conversations, I won't do so here publicly as that is probably irresponsible. There are certainly ethical ways to handle a situation where a contributor to a research endeavor dies, and I hope that the editors in each instance were made aware of the circumstances at the time of submission.

References:

Tian, J. , Zhang, Q. , Cao, J. and Rodkin, P. (2016). The Short-Term Effect of Online Violent Stimuli on Aggression. Open Journal of Medical Psychology, 5, 35-42. doi: 10.4236/ojmp.2016.52005

Zhang, Q, Tian, J., Cao, J., Zhang, D., & Rodkin, P. (2016). Exposure to weapon pictures and subsequent aggression in adolescence. Personality and Individual Differences, 90, 113-118. doi: 10.1016/j.paid.2015.09.017.

Saturday, April 20, 2019

A reminder that peer review is a very porous filter: Zheng & Zhang (2016)

This particular article is of interest to me as I was a peer reviewer on an earlier draft of the manuscript. I recommended rejection at the time, and was disappointed to see the article ultimately published with minimal changes. My initial point of contention was fairly basic: the manuscript appeared to be a rough draft, rather than one that had been finalized and was ready for review. I noticed enough grammatical errors to have concerns about its appropriateness in the journal for which I was reviewing, or really any journal for that matter. If this had been an undergraduate project, I would have likely marked up the draft, and made gentle suggestions for how to improve the paper prior to the due date for a final draft. Instead, I had to respond to what was presumably the work of seasoned professionals. Then there were some glaring questions about research design and some of the data analyses, as something seemed just a little off. Anyway, I recommended rejection and moved on with my life.

Then a few years later I see this same paper in a peer review journal. Look. Things happen. Peer reviewers are often under enormous stress and even the best editors are going to let some real dross pollute the literature from time to time. We're all only human, after all. So, I want to cut everyone on that side of the equation some slack.

The article itself is divided into two studies. I will focus primarily on Study 2, as that is where the most obvious problems can be detected. Study 1 was intended to determine two video games that were as equivalent as possible on a number of dimensions, except of course for the presence or absence of violent content. The reader can be the judge of whether or not the authors succeeded in that endeavor. At bare minimum, the reporting of those findings appears to be clean.

Study 2 is where things get interesting. Remember, the authors have three objectives when testing this sample of youths: 1) determining that there is a main effect of violent content of video games on accessibility of aggressive cognition (although the authors do not quite word it that way), 2) determining if there is a gender by video game interaction, and 3) determining if there is a trait aggressiveness by video game interaction. So far, so good.

It is clear that things start to fall apart in the method section. The authors selected 60 goal words for their reaction time task: 30 aggressive and 30 non-aggressive. These goal words are presented individually in four blocks of trials. The authors claim that their participants completed 120 trials total, when the actual total would appear to be 240 trials (60 words in each of four blocks). I had fewer trials for adult participants in an experiment I ran over a couple of decades ago and that was a nearly hour-long ordeal for my participants. I can only imagine the heroic level of attention and perseverance required of these children to complete this particular experiment. I do have to wonder if the authors tested for potential fatigue or practice effects that might have been detectable across blocks of trials. Doing so was standard operating procedure in the Aggression Lab at Mizzou back in the 1990s. Reporting those findings would have also been done - at least in a footnote when submitted for publication.

The results section is, to phrase this as nicely as possible, a mess. First, the way the authors go about reporting a main effect for video game violence on aggressive cognition is all wrong. The authors look only at the reaction times on aggressive words. What the authors should have done is compare difference scores between aggressive and neutral words in each condition - in other words, did the treatment cause more of a change from baseline than the control condition? It appears the answer is no when we look at the subsequent analyses. In those analyses, the DV is a difference score. On that basis alone, we can rule out a main effect of level of video game violence on aggressive cognition. As that sort of finding is a cornerstone of some cognitive models of aggression used to buttress an argument regarding the dangers of violent video games, the lack of a main effect on mere aggressive cognition is one that should raise eyebrows.

What happens when we look at potential interaction effects? Do subsamples save the day? Given the reporting errors that I could detect from a simple Statcheck run, that, too, may be questionable, depending on what we are looking at. For example, the authors manage to misreport the actual main effect of video game violence on aggressive cognition: F(1, 54) = 3.58 is reported as p < .05 instead of p = .064 as computed by Statcheck. Oops. So much for that. The game by gender interaction was actually statistically significant, although not quite to the extent the authors reported: F(1, 62) = 4.89 is reported as p < .01 instead of p = .031 as computed by Statcheck. Maybe subsamples will save the day. The aggressive cognition of boys seemed to be influenced by the level of violence in the game they played. The same effect was not found for girls. There were no obvious errors for the analyses of interaction between video game and trait aggressiveness. Subsample analyses seem to show that the effect is found among highly aggressive children but not those of moderate or low levels of aggressiveness.

There is of course a three-way interaction to contend with. The authors claim there was none, but they were wrong: they report F(1, 63) = 5.27, p > .05, whereas Statcheck indicated p = .025. That is a pretty serious decision error. Hence, the requisite follow-up analyses for this three-way interaction were apparently never performed or reported. There are of course some questions about why degrees of freedom vary so much from analysis to analysis. Although I am sure that there is a simple explanation for those anomalies, the authors don't bother to provide one. We as the readers are left in the dark.
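It helps to keep two kinds of reporting problem apart: a p value that merely disagrees with its F, and a p value whose disagreement flips the conclusion about significance (what Statcheck calls a decision error). The sketch below is my own illustration, not Statcheck's code. Because every test here has one numerator degree of freedom, it uses the fact that F(1, df) is the square of a t statistic and integrates the t density numerically with Simpson's rule. The function names and the .05 cutoff are assumptions made for the example.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def p_from_f_1df(f_value, df2, n=2000):
    """Two-tailed p for F(1, df2), using F = t**2 and Simpson's rule (n must be even)."""
    t = math.sqrt(f_value)
    h = t / n
    total = t_pdf(0.0, df2) + t_pdf(t, df2)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df2)
    area = total * h / 3  # P(0 < T < t)
    return 1.0 - 2.0 * area

def check(f_value, df2, comparator, threshold, alpha=0.05):
    """Return (recomputed p, reported p consistent?, decision error at alpha?)."""
    p = p_from_f_1df(f_value, df2)
    consistent = p < threshold if comparator == "<" else p > threshold
    reported_significant = comparator == "<" and threshold <= alpha
    decision_error = reported_significant != (p < alpha)
    return p, consistent, decision_error

# (label, F, df2, reported comparator, reported threshold), as given in this post.
reported = [
    ("game main effect", 3.58, 54, "<", 0.05),
    ("game x gender", 4.89, 62, "<", 0.01),
    ("three-way interaction", 5.27, 63, ">", 0.05),
]
for label, f_value, df2, comparator, threshold in reported:
    p, consistent, decision_error = check(f_value, df2, comparator, threshold)
    print(f"{label}: p = {p:.3f}, consistent = {consistent}, "
          f"decision error = {decision_error}")
```

On these inputs, none of the three reported p values is consistent with its F, but only the main effect and the three-way interaction change the significance decision at .05. The game by gender interaction is misreported and still significant.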

The reported findings are just error-prone enough to question the conclusions the authors wish to draw. My guess is that if the three-way interaction was truly significant, the authors would have a much more nuanced explanation of what the children in their experiment were experiencing. Regrettably, that opportunity is lost. At bare minimum, we as readers can say that the authors do not have the evidence to back up their claim that violent video games prime aggressive cognition. Those of us who have even a minimal background in media violence research can certainly question whether the authors' work for this experiment added to our understanding about the underlying processes presumed to exist between exposure to an aggression-inducing stimulus (in this case violent video games) and real life aggression. I have no idea if a correction or retraction will ever be forthcoming, but I can only hope that something is done to clarify the scientific record. At bare minimum, I would recommend against citing this particular article at face value. Those conducting meta-analyses and those who are looking for cautionary tales of how to not report one's findings probably will be citing this article for the foreseeable future.

Zheng, J., & Zhang, Q. (2016). Priming effect of computer game violence on children’s aggression levels. Social Behavior and Personality: An International Journal, 44(10), 1747–1759. doi:10.2224/sbp.2016.44.10.1747

Thursday, April 18, 2019

More to come

The Zhang lab in China has more than one problematic article. I will be spending time on the others as I am able. My initial motivation to continue to pursue the matter came from the realization that 1) this was a set of researchers who were not interested in a cooperative dialog or, more importantly, willing to share data and analyses, and 2) there was at least one faulty article I apparently reviewed and recommended for rejection that got published anyway. The peer review system is not exactly the sort of line of defense against questionable research practices and poor reporting that is often portrayed in textbooks. In principle, research that falls outside the usual US/EU confines is to be encouraged. In practice, that research needs to be rigorous (regardless of its location, and regardless of the sample).

Wednesday, April 17, 2019

A reminder that changing one's perspective is worthwhile

Part of the fallout of the last couple years' worth of drama with both the weapons effect meta-analysis and with a retracted narrative review that really should have never made it past peer review, let alone been submitted in the first place, has been a change in the way I view the weapons effect research literature. That drama has been very public to a degree that I ordinarily find discomforting. However, it had to be dealt with. It did get dealt with in a way that I am more or less okay with.

At the end of the day, my perspective on the weapons effect (or weapons priming effect) literature came down to what the data analyses were telling me. If numbers could speak, they would be screaming that we as researchers really needed to reassess the state of the weapons effect literature, and more specifically acknowledge that early skeptics like Arnold Buss may well have been right all along. Regrettably numbers cannot speak, so it is up to us as researchers to do our best to speak for them. My approach was first to listen to those numbers and then to advocate for them. And the numbers, especially for aggressive behavioral outcomes, are damning.

Working with coauthors who were at cross-purposes was not fun - especially with the meta-analysis. I had a third author who was and still is desperate to keep the standard weapons effect narrative intact and a second author who had regrettably been caught up in an unfortunate situation where there had been a database error with which he had no involvement and in which he was understandably desperate to protect his reputation. For the record, I am very sympathetic to the second author's plight, and have and will continue to do anything I can do to protect his reputation with regard to that particular project. If he has no further dealings with my third author, I will consider myself justified in taking the stance I did regarding his credibility. Our second author's very important work was what opened my eyes for good. It is conceivable that short term exposure to weapons primes aggressive thoughts (though recall that is not a guarantee) and primary threat appraisal. However, there is no solid evidence that short term weapons exposure primes aggressive behavioral outcomes (which in lab and field experiments are very mild). Hence, this is a literature that tells us next to nothing about short term exposure to weapons and real life violence.

Those experiences left me intellectually paralyzed for a while. At one point, I really could not write - as in literally could not write. I second-guessed every word, fearing it would duplicate something I had written previously or would amount to some sort of categorical error. That is a lousy way to exist. So last summer, I started the process of a reset, if you will. I asked myself what kind of paper I would write on the weapons effect if I had no coauthor or coauthors. I wrote it. I put it through the plagiarism software I now subscribe to for good measure. I tried submitting it and eventually found a journal that would accept it. That was a relief. Although I was a bit jarred that anyone would want me to speak on the topic, I now have on two occasions. One had to be done virtually after my spouse had what amounted to a debilitating injury. The other occurred on Monday. Both appear to have been well-received. That was a relief. Stating that a body of research that appears in Introductory Psych and Social Psych textbooks is probably not holding up is hardly easy. It is not clear to me that becoming a skeptic on this particular area of research is winning friends and influencing people. But it at least is an honest perspective that can be backed up with tangible evidence. I can live with that.

Earning an award for the weapons effect paper I wrote over the summer and eventually got accepted was a pleasant shock to my system. I feel uncomfortable bragging about myself, and the paper was hardly anything earth-shattering. It was merely a way to make the correction that I would have made in another context had I been allowed to do so. The journal itself is one that probably does not have an impact factor, although its editor and reviewers are very solid when it comes to their work (I know - I have done some peer review for the journal in question in years past). The point was not to appear in a high impact venue, but rather just to get the info out there without the baggage that comes with paywalls, gatekeepers who have a stake in maintaining a false status quo, etc. I will at some point hang that award plaque in my office, and most likely no one will ever notice it. But it will be a reminder to me of a painful era that I somehow managed to survive, and one in which I made the first steps to finding my way forward. Starting the process of closing the books on a research literature that once defined me is just that - a starting point. I think I can now move on both as a researcher and more importantly as an educator. There are new matters that came to my attention thanks to that set of experiences that will occupy me for a while. I know where I need to focus now.

Sunday, April 7, 2019

Null Results and Some Housekeeping

This is a sort of postscript to the preceding.

If I am able to do anything even remotely right in what remains of my career and lifespan, it is to advocate for making null findings and non-replications public. In an ideal world, publishers would readily accept such findings and researchers would not self-censor. This is hardly an ideal world. I have to deal with that. However, here's the thing. I don't want to go about my work with false confidence about an effect. Not only may I be wrong, but I might unwittingly lead others down blind alleys, and that is something I am not comfortable doing.

With regard to the cognitive piece of the weapons priming effect, I am perfectly comfortable if indeed there are very few null findings when aggressive thoughts are measured and most findings are indeed positive findings. That set of circumstances would be one that validates a couple of very important sets of experiments that highlighted my early career. Nothing would feel better for me, and I enjoy feeling good as much as the next person. I don't want to have false hope. I also want to communicate something else that often gets overlooked in our rush to find results that are p<.05: it is possible to get null findings or a non-replication and still do everything right. It might even be arguably probable. To the extent that I can, I want to foster a research culture that encourages researchers to go public when they aren't replicating a finding, for example. Take the individual who tried to do a subliminal priming experiment where weapons were the stimuli and aggressive thoughts served as a DV. That person was convinced he must have done something wrong. From what I recall of our conversation some two decades ago, I had the impression that this person had done his homework when it came to research design and analysis. What if we missed a golden opportunity to learn something useful, simply because we existed in a period where non-replications of that nature might inconvenience the gatekeepers in our particular field? That is more than merely a shame, and one that reflects badly not on my peer but instead on our research culture.

At the end of the day, I want to have confidence that something I thought I found is real. If there is no empirical basis for that confidence, I need to know that too. When you discover a finding that is kind of novel or quirky, go ahead and put that out there. I have no qualms doing so. That also means I have to have no qualms about being shown that my finding was a fluke. That's just the way we have to work if the science of psychology is to advance. So with regard to the weapons priming research, I really do want folks to come forward if they have non-replications that I do not know about. I will be grateful, and you will help me know how to communicate with my students and with members of my local community. If there are a bunch of unpublished successful replications, let me know as well - I will remain discreet to the extent you require of me. If something I worked on in grad school is actually right on the money, I will be grateful just to know that. Regardless, we need more openness and transparency.

One more thing. I do have a comments policy, even if it has never been explicitly stated before. I have my blog programmed to allow unmoderated comments for the first two weeks after a post is published. After that, any new comments go through moderation. I do that largely because I have no idea how frequently I can blog or how frequently I can log in to this account. Comments that are on point, whether or not they agree with a position I take, are always welcome. I mainly want to protect this blog and my readers from spam. So far, I have never had to worry about trolling or abusive behavior, although I will deal with such situations appropriately if they ever occur. Basic rule of thumb: don't be a prick. Seems simple enough.

The area of research that has occupied my life is one that is filled with disagreement and controversy. I am clearly now at odds with some former coauthors as to my take on what the evidence on those matters tells us. That is an interesting space to occupy both personally and professionally. I'm okay with it. Whether those involved are okay with it is up to them, and has no bearing on how I proceed as a psychologist who wishes to communicate directly with the public.

I rarely get much in the way of interaction here. More of my productive interactions seem to occur through Twitter. I am okay with that as well. This is merely a space where I can work out my thoughts in a form that is longer than 280 characters, but less formal than I would use in a conference presentation or article.

Do weapons prime aggressive thoughts? Not always!

This is probably as good a time as any to mention that Zhang et al. (2016) is not the only experiment showing no apparent priming effect of weapons on aggressive thoughts. Arguably, anyone who follows this particular literature closely - and admittedly that is only a small handful of personality and social psychology researchers - is well aware of the two experiments in William Deuser's 1994 dissertation. To my knowledge, both experiments were well-designed and executed. Mendoza's (1972) dissertation also deserves mention. In that experiment, children were exposed to toys that had weapons or neutral toys. The cognitive dependent variable was content from participants' responses to a projective test. Those findings were non-significant for boys and girls across multiple sessions.

Those are the null findings that are at least in the public record. When we get to non-published results that found no link between weapon primes and aggressive thoughts (however manipulated or measured), there is so much that is unknown. What little I do know is probably more a matter of personal communication and hearsay. Unfortunately, those do not exactly lend themselves to effect sizes that can be computed and integrated. For example, I am aware of one effort at the end of the 1990s to run an experiment similar to the sort that Bartholow and I had run, except that weapons vs neutral objects were subliminally primed. That experiment was a non-replication. Given what we know now of the subliminal priming literature (which is littered with non-replications) this is not surprising. How I would love to get a hold of those data and protocols, to the extent that those might have been written up. I am aware of another weapons priming experiment from this decade that was designed to have a bit more ecological validity. As I understand it, the undergraduate student attached to that particular project bailed part-way through data collection, and the project was dropped. There is no way of knowing whether a replication happened or not there. There are quite likely other non-replications and half-completed projects stored on hard drives somewhere that no one knows about. From the perspective of a meta-analyst, this is especially frustrating as I am left hoping that publication bias assessments are adequately reflecting reality.

The cornerstone of the weapons effect narrative is that weapons reliably prime aggressive thoughts, which presumably leads to a chain of psychological events leading up to an increase in aggressive behavioral outcomes. What would happen if we needed to remove that cornerstone? My guess is that what remains of an already shaky narrative would crumble. As the state of the literature currently stands, the question of weapons priming aggressive behavioral outcomes is inconclusive at best. That may or may not change in the near future.

So, what can we do? I am at least going to try to move the conversation a little. Here is one way you can help: if you have collected data where you have a weapon prime (weapon vs neutral images or words) and some cognitive outcome variable (reaction times on a pronunciation task, lexical decision task, Stroop task, etc.; scores from a word completion task), talk to me. Better yet, make your protocols, data, and analyses available. If it turns out that aggressive cognitive outcomes are reliably predicted by weapon primes, that's fantastic. But if they are not, I think the scientific community and the public have a right to know. Seems fair enough, right? So if you have something that needs to be brought to light, talk to me. I'm easy to find on Twitter (I answer DMs regularly) or email (that's public record). I even have links to all my social media. Contact me at any time. I will get back to you.

Closing the books on a correction (Zhang et al., 2016)

When I was updating the weapons effect database for a then-in-progress meta-analysis a little over three years ago, I ran across a paper by Zhang, Tian, Cao, Zhang, & Rodkin (2016). You can read the original here; it required significant corrections. The Corrigendum can be found here.

Initially, I was excited, as it is not often that one finds a weapons effect paper published that is based on non-American or non-European samples. There were obvious problems from the start. First, although the authors purport to measure aggression in adolescents (in reality the sample consisted of pre-adolescent children), the dependent variable was actually a difference in reaction time between aggressive and non-aggressive words. To put it another way, the authors were merely measuring accessibility of aggressive thoughts that presumably would be primed by mere exposure to weapons.

The analyses themselves never quite added up, which made determining an accurate effect size estimate from their work, shall we say, a wee bit challenging. I attempted to contact the corresponding author asking for data and any code or syntax used, in the hopes of reproducing the analyses and getting the information necessary to obtain the effect size estimate that would most closely approximate the truth. That email was sent on January 26, 2016. I never heard from Qian Zhang. I figured out a work-around in order to obtain a satisfactory-enough effect size estimate and moved on.

But that paper always bothered me once the initial excitement wore off. I am well aware that I am far from alone in having some serious questions about the Zhang et al. (2016) article. Some of those could be written off as potential typos: there were some weird discrepancies in degrees of freedom across the analyses. The authors contended that they established that they had replicated work I had been involved in conducting (Anderson, Benjamin, & Bartholow, 1998) by simply examining if reaction times to aggressive words were more rapid when primed with weapons than neutral images. In our experiments, we used the difference between aggressive and non-aggressive words as our dependent variable. And based on the degrees of freedom reported, it appeared that the analysis was based on one subsample, as opposed to the complete sample. So obviously there are some red flags.

The various subsample analyses using a proper difference score (they call it AAS) also looked a bit off. And of course the MANOVA table seemed unusual, especially since the unit of analysis appeared to be their difference score (reaction times for aggressive words minus non-aggressive words) - a single dependent variable - as opposed to multiple dependent variables. Although I have rarely used MANOVA and am unlikely to use MANOVA in my own research, I certainly had enough training to know what such analyses should look like. My understanding is that one would report MS, df, and F values for each IV-DV relationship, with the understanding that there will be at least two DVs for every IV. A cursory glance at the most recent edition I had of a classic textbook on multivariate statistics by Tabachnick and Fidell (2012) convinced me that the summary table reported in the article was inappropriate, and would confuse readers rather than enlighten them. There were other questions about the extent to which the authors more or less copied and pasted content from the Buss and Perry (1992) article in which they present their Aggression Questionnaire. Those as of yet have not been adequately addressed, and I suspect they never will.

So, I ran the analyses the authors provided in statcheck.io. I had even more questions. There were numerous errors, including decision errors even assuming that the test statistics and their respective degrees of freedom were accurate. Just to give you a flavor, here are my initial statcheck analyses:

As you can see, the authors misreport F(1, 155) = 1.75, p < .05 (actual p = .188), F(1, 288) = 3.76, p < .01 (actual p = .054), and F(1, 244) = 1.67, p < .05 (actual p = .197). The authors also appeared to misreport a three-way interaction as non-significant that clearly was statistically significant. Statcheck could not catch that one due to the authors' failure to include any degrees of freedom in their report. Basically, there was no good reason to trust the analyses at this point. Keep in mind that what I have done here is something that anyone with a basic graduate-level grounding in data analysis and access to Statcheck could compute. Anyone can reproduce what I did. That said, communicating with others about my findings was comforting: I was not alone in seeing what was clearly wrong.
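For readers without access to Statcheck, the same consistency check can be roughly approximated by hand. What follows is a minimal sketch in Python, not Statcheck's own code: the function names are mine, the p-value comes from the standard relationship between the F distribution and the regularized incomplete beta function, and the only inputs are the three F tests quoted above. A "decision error" here means the reported p falls under .05 while the recomputed p does not.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use whichever form of the continued fraction converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_test_p_value(f_value, df1, df2):
    """Upper-tail p-value for an F statistic with (df1, df2) degrees of freedom."""
    x = df2 / (df2 + df1 * f_value)
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, x)


# (F, df1, df2, reported upper bound on p), as quoted in the post.
REPORTED = [
    (1.75, 1, 155, 0.05),
    (3.76, 1, 288, 0.01),
    (1.67, 1, 244, 0.05),
]


def recheck(reported, alpha=0.05):
    """Recompute each p-value and flag inconsistencies and decision errors."""
    rows = []
    for f_value, df1, df2, bound in reported:
        p = f_test_p_value(f_value, df1, df2)
        rows.append({
            "test": f"F({df1}, {df2}) = {f_value}",
            "recomputed_p": p,
            "consistent": p < bound,
            "decision_error": bound <= alpha and p >= alpha,
        })
    return rows


if __name__ == "__main__":
    for row in recheck(REPORTED):
        print(row["test"], f"p = {row['recomputed_p']:.3f}",
              "decision error" if row["decision_error"] else "ok")
```

The sketch only handles F tests with an inequality-style reported p, which is all that is needed for these three results. Statcheck itself covers more test types and reporting formats.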

In consultation with some of my peers, something else jumped out: the authors reported an incorrect number of trials. The authors reported 36 primes and 50 goal words which were each randomly paired. The authors reported a total number of trials as 900. However, if you do the math, it becomes obvious that the actual number of trials was 1800. As someone who was once involved in conducting reaction time experiments, I know the importance of not only assessing the necessary number of trials depending on the number of stimuli and target words that must be randomly paired, but also the importance of accurately reporting the number of trials required of participants. It is possible that given their description in the article itself, the authors took the number 18 (for weapons, for example) and multiplied it by 50. In itself, that seems like a probable and honest error. It happens, although it would have been helpful for this sort of thing to have been worked out in the peer review process.
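The arithmetic is simple enough to spell out. The sketch below assumes, as the description above does, that every prime was paired with every target word, and that 18 of the 36 primes were weapons. The variable names are mine, not the authors'.

```python
# Counts taken from the article as described above.
N_PRIMES = 36         # prime images reported by the authors
N_TARGET_WORDS = 50   # "goal words" reported by the authors
N_WEAPON_PRIMES = 18  # assumption: half of the primes were weapon images

# Every prime paired with every target word.
full_pairing_trials = N_PRIMES * N_TARGET_WORDS

# What you get if only the weapon primes are multiplied by the target words.
weapon_only_trials = N_WEAPON_PRIMES * N_TARGET_WORDS

print(full_pairing_trials)   # 1800
print(weapon_only_trials)    # 900, the figure the authors reported
```

Under those assumptions the reported 900 is exactly half of the trials implied by the full pairing, which is what makes the weapons-only slip a plausible explanation.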

The corrections in the corrigendum suggest a rather massive correction to the article. The presumed MANOVA table never quite gets completely resolved to satisfaction, and a lingering decision error remains. The authors also start using the term marginally significant to refer to a subsample analysis that made me cringe. The concept of marginal significance was supposed to have been swept into the dustbin of history a long time ago. We are well enough along into the 21st century to avoid that vain attempt to rescue a finding altogether. Whether the corrections noted in the corrigendum are sufficient to save the conclusions the authors wished to make in the article is questionable. At minimum, we can conclude that Zhang et al. (2016) did not find evidence of weapon pictures priming aggressive thoughts, and even their effort to base a partial replication on subsample analyses was not sufficient. It is a non-replication, plain and simple.

My recommendation is not to cite Zhang et al. (2016) unless absolutely necessary. If one is conducting a relevant meta-analysis, citation is probably unavoidable. Otherwise, the article is probably worth citing if one is writing about questionable reporting of research, or perhaps as an example of research that fails to replicate a weapons priming effect.

Please note that the intention is not to attack this set of researchers. My concern is strictly on the research report itself, and the apparent inaccuracies contained in the original research report. I am quite pleased that however it transpired, the editor and authors were able to quickly make corrections in this instance. Mistakes get made. The point is to make an effort to fix them when they are noticed. That should at least be normal science. So kudos to those involved in making the effort to do the right thing here.

References

Anderson, C. A., Benjamin, A. J., Jr., & Bartholow, B. D. (1998). Does the gun pull the trigger? Automatic priming effects of weapon pictures and weapon names. Psychological Science, 9, 308-314. doi:10.1111/1467-9280.00061

Buss, A. H., & Perry, M. (1992). The aggression questionnaire. Journal of Personality and Social Psychology, 63, 452-459. doi:10.1037/0022-3514.63.3.452

Tabachnick, B. G., & Fidell, L. S. (2012). Using multivariate statistics. New York: Pearson.

Zhang, Q., Tian, J., Cao, J., Zhang, D., & Rodkin, P. (2016). Exposure to weapon pictures and subsequent aggression in adolescence. Personality and Individual Differences, 90, 113-118. doi:10.1016/j.paid.2015.09.017

Thursday, April 4, 2019

What attracted me to those who are trying to reform the psychological sciences?

That is a question I ask myself quite a bit. Actually the answer is fairly mundane. As the saying goes: "I've seen stuff. I don't recommend it."

Part of the lived reality of working on a meta-analysis (or the three I have worked on) is that you end up going down many of your research area's dark alleys. You see things that frankly cannot be unseen. What is even more jarring is how recently some of those dark alleys were constructed. I've seen it all: overgeneralization based on small samples relying on mild operational definitions of the variables under consideration, poorly validated measures, the mere fact that many studies are inadequately powered, and so on.

If you ever wonder why phenomena do not replicate, just wander down a few of our own dark alleys and you will understand rather quickly. The meta-analysis on the weapons effect, for which I was the lead author, was a huge turning point for me. Between the allegiance effects, the underpowered research, and some questions about how any of the measures employed were actually validated, I ended up with questions that had no satisfactory answer. I've been able to show that the decline effect that others had found when examining Type A Behavior Pattern and health outcomes also applied to aggressive behavioral outcomes. I was not surprised - only disappointed in the quality of the research conducted. That much of the research was conducted at a point in time in which there were already serious questions about the validity of what we refer to as Type A personality is itself rather disappointing. And yet, that work persisted for a while, often with small samples. I have also documented in my last two meta-analyses duplicate publications. Yes, the same data sets manage to appear in at least two different journals. I have questions. Regrettably, those who could answer are long since retired, if not deceased. Conduct a meta-analysis, and expect to find ethical breaches, ranging from potential questionable research practices to outright fraud.

That's a long way of saying that I get the need for doing whatever can be done to make what we do in the psychological sciences better: validated instruments, registered protocols and analysis plans, proper power analyses, and so on. There are many who are counting on us getting it as close to right as is humanly possible. Those include not only students, but the citizens who fund our work. There is no point to "giving away the science of psychology in the public interest" (as George Miller would have put it) if we are not doing due diligence at the planning phase of our work.

Asking early research professionals to shoulder the burden is unfair. Those who are in more privileged positions need to step up. We need to be willing to speak truth to the teeth of power, otherwise there is no point in us even continuing, as all we have is a pretense with little substance. I wish I could say doing so would make one more marketable and so on. The reality is far more stark. At minimum we need to go to work knowing we have a clean conscience. Doing so will maintain public trust in our work. Failure is not something I even want to contemplate.

So I am a reformer. However long I am around in an academic environment, that is my primary role. Wherever I can support those who do the heavy lifting, I must do so. I have undergraduate students and members of the public in my community counting on it. In reality, we all do.

About those "worthless" Humanities degrees?

Well, they are not so "worthless" after all.

A clip:

Take a look at the skills employers say they’re after. LinkedIn’s research on the most sought-after job skills by employers for 2019 found that the three most-wanted “soft skills” were creativity, persuasion and collaboration, while one of the five top “hard skills” was people management. A full 56% of UK employers surveyed said their staff lacked essential teamwork skills and 46% thought it was a problem that their employees struggled with handling feelings, whether theirs or others’. It’s not just UK employers: one 2017 study found that the fastest-growing jobs in the US in the last 30 years have almost all specifically required a high level of social skills.

Or take it directly from two top executives at tech giant Microsoft who wrote recently: "As computers behave more like humans, the social sciences and humanities will become even more important. Languages, art, history, economics, ethics, philosophy, psychology and human development courses can teach critical, philosophical and ethics-based skills that will be instrumental in the development and management of AI solutions."

Worth noting: During our December graduation ceremony, fully half the names read, and a bit over half the names of graduates listed in the program were individuals who pursued degrees in the Humanities (we'll include the Social Sciences under that umbrella given how my university is organized). There are soft skills that can make one potentially flexible for any of a number of opportunities post-graduation. In our culture's obsession with workforce development, the Humanities (broadly defined) often are overlooked in the conversations among policymakers. That's not to denigrate the necessity of preparing students for life after graduation, but to bear in mind that there are lessons and skills acquired within the various Humanities majors that can prove valuable in a variety of fields that, on the surface, have nothing directly to do with the degree they earn. And yet, they may well be the people you want selling your next house or the next time you need orthopedic surgery.

Wednesday, April 3, 2019

What Would Buffy Do?

Here is a somewhat whimsical tweet:

To this day, I still ask the question that any generally worthwhile person would ask: WWBD (What Would Buffy Do)? pic.twitter.com/UOC0S8cOgB
— James Benjamin (@AJBenjaminJr) March 27, 2019

I followed that up with the following statement:

There are times when I think back to Buffy's conflicts with the Watchers' Council and I notice how relevant that set of conflicts is to how reformers in the psychological sciences deal with an established hierarchy and rules that are still currently in place.

I am a fan of the TV series (and comic books that followed the series) Buffy the Vampire Slayer. I sometimes like to remark that the first three seasons were part of what got me through grad school. Perhaps that overstates things slightly, but it was a series that was in the right place at the right time.

A number of facets of that series fascinated me then, and continue to fascinate me now. One is the on-going conflict that Buffy Summers had initially with Rupert Giles (her Watcher) and by extension the Watchers' Council.

Buffy was never particularly keen on the mythology perpetuated by the Watchers' Council. You know that whole bit about how "unto each generation a Slayer is born", right? That never sat well with her, and to a certain degree the "One Slayer Theory" was effectively debunked the moment Buffy briefly (as in for a few seconds) died, before being revived. After that, there were effectively two slayers - the more notorious of those being Faith. Another story for another time.

She also was not too keen on the rituals and rites, nor the secretiveness that came with being a Slayer. Buffy let several non-slayers into her circle of friends and allies over the course of the series, initially to the chagrin of Giles. Maybe a bit more openness would help with slayage, I could imagine Buffy reasoning. Buffy also is adept at uncovering some of the ethically questionable practices of the Watchers' Council, including their use of torture and kidnapping (as experienced by Faith), and in doing so eventually severs her ties with the Watchers' Council. Indeed, that organization gets exposed over the course of the series as moribund and out of touch with changing realities that require action. Ultimately, she sets on a course that empowers potential slayers at all levels to become involved in the work that she had initially been told she alone must do. A certain amount of openness and cooperation, and a willingness to keep an open mind toward those who might not seem like allies on the surface proved beneficial by the end of the television series.

I doubt I am the first to connect the ethos of Buffy the Vampire Slayer to the struggles we are going through within the psychological sciences (replication, measurement, and theoretical crises) and the rather lackadaisical approach by those most positioned to effect change, or to see those pushing for reform as struggling with a hierarchy that rewards maintaining an increasingly untenable status quo. I just don't yet know of anyone else who has made that connection. It's a series I have lately been rewatching and finding renewed inspiration in.

Perhaps I will write this up into something a bit more formal. There is an actual journal devoted to Whedon Studies, which does cover Buffy the Vampire Slayer in quite a bit of detail. Pop culture and fandom are of some interest to me, even if I have not had much of an excuse to really explore that avenue in greater detail.

In the meantime, as I teach and as I deal with research projects, I ask myself the question, what would Buffy do? The answer to that question is often my guide for action.

Postscript to the preceding

What do you do when something you wrote is discredited? I had to confront that last year around this time. In my case, that meant coming to terms with the fact that an article on which I was a lead author turned out to have too much duplicate content. That's a nice way of saying it was self-plagiarized. I can say that the one thing that saved me is I saved all my emails with my coauthor on that one. Retraction Watch got my account correct. Regrettably, the site could not get the other players in that particular tragedy of errors to comment. As I said, I have all the emails. I also have evidence that the only original material was whatever I added to that manuscript. You take cold comfort where you can get it.

As time has passed, I have become more thankful for that retraction. The article was hopelessly out of date by the time it got published. The whole quality control and peer review process was so hopelessly messed up that I will likely never trust that journal outlet again. I also don't quite trust myself again. Writing turned out to be enormously difficult for a while. I now subscribe to plagiarism software on my own dime in order to verify that any new work I write is genuinely original. I actually re-ran prior articles I authored to make sure they were sufficiently original. For a while my whole thought process was paralyzed.

So, I have been putting safeguards in place. I have also used the fallout from that set of events as a way of assessing my priorities in life. One had to do with where the weapons effect fit in. I am tied to that line of work, so it will never quite leave me, but I am realizing that I am at a point where I can start tying up loose ends. Writing an article on the weapons effect as a solo author and finding an outlet that would give it an honest and thorough peer review was a start. Aside from a couple minor typos that never quite went away, I can now say that I can come out the other side as someone who can still write. More importantly, I decided to shift my focus from trying to establish or confirm media effects to making sure that dependent variables I rely on - and more specifically their operational definitions - make sense and actually do the job they were intended to do. That work - namely the question of validity - will keep me going for a while.

The real lesson is to acknowledge where one is in error, work earnestly to correct that error, and find a way to move on. I am back to working largely in the shadows, like any of a number of literary and cinematic characters I admire, to continue the necessary work of improving the psychological sciences. There really is something to be said for obscurity. I have come to appreciate my relative lack of prominence more and more. In the meantime, there are assignments to be graded, undergraduates to be mentored, and engagements in my local community that will not win me any prizes, but which may serve those looking for answers. Those can occupy my time. I can live with that. Can you?

Tuesday, April 2, 2019

A brief outline of my road to weapons effect skepticism

I have a much longer post in the works. At the moment it is nowhere near ready for public consumption. I do think that in the meantime it might be helpful to lay down some talking points regarding my particular journey:

1. I was involved in some of the early research that showed a link between mere exposure to various weapon images and accessibility of aggressive thoughts. That's part of my history, along with my general acceptance that the Carlson et al. (1990) meta-analysis had essentially closed the case regarding the link between mere exposure to weapons and aggressive behavior.

2. Post-graduate school, as a professor who primarily focuses on instruction, I continued to find interest in the weapons effect. Research opportunities were few and far between, of course. I continued to share what I knew about the available research.

3. At some point around the start of this decade, I got serious about updating the old Carlson et al. (1990) meta-analysis. I was reproducing the original effect sizes using a now very antiquated software package called D-Stat (way too limiting - never use again). That was successful as far as it went. I got an offer to do something considerably more sophisticated, and was promised access to better software and expertise. I could not refuse.

4. I would end up coauthoring the occasional narrative literature review that gave a glowing portrayal of the weapons effect literature. I believed what I wrote at the time as it was consistent with the findings I was involved in generating from the new meta-analysis, and generally supportive of the original Carlson et al. (1990) meta-analysis. In hindsight, I came to realize I was wrong.

5. Eventually an updated meta-analysis I coauthored got accepted for publication. Yay. Then boo. There turned out to be a database error that I and the individual who was third author on the article never caught. The fact that we never caught it still bugs me to this day. I have emails documenting my requests to said coauthor to verify that the database was accurate over the period prior to publication.

6. Reanalyses required a rethink. Publication bias is a serious problem with this literature. Establishing a link between exposure to weapons and aggressive behavioral outcomes is difficult at best, and probably not doable. Moderators that appeared to be interesting were not so interesting.

7. How do you go about redrafting an article when each of the coauthors is working at cross-purposes? Hint: it ain't easy.

8. Bottom line: I cannot speak for my coauthors, but I cannot unsee what I saw. Based on the available evidence, I can no longer have confidence that the weapons effect is a legitimate phenomenon. That is not a knock on Berkowitz, but is rather the cold hard truth as I see it after looking at the evidence that was available.

9. Initial analyses I ran last fall after the revised manuscript was accepted show that there is also a potential allegiance effect. That really needs further exploration.

10. Although there are analyses that I certainly would love to run or wish I had run, the bottom line remains: there are likely issues with sample size and research design in these experiments that make assessing behavioral outcomes darned difficult at best. As a social prime, short-term exposure to weapons may not be particularly interesting.

11. I honestly believed an effect was real that apparently was not. Once the evidence to the contrary socked me in the jaw (to borrow a phrase from Wittgenstein), I had to change my perspective. Doing so was not easy, but I have no regrets.

I'll lay out the longer version of this later.

Monday, April 1, 2019

When social narratives and historical/empirical facts clash

This is just intended to be a quick fly-by post, inspired by a talk an anthropologist friend of mine gave at a local bookstore over the weekend. In discussing the tourist industry centered on the frontier mythology that dominates my community, he noted that there were social facts that often hid historical facts. By the way, countering the prevailing narrative is not an easy task, and a great way to make a few enemies along the way.

Anyway, his presentation got me thinking a good deal about how we go about teaching the psychological sciences, and more specifically social psychology. Any of us who have ever taken an introductory psychology course will inevitably have read and been lectured on the story of Kitty Genovese, who was murdered in Queens in 1964. I won't repeat the story here, but I will note that what we often see portrayed in textbooks is not quite what actually happened. The coverage spawned research on the Bystander Effect, which may or may not be replicable. As a narrative, Genovese's murder has been used by social conservatives as an example of the breakdown of traditional moral values in modern society (part of the subtext is that Ms. Genovese was a lesbian), and by social psychologists to further their narrative of the potentially overwhelming power of the situation. Policymakers have used the early Bystander Effect research based upon the myth of Kitty Genovese to pass "Good Samaritan" laws. The Bystander Effect and the myth of Kitty Genovese that spawned that research have been monetized by ABC in the reality series, What Would You Do. There is power and profit to be had by maintaining the narrative while burying the historical facts. After several decades, the damage is done. The APA's coverage of the murder of Kitty Genovese effectively debunked the myth a decade ago. And yet it still persists. If more non-replications of the Bystander Effect across nations and cultures are reported, that is great for science - to the extent that truth is great for science. However, my guess is that the debunked classic work will remain part of the narrative, shared on social media and in textbooks, for the foreseeable future.

In my own little corner of research, there is a sort of social narrative that has taken hold regarding various stimuli that are supposed to influence aggressive and violent behavior. It is taken as a given that media violence is causally related to aggression and even violence, even though skeptics have successfully countered the narrative with ample data to the contrary. Even something as superficial as short-term exposure to a gun or a knife is supposed to lead to aggressive behavior. The classic Berkowitz and LePage (1967) experiment is portrayed in social psychology textbooks as a prime example of how guns can trigger aggressive behavioral responses. Now lab experiments like the one Berkowitz and LePage conducted are often very artificial, and hence hard to believe are real. But what if I were to tell you that some researchers went out into the field and found the same effect? You'd be dazzled, right? Turner and his colleagues (1975) ran a series of field experiments that involved drivers blocking other drivers at intersections. They measured whether or not a horn was honked as their measure of aggression. After all, a horn honk is loud and annoying, and often in urban environments is used as the next best thing to screaming at other drivers. At least that is the thinking. Sometimes the driver (a member of the research team) drove a vehicle with a rifle on a gun rack. Other times the driver did not. The story goes that Turner and colleagues found that when the gun was present, the blocked drivers honked their horns. Case closed. The problem was, Turner and colleagues never actually found what we present in textbooks. Except for a possible subsample - males driving late model vehicles - for most drivers the sight of a firearm actually suppressed horn honking! That actually makes sense. If you are behind some jerk at a green light who has a firearm visibly displayed, honking at them is a great way to become a winner of a Darwin Award! 
What was actually happening, then, is that with the possible exception of privileged males, drivers tended to make the correct assessment that there was a potential threat in their vicinity and that they should act cautiously. For the record, as far as I am aware, after the Turner and colleagues report was published, no one has been able to find support for the claim that the mere presence of a gun elicits horn honking. And yet the false social narrative continues to be perpetuated. Who benefits? I honestly don't know. What I do know is that I see the narrative of the weapons effect (or weapons priming effect) used by those who want to advocate for stricter gun laws - a position I tend to agree with, although the weapons effect as a body of research is probably very ineffective as a means of building an argument. Those who benefit from censoring mass media may benefit again from the power they accumulate. Heck, enough lobbying got rid of gun emojis on iPhones a few years ago, even though there is scant evidence that such emojis have any real impact on real world aggression, let alone violence.

Finally, I am reminded of something from my youth. When I was a teen, the PBS series Cosmos was aired. I had read Sagan's book The Cosmic Connection just prior, and of course was dazzled by the series, and eventually by the book. I still think it is worth reading, though with a caveat. Sagan tells the story of Hypatia, a scientist whom probably any contemporary girl would want to look up to, and her demise. As the story goes, Hypatia was the victim of a gruesome assassination incited by a Bishop in Alexandria (now in modern day Egypt), and the religious extremists of the time subsequently burned down the Library of Alexandria. Eventually I would do some further reading and realize that Hypatia's apparent assassination was a much more complex story than the one Sagan told, and that the demise of the Library of Alexandria (which was truly a state-of-the-art research center for its time) occurred over the course of centuries. Sagan's tale was one of how mindless fanaticism destroyed knowledge. It's a narrative that I am quite sympathetic toward. And yet, the tale is probably not quite accurate. The details surrounding Hypatia's murder are still debated by historians. The Library's demise can be attributed to multiple causes, including government neglect, as well as the ravages of several wars.

Popular social narratives may play on confirmation bias - a phenomenon any of us is prone to experiencing - but the historical or empirical record may tell another story altogether. If the lessons from a story seem too good to be true, they probably are. A healthy dose of skepticism is advised, even if not particularly popular. In the behavioral and social sciences, we are supposed to be working toward finding approximations of the truth. We are not myth makers and story tellers. To the extent that we accept and perpetuate myths, we are doing little more than science fiction. If that is all we have to offer, we do not deserve the public's trust. I think we can do better, and often really do better.