Friday, December 27, 2019
You all know that I have done just a bit of data sleuthing here or there. I do so with no real fancy background in statistics. I have sufficient coursework to teach stats courses at the undergraduate level, but I am no quantitative psychologist. So, I appreciate articles like How to Be a Statistical Detective. The author lays out some common problems and how any of us can use our already existing skills to detect those problems. I use some of these resources already, and am reasonably adept at using a calculator. I will likely add links to more of these resources to this blog.
This article is behind a paywall, but I suspect my more enterprising readers already know how to obtain a copy. It is fundamental reading.
Saturday, December 21, 2019
Oblique Strategies
I found the card deck of Oblique Strategies developed by Brian Eno and Peter Schmidt (1975) to be quite useful. Although I have never owned an original deck (those were quite pricey back when I was a grad student), thanks to the early days of the world wide web, I could find early sites that would generate Oblique Strategies, which I would use in my creative process while working on my dissertation. Now, well into the 21st century, Daniel Lakens has developed this cool shiny app that will provide random Oblique Strategies for your own inspiration. I will make this link available elsewhere on my blog so that it is easily accessible.
What drew me to the dilemmas posed by the deck - or virtual deck - was that they required a certain amount of willingness to think "outside the box" or outside of one's normal professional parameters. From my vantage point, there is something healthy about that. Give it a try. See how your writing changes. See how you view everything from study design to data analytic strategies, to - yes - writing up a research report. At the end of the day we social scientists are still creators. We may be data driven creators, but creators nonetheless. Besides, we should have some fun with our work as truth seekers and truth tellers.
Friday, December 6, 2019
Now about those Youth and Society retractions involving Qian Zhang
Hopefully you have had a moment to digest the recent Retraction Watch article about the retraction of two of Qian Zhang's papers. I began tweeting about one of the articles in question in late September 2018. You can follow this link to my tweet storm for the now-retracted Zhang, Espelage, and Zhang (2018) paper. Under a pseudonym, I initially documented some concerns about both papers in PubPeer: Zhang, Espelage, and Zhang (2018) and Zhang, Espelage, and Rost (2018).
Really, what I did was to upload the papers into the online version of Statcheck and flag any decision inconsistencies I noticed. I also tried to be mindful of any other oddities that seemed to stick out at the time. I might make note of df that seemed unusual given the reported sample size, for example, or problems with tables purporting to report means and standard deviations. By the time I looked at these two papers, I was already concerned that papers from Zhang's lab showed a pervasive pattern of errors. Sadly, these two were no different.
With regard to Zhang, Espelage, and Zhang (2018), a Statcheck scan showed three decision errors. In this case, these were errors where the authors reported findings as statistically significant when they were not - given the test statistic value, the degrees of freedom, and the level of significance the authors tried to report.
The first decision inconsistency had to do with the assertion that playing violent video games increased the accessibility of aggressive thoughts. The authors initially reported the effect as F(1, 51) = 2.87, p < .05. The actual p-value would have been .09634, according to Statcheck. In other words, there is no main effect for violent content of video games in this sample. Nor was a video game type by gender interaction found: F(1, 65) = 3.58, p < .01 (actual p-value: p = 0.06293). Finally, there is no game type by age interaction: F(1, 64) = 3.64, p < .05 (actual p-value: p = 0.06090). Stranger still, the sample was approximately 3000 students. Why were the denominator degrees of freedom so small for these reported test statistics? Something did not add up. Table 1 from Zhang, Espelage, and Zhang (2018) was also completely impossible to interpret - an issue I have highlighted in other papers published from this lab:
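For readers who want to verify these numbers without the Statcheck web app, the arithmetic is easy to reproduce. Below is a minimal sketch in Python (my own, not Statcheck's code), using scipy and the F values and degrees of freedom quoted above:

```python
from scipy import stats

# Reported F tests from Zhang, Espelage, and Zhang (2018), as quoted above:
# (label, F value, numerator df, denominator df, reported decision)
reported = [
    ("game type main effect", 2.87, 1, 51, "p < .05"),
    ("game type x gender",    3.58, 1, 65, "p < .01"),
    ("game type x age",       3.64, 1, 64, "p < .05"),
]

for label, f_value, df1, df2, claim in reported:
    p = stats.f.sf(f_value, df1, df2)  # survival function: P(F >= observed value)
    print(f"{label}: F({df1}, {df2}) = {f_value}, reported {claim}, recomputed p = {p:.5f}")

# Each recomputed p-value exceeds .05, which is exactly what Statcheck flags
# as a decision inconsistency.
```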
A few months later, a correction was published in which the authors purported to correct a number of errors found on several pages of the original article, as well as Table 1. That was wonderful as far as it went. However, there was a new oddity. The authors purported to use only 500 of the 3000 participants in order to have a "true experiment" - which was one of the more interesting uses of that term I have read over the course of my career. And as Joe Hilgard has aptly noted, problems with the descriptive statistics continued to be pervasive - implausible and impossible cell means, marginal means, and standard deviations, for example.
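As an aside, one simple screen anyone can run on reported cell means (not necessarily what Hilgard did, and it applies only to single integer-valued items) is a GRIM-style granularity check: a mean of n integer responses must equal some whole number divided by n. A minimal sketch with hypothetical numbers, not values from the correction:

```python
def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """Check whether a reported mean could arise from n integer-valued responses.

    Assumes each score is an integer (e.g., a single Likert-type item); the check
    does not apply to means of multi-item composites.
    """
    nearest_total = round(reported_mean * n)          # closest achievable sum of scores
    return round(nearest_total / n, decimals) == round(reported_mean, decimals)

# Hypothetical example: a mean of 3.47 reported for a cell of 25 participants.
print(grim_consistent(3.47, 25))   # False - no integer total over 25 cases gives 3.47
print(grim_consistent(3.48, 25))   # True  - 87 / 25 = 3.48
```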
With regard to the Zhang, Espelage, and Rost (2018) article, my initial flag was simply for some decision errors in Study 1, in which the authors were attempting to establish that their stimulus materials were equivalent across a number of variables except for the level of violent content, and consistent across subsamples, such as sex of participant (male/female). Given the difficulty of obtaining, say, film clips that are equivalent except for level of violence, due diligence in using materials that are as equivalent as possible, except for the IV or IVs, is to be admired. Unfortunately, there were several decision errors that I flagged after a Statcheck run.
As noted at the time, contra the authors' assertion, there was evidence of rated differences between the violent film (Street Fighter) and the nonviolent film (Air Crisis) in terms of pleasantness - t(798) = 2.32, p > .05 (actual p-value: p = 0.02059) - and fear - t(798) = 2.13, p > .05 (actual p-value: p = 0.03348). Whether the failure to control for those factors impacted subsequent analyses in Study 2 is of course debatable. It is clear that the authors cannot demonstrate, based on their reported analyses, that they had films that were equivalent on the variables they identified as important to hold constant, with only violent content varying. The final decision inconsistency suggested that there was a sex difference in ratings of fear, contrary to the authors' claim: t(798) = -2.14, p > .05 (actual p-value: p = 0.03266). How much that impacted the experiment in Study 2 was not something I thought I could assess, but I found it troubling and worth flagging. At minimum, the film clips were less equivalent than reported, and the subsamples were potentially reacting differently to these film clips than reported.
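The same kind of recomputation applies to these t tests; the only extra step is doubling the tail area for a two-tailed test. A short sketch using the values quoted above (again my own arithmetic, not Statcheck's code):

```python
from scipy import stats

# Reported t tests from Study 1 of Zhang, Espelage, and Rost (2018), as quoted above.
for label, t_value, df in [("pleasantness rating", 2.32, 798),
                           ("fear rating",         2.13, 798),
                           ("fear rating by sex", -2.14, 798)]:
    p = 2 * stats.t.sf(abs(t_value), df)   # two-tailed p-value
    print(f"{label}: t({df}) = {t_value}, recomputed p = {p:.5f}")

# All three p-values fall below .05, contradicting the reported "p > .05"
# and therefore the claim that the clips and subsamples were equivalent.
```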
Although I did not comment on Study 2, Hilgard demonstrated that the pattern of reported means in this paper was strikingly similar to the pattern of means reported in a couple of earlier papers from 2013 on which Zhang was a lead author or coauthor. That is troubling. If you have followed some of my coverage of Zhang's work on my blog, you are well aware that I have discovered at least one instance in which a reported test statistic was directly copied and pasted from one paper to another. Make of it what you will. Dr. Hilgard was eventually able to get ahold of the data that were to accompany a correction to that article, and as noted in the coverage in Retraction Watch, the data and analyses were fatally flawed.
I was noticing a pervasive pattern of errors in these papers, along with others I was reading by Zhang and colleagues at the time. These are the first papers on which Zhang is a lead or coauthor to be retracted. I am willing to bet that these will not be the last, given the evidence I have been sharing with you all here and on Twitter over the last year. I have already probably stated this repeatedly about these retractions - I am relieved. There is no joy to be had here. This has been a bad week for the authors involved. Also please note that I am taking great care here not to assign motive. I think that the evidence speaks for itself that the research was poorly conducted and poorly analyzed. That can happen for any of a number of reasons. I don't know any of the authors involved. I have some awareness of Dr. Espelage's work in bullying, but that is a bit outside my own specialty area. My impression of her work has always been favorable, and these retractions notwithstanding, I see no reason to change my impression of her work on bullying.
If I was sounding alarms in 2018 and onward, it is because Zhang had begun to increasingly enlist as collaborators well-regarded American and European researchers, and was beginning to publish in top-tier journals in various specialties within the Psychological Sciences, such as child and adolescent development and aggression. Given that I thought a reasonable case could be made that Zhang's reputation for well-conducted and analyzed research was far from ideal, I did not want to see otherwise reputable researchers put their careers on the line. My fears to a certain degree are now being realized.
Note that in the preparation of this post, I relied heavily on my tweets from Sept. 24, 2018 and a couple posts I published pseudonymously in PubPeer (see links above). And credit where it is due. I am glad I could get a conversation started about these papers (and others) by this lab. Joe Hilgard has clearly put a great deal of effort and talent into clearing the record since. Really we owe him a debt of gratitude. And also a debt of gratitude to those who have asked questions on Twitter, retweeted, and refused to let up on the pressure. Science is not self-correcting. It takes people who care to actively do the correcting.
For those visiting from Retraction Watch:
Retraction Watch posted an article about two retractions of articles on which Qian Zhang of Southwest University in China was the lead author. Since some of you might be interested in what I've documented about other published articles from Zhang's lab, your best bet is to either type Zhang in the search field for this blog or just follow this link, where I have done the work for you. I'll have more to say about these specific articles in a little bit. I think I documented some of my concerns on Twitter last year and pseudonymously on PubPeer. In the meantime, I am relieved to see two very flawed articles removed from the published record. Joe Hilgard deserves a tremendous amount of credit for his work reanalyzing some data he was able to obtain from the lab (and his meticulous documentation of the flaws in these papers), and for his persistence in contacting the Editor in Chief of Youth and Society. I am also grateful for tools like Statcheck, which enabled me to very quickly spot some of the problems with these papers.
Saturday, November 30, 2019
Revisiting the weapons effect database: The allegiance effect redux
A little over a year ago, I blogged about some of the missed opportunities from the Benjamin et al. (2018) meta-analysis. One of those missed opportunities was to examine something known as an investigator allegiance effect (Luborsky et al., 2006). As I noted at the time, I gave credit where credit was due (Sanjay Srivastava, personal communication) for the idea. I simply found a pattern as I was going back over the old database and Dr. Srivastava quite aptly told me what I was probably seeing. It wasn't too difficult to run some basic meta-analytic results through CMA software and tentatively demonstrate that there appears to be something of an allegiance effect.
So, just to break it all down, let's recall when the weapons effect appears to occur. Based on the old Berkowitz and LePage (1967) experiment, the weapons effect appears to occur when individuals are exposed to a weapon and are highly provoked. Under those circumstances, the short-term exposure to weapons instigates an increase in aggressive behavior. Note that this is presumably what Carlson et al. (1990) found in their early meta-analysis. So far, so good. Now, let's see where things get interesting.
I have been occasionally updating the database. Recently I have added some behavioral research, although it is focused on low-provocation conditions. I am aware of some work recently conducted under conditions of high provocation, but have yet to procure those analyses. That said, I have been going over the computations, double and triple checking them, cross-validating them, and so on. Not glamorous work, but necessary. I can provide basic analyses along with funnel plots. If there is what Luborsky et al. (2006) define as an allegiance effect, the overall mean effect size for work conducted by researchers associated with Berkowitz should be considerably different from that for work conducted by non-affiliated researchers. The fairest test I could think of was to concentrate on studies in which there was a specific measure of provocation and a specific behavioral measure of aggression, and - more to the point - to concentrate on high-provocation subsamples, based on the rationale provided by Berkowitz and LePage (1967) and Carlson et al. (1990). I coded these studies based on whether the authors were in some way affiliated with Berkowitz (e.g., former grad students, post-docs, or coauthors) or were independent. That was fairly easy to do and took only a minimal amount of detective work. I then ran the analyses.
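The actual analyses were run in CMA, so nothing below reproduces that output; it is just a sketch of the kind of subgroup comparison involved - a DerSimonian-Laird random-effects summary computed separately for the allegiance and non-allegiance studies - with placeholder effect sizes and variances rather than values from the weapons effect database.

```python
import numpy as np

def dersimonian_laird(effect_sizes, variances):
    """Random-effects (DerSimonian-Laird) summary for a set of effect sizes."""
    y = np.asarray(effect_sizes, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                    # fixed-effect (inverse-variance) weights
    m_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - m_fixed) ** 2)             # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)        # between-study variance estimate
    w_re = 1.0 / (v + tau2)                        # random-effects weights
    m_re = np.sum(w_re * y) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    return m_re, se_re, tau2

# Placeholder standardized mean differences and variances, grouped by coded allegiance.
groups = {
    "allegiance":     ([0.55, 0.62, 0.40, 0.71], [0.04, 0.05, 0.03, 0.06]),
    "non-allegiance": ([0.10, -0.05, 0.18, 0.02], [0.04, 0.05, 0.03, 0.06]),
}

for label, (y, v) in groups.items():
    m, se, tau2 = dersimonian_laird(y, v)
    print(f"{label}: mean ES = {m:.2f}, 95% CI [{m - 1.96 * se:.2f}, {m + 1.96 * se:.2f}], tau2 = {tau2:.3f}")
```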
Here is the mixed-effects analysis:
The funnel plot also looks pretty asymmetrical for those in the allegiance group (i.e. labelled yes). The funnel plot for those studies in the non-allegiance group appears more symmetrical. Studies in the allegiance group may be showing considerable publication bias, which should be of concern. Null studies, if they exist, are not included.
Above is the funnel plot for studies from the allegiance group.
Above is the funnel plot for studies from the non-allegiance group.
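For readers less familiar with funnel plots: each study's effect size is plotted against its standard error, with the more precise studies toward the top, and asymmetry around the summary effect is the usual visual cue for publication bias. A minimal matplotlib sketch with simulated values, not the plots shown above:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Simulated effect sizes and standard errors, purely to illustrate the plot's layout.
effect_sizes = rng.normal(0.4, 0.2, size=20)
standard_errors = rng.uniform(0.05, 0.35, size=20)

summary = np.average(effect_sizes, weights=1 / standard_errors ** 2)  # inverse-variance mean

plt.scatter(effect_sizes, standard_errors)
plt.axvline(summary, linestyle="--", label="summary effect")
plt.gca().invert_yaxis()            # most precise studies (smallest SE) at the top
plt.xlabel("Effect size")
plt.ylabel("Standard error")
plt.title("Funnel plot sketch (simulated data)")
plt.legend()
plt.show()
```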
I can slice these analyses any of a number of ways. For example, I could examine only the subsamples intended to be direct replications of the Berkowitz and LePage (1967) paper, or I could collapse across all subsamples, which is what I did here. Either way, the mean effect size trends higher when the authors are interconnected. I can also document that publication bias is a far more serious concern in the funnel plot of papers whose authors have some allegiance to Berkowitz than in the funnel plot of papers whose authors do not. That should be concerning.
I want to play with these data further as time permits. I am hoping to incite a peer to share some unpublished data with me so that I can update the database. My guess is that the findings will be even more damning. I say so relying only on the basic analyses that CMA provides along with the funnel plots.
For better or for worse, I am arguably one of the primary weapons effect experts - to the extent that we define the weapons effect as the influence of short-term exposure to weapons on aggressive behavioral outcomes as measured in lab or field experiments. That expertise is documented in some published empirical work - notably Anderson et al. (1998), in which I was responsible for Experiment 2, and Bartholow et al. (2005), in which I was also primarily responsible for Experiment 2 - as well as the meta-analysis on which I was the primary author (Benjamin et al., 2018). I know this area of research very well, am quite capable of looking at the data available, and am willing to change my mind if the analyses dictate - in other words, I am as close to objective as one can get when examining this particular topic. I am also a reluctant expert, given that the data dictate the necessity of adopting a considerably more skeptical stance after years of believing the phenomenon to be unquestionably real. I do have an obligation to report the truth as it appears in the literature.
As it stands, not only should we be concerned that the aggressive behavioral outcomes reported in Berkowitz and LePage (1967) represent something of an urban myth, but also that the mythology appears to be largely due to published reports by a particular group of highly affiliated authors. There appears to be an allegiance effect in this literature. Whether a similar effect exists in the broader body of media violence studies remains to be seen, but I would not be surprised if such an allegiance effect existed.
Friday, November 8, 2019
The New Academia
Rebecca Willen has an interesting post up on Medium about some alternatives to the traditional academic model of conducting research. I doubt traditional academia is going anywhere, but it is quite clear that independent institutions can fill in some gaps and provide some freedoms that might not be afforded elsewhere. In the meantime, this article offers an overview of the potential challenges independent researchers might face and how those challenges can be successfully handled. Worth a read.
Monday, November 4, 2019
Another resource for sleuths
This tweet by Elisabeth Bik is very useful:
"Such similarities are easily found by putting text between quotes into Google Scholar https://t.co/AqNTE1SCSd Then, analyze textual similarities in more detail in SimTexter, provided by @WeWuWiWo here: https://t.co/BYGAHNF3G1 #TextForensics pic.twitter.com/FDBPxVCAfg" - Elisabeth Bik (@MicrobiomDigest), November 4, 2019
The site she used to detect a publication that was self-plagiarized - not only in terms of data and analyses but also in terms of text - can be found here: Similarity Texter. I will be adding that site to this blog's links. I think it will help me, as a peer reviewer, in detecting potential problem documents. Obviously I also see the utility for post-peer review. Finally, any of us who publish multiple articles and chapters on the same topic would do well to run our manuscripts through this particular website prior to submission to any publishing portal. Let's be real and accept that the major publishing houses are very lax when it comes to screening for potential duplicate publication, in spite of the enormous profits they make from taxpayers across the planet. We should also be real about the quality of peer review. As someone who has been horrified to receive feedback on a manuscript from a supposedly reputable journal in less than 48 hours, I think a good case can be made for taking things into your own hands as much as possible as an author. That, along with Statcheck, can save some embarrassment as well as ensure that we as researchers and authors do due diligence to serve the public good.
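SimTexter and quoted-phrase searches in Google Scholar do the real work here, but the underlying idea - flagging long word-for-word overlaps between two documents - is easy to prototype. A minimal sketch using Python's standard-library difflib (not the algorithm SimTexter itself uses), with made-up snippets standing in for two manuscripts:

```python
from difflib import SequenceMatcher

def shared_passages(text_a: str, text_b: str, min_words: int = 6):
    """Return word-for-word passages shared by two texts, longest first."""
    a_words, b_words = text_a.split(), text_b.split()
    matcher = SequenceMatcher(a=a_words, b=b_words, autojunk=False)
    blocks = [m for m in matcher.get_matching_blocks() if m.size >= min_words]
    return sorted((" ".join(a_words[m.a:m.a + m.size]) for m in blocks),
                  key=len, reverse=True)

# Hypothetical snippets, not text from any actual manuscript.
doc1 = ("Participants were randomly assigned to a violent or nonviolent film "
        "condition and then completed a word identification task.")
doc2 = ("As in earlier work, participants were randomly assigned to a violent or "
        "nonviolent film condition and then completed a rating task.")
print(shared_passages(doc1, doc2))
```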
Friday, November 1, 2019
To summarize, for the moment, my series on the Zhang lab's strange media violence research
It never hurts to keep something of a cumulative record of one's activities when investigating any phenomenon, including secondary analyses.
In the case of the work produced in the lab of Qian Zhang, I have been trying to understand their work, and what appears to have gone wrong with their reporting, for some time. Unbeknownst to me at the time, I had already encountered one of the lab's papers in 2014, when by the luck of the draw I was asked to review a manuscript that I would later learn was coauthored by Zhang. As I have previously noted, that paper had a lot of problems, and I recommended as constructively as I could that it not be published. It was published anyway.
More explicitly, I found a weapons priming article published in Personality and Individual Differences at the start of 2016. It was an empirical study and one that fit the inclusion criteria for a meta-analysis that I was working on at the time. However, I ran into some really odd statistical reporting, leaving me unsure as to what I should use to estimate an effect size. So I sent what I thought was a very polite email to the corresponding author and heard nothing. After a lot of head-scratching, I figured out a way to extract effect size estimates that I felt semi-comfortable with. In essence the authors had no main effect for weapon primes on aggressive thoughts - and it showed in the effect size estimate and confidence intervals. That study really had a minimal impact on the overall mean effect size for weapon primes on aggressive cognitive outcomes in my meta-analysis. I ran analyses and later re-ran analyses and went on with my life.
I probably saw a tweet by Joe Hilgard reporting some oddities in another Zhang et al. paper sometime in the spring of 2018. That got me wondering what else I was missing. I made a few notes, bookmarked what I needed to bookmark, and came back to the question later in the summer of 2018 when I had a bit of time and breathing room. By this point I could comb through the usual archives, EBSCO databases, ResearchGate, and Google Scholar, and was able to home in on a fairly small set of English-language empirical articles coauthored by Qian Zhang of Southwest University. I saved all the PDF files and did something that I am not sure anyone had done already: I ran the articles through Statcheck. With one exception at the time, all the papers I ran through Statcheck that had the necessary elements reported (test statistic value, p-value, degrees of freedom) showed serious decision errors. In other words, the conclusions the authors were drawing in these articles were patently false based on what they had reported. I was also able to document that the reported degrees of freedom were inconsistent within articles, and often much smaller than the reported sample sizes. There were some very strange tables in many of these articles that presumably reported means and standard deviations but looked more like poorly constructed ANOVA summary tables.
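Statcheck does all of this at scale, straight from the PDFs. The core decision check, though, boils down to something like the following sketch (my own simplification, not Statcheck's code): recompute the p-value from the reported test statistic and df, and compare it against the significance decision the authors reported.

```python
from scipy import stats

def decision_error(test, value, df1, df2=None, reported_significant=True, alpha=0.05):
    """Return True when the recomputed p-value contradicts the reported decision."""
    if test == "F":
        p = stats.f.sf(value, df1, df2)
    elif test == "t":
        p = 2 * stats.t.sf(abs(value), df1)   # two-tailed
    else:
        raise ValueError("only F and t tests handled in this sketch")
    return (p < alpha) != reported_significant

# Example from the discussion above: reported as significant at .05,
# but the recomputed p is roughly .096.
print(decision_error("F", 2.87, 1, 51, reported_significant=True))   # True -> flag it
```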
I first began tweeting about what I was finding in mid-to-late September 2018. Between some conversations via Twitter and email, I was at least convinced that I had spotted something odd, and that my conclusions, so far as they went, were accurate. Joe Hilgard was especially helpful in confirming what I had found, and then went well beyond that. Someone else homed in on inaccuracies in the number of reaction time trials reported in this body of articles. So that went on throughout the fall of 2018. By this juncture, there were a few folks tweeting and retweeting about this lab's troubling body of work, some of these issues were documented by individuals in PubPeer, and editors were being contacted, with varying degrees of success.
By spring of this year, the first corrections were published - one in Youth and Society and a corrigendum in Personality and Individual Differences. To what extent those corrections can be trusted is still an open question. At that point, I began blogging my findings and concerns here, in addition to the occasional tweet.
This summer, a new batch of errata were made public concerning articles published in journals hosted by a publisher called Scientific Research. Needless to say, once I became aware of these errata, I downloaded those and examined them. That has consumed a lot of space on this blog since. As you are now well aware, these errata themselves require errata.
I think I have been clear about my motivation throughout. Something looked wrong. I used some tools now at my disposal to test my hunch and found that my hunch appeared to be correct. I then communicated with others who are stakeholders in aggression research, as we depend on the accuracy of the work of our fellow researchers in order to get to as close an approximation of the truth as is humanly possible. At the end of the day, that is the bottom line - to be able to trust that the results in front of me are a close approximation of the truth. If they are not, then something has to be done. If authors won't cooperate, maybe editors will. If editors don't cooperate, then there is always a bit of public agitation to try to shake things up. In a sense, maybe my role in this unfolding series of events is to have started a conversation by documenting what I could about some articles that appeared to be problematic. If the published record is made more accurate - however that must occur - I will be satisfied with the small part I was able to play in the process. Data sleuthing, and the follow-up work required in the process, is time-consuming and really cannot be done alone.
One other thing to note - I have only searched for English-language articles published by Qian Zhang's lab. I do not read or speak Mandarin, so I may well be missing a number of potentially problematic articles in Chinese-language psychological journals. If you know of such articles and wish to contact me, please do. I leave my DMs open on Twitter for a reason. I would especially be curious to know whether there are any duplicate publications of data that we are not detecting.
As noted before, how all this landed on my radar was really just the luck of the draw. A simple peer review roughly five years ago, and a weird weapons priming article that I read almost four years ago were what set these events in motion. Maybe I would have noticed something was off regardless. After all, this lab's work is in my particular wheelhouse. Maybe I would not have. Hard to say. All water under the bridge now. What is left is what I suspect will be a collective effort to get these articles properly corrected or retracted.
Monday, October 28, 2019
Consistency counts for something, right? Zhang et al. (2019)
If you manage to stumble upon this Zhang et al. (2019) paper, published in Aggressive Behavior, you'll notice that this lab really loves to use a variation of the Stroop Task. Nothing wrong with that in and of itself. It is, after all, presumably one of several ways to attempt to measure the accessibility of aggressive cognition. One can take the mean difference between reaction times (rt) for aggressive words and for nonaggressive words under different priming conditions and see if the stimuli we believe have violent content make aggressive thoughts more accessible - in this case, with reaction times being higher for aggressive words than for nonaggressive words (hence, higher positive difference scores). I don't really want to get you too far into the weeds, but I think having that context is useful in this instance.
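To make that outcome concrete, here is a minimal sketch of how such a difference score is typically computed from trial-level data; the numbers are hypothetical, not the lab's:

```python
import pandas as pd

# Hypothetical trial-level reaction times (ms) from a Stroop-type task.
trials = pd.DataFrame({
    "participant": [1, 1, 1, 1, 2, 2, 2, 2],
    "word_type":   ["aggressive", "aggressive", "nonaggressive", "nonaggressive"] * 2,
    "rt_ms":       [605, 601, 598, 600, 640, 632, 628, 630],
})

# Mean RT per participant and word type, then the aggressive-minus-nonaggressive difference.
means = trials.pivot_table(index="participant", columns="word_type",
                           values="rt_ms", aggfunc="mean")
means["difference"] = means["aggressive"] - means["nonaggressive"]
print(means)

# Positive difference scores indicate slower responding to aggressive words -
# the usual index of more accessible aggressive thoughts, and typically a matter
# of only a handful of milliseconds.
```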
So far so good, yeah?
Not so fast. Usually the differences we find in rt between aggressive and nonaggressive words in these various tasks - including the Stroop Task - are very small. We're talking maybe single digit or small double digit differences in milliseconds. As has been the case with several other studies where Zhang and colleagues have had to publish errata, that's not quite what happens here. Joe Hilgard certainly noticed (see his note in PubPeer). Take a peek for yourself:
Hilgard notes another oddity, as well as the general tendency of the primary author (Qian Zhang) to essentially stonewall requests for data. This is yet another paper I would be hesitant to cite without access to the data, given that this lab already has an interesting publishing history, including some very error-prone errata for several papers published this decade.
Note that I am only commenting very briefly on the cognitive outcomes. The authors also have data analyzed using a competitive reaction time task. Maybe I'll comment more about that at a later date.
As always, reader beware.
Reference:
Zhang, Q., Cao, Y., Gao, J., Yang, X., Rost, D. H., Cheng, G., Teng, Z., & Espelage, D. L. (2019). Effects of cartoon violence on aggressive thoughts and aggressive behaviors. Aggressive Behavior, 45, 489-497. doi: 10.1002/ab.21836
Sunday, October 27, 2019
Postscript to the preceding
I am under the impression that the body of errata and corrigenda from the Zhang lab was composed as hastily and carelessly as the original articles themselves. I wonder how much scrutiny the editorial teams of these respective journals gave these corrections as they were submitted. I worry that little scrutiny was involved, and it is a shame that, once more, post-peer-review scrutiny is all that is available.
Erratum to Zhang, Zhang, & Wang (2013) has errors
This is a follow up to my commentary on the following paper:
Zhang, Q., Zhang, D., & Wang, L. (2013). Is aggressive trait responsible for violence? Priming effects of aggressive words and violent movies. Psychology, 4, 96-100. doi: 10.4236/psych.2013.42013
The erratum can be found here.
It is disheartening when an erratum ends up being more problematic than the original published article. One thing that struck me immediately is that the authors continue to insist that they ran a MANCOVA. As I stated previously:
It is unclear just how a MANCOVA would be appropriate, as the only DV that the authors consider for the remaining analyses is a difference score. MANOVA and MANCOVA are appropriate analytic techniques for situations in which multiple DVs are analyzed simultaneously. The authors fail to list a covariate. Maybe it is gender? Hard to say. Without an adequate explanation, we as readers are left to guess. Even if a MANCOVA were appropriate, Table 4 is a case study in how not to set up a MANCOVA table. Authors should be as explicit as possible about what they are doing. I can read Method and Results sections just fine, thank you. I cannot, however, read minds.
In essence, my initial complaint remains unaddressed. One change: Table 4 is now Table 1, and it has different numbers in it. Great. I still have no idea (nor would any reasonably-minded reader), based on the description given, what the authors used as a covariate, nor do I know what purported multiple DVs were used simultaneously. This is not an analysis I use very often in my own work, although I have certainly done so in the past. I do have an idea of how MANOVA and MANCOVA tables would be set up, and how those analyses would be described. I did a fair amount of that for my first-year project at Mizzou a long time ago. The authors used as their DV a difference score (the difference between RT for aggressive words and RT for nonaggressive words), which would rule out the need for a MANOVA. And since no covariate is specified, a MANCOVA would be ruled out as well. I am going to make a wild guess that the partial summary table that comprises Table 1 will end up being as nonsensical as the similar tables generated in papers by this lab, including errata and corrigenda. I don't expect to be able to generate the necessary error MS, which I could then use to estimate the pooled SD.
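For context on why that error MS matters: in a between-subjects ANOVA, the square root of the mean square error estimates the pooled within-cell standard deviation, which is what a meta-analyst needs to turn cell means into a standardized mean difference. A minimal sketch with hypothetical numbers (nothing here comes from the erratum, precisely because the table does not provide a usable error term):

```python
import math

# Hypothetical values; not taken from the Zhang, Zhang, & Wang (2013) erratum.
ms_error = 225.0                       # mean square error from a between-subjects ANOVA
mean_violent = 34.0                    # cell mean, violent condition
mean_nonviolent = 26.0                 # cell mean, nonviolent condition

pooled_sd = math.sqrt(ms_error)        # sqrt(MS_error) estimates the pooled within-cell SD
cohens_d = (mean_violent - mean_nonviolent) / pooled_sd
print(f"pooled SD = {pooled_sd:.1f}, Cohen's d = {cohens_d:.2f}")
```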
I also want to note that the description of Table 2 as characterized by the authors and the numbers in Table 2 do not match up. I find that troubling. I am assuming that the authors mislabeled the columns, and intended for the low trait and high trait columns to be reversed. It is still sloppy.
At least when I ran this document through Statcheck, the findings, as reported, appeared clean - no inconsistencies and no decision inconsistencies. I wish that provided even cold comfort. Since I don't know whether I can trust any of what I have read in either the original document or the current erratum, I am not sure there is any comfort to be had.
What saddens me is that so much media violence research is based on WEIRD samples. That influences the generalizability of the findings. That also limits the scope of any skepticism I and my peers might have about media violence effects. We need good non-WEIRD research. So the fact that a lab is generating a lot of non-WEIRD research that is riddled with errors is a major disappointment.
At this juncture, the only cold comfort I would find is if the lot of the problematic studies from this lab were retracted. I do not say that lightly. I view retraction as a last resort, for when there is no reasonable way for the record to be corrected without removing the paper itself. Retraction appears to be necessary here for at least a few reasons. One, meta-analysts might try to use this research - either the original article or the erratum (or both, if they are not paying attention) - to generate effect size estimates. If we cannot trust the effect size estimates we generate, it's pretty much game over. Two, given that in a globalized market we all consume much of the same media (or at least the same genres), it makes sense to have evidence from not only WEIRD samples but also non-WEIRD samples. Some of us might try to understand just how violent media affect samples from non-WEIRD populations in order to determine whether our understanding of these phenomena is universal. The findings generated from this paper, and from this lab more broadly, do not contribute to that understanding. If anything, they detract from our ability to get any closer to the truth. Three, the general public latches on to whatever seems real. If the findings are bogus - whether due to gross incompetence or fraud - then the public is essentially being fleeced, which to me is simply unacceptable. The Chinese taxpayers deserve better. So do all of us who are global citizens.
Monday, October 14, 2019
"It's beginning to, and back again": Zheng and Zhang (2016) part 2
A few months ago, I blogged about an article by Zheng and Zhang (2016) that appeared in Social Behavior and Personality. I thought it would be useful to briefly return to this particular article, as it was (unbeknownst to me at the time) my first exposure to that lab's work, and because I think it might be helpful for you all to see what I am seeing when I read the article.
I don't think I need to re-litigate my reasoning for recommending a rejection when I was a peer reviewer on that particular article, nor my disappointment that the article still was published anyway. Water under the bridge. I think what I want to do is to share some screen shots of the analyses in question as well as to note a few other odds and ends that always bugged me about that particular paper.
I am keeping my focus to Study 2, as that seems to be the portion of the paper that is most problematic. Keep in mind that there were 240 children who participated in the experiment. One of the burning questions is why the degrees of freedom in the denominator for so many of the analyses were so low. As the authors provided no descriptive statistics (including n's) it is often difficult to know exactly what is happening, but I might have a guess. If you follow the Zhang lab's progression since near the start of this decade, sample sizes have increased in their published work. I have a sneaking hunch that the authors copied and pasted text from prior articles and did not necessarily adequately update the degrees of freedom reported. The df for simple effects analyses may actually be correct, but there is no real way of knowing given the lack of descriptive statistics reported.
One problem is that there seemed to be something of a shifting dependent variable (DV). In the first analysis where the authors attempted to establish a main effect, the authors only used the mean reaction times (rt) for aggressive words as the DV. In subsequent analyses, the authors used a mean difference in reaction times (rt neutral minus rt aggressive) as the DV. That created some confusion already.
So let's start with the main analysis, as I do have a screen shot I used in a tweet a while back:
So you see the problem I am seeing already. The analysis itself is nonsensical. There is no way to say that violent video games primed aggressive thoughts in children who played the ostensibly violent game, as there was no basis for comparison (i.e., rt for non-aggressive words). There is a reason why I and my colleagues in the Mizzou Aggression Lab a couple decades ago computed a difference score between rts for aggressive words and rts for non-aggressive words and used it as our DV when we ran pronunciation tasks, lexical decision tasks, etc. If the authors were not willing to do that much, then a mixed between/within ANOVA in which the interaction term would have been the main focus would have been appropriate. Okay. Enough of that. Make note of the degrees of freedom (df) in the denominator. With 240 participants, there is no way one is going to end up with only 68 df in the denominator for a main effects analysis.
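To make the df complaint concrete: assuming a fully between-subjects factorial (which is what the reporting implies), the error df is simply N minus the number of cells, so no plausible design with 240 participants gets anywhere near 68. A quick sketch, with the candidate designs chosen by me for illustration:

```python
# Error df for a fully between-subjects factorial: df_error = N - (number of cells).
# With N = 240, none of these plausible designs comes anywhere near 68.
from math import prod

N = 240
for levels in [(2, 2), (2, 2, 2), (2, 2, 3)]:
    print(levels, "-> error df =", N - prod(levels))
# (2, 2)    -> error df = 236
# (2, 2, 2) -> error df = 232
# (2, 2, 3) -> error df = 228
```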
Let's look at the rest of the results. The Game Type by Gender interaction analyses were, um, a bit unusual.
First, let's let it soak in that the authors claim to be running a four-way ANOVA, but there appear to be only three independent variables: game type, gender, and trait aggressiveness. Where is the fourth variable hiding? Something is already amiss. Now note that the first analysis goes back to main effects. Here the difference between rts for aggressive and non-aggressive words is used as the DV, unlike the prior analysis that only examined rts for aggressive words as the DV. Bad news, though: if we believe the df as reported, a Statcheck analysis shows that the reported F could not be significant. Bummer. Statcheck also found that although the reported F for the game type by gender interaction - F(1,62) = 4.89 - was significant, it was significant at p = .031. Note again, that is assuming the df can be taken at face value. The authors do report the mean difference scores for boys playing violent games and nonviolent games, but do not do so for girls in either condition. I found the lack of descriptive statistical data to be vexing, to say the least.
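The recomputation that Statcheck automates is easy enough to reproduce by hand. Here is a minimal sketch using SciPy's F distribution (not the statcheck package itself), taking the reported F(1, 62) = 4.89 at face value:

```python
# Recompute the p-value for the reported game type x gender interaction,
# F(1, 62) = 4.89, taking the printed degrees of freedom at face value.
from scipy.stats import f

p = f.sf(4.89, dfn=1, dfd=62)  # survival function: P(F >= 4.89) under the null
print(round(p, 3))             # ~0.031, consistent with the Statcheck result noted above
```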
How about the analyses examining a potential interaction of game type and trait aggressiveness? That doesn't exactly look great:
Although Statcheck reports no obvious decision errors for the primary interaction effect or the simple effects, the df reported are, for lack of a better way to phrase this, all over the place. The lack of descriptive statistics makes it difficult to diagnose exactly what is going on. Then the authors go on to report a 3-way interaction as non-significant, when a Statcheck analysis indicates that it would be significant. If there were a significant 3-way interaction, that would require considerable effort to characterize carefully and to portray graphically.
It also helps to go back and look at the Method section and see how the authors determined how many trials each participant would experience in the experiment:
As I stated previously:
The authors selected 60 goal words for their reaction time task: 30 aggressive and 30 non-aggressive. These goal words are presented individually in four blocks of trials. The authors claim that their participants completed 120 trials total, when the actual total would appear to be 240 trials. I had fewer trials for adult participants in an experiment I ran over a couple decades ago and that was a nearly hour-long ordeal for my participants. I can only imagine the heroic level of attention and perseverance required of these children to complete this particular experiment. I do have to wonder if the authors tested for potential fatigue or practice effects that might have been detectable across blocks of trials. Doing so was standard operating procedure in our lab in the Aggression Lab at Mizzou back in the 1990s. Reporting those findings would have also been done - at least in a footnote when submitted for publication.
Finally, I just want to say something about the way the authors described the personality measure they used. The authors appeared to be interested in obtaining an overall assessment of aggressiveness, and the Buss & Perry AQ is arguably defensible for such an endeavor. The authors have a tendency to repeat the original reliability coefficients reported by Buss and Perry (1992), but given that they only examined overall trait aggressiveness, and given that they presumably had to translate the instrument into Chinese, they would have been better served by reporting the reliability coefficient(s) they actually obtained, rather than copying and pasting the same basic statement that appears in other papers published by the Zhang lab. It takes getting to the General Discussion section before the authors even obliquely mention that the instrument was translated, or recommend an adaptation of the BPAQ specifically for Chinese-speaking and -reading individuals.
This was a paper that had so many question marks that it should never have been published in the first place. That it did somehow slip through the peer review system is indeed unfortunate. If the authors are unable or unwilling to make the necessary corrections, it is up to the editorial team at the journal to do so. I hope that they will in due time. I know that I have asked.
If not retracted, any corrections would have to report the necessary descriptive statistics upon which the analyses for Study 2 were based, as well as provide the correct inferential statistics: accurate F-tests, df, and p-values. Yes, that means tables would be necessary. That is not a bad thing. The actual Coefficient Alphas used in the specific study for their version of the BPAQ should be reported, instead of simply repeating what Buss and Perry reported for the original English language version of the instrument in the previous century. The editorial team should insist on examining the original data themselves so that they can confirm that any corrections made are indeed correct, or so that they can determine that the data set is so hopelessly botched that the findings reported cannot be trusted, hence necessitating a retraction.
How all this landed on my radar is really just the luck of the draw. I was asked to review a paper in 2014, and I had the time and interest in doing so. The topic of the paper was in my wheelhouse, so I agreed to do so. I recommended a rejection, which in hindsight was sound. I moved on with my life. A couple years later I would read a weapons priming effect paper that was really odd and with reported analyses that were difficult to trust. I didn't make the connection until an ex-coauthor of mine appeared on a paper that appeared to have originated from this lab. At that point I scoured the databases until I could locate every English-language paper published by this lab, and discovered that this specific paper - which I recommended rejecting - had been published as well. In the process, I was able to notice that there was a distinct similarity among all the papers - how they were formatted, the types of analyses, and the types of data analytic errors. I realized pretty quickly that "holy forking shirtballs, this is awful." I honestly don't know if what I have read in this series of papers amounts to gross incompetence or fraud. I do know that it does not belong in the published record.
To be continued (unfortunately)....
Reference:
Zheng, J., & Zhang, Q. (2016). Priming effect of computer game violence on children’s aggression levels. Social Behavior and Personality: An International Journal, 44(10), 1747–1759. doi:10.2224/sbp.2016.44.10.1747
Footnote: The lyric comes from the chorus in "German Shepherds" by Wire. Toward the end of this post, near the end of the last paragraph, I make a reference to some common expressions used in the TV series The Good Place.
Sunday, October 13, 2019
Back to that latest batch of errata: Tian et al (2016)
So a few weeks ago I noted that there had been four recent corrections to papers published out of the Zhang lab. It's time to turn to a paper with a fun history to it:
Tian, J., Zhang, Q., Cao, J., & Rodkin, P. (2016). The short-term effect of online violent stimuli on aggression. Open Journal of Medical Psychology, 5, 35-42. doi: 10.4236/ojmp.2016.52005
What really caught my attention initially was that a 3-way interaction reported as nonsignificant in this particular article was identical to the analysis of a similar 3-way interaction in another article published by Zhang, Tian, Cao, Zhang, and Rodkin (2016). Same numbers, same failure to report degrees of freedom, and the same decision error in each paper. Quite the coincidence, as I noticed before. Eventually, Zhang et al. (2016) did manage to change the numbers on several analyses in the paper published in Personality and Individual Differences. See the corrigendum for yourself. Even the 3-way interaction got "corrected" so that it no longer appeared significant. We even get - gasp - degrees of freedom! Not so with Tian et al. (2016) in OJMP. I guess that is the hill these authors will choose to die on? Alrighty then.
So if you want to really see what gets changed from the original Tian et al. (2016) paper, read here. Compared to the original, it appears that the decision errors go away - except of course for that pesky three-way ANOVA, which I guess the authors simply chose not to address. Gone too is any reference to computing MANCOVAs, which is the minimum I would expect, given that there was no evidence that such analyses were ever done - no mention of a covariate, nor any mention of multiple dependent variables analyzed simultaneously. This is at least a bit better. The table of means at least on the surface seems to add up. The new Table 1 is a bit funky, though. I've noticed with another one of these papers that the Mean Square values reported for the main effects and interactions the authors were interested in would not yield an estimate of Mean Square error that could support the SDs supplied in the descriptive stats. That appears to be the case with this correction as well, to the extent that one can make an educated guess about Mean Square error from an incomplete summary table. Even with those disadvantages in papers by other authors in the past, I have generally managed to get a reasonably close estimate of MSE, and hence, with some simple computations, an estimate of the pooled standard deviation. That I am unable to do so satisfactorily here is troubling. When I ran into this difficulty with another one of the corrections from this lab, I consulted with a colleague who quickly made it clear that the likely correct pooled MSE would not support the descriptive statistics as reported. So I am reasonably certain that I am not making a mistake here.
I also find it odd that the authors now state that viewing violent stimuli produced no change in aggressive personality - as if the Buss and Perry Aggression Questionnaire, which measures a stable trait, would ever be changed by short-term exposure to a stimulus like a brief clip of a violent film. What the authors might have been trying to say is that there was no interaction between scores on the Buss and Perry AQ and movie violence. That is only a guess on my part.
These corrections, as they are billed, strike me as very rushed, and potentially as mistake-ridden as the original articles. This is the second correction out of this new batch that I have had time to review and it is as problematic as the first. Reader beware.
Tuesday, October 1, 2019
Interlude: When Social Sciences Become Social Movements
I found this very positive take on the recent SIPS conference in Rotterdam earlier today. Am so glad to have seen this. As someone who either grew up in the shadow of various social movements (including the No Nukes movement that was big when I was in my mid-teens) or was a participant (in particular Anti-Apartheid actions, as well as an ally of the feminist movement of the time and the then-struggling LGBT community that was largely at the time referred to as "Gay Rights"), I feel right at home in a social movement environment. They are after all calls to action, and at their best require of their participants a willingness to roll up our sleeves and do the hard work of raising awareness, changing policies and so on. All of that was present when I attended in July and what I experienced left me cautiously optimistic about the state of the psychological sciences. Questionable research practices still happen, and powerful players in our field still have too much pull in determining what gets said and what gets left unsaid. What's changed is Twitter, podcasts, etc. Anyone can listen to some state-of-the-art conversation with mostly early career researchers and educators who are at the cutting edge, and who are not afraid to blow whistles as necessary. Anyone can interact with these same individuals on Twitter. And although eliminating hierarchies is at best a pipe dream, the playing field among the Open Science community in my corner of the scientific universe is very level. I'll paraphrase something from someone I would prefer not to name: Some people merely talk about the weather. The point is to get up and do something about it. We've gone well beyond talk. We have the beginnings of some action thanks to what started as just a few understandably irate voices in the wake of the Bem paper and the Stapel scandal in 2011. We have a long way to go. And yes, if you do go to a SIPS conference, expect to meet some of the friendliest people you could hope to meet - amazing how those who are highly critical of bad and fraudulent science turn out to be genuinely decent in person. Well, not so amazing to me.
Sunday, September 15, 2019
The more things change...
A few months ago, I wrote a very brief critique on the following paper:
Tian, J. & Zhang, Q. (2014). Are Boys More Aggressive than Girls after Playing Violent Computer Games Online? An Insight into an Emotional Stroop Task. Psychology, 5, 27-31. doi: 10.4236/psych.2014.51006.
At the time, I offered the following image of a table as it told a very damning story:
I noted at the time that it was odd for a number of reasons, not least because of the discrepancy in one of the independent variables. The paper manipulates the level of violent content in video games, and yet the interaction term in Table 3 is listed as Movie type. That struck me as odd. The best explanation I have for that strange typo is that the lab involved was studying both movie violence and video game violence, and there is a strong likelihood that the authors simply copied and pasted information from a table in another paper without sufficient editing. Of course there were other problems as well. The F value for the main effect of Game Type could not be statistically significant. You don't even need to rely on statcheck.io to sort that one out. The table does not report the finding for a main effect of gender (or, probably more appropriately, sex). The analysis is supposed to be a MANCOVA, which would imply a covariate (none appears to be reported) as well as multiple dependent variables (none are reported beyond the difference score in RTs for aggressive and non-aggressive words).
There were plenty of oddities I did not discuss at the time. There are the usual problems with how the authors report the Stroop task that they use. Also, there is a bit of a problem with the way the authors define their sample. Note that the title indicates that this research used a sample of children. However, the sample seems more like adolescents and young adults (ages ranged from 12 through 21, and the average age was 16).
So that was over five years ago. What changed? Turns out, not much. Here is the erratum that was published in July, 2019. The authors are still acting like they are dealing with a youth sample when, as noted earlier, this is a sample of adolescents and adults, at least according to the method section as reported, including any changes made. Somehow the standard deviation for participants' age changes, if not the mean. Odd. What they were calling Table 3 is now Table 1. It is at least appropriately referred to as an ANOVA. The gender main effect is still missing. The F tests change a bit, although it is now made clearer that this is a paper in which the conclusions are based on a sub-sample analysis. I am not sure there is enough information for me to determine whether the mean-square error term would yield a pooled standard deviation consistent with the means and standard deviations reported in what is now Table 2. The conclusions the authors draw are a good deal different from what they would have drawn initially. From my standpoint, any erratum or corrigendum should correct whatever mistakes were discovered. This "erratum" (actually a corrigendum) does not. Errors that were in place in the original paper persist in the alleged corrections. I have not yet tried a SPRITE test to determine whether the means and standard deviations now being reported are plausible. I am hoping that someone reading this will do that, as I don't exactly have tons of spare time.
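For anyone inclined to take up that invitation, here is a deliberately simplified, SPRITE-flavoured sketch - a toy hill-climber, not Heathers and colleagues' full SPRITE procedure - that asks whether any sample of n integer responses on a bounded scale can reproduce a reported mean and SD to rounding. The mean, SD, n, and scale bounds in the example call are placeholders of my own, not values from the erratum.

```python
# A toy, SPRITE-flavoured plausibility check: can any sample of n integers on a
# bounded scale reproduce the reported mean and SD (to two-decimal rounding)?
# The inputs in the example call are placeholders, not the erratum's values.
import random
import statistics

def sprite_like(mean, sd, n, lo, hi, restarts=50, steps=20000):
    target_sum = round(mean * n)
    for _ in range(restarts):
        # Start from a sample whose mean already matches the target (to rounding).
        base, extra = divmod(target_sum, n)
        xs = [base + 1] * extra + [base] * (n - extra)
        for _ in range(steps):
            if abs(statistics.stdev(xs) - sd) < 0.005:
                return xs                      # found a consistent sample
            i, j = random.randrange(n), random.randrange(n)
            if i == j or xs[i] >= hi or xs[j] <= lo:
                continue
            before = abs(statistics.stdev(xs) - sd)
            xs[i] += 1                         # shift one value up and one down:
            xs[j] -= 1                         # the mean is preserved, the SD moves
            if abs(statistics.stdev(xs) - sd) > before:
                xs[i] -= 1                     # revert moves that make the fit worse
                xs[j] += 1
    return None                                # nothing found: the reported pair looks suspect

# Placeholder example: a 1-7 scale, 40 respondents, reported M = 3.50, SD = 1.20.
print(sprite_like(mean=3.50, sd=1.20, n=40, lo=1, hi=7) is not None)
```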
Here are some screen shots of the primary changes according to the erratum:
What is now called Table 2 is a bit off as well. I know what difference scores in reaction time tasks normally look like. Ironically, the original manuscript comes across as more believable, which is really saying something.
So did the correction really correct anything? In some senses, clearly not at all. In other senses, I honestly do not know, although I have already shared some doubts. I would not be surprised if this and other papers from this lab are eventually retracted. We would be better served if we could actually view the data and the research protocols that the authors should have on file. That would give us all more confidence than is currently warranted.
In the meantime, I could make some jokes about all of this, but really this is no laughing matter for anyone attempting to understand how violent media influence cognition in non-WEIRD samples, and for meta-analysts who want to extract accurate effect sizes.
Saturday, September 14, 2019
Prelude to the latest errata
Now that there have been some relatively new developments regarding research from Qian Zhang's lab, I think the best thing to do is to give you all some context before I proceed. So let's look at some of the blog posts I have composed about the articles that are now presumably being corrected:
A. Let's start out with the most recent and work our way backwards. First, let's travel back to the year 2016. You can easily find this paper, which is noteworthy for being submitted roughly a year or so after its fourth author had passed away.
Tian, J., Zhang, Q., Cao, J., & Rodkin, P. (2016). The short-term effect of online violent stimuli on aggression. Open Journal of Medical Psychology, 5, 35-42. doi: 10.4236/ojmp.2016.52005
See these blog posts:
"And bad mistakes/I've made a few"*: another media violence experiment gone wrong
Maybe replication is not always a good thing
It Doesn't Add Up: Postscript on Tian, Zhang, Cao, & Rodkin (2016)
A tale of two Stroop tasks
B. Now let's revisit the year 2014. There is one article of note here. I had one post on this article at the time, and I wish I had devoted a bit more time to it. Note that in many of these earlier articles, Zhang goes by Zhang Qian, and for whatever reason, the journal of record recommends citing Qian as the family name. Make of that what you will. Following is the reference.
Tian, J. & Zhang, Q. (2014). Are Boys More Aggressive than Girls after Playing Violent Computer Games Online? An Insight into an Emotional Stroop Task. Psychology, 5, 27-31. doi: 10.4236/psych.2014.51006.
See this blog post:
Funny, but sad
The year 2013 brings us two papers to consider. I only devoted a single blog post to the first article referenced. The second article got referenced twice as I noticed the same oddity when it came to the way the authors were describing the Stroop task and analyzing data based on that task.
C. First we will start here with a basic film violence study.
Zhang, Q., Zhang, D., & Wang, L. (2013). Is aggressive trait responsible for violence? Priming effects of aggressive words and violent movies. Psychology, 4, 96-100. doi: 10.4236/psych.2013.42013
See this blog post:
About those Stroop task findings (and other assorted oddities)
D. And here is the article in which the authors use the Stroop task in a most remarkably odd way.
Zhang, Q., Xiong, D., & Tian, J. (2013). Impact of media violence on aggressive attitude for adolescents. Health, 5, 2156-2161. doi: 10.4236/health.2013.512294
See this blog post:
Some more oddness (Zhang, Xiong, & Tian, 2013)
I could probably add some other work for context, as there are some pervasive patterns that show up across studies over the course of this decade. As the authors have begun to rely upon larger data sets, there are some other troubling practices, such as using only a small fraction of a sample to analyze data (something I vehemently oppose as a practice). Whether the articles are published in low-impact journals or high-impact journals is of no importance in one sense: poorly conducted research is poorly conducted research, and if it needs to be corrected, it is up to the authors to do so in as transparent and forthright a manner as possible. That said, as this lab is getting work published in higher impact journals, the potential for incorrectly analyzed data and hence misleading findings to poison the proverbial well increases. That should trouble us all.
I want to end with something I said a few months ago, as it is important to understand where I am coming from as I once more proceed:
Although I don't have evidence that the Zhang lab was involved in any academic misconduct, and I have no intention of making accusations to that effect, I do think that some of the data reporting itself is at best indicative of incompetent reporting. All I can do is speculate, as I am unaware of anyone who has managed to actually look at this lab's data. What I can note is that there is a published record, and that there are a number of errors that appear across those published articles. Given the number of questions I think any reasonable reader of media violence research might have, Zhang and various members of his lab owe it to us to answer those questions and to provide us with the necessary data and protocols to accurately judge what went sideways. The reason I emphasize this point is because this is really not personal. This is a matter of making sure that those of us at minimum who do study media violence research have accurate evidence at our disposal.
Wednesday, September 11, 2019
Coming soon
I am starting to get my head around some relatively new Zhang lab errata. I have questions. Stay tuned.
Sunday, August 25, 2019
Will this time be different?
I had the pleasure of seeing and hearing Sanjay speak these words live in early July at the closing of the SIPS conference in Rotterdam. I really hope those words are heeded. Simply tightening up some methods without addressing the social inequality that afflicts our science (as is the case with so many sciences) is insufficient. If the only people who benefit are those who just happen to keep paying membership dues, we've failed. Open science is intersectional and is a social movement. Anything short of that will be a failure. I hope that I do not find myself in a decade asking the same question that a good friend of mine once asked over three decades ago in his zine, Pressure: "So, where's the change?"
Thursday, August 1, 2019
Some initial impressions about SIPS 2019
The conference was different from any conference I have ever experienced. For those wanting to get a feel for what SIPS is about, a good place to begin might be to check out the page for this year's conference. Rotterdam was a good location in part because the city slogan is make it happen. SIPS is an organization devoted to actively changing the way the science of psychology is done, and is formatted in such a way that those participating become active. This is a conference for people who really want to roll up their sleeves and get involved.
I think perhaps the best way to start is with a Twitter thread I posted right as we were about to end:
1. With #SIPS2019 about to come to a close Wednesday, I have a few observations - brief for now. I love the format and have felt more actively engaged in a conference than I have in a very long time. — James Benjamin (@AJBenjaminJr) July 8, 2019
My experience started with the preconference put together by the repliCATS project. Their travel grant to those willing to participate in the preconference is what made going to SIPS possible for me. During the 5th and 6th of July, I spent the entire work day at the conference site with a team of several other psychologists in various phases of their careers (most were postdocs and grad students). Each team was tasked with the responsibility of assessing the probability of replication for 25 claims. We had a certain amount of time to read each claim, look up the relevant article, look up any other supplementary materials relevant to the task, and then to make our predictions. We then discussed our initial assessments and recalibrated. I found the process engaging and enlightening. What was cool was how each of us brought some unique expertise to the table. Four of the claims we assessed were meta-analytic, and since I was the one person in the room who had conducted published meta-analyses, I became the de facto expert on that methodology. Believe me when I say that I still don't feel like an expert. But okay. So for those claims, my peers quizzed me a good deal and I got to share a few things that might be useful about raw effect sizes, assessment of publication bias, etc. We did the same with others. There were some claims that all of us found vexing. Goes with the territory. I think what I got from the experience was how much we in the psychological sciences really need each other if we are going to move the field forward. I also realized just how talented the current generation of early career researchers truly is. As deflating as some of the research we evaluated was, I could not help but feel a certain level of cautious optimism about the future of the psychological sciences.
The conference itself started mid-day on the 7th of July and lasted through the 9th of July. It was filled with all sorts of sessions - hackathons, workshops, and unconferences. I tended to gravitate to the latter two classes of sessions during the conference this time around. The conference format itself is rather loose. One could start a session, decide "no, this is not for me," and leave without worrying about hurt feelings. One could walk into a session in the middle and be welcomed. It was great to go to a professional meeting without seeing one person wearing business attire. At least that level of pretense was dispensed with, for which I am grateful. I gravitated toward sessions on the last day that had a specific focus on inclusiveness. That is a topic that has been on my mind going back to my student activist days in the 1980s. There is a legitimate concern that by the time all is said and done, we'll manage to fix some methodological problems that are genuinely troubling for the discipline without addressing the problem that there are a lot of talented people who could offer so much but who are shut out due to their ethnicity, national origin (especially if from the Global South), sexual orientation, or gender identity. If all we get is the same power structure with somewhat better methods, we will have failed, in my personal and professional opinion. I think the people leading at least a couple of those sessions seemed to get that. I hope that those who are running SIPS get it too. Maybe an inclusiveness hackathon is in order? I guess I am volunteering myself to lead that one just by blogging about it!
This was also an interesting conference in that I am both mid-career and primarily an educator. So, I was definitely part of a small subset of attendees. Personally, I felt pretty engaged. I can see how one in similar circumstances might end up feeling legitimately left out. There was some effort to have sessions devoted to mid-career researchers. We may want that expanded to mid-career educators as well. We too may want to be active participants in creating a better science of psychology, but our primary means of doing so is going to be in the classroom and not via published reports. Those of us who may one day become part of administration (dept chairs, deans, etc.), or who already occupy those positions, should have some forum to discuss how we can better educate the next generation of Psychology majors at the undergraduate level so that they are both better consumers of research and better prepared for the changes occurring that will impact them as they enter graduate programs (for that subset of majors who will do so).
It was also great to finally meet up in person with a number of people with whom I have interacted via Twitter DM, email, and sometimes via phone. That experience was beautiful. There is something about actually getting to interact in person that is truly irreplaceable.
Given the attendance at the conference, I can see how much of a logistical challenge the organizers faced. There were moments where last minute room changes did not quite get communicated. The dinner was one where participants were mostly underfed (that is probably more on the restaurant than the organizers, and I can chalk that up to "stuff happens" and leave it at that). Maintaining that sense of intimacy with a much larger than anticipated group was a challenge. But I never felt isolated or alone. There was always a session of interest. There was always someone to talk to. The sense of organized chaos is one that should be maintained.
I did find time to wander around the city of Rotterdam, in spite of my relative lack of down time for this conference. The fact that it was barely past the Summer Solstice meant that I got some good daylight quality photos well into the evening. I got to know the "cool district" of Rotterdam quite well, and definitely went off the beaten path in the process. I'll share some of those observations at another time.
Overall, this was a great experience. I will likely participate in the future, as I am finding ways of using what I learned in the classroom. As long as participants can come away from the experience thinking and knowing that there were more good sessions than they could possibly attend, it will continue to succeed. If the organizers are serious about inclusiveness, they will have something truly revolutionary as part of their legacy. Overall, the sum total of this set of experiences left me cautiously optimistic in a way that I have not been in a very long time. There is hope yet for the psychological sciences, and I got to meet some of the people who are providing the reason for that hope. Perhaps I will meet others who did not attend this year at future conferences. I'll hold out hope for that as well.
The conference was different from any conference I have ever experienced. For those wanting to get a feel for what SIPS is about, a good place to begin might be to check out the page for this year's conference. Rotterdam was a good location in part because the city slogan is make it happen. SIPS is an organization devoted to actively changing the way the science of psychology is done, and is formatted in such a way that those participating become active. This is a conference for people who really want to roll up their sleeves and get involved.1. With #SIPS2019 about to come to a close Wednesday, I have a few observations - brief for now. I love the format and have felt more actively engaged in a conference than I have in a very long time.— James Benjamin (@AJBenjaminJr) July 8, 2019
My experience started with the preconference put together by the repliCATS project. Their travel grant to those willing to participate in the preconference is what made going to SIPS possible for me. During the 5th and 6th of July, I spent the entire work day at the conference site with a team of several other psychologists in various phases of their careers (most were postdocs and grad students). Each team was tasked with the responsibility of assessing the probability of replication for 25 claims. We had a certain amount of time to read each claim, look up the relevant article, look up any other supplementary materials relevant to the task, and then to make our predictions. We then discussed our initial assessments and recalibrated. I found the process engaging and enlightening. What was cool was how each of us brought some unique expertise to the table. Four of the claims we assessed were meta-analytic, and since I was the one person in the room who had conducted published meta-analyses, I became the de facto expert on that methodology. Believe me when I say that I still don't feel like an expert. But okay. So for those claims, my peers quizzed me a good deal and I got to share a few things that might be useful about raw effect sizes, assessment of publication bias, etc. We did the same with others. There were some claims that all of us found vexing. Goes with the territory. I think what I got from the experience was how much we in the psychological sciences really need each other if we are going to move the field forward. I also realized just how talented the current generation of early career researchers truly is. As deflating as some of the research we evaluated was, I could not help but feel a certain level of cautious optimism about the future of the psychological sciences.
The conference itself started mid-day on the 7th of July and lasted through the 9th of July. It was filled with all sorts of sessions - hackathons, workshops, and unconferences. I tended to gravitate to the latter two classes of sessions during the conference this time around. The conference format itself is rather loose. One could start a session, decide "no this is not for me" and leave without worrying about hurt feelings. One could walk in to a session in the middle and be welcomed. It was great to go to a professional meeting without seeing one person wearing business attire. At least that level of pretense was dispensed with, for which I am grateful. I gravitated toward sessions on the last day that had a specific focus on inclusiveness. That is a topic that has been on my mind going back to my student activist days in the 1980s. There is a legitimate concern that by the time all is said and done, we'll manage to fix some methodological problems that are genuinely troubling for the discipline without addressing the problem that there are a lot of talented people who could offer so much who are shut out due to their ethnicity, national origin (especially if from the Global South), sexual orientation, and gender identity. If all we get as the same power structure with somewhat better methods, we will have failed, in my personal and professional opinion. I think the people leading at least a couple of those sessions seemed to get that. I hope that those who are running SIPS get it too. Maybe an inclusiveness hackathon is in order? I guess I am volunteering myself to lead that one just by blogging about it!
This was also an interesting conference in that I am both mid-career and primarily an educator. So, I was definitely part of a small subset of attendees. Personally, I felt pretty engaged. I can see how someone in similar circumstances might end up feeling legitimately left out. There was some effort to have sessions devoted to mid-career researchers. We may want that expanded to mid-career educators as well. We too may want to be active participants in creating a better science of psychology, but our primary means of doing so is going to be in the classroom and not via published reports. Those of us who may one day become part of administration (dept chairs, deans, etc.) or who already occupy those positions should have some forum to discuss how we can better educate the next generation of Psychology majors at the undergraduate level so that they are both better consumers of research and better prepared for the changes occurring that will impact them as they enter graduate programs (for that subset of majors who will do so).
It was also great to finally meet up in person with a number of people with whom I have interacted via Twitter DM, email, and sometimes via phone. That experience was beautiful. There is something about actually getting to interact in person that is truly irreplaceable.
Given the attendance at the conference, I can see how much of a logistical challenge the organizers faced. There were moments where last minute room changes did not quite get communicated. The dinner was one where participants were mostly underfed (that is probably more on the restaurant than the organizers, and I can chalk that up to "stuff happens" and leave it at that). Maintaining that sense of intimacy with a much larger than anticipated group was a challenge. But I never felt isolated or alone. There was always a session of interest. There was always someone to talk to. The sense of organized chaos is one that should be maintained.
I did find time to wander around the city of Rotterdam, in spite of my relative lack of down time for this conference. The fact that it was barely past the Summer Solstice meant that I got some good daylight quality photos well into the evening. I got to know the "cool district" of Rotterdam quite well, and definitely went off the beaten path in the process. I'll share some of those observations at another time.
Overall, this was a great experience. I will likely participate in the future, as I am finding ways of using what I learned in the classroom. As long as participants can come away from the experience thinking and knowing that there were more good sessions than they could possibly attend, it will continue to succeed. If the organizers are serious about inclusiveness, they will have something truly revolutionary as part of their legacy. Overall, the sum total of this set of experiences left me cautiously optimistic in a way that I have not been in a very long time. There is hope yet for the psychological sciences, and I got to meet some of the people who are providing the reason for that hope. Perhaps I will meet others who did not attend this year at future conferences. I'll hold out hope for that as well.
Wednesday, July 31, 2019
Loose ends
I got back in from Rotterdam late on the 10th of July. It was a great trip and I will talk about it and the aftermath another time.
I've also been clearing my file drawer a bit. Earlier this year saw a narrative review article on the weapons effect published in a small-circulation peer-reviewed journal (National Social Science Journal). A former student and I just learned that a write-up of a weapons priming data set was accepted for publication. We are just now going over the proofs. It will be a fun paper to discuss down the road, as I am now in this awkward space where I had one remaining study that used a word completion task as its measure, run before I realized that the measure itself had some genuine validity problems (Randy McCarthy and I blogged about some of that earlier this year). My student and I used this as an opportunity to build in a paragraph urging readers to exercise caution in light of some genuine concerns about the lack of validation of not only the AWCT but also other word completion tasks. I am finally writing up an encyclopedic chapter on the weapons effect that is due at the end of August. After that, I think I will finally be done with the weapons effect as a focus of my attention for a good long while. At least that is my hope. I do have the sinking feeling that I could get roped into something to do with that area of inquiry sooner rather than later, but am keeping my fingers crossed that my time has come and gone at last.
Mostly I just want to focus my upcoming academic year on validation research and re-structuring my methods courses so that they incorporate more hands-on activities regarding replicability and open science practices. I got some ideas from SIPS and the repliCATS-sponsored pre-SIPS workshop in Rotterdam. I think it will turn out to be fun.
Onward.
Sunday, June 30, 2019
Pride Month May Be Coming To An End
Pride Month may be coming to an end today but the social concerns that brought us Pride Month continue year 'round. So, whatever our specific orientations and gender identities, let's remember that our sciences often fall short as safe spaces for everyone to discuss our research. The US is not an especially inclusivity-friendly nation right now - think of what the occupant in the White House and his allies espouse and put into policy (including undoing progress that took years of difficult work to accomplish). I am well aware that transgender scientists still get discriminated against and are publicly bullied by peers (very obviously so on social media) who really should know better.
In the meantime, enjoy this wonderful post. I love the logo (see below - nicked from the same post). And I love the notion of a more inclusive science and society.
Saturday, June 29, 2019
Radio Silence
The summer continues, and I am balancing online courses with professional and personal travel. I am also trying to clear my desk of data sets collected during the time I was working on the meta-analysis that got published last year. Those data sets are from simple replication studies, and I am mainly aiming for open-access journals that are receptive to such work. Nothing prestigious, but that is not the goal. The point is simply to keep data out of the file drawer.
I am thinking through how to get back to the basics: mainly focusing on measurements relevant to my area of expertise. Although such work is often viewed as far from glamorous, it is vital. I think I have found some cool folks to work with. Part of my professional travel includes meeting up with some of these people. I am looking forward to it. Much of what I am doing is learning or relearning.
I hope to blog about my experience at SIPS. That is coming up soon. Probably won't post much of anything until later in July. Between what I was observing from the sidelines as the replication crisis unfolded and some professional experiences I wish could have been avoided, I became convinced that I needed to follow the lead of those who are on the front lines to change the psychological sciences for the better. I'll check out some hackathons and unconferences and soak it in. Then I'll figure out how to incorporate what I learn in my methods courses.
I have not had much to say about Qian Zhang's (Southwest University in China) work lately, but that has more to do with 1) a concern that I would be repeating myself at this point and 2) a recognition that there are folks better positioned to force the issue regarding some serious problems with that lab's published work. It strikes me as prudent to let those who are better able to do so have that space to make it happen. If it makes sense for me to say something further, just know I will. I am sure that those of you who check this blog out know that I was genuinely appalled at what had slipped through peer review. That has not changed, and as far as I am concerned this isn't over by a long shot. Let's see how the process plays out. I'll simply state for the moment that it is a shame, as quality research from outside the US and EU is essential to better understanding the nuances of whatever influence media violence cues might have on aggressive cognitive and behavioral outcomes. What I shared with you all about that set of findings only added to misconceptions about media violence, given that the reporting of the methodology and findings themselves was so poorly done.
For now, it's radio silence. I will return in a few weeks.
Wednesday, June 12, 2019
Back to the usual summer routine
Summer started about a month ago for me. A few things typically happen the moment I turn in my final grades. First, I try to decompress a bit. That might mean reading some books, binging some series on Netflix, Hulu, or Amazon Prime, gardening, or day hikes and photography. Some combination of the above is inevitably involved.
Then, I get busy with prepping summer classes, prepping for my usual trip to the AP Psychology Reading site, some family trips (one major due to distance, and a series of day trips for family closer to me), and planning projects and papers for the upcoming year. The latter is becoming less of a priority for me. I have some data sets I need to write up and submit that got neglected during the time when I was heavily involved in a meta-analysis. One was submitted late in April and I don't expect to hear anything for a while. The other two will need to be dealt with sooner or later. Both are replication reports, and those are often not a priority for most journals. I have those outlined. It is just a matter of blocking out time for writing. Beyond that, I have a chapter due at the end of August. It is on a topic with which I have regrettably way too much familiarity (the weapons effect). I can write in a neutral tone, and given the evidence I would need to cite, neutrality is pretty easy for me. I am involved in some collaborative projects that are still largely in their infancy. My hope there is that my collaborators take the lead, as I really have no ambition to take on more than my resources allow.
I have good collaborators, and I expect to learn a great deal from them in the coming year or two. I am heading to SIPS in Rotterdam in early July. Normally I don't do conferences right at the start of a fiscal year, and especially conferences overseas. I have to front way too much money with little hope for reimbursement until several months have passed. An outside travel grant from DARPA made this trip just barely feasible for me financially, and the opportunity to meet with and work with people I know primarily through Twitter direct messages and the occasional phone call or Skype is just too good for me to pass up. I am hoping to take away from the experience some ideas for how to better retool some of my program's methodology courses - ideas that I can float to my departmental colleagues and hopefully implement a bit this coming fall and more completely over the next academic year or two.
A while back I made mention that one of the big lies that we get told in graduate school is that we are supposed to want to work at prestigious institutions, publish in A-list journals, win large grant awards, and essentially be famous. Like a lot of people in my position, I bought into that lie, and I fear that I compromised myself in the process. The last couple years were a bit of a wake-up call for me. If one is really intent on an academic career, there are plenty of ways to pursue one and find the experience meaningful and rewarding. The R-1 and R-2 institutions are not necessarily ideal destinations. My full-time gigs have been at institutions that barely emphasize research and that cater to students who may simply want a 2-year degree rather than a 4-year degree. I do some adjuncting at a community college. I find something meaningful in giving away something interesting or useful about the science of psychology (to paraphrase the late George Miller) at my current position. It is, in short, a calling.
In a sense, I think this summer is more one of self-reflection. I honestly have no idea what the final chapter of my academic career will be. I know I have a couple decades to coauthor it. I can say that if the experiences of the last couple years have awakened me sufficiently to focus on the truth at all costs, that's good enough. I was in a dogmatic slumber until a couple years ago regarding a lot of social priming research. I have been sufficiently awakened from that dogmatic slumber (to paraphrase Kant) based on what I have personally had the chance to analyze and based on my reading of the current state of the literature in my area of expertise. I am fortunate enough to work and live in a community that has been nothing short of kind and accepting to me and my family. That too matters a great deal. Let's just say that part of my self-reflection centers on that particular reality as well.
Tuesday, May 28, 2019
The Clampdown: Or How Not to Handle a Crisis of Confidence
Interesting post by Gelman recently. I am on the mailing list from which the quoted email came. It was in reference to revelations about the Zimbardo prison experiment that cast further doubt on its legitimacy. As someone watching HBO's Chernobyl series, I found something almost Soviet in the mindset expressed in that email. The thing about clampdowns is that they tend to generate further cynicism that erodes the edifice upon which a particular discipline or sub-discipline is based. If I could, I'd tell these folks that they are just making the changes they are fighting more inevitable, even if for a brief spell the life of those labeled as dissidents is made a bit more inconvenient.
The title for this and Gelman's post is inspired by a song by The Clash:
Sunday, May 19, 2019
A Little Matter of Data Quality
A quote from Andrew Gelman:
So it’s good to be reminded: “Data” are just numbers. You need to know where the data came from before you can learn anything from them.
If you have followed my blog over the last few months, you have an idea of what I've been going on about, yeah? Numbers mean squat if I cannot trust their source. Think about that the next time someone gives you an improbable claim, offers up some very complex looking tables, figures, and test statistics, and then hopes you don't notice that the tables are a bit odd, the marginal means and cell means don't quite mesh the way they should, or that there were serious decision errors. Beware especially of work coming from researchers who are unusually prolific at publishing findings utilizing methods that would take heroic team efforts to publish at that rate, let alone a single individual. Garbage data give us garbage findings more often than not. Seems like a safe enough bet.
I go on about this because there is plenty of dodgy work in my field. There is reason to be concerned about some of the zombies (i.e., phenomena that should have been debunked that continue to be taught and treated as part of our popular lore) in my field. Stopping the proliferation of these zombies at this point is a multifaceted effort. Part of that effort is making sure we can actually examine the data from which findings are derived. In the meantime, remember rule #2 for surviving a zombie apocalypse (including zombie concepts): the double tap.
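For readers who want to try this sort of spot-checking themselves, here is a rough Python sketch of two of the simplest checks: recomputing a p value from a reported F test (roughly what Statcheck automates) and recomputing a marginal mean from cell means and cell sizes. All of the "reported" numbers in the example are fabricated for illustration, not taken from any particular paper.

```python
# Two quick sanity checks on reported results. All numbers below are
# fabricated for illustration -- they are not from any particular paper.
from scipy.stats import f as f_dist

def check_f_test(f_value, df1, df2, reported_p, alpha=0.05, tol=0.01):
    """Recompute p from a reported F statistic and flag a decision inconsistency
    (recomputed and reported p on opposite sides of alpha), Statcheck-style."""
    recomputed_p = f_dist.sf(f_value, df1, df2)
    inconsistent = abs(recomputed_p - reported_p) > tol
    decision_error = (recomputed_p < alpha) != (reported_p < alpha)
    return recomputed_p, inconsistent, decision_error

def check_marginal_mean(cell_means, cell_ns, reported_marginal, tol=0.01):
    """A marginal mean should equal the n-weighted average of its cell means."""
    implied = sum(m * n for m, n in zip(cell_means, cell_ns)) / sum(cell_ns)
    return implied, abs(implied - reported_marginal) > tol

# Example: a (hypothetical) reported F(1, 118) = 3.20, p = .049.
p, inconsistent, decision_error = check_f_test(3.20, 1, 118, reported_p=0.049)
print(f"Recomputed p = {p:.3f}; decision error: {decision_error}")

# Example: two cell means of 2.10 (n = 60) and 2.90 (n = 60),
# with a (hypothetical) reported marginal mean of 2.75.
implied, mismatch = check_marginal_mean([2.10, 2.90], [60, 60], reported_marginal=2.75)
print(f"Implied marginal mean = {implied:.2f}; mismatch: {mismatch}")
```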
Monday, May 13, 2019
Causus Belli
I know I have been hard on work from Qian Zhang's lab at Southwest University for a while now. I have my reasons. I am mainly concerned with a large number of papers in which there are many serious errors. I can live with the one-off bad article. That happens. What I am reading suggests either considerable incompetence over multiple studies or something far more serious. There is a pervasive pattern of errors that is consistent across multiple articles. That I or any of my colleagues were stonewalled when asking for data is not acceptable given the pattern of results over the course of the last several years. My recent experiences have turned me from "trust in peer review" to "trust but verify." If verification fails, trust goes bye-bye. Just how it is.
Given the sheer quantity of articles and given the increasing level of impact each new article has, I have good cause to be concerned. I am even more concerned given that well-known American and European authors are now collaborators in this research. They have reputations on the line, and the last thing I want for them is to find themselves dealing with corrections and retractions. Beyond that, I can never figure out how to say no to a meta-analysis. The findings in this body of research are ones that I would ordinarily need to include. As of now, I am questioning if I could even remotely hope to extract accurate effect sizes from this particular set of articles. I should never find myself in that position, and I think that anyone in such a position is right to be upset.
Under ordinary circumstances, I am not a confrontational person. If anything, I am quite the opposite. However, when I see something that is just plain wrong, I cannot remain silent. There is a moral and ethical imperative for speaking out. Right now I see a series of articles that contain grave errors, ones that would lead a reasonable skeptic to conclude that the main effects the authors sought (weapons priming, video game priming, violent media priming) never existed. There may or may not be some subset effect going on, but without the ability to reproduce the original findings, there is no way to know entirely for sure. Not being able to trust what I read is extremely uncomfortable. I can live with uncertainty - after all, a certain level of uncertainty is built into our research designs and our data analysis techniques. What I cannot live with is no certainty at all. To take an old John Lennon song (that I probably knew more from an old Generation X cover), "gimme some truth." Is that too much to ask? If so, I will continue to be confrontational.
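To give a sense of what is at stake for a meta-analyst, here is a small Python sketch of the standard conversions one would normally lean on to extract effect sizes from reported F tests; the input values are hypothetical examples, not drawn from any of the articles in question. If the reported test statistics and degrees of freedom cannot be trusted, everything computed from them is suspect as well.

```python
# Standard conversions a meta-analyst uses to pull effect sizes out of
# reported ANOVA results. The inputs below are hypothetical examples.
import math

def partial_eta_squared(f_value, df1, df2):
    """Partial eta-squared from an F test: F*df1 / (F*df1 + df2)."""
    return (f_value * df1) / (f_value * df1 + df2)

def cohens_d_from_f(f_value, df2):
    """Approximate Cohen's d for a two-group comparison (df1 = 1),
    assuming roughly equal group sizes: d = 2 * sqrt(F) / sqrt(df_error)."""
    return 2.0 * math.sqrt(f_value) / math.sqrt(df2)

# Hypothetical reported result: F(1, 118) = 6.25.
f_value, df1, df2 = 6.25, 1, 118
print(f"partial eta^2 = {partial_eta_squared(f_value, df1, df2):.3f}")
print(f"Cohen's d     = {cohens_d_from_f(f_value, df2):.2f}")

# None of this helps if the reported F, dfs, or cell statistics are wrong:
# the effect size inherits every error in the numbers it is computed from.
```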
Thursday, May 9, 2019
A word about undergraduate research projects
I suppose standards about acceptable practices for undergraduate research projects vary from institution to institution and across countries. I do have a few observations of my own, based on an admittedly very, very small sample of institutions in one country.
I have worked at a university where we had the staffing and resources to give students introductory stats training and an introductory methods course - the latter usually taken during the junior year. None of those projects was ever intended for publication, given the small samples involved, and given that students were expected to produce a finished research project in one semester. At my current university, my department requires students to go through an intensive four-course sequence of statistical and methodological training. Students are required to learn enough basic stats to get by, learn how to put together a research prospectus, and then gain some more advanced training (including creating and managing small databases) in statistics and in methodology. The whole sequence culminates with students presenting their finished research on campus. That may seem like a lot, but by the time students are done, they have at least the basics required to handle developing a thesis prospectus in grad school. At least they won't do what I did and ask, "what is a prospectus?" That was a wee bit embarrassing.
Each fall semester, I write a general IRB proposal to cover most of these projects for my specific course sections. That IRB proposal is limited to a specific on-campus adult sample and to minimal risk designs. None of those projects covered under that general IRB proposal are intended for publication. Students wanting to go the extra mile need to complete their own IRB forms and gain approval. Students who are genuinely interested in my area of expertise go through the process of completing their own IRB proposals and dealing with any revisions, etc., before we even think of running anything.
Only a handful of those projects have produced anything that my students wished to pursue for publication. To date, I have one successfully published manuscript with an undergrad, one that was rejected (it was an interesting project, but admittedly the sample was small and the findings too inconclusive), and one that is currently under review. That these students were coauthors means that they contributed significantly to the write-up. That means my peers could grill them at presentation time and they could give satisfactory answers. They knew their stuff. And the reason they knew their stuff is because I went out of my way to make sure that they were mentored as they made those projects their own. I made sure that I had seen the raw data and worked together with each student to make sure data were analyzed correctly. I stick to fairly simple personality-social projects in those cases, as that is my training. That's just how we roll.