Saturday, May 31, 2025

Speaking of zombies: when public officials use AI to write their papers for them

This is embarrassing. Yes, it appears that the Make America Healthy Again (MAHA) Commission report was riddled with errors, including some non-existent citations and misrepresentations of other cited studies in its narrative. My initial impulse when I first read about the errors was that the report was probably largely "written" with AI rather than by the report's authors themselves. It turns out my gut feeling was right on the money. The Washington Post has an article (paywalled, unfortunately) showing a fair number of AI's fingerprints on the cited references. I can't say I am really surprised. These fools apparently expected that no one would be the wiser. Remember that old saying, "you can fool some of the people some of the time but not all of the people all of the time"? I'd reckon that saying is applicable here. 

If the individuals responsible for this MAHA Commission report were students in one of my classes and I noticed the apparent AI influence, I would have no choice but to assign the paper a 0, citing plagiarism concerns - with examples clearly identified. Regrettably, these are not my students but rather public officials who wield a great deal of power over our health outcomes in the US, and it is highly probable that the base supporting the current White House administration won't care about that little matter of academic integrity. This is not the first time a high-profile public figure has been caught using AI to do their work, and it is safe to say it won't be the last. I am asking those in the scientific community and the general public to be vigilant and to call out this form of fraud when they see it. That is especially crucial when the documents generated by AI are going to be used to make decisions that affect all of our health outcomes.

Question: Can we rid science of retracted articles?

I saw this opinion article on the problem of zombie papers (i.e., retracted papers that continue to get cited and have influence) and some possible solutions, thanks to Retraction Watch. You can read the original in French here. Marc Joets makes some useful points. Yes, we definitely have a problem. Part of the problem is that it takes a long time, on average, to get a problematic article (i.e., one marred by error-ridden data reporting, fraudulent data reporting, or plagiarism) retracted. From date of publication to date of retraction, you can probably expect about three years, give or take, before a problematic paper is pulled. There are also regional differences in how long retraction takes: journals based in the US and western Europe retract more rapidly than those elsewhere. Subscription-based journals tend to retract articles more quickly once a problem is identified than open access journals do. In other words, we need to keep in mind that there may be variations in culture and editorial practices at play. 

I've mentioned retractions and zombie articles before. In our various scientific fields, zombies are a legitimately concerning problem. As long as they are cited, they risk infecting not only the specific scientific discipline in question but also public discourse and policy. If you are living in the US right now, you probably know that a fraudulent and retracted article that spread outright lies about the safety of childhood vaccinations has, in a matter of a couple of decades, helped mainstream an anti-vax movement that is now in control of our federal public health agencies. In this case, the consequences are life and death, as the government is no longer as interested in containing a deadly measles outbreak. In my corner of the scientific community, the stakes may be considerably lower, but zombie articles can still infect public discourse and policy in ways that are not in the public interest. 

So, what to do? The answers in this editorial strike me as common sense at this point. Making data and research protocols publicly available can help catch mistakes and fraud early enough to nip the problem in the bud. Better plagiarism detection tools are mentioned as well. Ultimately, the author notes that there is no one-size-fits-all solution. But in broad brushstrokes, we can expect that efforts to beef up transparency will help. Efforts to improve reproducibility - requiring pre-registration of research protocols and offering evidence of replicability - are also necessary. Any of these practices can help detect errors or problems in a more timely manner. In addition to better plagiarism detection tools, the author suggests that each journal have its own panel that can objectively handle instances of fraud or serious errors as they occur. Finally, the author argues for making retractions more visible in order to minimize the impact of retracted work. That strikes me as a solid idea. I still get the sense that retractions are not nearly as visible as they could be. Those with the PubPeer browser extension might be a bit wiser to retractions, as are those who use the Retraction Watch site's retraction database. But how many of us are actually using those resources currently and consistently? I wonder. 

I wish that instructions for eliminating zombies from our sciences were as simple as "destroy the brain or remove the head",* but alas they are not. I remain a cautious optimist, however.

*That quote refers specifically to "Shaun of the Dead", a personal favorite of mine, but it probably applies to most zombie films and series I've seen over the years. 

 

Saturday, May 10, 2025

A cyberbullying paper with some concerns

To preface this: studying cyberbullying and examining how bystanders respond to it when they notice it is a worthwhile endeavor. I'd say the same about bullying research in general. That said, even when the endeavor is generally worthy, one still needs to be on the lookout for problems when reading original empirical research reports. 

Let's just dive right in, starting with a link to the paper, so you can read it for yourself. Next I will provide the link to the comments on PubPeer.

Now, this is a decade-old article at this point. In social psychology, that makes it ancient history to some. But here's the thing: no matter how much time passes, this is the sort of study that will end up included in relevant meta-analyses. If something is amiss in the original report, it will have some influence on effect size estimates and on estimates of publication bias. That's where my concern is focused. 

If you look at PubPeer, there are several comments on the article, dating back to August 2022. Admittedly, I don't use PubPeer as much as I would like, and sometimes I miss something concerning. That's my reminder that I should go ahead and start using the PubPeer browser extension again. Comments 1 and 3 are arguably the more substantive of the comments. The last two merely reinforce what the initial commenter observed.

What were those comments? Let's review, starting with the first:

I have a few questions about this well-cited paper.

1. The abstract states: "Most participants (68%) intervened indirectly after the incident and threat were removed", but I cannot find this statistic or the n frequencies to calculate this statistic anywhere in the results.


2. Table 1 has the heading "Noticed cyberbullying" and then the subheading "Yes" and "No", implying that the columns are to compare those that noticed the cyberbullying versus not. However, the ns listed correspond to how many participants directly intervened regardless of whether they noticed cyberbullying or not.


3. I inputted some of the values provided into t-test calculators, which resulted in slightly different statistics. Due to the lack of information on the analytic plan/tests used and information about missing data/data normality it is difficult to discern the causes of these discrepancies.


4. 8% of the sample was omitted from analyses for being "suspicious" but no other information is provided. It is difficult to assess the quality of the sample or replicate the paper without further clarifications on how so many participants were determined to be "suspicious".

And here are some screenshots with highlights in yellow to point out what appeared to be problematic:

[Screenshots from the article, with the problematic passages highlighted in yellow]
You can enlarge the images by clicking on them. I will admit that, like the initial commenter, I found the article's prose a bit challenging to follow. Some of that can probably be chalked up to the many edits and re-edits that happen during the peer review process. So it goes. But there are some apparent typos that cannot be chalked up to the inevitable compromises made in an effort to satisfy a reviewer or an editor. 

The first criticism - that the abstract's claim that 68% of participants intervened indirectly is not supported anywhere in the results - is a fair one. The statistic gets thrown in there, but figuring out where it came from was not exactly straightforward. Eventually I was able to suss out that of the 221 participants included in the analysis, 150 noticed that bullying behavior was happening, which works out to 67.87% of the total sample (or 68% for short). But that is the percentage who noticed the bullying, not necessarily the percentage who intervened indirectly; there is no reason those two figures would be the same. Those who did not notice the bullying would, by definition, only be able to intervene indirectly, if that makes sense. So let's see what happens when we remove the 71 individuals who did not notice any bullying. If 127 participants noticed the bullying but only intervened indirectly and 23 intervened directly, the percentage of noticers who only used indirect interventions is about 85% (127 of 150). That percentage goes up even further if we use the whole sample. Either way, I am unable to get 68% as the share who intervened indirectly. So there's that. 
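To make the arithmetic concrete, here is a quick back-of-the-envelope check in Python, using the counts discussed above as best I can reconstruct them. None of the intervention percentages lands on the abstract's 68%; the only figure that does is the share of the sample who simply noticed the bullying.

    # Counts as discussed above (reconstructed from the article and the PubPeer thread)
    total = 221          # participants retained for analysis
    noticed = 150        # noticed the cyberbullying
    not_noticed = 71     # did not notice it
    indirect_only = 127  # noticed but intervened only indirectly
    direct = 23          # noticed and intervened directly

    print(f"Noticed / total sample:       {noticed / total:.1%}")          # ~67.9%, the abstract's "68%"
    print(f"Indirect-only / noticed:      {indirect_only / noticed:.1%}")  # ~84.7%
    # Treating non-noticers as indirect-only, per the reasoning above:
    print(f"Indirect-only / total sample: {(indirect_only + not_noticed) / total:.1%}")  # ~89.6%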

Table 1 could arguably have been constructed better, and I can see how a casual or even careful reader of the article could find themselves doing a double take. Looking at the labels, a reader could easily be misled into thinking that a whopping 198 participants did not notice any bullying, which is not what the authors communicate elsewhere in the article. Not great. There is also a chi-square analysis in that table that does not seem to appear anywhere else in the manuscript. We'll come back to Table 1 in a moment. Going back to the narrative relevant to Table 1, another chi-square is mentioned right after a beta weight, each with an identical p-value. That should have been explained to the article's readers much better than it was. It strikes me as sufficiently obvious that folks are much more likely to directly intervene if they notice something occurring than if they don't, but where the claim that they were 4.62 times more likely to do so comes from is never quite explained. My guess is that the authors compared the number of direct interventions among participants who noticed the bullying with the number among participants who didn't (which could have happened, given the way the study was set up), and then did some quick back-of-the-napkin division, as sketched below. We just don't get to know what those numbers were. Okay. I said that, and now it is time to let go.
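If that guess is right, the arithmetic would look something like the following. To be clear, the counts here are entirely made up for illustration, since the article does not report them, and they do not reproduce the 4.62 figure.

    # Entirely hypothetical counts, for illustration only (the article does not report these)
    direct_among_noticers = 30        # directly intervened after noticing the bullying
    n_noticers = 150
    direct_among_non_noticers = 7     # directly intervened without having noticed it
    n_non_noticers = 71

    rate_noticers = direct_among_noticers / n_noticers              # 0.20
    rate_non_noticers = direct_among_non_noticers / n_non_noticers  # ~0.10
    print(rate_noticers / rate_non_noticers)  # ratio of proportions; ~2.0 with these made-up counts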

The initial PubPeer commenter noted having difficulty reproducing the t-test results with the information made available in the manuscript and table. That should not have happened. One problem with the t-tests tied to Table 1 is that they appear positive in the table but are reported as negative in the manuscript. That is a bit jarring. Add to that a problem another commenter found: two of the cell means reported in Table 1 do not match the corresponding cell means in the manuscript itself. That would make reproducing a t-test challenging. I ran a SPRITE analysis on the cell means and standard deviations, assuming that the indirect interventions were scored on either a 1-to-4 or 0-to-3 Likert scale (depending on the question) and that each intervention was measured with a single item (I saw no information to the contrary in the manuscript). I also assumed that all 221 participants were included in those analyses (150 who noticed the bullying vs. 71 who did not). With those inputs, two of the cells appeared to report standard deviations that were mathematically impossible. To the extent that a t-test calculator needs accurate mean and standard deviation information, that is troubling. (I sketch what that kind of feasibility check looks like below the image.) I will insert the image highlighting the discrepant means in the manuscript so you can compare them to Table 1 (above):

[Screenshot of the manuscript passage with the discrepant means highlighted]

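For the curious, here is a minimal sketch of the kind of feasibility check that SPRITE-style tools perform, under the same assumptions I used (integer responses to a single bounded Likert item, the reported group sizes). It computes the lowest and highest sample SDs achievable for a given reported mean; a reported SD outside that range cannot have come from data of that form. The example values at the bottom are hypothetical, not the article's actual cell statistics, and a full SPRITE search would also account for rounding of the reported mean.

    import statistics

    def sd_feasibility_bounds(mean, n, scale_min, scale_max):
        # Implied sum of n integer responses with the reported mean
        # (simplification: a full SPRITE check would try every sum
        # consistent with the rounded mean, not just this one).
        total = round(mean * n)
        if not (n * scale_min <= total <= n * scale_max):
            return None  # the mean itself is impossible on this scale

        # Minimum spread: pack all responses onto the two integers bracketing the mean.
        lo = total // n
        k = total - lo * n  # how many responses must sit at lo + 1
        vals_min = [lo] * (n - k) + [lo + 1] * k

        # Maximum spread: push responses to the scale endpoints, with at most
        # one interior value needed to hit the required sum exactly.
        span = scale_max - scale_min
        at_max = (total - n * scale_min) // span
        residual = (total - n * scale_min) - at_max * span
        if at_max == n:
            vals_max = [scale_max] * n
        else:
            vals_max = [scale_max] * at_max + [scale_min + residual] + [scale_min] * (n - at_max - 1)

        return statistics.stdev(vals_min), statistics.stdev(vals_max)

    # Hypothetical illustration (NOT the article's actual values): a cell reported
    # as M = 2.10, SD = 1.60 with n = 71 on a 1-to-4 scale.
    low, high = sd_feasibility_bounds(2.10, 71, 1, 4)
    print(f"Achievable SD range: {low:.2f} to {high:.2f}")  # an SD of 1.60 falls outside it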
A later commenter highlighted some specific problems with two F-test results, based on a Statcheck scan. Statcheck can reproduce a p-value from the reported test statistic and degrees of freedom with a reasonable degree of precision. The two F-tests tied to Table 3 have slightly different p-values from what Statcheck computed. That's not the end of the world. Typos happen, and in any event one would still reject the null, as the authors did. What is odd is the discrepancy between the degrees of freedom reported in the manuscript and those reported in Table 3. 
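For readers unfamiliar with what Statcheck does under the hood, the core check is simple: recompute the p-value from the reported test statistic and degrees of freedom and compare it to the reported p-value. Statcheck itself is an R package; here is a rough Python equivalent of that single step, using made-up numbers rather than the ones from the paper.

    from scipy import stats

    # Hypothetical report (not from the article): F(1, 219) = 6.54, p = .010
    f_value, df1, df2 = 6.54, 1, 219

    recomputed_p = stats.f.sf(f_value, df1, df2)  # upper-tail probability of the F distribution
    print(f"Reported p = .010; recomputed p = {recomputed_p:.3f}")  # ~.011, a small mismatch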

I get the feeling that a lot of the apparent errors could be cleared up with access to the original data set, if it still exists, along with the analysis plan. Unfortunately, as the initial commenter on PubPeer noted, there did not appear to be any analysis plan reported. And it goes without saying that the article in question would have been published at a time when journal editors were still fairly laissez-faire about publicly archiving data. So I am having to trust that somewhere the original data used to compute the analyses exists and is accurate. That said, it's been a decade since publication and likely considerably longer since the study was originally conducted. It's been said by others far wiser than me that we psychologists have historically been poor stewards of our original data. If the original data and analysis plan haven't been deleted, I'd be pleasantly surprised. 

The initial commenter's last remark concerned the omission of part of the original sample. Apparently, some participants were deemed "suspicious" and were removed from the analyses. I am assuming the authors meant that the participants in question had guessed the hypothesis and hence were suspicious in that sense. If they were "suspicious" in some other way, that should be specified. 

The lead author did reply once to the initial critique. I will definitely buy the contention that space limitations in journals can lead to some unfortunate decisions about what to include in and exclude from the narrative. Fair enough. If you've published, you've probably experienced something similar. We get it. But I did feel that the lead author more or less glossed over the initial commenter's concerns and instead focused on how "rigorous" and "replicable" the study was. The initial commenter did not buy it. Nor would I. The lead author's reply seemed almost defensive, which is understandable - none of us like to be told that we might have made some mistakes that need to be corrected. 

As of this writing, I have not seen an erratum or corrigendum for this article, so I am guessing that after the lead author more or less dismissed the initial concerns, there was no further escalation. That's a shame. I would have a lot more confidence in the findings if the authors had either corrected any errors or demonstrated tangibly that the concerns were unwarranted. Hey, the actual data set and the analysis plan would have gone a long way toward sorting this all out. So far? Crickets.

As the late Kurt Vonnegut would say, "so it goes."

Wednesday, May 7, 2025

A milestone of sorts

It appears that as of today, this humble blog has had 250,000 visitors. It only took all of 13 years or so for that to happen, but then again, I am a relatively obscure researcher, so I am not complaining. Quite the contrary: I am grateful. Thanks for dropping by and trusting me to provide as honest and accurate an account of my specialty area as is humanly possible. I'm not going anywhere. I'll have more in the upcoming months and years.

Tuesday, May 6, 2025

Postscript to the preceding

I have been spending a bit more time looking at the Replications and Reversals site I mentioned yesterday. It's worth the time. Of course I look for anything in my area of specialization, so I was heartened to see what strikes me as an accurate take on the weapons priming effect:

[Screenshot of the site's entry on the weapons priming effect]
I will note that the meta-analysis on which I was lead author is summarized fairly, but I would add that if one looks at behavioral outcomes only, the effect size is even smaller, and when publication bias is factored in (depending on the method used to detect it), the effect size is pretty much zero. That said, referring to the evidence for the weapons priming effect as mixed is a reasonable assessment. At minimum, the effect is indeed smaller than what was once believed.

Monday, May 5, 2025

Replications and Reversals

A few years ago, I wrote about a blog post documenting reversals in psychology. In the intervening years, it has grown into a full-fledged, quite comprehensive site hosted by FORRT (Replications and Reversals). If you are interested in which classic findings have held up and which have at best mixed evidence or have been debunked, this is a valuable resource. Every time I revise my social psychology course, this site will give me more ammo to make sure my students have the most accurate information available.