To preface this: studying cyberbullying and examining how bystanders respond to cyberbullying if they notice it is a worthwhile endeavor. I'd say the same about bullying research in general. That said, even when the endeavor is generally worthy, one still needs to be on the lookout when reading original empirical research reports.
Let's just dive right in, starting with a link to the paper, so you can read it for yourself. Next I will provide the link to the comments on PubPeer.
Now, this is a decade-old article at this point. In social psychology, that makes it ancient history to some. But here's the thing: no matter how much time passes, this is the sort of article that will end up included in relevant meta-analyses. If something is amiss in the original report, it will influence the effect size estimates and the estimates of publication bias. That's where my concern is focused.
If you look at PubPeer, there are several comments on the article, dating back to August 2022. Admittedly, I don't use PubPeer as much as I would like, and sometimes I miss something that would concern me. That's my reminder to start using the PubPeer browser extension again. Comments 1 and 3 are arguably the more substantive ones; the last two merely reinforce what the initial commenter observed.
What were those comments? Let's review, starting with the first:
I have a few questions about this well-cited paper.
1. The abstract states: "Most participants (68%) intervened indirectly after the incident and threat were removed", but I cannot find this statistic or the n frequencies to calculate this statistic anywhere in the results.
2. Table 1 has the heading "Noticed cyberbullying" and then the subheading "Yes" and "No", implying that the columns are to compare those that noticed the cyberbullying versus not. However, the ns listed correspond to how many participants directly intervened regardless of whether they noticed cyberbullying or not.
3. I inputted some of the values provided into t-test calculators, which resulted in slightly different statistics. Due to the lack of information on the analytic plan/tests used and information about missing data/data normality it is difficult to discern the causes of these discrepancies.
4. 8% of the sample was omitted from analyses for being "suspicious" but no other information is provided. It is difficult to assess the quality of the sample or replicate the paper without further clarifications on how so many participants were determined to be "suspicious".
And here are some screenshots with highlights in yellow to point out what appeared to be problematic:
You can enlarge the images by clicking on them. I will admit that, like the initial commenter, I found the article's prose a bit challenging to follow. Some of that can probably be chalked up to the many edits and re-edits that happen during the peer review process. So it goes. But there are some apparent typos that can't be chalked up to the inevitable compromises made in an effort to satisfy a reviewer or an editor.
The first criticism, which concerns the abstract's claim that 68% of participants intervened indirectly, is a fair one. The statistic gets thrown in there, but figuring out where it came from was not exactly straightforward. Eventually, buried in the text, I was able to suss out that of the 221 participants included in the analysis, 150 noticed that bullying behavior was happening, which yields 67.87% of the total sample (or 68% for short). Keep in mind, though, that this is the percentage who noticed the bullying, which would not necessarily be the same as the percentage who intervened indirectly. Those who did not notice the bullying would, by definition, only be able to intervene indirectly, if that makes sense. So let's see what happens when we set aside the 71 individuals who did not notice any bullying. If 127 participants noticed the bullying but intervened only indirectly and 23 intervened directly, the percentage of noticers using only indirect interventions is about 84.7% (127 of 150). That percentage goes even higher if we use the whole sample. Either way, I am unable to find where the 68% figure comes from. So there's that.
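To make the arithmetic concrete, here is a quick sketch reproducing those percentages from the counts reported in the article (221 analyzed, 150 who noticed; of those, 127 intervened only indirectly and 23 directly). The whole-sample figure assumes, per the reasoning above, that the 71 non-noticers could only have intervened indirectly:

```python
# Percentages discussed above, computed from the counts reported in the article.
n_total = 221            # participants included in the analysis
n_noticed = 150          # noticed the bullying
n_indirect_noticed = 127 # noticed, intervened only indirectly
n_direct = 23            # noticed, intervened directly

print(round(100 * n_noticed / n_total, 2))             # 67.87 -- the likely source of the abstract's "68%"
print(round(100 * n_indirect_noticed / n_noticed, 2))  # 84.67 -- indirect-only among those who noticed
print(round(100 * (n_total - n_direct) / n_total, 2))  # 89.59 -- indirect-only if the 71 non-noticers are included
```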
Table 1 arguably could have been constructed better, and I can see how a casual or even careful reader of the article could find themselves doing a double take. Looking at the labels, a reader could easily be misled into thinking that a whopping 198 participants did not notice any bullying, which is not what the authors communicate elsewhere in the article. Not great. There is also a Chi-Square analysis in that table that does not seem to appear anywhere else in the manuscript. In the narrative relevant to Table 1, another Chi-Square is mentioned right after a beta weight, each with an identical p-value. That should have been explained to the article's readers much better than it was. It strikes me as sufficiently obvious that folks are much more likely to directly intervene if they notice something occurring than if they don't, but where the claim that they are 4.62 times more likely comes from is never quite explained. My guess is that the authors tallied the direct interventions among participants who noticed the bullying and among those who didn't (which could have happened given the way the study was set up) and then did some quick back-of-the-napkin division to sort that out. We just don't get to know what those numbers were. Okay. I said that, and now it is time to let go.
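For what it's worth, here is the kind of back-of-the-napkin calculation that could produce a "times greater" figure. The split for the non-noticers is entirely made up, since the article never reports it; this only illustrates the arithmetic, not the authors' actual analysis:

```python
import math

# Hypothetical 2x2 breakdown of direct intervention by whether the bullying was noticed.
# The 23/150 split comes from the counts discussed above; the 2/71 split is invented.
direct_noticed, n_noticed = 23, 150        # reported: direct intervenors among those who noticed
direct_unnoticed, n_unnoticed = 2, 71      # hypothetical: direct intervenors among those who did not

rate_ratio = (direct_noticed / n_noticed) / (direct_unnoticed / n_unnoticed)
print(round(rate_ratio, 2))        # ~5.44 with these made-up counts, not the paper's 4.62

# Another possibility: if 4.62 were an exponentiated logistic-regression coefficient
# (an odds ratio), it would correspond to a beta of about log(4.62).
print(round(math.log(4.62), 2))    # ~1.53
```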
The initial PubPeer commenter noted having difficulty reproducing the t-test results from the information made available in the manuscript and table. That should not have happened. One problem with the t-tests tied to Table 1 is that the test statistics appear positive in the table but are reported as negative in the manuscript. That is a bit jarring. Add to that a problem another commenter found: two of the cell means reported in Table 1 do not match the corresponding cell means in the manuscript itself. (I will insert the image highlighting the discrepant means in the manuscript below, so you can compare them to Table 1 above.) Mismatches like that would make reproducing a t-test challenging. I ran a SPRITE analysis on the cell means and standard deviations, assuming that the indirect interventions were scored on either a 1-to-4 or 0-to-3 Likert scale (depending on the question), that each intervention was measured with a single item (I saw no information to the contrary in the manuscript), and that all 221 participants were included in those analyses (150 who noticed the bullying vs. 71 who did not). With those inputs, two of the cells appeared to report standard deviations that are mathematically impossible. Given that a t-test calculator needs accurate means and standard deviations, that is troubling.
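If you're wondering how a standard deviation can be flagged as impossible without the raw data, here is a minimal sketch of the kind of bounds check involved. This is not the full SPRITE procedure, just the Bhatia-Davis inequality, which caps how large a standard deviation can be for bounded data with a given mean; the values in the example are placeholders rather than the actual cells from Table 1:

```python
import math

# Upper bound on the sample SD of n observations confined to [lo, hi] with a given mean,
# via the Bhatia-Davis inequality (population variance <= (hi - mean) * (mean - lo)),
# rescaled to the n - 1 denominator that reported SDs typically use.
def max_possible_sd(mean: float, n: int, lo: float, hi: float) -> float:
    return math.sqrt((hi - mean) * (mean - lo) * n / (n - 1))

def sd_is_feasible(mean: float, sd: float, n: int, lo: float, hi: float) -> bool:
    return sd <= max_possible_sd(mean, n, lo, hi)

# Placeholder values on a 0-to-3 scale for the n = 71 who did not notice the bullying:
print(sd_is_feasible(mean=0.40, sd=1.90, n=71, lo=0, hi=3))  # False: no data on that scale can produce this
print(sd_is_feasible(mean=1.50, sd=0.90, n=71, lo=0, hi=3))  # True: within the bound
```

SPRITE proper goes further and searches for actual integer samples that reproduce the reported mean and SD, but a simple bound like this is enough to flag the clearest impossibilities.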
A later commenter highlighted specific problems with two F-test results, based on a Statcheck scan. Statcheck recomputes a p-value from the reported test statistic and degrees of freedom with a reasonable degree of precision. The two F-tests tied to Table 3 report p-values that differ slightly from what Statcheck computed. That's not the end of the world. Typos happen, and in any event one would still reject the null, as the authors did. What is odd is the discrepancy between the degrees of freedom reported in the manuscript and those reported in Table 3.
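Recomputing a p-value from a reported F statistic is easy to do yourself, which is essentially what Statcheck automates. A minimal sketch, with placeholder inputs rather than the values from Table 3:

```python
from scipy import stats

# Upper-tail p-value for an F statistic with the given numerator and denominator
# degrees of freedom. The inputs below are placeholders, not the article's values.
def p_from_f(f_value: float, df_num: int, df_den: int) -> float:
    return stats.f.sf(f_value, df_num, df_den)

print(p_from_f(4.25, 1, 219))  # compare a value like this to the p reported in the paper
```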
I get the feeling that a lot of the apparent errors could be cleared up with access to the original data set, if it still exists, along with the analysis plan. Unfortunately, as the initial commenter on PubPeer noted, there did not appear to be any analysis plan reported. And it goes without saying that the article in question would have been published at a time when journal editors were still fairly laissez-faire about publicly archiving data. So I am having to trust that somewhere the original data used to compute the analyses exists and is accurate. That said, it's been a decade since publication and likely considerably longer since the study was originally conducted. It's been said by others far wiser than me that we psychologists have historically been poor stewards of our original data. If the original data and analysis plan haven't been deleted, I'd be pleasantly surprised.
The initial commenter's last remarks regarded the omission of part of the original sample. Apparently, 8% of the participants were deemed "suspicious" and removed from the analyses. I am assuming the authors meant that these participants guessed the hypothesis and hence were suspicious in that sense. If they were "suspicious" in any other way, that should be specified.
The lead author did reply once to the initial critique. I will definitely buy the contention that space limitations in journals can lead to some unfortunate decisions about what to include in and exclude from the narrative. Fair enough. If you've published, you've probably experienced something similar. We get it. But I did feel that the lead author more or less glossed over the initial commenter's concerns and instead focused on how "rigorous" and "replicable" the study was. The initial commenter did not buy it. Nor would I. The lead author's reply seemed almost defensive, which is understandable: none of us likes being told that we might have made mistakes that need to be corrected.
As of this date, I have not seen an erratum or corrigendum for this article, so I am guessing that after the lead author more or less dismissed the initial concerns, there was no further escalation. That's a shame. I would have had a lot more confidence in the findings if the authors had either corrected any errors or demonstrated tangibly that any concerns were not warranted. Hey, the actual data set and the analysis plan would have gone a long way to sorting this all out. So far? Crickets.
As the late Kurt Vonnegut would say, "so it goes."