Friday, February 23, 2018

Post-script to "Worth Your Consideration"

A couple weeks ago, I mentioned Erin Bartram's blog post that explained the structural problems facing so many scholars, and which led her to say it was time to exit the field. Her simple blog post struck a nerve. You can now read an interview with her in The Chronicle of Higher Education. I would recommend doing so. The current state of the academic environment in the US is not sustainable, and we are losing out on some talent in the process.

Thursday, February 22, 2018

A cautionary remark about meta-analyses

I've been wanting to share a few thoughts about meta-analysis for a while, and am relieved to finally have some spare moments here and there to do so. I hope they will be somewhat useful, since I both consume and produce meta-analyses. I have been involved in two meta-analyses that have been accepted for publication (the most recent, on which I am primary author, is currently in press) and one that is currently in progress. My sense of the positives and negatives of meta-analysis is based very much on those hands-on experiences.

I will start off by simply stating the obvious: our field has a serious publication bias problem. The causes of that problem are multiple and need not be litigated here. The bottom line remains, though, that a sample of studies in a meta-analysis is just that - a sample. We honestly do not have access to the population of studies testing a particular hypothesis, and hence any effect size estimate we calculate should not be taken at face value. So, some means of estimating publication bias effects (these techniques fall under the broad umbrella of sensitivity analyses) is necessary to give us a better understanding of the literature we are studying. The problem is that right now we do not seem to agree on which techniques for estimating publication bias should be used. Well, I do think we agree that the fail-safe N can be safely dispensed with. Beyond that, disagreements abound. For much of this century, trim-and-fill analyses and funnel plots have been the gold standard. However, that set of techniques has come under attack for underestimating the effects of publication bias, leading meta-analysts to be a bit too sanguine in their conclusions. A variety of alternatives have been offered, many of which have only been in use for a handful of years (PET-PEESE and p-curves come to mind). Meta-analyses using these techniques can lead to different conclusions than those using trim-and-fill analyses. The two meta-analyses recently published on ego depletion, based on the same sample of studies, come immediately to mind. Depending on whether one reads the meta-analysis whose authors rely on trim-and-fill analyses or the one whose authors rely on PET-PEESE, one may draw different conclusions about the state of ego depletion research.
A reader of the former may conclude that ego depletion research is alive and well, whereas a reader of the latter may conclude that it has been thoroughly debunked. Although the authors of those meta-analyses appear to me to be working in good faith, it would not take much to imagine meta-analysts with axes to grind choosing their favored sensitivity analysis techniques based on their severity, and hence on whether the techniques in question will "support" or "debunk" a particular body of research. That worries me a great deal. For now I recommend using a battery of techniques to estimate publication bias (including trim-and-fill and PET-PEESE) and examining the extent to which these techniques appear to triangulate around a "true" effect size for a particular distribution. My reason for doing so is largely the ideal of maintaining the sort of objectivity that meta-analysis promised when it was introduced as an alternative to the narrative literature review. Anything short of that would put us back where we were when narrative reviews were the only way of assessing a literature: highly subjective and based upon the whims of the reviewer or reviewers. That is not a road we want to travel.
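
For readers who have not seen PET-PEESE in action, here is a minimal Python sketch of its two regressions. PET regresses each study's d on its standard error, and PEESE regresses d on its variance, both weighted by inverse variance; the intercept of each line is read as the effect expected from a hypothetical study with no sampling error. The function names, the made-up data, and the decision rule (report PEESE only when the PET intercept differs from zero by a normal-approximation test at alpha = .05, two-tailed) are assumptions for illustration; published applications vary in the exact rule, and a real analysis would lean on a dedicated package such as metafor in R. It also covers only one member of the battery, since trim-and-fill and p-curves are not included.

```python
import math


def wls_line(x, y, w):
    """Weighted least-squares fit of y = b0 + b1 * x.

    Returns (b0, se_b0). The residual variance is estimated from the data
    with k - 2 degrees of freedom, as in an ordinary weighted regression.
    Needs at least three studies and some spread in x.
    """
    k = len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid_ss = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    sigma2 = resid_ss / (k - 2)
    se_b0 = math.sqrt(sigma2 * (1.0 / sw + xbar ** 2 / sxx))
    return b0, se_b0


def pet_peese(d, v, z_crit=1.96):
    """Hypothetical helper: PET and PEESE intercepts plus a conditional pick.

    d: effect sizes (e.g., Cohen's d); v: their sampling variances.
    """
    w = [1.0 / vi for vi in v]            # inverse-variance weights
    se = [math.sqrt(vi) for vi in v]      # standard errors
    pet_b0, pet_se = wls_line(se, d, w)   # PET: d regressed on SE
    peese_b0, _ = wls_line(v, d, w)       # PEESE: d regressed on variance
    # Assumed rule: report PEESE only if the PET intercept differs from zero
    # (normal approximation, two-tailed alpha = .05 by default).
    use_peese = abs(pet_b0 / pet_se) > z_crit
    return {
        "pet": pet_b0,
        "peese": peese_b0,
        "used": "PEESE" if use_peese else "PET",
        "estimate": peese_b0 if use_peese else pet_b0,
    }


# Made-up data in which smaller studies (larger v) report larger effects
d = [0.80, 0.60, 0.50, 0.30, 0.25]
v = [0.09, 0.06, 0.04, 0.02, 0.01]
print(pet_peese(d, v))
```

Comparing the PET and PEESE intercepts with the naive pooled mean, and with trim-and-fill output from other software, is one concrete way to see whether the techniques triangulate.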

Sunday, February 11, 2018

Worth your consideration

The Sublimated Grief of the Left Behind

I follow quite a number of scholars on Twitter, and periodically I see retweets of posts that fall under the broad umbrella of "quit lit." This post is a bit different, and I hope that her perspective offers some much-needed food for thought. As someone who has lost talented colleagues to the circumstances the author describes, I found that this post hit close enough to home to bear mentioning.

Friday, February 2, 2018

Never treat a meta-analysis as the last word

I mentioned earlier that any individual meta-analysis should never be treated as the last word. Rather, it is best to treat a meta-analytic study as a tentative assessment of the state of a particular research literature at that particular moment. One obvious reason for my stance simply comes down to the sample of studies available to test a particular hypothesis at any given time. Presumably, over time, more studies that attempt to replicate the hypothesis test in question will be conducted and, ideally, reported. In addition, search engines are much better at detecting unpublished studies (what one of my mentors referred to as the "fugitive literature") than they once were. That is partly due to technological advances and partly due to individuals making their unpublished work (especially null findings) more available for public consumption. To the extent that is the case, we would want to see periodically updated meta-analyses that account for these newer studies.

The second obvious reason is that meta-analysis itself is evolving. The techniques for synthesizing studies addressing a particular hypothesis are much more sophisticated than when I began my graduate studies, and are bound to continue to become more sophisticated going forward. The techniques for estimating mean effect sizes are more sophisticated, as are the techniques for estimating the impact of publication bias and outlier effects. If anything, recent meta-analyses are alerting us to what should have been obvious a long time ago: we have a real file drawer problem, and the failure to publish null findings or findings that are considered no longer "interesting" is leading us to have a more rose-colored view of our various research literatures than is warranted. Having said that, it is also very obvious that since we cannot quite agree among ourselves as to what publication bias analyses are adequate, and these techniques themselves can potentially yield divergent estimates of publication bias, it is best to use some battery of publication bias effect estimation techniques for the time being.

Finally, there is my nagging concern that once a meta-analysis is published, if it is treated as the last word, future research on that particular question may effectively cease. Yes, some isolated investigators will continue to conduct research, but with much less hope of their work being given its due than it otherwise might have received. I suspect that we can look at research areas where a meta-analysis has indeed become the proverbial "last word" and find evidence that this is exactly what transpired. Given reasons one and two above, that would be concerning, to say the least. There is at least one research literature with which I am intimately familiar where I suspect one very important facet of that literature effectively halted after what became a classic meta-analysis was published. At some point in the near future, I will turn to that research literature.

Monday, January 29, 2018

Cross-validating in meta-analysis

I thought I'd share a couple of techniques I've picked up that are useful for cross-validation purposes. Keep in mind that the meta-analyses I am interested in involve experimental designs, so what I offer may or may not work for your particular purposes.

If you are estimating Cohen's d from between-subjects designs, the following formula for estimating N is recommended:


Here you simply need to know your estimate of d and its variance (v) for a particular comparison. If the N you recover from the formula closely matches the sample size reported for that comparison, you can be confident that your estimate of d is in the ballpark. Note that this formula works best when your treatment and control groups have equal sample sizes; with unequal sample sizes it will not yield accurate estimates of N.
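
The formula itself is missing from this post. One standard way to obtain such a formula, assuming the usual large-sample variance of d for two independent groups and equal groups (n1 = n2 = N/2), is to solve that variance for N:

$$v = \frac{n_1 + n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2)} = \frac{4}{N} + \frac{d^2}{2N} \quad\Rightarrow\quad N = \frac{8 + d^2}{2v}$$

A minimal Python sketch of the same check follows; the function names are hypothetical.

```python
def variance_d_between(d, n1, n2):
    """Standard large-sample variance of Cohen's d for two independent groups."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2.0 * (n1 + n2))


def n_from_d_between(d, v):
    """Recover total N from d and its variance, assuming n1 == n2 == N / 2."""
    return (8.0 + d ** 2) / (2.0 * v)


# Round trip with equal groups of 20: recovers N = 40
print(round(n_from_d_between(0.5, variance_d_between(0.5, 20, 20)), 2))
# Same total N but unequal groups (10 vs. 30): the recovered N falls short of 40
print(round(n_from_d_between(0.5, variance_d_between(0.5, 10, 30)), 2))
```

The second call, which returns roughly 30 rather than 40, shows the equal-n assumption breaking down in exactly the way described above.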

The above formula will not work with within-subjects designs. The formula that I know does work for within-subjects designs is the following:


Note that the above formula assumes you know the exact correlation (r) between your variables, which may or may not be reported or available. I have found that under those circumstances, if I assume r = .5000, I typically get accurate enough estimates of N from my calculations of d and variance (v). That said, for those in the process of conducting a meta-analysis, I recommend contacting the original authors or principal investigators whenever all you have to go on is a paired-samples t-test and a sample size (and potentially a p-value). Often, the authors are more than happy to provide the info you want or need, either in the form of the actual estimates of r that they computed for each comparison or, better yet, the original data set and enough info for you to compute r yourself. That's easy with newer studies. Good luck if the research was published much earlier than this decade - though even then I have been amazed at how helpful authors will try to be. For those cross-validating a meta-analyst's database, if the original correlational info is available, ideally it will be recorded in the database itself for within-subjects comparisons. If not, email the meta-analyst. Again, we should be able to provide that info easily enough.
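
This formula is also missing from the post. One commonly used version, assuming d is standardized by the raw-score standard deviation (not the standard deviation of the difference scores, often labeled d_z) and r is the correlation between the paired measures, solves the large-sample variance of d for a paired design for N:

$$v = \left(\frac{1}{N} + \frac{d^2}{2N}\right) 2(1 - r) \quad\Rightarrow\quad N = \frac{(1 - r)(2 + d^2)}{v}$$

A minimal Python sketch, with hypothetical function names and r = .5 as the default fallback:

```python
def variance_d_within(d, n, r):
    """Variance of d for a paired design, with d standardized by the
    raw-score SD and r the correlation between the two measures."""
    return (1.0 / n + d ** 2 / (2.0 * n)) * 2.0 * (1.0 - r)


def n_from_d_within(d, v, r=0.5):
    """Recover N by solving the variance formula above for n.

    r defaults to .5 as a fallback guess; pass the reported correlation
    whenever it is available.
    """
    return (1.0 - r) * (2.0 + d ** 2) / v


# Correct r assumed: recovers N = 30
print(round(n_from_d_within(0.4, variance_d_within(0.4, 30, 0.5), r=0.5), 2))
# True r of .7 but .5 assumed: the recovered N is inflated to about 50
print(round(n_from_d_within(0.4, variance_d_within(0.4, 30, 0.7), r=0.5), 2))
```

The second call shows why the reported correlation is worth chasing down: when the assumed r is off, the recovered N can miss the true sample size by a wide margin.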

If you embark on a meta-analysis, keep in mind that others who eventually want to see your data will try to cross-validate your effect size estimates. Get ahead of that situation and do so on your own from the get-go. You'll know that you can trust your effect size calculations, and you will be able to address concerns about those computations as they arise later. That's the bottom line: you need to be able to trust the process by which your effect sizes are computed, regardless of whether you are using a proprietary software package like CMA or an open-source language like R, and regardless of how seasoned you are as a meta-analyst. If you find problems cross-validating, you can go back and check your code for possible errors. That'll undoubtedly save some heartache and heartburn, but the more important thing is that you can be confident that what you ultimately present to your audience is the closest approximation to the truth possible. Ultimately, that is all that matters. Hopefully the above is helpful to someone.

And now back to meta-analysis.

I briefly led up to this topic a couple of months ago (see Prelude). Where we left off was with the problem that inevitably cropped up with narrative reviews. Meta-analysis offered a promising and more objective alternative to reviewing the literature. The premise is simple enough. We can combine all studies testing a specific hypothesis in order to get an estimate of the overall effect size (essentially the mean of Cohen's d, Pearson's r, etc.), along with a 95% confidence interval. If the confidence interval does not include zero, the effect can be considered "significant" - that is, it's an effect that appears to be distinguishable from zero. We can also examine moderators that might impact the mean effect size estimate. Now admittedly I am oversimplifying, but I just want to provide the gist. If you want the full story, I can recommend any of a number of resources (Michael Borenstein's work is certainly worth reading).
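
To make the pooled estimate a little more concrete: it is usually a weighted mean, with each study weighted by the inverse of its variance so that larger, more precise studies count for more. Below is a minimal Python sketch of that calculation under a fixed effects model, with a 95% confidence interval from the normal approximation (plus or minus 1.96 standard errors). The function name and example numbers are hypothetical.

```python
import math


def fixed_effect_summary(d, v, z=1.96):
    """Inverse-variance weighted mean of effect sizes with a normal-theory CI.

    d: per-study effect sizes; v: their sampling variances.
    Returns (mean, (lower, upper)).
    """
    w = [1.0 / vi for vi in v]                      # precision weights
    mean = sum(wi * di for wi, di in zip(w, d)) / sum(w)
    se = math.sqrt(1.0 / sum(w))                    # SE of the weighted mean
    return mean, (mean - z * se, mean + z * se)


mean, (lo, hi) = fixed_effect_summary([0.2, 0.4], [0.1, 0.1])
# The interval includes zero here, so this pooled effect would not be
# called "significant" by the rule described above.
print(round(mean, 3), round(lo, 3), round(hi, 3))
```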

Meta-analyses are often very useful for confirming that multiple tests of the same hypothesis replicate initial findings, for making sense of messy research literatures, and for debunking myths. The reason we rarely talk about the Type A behavior pattern (TABP) anymore, for example, is thanks to several meta-analyses that showed no relationship between TABP and heart disease. However, it became obvious in a hurry that there were some issues with this new approach to reviewing the literature.

One problem was that effect sizes were estimated using what was called a fixed effects model. The trouble was that fixed effects models treat the collection of studies at hand as if it were the entire population of studies. The reality is that we merely have a sample of studies whenever we conduct a meta-analysis, and so we moved to using random effects models. Another very obvious problem is publication bias and the proverbial file drawer problem. Journals rarely publish null findings, and those null findings often don't see the light of day. That is a problem because meta-analyses may then overestimate effect sizes. So, a number of approaches to dealing with that problem have been tried, each with its shortcomings. I still remember the days of the Failsafe N. Thankfully we've moved beyond that. For a number of years, the standard has been trim-and-fill analyses and funnel plots. Unfortunately, that approach may understate the potential impact of publication bias. A number of other techniques have been developed and used, usually individually, such as PET-PEESE, p-curves, and so on. Each of these techniques has its advantages and disadvantages, and p-curves in particular may be limited to a very specific set of circumstances. A more recent approach, and one I prefer, is to use a combination of sensitivity analyses to address publication bias effects and attempt to triangulate around a likely estimate of the mean effect size. If we can triangulate around a likely effect size estimate, we can draw some tentative conclusions about the severity of publication bias in a literature and about the extent to which an effect is real. If we cannot triangulate, we can recommend caution in interpreting the original naive effect size estimate and try to figure out what is going on with that research literature.
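
In practice, the move from fixed to random effects is a change in the weights. One common estimator, DerSimonian and Laird's method of moments, first estimates the between-study variance (tau-squared) from the heterogeneity statistic Q, then adds tau-squared to each study's sampling variance before weighting. A minimal Python sketch follows; the function name and example values are hypothetical, and other estimators of tau-squared (e.g., REML, the default in metafor's rma()) can give somewhat different answers.

```python
import math


def random_effects_dl(d, v, z=1.96):
    """DerSimonian-Laird random-effects summary.

    Returns (mean, (lower, upper), tau2), where tau2 is the estimated
    between-study variance (truncated at zero).
    """
    k = len(d)
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * di for wi, di in zip(w, d)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (di - fixed) ** 2 for wi, di in zip(w, d))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau2 to every study's sampling variance
    w_re = [1.0 / (vi + tau2) for vi in v]
    mean = sum(wi * di for wi, di in zip(w_re, d)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return mean, (mean - z * se, mean + z * se), tau2


# Two precise but sharply disagreeing studies: tau2 is large and the CI widens
print(random_effects_dl([0.0, 1.0], [0.01, 0.01]))
```

With these two studies, tau-squared comes out at 0.49 and the half-width of the interval grows from about 0.14 under the fixed effects model to about 0.98, which is the extra uncertainty the fixed effects model ignores.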

In the process, we need to look at another way meta-analysis has changed. When I was working on my dissertation, most of us were using proprietary software such as SAS (which was not developed to handle meta-analysis) or d-stat (which is now defunct) to extract effect size estimates and to synthesize the available effect sizes for a literature. Quite a number of us use software such as CMA, which has a lot of nifty features, although it comes with its own limitations (its forest plots and funnel plots leave much to be desired, and one needs to be very careful when entering data into its spreadsheet, as some columns that you can create are not merely columns for entering coded data - something I learned the hard way!). As long as the algorithms in this software appear to work the way they are supposed to, and as long as one can cross-validate (e.g., recover n from each study's estimate of d and its variance), you're probably okay. However, if you want to do anything more heavy-duty than that, you will want to learn how to use R, and specifically the metafor package.

One more thing. I always tell my students that a meta-analysis is not the last word on a particular topic. Additional studies are published (or conducted and not published), methodology improves in ways that may challenge conventional wisdom about a particular research question, and techniques for conducting meta-analysis continue to evolve. When reading early meta-analyses, do yourself and me a favor and don't diss them, even when you realize that the authors may not have examined publication bias, used only published research, or used the Failsafe N as their method of addressing publication bias. The authors of those meta-analyses likely did the best they could with the tools available at the time. We can and should do better with the tools we have at our disposal.

I undoubtedly will want to say more. At some point, I'll provide a better draft of this post with some actual citations and linkage. For now, I just wanted to record some of my thoughts.

Monday, December 11, 2017

Interlude: Loss of Confidence Project

I saw a blurb about the Loss of Confidence Project on Twitter, and I like the idea. Those of us in the psychological sciences sometimes conduct research that we thought at the time was well done from a theoretical and/or methodological standpoint, only to realize later that we made some sort of honest mistake. If we can get to a point where we take ownership of our mistakes, and foster a culture that is forgiving and accepting of what is probably very common, we'll be better off for it. I'll certainly follow this project with great interest.