Monday, January 29, 2018

Cross-validating in meta-analysis

I thought I'd share a couple of techniques I've picked up that are useful for cross-validation purposes. Keep in mind that the sorts of meta-analyses I am interested in involve experimental designs, and so what I will offer may or may not work for your particular purposes.

If you are estimating Cohen's d from between-subjects designs, the following formula for estimating N is recommended:

N = (8 + d^2) / (2v)
Here you simply need to know your estimate of d and the variance (v) for a particular comparison. If you are able to estimate N from the above formula reasonably accurately, you can be confident that your estimate is in the ballpark. Note that this formula works best when your treatment and control groups have equal sample sizes; markedly unequal group sizes will not yield accurate estimates of N.
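Here is a minimal sketch of that back-calculation in Python, assuming the standard large-sample variance formula for a between-subjects d with equal group sizes, v = (n1 + n2)/(n1*n2) + d^2/(2(n1 + n2)), which simplifies to v = (8 + d^2)/(2N) when n1 = n2 (the function name is mine, not from any package):

```python
def n_from_d_between(d, v):
    """Back-calculate total N for a between-subjects Cohen's d.

    Assumes equal group sizes, where the large-sample variance of d is
    v = (n1 + n2)/(n1 * n2) + d**2 / (2 * (n1 + n2)),
    which with n1 = n2 = N/2 simplifies to v = (8 + d**2) / (2 * N).
    """
    return (8 + d**2) / (2 * v)

# Worked check: d = 0.5 with two groups of 50 (true N = 100)
v = 100 / (50 * 50) + 0.5**2 / (2 * 100)   # = 0.04125
print(round(n_from_d_between(0.5, v)))      # -> 100
```

If the back-calculated N lands far from the N reported in the study, that comparison deserves a second look before it goes into the database.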

The above formula will not work with within-subjects designs. The formula that I know does work for within-subjects designs is the following:

n = 2(1 - r)(1 + d^2/2) / v
Note that the above formula assumes you know the exact correlation (r) between your variables, which may or may not be reported or available. I have found that under those circumstances, if I assume r = .50, I typically get accurate enough estimates of N from my calculations of d and variance (v). That said, for those in the process of conducting a meta-analysis, I recommend contacting the original authors or principal investigators when all you have to go on is a paired-samples t-test and a sample size (and potentially a p-value). Often, the authors are more than happy to provide the info you want or need, either as actual estimates of r for each comparison they computed, or better yet as the original data set and enough info so you can compute r yourself. That's easy with newer studies. Good luck if the research was published much earlier than this decade - though even then I have been amazed at how helpful authors will try to be.

For those cross-validating a meta-analyst's database: if the original correlational info is available, ideally it will be recorded in the database itself for within-subjects comparisons. If not, email the meta-analyst. Again, we should be able to provide that info easily enough.
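The within-subjects version can be sketched the same way, assuming the matched-pairs variance formula v = (1/n + d^2/(2n)) * 2(1 - r), with r = .50 as the fallback when the correlation is unreported (again, the function name is mine):

```python
def n_from_d_within(d, v, r=0.5):
    """Back-calculate n for a within-subjects (matched-pairs) Cohen's d.

    Assumes the matched-pairs variance formula
    v = (1/n + d**2 / (2*n)) * 2 * (1 - r),
    so n = 2 * (1 - r) * (1 + d**2 / 2) / v.
    When r is unreported, r = 0.5 is a common working assumption.
    """
    return 2 * (1 - r) * (1 + d**2 / 2) / v

# Worked check: d = 0.4, n = 30, r = 0.6
v = (1 / 30 + 0.4**2 / (2 * 30)) * 2 * (1 - 0.6)   # = 0.0288
print(round(n_from_d_within(0.4, v, r=0.6)))        # -> 30
```

Note how sensitive the result is to r: plugging the same d and v into the default r = .5 would shift the estimate, which is exactly why it pays to chase down the actual correlations.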

If you embark on a meta-analysis, keep in mind that others who eventually want to see your data will try to cross-validate your effect size estimates. Get ahead of that situation and do so from the get-go on your own. You'll know that you can trust your calculations of effect size, and you will be able to successfully address concerns about those computations as they arise later. Ultimately that's the bottom line: you need to know that you can trust the process by which your effect sizes are computed, regardless of whether you are using a proprietary software package like CMA or an open-source language like R, and regardless of how seasoned you are as a meta-analyst. If you find problems when cross-validating, you can go back and check your code for possible errors. That will undoubtedly save some heartache and heartburn, but the more important thing is that you can be confident that what you ultimately present to your particular audience is the closest approximation to the truth possible. In the end, that is all that matters. Hopefully the above is helpful to someone.

And now back to meta-analysis.

I briefly led up to this topic a couple months ago (see Prelude). Where we left off was with the problems that inevitably cropped up with narrative reviews. Meta-analysis offered a promising and more objective alternative for reviewing the literature. The premise is simple enough: we can combine all studies testing a specific hypothesis in order to get an estimate of the overall effect size (essentially the mean of Cohen's d, Pearson's r, etc.), along with 95% confidence intervals. If the confidence interval does not include zero, the effect can be considered "significant" - that is, it's an effect that appears to be noticeable. We can also examine moderators that might impact the mean effect size estimate. Now admittedly I am oversimplifying, but I just want to provide the gist. If you want the full story, I can recommend any of a number of resources (Michael Borenstein's work is certainly worth reading).
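The gist above can be sketched in a few lines. This is a bare-bones inverse-variance (fixed-effect) pooling of hypothetical d values, just to show where the mean and the 95% confidence interval come from; it is illustrative, not a substitute for a real meta-analysis package:

```python
import math

def pool_fixed(effects, variances):
    """Inverse-variance (fixed-effect) pooled estimate with a 95% CI."""
    weights = [1 / v for v in variances]
    mean = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))          # standard error of the pooled mean
    return mean, mean - 1.96 * se, mean + 1.96 * se

# Hypothetical d values and variances from five studies
ds = [0.30, 0.45, 0.25, 0.50, 0.35]
vs = [0.04, 0.05, 0.03, 0.06, 0.04]
mean, lo, hi = pool_fixed(ds, vs)
print(f"d = {mean:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
# -> d = 0.35, 95% CI [0.17, 0.53]
```

Because the interval excludes zero, these made-up studies would count as a "significant" pooled effect in the sense described above.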

Meta-analyses are often very useful in confirming that multiple tests of the same hypothesis replicate initial findings, in making sense of messy research literatures, and in debunking myths. The reason we rarely talk about the Type A behavior pattern (TABP) any more is thanks to several meta-analyses that showed no relationship between TABP and heart disease, for example. However, it became obvious in a hurry that there were some issues with this new approach to reviewing the literature.

One problem was that effect sizes were estimated using what was called a fixed-effects model. The trouble is that fixed-effects models assume the collection of studies represents a population. In reality we merely have a sample of studies whenever we conduct a meta-analysis, and so we moved to using random-effects models.

Another very obvious problem is publication bias and the proverbial file-drawer problem. Journals rarely publish null findings, and those null findings often don't see the light of day. That is a problem because meta-analyses may be overestimating effect sizes. So, a number of approaches to dealing with that problem have been tried, each with its shortcomings. I still remember the days of the Failsafe N. Thankfully we've moved beyond that. For a number of years, the standard has been Trim-and-Fill analyses and funnel plots. Unfortunately, that approach may understate the potential impact of publication bias. A number of other techniques have been developed and utilized, usually individually, such as PET-PEESE, p-curves, and so on. Each of these techniques has its advantages and disadvantages, and in the case of p-curves may be limited to a very specific set of circumstances.

A more recent approach, and one I prefer, is to use a combination of sensitivity analyses to address publication bias and attempt to triangulate around a likely estimate of the mean effect size. If we can triangulate around a likely effect size estimate, we can make some tentative statements about the severity of publication bias in a literature and about the extent to which we can say that an effect is real. If we cannot triangulate, we can recommend caution in interpreting the original naive effect size estimate and try to figure out what is going on with that particular research literature.
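To make the fixed-versus-random distinction concrete, here is a sketch of one common random-effects estimator, DerSimonian-Laird (my choice for illustration; the post doesn't name a specific estimator). It adds an estimate of the between-study variance (tau-squared) to each study's weight before pooling:

```python
import math

def pool_random(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate with a 95% CI."""
    w = [1 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Q statistic: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)             # between-study variance estimate
    w_star = [1 / (v + tau2) for v in variances]
    mean = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return mean, mean - 1.96 * se, mean + 1.96 * se

# Hypothetical d values and variances from five studies
ds = [0.30, 0.45, 0.25, 0.50, 0.35]
vs = [0.04, 0.05, 0.03, 0.06, 0.04]
mean, lo, hi = pool_random(ds, vs)
print(f"d = {mean:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

When the studies are no more variable than sampling error predicts, tau-squared is truncated at zero and the result collapses to the fixed-effect answer; with genuinely heterogeneous studies, the interval widens, which is the honest reflection of having a sample of studies rather than a population.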

In the process, we need to look at another way meta-analysis has changed. When I was working on my dissertation, most of us were using proprietary software such as SAS (which was not developed to handle meta-analysis) or d-stat (which is now defunct) to extract effect size estimates and to synthesize the available effect sizes for a literature. These days, quite a number of us use software such as CMA, which has a lot of nifty features, although it comes with its own limitations (its forest plots and funnel plots leave much to be desired, and one needs to be very careful when entering data into its spreadsheet, as some columns that you can create are not merely columns for entering coded data - something I learned the hard way!). As long as the algorithms in these packages appear to work the way they are supposed to, and as long as one can cross-validate (e.g., estimate N from one's estimates of d and variance for each study), you're probably okay. Unfortunately, if you want to do anything more heavy-duty than that, you will want to learn R, and specifically the metafor package.

One more thing. I always tell my students that a meta-analysis is never the last word on a particular topic. Additional studies are published (or conducted and never published), methodology improves in ways that may challenge conventional wisdom about a particular research question, and techniques for conducting meta-analyses continue to evolve. When reading early meta-analyses, do yourself and me a favor and don't diss them, especially when you realize that the authors may not have examined publication bias, or only used published research, or used Failsafe N as their method of addressing publication bias. The authors of those meta-analyses likely did the best they could with the tools available at the time. We can and should do better with the tools we have at our disposal.

I undoubtedly will want to say more. At some point, I'll provide a better draft of this post with some actual citations and linkage. For now, I just wanted to record some of my thoughts.