I briefly led up to this topic a couple of months ago (see Prelude). We left off with the problems that inevitably crop up with narrative reviews. Meta-analysis offered a promising and more objective alternative for reviewing the literature. The premise is simple enough. We can combine all studies testing a specific hypothesis in order to get an estimate of the overall effect size (essentially a weighted mean of Cohen's d, Pearson's r, etc.), along with a 95% confidence interval. If the confidence interval does not include zero, the effect can be considered "significant" - that is, it's an effect that appears to be reliably different from zero. We can also examine moderators that might impact the mean effect size estimate. Now admittedly I am oversimplifying, but I just want to provide the gist. If you want the full story, I can recommend any of a number of resources (Michael Borenstein's work is certainly worth reading).
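To make the gist concrete, here is a minimal sketch of the simplest version of that calculation: an inverse-variance weighted mean of study effect sizes with a 95% confidence interval. The function name `pooled_effect` and the example numbers are hypothetical, made up purely for illustration, and the sketch assumes each study supplies an effect size and its sampling variance.

```python
import math

def pooled_effect(effects, variances, z=1.96):
    """Inverse-variance weighted mean effect with a z-based confidence interval.

    effects   -- one effect size per study (e.g., Cohen's d)
    variances -- the sampling variance of each effect size
    z         -- critical value; 1.96 gives an approximate 95% interval
    """
    weights = [1.0 / v for v in variances]   # more precise studies count more
    total_weight = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total_weight
    se = math.sqrt(1.0 / total_weight)       # standard error of the pooled mean
    return mean, mean - z * se, mean + z * se

# Hypothetical data: three studies' d values and their variances.
mean, lower, upper = pooled_effect([0.20, 0.45, 0.30], [0.04, 0.02, 0.05])
print(f"mean d = {mean:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
print("CI excludes zero" if lower > 0 or upper < 0 else "CI includes zero")
```

Real software does more than this (small-sample corrections, different effect size metrics, and so on), so treat it as an illustration of the logic rather than a substitute for a proper package.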
Meta-analyses are often very useful in confirming that multiple tests of the same hypothesis support the initial findings, making sense of messy research literatures, and debunking myths. We rarely talk about Type A personality (TABP) any more thanks to several meta-analyses that showed no relationship between TABP and heart disease, for example. However, it became obvious in a hurry that there were some issues with this new approach to reviewing the literature.
One problem was that effect sizes were estimated using what was called a fixed effects model. The trouble is that fixed effects models assume that the collection of studies represents the population. The reality is that we merely have a sample of studies whenever we conduct a meta-analysis, and so we moved to using random effects models. Another very obvious problem is publication bias and the proverbial file drawer problem. Journals rarely publish null findings, and those null findings often don't see the light of day. That is a problem because meta-analyses may be overestimating effect sizes. So, a number of approaches to dealing with that problem have been tried, each with its shortcomings. I still remember the days of the Failsafe N. Thankfully we've moved beyond that. For a number of years, the standard has been Trim-and-Fill analyses and funnel plots. Unfortunately, that approach may understate the potential impact of publication bias. A number of other techniques have been developed and utilized, usually individually, such as PET-PEESE, p-curves, and so on. Each of these techniques has its advantages and disadvantages, and p-curves in particular may be limited to a very specific set of circumstances. A more recent approach, and one I prefer, is to use a combination of sensitivity analyses in order to address publication bias effects and attempt to triangulate around a likely estimate of the mean effect size. If we can triangulate around a likely effect size estimate, we can draw some tentative conclusions about the severity of publication bias in a literature and about the extent to which we can say that an effect is real. If we cannot triangulate, we can recommend caution in interpreting the original naive effect size estimate and try to figure out what is going on with a particular research literature.
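For the move from fixed to random effects mentioned above, a rough sketch may help. I am assuming the DerSimonian-Laird estimator of between-study variance here, which is only one of several available estimators and not necessarily the one any given meta-analysis used. The function name and the inputs are hypothetical.

```python
import math

def dersimonian_laird(effects, variances, z=1.96):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2.

    Returns (mean, lower, upper, tau2). When tau2 is 0 the result matches
    the fixed-effect (inverse-variance) estimate.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Re-weight each study by its total variance (sampling + between-study).
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    mean = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return mean, mean - z * se, mean + z * se, tau2

# Hypothetical, deliberately heterogeneous effect sizes.
print(dersimonian_laird([0.05, 0.60, 0.30, 0.90], [0.02, 0.03, 0.02, 0.04]))
```

When the studies disagree more than sampling error alone would predict, tau2 is positive, the weights become more equal, and the confidence interval widens. This sketch does nothing about publication bias; the sensitivity analyses discussed above are a separate step.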
In the process, we need to look at another way meta-analysis has changed. When I was working on my dissertation, most of us were using proprietary software such as SAS (which was not developed to handle meta-analysis) or d-stat (which is now defunct) to extract effect size estimates and to synthesize the available effect sizes for a literature. Quite a number of us now use software such as CMA, which has a lot of nifty features, although it comes with its own limitations (its forest plots and funnel plots leave much to be desired, and one needs to be very careful when entering data into its spreadsheet, as some columns that you can create are not merely columns for entering coded data - something I learned the hard way!). As long as the algorithms in this software appear to work the way they are supposed to and as long as one can cross-validate the output (e.g., recover n for each study from its estimate of d and its variance), you're probably okay. Unfortunately, if you want to do anything more heavy duty than that, you will want to learn how to use R, and specifically the metafor package.
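The cross-validation check mentioned above can be sketched in a few lines. This assumes two independent groups of equal size and the common large-sample approximation for the variance of Cohen's d, var(d) = 2/n + d^2/(4n), where n is the per-group sample size. Both function names are hypothetical, and a given program may use a slightly different formula (for example, a Hedges' g correction), so expect small discrepancies rather than exact matches.

```python
def variance_of_d(d, n_per_group):
    """Approximate sampling variance of Cohen's d for two equal groups."""
    return 2.0 / n_per_group + d ** 2 / (4.0 * n_per_group)

def implied_n_per_group(d, variance):
    """Invert the formula above: the per-group n implied by d and its variance."""
    return (8.0 + d ** 2) / (4.0 * variance)

# Hypothetical study: the software reports d = 0.50 with variance 0.04125.
n = implied_n_per_group(0.50, 0.04125)
print(f"implied n per group: {n:.1f}")
```

If the implied n is far from the n reported in the original article, that is a cue to recheck the data entry for that study before trusting the synthesized estimate.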
One more thing. I always tell my students that a meta-analysis is not the last word on a particular topic. Additional studies are published (or conducted and not published), methodology improves in ways that may challenge conventional wisdom about a particular research question, and techniques for conducting meta-analysis continue to evolve. When reading early meta-analyses, do yourself and me a favor and don't diss them, especially when you realize that the authors may not have examined publication bias, or only used published research, or used Failsafe N as their method of addressing publication bias. The authors of those meta-analyses likely did the best they could with the tools available at the time. We can and should do better with the tools we have at our disposal.
I undoubtedly will want to say more. At some point, I'll provide a better draft of this post with some actual citations and linkage. For now, I just wanted to record some of my thoughts.