Let's imagine a scenario. Several years ago, a researcher designed a study with five treatment conditions and was mainly interested in a planned contrast between condition 1 and the remaining four conditions. That contrast appeared statistically significant. A few years later, the same researcher ran a second experiment that appears to be based on the same protocols, but with a larger sample (both good ideas), and found that the same planned contrast was no longer significant. That is a problem for the researcher. So, what to do?
Here is where we meet some forking paths. One choice is to report the findings as they appear and acknowledge that the original finding did not replicate. Admittedly, finding journals willing to publish non-replications is still a bit of a challenge (too much so, in my professional opinion), so that option may seem a bit unsavory. Perhaps another, theory-driven path is available. The researcher could note that conditions 1 and 2 use an identical stimulus and differ only in the controller used, and that the same is true of conditions 3 and 4. So, taking a different path, the researcher combines conditions 1 and 2 to form a new category and does the same with conditions 3 and 4. Condition 5 remains the same. Now a significant ANOVA is obtainable, and the researcher can plausibly argue that the findings show that this new category (conditions 1 and 2 combined) really is distinct from the neutral condition, thus supporting a theoretical model. The reported findings now look good for publication in a higher impact journal. The researcher did not find what she/he initially set out to find, but did find something.
But did the researcher really replicate the original findings? Going by the prior published work, the answer appears to be no. The original planned contrast between condition 1 and the other conditions does not replicate. Does the researcher have a finding that tells us something possibly interesting or useful? Maybe. Maybe not. Does the revised analysis appear to be consistent with an established theoretical model? Apparently. Does the new finding tell us something about everyday life that the original would not have already told us had it successfully replicated? That's highly questionable. At bare minimum, in the strict sense of how we define a replication (i.e., a study that finds results similar to the original and/or to other similar studies), the study in question does not qualify as one. That happens with many psychological phenomena, especially ones that are quite novel and counter-intuitive.
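To make the two paths concrete, here is a minimal sketch in Python of how they differ as analyses. Everything in it is an assumption for illustration: the scores are made up (they are not the study's data), the function names `planned_contrast_t` and `one_way_f` are my own, and I compute only the test statistics and degrees of freedom, not p-values. The first path tests condition 1 against the average of the other four using contrast weights; the second collapses the five conditions into three categories and runs an omnibus ANOVA.

```python
from statistics import mean

def pooled_error(groups):
    """Return (MS_within, df_within) for a one-way between-subjects design."""
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_within = sum(len(g) for g in groups) - len(groups)
    return ss_within / df_within, df_within

def planned_contrast_t(groups, weights):
    """t statistic and df for a planned contrast; weights must sum to zero."""
    assert abs(sum(weights)) < 1e-12, "contrast weights must sum to zero"
    ms_within, df_within = pooled_error(groups)
    estimate = sum(w * mean(g) for w, g in zip(weights, groups))
    se = (ms_within * sum(w ** 2 / len(g) for w, g in zip(weights, groups))) ** 0.5
    return estimate / se, df_within

def one_way_f(groups):
    """Omnibus F statistic with (df_between, df_within) for a one-way ANOVA."""
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    df_between = len(groups) - 1
    ms_within, df_within = pooled_error(groups)
    return (ss_between / df_between) / ms_within, df_between, df_within

# Made-up scores for five conditions (illustrative only).
conditions = [
    [5, 6, 7, 6],  # condition 1
    [6, 7, 8, 7],  # condition 2
    [4, 5, 4, 5],  # condition 3
    [4, 4, 5, 5],  # condition 4
    [4, 5, 5, 4],  # condition 5
]

# Path A: the original planned contrast, condition 1 vs. the remaining four.
t_stat, df = planned_contrast_t(conditions, [4, -1, -1, -1, -1])

# Path B: collapse 1+2 and 3+4, keep 5, then run an omnibus ANOVA.
collapsed = [
    conditions[0] + conditions[1],
    conditions[2] + conditions[3],
    conditions[4],
]
f_stat, df_between, df_within = one_way_f(collapsed)

print(f"Path A: t({df}) = {t_stat:.2f}")
print(f"Path B: F({df_between}, {df_within}) = {f_stat:.2f}")
```

The point of the sketch is only that the two paths ask different questions of the same data. Whatever Path B returns, it is not a test of the hypothesis that Path A tested, so it cannot tell us whether the original contrast replicated.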
This is a real scenario. I am remaining deliberately vague so as to avoid picking on someone (usually not productive, and likely only to result in defensiveness, which is the last thing we need as we move to a more open science), but also, far more productively, to point out that the process of following research from start to finish is one in which we are faced with many forking paths, and sometimes the ones we choose take us far from our intended destination. Someone else noted that what we sometimes call p-hacking or HARKing (the latter seems to have occurred in this scenario) is akin to experimenter bias, and should be treated as such. We as researchers make a number of decisions - often outside of conscious awareness - that influence the outcome of our work. That includes the statistical side of our work as well. I like that framing, as it avoids unnecessary shaming while allowing skeptics the space needed to point out potential problems that appear to have occurred. That seems healthier. As for the real scenario above, it did not take me long, once the presumed replication report was published online, to realize that the findings were not actually a replication. Poking around in the data set (it was helpful that the author made it available) was crucial, and what I was able to reproduce coincided with what others had already noticed before me. The bottom line was that having the registered protocols, the data, and the research report were all very helpful in determining that the conclusion the researcher wished to draw was apparently erroneous. It happens. In the process, I gained some insight into the paths the researcher took in arriving at her/his analysis decisions and conclusions. The insights I arrived at were hardly novel; others had drawn similar ones before me.
Here's the thing I have to keep in mind going forward. My little corner of the psychological sciences is going through some pretty massive changes. It sometimes feels like walking through a virtual minefield in a video game, given the names, egos, and reputations involved. I am optimistic that if we can place our focus where it belongs - on the methodology and the data themselves - rather than focusing on the personalities, we'll wind up with the sort of open science worthy of the name. Getting there will not be easy.
In the meantime, we will still deal with reported replications that really are not replications.