
Pre-analysis plans: Why and how to use them

Introduction

This module describes what a pre-analysis plan (PAP) is and why you should use one. We emphasize the potential political uses of PAPs and, in particular, how the PAP is in this respect a uniquely powerful tool for increasing the likelihood that evidence informs policymaking.

What is a Pre-Analysis Plan?

A pre-analysis plan (PAP) is a document describing how a research project will be conducted, written before data is collected or analyzed. The emphasis is on explaining what questions will be asked and how data will be collected and analyzed to answer those questions. "Registering" a PAP means publishing the document, with a timestamp, in a public location where it cannot be further edited. A registered PAP is therefore a transparent record of what a researcher believed before conducting a study and how the researcher intended to update those beliefs with data.

There is substantial variation in how PAPs are written. A PAP may run to dozens of pages, or it may be only a single page or even a few sentences. The description may (or may not) include literature reviews, hypothesis statements, equations, mock figures and tables, code, or data simulations. People have offered templates, checklists, and guidelines in an attempt to standardize—or at least set minimal standards for—the content and level of detail within a PAP. But ultimately the researcher must use judgment to decide how much detail to include, given the context and aims of the study.

Why use a Pre-Analysis Plan?

There are three potential benefits to using a PAP: enhancing research integrity, prompting project management best practices, and facilitating political decision-making.

Depending on which uses are being pursued and to what degree, more or less detail will be required in the PAP.

PAPs enhance research integrity

The first and foremost benefit—and the most common reason why PAPs are becoming a standard practice throughout the academic community—is that PAPs enhance research integrity. In particular, the publicly registered PAP is a strategy for hedging against risks of p-hacking, HARKing, and publication bias.

P-Hacking

In the course of a study, a researcher will make hundreds of decisions regarding the design of data collection and how those collected data will be analyzed and reported. These decisions can substantially affect what results are uncovered and shared. For example, in considering whether the U.S. economy is affected by whether Republicans or Democrats are in office, decisions need to be made about how to operationalize economic performance (employment, inflation, GDP, etc.?), which politicians to focus on (presidents, governors, senators, representatives, etc.?), which years to examine, whether to entertain exclusions (e.g., ignore recessions?), whether models should be linear or nonlinear, and so forth. To p-hack would be to try combinations of those decisions until “statistically significant” results surface. This could happen intentionally or, much more commonly, unintentionally. The website FiveThirtyEight provides an interactive tool to build your p-hacking intuitions. Visit ["Science Isn't Broken"](https://fivethirtyeight.com/features/science-isnt-broken/) (or search "Aschwanden Science Isn't Broken"). Toggle values on the “Hack Your Way to Scientific Glory” applet (it's in the middle of the article) to experience firsthand how, depending on your choices, you can reach literally any conclusion about the impact of political party on the U.S. economy.
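To make this concrete, here is a minimal, hypothetical simulation in Python (the "economy," the outcome measures, and the "recession" exclusions are all invented): the simulated data contain no true party effect at all, yet a researcher who tries a handful of plausible-sounding specifications and keeps whatever comes out significant will find a false positive far more often than the nominal 5% of the time.

```python
# Hypothetical illustration of p-hacking. The simulated "economy" is pure
# random noise, so there is no true party effect; yet trying several outcome
# measures and several sample exclusions, and keeping whatever reaches
# p < 0.05, produces a false positive much more often than 5% of the time.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_years, n_studies = 60, 2000
false_positive_studies = 0

for _ in range(n_studies):
    party = rng.integers(0, 2, n_years)                       # which party holds office each year
    outcomes = [rng.normal(size=n_years) for _ in range(4)]   # "GDP," "inflation," etc. (all noise)
    subsets = [np.ones(n_years, dtype=bool),                  # all years
               np.arange(n_years) >= 10,                      # drop the early years
               rng.random(n_years) > 0.2]                     # drop "recession" years
    found_hit = False
    for y in outcomes:
        for keep in subsets:
            a, b = y[keep & (party == 0)], y[keep & (party == 1)]
            if stats.ttest_ind(a, b).pvalue < 0.05:
                found_hit = True
    if found_hit:
        false_positive_studies += 1

print(f"Share of noise-only 'studies' with at least one significant result: "
      f"{false_positive_studies / n_studies:.0%}")   # well above the nominal 5%
```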

The PAP hedges against p-hacking by forcing researchers to make these methodological choices in advance, based on criteria such as theory or statistical best practice, rather than tweaking choices until a desired result appears.

HARKing

To HARK is to “Hypothesize After the Results are Known.” HARKing happens when a researcher presents post hoc hypotheses in a research report as if they were, in fact, a priori hypotheses. In other words, a result gets framed as predicted by theory when, in fact, the result was not expected given the beliefs held before the study was conducted; it is only upon seeing the results that the researcher updates their beliefs and develops a new theory-driven hypothesis that is consistent with the result.

The updating of beliefs is not the problem—quite the contrary, done properly it is the very essence of scientific progress. The problem is how HARKing conceals and distorts the belief-updating process. HARKing is alchemy that presents exploratory results as if they were confirmatory. This sleight of hand is misleading for a variety of reasons. For example, HARKing violates the principle of disconfirmability: if a hypothesis is handcrafted to match already observed data, then there is no opportunity for the hypothesis to be disconfirmed by the study. And it is disconfirmed hypotheses, not confirmed hypotheses, that most efficiently winnow the field of competing ideas and advance our understanding. Consider also that HARKing disregards information: prior beliefs based on theory are ignored, and the hypothesis is instead built on the sand of currently observed data and cherry-picked rationales.

The PAP prevents HARKing by keeping clear which hypotheses were predicted in advance versus which hypotheses were generated on the basis of new results.

Publication Bias

Researchers are more likely to write up—and journals are more likely to publish—results that are statistically significant, even holding constant the importance of the question and the quality of the methods. One study found that research with statistically significant results had a forty percentage point higher probability of being published than research with non-significant results. Such selective reporting biases the academic literature. Positive findings become overrepresented. Null or inconclusive findings, in contrast, become underrepresented, condemned to the researcher’s personal file drawer rather than shared with the community. When this happens, any review or meta-analysis of the literature is misleading. Zero or contradictory effect sizes are effectively censored, leaving only the positive and largest effect sizes in print—and thus false positives are more likely and effect sizes are overestimated. A job training program with two positive evaluations might seem effective, but less so once it is uncovered that ten other evaluations, never published, failed to find any benefits or perhaps even found negative side effects.
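The mechanics of that bias are easy to simulate. Below is a minimal, hypothetical sketch in Python (the true effect, sample sizes, and publication rule are all invented for illustration): every simulated study estimates the same modest true effect, but only studies that clear p < 0.05 leave the file drawer, so the average published estimate overshoots the truth.

```python
# Hypothetical file-drawer simulation: many studies estimate the same small
# true effect, but only studies reaching p < 0.05 get "published". The mean
# of the published estimates is biased upward relative to the truth.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_effect, n_per_arm, n_studies = 0.15, 100, 5000

all_estimates, published = [], []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_arm)
    treated = rng.normal(true_effect, 1.0, n_per_arm)
    estimate = treated.mean() - control.mean()
    all_estimates.append(estimate)
    if stats.ttest_ind(treated, control).pvalue < 0.05:   # only "significant" studies published
        published.append(estimate)

print(f"True effect:                 {true_effect:.2f}")
print(f"Mean over all studies:       {np.mean(all_estimates):.2f}")
print(f"Mean over published studies: {np.mean(published):.2f} "
      f"({len(published)}/{n_studies} published)")
```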

To correct publication bias, all results must be openly available, so that researchers can potentially summarize the entire body of findings.

PAPs prompt project management best practices

The second benefit is mundane but important all the same. It may be the most immediate benefit you feel by adopting PAP practices. The documentation inherent to a PAP fosters project management best practices. To properly write out a methodology, the team has to plan for a wide variety of details. To explain how randomization will happen, for example, you must determine and map out a suite of implementation details—how exactly will the intervention be delivered and to whom and by whom and when and for how long? In mocking up a data visualization, you are forced to think clearly about what data is needed to create that figure. And so on. You are forced to conduct a sort of “pre-mortem,” considering what implementation or interpretation challenges might derail the project. And that, in turn, empowers you to manage against those challenges from the outset. By documenting all of these project management details, you also increase communication across the research team and build resiliency against staff turnover. Any new team member can be handed the PAP during onboarding to the project.

Note that the PAP process should not actually create any additional work. A PAP should, instead, alter when work happens, namely, sooner rather than later. The only way to avoid the PAP work is a naughty one: to plan (even if implicitly) not to write up details if you fail to uncover statistically significant results that advance your theorizing.

PAPs can be leveraged to facilitate political decision-making

Despite slogans to “Follow the Science,” facts alone cannot determine any decision. The reason is that science inevitably involves value judgments, which arise from processes other than measuring and counting. Value judgments are required, for example, in deciding what constitutes a meaningful effect size and how much uncertainty should be tolerated in the estimate of that effect size. These decisions cannot be resolved on technical grounds. There is technical skill involved in the calculations—there are correct and incorrect ways to calculate a confidence interval or a p-value, for instance—but subjective opinions always enter when considering whether an impact is big enough, how to balance the risks of a false positive versus a false negative, whether to focus on mean or distributional effects, how to weigh the opportunity costs of spending scarce resources on X rather than Y, and so on.
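To illustrate where the technical part ends, here is a minimal Python sketch with made-up data: computing a 95% confidence interval for a difference in means is mechanical, but whether the resulting interval describes an effect worth acting on is a judgment the calculation cannot make.

```python
# Made-up outcome data: the interval calculation is the technical part;
# deciding whether the estimated effect is "big enough" is the value judgment.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treated = rng.normal(0.20, 1.0, 500)   # hypothetical outcomes, treated group
control = rng.normal(0.00, 1.0, 500)   # hypothetical outcomes, control group

diff = treated.mean() - control.mean()
n1, n2 = len(treated), len(control)
sp2 = ((n1 - 1) * treated.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
t_crit = stats.t.ppf(0.975, n1 + n2 - 2)          # pooled two-sample t interval
print(f"Estimated effect: {diff:.2f}, "
      f"95% CI: ({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f})")
```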

Scientists often make these value judgments entirely by themselves, either deliberately or by default in following a convention, such as setting p < 0.05 as the threshold for “statistical significance.” In our experience, this is frequently the source of frustration with stakeholders and the lay public. For example, empirical data can be marshaled to estimate how much mask-wearing reduces the transmission of COVID-19. But to step further into a decision about whether people should wear masks is to enter a realm of subjective judgment: the estimated benefits of reducing the risk of transmission must be weighed against the downsides of requiring people to purchase and cover their faces with masks, with added considerations for how to manage the risk of misestimating either side of the ledger.

The PAP is a vehicle for thinking clearly about which judgments are technical and which are value judgments, and then for facilitating discussions on both fronts with the appropriate parties. For the technical components—for example, peer review of whether the randomization scheme is robust or double-checking statistical code—feedback from other experts is usually most fitting. But for the value components, feedback is usually needed from the community affected by the research, either directly or via representatives who are making decisions on their behalf.

Consider the PAP used in an evaluation of the D.C. police department’s body-worn camera program. You can view it at https://osf.io/hpmrt/. Police officers were randomly assigned to wear a body camera or not (this was a randomized controlled trial, or RCT), allowing researchers to estimate how much (if at all) body cameras reduced uses of force by comparing the officers who wore cameras with those who did not. A key question was how long to run the study. From a technical standpoint, the more months of data from a treatment and a control group, the more precise the estimate becomes. But how many months is enough? That’s a political judgment. It requires assessments such as: how big a reduction in use of force would be meaningful in policy terms; how certain we need to be about that effect size estimate; how much we are willing to pay (in added research costs) to achieve a given precision; how much downside there is to a false positive or a false negative; and so on. The research team held over ten public events—at schools, in libraries, and beyond—taking pains to explain concepts such as randomization, effect size coefficients, and confidence intervals, so that the community could have a robust discussion about how big an effect size would be meaningful to them. The PAP was key to facilitating these discussions.
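The technical half of that conversation often takes the form of a power calculation. Here is a hypothetical sketch using statsmodels; the effect sizes, error tolerances, and target power below are placeholders, not values from the D.C. study, and they are exactly the kind of numbers a community discussion is needed to set.

```python
# Hypothetical power calculation for a two-group comparison, of the kind that
# informs "how long must the study run?". The minimum effect size worth
# detecting and the tolerated error rates are value judgments supplied from
# outside the calculation; the arithmetic itself is the technical part.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for effect_size in (0.10, 0.20, 0.30):                 # standardized effect sizes (placeholders)
    n_per_group = analysis.solve_power(effect_size=effect_size,
                                       alpha=0.05,     # tolerated false-positive rate
                                       power=0.80,     # 1 - tolerated false-negative rate
                                       ratio=1.0,
                                       alternative='two-sided')
    print(f"Detecting d = {effect_size:.2f} needs about {n_per_group:.0f} subjects per group")
```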

The registration of a PAP is uniquely helpful in an additional way. There is a tendency for people—especially when busy, which is essentially always the case for practitioners—to carefully review documents only when absolutely necessary. It is common for drafts of reports to be skimmed but not fully engaged with. This can lead to the frustrating situation where a document is shared and everyone thinks they agree on its contents, only to discover later—when the document is about to be made public and everyone finally reads it closely—that disagreements or objections linger. In our experience, the fact that a PAP will be registered—public and uneditable at that point—is an excellent catalyst for engaging a partner’s full attention sooner rather than later.

Securing a partner’s full attention may feel like an added burden. It can slow the launch of a project because extra time may be needed to clarify questions or negotiate points of debate. But we submit that the advance time is well spent. The basic reason is that the time will eventually be spent anyway: if not in advance, then after the fact while clearing up confusion about what was done. Indeed, dealing with the consequences of a misunderstanding is usually more complicated than avoiding it in the first place. At the extreme, a partner may want you to redo the work entirely.

Other FAQs about PAPs

Q1. Do PAPs restrict exploratory research?

A. No, absolutely not. Although PAPs are commonly applied to null hypothesis testing (where problems of p-hacking fester), there is nothing about the underlying concept—making transparent your beliefs and intentions before data collection—that is inconsistent with exploratory research. A 100% exploratory PAP could literally just say, "This study is exploratory; there are no predictions, and every permutation of data analytics will be attempted and reported." Notice how even this simple PAP hedges against HARKing (no hypothesis at all!), alerts the reader to the many statistical tests that will be attempted (so that uncertainty estimates can be calibrated against family-wise error rates, mitigating false positives from p-hacking), and alleviates publication bias by creating a public record.
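For the calibration point, here is a minimal, hypothetical sketch (the twenty p-values are invented): with that many exploratory tests, a few fall below 0.05 by chance alone, and a family-wise or false-discovery-rate adjustment makes the multiplicity explicit.

```python
# Made-up p-values from 20 exploratory tests. Unadjusted, several look
# "significant"; after adjusting for the number of tests, far fewer survive.
import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.concatenate([[0.001, 0.021, 0.034, 0.048], np.linspace(0.06, 0.95, 16)])

print(f"Unadjusted:  {(pvals < 0.05).sum()} of {len(pvals)} tests below 0.05")
for method in ("bonferroni", "fdr_bh"):               # family-wise and false-discovery-rate adjustments
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(f"{method:>11}: {reject.sum()} of {len(pvals)} tests still below the threshold")
```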

Q2. Can I deviate from the PAP?

A. Yes, of course. Just be transparent. Insights surfaced during unanticipated, exploratory analyses are the source of many scientific breakthroughs. Not to mention that deviations are often practically necessary if the intervention was implemented differently than planned. The key is that PAPs empower everyone to keep clear what was predicted versus what was learned through exploration. If you update the plan before beginning analyses, register a new version of the PAP. If the change comes after, simply note in your write-up what was planned and what was not.

Q3. Is the PAP process different from community engagement?

A. Yes. Any PAP that leans into the political uses must entail community engagement; but community engagement (broadly defined) need not and usually does not entail a PAP. Even when researchers publicly discuss their work with stakeholders, it is relatively rare for them to facilitate a discussion of value judgments and then publicly register those agreements.

Q4. Do PAPs have to be made public while a study is ongoing?

A. No. PAPs can be embargoed so that their contents remain hidden for a specified amount of time. What matters is that the date of registration be trustworthy to readers.

Next Up
Project Portal toolkit

How to ask the right questions and connect with research teams to answer them.