Recommendation 1: Design Evaluations That Start from the Policy Questions and Decision Space


Evaluations should be built around policy needs and questions that matter most for development impact, focusing on directly informing policy decisions and/or building the global knowledge base.

Many evaluations accomplish both goals through nonlinear, often unpredictable, pathways of influence. But the importance of the former approach has not yet translated into widespread practice of explicitly supporting decision makers who are interested in using more evidence and have the political space to act on it. (Where the political space for rigorous evidence to directly inform a decision is not yet available, funders, researchers, and knowledge brokers can provide decision support through a range of appropriate and relevant methods.) To reap the practical benefits of cocreation and to identify and act on policy-relevant research questions, evaluations should be conducted by those who understand the operating environment and the relevance of the evaluation topic to specific national policy priorities, and who can deploy rigorous methods to enhance research credibility and influence; in other words, they must deliver both contextual linkages and high-quality research.

Evaluations should be conducted by those who understand the operating environment and relevance of the evaluation topic to specific policy priorities.

As such, impact evaluations must more regularly integrate a range of complementary analyses that address decision makers’ information needs and allow them to apply evaluation findings to real-world decisions. From the outset, more research should be designed from the decision makers’ vantage point to answer both the experimental and observational questions required for successful policy implementation, recognizing that different types of information will likely be produced by a range of partners and methodological approaches. Policymakers often need to know whether an intervention is effective across implementation models. To meet this imperative, researchers should set out—and funders should support—a theory of change that includes the baseline conditions, the underlying outputs and outcomes being targeted, and the implementation and delivery channels being investigated, giving policymakers clarity on the intervention’s impact across different settings and groups.

Through this approach, research proposals would outline a specific method for understanding both generalizability across contexts and, for initial evaluations conducted at a relatively small scale, scalability (i.e., the shape of the benefit and cost curves with respect to intervention size within a specific context). Assessing the scope for scaling up further may or may not involve additional large-scale impact evaluations (such as those conducted by the Yale Research Initiative on Innovation and Scale), but it should at least specify what observational findings would be expected in the case of a successful rollout. As part of efforts to consider the implications and inferences related to different implementation approaches and baseline groups, researchers should work with policymakers to ensure that their policy questions are informed by existing evidence drawn from systematic reviews, evidence gap maps, and other sources. When piloting a new program, the interventions found to be most effective in systematic reviews should be the starting point, with evaluations then designed to test and monitor those interventions in a given context.

Identifying where causal evidence is needed, where observational or qualitative information is sufficient to inform policy, and which methods will be used to understand scale and context (e.g., a randomized controlled trial, natural experiment, modeling, or observational data) requires a nuanced understanding of local policy processes and questions. For example, Piper et al. (2018) used observational data to successfully scale a national literacy program in Kenya. And in Colombia, Barrera-Osorio et al. (2022) adjusted an educational program for parents during scale-up based on implementation data.

"Identifying where causal evidence is needed, where observational or qualitative information is sufficient to inform policy, and which methods will be used to understand scale and context requires a nuanced understanding of local policy processes and questions."

Investments in impact evaluation should also be paired with embedded technical assistance to support evidence use throughout the program life cycle and to implement evidence uptake plans. Like a pre-analysis plan, an evidence uptake plan maps out potential evaluation results alongside the related policy responses and pathways to scale. For instance, when an HIV awareness campaign implemented by Youth Impact generated mixed results in Botswana, the government policymakers and other partners involved readily reached a consensus not to scale the program because they had previously discussed the possibility of negative and ambiguous results (Levy et al. 2018). While an evidence uptake plan is not meant to be binding, the process of developing the plan helps secure commitment from policymakers.

Investments in impact evaluation should also be paired with embedded technical assistance to support evidence use.

Platforms that consolidate and communicate insights from different bodies of knowledge, such as 3ie's synthesis products, are useful for policy translators providing embedded technical assistance—and for interested policymakers themselves. Locally connected experts and policy translators play a key role in garnering interest in evaluation findings among policymakers, which in turn facilitates evidence uptake.

Explicit ethical safeguards and policies are an important part of embedded approaches—and should be standard across all empirical research. Going forward, researchers should commit to—and funders should require—adherence to improved and more transparent ethical principles and practices (3ie 2022; Evans 2021).

Box 6. Indicative checklist for funders considering impact evaluation proposals with the aim of decision responsiveness

🔲 Does the primary demand arise from policymakers with a commitment and plan to incorporate results into decision making?

🔲 Have the researchers engaged regularly with the relevant policymakers?

🔲 If it is an embedded experiment, does it exclude any personal rewards for government officials who participate in it?

🔲 If evaluating a program at scale, has it been preceded by some form of safety trial?

🔲 Has responsibility for compensation in the event of any harms been clarified in advance?

🔲 Does the evaluation include an assessment of cost-effectiveness?

🔲 Is bureaucratic feasibility or capacity a factor in the evaluation?

🔲 Does the evaluation precede, rather than follow, key program decisions, so that findings can feed into program design and implementation?

🔲 Does the evaluation include parallel data collection activities to assess and improve implementation alongside headline efficacy?

🔲 Does the evaluation include qualitative work with program participants to inform how findings are translated into policy?

🔲 Does the research team include a principal investigator with deep contextual knowledge? (Not solely linked to geography; could be locally based researchers or members of diaspora.)

🔲 Does the evaluation involve capacity building or some form of knowledge transfer to local research institutions and/or policymakers?

🔲 Does the evaluation involve procurement of services from local providers?

🔲 Does the proposal reference and document relevant prior work completed by local researchers, government agencies, and nongovernmental institutions?