Examining Outstanding Challenges

Decisionmakers within governments, bilateral aid agencies, multilateral organizations, and NGOs have not yet fully realized the value of evidence from impact evaluation in improving public policies.

Despite significant progress over the past two decades, impact evaluation has not gained widespread traction in policymaking. While some funders have embraced it, the international development community continues to underutilize impact evaluation as a tool for improving important policy decisions. Likewise, investments in impact evaluation have not yet realized the full potential of evidence use in consequential public policy decisionmaking at the global and national levels. This failure to make the most of evidence has left social and economic gains on the table. Some posit that limited evidence use stems from shortcomings in the knowledge generation process (Dissanayake, forthcoming).

Read more: Twitter threads by Jessica Leight (2022a, 2022b) on the existing literature on how policymakers use and respond to evidence

The working group identified three persistent challenges related to the demand, supply, and funding of impact evaluations:

1. On the demand side, impact evaluations may lack relevance to public policy decisions and fail to respond to the priorities, interests, timelines, and questions of decisionmakers.

Researchers design impact evaluation studies to isolate and identify the attributable impact of a specific intervention on outcomes of interest. Absent intentional effort, many impact evaluations overlook the political economy of reform in different settings and contextual factors, such as service quality and implementation capacity, that shape the relationship between a program and its results (Al-Ubaydli et al. 2019). Whether scaling a pilot, adjusting a widespread program, or introducing an innovation, complementary analyses of context, cost structure, implementation feasibility, equity, and political economy matter for policy impact.

Read more: The Voltage Effect: How to Make Good Ideas Great and Great Ideas Scale by John List (2022) on real-world examples of pitfalls and solutions in scaling

Impact evaluations often start too late or last too long to influence future policy decisions, in some cases missing windows of political opportunity. Too often, evaluations follow decisions rather than precede them. Results from past impact evaluations are often not readily available to inform real-time decisions, and impact evaluations are rarely designed and implemented to address known questions and inform expected future decisions. Though some governments choose to scale interventions based on impact evaluation findings (as shown by IPA’s embedded labs, DIME’s government clients, and IDinsight’s government partners, for example), evaluation funding and implementation could do more to be decision responsive. And while some evaluations involve ongoing engagement and sharing of preliminary results with program implementers, results are often shared with implementers much later, sometimes more than a year after fieldwork ends.

Like all empirical research, policy-responsive impact evaluations and related efforts carry risks related to conflicts of interest and other ethical considerations (Evans 2021). While 3ie’s (2022) Transparent, Reproducible, and Ethical Evidence (TREE) policy provides tools and principles, more work is needed to translate such policies into consistent research practice beyond 3ie’s own studies.

2. On the supply side, evidence users lack the institutional incentives and funding required to generate and act on relevant evidence.

Relative to other forms of evaluation and research, policymakers and researchers may not have sufficient funding to generate and act on impact evaluations; the share of public and aid spending that is rigorously evaluated remains small (Manning et al. 2020). In many sectors, the availability of evidence also bears little relation to the biggest areas of expenditure: the areas on which donors spend the most are not proportionately evaluated (Gaarder 2020).

One key factor in multilateral and bilateral development institutions is the lack of institutional incentives, consistent signals, and role modeling from leadership on the importance of learning and evidence use. Professional success is still too often measured by project approvals and disbursements rather than by learning from, acting on, and sharing evidence. This is reflected in limited interest in, and capacity for, the evidence synthesis and communication needed to act on existing evidence, despite new synthesis tools and approaches such as Evidence in Governance and Politics’ Metaketa Initiative (which commissions multiple studies on similar questions in different contexts), MCC’s evaluation briefs, VoxDev’s wiki-inspired literature reviews, 3ie’s evidence gap maps and systematic review summaries, and J-PAL’s policy insights.

Even when evidence generation is prioritized, decisionmakers may overlook the methods most appropriate and relevant for answering specific policy questions. For instance, some performance evaluations seek to answer questions that are methodologically better suited to an impact evaluation and may thus generate misleading results. Only 10 percent of USAID’s evaluation portfolio consists of impact evaluations (Steiger et al. 2021), and many commissioned or conducted evaluations reflect a mismatch between the questions asked and the methods used, underscoring the importance of matching analytical methods to the policy questions of interest.

Box 3. Impact evaluations are still relatively rare

Of 2,800 evaluations commissioned by CONEVAL in Mexico, only 11 are impact evaluations (Manning et al. 2020).

In health, less than 10 percent of evaluations conducted directly by major development agencies are impact evaluations (Raifman et al. 2018).

Ten percent of USAID’s evaluations are impact evaluations (Steiger et al. 2021).

3. Current funding models contribute to misalignment between policymakers’ needs and academic researchers’ incentives.

Academic incentives help motivate valuable knowledge production in the public domain, underpinned by peer review. Yet the norms and structures that drive academic research can also limit policy relevance and use. Academic researchers typically have few professional incentives to conduct complementary analyses of costs, equity, implementation capacity, and other contextual factors, in part because peer-reviewed academic journals are generally not designed to assess or reward them. New approaches are needed, not to replace existing standards of rigor and identification in academia but to complement them with research that directly responds to near- and medium-term decision-making needs and fills information gaps along the entire causal chain, including observational and qualitative data on implementation. Causal chains are context specific, yet impact evaluations continue to be designed and conducted with little substantive engagement with local policy processes and rhetoric. Efforts to build equitable, trust-based evidence-to-policy partnerships, a key enabler of policy-relevant analyses and of discussions that answer questions as they evolve over time, remain a work in progress, in part due to limited institutional funding (Buteau et al. 2020; Taddese 2021).