Published on Development Impact

High-Frequency Checks in Action: Introducing iehfc for Better Data Quality

Co-authored with Maria Reyes Retana Torre, Marc-Andrea Fiorina, Roshni Khincha, Ankriti Singh, and Marina Visintini – all members of DIME Analytics

When you're working with raw data, data quality issues (such as duplicate IDs, missing values, or inconsistent coding) can seriously undermine research validity. Catching these problems early, ideally in real time, is crucial for diagnosing and fixing root causes such as programming errors, enumerator behavior, or data integration issues.

Several tools already support real-time data quality checks, also known as high-frequency checks (HFCs), including ipacheck for Stata and SurveyCTO’s built-in checks. These tools have had a strong impact, particularly for teams already using those platforms. Yet many teams struggle to implement HFCs, and some don’t run them at all, because of limited technical capacity, tight timelines, or a lack of accessible, user-friendly tools.

To better understand these challenges, DIME Analytics conducted a Needs Assessment Survey in 2023. Most respondents reported relying on custom-built scripts rather than existing tools. Common barriers included limited accessibility, high setup costs, lack of intuitive interfaces, and insufficient flexibility across different project contexts.

We built iehfc to address these challenges. iehfc is a free, open-source R package with a user-friendly Shiny interface, designed to make HFCs easy to run, adapt, and share, with no knowledge of R or coding experience required.

iehfc streamlines your workflow by offering:

·       A consistent interface to standardize check setup across projects;

·       Preloaded templates for common checks like duplicates and outliers;

·       Local execution, so your data stays secure on your machine;

·       Easy export of results as downloadable tables and reports; and

·       Optional code export in R or Stata for advanced users who want more flexibility.

“Being able to run checks as soon as data collection starts is a major advantage. The outputs are clearly presented and easy to share, and the tool supports multiple levels of analysis. Overall, it’s very user-friendly.”
 — Early user feedback

Try it for Yourself

But don’t just take our word for it. Getting started with iehfc is simple—and no coding is required to use the app. All you need to do is:

1.     Install R and RStudio (both are free and open source);

2.     Install the iehfc package by following the step-by-step instructions in the documentation;

3.     Launch the app by running a single line of code in RStudio: iehfc::iehfc_app()

From that point on, everything happens through the Shiny interface—you won’t need to write any R code to use the tool.

iehfc in Action

After launching the app, your first step is to load a dataset—either one you're actively collecting in the field or a new dataset you've just received. If you're just exploring, the app includes a built-in sample dataset to help you test all its features.

iehfc currently supports five main types of checks, each designed to flag common data issues early:

1. Duplicate Check

Use it to: Catch accidental repeat interviews, duplicated household IDs, syncing errors, or template copying issues early, so they can be resolved before survey rounds move forward.

How it works: This check automatically flags duplicate observations based on the variables you select. You can also customize which variables are displayed alongside flagged observations to make issues easier to spot.
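To make the duplicate logic concrete, here is a minimal Python sketch. It is not iehfc's code (the package is written in R), and the function `find_duplicates` and the variables `hhid`, `enumerator`, and `date` are hypothetical. It groups submissions by the values of the chosen ID variables and returns every observation whose ID combination appears more than once, together with any extra display variables.

```python
from collections import defaultdict


def find_duplicates(rows, id_vars, display_vars=()):
    """Return every observation whose id_vars values occur more than once."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[v] for v in id_vars)  # combination of ID values
        groups[key].append(row)

    flagged = []
    for members in groups.values():
        if len(members) > 1:  # same ID combination seen more than once
            for row in members:
                # keep the ID variables plus any extra columns chosen for display
                shown = {v: row[v] for v in id_vars}
                shown.update({v: row.get(v) for v in display_vars})
                flagged.append(shown)
    return flagged


# Hypothetical submissions: household 101 was interviewed twice.
submissions = [
    {"hhid": 101, "enumerator": "A", "date": "2023-05-01"},
    {"hhid": 102, "enumerator": "B", "date": "2023-05-01"},
    {"hhid": 101, "enumerator": "C", "date": "2023-05-02"},
]
dups = find_duplicates(submissions, id_vars=["hhid"], display_vars=["enumerator", "date"])
for row in dups:
    print(row)
```

Choosing the ID variables matters: flagging on `hhid` alone catches the repeat interview above, while flagging on `hhid` and `date` together would treat the two visits as distinct.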

2. Outlier Detection

Use it to: Spot suspicious values, like an age of 210 or income of 1,000,000, and trigger validation with the field team.

How it works: Apply statistical checks to detect extreme or implausible values using either the standard deviation (SD) or the interquartile range (IQR). Checks can be run on individual variables or on groups of variables.
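The two cutoff rules can be sketched as follows. This is a hedged Python illustration, not iehfc's implementation: `flag_outliers` is a hypothetical helper, and the defaults (3 standard deviations from the mean, or 1.5 × IQR beyond the quartiles) are common conventions assumed here rather than documented iehfc settings. Missing values (`None`) are skipped.

```python
import statistics


def flag_outliers(values, method="sd", k=None):
    """Return indices of values outside the SD or IQR cutoff; None is skipped."""
    observed = [v for v in values if v is not None]
    if method == "sd":
        k = 3.0 if k is None else k  # assumed default: mean +/- 3 sample SDs
        mean = statistics.mean(observed)
        sd = statistics.stdev(observed)
        low, high = mean - k * sd, mean + k * sd
    elif method == "iqr":
        k = 1.5 if k is None else k  # assumed default: Tukey's 1.5 x IQR fences
        q1, _, q3 = statistics.quantiles(observed, n=4)
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
    else:
        raise ValueError("method must be 'sd' or 'iqr'")
    return [i for i, v in enumerate(values) if v is not None and not low <= v <= high]


# Hypothetical ages with one implausible entry (210) at index 5.
ages = [25, 31, 40, 38, 29, 210, 45, 33]
print(flag_outliers(ages, method="iqr"))     # [5]
print(flag_outliers(ages, method="sd", k=2))  # [5]
```

With only eight observations, no value can lie more than about 2.5 sample standard deviations from the mean, so the default 3-SD rule flags nothing here. The IQR rule relies on quartiles, which the outlier itself barely moves, so it tends to be more robust in small samples.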

3. Enumerator Monitoring

Use it to: Spot cases where an enumerator may be rushing through interviews or submitting incomplete forms and intervene quickly.

How it works: Track data submissions by enumerator, including submission count, completion rate, and average responses to selected variables. Results are presented in tables and cumulative graphs to highlight anomalies.
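A per-enumerator summary of the kind described above can be sketched in a few lines of Python. The helper `enumerator_summary` and the fields `enumerator`, `complete`, and `duration_min` are hypothetical, and the data are invented for illustration; iehfc's own tables and cumulative graphs may compute things differently.

```python
from collections import defaultdict
from statistics import mean


def enumerator_summary(rows, enum_var, complete_var, mean_vars=()):
    """Per-enumerator submission count, completion rate, and means of selected variables."""
    by_enum = defaultdict(list)
    for row in rows:
        by_enum[row[enum_var]].append(row)

    summary = {}
    for enum, subs in by_enum.items():
        entry = {
            "submissions": len(subs),
            # share of submissions marked complete (truthy completion flag)
            "completion_rate": sum(bool(r[complete_var]) for r in subs) / len(subs),
        }
        for var in mean_vars:
            vals = [r[var] for r in subs if r.get(var) is not None]
            entry[f"mean_{var}"] = mean(vals) if vals else None
        summary[enum] = entry
    return summary


# Hypothetical data: enumerator B's interviews are unusually short.
rows = [
    {"enumerator": "A", "complete": 1, "duration_min": 42},
    {"enumerator": "A", "complete": 1, "duration_min": 38},
    {"enumerator": "B", "complete": 0, "duration_min": 9},
    {"enumerator": "B", "complete": 1, "duration_min": 11},
]
summary = enumerator_summary(rows, "enumerator", "complete", mean_vars=["duration_min"])
for enum, entry in summary.items():
    print(enum, entry)
```

Comparing each enumerator's averages against the rest of the team is what makes patterns like B's short, partly incomplete interviews stand out.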

4. Administrative Unit Summary

Use it to: Monitor daily or weekly progress by location and decide whether to reassign staff or flag a slowdown in a specific geography.

How it works: Summarize survey progress by location (district, province, etc.) and visualize a time series of submissions.
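The location summary boils down to counting submissions per administrative unit per day and keeping a running total, which is the series a progress chart would plot. Below is a minimal Python sketch under stated assumptions: `submissions_by_unit` is a hypothetical helper, `district` and `date` are hypothetical variable names, and dates are assumed to be ISO strings (`YYYY-MM-DD`) so they sort chronologically.

```python
from collections import Counter, defaultdict


def submissions_by_unit(rows, unit_var, date_var):
    """Daily submission counts per administrative unit, with a running total."""
    daily = Counter((row[unit_var], row[date_var]) for row in rows)
    series = defaultdict(list)
    # Sorting (unit, date) pairs groups by unit; ISO dates sort chronologically.
    for unit, day in sorted(daily):
        count = daily[(unit, day)]
        running = series[unit][-1][2] + count if series[unit] else count
        series[unit].append((day, count, running))  # (date, daily count, cumulative)
    return dict(series)


# Hypothetical submissions from two districts.
subs = [
    {"district": "North", "date": "2023-05-01"},
    {"district": "North", "date": "2023-05-01"},
    {"district": "South", "date": "2023-05-01"},
    {"district": "North", "date": "2023-05-02"},
]
series = submissions_by_unit(subs, "district", "date")
for unit, points in series.items():
    print(unit, points)
```

A unit whose cumulative line flattens while others keep rising is the kind of slowdown that might prompt reassigning staff.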

5. Unit of Observation Tracking

Use it to: Quickly review which households or firms have been visited, who collected the data, and whether each submission is complete.

How it works: This creates a one-row-per-observation table including any key tracking variables you select (e.g., enumerator name, submission date, module completed). This table is often the most useful for daily team check-ins.
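A one-row-per-observation tracking table can be sketched as below. This assumes, hypothetically, that you have a master list of expected units (`expected_ids`) and that submissions arrive in chronological order, so a later submission for the same unit replaces an earlier one; `tracking_table`, `hhid`, `enumerator`, and `complete` are illustrative names, not iehfc's API.

```python
def tracking_table(expected_ids, rows, id_var, track_vars):
    """One row per expected unit, filled from its most recent submission."""
    latest = {}
    for row in rows:  # assumes rows are in submission order, so later rows win
        latest[row[id_var]] = row

    table = []
    for uid in expected_ids:
        row = latest.get(uid)
        entry = {id_var: uid, "visited": row is not None}
        for var in track_vars:
            entry[var] = row.get(var) if row else None  # blank if not yet visited
        table.append(entry)
    return table


# Hypothetical sample of three households; 102 has not been visited yet,
# and 103 was revisited and completed on the second attempt.
subs = [
    {"hhid": 101, "enumerator": "A", "complete": 1},
    {"hhid": 103, "enumerator": "B", "complete": 0},
    {"hhid": 103, "enumerator": "B", "complete": 1},
]
table = tracking_table([101, 102, 103], subs, "hhid", ["enumerator", "complete"])
for entry in table:
    print(entry)
```

Driving the table from the expected list, rather than from the submissions, is what makes unvisited units show up as explicit blank rows instead of silently disappearing.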

After you select the checks you need, outputs can be viewed inside the app, downloaded individually, or exported as a consolidated HTML report. This report can be shared with your research or field teams in real time, helping surface data quality issues as they emerge. Teams can act on the findings immediately, making sure errors are corrected early, before they escalate into broader inconsistencies or threaten the quality of the full dataset.

Suggested workflow

Here’s a quick overview of how to use iehfc, from uploading your data to downloading your results.

[Figure: Suggested iehfc workflow, from uploading data to downloading results]

Why not test it out today?

Our goal with iehfc is to help teams stop reinventing the wheel every time they set up HFCs, and to make these checks a standard, seamless part of any data collection process. Give it a try and let us know how it works for your team. We welcome feedback, ideas, and bug reports on our GitHub page.

Let’s build better data together.


Maria Jones

Survey Specialist, Development Research Group (DIME), World Bank