Facility-based data collection: a data methods bleg



Today, I come to our readers with a request. I have a ton of experience with household and individual survey data collection. Ditto with biomarkers, assessments/tests at home, etc. However, I have less experience with facility-based data collection, especially when it is high frequency. For example, we do have a lot of data from the childcare centers in our study in Malawi, but we had to visit each facility once at each round of data collection and spend a day to collect all the facility-level data, including classroom observations, etc. What would you do if you needed high-frequency data (daily, weekly, or monthly) that is a bit richer than what the facility collects for its own administrative purposes, and that would not break the bank?

We'd like to collect data on consultations and services provided to clients at health facilities in a setting where medical records are not computerized. We could scan all facility registers on a monthly or quarterly basis, but (a) the quality of these data could vary, and (b) the data would be the bare bones of what we need for the study. Hence, we see this as our backup method of collecting administrative data...

The current idea we're entertaining is to recruit "nurse researchers" at each facility for our study, give them a tablet with record-keeping software that would collect the data we'd like to have, train them for a day on using the tablet, and compensate the facility monthly for their time and for uploading the data to a server, while providing support for problems, issues, etc. The advantage for the facility is that the data from the server can generate administrative records for them automatically, which could be shared with them quickly, reducing the need for pen and paper registries in the future. For us, daily records of client interactions would provide a huge amount of statistical power when paired with the exact dates of intervention rollout and also make the evaluation much more nimble. Of course, the downside is that any non-compliance is very costly to internal validity...

Do people have experience with recruiting facility staff for collecting data for a study rather than regularly sending (expensive) enumeration teams to collect the data? What are some things we can institute that would help? What are the pitfalls we're not seeing? The comments section is yours to contribute...Thanks in advance!


Berk Ozler

Lead Economist, Development Research Group, World Bank

Join the Conversation

Khushbu Mishra
February 26, 2018

We did institutional-level data collection for our RCT in Ghana, where we collected data on banks' clients. Although the collection was not as frequent, what helped us was holding a small training in the field with designated bank officers, during which we gave them all our template and went through each category to explain what we were looking for. In addition, we partnered with the main chapter that oversaw the banks and partially funded the salary of one of their staff, who was responsible for calling the trained reps at each bank to ensure the data were being filled in regularly. He would then collect the Excel sheets, combine them into one file, and send it to us. This way we were able to collect data from 14 banks with around 2500 clients. We also had a data quality person from our team sitting in our US office to ensure timely, quality collection.

Berk Ozler
February 27, 2018

Hi Khushbu,
Thanks. This jibes with our plans to hold a training session in each district with identified staff from all facilities. Your main chapter could be similar to our district-level partners, who could have their own incentives to make sure that the data at each facility are being collected regularly. And, of course, we can do our own checks on the server from real-time uploading of data from tablets...

February 26, 2018

Share costs with Partner Organization
I was involved in an RCT project with a microfinance institution. We hired our own surveyors to collect data on new clients and shared the cost of the surveyors' salaries with the institution. They agreed to pay half of the cost because they were getting new clients and good quality data. We provided the surveyor training and covered all the technical costs, like the cell phones and the data management system. In addition to the costs, one disadvantage is that we also had to give feedback to surveyors on an ongoing basis. A benefit is that we got data every day from the cell phones, so we could do high-frequency checks on the quality of the data. Another benefit is that, if you provide data that is useful for the partner organization, they could agree to share some of the costs. To take advantage of these benefits, you need to put a system in place for checking the data every day and for giving feedback to your enumerators.
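One of the simplest daily checks of this kind is to flag sites that have gone quiet. The sketch below is only an illustration, not the system described in the comment above: the function name `silent_facilities`, the `(facility_id, date)` record layout, and the two-day threshold are all assumptions.

```python
from datetime import date, timedelta

def silent_facilities(expected_ids, submissions, today, max_gap_days=2):
    """Return the IDs of sites with no upload in the last `max_gap_days` days.

    `submissions` is an iterable of (facility_id, upload_date) pairs.
    All names and the threshold are hypothetical choices for illustration.
    """
    last_seen = {}
    for facility_id, day in submissions:
        # Keep only the most recent upload date per site.
        if facility_id not in last_seen or day > last_seen[facility_id]:
            last_seen[facility_id] = day
    cutoff = today - timedelta(days=max_gap_days)
    # A site is flagged if it never uploaded or last uploaded before the cutoff.
    return sorted(
        f for f in expected_ids
        if f not in last_seen or last_seen[f] < cutoff
    )

# Made-up example: A uploaded today, B five days ago, C never.
today = date(2018, 2, 27)
uploads = [("A", today), ("B", today - timedelta(days=5)), ("B", today - timedelta(days=9))]
flagged = silent_facilities(["A", "B", "C"], uploads, today)
```

The flagged list can then go to whoever is responsible for following up with enumerators that day.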

February 27, 2018

Whatever methods are used outside of developing an institutional approach will be costly and largely unsustainable. That is, using fieldworkers to collect the data or getting nursing staff to collect the data. The approach I would take (although we have used the fieldworker approach in doing health facility audits in SADC countries) is to establish mechanisms by which the data is collected by an identified person at the health facility and is stored in real time in a central database. The technology we use is available on Android phones and would allow health facility staff to use their own phones (or donated ones) to collect the data. The questionnaire can be provided in local languages and a number of different questionnaires provided to collect critical data. The questionnaire will need to be designed with proper checks and balances to ensure quality of data (e.g., printouts signed by the facility head). The technology we use can allow health staff to view the central database to see their data but also to compare their situation with other nearby facilities. This is the ideal whether it is collecting education or health facility or crime statistics, but it requires a commitment to the data cause and an institutional approach. Maybe use this approach as a pilot to validate the value of the approach.

Berk Ozler
February 27, 2018

Thanks - this is useful. Sustainability is not something we're looking for, but this could be a proof of concept for facilities to transition from pen and paper registries to reliable tablet-based ones. I did not understand your comment about nursing staff: couldn't a nurse be the "identified staff" from the clinic if it is indeed her job to do certain consultations anyway? The Android-based software on donated tablets and real-time data are aspects we share. We are also thinking hard about how the data could be most conveniently shared back with the facility. They need summary sheets for admin purposes, but also may use the raw data for revisits, etc.
Thanks again...

February 27, 2018

I have been involved in various organizational evaluation studies requiring continuous administrative data and having students or interns embedded with special training worked well in most organizations where a good mentor was available within the organization. However you have to have clear data quality processes embedded as well. The context of where these data researchers are placed is crucial for sustainability of the model working.

Berk Ozler
February 27, 2018

Thanks - unfortunately that won't work for us as we have close to 300 facilities in about 15 districts. Will embed an RA in the study area and may have an intern in each district to oversee things, but another solution will be needed at the facility level...

February 27, 2018

I was recently involved in an RCT where we placed a NGO research officer at each site to collect data using an android tablet and CommCare mobile data collection platform app. Since we were collecting case management data (week by week) this was the best mobile platform to ensure timeliness of entry and data quality--specifically, accurate linking of records. Although electronic data collection facilitated ‘automatic administrative records which could be shared quickly', it did NOT reduce the need for pen and paper registries in our context. Due to our relatively short time in the facilities (

Berk Ozler
February 27, 2018

Thanks - very useful. As I mentioned above, we cannot embed 300 people for 6-12 months, so we need another solution. Hence, glad to hear that you think training an identified nurse and incentivizing their facilities could work well. Our software is already built, tested, and piloted among nurses in the capital, so we just need to retest with facilities in the study region after training and have a roaming supervisor or two to troubleshoot tech glitches, issues, etc.

February 27, 2018

Talking about potential pitfalls, just wondering whether such in-house data collection would be (differentially) affected by the treatment itself. In other words, if aware of the treatment status, would these 'nurse-researchers' report in a different way than their counterparts, which would hurt internal validity. But, of course, it would depend on the nature of the treatment itself.

Berk Ozler
February 27, 2018

Yes, point taken. We'll have data collection for a few months before any intervention rolls out. The questions are basic (basic demographics, what the client came for, what they left with) and responses auditable (both via client follow-up and admin data). So, something we should be able to test - not foolproof, but to some satisfaction...

February 28, 2018

Great! Also, we can inform a (random) subset of these nurse-researchers (in both T & C) that their reports would be audited - to assess whether, or to what extent, there would be biased reporting (as long as the 'threat' is sufficiently credible).
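A minimal sketch of how such a subset could be drawn, so that the same share of facilities is notified in each arm. The function name `pick_audit_subset`, the 20% share, the fixed seed, and the facility IDs are hypothetical choices for illustration, not part of the study design.

```python
import random

def pick_audit_subset(facilities, share=0.2, seed=2018):
    """Draw a random share of facilities to notify about audits, within each arm.

    `facilities` maps a facility ID to its arm label (e.g. "T" or "C").
    The share and seed are assumptions made for this example.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    by_arm = {}
    for facility_id, arm in sorted(facilities.items()):
        by_arm.setdefault(arm, []).append(facility_id)
    notified = set()
    for arm, ids in by_arm.items():
        n = max(1, round(share * len(ids)))  # at least one facility per arm
        notified.update(rng.sample(ids, n))
    return notified

# Made-up example: 300 facilities split evenly between T and C.
facilities = {f"F{i:03d}": ("T" if i % 2 == 0 else "C") for i in range(300)}
notified = pick_audit_subset(facilities)
```

Comparing reports between notified and non-notified facilities within each arm then gives a rough read on whether reporting responds to the audit threat.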

Arja Dayal
February 28, 2018

Although not directly involved in recruiting facility staff, I thought of sharing a few suggestions (but mostly questions) with you here as I brainstormed on how to approach it:
1. Who and how to select these facility staff – I wonder whether different facilities based on observable characteristics can also lead to differentiated skill-sets for the person collecting data. Low-skilled nurse collects low quality data in a disadvantaged community. Should there be a standardized hiring process/test or criteria for selecting these staff then? Is there a way to randomize the data collector per facility based on qualification and experience to minimize the error rate?
2. Community pressure and bias - Even if the facility level data collectors are unbiased, if the community leaders are aware of a "potential" program coming their way, they might pressurize the "nurse researcher" in forms that research team might not be able to observe easily.
3. Language requirement for data collection and data recording – It would be useful to know whether paper format of the administrative data is always in one official language or not in the 300 facilities (with the slightest of chance). If yes, then learning from GE experience, research team can take into consideration electronic tool development (audio/language scripted) and should carefully assess whether the data collector can read, write and speak that language. Otherwise, audio recordings should be used, as tested in Liberia.
4. Usually there is an accountability issue to complete data collection on time especially when an evaluation team is not monitoring properly in the field. Is there a way to overcome this by having a small back-check team to randomly rotate with 300 facilities from time to time for a shorter version of the form (random 10% per week)? Double entry can help RA to verify the information independently yet not incur high costs.
5. Burden of work for the nurse – Depending on how much these nurses are already burdened, do you think teaching a new tablet-based tool might in fact add pressure rather than assist? Or, it might divert their attention to doing data collection work more than doing the main consultations if tied to monetary incentives. In this case, age group and potentially the cohort of nurse you would like to work with should be considered, if possible.
6. Monetary Incentive - On the side, monetary compensation as you stated can be a mechanism, but once this data collection process is completed these same nurse researchers might experience an income shock and can impact the intervention itself. Especially if the payments for data collection were regular as opposed to their normal wage payments.
7. Even more, if the nurses know what data is being collected, during the intervention phase they might actively fudge data to show impact. You might also see higher level of John Henry effect.
8. Data safety, tablet security - Procedures should be put in place for data to be encrypted even if using SurveyCTO or survey solutions. There is an encrypt option for every PII data collected that can be coded. Have Tablet password protected and something missed out is the "Applock" application to lock gallery if PII pictures are to be taken. Do also create clear data security and theft protocols regarding who takes the responsibility of tablet being lost without synced data and how to retrieve that data? FYI -- If lost without syncing then data cannot be retrieved. This might help in managing moral hazards. Further, will there be internet to submit data in a timely fashion?
9. Should they still have some form of paper copy back-up to allow cross verification if at all you observe discrepancies? Otherwise, the current method proposed involves collecting primary data from the implementer with no way to proof check it. One suggestion to overcome this might be to think of variables that can be verified by back-checkers during their visits such as going to that particular HH and asking them whether they actually availed the consultation and other time-invariant questions.
10. As weird as it may sound but have you considered asking the enumeration team if they already have enumerators who are also nurses in the 300 facilities? You might be able to pull in some recruits from there as well, although a very small percentage.
Hope this is helpful and would love to learn from you on how you and your team actually proceed with planning.
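The weekly back-check draw in point 4 could be sketched roughly as below. This assumes independent random draws each week, so that facilities cannot predict when they will be visited. The function name `weekly_backchecks`, the seed, and the facility IDs are illustrative assumptions.

```python
import random

def weekly_backchecks(facility_ids, weeks, share=0.10, seed=7):
    """Return one list of facilities to back-check for each week.

    Each week is an independent draw of `share` of all facilities, so a
    facility can be picked in consecutive weeks. Names, share, and seed
    are assumptions made for this example.
    """
    rng = random.Random(seed)
    ids = sorted(facility_ids)  # stable order so the seed reproduces the schedule
    batch = max(1, round(share * len(ids)))
    return [sorted(rng.sample(ids, batch)) for _ in range(weeks)]

# Made-up example: 300 facilities, a four-week schedule of 30 visits per week.
all_ids = [f"F{i:03d}" for i in range(300)]
schedule = weekly_backchecks(all_ids, weeks=4)
```

If the aim were instead to guarantee that every facility is visited once per cycle, the list could be shuffled once and split into ten batches, at the cost of making later visits more predictable.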

March 02, 2018

During a recent data collection project we collected a few different types of high-frequency medical data from a large number of health facilities and pharmacies, including:
· Data on current stocks of certain medicine – Interviewers were asked to visit store rooms and dispensing outlets and count the number of tablets/bottles of medicine the facility had on the day of the visit.
· Historic data on stocks of medicine – We were also asked to use the stock records to collect data on every instance a medicine came into the facility or left the facility. Generally this was done by taking photos of the paper stock records and then having teams of data entrants to digitise the data from the photos. Data was collected for every instance of the medicine being received or issued, along with the date. We also recorded some data from medicine order forms and sales invoices.
· Historic data on patients – This may be the closest to the type of data you are wishing to collect. We were also asked to collect data on the diagnoses and treatments given to certain patients, in particular looking for respiratory diseases in children under 5. This involved interviewers reviewing the paper patient record books, and recording any instances when they came across certain diagnoses or prescriptions. During the pilot we found that these records were too dense to record for entire months and so we used our survey tool to select a 7-day period within each study month to collect data from. To deal with the quality issue, we also added questions to check whether there were any days missing data from this period, and selected replacement periods where necessary.
The collection of patient record data was done by providing interviewers with a paper table of treatments/diagnosis and having them tally each instance. This could then be entered into our questionnaire tool after all records had been reviewed. This data was then compared with the historic data on stocks of medicine to check the prescriptions given matched the medicines leaving the facility.
· Patient observations – We also had interviewers and medical professionals sit in on patient-doctor interactions. This was generally to check whether doctors were following the correct protocols for diagnosing and treating patients. Interviewers would record data on the questions asked by doctors, the responses given, and the overall diagnosis given.
We were fortunate in that we had cooperation with the local Ministry of Health for this project. They had been involved in the design of the survey and in gaining approval for the activities to take place. This was very important in ensuring that the facilities and staff were welcoming of our teams and willing to help out with our requests. This also meant that we were able to frame the survey activities as part of the staff’s general job responsibilities, and so we didn’t need to provide any incentives for participation.
A few ethical considerations you might have with potential methods are: 1) Are these nurses potentially being pulled away from providing care? 2) What are they observing, and what is their ethical obligation to intervene? 3) Would instituting this type of data collection change behaviour around data entry at the health facilities? That is, is there a danger that the health facilities don’t revert to recording the administrative data on pen and paper, with broader implications for the health system?
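For readers who want to reproduce the 7-day sampling of patient records described above, here is a rough sketch. It is not the survey tool's actual logic: it assumes candidate windows are tried in random order until one is found with records on every day, and the function name `pick_window` and its inputs are hypothetical.

```python
import calendar
import random
from datetime import date, timedelta

def pick_window(year, month, days_with_records, rng, length=7):
    """Pick a random `length`-day window inside the month with no missing days.

    `days_with_records` is a set of dates on which the register has entries.
    Windows are tried in random order; returns None if no complete window exists.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    # Every start day for which the whole window stays inside the month.
    starts = list(range(1, days_in_month - length + 2))
    rng.shuffle(starts)
    for start_day in starts:
        window = [date(year, month, start_day) + timedelta(days=k) for k in range(length)]
        if all(day in days_with_records for day in window):
            return window
    return None

# Made-up example: March 2018 with the register missing on 10 March.
records = {date(2018, 3, d) for d in range(1, 32)} - {date(2018, 3, 10)}
window = pick_window(2018, 3, records, random.Random(0))
```

In practice the rule for what counts as a missing day (for example, days on which the facility is normally closed) would need to be defined before the draw.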

March 07, 2018

Hi Berk,
Have you spoken to the team at the Bank (part of DIME I believe) that handled the KEPSIE project in Kenya (http://www.worldbank.org/en/programs/competitiveness-policy-impact-eval…)?
The contours of that project overlap to a good degree with what you are trying to do here: health facilities were/are surveyed on a longitudinal basis by facility inspectors rather than surveyors specifically hired and trained for data collection. They will probably have great advice on managing data collection through facility/non-enumerator staff and handling the unique incentive, training, quality, and performance problems that might arise from that.
The Clinton Health Access Initiative and JPAL North America also have projects with matching components to yours: CHAI runs a number of projects where enumerators use mobile data collection tools to digitize patient data in rural health facility registers (I did note you mentioned this was a back up option for you) and JPAL NA runs an evaluation in South Carolina of the Nurse Family Partnership program where data is collected by nurses at the clinics rather than trained enumerators. Happy to try and connect you to staff on either project if you might get useful insights from them.

March 08, 2018

Hi Faizan,
Thanks - very useful. Can you please drop me an email at my Bank email address so that we can connect? Cheers,