Syndicate content

Survey specialists and data scientists meet: machine learning to measure a person’s height from a picture.

Michael M. Lokshin's picture
A test subject holding a reference image and a silhouette derived from the photo by Tensorflow/DeepLab semantic image segmentation model.

Human body measurements are used to evaluate health trends in various populations. We wanted a simple way to reliably measure someone’s height as part a field interview, using a photo of them holding a reference object. We’ve developed an approach and would highlight two things we learned during the process:

  • With an iteratively refined method, it’s possible to get a measure of someone’s height accurate to 1% from a well-composed image of them holding a calibrated paper printout. We plan to integrate this functionality in to the free World Bank Survey Solutions CAPI tool.

  • We found working with an in-house team of survey specialists and data scientists the best way to tackle this problem. It’s only when we combined our domain knowledge and field experience with our data science skills and a healthy dose of creative problem solving, were we able to develop a working prototype.

The challenge: measuring height and reducing the cost of data collection

Our goal was to measure a respondent’s height using a photo taken during interviews in the low and middle-income countries where we typically work. To accomplish this, we initially wanted to use a standardized, cheap, and widely-available object, for example a plastic bottle, as a reference for calculation. A photo would be taken of the respondent holding a bottle close to his torso during the interview and then use machine learning (ML) to calculate the respondent’s height from the picture and the known size of the reference bottle.

This research is a part of the larger agenda aimed at reducing the cost of data. Data from surveys of households and individuals is critical for monitoring progress of poverty reduction, improvements in living standards, and economic growth. But household surveys involving face-to-face interviews are expensive; a nationally representative survey in low and middle-income countries on average costs about $2.5M, or $200 per household. Because of these high costs, 78 countries are categorized by the World Bank as data deprived defined as not having at least one household survey over the last five years.

Human body measurements are used to evaluate health trends in various populations. For example, the deviation of a child’s anthropometric measurements from an ideal standard might indicate presence of chronic diseases (if a child is short for his age) or recent illnesses (if a child’s weight is low for her age). These measurements can also be used to determine the prevalence of undernutrition and evaluate the need for nutritional support. Anthropometric measurements conducted in the field are time consuming exercises that require special equipment. Since most modern surveys are being conducted on tablets where cameras are readily available, being able to assess respondents’ height (and weight) from a photo could help reduce both the total costs of surveys and the time a survey takes.

Working with an in-house team of survey specialists and data scientists

We approached three leading companies with Machine Learning (ML) offerings, for assistance with our problem. Based on initial discussions, all three informed us that they could not offer any reliable technology or approach to help us. Despite this feedback, we decided to proceed on our own.

Our initial attempts to apply standard ML tools to measure human height confirmed the pessimistic assessments of the external ML experts. We failed to develop an algorithm that reliably identifies the outline of a plastic bottle or a can of soda. The image recognition of a bottle is not reliable because of multiple reflections and non-standard labels. Additionally, people might hold the bottle at different distances from their bodies which would distort the measurement.

Our first breakthrough came with the realization that we could replace a 3-D object with a picture of a checkerboard. It is fully acceptable for our requirements: we can give interviewers a piece of paper with a printed image that he will hand to a respondent to take a picture. A printed image is cheap, robust to various field environments, and easily replaceable. A high-contrast checkerboard is ideal for ML recognition because many algorithms are trained specifically to recognize that type of image. We then calculated the height of a person by comparing the dimensions (in pixels) of an object of a known size with the dimensions of a human body (Figure 1). Immediately, our measurements became more precise, but still far from our acceptance threshold for error of +/-2 cm.  

Figure 1: A test subject holding the sample image. Red lines on the checkerboard are superimposed by ML algorithm that identified corners of the squares. An image on the left shows the silhouette derived from the photo by Tensorflow/DeepLab semantic image segmentation model.

We identified the two main problems that affected the precision of height measurements. First, as can be seen from the left picture, the red rectangle that bounds the silhouette included feet. The 2-D projection of the length of the feet adds to the vertical dimension of an image. Second, test subjects are holding the piece of paper at different distances from their bodies and distorting the measurement.

The solution for both problems was to take photos from the side, as shown on Figure 2. This automatically eliminated errors related to protruding feet. Now, the ML algorithm bounds the silhouette from the heels up. Because the test subjects are holding their arms relaxed along the sides of their body, the paper is at a constant distance from the body. These adjustments brought accuracy to almost acceptable level of 3-5% error.

Further improvements were achieved through finding an optimal distance that the test subject should be from the camera. Being too close produces lens “barrel” distortion that affects the precision. Images made from a distance also have reduced precision because the dimensions of the test subject in the photo are measured in pixels so the further the object is, the larger area the area that is covered by a single pixel. To alleviate these issues, we provide guiding grid lines on the data collection app (Figure 3) to help the user take a photo at an optimal distance. The subject should be positioned in a cameral viewfinder so that his height is larger than the smaller yellow rectangle and smaller than the larger rectangle.

Figure 2: Site shot with the checkerboard sheet with segmented silhouette by DeepLab

As a result, we achieved good precision in measuring human heights from photos taken by tablet’s built-in camera. In our trials, when a person was positioned at an optimal distance, the measurement error did not exceed 1% or about 2cm for adults.

Next steps

We plan to integrate this technology into the free Survey Solutions CAPI/CAWI platform for data collection developed by the World Bank. Survey Solutions is currently used by 120 countries and we will have ample opportunities to test the application in the field. We will conduct a series of pilots to assemble large enough training set to further boost the accuracy of our algorithm, specifically to adjust for the distortion related to a hair style. We plan to validate our method for children, where the accuracy of measurement is more important.  Additionally, we want to expand our methodology to measure body parts and, potentially, estimate weight from a series of pictures.

What we learned about how and when to collaborate with external experts

An important lesson from this exercise is that the solution of an ML problem was found outside of ML space. The adjustments that helped solving our problem were informed by our domain knowledge, common sense, and trial and error approach. In other words, the solutions could not be found without understanding the nature of the process and the environment we operate in.

Figure 3: Tablet app with guidelines for optimal camera distance

Our experience highlights inefficiencies in working with external experts. It might be problematic to generalize our conclusions based on our limited interactions with the experts, but in all three cases we received no constructive advice on how to approach the problem we wanted to solve. Why is that? Partly, we attribute that to the incentive structure. Private companies are motivated to sell their products and services so in every meeting the experts were promoting their AI and ML tools. Because most of these tools are well-documented in many resources online, we already knew about them. In a low probability scenario when our requests matched exactly some undocumented functionality offered by these standard tools, we would have benefited from the expert advice. The expert advice would also have been useful while fine tuning our segmentation model and selecting the optimal configuration of the ML algorithm. However, we were not ready for that since we had just started the project.

Another difficulty in working with external experts is the time and high costs of these interactions. The solution we found required multiple, often informal and unscheduled, iterations and discussions that were possible among colleagues, but would have been difficult to organize and very expensive with external experts who charge fees of $300-400 per hour.

For us, the optimal approach to developing AI/ML tools and algorithms is to build an internal team of subject matter experts and data scientists to formulate the business requirements for the problem at hand. Existing tools should be researched and identified and the team should attempt to build a proof of concept (PoC) model to understand the deficiencies of existing technologies. Only at this point when the solution is more developed might it be useful to seek the advice of external experts either to fine tune the PoC model or how to resolve the bottlenecks discovered during the PoC building process. 

Add new comment