This post is part of the Q&A Series with the Data Ambassadors from DataDive2013. You can also read an interview with the fraud and corruption data ambassadors, a recap of Data Dive 2013, and watch the presentations from the weekend.
Photo credit: Itir Sonuparlak
During DataDive 2013, each project had an assigned data ambassador, a leader to guide and direct the research and efforts of the teams. In the days following the DataDive, we spoke with two of the data ambassadors from the fraud and corruption related projects to learn more about their experiences. Read their responses below and join the conversation in our comments section.
- Taimur Sajid develops financial models to assess risk for a financial firm and acted as a data ambassador during the DataDive.
- Marc Maxson is an Innovation Consultant with Global Giving and brought his Heuristic Auditing Tool to the DataDive.
What were your expectations going into DataDive 2013?
TS: I had some expectations from the preparation for the DataDive. My expectations weren’t just met, but they were exceeded by the scale of ideas that people delivered. So, for me, my goal was to take the pages where the World Bank has data, which is unstructured, and to develop a method to extract that data. I didn’t have enough time to work on it while I was at the DataDive, but someone else did. They went above and beyond. They were able to accomplish what would have been unachievable any other way. People even found the history of the debarred firms and compared it against the general population of contractors. There were many methods that I hadn't even considered.
MM: Well, I’ve seen the outcomes from other Hackathons, so my expectations were that the output would be very similar to what I’ve seen done elsewhere. For example, when the earthquake hit Japan, a Google Hackathon created a tool that allowed you to look up people without having entire phone lines. You type in the name of a person and it’ll figure out whether that person is doing well, which is an incredibly useful tool. The other one was the Kenya Hackathon, where I was brought in to help talk to this giant NGO about taking the tool that they had asked to be built but hadn’t actually implemented for the drought in the Horn of Africa. Nothing came of that, so my expectations were lower coming in, because I had seen that the bigger the organization, the less likely they are to take advantage of the free thinking and labor that has gone into a problem that they identified.
What kind of data challenges did you face throughout the weekend and what data challenges do you face generally?
TS: The World Bank has some data that is nicely structured, with built-in APIs that are easily accessed. They also have a lot of data that isn’t digitized or in a structured format and that needs to be scraped from text. That's always a challenge.
MM: The big problem for Global Giving and other tech non-profits is data curation. I think the World Bank is typical of many large organizations with a lot of data. They think that the structure of the data is what gives it value. But it’s actually the ability of data to be exchanged that gives it value. All of the cool stuff happening right now is people taking a bunch of data, removing the structure and restructuring it in ways that other people can translate, and this is what I call data curation. I think there is a lack of data curation, and too much emphasis on data structuring within an organization and not across communities. The second largest thing I deal with is data visualization. For example, Global Giving sits on top of a huge amount of information about non-profits. We have web forms, nomination forms, due diligence, project reports. All of these are very similar to the data that the World Bank has. Luckily, they are all in a single database. But just because there is data there doesn’t mean you can tell quickly whether that data is telling you anything. So, I built a little tool that can take any unstructured data and turn it into histograms. This is great because you can quickly look at data and decide if you should start digging down and comparing averages. So that is a useful visual step. It doesn’t require statistics. It just requires a better user interface, which is why I call visualization a challenge.
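For readers curious what "turning unstructured data into histograms" might look like in practice, here is a minimal Python sketch. It is not Marc's tool; the function name `text_histogram`, its parameters, and the sample records are all hypothetical, and it simply counts word frequencies across free-text records and draws text bars.

```python
import re
from collections import Counter


def text_histogram(records, top_n=5, width=20):
    """Count word frequencies across free-text records and render text bars.

    Hypothetical illustration only: real tools would likely filter
    stop words and handle more than plain ASCII words.
    """
    words = []
    for record in records:
        # Lowercase each record and pull out simple alphabetic tokens.
        words.extend(re.findall(r"[a-z']+", record.lower()))

    counts = Counter(words).most_common(top_n)
    if not counts:
        return []

    peak = counts[0][1]  # the highest count sets the longest bar
    lines = []
    for word, n in counts:
        bar = "#" * max(1, round(width * n / peak))
        lines.append(f"{word:<12} {bar} {n}")
    return lines


if __name__ == "__main__":
    # Made-up sample records standing in for project reports.
    sample = ["water project report", "water pump repair", "school report"]
    for line in text_histogram(sample, top_n=3):
        print(line)
```

Even a rough picture like this can suggest whether a dataset is worth digging into before any statistics are applied.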
Data ambassadors wrapping up at DataDive2013. Photo Credit: Carlos Teodoro Linares Carvalho.
What’s your data wish list from the World Bank?
TS: The World Bank keeps a list of debarred firms, but that’s a list of currently debarred firms. Once their debarred period is over, they are taken off this list, and there is no archive. It would be awesome if the World Bank could provide that archival information. So basically a history of every firm that has been debarred. The other thing would be the bidding data. There is data on the project and operation website that shows procurement awards, including who the bidders were and how much their quotes were. The World Bank produces many documents for their projects. I would love to use some form of Natural Language Processing to see if we can develop any early-warning indicators to anticipate fraud and corruption. It would also be great to get data from sites that provide corporate profiles. That would enable us to see if there are any correlations and relationships.
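One way to picture the archive Taimur describes: if someone saved dated copies of the current debarred list, a history could be rebuilt from those snapshots. The Python sketch below is only an illustration under that assumption; the function `build_debarment_history`, the snapshot format, and the firm names are hypothetical and do not reflect any World Bank data source.

```python
def build_debarment_history(snapshots):
    """Rebuild a per-firm history from dated snapshots of a 'current' list.

    snapshots: dict mapping an ISO date string (YYYY-MM-DD) to the set of
    firm names on the list that day. This format is an assumption made
    for illustration.
    """
    history = {}
    # ISO date strings sort chronologically when sorted as text.
    for date in sorted(snapshots):
        for firm in snapshots[date]:
            entry = history.setdefault(
                firm, {"first_seen": date, "last_seen": date}
            )
            entry["last_seen"] = date
    return history


if __name__ == "__main__":
    # Made-up firms and dates.
    snapshots = {
        "2013-01-01": {"Acme Ltd", "Beta Co"},
        "2013-02-01": {"Beta Co"},
    }
    for firm, seen in sorted(build_debarment_history(snapshots).items()):
        print(firm, seen["first_seen"], seen["last_seen"])
```

A firm whose `last_seen` date is earlier than the newest snapshot has dropped off the current list, which is exactly the information that disappears when no archive is kept.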
MM: More accessibility in real terms. They should also be using Google Analytics to look at who is using their data. They should be judging themselves on the data scale by looking at what their analytics say about how much data people use and not by how much data they push out.
What is the one skill you found the most necessary at the DataDive?
TS: Probably patience and the ability to not get dismayed by setbacks. The ability to push through slowly and work towards your goal. If you get discouraged easily, you probably won't have a fun time at the DataDive, or enjoy the challenges.
MM: Curiosity is the most important trait. Skill is not so important. The people I worked with didn’t really have the kind of skills that would solve the problem in one day, but because I did this for myself and I’m trying to push out an open tool, I don’t have a deadline. It’s nice to have a larger community of people who are interested and are helping to think about the problem at different levels. That doesn’t require a skill set. It requires curiosity.
What would you recommend for those who want to get involved in similar events?
TS: It used to be that data analysts and statisticians would work in a dark room, but now everyone wants to get out. There are so many user groups and meet-ups. Washington, D.C., for example, has a huge community that meets to discuss neat topics within the field. So that would be a great entry to get to know people and learn the tools, and just try your hand at some Hackathons and data.
MM: There are four things that every intervention has to do. You have to do at least three out of four for it to have traction: (1) People have to care about the cause, (2) It has to be easy to get involved, (3) They have to feel like there is agency in the system, meaning that by participating something is going to change, and (4) Even if there isn’t agency, a person has to feel like they are being heard or the system has somehow acknowledged that they exist and have worth. There are many different versions of democratizing the process, because there are lots of democracy-like effects, like voting, which has very little agency but still allows people to feel like they have been heard, because they had a chance to speak out. So if you put this type of intervention through that lens, you’ll see that agency is where we have the most work to do to improve. And that means demonstrating to people who came to past DataDives or any kind of Hackathon event that something has changed in the system as a result of the work that they did.