Cleaning The Data Shit

For the past month, I have been struggling with data. All my previous experience has been in scientific research, where the data generated was easy to handle. But my research background was not enough for handling this.

On top of that, we didn’t have SPSS or Stata. All the data had to be cleaned and analyzed in Excel. Although I knew a bit of Excel, I didn’t know all of it. People kept saying that there is nothing that can’t be analyzed using Excel and that it is the best. So, I browsed YouTube for relevant videos and courses that would help me convert numbers and text into visual graphs. I figured out how to sort my data but was unable to use the pivot table function. Instead, I had found an ineffective way of calculating frequencies, which involved sorting the data and manually noting the counts into another Excel file. It was very tedious, and I kept hoping I would find a simpler way. I demonstrated my way of analyzing data to people, but nobody could suggest a better one.
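For anyone stuck counting frequencies by hand like I was: the same tally can be done in a few lines of Python with the standard library's `collections.Counter`. This is just a sketch with made-up village names, not my actual survey data:

```python
from collections import Counter

# Hypothetical survey column: one value per record
villages = ["Rampur", "Sitapur", "Rampur", "Khedi", "Rampur", "Sitapur"]

# Counter tallies each distinct value in one pass -- no manual sorting needed
counts = Counter(villages)
for village, n in counts.most_common():
    print(village, n)
```

The same idea scales to thousands of records, which is exactly where the sort-and-note-down method breaks down.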

One good thing that came out of seeking help is that I realized some of my data had to be analyzed using another piece of software called WHO Anthro. This software would not only tell me which children in our study were malnourished based on their weight-for-age, weight-for-height, height-for-age and BMI, but also plot a graph indicating the shift of our population from the standard WHO curve. Another caveat: apparently this software accepts data only in dbf format. I tried to find software that could convert my Excel sheets into dbf format, but all the free (cracked) versions could convert only a few cases, and I needed all the records imported so that the software could plot the graph. Finally, I found a person who had SPSS, was able to convert the files to dbf, and imported my data into WHO Anthro. The data was (partly) analyzed, and I could tell which children in our survey were severely malnourished and needed continuous monitoring. Only a part of the job was done, but I was glad I managed to analyze a little before I left for home.

I went home for two weeks after passing on all the relevant files, hoping the data would get magically analyzed. But alas, when I was back, the data was still at the same stage, staring right back at me. My friends told me that people had started criticizing: Why is she not in the field? It doesn’t take this long to analyze data. Is she even capable of analyzing it? I made up my mind to dodge these criticisms and focus on getting things done.

On top of all this, I had a four-day software training by Akvo, an organization that has developed an app for paper-free surveys. I wanted to skip it because of all the negativity; I knew the analysis would take long and I would have to work extra hours if I attended the training. In spite of that, I was keen to learn. I decided I would work after hours and attended the training.

The software training turned out to be amazing. Using this technology, the entries filled in the app are transmitted to a dashboard, which allows the supervisor to access the data almost immediately. In addition to delivering data at the click of a button, the dashboard allows quick analysis, providing comprehensive statistics and graphs. What this essentially means is that there is no need for data entry operators. After the Akvo experience, I wished we had had the software a few months earlier. I approached our trainer, Joy, to ask whether I could transfer my Excel sheets into their software. He said it was possible. A ray of hope. I then asked him about the process. He said I would first have to recreate the questionnaire in the software, which meant a lot of time. All my hopes were crushed.

One of my chief reasons for writing this blog is to help those who are in a similar data mess as I was. So, like I mentioned, I was attending training in the morning and analyzing data in the evening. On the last day of the training, Joy was giving us information on some of the tools that can be used for creating maps, and he ran a 45-minute workshop to help us design them. With CartoDB, almost anybody can develop maps if you have the geo-locations. You can also design interactive maps with this tool that show how the scenario changes over time.
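To give a sense of the input a mapping tool like CartoDB works from: a plain CSV with latitude and longitude columns is enough to get points on a map. Here is a minimal sketch using Python's standard `csv` module; the site names, coordinates, and file name are invented for illustration:

```python
import csv

# Hypothetical survey sites: (name, latitude, longitude)
rows = [
    ("Site A", 19.0760, 72.8777),
    ("Site B", 18.5204, 73.8567),
]

# Write a CSV with a header row; mapping tools pick up
# the latitude/longitude columns to place the points
with open("sites.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "latitude", "longitude"])
    writer.writerows(rows)
```

Once you have such a file, uploading it is usually all it takes to see your survey locations plotted.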

While I was learning this, I made up my mind to meet Joy after the session and ask about other tools that could help me analyze my data quicker. When I said I was cleaning and analyzing data using Excel, he laughed and said it was 19th century; archaic. He then suggested I use Google Open Refine and Google Fusion Tables for quick and easy data analysis.

That night I checked out these two tools. I browsed Google Fusion Tables a little but did not explore it much. What it essentially does is let the user select variables in the cleaned data and generate graphs. (Supposedly Google Fusion Tables is faster than Excel; I haven’t explored it because of the time crunch, but I will definitely try it out at a later point.) Google Open Refine is extremely easy and user-friendly for cleaning huge data sets, and I highly recommend it to those who do not have much experience cleaning data at that scale. Once I had cleaned the data, I finally figured out the pivot table function. So, now I am comfortable and hope to finish analyzing my data soon.
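For the curious, what a pivot table computes can be sketched in plain Python: group the records by one variable, then aggregate another. A minimal example with invented numbers (not my survey data), showing a count and an average per group:

```python
from collections import defaultdict

# Hypothetical records: (village, child_weight_kg)
records = [
    ("Rampur", 11.2), ("Rampur", 9.8),
    ("Sitapur", 10.5), ("Khedi", 12.0), ("Sitapur", 9.9),
]

# Group weights by village -- the "rows" of the pivot table
groups = defaultdict(list)
for village, weight in records:
    groups[village].append(weight)

# Aggregate each group -- the "values" of the pivot table
for village, weights in sorted(groups.items()):
    print(village, len(weights), round(sum(weights) / len(weights), 2))
```

Excel's pivot table does exactly this grouping and aggregating for you, which is why it beats sorting and counting by hand.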

After this session, I was glad that I had not missed the workshop because of the stress I was under. Sometimes in life, we miss out on so many things while handling our stresses, not realizing that those very things might be a blessing in disguise. As the saying goes, everything happens for a reason, and for the good.

