Working with Sinar's Popit API with Pandas

Working with Popolo data and exporting them as CSV

This post is also available as downloadable jupyter python notebook

In this post, we'll demonstrate the simplicity of working with Sinar's Popit API with Pandas, which is an open source BSD-liscenced library for data analysis. We will also demonstrate how to easily convert API response objects into flattened dataframes, which can be then converted to more convenient and familiar formats such as spreadsheets for collaborators to work with without programming knowledge.

First, let's make an API request to Popit for all memberships under the DAP organisation.

DAP_id = '53631a3619ee29270d8a9e90'
r = requests.get('http://api.popit.sinarproject.org/en/organizations/'+ DAP_id).json()['result']
DAP_memberships = r['memberships']

API response objects are commonly returned in the form of a nested dictionary object, which may be difficult to work with. Nested objects are commonly met with nested for-loops, which are messy and tedious to read through, while attempting to flatten nested dictionaries would require writing a function to recursively traverse through the dictionary tree to extract all its keys.

One of the fast and efficient data structures provided by the Pandas library is the DataFrame. DataFrames are tabular data structures similar to SQL tables or spreadsheets, and come with a variety of convenient method for data munging and analysis. We can use Pandas's from_dict method to turn our dictionary object into a Pandas dataframe, with the keys of the response payload as the columns for our new dataframe.

df = pd.DataFrame.from_dict(DAP_memberships, orient = 'columns')
df.head()

Notice that some of the dataframe columns contain dictionary values, due to the nested nature of the API response. Let's try to expand the contents of the “person” column. Let's first extract the contents of the “person” column and convert it into a new DataFrame using the same from_dict method that we introduced earlier.

personsdf = pd.DataFrame.from_dict(df['person'].to_dict(), orient = 'index')
personsdf.head()

Finally we'll join the Persons dataframe with the original, creating a flat representation of our original API response!

#add 'person' suffix to column names of personsdf
personsdf.columns = ['person_'+colName for colName in personsdf.columns]
df =  df.join(personsdf, how = 'outer')

We'll now be able to utilise Pandas's convenient methods to work with our flattened data. For instance, groupby methods, or filtering data with Boolean vectors; things which we would originally have to do with for loops.

df.csv(“dewan_rakyat_memberships.csv”)

Our flattened dataframe can also be converted into a spreadsheet and posted online as a more accessible medium to crowdsource contributions from the public in helping to manually clean or fill in missing data, which we will then be able to sync back to our Popit Database.

About the Author

Kay Wong

Research Intern, @wkayn

Kay is a research intern at Sinar. She is currently pursuing her Bachelors in Computer Science with a specialisation in Data Science. She is passionate about Machine Learning and data analysis, and is currently learning more about digital rights and open source development working at Sinar. In her free time, she’s usually working on her NLP project, learning new technologies, reading, and thinking about meta research.

popit pandas open data jupyter howto research note interns