Sourcing and extending metadata of Citizen Lab test-lists for dashboards

To develop scalable and sustainable tools such as our upcoming online censorship dashboards, it is important to build on, contribute to and leverage existing efforts such as OONI Probe and the Citizen Lab test-lists.

Citizen Lab test-lists

Citizen Lab test-lists are lists of URLs used for censorship tests. They are published in machine-readable CSV format, split into a global list and per-country lists, with a small number of fields including categories.

They are also used by OONI Probe, from which we source the data for our online censorship dashboards project.

Having enough metadata in these lists to generate the views required by our dashboards will help ensure consistency across countries, and will provide an incentive for maintaining up-to-date, categorised lists.

Example of a partial test-list for Malaysia

| url                                           | category_code | category_description | date_added | source       | notes                                        | 
|-----------------------------------------------|---------------|----------------------|------------|--------------|----------------------------------------------| 
| https://76crimes.com/anti-lgbt-laws-malaysia/ | LGBT          | LGBT                 | 2016-06-10 | OONI         |                                              | 
| http://7rangers.blogspot.com/                 | POLR          | Political Criticism  | 2016-06-10 | OONI         |                                              | 
| http://adb.org                                | ECON          | Economics            | 2014-04-15 | citizenlab   |                                              | 
| https://amanah.org.my/                        | POLR          | Political Criticism  | 2017-10-11 | sinarproject | political party                              | 
| http://anwaribrahimblog.com                   | POLR          | Political Criticism  | 2014-04-15 | citizenlab   |                                              | 
| http://asiafriendfinder.com                   | DATE          | Online Dating        | 2014-04-15 | citizenlab   |                                              | 
| http://www.asiaone.com                        | NEWS          | News Media           | 2017-10-24 | sinarproject |                                              | 
| http://babeinthecitykl.blogspot.com/          | CULTR         | Culture              | 2016-06-10 | OONI         |                                              | 
| https://www.barisannasional.org.my/           | POLR          | Political Criticism  | 2018-04-30 | sinarproject | Barisan Nasional governing coalition website | 
| http://bbs.buysell.net.my                     | FILE          | File-sharing         | 2014-04-15 | citizenlab   |                                              | 
| http://bebasmedia.tripod.com                  | NEWS          | News Media           | 2014-04-15 | citizenlab   |                                              | 
| http://bersih.org                             | POLR          | Political Criticism  | 2014-04-15 | citizenlab   |                                              |
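Because the lists are plain CSV, a dashboard can read them with standard tooling. The following is a minimal sketch, assuming the column names shown in the table above; the helper names `load_test_list` and `count_by_category` are illustrative, and the inline sample stands in for a real list file.

```python
import csv
import io
from collections import Counter

# Three rows copied from the Malaysian example above.
# A real list would be opened from a CSV file instead.
SAMPLE = """url,category_code,category_description,date_added,source,notes
http://bersih.org,POLR,Political Criticism,2014-04-15,citizenlab,
http://www.asiaone.com,NEWS,News Media,2017-10-24,sinarproject,
https://amanah.org.my/,POLR,Political Criticism,2017-10-11,sinarproject,political party
"""


def load_test_list(handle):
    """Return the rows of a test-list CSV as dicts keyed by column name."""
    return list(csv.DictReader(handle))


def count_by_category(rows):
    """Count how many URLs fall under each category code."""
    return Counter(row["category_code"] for row in rows)


rows = load_test_list(io.StringIO(SAMPLE))
counts = count_by_category(rows)
```

Any view the dashboard needs beyond what these columns provide has to come from additional metadata.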

Views

Description

The lists could probably be extended with an additional description column. In the Malaysian list above, Bersih.org is a civil society electoral watchdog, but this cannot be inferred from the test-list.

Without this field, anyone writing a report or maintaining a list needs to either visit each URL to work out what it is, or maintain a separate list of URL descriptions.

Map / Region

Mapping incidents of censorship on a map or by country is not an issue: the OONI API provides the country and the local ASN (ISP network) in its results.

However, there is not enough metadata in the test-lists to determine whether a site is local or not.

For users of our dashboards who report on Press Freedom, we are currently not able to generate a list that separates local and foreign news sources. Here, local refers to the country a site identifies itself with, not where it is hosted.

An additional country code column could be added to the test-lists, allowing the dashboard and researchers to filter for specific local, regional or international sites.
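As a rough sketch of how such a column could be used, the function below splits rows into local, foreign and unknown sites. The column name `country_code` and the function `split_by_country` are assumptions for illustration; no such column exists in the test-lists today, and the country values in the sample rows are illustrative only.

```python
def split_by_country(rows, country, column="country_code"):
    """Split rows into (local, foreign, unknown) relative to `country`.

    `column` is a hypothetical extra field holding the country a site
    identifies itself with, not where it is hosted.
    """
    local, foreign, unknown = [], [], []
    for row in rows:
        code = (row.get(column) or "").strip().upper()
        if not code:
            unknown.append(row)      # no country recorded for this URL
        elif code == country.upper():
            local.append(row)
        else:
            foreign.append(row)
    return local, foreign, unknown


# Illustrative rows; the country_code values are assumptions.
rows = [
    {"url": "http://bersih.org", "category_code": "POLR", "country_code": "MY"},
    {"url": "http://www.asiaone.com", "category_code": "NEWS", "country_code": "SG"},
    {"url": "http://adb.org", "category_code": "ECON", "country_code": ""},
]

local, foreign, unknown = split_by_country(rows, "MY")
```

Keeping an explicit "unknown" group avoids silently treating unlabelled sites as foreign while the column is still being filled in.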

Events

Event metadata such as start-date, end-date and title can help give additional context for a censorship event.

This type of data does not belong in URL test lists.

To provide additional context on a censorship event, this information could possibly be sourced from a Wikidata query.
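Since event data would live outside the test-lists, one option is a small separate record keyed by date range. The sketch below is only an assumption about how that might look: the `Event` class, the `events_on` helper and the sample event are all hypothetical, and the fields mirror the start-date, end-date and title mentioned above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical event record kept separately from the URL test-lists."""
    title: str
    start_date: date
    end_date: date


def events_on(events, day):
    """Return the events whose date range (inclusive) covers `day`."""
    return [e for e in events if e.start_date <= day <= e.end_date]


# Placeholder event with made-up dates, for illustration only.
events = [Event("Hypothetical event", date(2020, 1, 1), date(2020, 1, 15))]

matching = events_on(events, date(2020, 1, 10))
```

A dashboard could then annotate a measurement with any events active on the date it was taken.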

Additional categorisation

Sub-categories

As with country and region metadata, the use case of reporting on or displaying the status of Press Freedom in a country requires additional sub-categories.

Looking at the RSF Press Freedom Index methodology, we can possibly extract a few additional sub-categories for News:

Privately Owned
State Owned
Political Incumbent
Political Opposition
Religious Majority
Religious Minority

In reports for Press Freedom, this is simply reduced to "independent" or "non-independent/mainstream".

These additional sub-categories are not too onerous, however, since they also apply to other human rights reports, such as those on freedom of religion and expression. For example, as category and sub-category pairs:

News Media, Religious Minority
Political Criticism, Political Incumbent
Religion, Religious Majority

Using the same sub-categories, we can auto-generate dashboards and reports that show users whether freedom of expression is restricted for religious minorities, or whether political expression is restricted only for opposition voices.
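To illustrate how such a report could be derived, the sketch below computes the share of blocked URLs per category and sub-category pair. Everything here is assumed for illustration: the `subcategory` field does not exist in the test-lists, the URLs are placeholders, and `blocked_urls` stands in for whatever set of blocked URLs a dashboard derives from measurement results.

```python
from collections import defaultdict


def blocked_share(rows, blocked_urls):
    """Return {(category_code, subcategory): fraction of URLs blocked}."""
    totals = defaultdict(int)
    blocked = defaultdict(int)
    for row in rows:
        key = (row["category_code"], row["subcategory"])
        totals[key] += 1
        if row["url"] in blocked_urls:
            blocked[key] += 1
    return {key: blocked[key] / totals[key] for key in totals}


# Placeholder rows with a hypothetical "subcategory" field.
rows = [
    {"url": "http://party-a.example", "category_code": "POLR", "subcategory": "Political Incumbent"},
    {"url": "http://party-b.example", "category_code": "POLR", "subcategory": "Political Opposition"},
    {"url": "http://party-c.example", "category_code": "POLR", "subcategory": "Political Opposition"},
    {"url": "http://news-a.example", "category_code": "NEWS", "subcategory": "State Owned"},
]

# Placeholder set of URLs found blocked in measurements.
blocked_urls = {"http://party-b.example", "http://party-c.example"}

shares = blocked_share(rows, blocked_urls)
```

In this made-up data, only opposition sites are blocked, which is the kind of pattern a report on political expression would want to surface.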

This needs additional testing against use cases and actual reports, but initial research indicates that a small set of sub-categories would allow the test-lists to be used to generate meaningful insights for users on the state of censorship across different international measures of freedom of expression.