Sourcing and extending metadata of Citizen Lab test-lists for dashboards

To develop scalable and sustainable tools such as our upcoming online censorship dashboards, it is important to build on, contribute to and leverage existing efforts such as OONI Probe and the Citizen Lab test-lists.

Citizen Lab test-lists

Citizen Lab test-lists are lists of URLs used for censorship tests. They are published in machine-readable CSV format, split into a global list and per-country lists, with a few fields including categories.

They are also used by OONI Probe, from which we source the data for our online censorship dashboards project.

Having enough metadata in these lists to generate the views our dashboards require will help ensure consistency across countries, and will provide an incentive for maintaining up-to-date, categorised lists.

Example of a partial test-list for Malaysia

| url                                           | category_code | category_description | date_added | source       | notes                                        | 
|-----------------------------------------------|---------------|----------------------|------------|--------------|----------------------------------------------| 
| https://76crimes.com/anti-lgbt-laws-malaysia/ | LGBT          | LGBT                 | 2016-06-10 | OONI         |                                              | 
| http://7rangers.blogspot.com/                 | POLR          | Political Criticism  | 2016-06-10 | OONI         |                                              | 
| http://adb.org                                | ECON          | Economics            | 2014-04-15 | citizenlab   |                                              | 
| https://amanah.org.my/                        | POLR          | Political Criticism  | 2017-10-11 | sinarproject | political party                              | 
| http://anwaribrahimblog.com                   | POLR          | Political Criticism  | 2014-04-15 | citizenlab   |                                              | 
| http://asiafriendfinder.com                   | DATE          | Online Dating        | 2014-04-15 | citizenlab   |                                              | 
| http://www.asiaone.com                        | NEWS          | News Media           | 2017-10-24 | sinarproject |                                              | 
| http://babeinthecitykl.blogspot.com/          | CULTR         | Culture              | 2016-06-10 | OONI         |                                              | 
| https://www.barisannasional.org.my/           | POLR          | Political Criticism  | 2018-04-30 | sinarproject | Barisan Nasional governing coalition website | 
| http://bbs.buysell.net.my                     | FILE          | File-sharing         | 2014-04-15 | citizenlab   |                                              | 
| http://bebasmedia.tripod.com                  | NEWS          | News Media           | 2014-04-15 | citizenlab   |                                              | 
| http://bersih.org                             | POLR          | Political Criticism  | 2014-04-15 | citizenlab   |                                              | 
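
Because the lists are plain CSV, a dashboard can read them with standard tooling. The following is a minimal sketch in Python that parses a few rows copied from the partial list above and counts URLs per category. The rows are inlined as a string for illustration; in practice the CSV would be read from the test-lists files, and the function name is our own.

```python
import csv
import io
from collections import Counter

# A few rows copied from the partial Malaysian list above.
SAMPLE_CSV = """url,category_code,category_description,date_added,source,notes
https://76crimes.com/anti-lgbt-laws-malaysia/,LGBT,LGBT,2016-06-10,OONI,
http://7rangers.blogspot.com/,POLR,Political Criticism,2016-06-10,OONI,
http://adb.org,ECON,Economics,2014-04-15,citizenlab,
https://amanah.org.my/,POLR,Political Criticism,2017-10-11,sinarproject,political party
http://www.asiaone.com,NEWS,News Media,2017-10-24,sinarproject,
http://bersih.org,POLR,Political Criticism,2014-04-15,citizenlab,
"""


def load_test_list(text):
    """Parse test-list CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_test_list(SAMPLE_CSV)

# Count how many URLs fall under each category code.
urls_per_category = Counter(row["category_code"] for row in rows)
print(urls_per_category.most_common())
```

Anything a dashboard view needs beyond these six columns has to come from somewhere else, which is the gap the sections below discuss.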

Views

Description

The lists could be extended with an additional description column. In the Malaysian list above, Bersih.org is a civil society electoral watchdog, but we cannot infer this from the test-lists.

Without this field, anyone writing a report or maintaining the lists will need to visit each site to work out what it is, or maintain a separate list of descriptions for the URLs.

Map / Region

Southeast Asia

Plotting censorship incidents on a map or by country is not an issue: the OONI API includes the country and local ASN (ISP network) in its results.

However, the test-lists do not carry enough metadata to determine whether a site is local or not.

For users of our dashboards reporting on Press Freedom, we are currently unable to generate a list that separates local from foreign news sources. Here "local" refers to the country a site identifies itself with, not where it is hosted.

An additional country code column could be added to the test-lists, allowing the dashboard and researchers to filter for specific local, regional or international sites.
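
As a rough sketch of how such a column might be used, the Python below separates local from foreign news sources. The `country_code` column name, the use of two-letter country codes, and the values assigned to each URL are assumptions made for illustration; none of them exist in the current test-lists.

```python
import csv
import io

# Hypothetical extension: a `country_code` column naming the country a site
# identifies itself with (not where it is hosted). Values are illustrative.
SAMPLE_CSV = """url,category_code,country_code
http://www.asiaone.com,NEWS,SG
http://bebasmedia.tripod.com,NEWS,MY
http://adb.org,ECON,
http://bersih.org,POLR,MY
"""


def split_local_foreign(rows, category, local_country):
    """Split the URLs of one category into (local, foreign) lists.

    Rows with an empty country code are treated as non-local here.
    """
    local, foreign = [], []
    for row in rows:
        if row["category_code"] != category:
            continue
        if row["country_code"] == local_country:
            local.append(row["url"])
        else:
            foreign.append(row["url"])
    return local, foreign


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
local_news, foreign_news = split_local_foreign(rows, "NEWS", "MY")
print("local:", local_news)
print("foreign:", foreign_news)
```

Regional views would need one more step, such as a lookup from country code to region, which is left out of this sketch.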

Events

Event metadata such as start date, end date and title can help give additional context for a censorship event.

This type of data does not belong in URL test lists.

To provide additional context on a censorship event, this information could possibly be sourced from a Wikidata query.

Additional categorisation

Sub-categories

As with country and region metadata, the use case of reporting on or displaying the status of Press Freedom in a country requires additional sub-categories.

Looking at the RSF Press Freedom Index methodology, we can possibly extract a few additional sub-categories for News:

  • Privately Owned
  • State Owned
  • Political Incumbent
  • Political Opposition
  • Religious Majority
  • Religious Minority

In Press Freedom reports, these are often simply reduced to "independent" or "non-independent/mainstream".

These additional sub-categories are not too onerous, however, because they also apply to other human rights reports, such as those on freedom of religion and expression. For example, paired with existing categories:

  • News Media, Religious Minority
  • Political Criticism, Political Incumbent
  • Religion, Religious Majority

Using the same sub-categories, we can auto-generate dashboards and reports that show users whether freedom of expression is restricted for religious minorities, or whether political expression is restricted only for opposition voices.

This needs additional testing against use cases and actual reports, but initial research indicates that a small set of sub-categories would allow the test-lists to be used to generate meaningful insights for users on the state of censorship across different international measures of freedom of expression.
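
To make the idea concrete, here is a small Python sketch of the kind of summary a dashboard could derive from sub-categories. Everything in it is hypothetical: the `sub_category` field, the placeholder URLs and the set of blocked URLs are invented for illustration and do not come from real test-lists or measurements.

```python
from collections import defaultdict

# Hypothetical test-list rows with an added `sub_category` field.
# The URLs are placeholders, not real list entries.
TEST_LIST = [
    {"url": "http://incumbent-party.example", "category_code": "POLR",
     "sub_category": "Political Incumbent"},
    {"url": "http://opposition-party.example", "category_code": "POLR",
     "sub_category": "Political Opposition"},
    {"url": "http://opposition-blog.example", "category_code": "POLR",
     "sub_category": "Political Opposition"},
    {"url": "http://minority-news.example", "category_code": "NEWS",
     "sub_category": "Religious Minority"},
]

# Hypothetical set of URLs that measurements flagged as blocked.
BLOCKED = {"http://opposition-party.example", "http://opposition-blog.example"}


def blocked_by_sub_category(rows, blocked, category):
    """Return {sub_category: (blocked_count, total_count)} for one category."""
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        if row["category_code"] != category:
            continue
        tally = counts[row["sub_category"]]
        tally[1] += 1
        if row["url"] in blocked:
            tally[0] += 1
    return {sub: tuple(tally) for sub, tally in counts.items()}


summary = blocked_by_sub_category(TEST_LIST, BLOCKED, "POLR")
for sub, (n_blocked, total) in summary.items():
    print(f"{sub}: {n_blocked}/{total} blocked")
```

With real data, a gap between the incumbent and opposition rows would be the signal that political expression is restricted mainly for opposition voices.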

Tags

For certain reports or dashboards, such as the Access Now STOP incident report use case, there is a need for arbitrary grouping of sites that cuts across categories. Elections are one example: which sites are critical to keep online during elections?

Beyond dashboards, tagging test-lists makes them useful for auto-generating lists of URLs for OONI Run. It also encourages a wider audience to contribute to keeping country test-lists up to date.

For the upcoming Cambodian elections, a collaborative monitoring effort could open the test-list CSV in Google Sheets and filter by an elections tag to generate a list of URLs for OONI Run. The resulting, more up-to-date test-list for Cambodia could then be contributed back to the upstream test-lists.
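
The same filtering step could be scripted. The Python sketch below assumes a `tags` column holding semicolon-separated values; that column name and separator are our assumptions, since the upstream format has not been decided, and the URLs are placeholders.

```python
import csv
import io

# Hypothetical `tags` column with semicolon-separated tags.
# The URLs are placeholders, not real list entries.
SAMPLE_CSV = """url,category_code,tags
http://election-watch.example,POLR,elections
http://daily-news.example,NEWS,elections;press-freedom
http://file-host.example,FILE,
http://party-site.example,POLR,Elections
"""


def urls_with_tag(rows, tag):
    """Return URLs whose `tags` field contains `tag` (case-insensitive)."""
    wanted = tag.strip().lower()
    selected = []
    for row in rows:
        tags = {t.strip().lower() for t in row["tags"].split(";") if t.strip()}
        if wanted in tags:
            selected.append(row["url"])
    return selected


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
election_urls = urls_with_tag(rows, "elections")

# One URL per line, ready to paste into whatever tool builds the OONI Run list.
print("\n".join(election_urls))
```

Matching is case-insensitive so that tags entered by different contributors, such as "Elections" and "elections", end up in the same group.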

Future custom reports or dashboards can use the test-lists, categories and tags to generate reports for a country or across countries, which would be in sync with the OONI Run test-list URLs used by country censorship monitoring teams.

Comments so far on the issue filed upstream are positive about implementing tags.