Global Data Barometer: Do the data capabilities of governments correlate with the availability of data?

A brief article comparing governments' data capabilities (open data initiatives, data institutions, digital government policies and the training of civil servants in data skills) with the availability of data. GDB data on regions and selected countries was used to investigate the relationship between these indicators and to identify the capability areas that need work to improve data availability.

Key Findings

  • At the global level and between regions, governments with robust data capabilities, including open data initiatives, data institutions, digital government policies, and civil servants trained in data skills, tend to have higher availability of data.
  • Countries whose data availability closely tracks their data capabilities should focus on enhancing specific areas of data capabilities, such as data skills, digital government efforts, or internet access.
  • Some countries exhibit low data availability despite possessing high data capabilities, e.g. UAE, Malaysia and Ireland. These countries should evaluate their existing data initiatives and ensure that datasets are made accessible in accordance with GDB standards.
  • Future GDB surveys should consider tailoring their approach to countries that demonstrate high scores in data availability despite having lower capabilities, e.g. US, Brazil and New Zealand. This is most likely because their data is available but hosted on decentralized platforms.

Introduction

About Global Data Barometer

The Global Data Barometer (GDB) is the result of the efforts of over 100 researchers and a network of regional research hubs around the world. The design of the GDB builds on the previous editions of the Open Data Barometer, but takes a broader look at data sharing and use for the public good, including giving additional attention to issues of privacy and inclusion. 

The Barometer is a multi-dimensional and multi-layered study that assessed the state of data for public good in 109 countries. An expert survey was conducted from May 2019 to May 2021 to create a new global benchmark covering data governance, capability, availability, and the use and impact of data for public good. The full GDB report and datasets are available on the GDB website.

Researchers in Asia were engaged in collaboration with the Data for Development AsiaHub to provide a new benchmark and the essential data needed to drive a fuller understanding of the state of data for development, open data implementation, and data justice in Asia.  

What is measured on capabilities and availabilities in the GDB

In the report, the Capabilities pillar measures four primary indicators, namely training of civil servants in data literacy and skills, availability of open government data initiatives in the country, government support for data re-use and capacity of sub-national governments to manage data.

The GDB assessed the availability of 16 datasets based on their potential to address key issues such as climate change, public health, political integrity and land rights. The Availability pillar measures whether the following datasets are available as structured open data:

  • Endangered species and ecosystems
  • Emissions information
  • Climate vulnerability 
  • Company beneficial ownership information
  • Company information
  • Civil registration and vital statistics
  • Real-time capacity of healthcare system
  • COVID-19 vaccination information
  • Land use information
  • Detailed and structured land tenure information
  • Government budget and spending information
  • Interest and asset declaration information
  • Lobby register information
  • Political finance information 
  • Detailed right to information (RTI) performance information
  • Public procurement processes

State of Capabilities


gdb_cap.png

Chart: Capabilities scores across countries according to the Global Data Barometer

Global

Pillar score: 42

According to the GDB report, Capabilities is the pillar with the greatest variation between governments: it has the widest range of scores, from the highest (Estonia, 91.2) to the lowest (Haiti, 11.8). This variation seems to be consistent with the global digital divide, including in digital literacy and access to digital technologies.

Estonia scores more than 90 in most of the Capabilities modules, so it is no wonder that it is often described as one of the most digitally advanced countries.

 

Between regions

Capabilities scores also vary widely between regions: the group comprising the European Union, United Kingdom, North America, Israel, Australia and New Zealand achieved an average module score of 70, while Africa scored 32.

 

Region | Average Score
Africa | 32
Eastern Europe and Central Asia | 45
European Union, United Kingdom, North America, Israel, Australia and New Zealand | 70
Latin America and the Caribbean | 44
Middle East and North Africa | 46
South and East Asia | 53

State of Availability


gdb_ava.png

Chart: Availability scores across countries according to the Global Data Barometer

 

Global

Pillar score: 30

Based on the GDB assessment, the pandemic has shown that most countries have the capacity to make health datasets available: 98.2% of countries have data on COVID-19 infection and mortality, and 84.4% have data on COVID-19 testing. Many countries also have budget and spending information (96.3%) and public procurement data (91.7%).

However, far fewer countries have data on lobbying (16.5%) and RTI performance (38.5%).

 

Comparison between regions

No region scored above 50 on Availability, and the variation between regions does not seem to be as high as for Capabilities.

Region | Average Score
Africa | 11
Eastern Europe and Central Asia | 29
European Union, United Kingdom, North America, Israel, Australia and New Zealand | 50
Latin America and the Caribbean | 29
Middle East and North Africa | 14
South and East Asia | 32



Comparing Capabilities with Availability

gdb_cap_vs_ava.png

Chart: Capabilities vs Availability scores for the 109 countries assessed in the GDB.



Global 

 

Pearson correlation: 0.80

Using the GDB scores for the Capabilities and Availability pillars, the two pillars appear to be highly correlated, with a Pearson correlation coefficient of 0.80. This means that a government's capability in the use of data is strongly and positively associated with its ability to make core datasets available.
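As an illustration of how such a coefficient is obtained, the following is a minimal sketch in plain Python. It assumes the standard Pearson formula and uses only the ten country scores listed in the "expected results" table of this article, so it will not reproduce the global figure of 0.80, which requires the scores of all 109 countries. The function name `pearson` is an illustrative choice, not part of any GDB tooling.

```python
from math import sqrt

# Capabilities and Availability scores of the ten countries in the
# "expected results" table of this article (a sample, not the full GDB data).
capabilities = [54, 13, 28, 21, 59, 31, 69, 28, 69, 28]
availability = [41, 5, 18, 11, 44, 20, 54, 17, 52, 18]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(capabilities, availability)
print(f"Pearson r = {r:.2f}, r squared = {r * r:.2f}")
```

Because these ten countries were picked for lying close to the regression line, the sample coefficient is much higher than the global one. For the global coefficient of 0.80, r squared is 0.64, so roughly 64% of the variance in one pillar is linearly associated with the other.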

 

Between regions

The relationship also broadly holds by region, although the size of the gap between the two pillars varies. In every region the Availability score is lower than the Capabilities score, and in Africa and in the Middle East and North Africa it is less than half (11 against 27, and 14 against 38).

Region | Availability | Capabilities
Africa | 11 | 27
Eastern Europe and Central Asia | 29 | 37
European Union, United Kingdom, North America, Israel, Australia and New Zealand | 50 | 61
Latin America and the Caribbean | 29 | 37
Middle East and North Africa | 14 | 38
South and East Asia | 32 | 47
World | 30 | 42

 

Countries with expected results: strong correlation between the two pillars

The following table shows the 10 countries that sit closest to the regression line between the two pillars, i.e. those with the lowest absolute residual.

Country | Availability | Capabilities | Details
Bulgaria | 41 | 54 | Bulgaria has had an open data initiative since 2014 and makes high use of standards in statistics, although sub-national capabilities are limited. This is reflected in its Availability score, with health, public finance, political integrity and public procurement datasets available.
Mozambique | 5 | 13 | Mozambique scores 0 on open data initiative but has a relatively high score for digitalisation of government services (52). However, it has low availability except in the health module.
Trinidad and Tobago | 18 | 28 | Has low scores on open data initiative but above average scores for digital government services (61.2) and political freedoms and civil liberties (82.0). Despite this, it has low availability of datasets except in the health, public finance and procurement modules.
Togo | 11 | 21 | Has low internet access and an average score for government digital services, and has low availability scores except for procurement data.
Sweden | 44 | 59 | Has high capabilities in various areas, particularly open data initiatives (80) and data institutions (100). In return, it has good scores on data availability in all modules except land, procurement and political integrity.
Bolivia (Plurinational State of) | 20 | 31 | Has below average scores in most Capabilities indicators, except for political freedoms and civil liberties. However, it has above average scores on data availability for land and procurement.
Germany | 54 | 69 | Has high scores in Capabilities indicators, except civil service and political integrity interoperability. In return, it has above average scores on data availability except for company information.
Bahamas | 17 | 28 | Has very high scores on certain indicators such as internet access and political freedoms and civil liberties, but low scores on open data initiative, civil service and digital skills. It has some availability in all modules, although scores are low except for health, procurement and public finance.
Finland | 52 | 69 | Has generally high scores on all Capabilities indicators, except sub-national capabilities. In return, it also has high Availability scores in all modules.
Bahrain | 18 | 28 | Has high scores in internet access, government online services and digital skills despite a below average overall score. In general, it has above average availability scores for company information, health and procurement.
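The residual-based selection used for this table, and for the two tables of anomalous countries later in the article, can be sketched as follows. This is a minimal illustration under stated assumptions: it fits an ordinary least squares line with Availability as the dependent variable (the article does not state which fitting method or direction was used), and it uses only eight countries taken from the tables in this article, so the fitted line and the resulting residuals differ from those of the full 109-country analysis. The names `fit_line`, `scores` and `residuals` are illustrative.

```python
# (country, Capabilities, Availability) taken from the tables in this article.
scores = [
    ("Bulgaria", 54, 41),
    ("Sweden", 59, 44),
    ("Germany", 69, 54),
    ("Togo", 21, 11),
    ("United States of America", 64, 80),
    ("Armenia", 28, 49),
    ("United Arab Emirates", 58, 18),
    ("Malaysia", 69, 24),
]


def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line([(cap, ava) for _, cap, ava in scores])

# Residual = observed Availability minus Availability predicted from Capabilities.
residuals = {
    country: ava - (intercept + slope * cap) for country, cap, ava in scores
}

# Smallest absolute residual: closest to the line ("expected" result).
closest = min(residuals, key=lambda c: abs(residuals[c]))
# Largest positive residual: more data available than capabilities predict.
most_positive = max(residuals, key=residuals.get)
# Largest negative residual: less data available than capabilities predict.
most_negative = min(residuals, key=residuals.get)

for country, res in sorted(residuals.items(), key=lambda kv: kv[1]):
    print(f"{country:26s} {res:+6.1f}")
```

Sorting by the signed residual, as in the final loop, gives both anomaly lists at once: the bottom of the ranking corresponds to high capabilities with low availability, and the top to high availability with low capabilities.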

 

In general, it was interesting to see that many of these countries had high scores on political freedoms and civil liberties including the lower-scored countries such as Bahamas, Bolivia, Trinidad and Tobago. 

Since these are the countries that most closely follow the linear model, we calculated the residuals of each Capabilities indicator with respect to the Availability pillar score. Only certain indicators were found to contribute to the scores.

Low residual score, thus contributing factor:

  • Business use of digital tools
  • Data institutions
  • Digital government
  • Internet access
  • Knowledge-intensive employment
  • Digital skills
  • Political freedoms and civil liberties
  • Use of standards and methods in statistics offices

Mixed or high residual score, thus likely not contributing factor:

  • Open data initiative
  • Civil service
  • Government online services
  • Government support for re-use
  • Human capital
  • Sub-national capabilities
  • Political integrity interoperability

This was regardless of indicator weighting: the highest weighted indicators were Civil service, Government support for re-use, Open data initiative and Sub-national capabilities.

Countries with anomalous results: weak correlation between the two pillars

High availability, low capabilities

The following table shows 10 countries that run counter to the hypothesis: they have high Availability scores despite relatively low Capabilities scores. They were selected by their highest residual values, i.e. the furthest positive distance from the regression line.

In general, these countries seem to have a decentralized statistical or information system, and the datasets assessed were found on different platforms. This may have contributed to their generally below-average scores for open data initiatives (except for Chile, Italy and New Zealand), as well as their below-average scores for sub-national capabilities, which suggests that the data was released by various government agencies at the national level. Hence it may be fairer to assess their sub-national capabilities by their participation in preparing national-level data.

Additionally, they have low scores in Government support for re-use, except for the US. This may mean that even though their datasets are highly available, there is little evidence that they are re-used. At the same time, they also have low scores for civil service training, which could be a contributing factor in the lack of government support for data re-use.

Moreover, despite the fact that most of these countries have political integrity, public finance and procurement datasets available, the datasets have low interoperability, meaning that they do not have common identifiers that facilitate mapping across the system. However, this may not be specific to this group of countries as only 27 countries scored more than 0 out of the 109 countries assessed. 

Country | Availability | Capabilities | Details
Armenia | 49 | 28 | Armenia has no explicit open data initiative. However, the datasets shown to be available seem to be on decentralized platforms.
United States of America | 80 | 64 | The US has generally high scores in Capabilities indicators except civil service, data institutions and sub-national capabilities. It has high availability of the datasets assessed, except for company information. These datasets are on decentralized platforms.
Brazil | 62 | 49 | Brazil has above average scores in Capabilities indicators, except civil service, digital skills and government support for re-use. For availability, it generally scored above average in all modules.
Chile | 59 | 50 | Chile has high scores in digital government, government online services, open data initiative and use of standards in statistics offices. However, it has low scores in civil service and government support for re-use. For availability, it generally scored above average, except for land data.
New Zealand | 70 | 62 | New Zealand has generally high scores in all Capabilities indicators, except civil service and government support for re-use. In return, it has above average scores in Availability modules.
Georgia | 46 | 40 | Georgia has average scores across Capabilities indicators, yet scored above average in most Availability modules.
Croatia | 47 | 43 | Croatia has average scores across Capabilities indicators, although it scored below average for civil service and government support for re-use. Despite this, it scored above average in Availability modules except in climate action (6).
Latvia | 54 | 50 | Latvia achieved average scores across Capabilities indicators, but scored below average for civil service and government support for re-use. However, it generally scored well in Availability modules except for land data (12).
Mexico | 51 | 47 | Mexico has high scores in a few Capabilities indicators such as data institutions (100) and digital government (83), but low scores in civil service and government support for re-use. However, it scored above average in most Availability modules.
Italy | 56 | 54 | Italy scored well in Capabilities indicators, particularly open data initiative (60), data institutions (100) and use of standards in statistics (100). Consequently, it has above average Availability scores, except for land data.

High capabilities, low availability

The following table shows 10 countries that run counter to the hypothesis in the opposite direction: they have low Availability scores despite their high Capabilities scores. They were selected by their lowest residual values, i.e. the furthest negative distance from the regression line.

In general, these countries have high scores in digital government, digital skills, government online services and open data initiatives. To raise their below-average Availability scores, however, they may need to improve civil servant training, political freedoms and civil liberties (except Tunisia and Ghana), the use of standards and methods in statistics offices, and sub-national capabilities (except Malaysia and UAE).

In particular, they may need to thoroughly review their current open data priorities to make datasets available according to GDB standards.

Country | Availability | Capabilities | Details
United Arab Emirates | 18 | 58 | UAE has high scores in digital government, digital skills, sub-national capabilities and open data initiative. Despite this, it has below average scores across Availability modules.
Tunisia | 7 | 38 | Tunisia has average scores in most Capabilities indicators, but above average scores in political freedoms and civil liberties. Nevertheless, it scored below average across Availability modules, including the political datasets.
Sri Lanka | 8 | 35 | Sri Lanka has high scores in digital skills (54), political freedoms and civil liberties (56) and digital government (77), but average scores in other Capabilities indicators. However, it did poorly in the data availability assessments, with no data for climate action, company information, land and procurement.
Saudi Arabia | 20 | 49 | Saudi Arabia has high scores in a few Capabilities indicators, including open data initiative (80), digital government (93) and digital skills (72), but poor scores in political freedoms and civil liberties (7) and data institutions (0). On the other hand, it has some availability in public finance data.
Rwanda | 11 | 39 | Rwanda has below average scores in Capabilities, with low internet access but an above average score in digital government.
Qatar | 14 | 45 | Qatar has generally average scores for Capabilities indicators, although it achieved some high scores in data institutions and digital government. Despite this, it scored below average in Availability, with no datasets found for climate action and procurement.
Malaysia | 24 | 69 | Malaysia has a high Capabilities score as a result of its high scores in digital government, data institutions and sub-national capabilities. However, this does not translate into high availability of datasets.
Ghana | 15 | 43 | Like Tunisia, Ghana has average scores in Capabilities indicators but scored high in political freedoms and civil liberties. However, it did poorly in Availability scores, except for public finance (61).
Jordan | 8 | 34 | Jordan has above average scores in open data initiative (80) and digital skills (65). However, for the Availability modules, it does not have any data for land and political integrity.
Côte d'Ivoire | 5 | 41 | Côte d'Ivoire has average scores across the Capabilities indicators, although it scored excellently in government support for re-use (100). However, it scored poorly for digital government.


Conclusions and recommendations

At the global and regional levels, the hypothesis that a government's data capabilities are correlated with its ability to make core datasets available seems to hold. In general, the countries that already follow this relationship closely may want to improve their data skills, digital government or internet access, depending on their current state of capabilities.

The countries that showed anomalous results may need to be looked at individually to identify the reasons for the gaps. For the countries that showed high Availability scores despite lower-than-expected Capabilities scores, the reason may be a decentralized statistical system. Future GDB studies may be refined to cater to these countries, particularly by considering the participation of sub-national institutions in the preparation of national-level data. These countries may also want to make better use of the datasets already available by promoting re-use.

On the other hand, the countries that showed the opposite, that is low Availability despite high Capabilities, will need to review their current data initiatives or priorities so as to make datasets available according to GDB standards.

About Sinar Project

Sinar Project is a civic tech initiative using open technology, open data and policy analysis to systematically make important information public and more accessible. It aims to improve governance and encourage greater citizen involvement in the public affairs of the nation by making Parliament and Government more open, transparent and accountable.