Construction Industry Development Board (CIDB) Database

Scraped Construction Industry Development Board (CIDB) Database
A scraped version of the CIDB database, including projects, companies and beneficial ownership, in CSV and JSON formats. It is intended to make it easier for the public to analyse construction projects in Malaysia.

2000-2015

Second attempt at scraping CIDB contractor data. This set is more complete in terms of projects and companies but, compared with the 2012 set, many contracts or names linked to Politically Exposed Persons (PEPs) seem to have been removed.

JSON Format

2000-2012

First attempt at scraping CIDB contractor data.

CSV Format

This format is useful for those using spreadsheets and other data analysis tools.

Update Date: 6 May 2012

Follow @sinarproject on Twitter or our Google+ page for updates.

  • Projects (9.7MB)
    List of all projects, including company, date and summary details
  • Directors (891KB)
    Company Names and List of Directors
  • Contractor Details (6.9MB)
    Company details, including contact, email and registration numbers

These files are currently compressed in XZ format. Windows users can install the free 7-Zip utility to extract them.
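For those who prefer a script to a desktop tool, the XZ files can also be decompressed with Python's standard library. The following is a minimal sketch; the function name extract_xz and the file names in the usage note are assumptions for illustration, not the actual names of the downloads.

```python
import lzma
import shutil
from pathlib import Path


def extract_xz(src: Path, dest: Path) -> Path:
    """Decompress a single .xz file to dest and return dest."""
    # lzma.open reads the compressed stream; copyfileobj copies it in chunks,
    # so the whole file does not need to fit in memory.
    with lzma.open(src, "rb") as compressed, open(dest, "wb") as out:
        shutil.copyfileobj(compressed, out)
    return dest
```

For example, extract_xz(Path("projects.csv.xz"), Path("projects.csv")) would write the uncompressed CSV next to the download, assuming the file was saved under that (hypothetical) name.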

Tips and Tools for Analysis

Spreadsheets

The spreadsheets can be combined and merged for data analysis using Pivot Tables. You can merge on the reference key (always accurate), or on other fields (columns) such as the director's name. This lets you work out, for example, how many companies list a given person as a director, or the total value of all projects by company name.
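The same two analyses can be sketched outside a spreadsheet. The example below is only an illustration using Python's standard library. The column names (reference_key, director_name, company_name, project_value) are assumptions and will need to be replaced with the actual headers in the CSV files. It also assumes project values are plain numeric strings.

```python
import csv
from collections import defaultdict

# Hypothetical column name for the shared reference key; check the CSV headers.
KEY = "reference_key"


def read_rows(path):
    """Load a CSV file as a list of dicts keyed by column header."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def companies_per_director(director_rows):
    """Count how many distinct companies list each director."""
    companies = defaultdict(set)
    for row in director_rows:
        # Normalise spacing and case, since names are less reliable than keys.
        name = row["director_name"].strip().upper()
        companies[name].add(row[KEY])
    return {name: len(keys) for name, keys in companies.items()}


def project_value_by_company(project_rows, contractor_rows):
    """Join projects to contractors on the reference key and sum the values."""
    names = {row[KEY]: row["company_name"] for row in contractor_rows}
    totals = defaultdict(float)
    for row in project_rows:
        company = names.get(row[KEY], "(unknown)")
        # Empty values are treated as zero.
        totals[company] += float(row["project_value"] or 0)
    return dict(totals)
```

Joining on the reference key avoids mismatches from spelling variations, which is why the director count normalises names before grouping while the project totals rely on the key alone.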

A tutorial on using pivot tables is available here. You can download and share LibreOffice, a free and open source spreadsheet and office suite. We recommend that you share your results in OpenDocument Format (ODF) to keep the data free to access without the need to purchase expensive proprietary software.

Google Fusion Tables

Google Fusion Tables provides a free on-line tool for you to host, visualize and share your data and analysis.

Code

Open source code for the tools used to grab the web page information from CIDB and convert it to JSON data format.

Contributors

  • Sweemeng Ng for the scraper to transform HTML pages into JSON objects
  • Hodor for converting JSON objects into CSV format for use by a wider audience
  • CIDB for maintaining the original database and making it available publicly