
Construction Industry Development Board (CIDB) Database

A scraped copy of the CIDB database in CSV and JSON formats, making it easier for the public to analyse data on construction projects in Malaysia.

CIDB Local Contractor Database in common formats

CSV Format

This format is useful for people analysing data in spreadsheets and other tools.

Update Date: 6 May 2012

Follow @sinarproject on Twitter or on Google+ for updates.

  • Projects (9.7MB)
    List of all projects, including company, date and summary details
  • Directors (891KB)
    Company names and list of directors
  • Contractor Details (6.9MB)
    Company details, including contact details, email addresses and registration numbers

These files are currently compressed in XZ format. Windows users can install the free 7-Zip utility to extract them.
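If the `xz` command-line tool is installed, `xz -dk projects.csv.xz` decompresses a file and keeps the original. For those who prefer a script, the following minimal sketch does the same with Python's standard `lzma` module. The file name `projects.csv.xz` is only an example; the actual download names may differ.

```python
import lzma
import shutil
from pathlib import Path


def extract_xz(archive: Path) -> Path:
    """Decompress an .xz file next to itself, dropping the .xz suffix."""
    if archive.suffix != ".xz":
        raise ValueError(f"expected an .xz file, got {archive}")
    # e.g. projects.csv.xz -> projects.csv (hypothetical file name)
    target = archive.with_suffix("")
    # lzma.open streams the data, so a large file is never fully held in memory.
    with lzma.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Usage: `extract_xz(Path("projects.csv.xz"))` writes `projects.csv` in the same folder and returns its path.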

The reference key is the ID provided by the CIDB website; you can (and should) double-check records against the original source on the CIDB website. This data is imported as is from CIDB. We may provide a platform in future for crowd-sourced clean-up of bad data from CIDB.
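Because the data is imported as is, it can be worth checking the reference keys before merging files. A minimal sketch, assuming each CSV file has a header row and that the reference key column is named `id` (a hypothetical name; use whatever header the file actually has), flags blank and duplicated keys so those rows can be checked against the CIDB website. In the Directors file a company appears once per director, so repeated keys there are expected; the check is most useful for a file with presumably one row per company, such as Contractor Details.

```python
import csv
from collections import Counter
from typing import Iterable


def check_reference_keys(rows: Iterable[dict], key: str = "id") -> dict:
    """Count rows and report blank or duplicated reference keys."""
    counts: Counter = Counter()
    blank = 0
    for row in rows:
        # Missing columns and whitespace-only cells both count as blank.
        value = (row.get(key) or "").strip()
        if value:
            counts[value] += 1
        else:
            blank += 1
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    return {"rows": blank + sum(counts.values()), "blank": blank, "duplicates": duplicates}


def check_file(path: str, key: str = "id") -> dict:
    """Run check_reference_keys on a CSV file (UTF-8 encoding is an assumption)."""
    # newline="" is what the csv module expects when reading files.
    with open(path, newline="", encoding="utf-8") as f:
        return check_reference_keys(csv.DictReader(f), key)
```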

Tips and Tools for Analysis

Spreadsheets

The spreadsheets can be combined and merged for data analysis using pivot tables. You can merge on the reference key (always accurate), and also on other fields (columns) such as the director's name. This lets you answer questions such as how many companies a person is listed as a director of, or the total value of all projects by company name.
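The same two questions can also be answered in a script. The sketch below works on rows as read by `csv.DictReader` and assumes hypothetical column names: `id` (reference key) and `director` in the Directors file, `id` and `value` in the Projects file, and `id` and `name` in Contractor Details. Adjust these to the real headers. Director names are only normalised for case and spacing, so spelling variants of the same person will still be counted separately.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable

# Hypothetical column names: adjust to the headers in the actual CSV files.


def normalise_name(name: str) -> str:
    """Upper-case a name and collapse repeated spaces so trivial variants match."""
    return " ".join(name.split()).upper()


def companies_per_director(directors: Iterable[dict]) -> dict[str, int]:
    """Count the distinct companies (reference keys) each director is listed under."""
    companies: dict[str, set[str]] = defaultdict(set)
    for row in directors:
        companies[normalise_name(row["director"])].add(row["id"])
    return {name: len(ids) for name, ids in companies.items()}


def project_value_by_company(
    projects: Iterable[dict], contractors: Iterable[dict]
) -> list[tuple[str, str, float]]:
    """Total project value per reference key, joined to company names, largest first."""
    # Index contractor details by reference key so the join is a dict lookup.
    names = {row["id"]: row.get("name", "") for row in contractors}
    totals: dict[str, float] = defaultdict(float)
    for row in projects:
        try:
            key = row["id"]
            # Strip thousands separators such as "1,250,000.00" before converting.
            value = float(row["value"].replace(",", ""))
        except (KeyError, ValueError, AttributeError):
            continue  # skip rows with a missing or malformed key or value
        totals[key] += value
    rows = [(key, names.get(key, ""), total) for key, total in totals.items()]
    return sorted(rows, key=lambda r: r[2], reverse=True)
```

Joining on the reference key rather than on the company name avoids merging two different companies that happen to share a similar name.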

A tutorial on using pivot tables is available here. You can download and share LibreOffice, a free and open source office suite that includes a spreadsheet application. We recommend sharing your results in OpenDocument Format (ODF), so the data stays free to access without the need to purchase expensive proprietary software.

Google Fusion Tables

Google Fusion Tables provides a free online tool for you to host, visualize and share your data and analysis.

Code

Open source code for the tools used to scrape web page information from the CIDB website and convert it to JSON format.
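This is not the project's actual scraper. As a rough illustration of the HTML-to-JSON step, the sketch below uses Python's standard `html.parser` module and assumes, hypothetically, that each contractor page lists its details as two-cell table rows (a label cell followed by a value cell). The real CIDB page layout may differ, and fetching the pages is left out.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def _flush_cell(self) -> None:
        if self._cell is not None and self._row is not None:
            # Collapse whitespace and newlines left over from the HTML layout.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _flush_row(self) -> None:
        self._flush_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self) -> None:
        super().close()
        self._flush_row()  # in case the page ends without closing its last row


def page_to_json(html: str, record_id: str) -> str:
    """Turn label/value table rows into one JSON object keyed by the CIDB id."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    record = {"id": record_id}
    for row in parser.rows:
        # Only two-cell rows are treated as "label: value" pairs; others are skipped.
        if len(row) == 2 and row[0]:
            record[row[0].rstrip(":").strip()] = row[1]
    return json.dumps(record, ensure_ascii=False)
```

Keeping the CIDB id in every record is what later allows the CSV files to be merged on the reference key.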

Credits

  • Sweemeng Ng for the scraper that transforms HTML pages into JSON objects
  • Hodor for converting JSON objects into CSV format for use by a wider audience
  • CIDB for maintaining the original database and making it available publicly