Data standard for company registers – OpenCorporates

OpenCorporates is a project that aims to gather standardised data about every company in the world. Its database currently holds data on 110,723,978 companies drawn from 118 company registers worldwide.

OpenCorporates has an interesting operating model. It is registered as a company (founded by people with a strong track record in the open data and transparency community) and it has a business model – which brings sustainability to the project. At the same time, the project is faithful to the transparency community’s principles of being open and sharing one’s work. In practice, this means that some advanced features of the website are available “free of charge to anyone legitimately working on a project that will contribute back to the open-data community [under share-alike terms], and for a fee [under our non-share-alike terms] for use in commercial or proprietary applications” (rephrased, source).

What kind of data is covered?

The OpenCorporates data specification covers detailed company information:

  • Basic data, such as name, address or company type;
  • Company employees and directors;
  • Mandatory statutory filings, e.g. change of address, issuance of shares, annual accounts;
  • Licenses obtained;
  • Information on the company’s network: subsidiary companies, branches, etc.;
  • Information on who controls the company through share ownership, voting rights, the right to appoint directors or members, rights to share surplus assets, etc.

See the glossary and API reference for details.

Which tools can process OpenCorporates data?

While I assume that data from the OpenCorporates database is widely used by advocates, I wasn’t able to find any open civic tech products built on top of it (apart from OpenCorporates’ own tools). One interesting tool of theirs is the corporate network visualization map:

Map of the global Goldman Sachs corporate network, assembled and visualized by OpenCorporates

Bulk data & APIs

OpenCorporates offers bulk data downloads in CSV and XLS for each company register (example). It also offers a versioned API, including search, that returns data in both JSON and XML.
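To give a feel for working with the JSON side of the API, here is a minimal Python sketch. It is an illustration under assumptions, not a reference client: the base URL and version (`v0.4`), the `companies/search` path, the `q` and `jurisdiction_code` parameters, and the `results` → `companies` → `company` nesting are what I understand the API to look like, so check them against the API reference before relying on them. The helper names and the sample payload are made up for the example, and no network call is made.

```python
from urllib.parse import urlencode

# Assumed base URL and API version; confirm against the current API reference.
API_BASE = "https://api.opencorporates.com/v0.4"


def build_search_url(query, jurisdiction_code=None):
    """Return a company-search URL for a free-text query (hypothetical helper)."""
    params = {"q": query}
    if jurisdiction_code:
        params["jurisdiction_code"] = jurisdiction_code
    return f"{API_BASE}/companies/search?{urlencode(params)}"


def extract_companies(payload):
    """Flatten the assumed results -> companies -> company nesting into tuples."""
    rows = []
    for item in payload.get("results", {}).get("companies", []):
        company = item.get("company", {})
        rows.append(
            (
                company.get("jurisdiction_code"),
                company.get("company_number"),
                company.get("name"),
            )
        )
    return rows


# Hand-written sample that mimics the assumed response shape (not real data).
SAMPLE = {
    "results": {
        "companies": [
            {
                "company": {
                    "name": "EXAMPLE HOLDINGS LIMITED",
                    "company_number": "00000000",
                    "jurisdiction_code": "gb",
                }
            }
        ]
    }
}

print(build_search_url("example holdings", "gb"))
for row in extract_companies(SAMPLE):
    print(row)
```

The pair of jurisdiction code and company number is what I would keep as the key when joining this data with other datasets, since names alone are rarely unique.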

What’s unique among the civic tech projects that I have seen is that OpenCorporates offers an OpenRefine reconciliation service API. “OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data” – source. A reconciliation service API means that you can easily connect the OpenCorporates database to OpenRefine and use it while cleaning company data: OpenRefine will suggest the best matches for the companies you are processing, allowing you to annotate messy company names with their unique OpenCorporates identifiers.
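OpenRefine handles the reconciliation requests for you, but it helps to see roughly what goes over the wire. The sketch below follows the generic Reconciliation Service API convention: a batch of queries keyed `q0`, `q1`, ... sent as a form-encoded `queries` parameter, and candidates returned with `id`, `name`, `score` and `match` fields. The endpoint URL, the helper names and the sample response are my assumptions for illustration, and the sample is not real data.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; confirm against the OpenCorporates documentation.
RECONCILE_URL = "https://opencorporates.com/reconcile"


def build_queries(names, limit=3):
    """Build the batch query object: one entry per name, keyed q0, q1, ..."""
    return {
        f"q{i}": {"query": name, "limit": limit} for i, name in enumerate(names)
    }


def encode_queries(queries):
    """Form-encode the batch as the `queries` parameter of a POST body."""
    return urlencode({"queries": json.dumps(queries)})


def best_matches(names, response):
    """Map each input name to the id of its top-scoring candidate, or None."""
    matches = {}
    for i, name in enumerate(names):
        candidates = response.get(f"q{i}", {}).get("result", [])
        top = max(candidates, key=lambda c: c.get("score", 0), default=None)
        matches[name] = top["id"] if top else None
    return matches


names = ["Example Holdings Ltd.", "Nonexistent Co"]

# Hand-written sample in the generic reconciliation response shape (not real data).
SAMPLE_RESPONSE = {
    "q0": {
        "result": [
            {"id": "/companies/us_de/0000000", "name": "EXAMPLE HOLDINGS LLC",
             "score": 42.5, "match": False},
            {"id": "/companies/gb/00000000", "name": "EXAMPLE HOLDINGS LIMITED",
             "score": 71.0, "match": False},
        ]
    },
    "q1": {"result": []},
}

body = encode_queries(build_queries(names))  # would be POSTed to RECONCILE_URL
print(best_matches(names, SAMPLE_RESPONSE))
```

Picking the top score automatically is a simplification. In OpenRefine you would normally review the candidates and confirm matches by hand, which is the safer choice for messy company names.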

Data standard disclaimer

Up to now, I have been mentioning data specifications which, as most people would agree, fall into the open standards bucket. Here, however, we have a slightly different situation. OpenCorporates has documented its data schemas and API reference, but it does not plan to open the specification for public collaboration. Because of this, many people don’t consider it a data standard. Neither do I, but I still believe that it’s the best available specification for reusing company data. It was developed with global reach in mind, so it’s not tied to any particular country. Practically speaking: if your government were to release company data in a structured manner, would it be better for it to use OpenCorporates’ spec or to create its own? Thus, I recommend opening data by reusing OpenCorporates’ data specification (and optionally its API) until we create better solutions in terms of governance.