Data standards: What are they and why do they matter? The Introduction

With great pleasure, we’re introducing an analysis of what data standards are and what they mean in practice. You can read the report on the transparencee.org website, where we have divided it into two parts: The Introduction and The Analysis. This report is part of our efforts to formulate regional standards for data and data-related processes (law included).

Why is data so important? Let’s listen to Anna Kuliberda:

Data helps us better understand the reality we function in, informs fact-based policies and, when well analyzed, it allows us to see patterns, irregularities and intersections we would never think of.

That’s why most tech-for-transparency projects begin with data: we want to be better informed – so first of all we need the data. You’ve probably heard someone say “we will open these datasets” many times. This phrase hides all the complexity of a rather tricky process:

  1. data gathering (from published resources or FOI law demands)
  2. data extraction (e.g. extracting text from PDFs or scanned documents)
  3. cleaning (e.g. Y, y, yes, “+” all stand for an affirmative answer)
  4. structuring (in the simplest scenario – sorting data into columns).
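
As a rough illustration of the cleaning and structuring steps above, here is a minimal Python sketch. The raw rows, the column names (`tender`, `published`) and the set of accepted affirmative spellings are assumptions made up for this example; they do not come from any real dataset.

```python
# Hypothetical raw rows, as they might look after extraction from a PDF.
RAW_ROWS = [
    "Tender A; Y",
    "Tender B; yes",
    "Tender C; +",
    "Tender D; n",
]

# Assumed spellings of an affirmative answer (taken from the list above).
AFFIRMATIVE = {"y", "yes", "+"}


def clean_answer(value):
    """Return True for any known spelling of 'yes', False otherwise."""
    return value.strip().lower() in AFFIRMATIVE


def structure(rows):
    """Split each raw line into named columns."""
    records = []
    for row in rows:
        name, answer = row.split(";")
        records.append({"tender": name.strip(), "published": clean_answer(answer)})
    return records


records = structure(RAW_ROWS)
for record in records:
    print(record)
```

Note that this sketch treats every unrecognized value as a negative answer. A real cleaning step would probably want to flag unknown or empty values separately instead of silently converting them.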

In the end, we get open data that can be analyzed further to reach some conclusions or published for others to work on. And all of that is great! Well… that is until you want to cooperate with others working on similar datasets.

Here at TransparenCEE we’re building a common knowledge base, which we hope you’ll find useful and expand with your own experiences.

It’s important for civil society…

Let’s say that, while working on public procurement projects, you’ve created clear visualizations and somebody else has created a tool to flag suspicious tenders. You want to import your data into their solution and vice versa. However, if your datasets were opened independently, they probably have different formats and collaboration is not that simple. Your solutions don’t “speak” the same language. That problem grows bigger in international collaborations, where social and legal contexts differ.

Data standards are the answer to this problem. By supporting and using them, you ensure that your work will be easier for others to reuse. It’s like creating a piece that fits into a bigger puzzle. It may slightly raise costs during data opening, but it significantly brings down the costs of integrating with other solutions that “speak” the same standard. In many cases, you can instantly use other solutions on your dataset.
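
To make the idea a bit more concrete, here is a minimal Python sketch of two independently opened procurement datasets being mapped onto one shared set of field names. Everything in it is hypothetical: the shared names `buyer` and `amount`, the source-specific field names and the sample values are invented for illustration and are not taken from any real standard.

```python
# Two hypothetical datasets describing the same kind of tender,
# opened independently and therefore using different field names.
dataset_a = [{"Purchaser": "City of X", "Value (EUR)": "12000"}]
dataset_b = [{"buyer_name": "Town of Y", "price": 8500}]

# Assumed mappings from each source's fields to the shared names.
MAPPING_A = {"Purchaser": "buyer", "Value (EUR)": "amount"}
MAPPING_B = {"buyer_name": "buyer", "price": "amount"}


def to_common(record, mapping):
    """Rename fields to the shared names and store amounts as floats."""
    common = {mapping[key]: value for key, value in record.items()}
    common["amount"] = float(common["amount"])
    return common


combined = [to_common(r, MAPPING_A) for r in dataset_a]
combined += [to_common(r, MAPPING_B) for r in dataset_b]

# One tool can now process both sources, e.g. by summing contract values.
total = sum(record["amount"] for record in combined)
print(total)
```

Without a shared standard, every pair of projects has to write and maintain mappings like these. When both datasets are published in the same standard from the start, this step largely disappears.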

In our series of analyses about data standards, we will recommend the best ones for a given topic. Imagine that you want to analyse procurement data in your country or municipality. We say: “Well, if you open data in this standard, you will then be able to easily deploy these open source tools to analyse it.”

It’s important for donors…

If you are a donor representative, specifying standards in the grants you give will bring more value for your money and allow for better integration with other initiatives in the field.

It’s important for IT experts too!

And if you’re not an expert in the field (such as public procurement) but an IT development lead, our research will list existing tools for reference, highlight challenges in opening data, explain how each standard’s model maps to the real world in several existing deployments, put you in contact with people who have previously worked with a given standard and its tools, and, of course, provide you with links to the specifications.

Please read the full report below: “Data standards: What are they and why do they matter? The Analysis”.

Image from page 592 of “Wright’s book of poultry, revised and edited in accordance with the latest poultry club standards” (1911), published on Flickr; the image is in the public domain.