transparencee

Technology for Transparency in CEE and Eurasia.
Learn & Connect
  • analysis

    Tech Tools for Transparency and Accountability – the CEE Region Landscape

    You can download this analysis in PDF: Technology for Transparency in CEE incl Best Practices.

    You can use this publication under the Creative Commons BY-NC-SA 4.0 license. The data from this research can be found and searched on the project website: www.transparencee.org.

    In the era of information abundance, IT tools and processes are necessary. We use them to filter and emphasize relevant information, simplify and visualize it so that everyone can understand it.

    Initiatives in a Spotlight

    Public institutions’ unwillingness to cooperate, low levels of transparency and corruption cases form part of a stereotypical image of the CEE region, and there is an element of truth to it. This has inspired many CSOs, many of them of watchdog origin, to independently roll out initiatives that raise the level of transparency and hold officials accountable for their actions. The large share of these initiatives that relies on modern technologies is the focus of the TransparenCEE Network and of this report.

    Active CSOs have managed to develop tools that challenge the public sector or present it with best practices in the transparency field. There is also some work on transparency done by the public sector itself, at both the national and local levels. We intentionally leave public initiatives (mostly data portals and participation-based sites) out of our focus, with the exception of those that were done with and by local CSOs or are exceptional feature-wise.

    The scope of the analyzed tools is quite close to what is understood as civic tech. The Knight Foundation report described #OpenGov as “projects enabling top-down change through the promotion of government transparency, accessibility of government data and services and promotion of civic involvement in democratic process”. The difference is that the change here is achieved through a bottom-up approach by CSOs, with little or no government collaboration.

    I. Solutions by Areas

    Most of the existing solutions seem to grow from watchdog-like activities: monitoring parliaments and local councils, courts, procurement, budgets, media and promises made by politicians. There are also some projects developed to strengthen civic engagement and participation (e-petitions, civic reporting, etc.), usually run by global rather than local players.

    1.  Parliament Monitoring

    One of the first tech-for-transparency tools developed dealt with parliament monitoring. It presented how bills were processed: from an idea, through commissions, consultations and hearings, all the way to the vote. The tool allowed people to check details about every MP, including their speeches. Some initiatives extend the scope of statement monitoring and also collect data from social and traditional media.

    National-level tools can also be implemented locally to monitor representatives and bills at that level.

    Solutions in this area include ParlData, which monitors a dozen CEE parliaments. Polish MyCountry and I have the right to know, Slovak Datanest, Slovenian Legislative Monitor and Lithuanian Mano Seimas are other good examples.

    2.  Court Monitoring

    Court monitoring focuses on court rulings, with some metrics and aggregations built around them. A database of accessible, copyable and structured rulings forms its foundation. What is the workload in different courts, how effectively do they work, what are their budgets, who is being sued the most? These and other metrics make it possible to compare courts and rulings and to look for patterns.

    Solutions include previously mentioned MyCountry and Legislative Monitor, as well as Lithuanian Open Courts.

    3.  Budgets

    Budgets at the city level, not to mention the state level, are at a scale that is not intuitive to citizens. Budget visualizations and other means of presenting financial data are an attempt to make them less abstract, which is not an easy task. Basic visualizations use the OpenSpending engine or similar designs. Quite recently, tax calculators have become popular, bringing budgets down to a human scale: type in how much you earn and we will show you where your taxes go. You can see this in the Belarusian Cost of the state, Polish Transparent City or MyCountry portals. There are also budget games that allow users to modify existing budgets and propose their own version. When used by a representative number of people, such a game is an advocacy tool in its own right.
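
    The tax-calculator idea can be sketched in a few lines of Python. This is only an illustration: the budget shares and the flat tax rate below are invented, not taken from any of the portals named above, and a real tool would load them from published budget data.

```python
# Hypothetical budget shares (fractions of total spending); illustrative only.
BUDGET_SHARES = {
    "education": 0.30,
    "transport": 0.25,
    "health": 0.20,
    "administration": 0.15,
    "culture": 0.10,
}


def where_taxes_go(annual_income, tax_rate, shares=BUDGET_SHARES):
    """Split the tax paid on an income across budget categories."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("budget shares must sum to 1")
    tax_paid = annual_income * tax_rate
    return {name: round(tax_paid * share, 2) for name, share in shares.items()}


# Assumed inputs: an income of 40,000 and a flat 20% tax rate.
breakdown = where_taxes_go(annual_income=40_000, tax_rate=0.20)
for name, amount in breakdown.items():
    print(f"{name:<15}{amount:>10.2f}")
```

    A flat rate keeps the sketch short; progressive rates or local tax shares would replace the single `tax_rate` parameter.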

    Interesting examples of budget visualizations come from cities. In nicely styled infographics they present the costs of a city per item: how much does it cost to plant a tree, maintain it, or build 100 meters of asphalt road, bike lane or pavement? This strikes a balance between data aggregated in categories and specific contracts, allowing citizens to understand differences in costs.

    Solutions include Belarusian Cost of the state, Polish Transparent City and Our Money, and Serbian Vodite Racuna among others.

    4.  Procurement

    How public money is spent has always been the focus of watchdog organizations. Because of the high volume of data, tech tools are necessary to process it, and solutions vary in features. The base is always to publish procurements with advanced filtering; the variations lie in navigation between them and in simple aggregations (top contractors, top agencies). Some solutions go deeper and link money to specific people rather than companies (making subsidiaries more transparent). Some (like Slovak Open Contracts) use crowdsourcing to assess tenders, while others (like Hungarian Red Flags) process them automatically using indicators.

    Data availability differs, with EU countries obliged to publish every tender above the 30k EUR limit. Data formats differ too, and most contracts are published either as HTML or as PDFs. Ukraine has a chance to be the first country with standardized procurement data available through an API once the Prozorro platform is officially transferred to the state.
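
    The “simple aggregations” mentioned above (top contractors, top agencies) amount to grouping contract values by one field and sorting. A minimal sketch follows; the records and field names are hypothetical and do not reflect the data model of any tool named here.

```python
from collections import defaultdict

# Hypothetical contract records; field names are assumptions for illustration.
CONTRACTS = [
    {"agency": "City Hall", "contractor": "Alfa Ltd", "value_eur": 120_000},
    {"agency": "City Hall", "contractor": "Beta SA", "value_eur": 45_500},
    {"agency": "Road Authority", "contractor": "Alfa Ltd", "value_eur": 300_000},
    {"agency": "Road Authority", "contractor": "Gamma sp.", "value_eur": 80_000},
]


def top_by(contracts, key, n=3):
    """Sum contract values per `key` and return the n largest totals."""
    totals = defaultdict(float)
    for contract in contracts:
        totals[contract[key]] += contract["value_eur"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


print(top_by(CONTRACTS, "contractor"))
print(top_by(CONTRACTS, "agency"))
```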

    5.  Anti-corruption

    Anti-corruption activities focus on three areas: the procurement monitoring mentioned above, monitoring of political parties’ income, and building databases of corruption cases. Corruption case databases, not mentioned before, are searchable websites consisting of cases collected either through media monitoring or through anonymous reporting. Reports can be made on the website or through dedicated mobile apps.

    6.  Truth-o-meter – Fact Checking

    Truth-o-meters, known to the US public through the Politifact.com website, are quite common in the CEE region. There are at least two major tools being scaled to different countries: Demagog, operating in the Visegrad region (Poland, Slovakia, Czech Republic and Hungary), and Istinomjer, operating in the Balkans (Serbia, Bosnia & Herzegovina, Montenegro, Macedonia and recently Croatia).

    Both tools are quite simple and offer similar functionality: they monitor the truthfulness of statements by public figures (politicians, etc.), saying whether each is true, false, a manipulation or unverifiable. Apart from checking what has recently been said, a user can also point to a new statement and ask for an evaluation. The period of highest activity on these sites coincides with elections: all the promises made by politicians and political parties are stored and then evaluated throughout the electoral term.

    The power of this tool comes from the community of verifiers who do the hard work. The search feature makes sure that no registered statement is forgotten.

    Demagog - truthfulness of Law & Justice

    7.  Smart Voting

    Smart voting is a theme for initiatives that describe the candidates we vote for. All of these initiatives allow voters to compare their political views with those declared by candidates. Some of them evaluate candidates after elections, once they become MPs. One can then compare MPs’ views with how they actually voted.
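
    The comparison at the heart of such tools can be sketched as a simple agreement score. The questionnaire, answers and scoring rule below are assumptions made for illustration; the tools named in this section may weight questions or score answers differently.

```python
def agreement(voter, candidate):
    """Share of commonly answered questions on which both gave the same answer."""
    shared = [question for question in voter if question in candidate]
    if not shared:
        return 0.0
    matches = sum(1 for question in shared if voter[question] == candidate[question])
    return matches / len(shared)


# Hypothetical answers to a four-question survey.
voter = {"q1": "yes", "q2": "no", "q3": "yes", "q4": "no"}
candidates = {
    "Candidate A": {"q1": "yes", "q2": "no", "q3": "no", "q4": "no"},
    "Candidate B": {"q1": "no", "q2": "yes", "q3": "yes"},
}

# Rank candidates from the closest match to the most distant one.
ranking = sorted(
    ((agreement(voter, answers), name) for name, answers in candidates.items()),
    reverse=True,
)
for score, name in ranking:
    print(f"{name}: {score:.0%}")
```

    The same function can compare a candidate's declared answers with their later voting record, once votes are mapped onto the same questions.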

    Solutions include Bosnian Glasometar, Polish Latarnik Wyborczy and I have the right to know, Lithuanian Mano Seimas, Czech and Slovak Election Calculator.

    II. Technical side: Data processing

    Data is the key to most of the aforementioned initiatives, and the process of acquiring data and transforming it into structured formats is often the most expensive element of developing a tool and then maintaining it. A lot depends on the data formats, which is why advocacy for better data quality is inseparable from IT work.

    1.  Open Data

    The most widely adopted scheme for classifying data quality and openness is Sir Tim Berners-Lee’s 5-star deployment scheme for Open Data. While experts and data recipients should always be consulted on formats, the scale provides clear guidance on how to enhance quality and gives an overview of how much work remains to be done.

    Star One is publishing data under an open license in any format.

    Star Two is publishing data in a structured format which is relevant to the type of data. If it is text, publish it in a copyable text document. If it is a table, publish it in an Excel spreadsheet.

    Star Three is publishing data in non-proprietary formats. While Excel ’97 spreadsheets are proprietary, the newer Excel 2007 format is non-proprietary. Simplicity is also important: CSV files are preferred to Excel files, which can be augmented with styles, formulas, etc.

    Star Four is using URLs to denote things with unique identifiers, ideally accessible on the internet at those URLs, e.g. state.gov/foia-requests/34.

    Star Five is linking data to other data using URLs. For example, when publishing an answer to a FOIA request, be sure to include a link to the request itself.
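
    Because each star builds on the previous one, a dataset's rating is the number of criteria it meets in order before the first one it fails. The sketch below encodes that reading of the list above; the `Dataset` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Yes/no answers to the five criteria, in star order (hypothetical names)."""
    open_license: bool
    structured: bool
    non_proprietary: bool
    uses_urls_as_ids: bool
    links_to_other_data: bool


def star_rating(dataset):
    """Count criteria met in order, stopping at the first one that fails."""
    criteria = [
        dataset.open_license,
        dataset.structured,
        dataset.non_proprietary,
        dataset.uses_urls_as_ids,
        dataset.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# An openly licensed Excel '97 file: structured, but in a proprietary format.
xls = Dataset(True, True, False, False, False)
# The same table published as CSV.
csv_file = Dataset(True, True, True, False, False)
print(star_rating(xls), star_rating(csv_file))
```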

    Not covered by the 5-star definition, but crucial for the interoperability and reusability of IT tools, are data standards. Let’s say we are building a procurement monitoring platform. If data is published even at the five-star level but is not standardised, one has to process every data feed separately, multiplying the cost. If relevant standards are introduced (Open Procurement in this case), connecting to other data feeds (other institutions, countries) can be technically as simple as inserting a feed URL. Data standards need to match at least the three-star level, and it is best if they match all five stars.

    2.  APIs

    Application programming interfaces (APIs) serve data in a standardized manner, in a form more advanced than a simple file download. APIs make it possible to filter data, retrieve only portions of the content, perform aggregations and also trigger actions that may modify the data.
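
    To make the difference from a plain file download concrete, here is a toy stand-in for the read side of an API: one function over in-memory records that filters, returns only selected fields, pages through results, and a second that aggregates. It does not model any real service; the records, function names and parameters are all invented for this sketch.

```python
# Hypothetical spending records standing in for a server-side dataset.
RECORDS = [
    {"id": 1, "city": "Gdansk", "year": 2015, "amount": 500},
    {"id": 2, "city": "Gdansk", "year": 2016, "amount": 700},
    {"id": 3, "city": "Warsaw", "year": 2015, "amount": 900},
    {"id": 4, "city": "Warsaw", "year": 2016, "amount": 1100},
]


def query(records, filters=None, fields=None, offset=0, limit=None):
    """Filter by exact field values, then return one page of selected fields."""
    filters = filters or {}
    matched = [
        record for record in records
        if all(record.get(key) == value for key, value in filters.items())
    ]
    end = None if limit is None else offset + limit
    page = matched[offset:end]
    if fields:
        page = [{name: record[name] for name in fields} for record in page]
    return page


def total(records, field, filters=None):
    """Aggregate: sum one numeric field over the filtered records."""
    return sum(record[field] for record in query(records, filters))


print(query(RECORDS, {"city": "Gdansk"}, fields=["id", "amount"]))
print(total(RECORDS, "amount", {"year": 2015}))
```

    In a real API the same parameters would typically arrive as query-string arguments of an HTTP request, and the client would receive only the matching slice rather than the whole file.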

    3.  Scraping

    What to do if data is publicly available, but its format does not allow easy reuse? You are forced to write expensive data scrapers. Scraping, as defined in Wikipedia, “focuses […] on the transformation of unstructured data on the web, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet.” Inputs for scraping can vary from HTML sites to PDF documents or scanned documents. In every case the cost of creating scrapers is high, and their maintenance is costly too because the underlying data may change.
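
    As an illustration of the simplest case, the sketch below turns an HTML table into a list of records using only Python's standard `html.parser` module. The HTML snippet is invented; real pages are rarely this clean, which is exactly where the cost comes from.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Invented sample page fragment.
HTML = """
<table>
  <tr><th>Contractor</th><th>Value (EUR)</th></tr>
  <tr><td>Alfa Ltd</td><td>120 000</td></tr>
  <tr><td>Beta SA</td><td>45 500</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(HTML)
header, *body = scraper.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

    Note that the values are still strings such as "120 000"; converting them to numbers is a separate cleaning step, and any change to the page layout can silently break the scraper.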

    An extreme form of scraping is manually rewriting data into a structured format, which is possible if a dataset is not big and necessary if the data is really “dirty”.

    Dirty data means a lot of mistakes, switched columns, etc. The most common case is probably address fields: some include postal codes, some do not; street names can be written in several variants; it is hard to distinguish a building number from an apartment number. Enumerated values tend to fall outside any reasonable dictionary or use a lot of variants (yes, no, y, y., t, maybe, havent seen, -). There can be IDs that point to nothing and numbers that do not add up.

    As the public sector still does not serve data in standard formats and through APIs, scraping is in most cases the only way to develop tools based on public information. That is why the first step in developing independent T&A tools is to go from scraping to publishing structured data, so that others do not have to repeat this costly step. The Polish MyCountry platform is outstanding in this field, publishing over 100 processed public datasets through an API, as is the ParlData project, which publishes public data (MPs’ profiles, debates, votes) on more than 10 CEE parliaments in the Popolo standard through an API.
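
    Cleaning enumerated values usually means mapping every known variant to a canonical value and setting aside whatever cannot be mapped. A minimal sketch, assuming that "t" stands for "true" and that unmapped values should go to manual review rather than be guessed:

```python
# Known spellings of each canonical value; treating "t" as "yes" is an assumption.
YES_VARIANTS = {"yes", "y", "y.", "t", "true", "1"}
NO_VARIANTS = {"no", "n", "n.", "f", "false", "0"}


def normalize_flag(raw):
    """Return True, False, or None when the value cannot be interpreted."""
    value = (raw or "").strip().lower()
    if value in YES_VARIANTS:
        return True
    if value in NO_VARIANTS:
        return False
    return None


column = ["yes", "no", "y", "y.", "t", "maybe", "havent seen", "-"]
cleaned = [normalize_flag(value) for value in column]
needs_review = [value for value in column if normalize_flag(value) is None]
print(cleaned)
print(needs_review)
```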

    4.  Crowdsourcing

    When there is no data, when it is too dirty to be processed fully automatically, or when a lot of information needs to be interpreted, one can rely on data crowdsourcing. This approach has been successfully implemented in a number of scientific projects, such as transcribing old ships’ weather logs or hunting for habitable planets. It is also being introduced in the transparency and accountability field. Warsaw Ninja crowdsources public transport delays from passengers, often providing more timely information than the official news from transport agencies. Civic data mining engages users to process answers to FOIA requests into machine-readable formats. Slovak Open Contracts asks users who feel knowledgeable in the field to fill in a survey about the contract being viewed, thus providing rough metrics on contracts (are they complete, coherent, overpriced, etc.).

    Reliability of crowdsourced information is a crucial issue. Proper statistical methods need to be introduced to filter the input gathered from multiple sources. Quite often crowdsourcing goes hand in hand with expert review: crowdsourcing acts as a data filter, and a few of the most interesting examples are brought up for review by experts.
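
    One simple statistical filter is majority voting with an agreement threshold: an answer is accepted only when enough contributors responded and most of them agree, and everything else is routed to experts. The thresholds and labels below are arbitrary choices for this sketch, not values used by any project mentioned above.

```python
from collections import Counter


def aggregate(answers, min_votes=3, min_agreement=0.7):
    """Return (label, status) for one item given the answers of several users."""
    if len(answers) < min_votes:
        return None, "needs more answers"
    label, count = Counter(answers).most_common(1)[0]
    if count / len(answers) >= min_agreement:
        return label, "accepted"
    return label, "expert review"


# Hypothetical survey answers about three contracts.
responses = {
    "contract-1": ["overpriced", "overpriced", "overpriced", "overpriced", "fair"],
    "contract-2": ["fair", "overpriced", "fair", "overpriced"],
    "contract-3": ["fair"],
}
for item, answers in responses.items():
    print(item, aggregate(answers))
```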

    5.  Machine learning

    This method for analyzing data is only beginning to be applied, with high hopes of using it for procurement solutions. Machine learning allows automatic categorization of data samples (among other things) once a small set of training examples prepared by experts is provided.

    One recent solution where machine learning is applied is the Red Flags in Public Procurement platform. It works as an early warning system built on risk indicators which help identify or prevent risky (potentially corrupt) public procurements. Red flagging is done automatically based on a number of indicators. The most flagged cases are sent to experts, investigative journalists, competing companies and other actors for review. After a thorough review of the facts, procurements can be further analyzed with machine learning techniques and fed back into the system to enhance the quality of automatic processing, closing the feedback loop.
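
    The indicator-based part of such an early warning system can be sketched as a set of named rules applied to each tender, with the most flagged tenders surfacing first for review. The three indicators, their thresholds and the field names are invented for illustration and are not the Red Flags platform's actual indicators; the machine learning step is not shown.

```python
# Hypothetical risk indicators: each maps a name to a rule over one tender.
INDICATORS = {
    "single_bidder": lambda t: t["bidders"] == 1,
    "short_deadline": lambda t: t["days_to_bid"] < 10,
    "over_estimate": lambda t: t["final_price"] > 1.2 * t["estimated_price"],
}


def red_flags(tender):
    """Return the names of all indicators the tender triggers."""
    return [name for name, rule in INDICATORS.items() if rule(tender)]


# Invented sample tenders.
TENDERS = [
    {"id": "T1", "bidders": 4, "days_to_bid": 30, "final_price": 95, "estimated_price": 100},
    {"id": "T2", "bidders": 1, "days_to_bid": 5, "final_price": 150, "estimated_price": 100},
    {"id": "T3", "bidders": 1, "days_to_bid": 25, "final_price": 100, "estimated_price": 100},
]

# Most flagged tenders come first in the review queue.
review_queue = sorted(TENDERS, key=lambda t: len(red_flags(t)), reverse=True)
for tender in review_queue:
    print(tender["id"], red_flags(tender))
```

    Reviewers' verdicts on the flagged tenders are what would later serve as labeled training examples for a learned model.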

  • analysis

    Open Data Principles for Beginners

    With great pleasure we are introducing an analysis of what institutional openness and open data mean in practice, also from a legal point of view. You can read the report on the transparencee.org website, where we have divided it into 5 parts:

    1. Legal Introduction to Openness and Open Public Data
    2. Open by Default and Permissible Restrictions for Open Data
    3. Open Data: The Process of Quality and Quantity
    4. Open Data: Where to Start (incl. tl;dr simple recommendations)
    5. Open Data for Anti-corruption and Public Participation

    or just download the PDF: Open Data Principles by Krzysztof Izdebski (please note: the structure of the PDF is slightly different from the presentation on the website; the content, however, does not differ).

    You can use this publication under the Creative Commons BY-NC-SA 4.0 license.

    Why did we prepare this analysis?

    Recent developments in civic technology (technology for better civic and public life) around the world rely heavily on data. Data helps us better understand the reality we function within, informs fact-based policies and, when well analyzed, allows us to see patterns, irregularities and intersections we would never think of. When analyzing civic technology for transparency and accountability, we should not forget about the open data system in the given country. Law provides the framework within which working with data is possible. Law is a structure of power. Analyzing and influencing the legal system is one of the duties of civil society.

    The Intertwine

    Freedom of Information (FOI) is commonly understood as a mechanism which contributes to anti-corruption struggles and ideas for enhancing public participation. FOI is a way in which we can obtain data. But what about the data itself? Open public data is a powerful tool and also a derivative of FOI. Without FOI, open data cannot deliver what it promises: the true freedom of information.

    That is why this publication starts with Freedom of Information and then never lets it go: public open data and FOI are intrinsically related to each other. Even if FOI can exist (and often does) without open data, public open data in democracies cannot exist without FOI. Together with the author of this publication, we take the position that open data is not just another e-service provided by public bodies, but the next step in pursuing the fundamental human right to obtain and disseminate information, especially public information and data produced with public money.

    In this publication we talk about public open data only, as our main focus comes from the point of view of transparency and accountability activists. Open data held by public institutions is closely linked to open government, which is a wider term that includes public participation and overall interactions with citizens (not only online). Here too, open data is a natural foundation for open government to build upon.

    Public open data (or open government data), as defined by the Organization for Economic Co-operation and Development (OECD), is

    any data and information produced or commissioned by public bodies; Open data are data that can be freely used, re-used and distributed by anyone, only subject to (at the most) the requirement that users attribute the data and that they make their work available to be shared as well.

    When we think public data we think documents, reports, registries, other databases, calendars, maps (geospatial information), timetables, real-time data about public transport and so on. But it’s also data on what is available and in what form including detailed meta-data.

    The Definitions

    Open public data is even more than the data itself. It embraces legal and technical openness, which, following the Open Definition, mean:

    Legal openness: being allowed to get the data legally, to build on it, and to share it. Legal openness is usually provided by applying an appropriate (open) license which allows free access to and reuse of data or by placing data into the public domain.

    Technical openness: there should be no technical barriers to using data. For example, providing data as printouts on paper (or as tables in PDF documents) makes the information extremely difficult to work with. The Open Definition has various requirements for “technical openness,” such as requiring that data be machine readable and available in bulk. This way, a truly open public data set is basically ready for anybody to take and do whatever they want with it.

    For Whom

    When meeting with activists and reformers in public bodies, the discussion about open data tends to avoid specifics and focus on general assumptions. With this publication we aim to fill the gap between general information and very specific, legally or technically oriented publications and websites. We try as much as possible to avoid jargon and to give examples, so the reader can easily find out where to start reforming the law or advocating for its reform.

    We hope this will give everybody who is thinking about opening data in a public institution a checklist of where to start and where to head. We do not discuss technical issues here, as they are secondary to the readiness and general openness strategy of a public institution.

     

    About the author:

    Krzysztof Izdebski is a Polish lawyer providing legal consultations on access to public information and re-use of public sector information, drafting legal opinions and representing NGOs and other clients in court proceedings. He also specializes in the legal aspects of the prevention of corruption. Currently he is a Local Research Correspondent for Poland in the European Commission Anti-Corruption project (Anti-Corruption report) that aims to improve anti-corruption policies in the member states. He is also a trainer in the field of combating and preventing corruption, and the author of publications on freedom of information, conflicts of interest, corruption and ethical standards of NGOs. He also takes an active role in the Coalition for the Open Government, an informal body working toward Poland’s participation in the Open Government Partnership. He was one of the authors of the Coalition’s report describing where Poland is on its way to Open Government.

  • analysis

    Legal Introduction to Openness and Open Public Data

    (This is part 1 of the analysis “Open Data for Beginners”, you can find out more here)

  • analysis

    Open by Default and Permissible Restrictions for Open Data

    (This is part 2 of the analysis “Open Data for Beginners”, you can find out more here)

    There are a large number of documents that describe the standards of open data. Some are examples of the soft-law[1] that guides how public authorities should develop open data policies (for example, in the G8 Charter or Open Government Partnership Declaration), while some can effectively impose responsibilities on public authorities (such as the EU Directive on the Reuse of Public Sector Information). The general principles in this study follow those articulated in the G8 Data Charter with references to other more specific principles.

    Open by default and permissible restrictions

    “Open by default” means that governments should aim for maximum disclosure. The EU Directive on the Reuse of Public Sector Information recommends that

    making public all generally available documents held by the public sector — concerning not only the political process but also the legal and administrative process — is a fundamental instrument for extending the right to knowledge, which is a basic principle of democracy.

    This is the core standard of any freedom of information legislation.

    Such legislation should define information (sometimes also referred to as a document) broadly, which in practice means that every piece of information developed or received by a public authority in connection with performing public tasks should be considered “open.”[2] Although there are many examples of legislation that order the release of information as open data,[3] it has to be emphasized that making data open is rarely the decision of lawmakers but is up to the public officials whose ambition is to become reformers. For inspiration, look at the activities of G8 governments as described in the Review of Progress on the Open Data Charter.[4]

    According to The Public’s Right to Know: Principles on Freedom of Information Legislation,[5] the principle of maximum disclosure

    “establishes the presumption that all information held by public bodies should be subject to disclosure and that this presumption may be overcome only in very limited circumstances.”

    The European Convention on Human Rights states in Article 10.2 that exercising the freedom of information

    “may be subject to such formalities, conditions, restrictions or penalties as are prescribed by law and are necessary in a democratic society.”

    Similarly, the International Covenant on Civil and Political Rights includes the rule that freedom of information

    may therefore be subject to certain restrictions, but these shall only be such as are provided by law and are necessary.

    The signatories of the Council of Europe’s (CoE) Convention on Access to Official Documents, in article 3, have agreed that

    “limitations shall be set down precisely in law, be necessary in a democratic society, and be proportionate (…).”

    This means that when analyzing which document can be released, a public official should consider whether some restrictions need to be imposed because of the potential harm to third parties or the public interest, and whether there is no other means to protect those rights and interests besides restricting access to the document.

    Permissible restrictions are generally covered by provisions that regulate access to information. The most common ones are the need to protect intellectual property (copyright), trade secrets (economic secrets), privacy, and national security (state secrets). If your local access to information legislation does not permit the release of a document, the same rules would also mean that the document could not be published or disseminated using open data standards.

    Intellectual property

    The concept of “intellectual property” entails the need to protect authors or inventors from the exploitation of their works and discoveries. However, for the purpose of access to information, intellectual property rights are commonly understood as copyrights, which means they exclude

    documents covered by industrial property rights, such as patents, registered designs, and trademarks[6]

    as those are protected by specific regulations. Very often, copyright protects not only the specific information held in the document, but also the whole dataset or database.[7]

    In practice, a restriction might apply to access to detailed research data that was provided by external experts for an evaluation of the public education system. When a government holds rights to any of its documents, it should permit access and reuse. Only when the intellectual property rights belong to a third party should a restriction be considered. Governments should follow the general approach that everything that was funded publicly (such as reports, analyses, and opinions contributed by external authors) should be available to the public. The importance of this approach is expressed in the Hague Declaration on Knowledge Discovery in the Digital Age,[8] which was signed by representatives of public and nongovernmental cultural and educational institutions; it describes how to make data open without harm to the legitimate interests of the data’s authors.

    It is worth noting that the United Kingdom has introduced the Open Government Licence, which limits the restrictive Crown Copyright and enables citizens to freely use and reuse governmental data.[9] Public officials should also draw inspiration from the UK Government Licensing Framework,[10] which has built a policy around preparing and releasing open data. It is worth carefully checking what possibilities are allowed by local regulations.

    Trade secrets

    Trade secrets — also referred to as commercial confidentiality or economic secrets — can also be a reason for restricting access to information. Restricting access is explicitly allowed by the EU Directive, CoE Convention, countries’ legislation, and soft-law, including the CSOs’ recommendations. While in some countries, legislation provides a legal definition of a trade secret,[11] in others it refers to different legal acts and is developed in practice.[12]

    For example, a company that manages a local public transport system could claim the number of passengers using specific connections as a trade secret. As with any other restriction, it is the responsibility of the specific entrepreneur to clearly state which parts of its information are confidential. It is also important that the public administration consider each time whether such a qualification of the information is appropriate given the principle of maximum disclosure. Such consideration is also called a proportionality test.

    National security

    National security — or the broader term, a state secret — is another example of a restriction explicitly expressed in numerous documents, including the international and European human rights conventions and soft-law such as the Tshwane Principles on National Security and the Right to Information elaborated upon by 22 NGOs and academic centers.[13] The latter states that

    “no restriction on the right to information on national security grounds may be imposed unless the government can demonstrate that: (1) the restriction (a) is prescribed by law and (b) is necessary in a democratic society (c) to protect a legitimate national security interest (…).”

    For example, reuse of the information concerning locations of police closed-circuit television cameras can be fairly restricted. In most countries, the definition of state secrets is quite similar and involves weighing the conflict between releasing specific information and its impact on the country’s internal and external security.[14]

    Privacy

    Very often, “privacy” is narrowly defined as protection of personal data[15], but in some cases, it can be defined more widely. The European Convention on Human Rights expresses the need to protect everyone’s private and family life.[16] However, this is also not an absolute exception to the maximum disclosure standard. In the famous case of Google vs. Gonzales,[17] the Court of Justice of the European Union allowed for interference with this fundamental right, stating that the public activity of a person (such as a public official or anyone who is dealing with the management of public funds) justifies limiting the protection of their privacy. It is also accepted by the European Court of Human Rights that public officials or candidates for public posts are subject to reduced protection of the right to a private life.[18] This also has broader significance for releasing and reusing personal data that is part of national registries. In Poland, for example, the State Court Registry, which consists of data on company owners, is open and can therefore be reused.[19]


    [1] In this publication, soft law is understood as referring to rules that are neither strictly binding in nature nor completely lacking legal significance. In the context of international law, soft law refers to guidelines, policy declarations or codes of conduct which set standards of conduct. However, they are not directly enforceable. (http://definitions.uslegal.com/s/soft-law. Accessed October 1, 2015.)

    [2] Art. 3 of the Regulation (EC) No 1049/2001 of the European Parliament and of the Council of 30 May 2001 regarding public access to European Parliament, Council, and Commission documents defines “document” as “any content whatever its medium (written on paper or stored in electronic form or as a sound, visual or audiovisual recording) concerning a matter relating to the policies, activities and decisions falling within the institution’s sphere of responsibility.” Available at http://www.europarl.europa.eu/RegData/PDF/r1049_en.pdf. (Accessed October 1, 2015.)

    [3] For example, the Polish Act on Access to Public Information commands central administration bodies to transfer selected data to the Central Repository of Public Information — www.danepubliczne.gov.pl. In the Slovak Republic, thanks to the Act No. 546/2010 Coll., all public contracts (with some exemptions) must be published online. Those that are not published are unenforceable.

    [4] http://www2.datainnovation.org/2015-open-data-g8.pdf

    [5] Available at https://www.article19.org/data/files/pdfs/standards/righttoknow.pdf. (Accessed October 1, 2015.)

    [6] Directive 2003/98/EC of the European Parliament and of the Council of 17 November 2003 on the re-use of public sector information.

    [7] According to the Glossary of Public Sector Information and Open Data Terminology, a dataset is a collection of data, usually presented in tabular form, presented either electronically or in other formats. Available at https://data.gov.uk/glossary#letter_d. (Accessed October 1, 2015.)

    [8] Available at http://thehaguedeclaration.com/the-hague-declaration-on-knowledge-discovery-in-the-digital-age/. (Accessed October 1, 2015.)

    [9] http://www.nationalarchives.gov.uk/information-management/re-using-public-sector-information/licensing-for-re-use/what-ogl-covers/

    [10] http://www.nationalarchives.gov.uk/information-management/re-using-public-sector-information/licensing-for-re-use/ukglf/

    [11] As in article 1 of the Law on Commercial Secrets in the Republic of Moldova. Available at http://lex.justice.md/index.php?action=view&view=doc&id=312792. (Accessed November 25, 2015.)

    [12] For the example of Ukraine, see the expertise of A. Polikarpov. Available at http://www.ligue.org/uploads/documents/cycle%202015/Cycle%202015/Rapports%20B/2015rapportBukrainien.pdf. (Accessed November 25, 2015.)

    [13] Available at https://www.fas.org/sgp/library/tshwane.pdf. (Accessed October 1, 2015.)

    [14] For example, the Law of Georgia on State Secrets defines a state secret as “information available in the areas of defense, economy, foreign relations, intelligence, national security and law enforcement, the disclosure or loss of which can prejudice the sovereignty, constitutional order, political and economic interests of Georgia or of any party to the treaties and international agreements of Georgia and which, according to this Law and/or treaties and international agreements of Georgia, is predetermined as classified or deemed to be a state secret, and is subject to state protection.” Available at https://matsne.gov.ge/en/document/view/2750311. (Accessed November 25, 2015.)

    [15] Article 2 of Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data defines personal data as “any information relating to an identified or identifiable natural person (‘data subject’); an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity.”

    [16] It has also been defined by the CoE in the Declaration on Mass Communication Media and Human Rights, contained within Resolution 428 (1970), as the right to live one’s own life with a minimum of interference. Available at http://assembly.coe.int/nw/xml/XRef/Xref-XML2HTML-en.asp?fileid=15842&lang=en. (Accessed October 1, 2015.)

    [17] “… it is justified by the preponderant interest of the general public in having (…) access to the information in question.” Available at http://curia.europa.eu/juris/document/document.jsf?text=&docid=152065&pageIndex=0&doclang=en&mode=lst&dir=&occ=first&part=1&cid=70060. (Accessed October 1, 2015.)

    [18] Lingens v. Austria (1986), Oberschlick v. Austria (1991), Thorgeirson v. Iceland (1992).

    [19] This example is among others cited by the ePaństwo Foundation at http://www.mojepanstwo.pl.

  • analysis

    Open Data: The Process of Quality and Quantity

    (This is part 3 of the analysis “Open Data for Beginners”, you can find out more here)

    Opening data is a long process that has to be prepared for carefully. The first step is usually to identify the data sets that are in the possession of the public entity. If a public authority intends to share its institutional knowledge with the general public, it has the responsibility to do it in an effective and productive manner. OECD recommends

    “balancing the need to provide timely official data with the need to deliver trustworthy data.”[1]

    Open Government Partnership[2] members declare that they

    “commit to proactively provide high-value information, including raw data, in a timely manner, in formats that the public can easily locate, understand and use, and in formats that facilitate reuse.”

    In the Czech Republic, the law asks for released data to be machine-readable and the format of data and metadata to satisfy most open-format standards.[3] European Commission guidelines on the reuse of public sector information[4] contain several tips to increase the quality and quantity of delivered data sets. They should be “published online in their original, unmodified form to ensure timely release,” and the public office should ensure their completeness. The timely release of data is very often the crucial factor that determines the interest of potential users. Citizens want to have access to educational statistical data at the time that they are considering signing up a child for a specific school and not after the child has finished the education process.

    Because the greatest potential of open data lies in mixing different kinds of information with each other (such as spatial information with crime statistics), the public office should also guarantee the interoperability of data sets. Therefore, the European Commission recommends that agencies describe data sets in rich metadata formats,[5] which means including information such as what the data content is, who collected or created the data and when, whether there were any changes or updates, and so on.
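
    As a rough illustration of such a description, the sketch below attaches a small metadata record to a data set and checks that the questions listed above (what, who, when, changes) are answered. The field names, the publisher and the dates are assumptions made for this example; they are not taken from any particular metadata standard.

```python
import json

# Hypothetical field names chosen for this sketch; real portals usually
# follow an agreed metadata vocabulary instead.
REQUIRED_FIELDS = ("title", "description", "publisher", "created", "modified")


def missing_metadata(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# A made-up record describing a made-up data set.
metadata = {
    "title": "Crime statistics by district",        # what the data is
    "description": "Reported offences per district and year.",
    "publisher": "Example City Police Department",  # who collected it
    "created": "2015-01-15",                         # when it was created
    "modified": "2015-10-01",                        # last change or update
}

if __name__ == "__main__":
    print(json.dumps(metadata, indent=2))
    print("Missing fields:", missing_metadata(metadata))
```

    A check like this could run before every release, so that a data set without a description or a modification date is caught before it is published.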

    The EU Directive indicates that documents should be

    “available in any pre-existing format or language, and, where possible and appropriate, in open and machine-readable format together with their metadata.”

    The pre-existing format is often a synonym for raw data, which is “data collected which has not been subjected to processing or any other manipulation beyond that necessary for its first use.”[6]

    This means that data sets should be published in the same form in which they were created. The user decides how the data will be used and for what. One must remember that presenting data in an attractive format (like an infographic) does not make it open. What makes data “open” is that raw data can be freely reused by others.

    Usable by all

    Open data is about sharing with the world the knowledge that is usually kept hidden in internal computer networks or somewhere on the shelves of public offices. But open data is not just about publishing data online and waiting until someone reads it. Sharing means that users have the opportunity to do what they like with the data and use it for what will serve their community best. According to the G8 Data Charter, “usable by all” means that public authorities should release data

    “without bureaucratic or administrative barriers, such as registration requirements, which can deter people from accessing the data.”

    The other important aspect is to release data free of charge and in open formats. The latter, according to the EU Directive, means

    “a file format that is platform-independent and made available to the public without any restriction that impedes the re-use of documents.”

    In practice, the open format is a digital file standard that is free of charge and copyrights and that users can search, download, and use without buying special software.

    One of the principles of open data is to include everyone in using and reusing the data. This requires releasing data in a machine-readable format, so that it can be read, searched, and combined with other data sets mechanically. The other important aspect is to guarantee that data is easy to find. For example, in Bulgaria, public institutions must facilitate searching of public sector information by introducing mechanisms for online access or by any other suitable means.[7]
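
    To make “combined with other data sets mechanically” more concrete, here is a minimal sketch that joins two machine-readable files on a shared column. Both data sets, their column names and all numbers are made up for illustration.

```python
import csv
import io

# Two tiny, made-up data sets that share a "district" column.
POPULATION_CSV = """district,population
North,20000
South,50000
"""

CRIME_CSV = """district,reported_offences
North,100
South,150
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def offences_per_1000(population_text, crime_text):
    """Join the two data sets on district and compute offences per 1,000 residents."""
    population = {
        row["district"]: int(row["population"]) for row in read_rows(population_text)
    }
    result = {}
    for row in read_rows(crime_text):
        district = row["district"]
        if district in population:  # skip districts missing from either file
            result[district] = 1000 * int(row["reported_offences"]) / population[district]
    return result


if __name__ == "__main__":
    for district, rate in offences_per_1000(POPULATION_CSV, CRIME_CSV).items():
        print(f"{district}: {rate:.1f} offences per 1,000 residents")
```

    The same join would be impractical if either source were published only as a scanned document or an infographic, which is why the machine-readable form matters.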

    According to the G8 Data Charter, open data also has a strong impact on innovation in the private sector. A Deloitte study[8] shows that opening data encourages a more open attitude in the business sector, which can use open data to inspire customer engagement. Without implementing open data principles such as open access, using machine-readable formats, and interoperability, this would not be possible.


     

    [1] OECD Recommendation of the Council on Digital Government Strategies. Available at http://www.oecd.org/gov/public-innovation/Recommendation-digital-government-strategies.pdf.

    [2] Available at http://www.opengovpartnership.org/about/open-government-declaration. (Accessed October 1, 2015.)

    [3] Section 4b of the Amendment to the Act on Free Access to Information. Available at http://www.senat.cz/xqw/webdav/pssenat/original/76610/64407. (Accessed November 25, 2015.)

    [4] Guidelines on Recommended Standard Licences, Datasets and Charging for the Re-use of Documents. European Commission Notice (2014/C 240/01). Available at https://ec.europa.eu/digital-agenda/en/news/commission-notice-guidelines-recommended-standard-licences-datasets-and-charging-re-use. (Accessed October 1, 2015.)

    [5] For more information on sufficient metadata, see https://github.com/project-open-data/G8_Metadata_Mapping. (Accessed October 1, 2015.)

    [6] Glossary of Public Sector Information and Open Data Terminology. Available at https://data.gov.uk/glossary#letter_r. (Accessed October 1, 2015.)

    [7] Art. 41d of the Decree No. 184.

    [8] Open Data: Driving Growth, Ingenuity and Innovation. A Deloitte Analytics paper. Available at http://www2.deloitte.com/content/dam/Deloitte/uk/Documents/deloitte-analytics/open-data-driving-growth-ingenuity-and-innovation.pdf (Accessed October 1, 2015.)

  • analysis

    Open Data: Where to Start

    (This is part 4 of the analysis “Open Data for Beginners”, you can find out more here)

    How to make data open by default, of high quality, and usable by all?

    The EU Directive introduced some rules that support the implementation of open data standards. Although they are binding only for member states, the principles below should be observed by every institution that wants to open its data.

    No charges, or only charges for the marginal costs of data reproduction, provision, and dissemination

    Datasets that are in the possession of the public sector are collected and used thanks to the money of taxpayers. There is no reason why citizens should pay twice for the same information — the second time to use it and disseminate it more widely. The zero-cost method is recommended by the European Commission when documents are already digitized and are disseminated electronically. If a public institution is considering a charge for using open data, fees should not exceed

    “the cost of collection, production, reproduction and dissemination, together with a reasonable return on investment.”[1]

    For example, the Slovenian legal framework[2] enables free-of-charge non-commercial reuse of information. In addition, it accepts some free commercial use, such as when the data is reused to ensure the freedom of expression, culture, and art or is reused by news media. The more money a public authority charges, the fewer people will use the data and, as a result, given the economic value of open data, the less money will flow back to the public budget in the form of taxes.

    Transparency

    Whenever a public agency restricts data access by setting up charges or other conditions that users have to fulfill, the public entity should publish the appropriate information on its website. According to the 8 Principles of Open Government Data,

    “government information is a mix of public records, personal information, copyrighted work, and other non-open data; it is important to be clear about what data is available and what licensing, terms of service, and legal restrictions apply. Data for which no restrictions apply should be marked clearly as being in the public domain.” [3]

    Polish law provides the norm, stating that if there is no specific contrary statement published on the website, the reuse of released data is not subject to licenses or conditions.[4]

    A public authority should not act arbitrarily and without control. Therefore, it should also establish measures to redress decisions and practices that negatively influence those who want to reuse the information. The user should have the right to appeal to an oversight body or to go to court. The exact means are usually based on the local access-to-information legislation. It is also important to make sure that the information about those means is widely disseminated and accessible by everyone.

    Nondiscrimination

    The EU Directive says that

    “any applicable conditions for the re-use of documents shall be non-discriminatory for comparable categories of re-use.”

    Sometimes conditions are based on whether someone wants to reuse the information for commercial or non-commercial purposes. Commercial use may be more intense and could mean more work for public officials, but generally the reason behind reusing data should not determine how a user is treated. The other important aspect of non-discrimination is a general prohibition of exclusive agreements between a public institution and one or more users that would restrict access to data by anyone who is interested in it. This also concerns other public entities. For example, in Slovenia, if documents are reused by a public agency as input for commercial activities that fall outside the scope of its public tasks, the same charges and other conditions apply to the supply of the documents for those activities as apply to other users.[5]

    Tl;dr: Simple Recommendations

     

    Do’s and Don’ts

    Do: Think open! Sharing information with citizens should be considered in every case for all data. Remember to check that there aren’t any legitimate exceptions that protect data from being released.
    Don’t: Don’t keep data to yourself. There is wisdom in the collective. Sharing data will increase a public institution’s effectiveness and will allow it to develop evidence-based policies.

    Do: Try to collect information about what data is available. Remember to engage citizens to get feedback on what else they need and employees to ask what data will help them to perform their duties.
    Don’t: Don’t think that you know best what is interesting for citizens and helpful for your colleagues. Discuss what the need is.

    Do: Allow others to reuse the data without any restrictions. If copyrights apply, check the most open licensing standard.
    Don’t: Don’t charge for the data or impose disproportionate bureaucratic restrictions. If the cost of delivery of the data is significant, charge as little as possible.

    Do: Publish information with rich metadata and in open and machine-readable formats. It not only increases the data’s interoperability, it also increases access for people who are visually impaired. Include the methods that were used to collect the data. Maybe someone will propose more efficient ones for the future.
    Don’t: Don’t publish data before making sure that its format will be usable for anything. Scanned documents and non-reusable data will only load your servers.

    Do: Create, develop, and evaluate a strategy of opening the data in your institution that will engage a wide range of users and take into account the need to prevent corruption.
    Don’t: Don’t think that something is better than nothing. Opening data is a systematic process. You can’t expect that anyone will be happy if you publish random data that is useless for the public. Without a strategic approach, you will never know what people need.

     


    [1] A detailed description of how to count costs is available in the European Commission Notice (2014/C 240/01) entitled Guidelines on Recommended Standard Licences, Datasets and Charging for the Re-use of Documents. Available at https://ec.europa.eu/digital-agenda/en/news/commission-notice-guidelines-recommended-standard-licences-datasets-and-charging-re-use.

    [2] Art 34a of the Access to Public Information Act. Available at https://www.ip-rs.si/index.php?id=324. (Accessed November 25, 2015.)

    [3] Available at http://opengovdata.org/. (Accessed October 1, 2015.)

    [4] Art 23b of the Access to Public Information Act.

    [5] Art 36 of the Access to Public Information Act.

  • analysis

    Open Data for Anti-corruption and Public Participation

    (This is part 5 of the analysis “Open Data for Beginners”, you can find out more here)

    Open data by default in the context of countering corruption and fostering public participation

    The main goal of setting up a standard of maximum disclosure is closely connected with the nature of corruption, which usually happens in secrecy. One of the demands of the UN Convention against Corruption[1] is to oblige countries

    “to develop and implement or maintain effective, coordinated anti-corruption policies that promote the participation of society and reflect the principles of the rule of law, proper management of public affairs and public property, integrity, transparency, and accountability.”

    The convention emphasizes the role of transparency in fighting corruption by regulating that central arenas in which a state operates — such as public procurement, managing public funds, or recruitment for public posts — should be transparent, and public officials in these arenas should be held accountable. Enabling data to be accessed by anyone and from anywhere allows for verification by CSOs, experts, and a large number of public officials. According to findings by the Research Center on Security and Crime (TACOD project), about 7 percent of cases of corruption in the UK were detected thanks to open data.[2] This is a significant number of incidents, and the potential for additional open data development is promising as well.[3] Easy access to open data is also of great help to investigative institutions such as the police or prosecutors. For example, access to information about public procurements can facilitate the work of investigators by enabling quicker and more discreet access to information about public funds that are managed in a suspicious manner.[4]

    Public participation can only be empowered by offering open access to official sources. This is recognized by the OECD, which, in its Recommendation of the Council on Digital Government Strategies,[5] emphasizes the role of new technologies in social inclusion and public participation. The Open Government Partnership Declaration states that countries should be

    “making policy formulation and decision making more transparent, creating and using channels to solicit public feedback, and deepening public participation in developing, monitoring and evaluating government activities.”

    Proactive publication of official data will enhance the expertise of representatives of the general public, which is crucial for a sincere and effective public debate and the amount of feedback authorities receive regarding their actions. This point is supported by the G8 Data Charter, in which the authors wrote that

    “open data (…) increase awareness about how countries’ natural resources are used, how extractives revenues are spent, and how land is transacted and managed. All of which promotes accountability and good governance, enhances public debate, and helps to combat corruption.”[6]

    Obtaining and reusing data supports evidence-based lawmaking. Combining different statistical data with opinion polls can often bring solutions for burning problems that receive broad public support. You cannot implement a participatory budgeting system if information on the budget is not widely available.

    Usable by all in the context of countering corruption and fostering public participation

    There is no question that transparent, non-discriminatory, and nonrestrictive conditions for using data will increase public participation. The more people have access to data, the more people will engage in establishing new services that will reach wider circles. Open access will also result in more interest in and increased credibility for official websites. Open data improves governance by giving more people the chance to engage in managing a state or municipality. But it should also be treated as a tool that will help public officials set up standards for the information they gather, resulting in a better ability to exchange information between public institutions locally, nationally, or even internationally. Getting feedback from CSOs is important, but it is also a good practice to discuss the issue with employees.

    Corruption is a multilayer phenomenon. To fight corruption, governments have to engage a lot of sources, people, and tools. Allowing data to be open means that more experts can access the information (for example, on public procurement) and more risk factors can be identified. This will not only result in revealing cases of corruption, it will also enable the development of a system that can prevent corruption. According to the Research Centre on Security and Crime’s report,[7] public authorities should educate citizens about which datasets are in their possession and try to develop methods for engaging the general public to monitor the available data in order to identify potential corruption cases. It is also important to establish the possibility of following up when members of the public identify a corruption case by establishing clear ways for the public to communicate with specialized public bodies such as auditors. Only then will open data help win the fight against corruption.

    Corruption likes secrecy. There are so many cases, in both the public[8] and private sectors,[9] in which access to data would help to prevent and fight corruption. Open data allows more people to engage in decision-making processes and to influence other important activities of public officials. There is evidence of that coming from different political systems.[10] Open data also enhances the effectiveness of governments. Public officials can use data for evidence-based legislation and are able to react in more efficient ways to signals sent by the general public.

    To achieve these goals, public institutions have to prepare themselves to open their data. Below you will find some basic tips based on our analysis. They will help you to build an open data policy in a public institution.


     

    [1] Available at https://www.unodc.org/documents/brussels/UN_Convention_Against_Corruption.pdf. (Accessed October 1, 2015.)

    [2] Available at http://www.tacod.eu/wordpress/wp-content/uploads/2015/04/National_Research_UK_def.pdf. (Accessed October 1, 2015.)

    [3] Some interesting examples of the potential use of open data in fighting corruption can also be found in How Open Data Can Help Tackle Corruption: Policy Paper. Transparency International. Available at http://issuu.com/transparencyuk/docs/policy_paper_-_how_open_data_can_he (Accessed October 1, 2015) and Davies, T., and Fumega, S., Mixed Incentives: Adopting ICT Innovations for Transparency, Accountability, and Anti-corruption. Available at http://www.cmi.no/publications/file/5172-mixed-incentives.pdf. (Accessed October 1, 2015.)

    [4] Towards a European Strategy to Reduce Corruption by Enhancing the Use of Open Data. National Research: Italy. Available at http://www.tacod.eu/wordpress/wp-content/uploads/2015/04/National_Research_IT_def.pdf. (Accessed October 1, 2015.)

    [5] Available at http://www.oecd.org/gov/public-innovation/Recommendation-digital-government-strategies.pdf. (Accessed October 1, 2015.)

    [6] Available at https://www.gov.uk/government/publications/open-data-charter/g8-open-data-charter-and-technical-annex. (Accessed October 1, 2015.)

    [7] Towards a European Strategy to Reduce Corruption by Enhancing the Use of Open Data. National Research: United Kingdom. Available at http://www.tacod.eu/wordpress/wp-content/uploads/2015/04/National_Research_UK_def.pdf. (Accessed October 1, 2015.)

    [8] An example is the military industry, as presented in Transparency International’s report: Classified Information: A Review of Current Legislation Across 15 Countries and the EU. Available at http://issuu.com/tidefence/docs/140911_classified_information. (Accessed October 1, 2015.)

    [9] Jones, S., September 3, 2014. “‘Web of Corrupt Activity’ Costs Poorest Countries a Trillion Dollars a Year.” The Guardian. Available at http://www.theguardian.com/global-development/2014/sep/03/one-g20-cracking-down-corruption. (Accessed October 1, 2015.) See also Houlder, V., October 31, 2013. “Company Register in UK to Remove ‘Cloak of Secrecy.’” Financial Times. Available at http://www.ft.com/intl/cms/s/0/f71fab54-417c-11e3-9073-00144feabdc0.html#axzz3x1wHL6OK. (Accessed October 1, 2015.)

    [10] IBM Center for the Business of Government. 2011. Assessing Public Participation in an Open Government Era: A Review of Federal Agency Plans. Available at http://www.govexec.com/pdfs/082211jm1.pdf (Accessed October 1, 2015.) See also Chiliswa, Z., 2014. Investigating the Impact of Kenya’s Open Data Initiative on Marginalized Communities: Case Study of Urban Slums and Rural Settlements. Available at http://www.opendataresearch.org/project/2013/jhc. (Accessed October 1, 2015.)

  • analysis

    Data standards: What are they and why do they matter? The Analysis

    With great pleasure we’re introducing an analysis of what data standards are and what they mean in practice. You can read the report on the transparencee.org website, where we divided it into 2 parts: The Introduction and The Analysis. This report is part of our efforts to formulate regional standards for data and data-related processes (law included).

    The ultimate goal of the TransparenCEE initiative is to strengthen the civic tech sector in Central and Eastern Europe (CEE). We build foundations for collaboration, in part, by suggesting data standards to be used in joint projects.

     

     

    [Comic: “ISO 8601”. Available at xkcd.com/1179/ under the Creative Commons Attribution-NonCommercial 2.5 License.]

    What is a data standard?

    Think of a Master’s thesis that you have to write at university. It consists of a title, an abstract, the thesis itself and a bibliography. Oh, and it’s written as part of your curriculum and verified by other people, so you should specify the university, the faculty, the supervisor and the reviewer. You should also record some audit logs: date of creation, last modification date, date of acceptance by the reviewer (and his/her opinion), date of acceptance by the supervisor. Probably also a bit of translation for the global community: a title and an abstract in English if the thesis is in some other language.

    Now let’s say that you want to create a tool to browse theses on any given subject. You need to gather a substantial number of theses and feed them into a computer. For that to happen you need to transform each thesis into a single data record.

    Planning what this data record will look like is called modelling. We model real-life examples into data records. Modelling can (arbitrarily) drop some unimportant details, like whether you published your thesis in paperback or hardcover, or the color of your supervisor’s hair (unless you want to analyze this factor). Apart from that, it’s mostly about specifying requirements by making decisions like “should supervisor, reviewer, author be specified by providing given name, family name, academic title, university represented?”, “would the title and abstract be obligatory fields?” or “should dates include just year, month, day, or maybe an hour is necessary as well?” Stakeholder collaboration is essential in this phase, as different contexts need to be grasped. In one country, an independent review may not be necessary, while in another you may need two reviewers.
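
    The decisions described above can be written down directly as a record definition. The following is only a sketch of one possible model: which fields are obligatory, the day-level date precision and the made-up supervisor are assumptions of this example, not part of any agreed standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Person:
    given_name: str
    family_name: str
    academic_title: Optional[str] = None  # optional: not every context needs it


@dataclass
class Thesis:
    title: str                      # obligatory in this model
    abstract: str                   # obligatory in this model
    author: Person
    supervisor: Person
    reviewers: List[Person]         # zero, one or two, depending on the country
    date_of_final_accept: date      # day precision; no hour in this model
    title_en: Optional[str] = None  # English title if the thesis is in another language

    def validate(self):
        """Return a list of problems; an empty list means the record fits the model."""
        problems = []
        if not self.title.strip():
            problems.append("title is obligatory")
        if not self.abstract.strip():
            problems.append("abstract is obligatory")
        return problems


thesis = Thesis(
    title="Data standards: What are they and why they matter",
    abstract="An example abstract.",
    author=Person("Krzysztof", "Madejski"),
    supervisor=Person("Jan", "Kowalski", "Prof."),  # made-up supervisor
    reviewers=[],                                   # no independent review here
    date_of_final_accept=date.fromisoformat("2016-01-29"),
)

if __name__ == "__main__":
    print(thesis.validate())
```

    Details that the model drops, such as the binding of the printed copy, simply have no field. Adding a second reviewer later does not change the model, because reviewers are already stored as a list.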

    From modelling to representation and interoperability

    Creating data standards is all about interoperability: the ability to exchange standardized data between systems owned by different subjects. For that to happen, one more step is required: representation, which means deciding which file formats to use, how to format dates (look at the comic above again), how to store images, etc. In the end, you can land with the same information represented (or “serialized”) in several different file formats. The resulting files carry the same information, so choosing one over another is mostly a matter of preference. If you have the resources, all of them can be used in parallel.

    Here are a few examples of the same content represented in some popular formats.

    JSON (preferred by scripted solutions)

    {
      "author": {"given_name": "Krzysztof", "family_name": "Madejski"},
      "title": "Data standards: What are they and why they matter",
      "date_of_final_accept": "2016-01-29"
    }

     

    CSV (anyone can view it in a spreadsheet, but embedding objects, e.g. the author in the thesis, is not possible)

    author_given_name,author_family_name,title,date_of_final_accept
    Krzysztof,Madejski,Data standards: What are they and why they matter,2016-01-29

     

    XML (preferred by bigger institutions)

    <thesis>
      <author>
        <given_name>Krzysztof</given_name>
        <family_name>Madejski</family_name>
      </author>
      <title>Data standards: What are they and why they matter</title>
      <date_of_final_accept>2016-01-29</date_of_final_accept>
    </thesis>

    Or any other format for which a so-called “serialization” is defined. These files can then be processed by computers.
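
    The three representations above can be produced from one in-memory record. The sketch below uses only the Python standard library; the function names are made up for this example, and the flattening of the author into two CSV columns mirrors the CSV sample above.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

record = {
    "author": {"given_name": "Krzysztof", "family_name": "Madejski"},
    "title": "Data standards: What are they and why they matter",
    "date_of_final_accept": "2016-01-29",
}


def to_json(rec):
    """Serialize the record as JSON; nested objects are kept as they are."""
    return json.dumps(rec, indent=2)


def to_csv(rec):
    """CSV is flat, so the nested author object is spread over two columns."""
    flat = {
        "author_given_name": rec["author"]["given_name"],
        "author_family_name": rec["author"]["family_name"],
        "title": rec["title"],
        "date_of_final_accept": rec["date_of_final_accept"],
    }
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(flat), lineterminator="\n")
    writer.writeheader()
    writer.writerow(flat)
    return buffer.getvalue()


def to_xml(rec):
    """Serialize the record as XML with the author as a nested element."""
    thesis = ET.Element("thesis")
    author = ET.SubElement(thesis, "author")
    for key, value in rec["author"].items():
        ET.SubElement(author, key).text = value
    ET.SubElement(thesis, "title").text = rec["title"]
    ET.SubElement(thesis, "date_of_final_accept").text = rec["date_of_final_accept"]
    return ET.tostring(thesis, encoding="unicode")


if __name__ == "__main__":
    print(to_json(record))
    print(to_csv(record))
    print(to_xml(record))
```

    Because every output is generated from the same record, the files cannot drift apart, which is what makes publishing several formats in parallel affordable.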

    Standardizing a Standard

    Now, what if I create and announce such a standard as “Madejski Thesis Standard 1.0”? Well… most likely no one would care.

    The power of the standard comes from the power of all the stakeholders using it. If it’s not really common then it isn’t really a standard.

    [Comic: “Standards”. Published on xkcd.com/927/ under the Creative Commons Attribution-NonCommercial 2.5 License.]

    There is also one key element to standards: their openness. However, there is no single standard for what constitutes an open standard:

    There are a number of definitions of open standards which emphasize different aspects of openness, including the openness of the resulting specification (is it published online? do you have to pay to get it?), the openness of the drafting process (who can propose changes? who decides?), and the ownership of rights to the standard. link

    Coming from the internet community, we suggest using the World Wide Web Consortium’s (W3C) definition, which stresses an open process of standards creation, transparency, relevance and royalty-free usage (you don’t have to pay to use it):

    […] we define the following set of requirements that a provider of technical specification must follow to qualify for the adjective Open Standard:

    • transparency (due process is public, and all technical discussions, meeting minutes, are archived and referenceable in decision making)
    • relevance (new standardization is started upon due analysis of the market needs, including requirements phase, e.g. accessibility, multi-linguism)
    • openness (anybody can participate, and everybody does: industry, individual, public, government bodies, academia, on a worldwide scale)
    • impartiality and consensus (guaranteed fairness by the process and the neutral hosting of the W3C organization, with equal weight for each participant)
    • availability (free access to the standard text, both during development and at final stage, translations, and clear IPR rules for implementation, allowing open source development in the case of Internet/Web technologies)
    • maintenance (ongoing process for testing, errata, revision, permanent access)

    Investing in standards by civil society (mainly by using them, but also participating in their development) should go in parallel with ensuring that the community has a voice on these standards. Please go through the definition above as a checklist before using any standard. When working on data that is not yet standardized, we propose that you involve other international stakeholders and create a W3C community group devoted to working on data standardization in a given field.

    How to serve the data?

    Opening data is quite a costly process. When we do it for a social cause, let’s make sure that, apart from creating a tool that operates on the data, we also publish the data itself. How do we do it?

    One option is a data export. You export all the data and publish it online as a file. Anyone can access it just by downloading it. If the data changes, you should set up automatic periodic exports (monthly or daily, depending on the data source).

    That option works quite well when data is small in size and doesn’t change very often.
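
    A periodic export like the one described above can be sketched in a few lines of Python. This is only an illustration: the sample records, the field names and the `tenders` file prefix are hypothetical, and a real export would read from your own database and be triggered by a scheduler such as cron.

```python
import csv
import io
from datetime import date


def export_to_csv(records, fieldnames):
    """Serialize a list of dict records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def export_filename(prefix, day):
    """Build a dated file name so each periodic export is kept separately."""
    return f"{prefix}-{day.isoformat()}.csv"


# Hypothetical sample data; real records would come from your own data store.
tenders = [
    {"id": "T-001", "buyer": "City Hall", "amount_eur": 12000},
    {"id": "T-002", "buyer": "Public Library", "amount_eur": 3500},
]

csv_text = export_to_csv(tenders, ["id", "buyer", "amount_eur"])
file_name = export_filename("tenders", date(2016, 1, 31))
```

    Writing `csv_text` to `file_name` and putting that file on a public web server is then enough for anyone to download the data.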

    The second common option is to serve data through a so-called API. An API is a piece of software with a detailed specification that acts as a socket through which other computer programs can pull the data. Think of it as a power socket: if your hairdryer has a matching plug, you just plug it in and the electricity flows to the hairdryer, much like data flows to your data-propelled computer program or website.

    An API is a slightly more complicated option than a data export, though if you’ve built your website on any modern web framework, you probably have an API with sufficient basic functionality built in. When you have a vast amount of data that changes quite frequently, it’s more efficient to publish an API than to do data exports.
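
    To make the idea of an API more concrete, here is a minimal sketch of a read-only JSON API written with Python’s standard library only. Everything specific in it is an assumption made for illustration: the `/tenders` path, the `buyer` filter and the sample records are hypothetical, and a real project would more likely rely on the API features of its web framework.

```python
import json
from urllib.parse import parse_qs

# Hypothetical sample data; a real API would query a database instead.
TENDERS = [
    {"id": "T-001", "buyer": "City Hall", "amount_eur": 12000},
    {"id": "T-002", "buyer": "Public Library", "amount_eur": 3500},
]


def app(environ, start_response):
    """WSGI app: GET /tenders returns JSON, optionally filtered by ?buyer=..."""
    headers = [("Content-Type", "application/json")]
    if environ.get("PATH_INFO") != "/tenders":
        start_response("404 Not Found", headers)
        return [json.dumps({"error": "not found"}).encode("utf-8")]
    query = parse_qs(environ.get("QUERY_STRING", ""))
    buyer = query.get("buyer", [None])[0]
    rows = [t for t in TENDERS if buyer is None or t["buyer"] == buyer]
    start_response("200 OK", headers)
    return [json.dumps(rows).encode("utf-8")]


def call(path, query=""):
    """Call the app directly, without a network socket (handy for testing)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"PATH_INFO": path, "QUERY_STRING": query}, start_response))
    return captured["status"], json.loads(body)


def serve(port=8000):
    """Serve the app over HTTP on localhost; blocks until interrupted."""
    from wsgiref.simple_server import make_server

    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

    Calling `serve()` would start a local server, so that another program could request `http://localhost:8000/tenders?buyer=City+Hall` and receive only the matching records. Unlike a file export, the answer always reflects the current data.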

    P.S. APIs should be standardized, just like the data they serve. Think of the power adapters you need to take with you when going to the UK from Continental Europe. With software, these adapters cost even more than the plastic ones at Heathrow!

    the UK plug adapter

    The UK power plug adapter by Adafruit Industries published under the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 license.

    What aspects do we analyse?

    As part of this project, we will analyse and recommend data standards to be used in the tech for transparency field.

    You will find them below, divided by their scope – the real-life situations that can be modelled by these standards. For each we will mention:

    • open source tools that work with these standards, so you know what you can deploy to process your data, or what other projects can make use of your data;
    • coverage of the standard: who uses it, where and how. The wider the coverage, the more established the standard;
    • contacts to people responsible for existing deployments so you can consult with them;
    • challenges in data modelling in existing projects (e.g. is this parliamentary body closer to a committee, a commission or a board? How was it modelled in countries with a similar parliamentary system?);
    • finally, what kind of data is covered (data types, data classes) and links to specifications.

  • analysis

    Data standards: What are they and why do they matter? The Introduction

    With great pleasure, we’re introducing an analysis of what data standards are and what they mean in practice. You can read the report on the transparencee.org website, where we divided it into two parts: The Introduction and The Analysis. This report is part of our efforts to formulate regional standards for data and data-related processes (law included).

    Why is data so important? Let’s listen to Anna Kuliberda:

    Data helps us better understand the reality we function in, informs fact-based policies and, when well analyzed, it allows us to see patterns, irregularities and intersections we would never think of.

    That’s why most tech for transparency projects begin with data: we want to be better informed, so first of all we need the data. You’ve probably heard someone say “we will open these datasets” many times. This phrase hides all the complexity of a rather tricky process:

    1. data gathering (from published resources or FOI law demands)
    2. data extraction (e.g. extracting text from PDFs or scanned documents)
    3. cleaning (e.g. Y, y, yes, “+” all stand for an affirmative answer)
    4. structuring (in the simplest scenario – sorting data into columns).
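
    The cleaning and structuring steps can be illustrated with a short Python sketch. The affirmative variants come from the example above; the negative variants, the semicolon separator, the column names and the sample lines are assumptions made only for this illustration.

```python
# Variants that stand for an affirmative answer, as in the example above.
AFFIRMATIVE = {"y", "yes", "+"}
# Assumed counterparts for a negative answer (not taken from a real dataset).
NEGATIVE = {"n", "no", "-"}


def clean_answer(raw):
    """Map spelling variants to True/False; return None for unknown values."""
    value = raw.strip().lower()
    if value in AFFIRMATIVE:
        return True
    if value in NEGATIVE:
        return False
    return None


def structure(lines, columns, separator=";"):
    """Split raw text lines into dict rows keyed by the given column names."""
    rows = []
    for line in lines:
        cells = [cell.strip() for cell in line.split(separator)]
        rows.append(dict(zip(columns, cells)))
    return rows


# Hypothetical raw lines, as they might come out of the extraction step.
raw_lines = ["Anna Nowak; Y", "Jan Kowalski ;no", "Eva Novak; +"]

rows = structure(raw_lines, ["name", "voted"])
for row in rows:
    row["voted"] = clean_answer(row["voted"])
```

    Returning `None` for unrecognized values, instead of guessing, keeps doubtful entries visible so that a person can review them later.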

    In the end, we get open data that can be analyzed further to reach some conclusions or published for others to work on. And all of that is great! Well… that is until you want to cooperate with others working on similar datasets.

    Here at TransparenCEE we’re building a common knowledge base, which we hope you’ll find useful and expand with your own experiences.

    It’s important for civil society…

    Let’s say that, while working on public procurement projects, you’ve created clear visualizations and somebody else has created a tool to flag suspicious tenders. You want to import your data into their solution and vice versa. However, if your datasets were opened independently, they probably have different formats and collaboration is not that simple. Your solutions don’t “speak” the same language. That problem grows bigger in international collaborations, where social and legal contexts differ.

    Data standards are the answer to this problem. By supporting and using them, you ensure that your work will be easier for others to reuse. It’s like creating a piece that fits into a bigger puzzle. It may slightly raise costs during data opening, but it significantly brings down the costs of integrating with other solutions that “speak” the same standard. In many cases, you can instantly use other solutions on your dataset.
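
    The following Python sketch shows why a shared format lowers integration costs. It is not based on any particular standard: the common field names, both source layouts and the toy flagging rule are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical field mappings from two independently opened datasets
# to a shared set of field names (tender_id, buyer_name, value).
MAPPING_A = {"id": "tender_id", "buyer": "buyer_name", "amount": "value"}
MAPPING_B = {"numer": "tender_id", "zamawiajacy": "buyer_name", "wartosc": "value"}


def to_common(record, mapping):
    """Rename a record's fields according to the mapping; drop unmapped fields."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def flag_expensive(records, threshold):
    """Toy analysis tool that only understands the common field names."""
    return [r["tender_id"] for r in records if r["value"] > threshold]


# Two records describing the same kind of thing in different layouts.
record_a = {"id": "A-1", "buyer": "City Hall", "amount": 9000, "note": "draft"}
record_b = {"numer": "B-7", "zamawiajacy": "Urzad Miasta", "wartosc": 25000}

combined = [to_common(record_a, MAPPING_A), to_common(record_b, MAPPING_B)]
flagged = flag_expensive(combined, threshold=10000)
```

    Each dataset needs one mapping, written once. After that, any tool built for the common format, like the toy `flag_expensive` function here, works on both datasets without further changes.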

    In our series of analyses about data standards we will recommend the best ones for a given topic. Imagine that you want to analyse procurement data in your country or municipality. We say: “Well, if you open data in this standard, you will then be able to easily deploy these open source tools to analyse it.”

    It’s important for donors…

    If you are a donor representative, then specifying standards in the grants you give will bring more value for your money and allow for better integration with other initiatives in the field.

    It’s important for IT experts too!

    And if you’re not an expert in the field (such as public procurement) but an IT development leader, our research will list existing tools as a reference, highlight challenges in data opening, explain how the data model maps to the real world in several existing deployments, put you in contact with people who have previously worked with a given standard and its tools, and, of course, provide you with links to specifications.

    Please read the full report below: “Data standards: What are they and why do they matter? The Analysis”.

    Image from page 592 of “Wright’s book of poultry, revised and edited in accordance with the latest poultry club standards” (1911), published on Flickr, is in the public domain.

  • analysis

    Data standard for democratic organizations – Popolo

    New to data standards? Please read these introductions.

    A