When I told my department that I was going to begin applying for grants to do research on the Indian economy, I got the same email from two of my well-esteemed, well-published, well-admired advisors: "good luck!" Over the past ten months I have witnessed the same expression many times upon delivering my elevator pitch: a raising of the eyebrows, a slight widening of the eyes, perhaps a bit of humor. It comes as no surprise to most economists that tracking down usable raw data in India is one of the largest feats of the field. Most of my superiors have given me the starkly boring imagery of waiting in offices, sending emails, talking on the phone, waiting on hold, and an unbelievable amount of drinking chai and chatting straight into the abyss. Research happens where there is money, and the burden of finding data in India is largely financial. It is not only a question of whether you can afford labor on the ground, but whether you can afford that labor continuously and whether you can afford the amount of time it will take to get the data. And lastly, whether you can afford the chance that you might not get the data at all, or that the data might simply not exist.
Earlier this year, I was working as a research assistant for a labor economist, and we wanted to look at a particular relationship between two variables in a developing country; we chose the country that we could get the data for. This raises the question: are researchers skipping over India as a viable research option because of lack of access to data? Are people being discouraged from research in India as a whole because of lack of access to data? Most importantly, what are the implications if either of these is true?
The Argument for and Against Open Data
Open Knowledge International (OKI) is a global non-profit focused on making open data a reality by helping people use data to take action on social issues. They define open data as:
Data that can be freely used, re-used and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.
OKI says in their Open Data Handbook that as of now, "Open data, especially open government data, is a tremendous resource that is as yet largely untapped" (OKI). They split the benefits of open data into 9 categories where the positive effects of open government data can already be seen: transparency and democratic control, participation, self-empowerment, improved or new private products and services, innovation, improved efficiency of government services, improved effectiveness of government services, impact measurement of policies, and new knowledge from combined data sources and patterns in large data volumes. OKI lists success stories of open data on their website ranging from "Open data reduces mortality rate in UK hospitals" to "[Open government data exposes] $62m in potential pharmaceutical savings in Southern Africa" (OKI). We must acknowledge the irony that the movement for open data would itself benefit hugely from open data, as "activists are discovering the value of [Open Government Data] for defending their causes" (Janssen 2012). Analytics India Magazine adds to the list the ability to predict and prepare for natural disasters and to enhance government transparency. The benefits of open data may extend beyond social causes, although the two may overlap, to the capitalist's favorite, money: "developers and hackers are making smartphone apps based on datasets held by the public sector" (Janssen 2012). The next section describes this further, in the context of open data's origins in geospatial information. Analytics India Magazine puts it well: "In a nutshell, open data can enable creation of tools to improve consumer choice and citizen decision making."
As an economic researcher, I can speak to the technical benefits of access to open data, and especially open government data. Access to larger, more robust datasets means being able to build more dynamic models that may produce more significant results. This should have an impact across the field of data analytics and can lead to many kinds of conclusions: political, financial, social, scientific, or otherwise.
There are certainly exceptions to the argument for open data, including data on individuals that may compromise privacy and data that may compromise national security. Even OKI acknowledges this. Michael Blakemore and Max Craglia place themselves in the center of this debate in their paper Access to Public Sector Information in Europe: Policy, Rights, and Obligations, arguing that there is no evidence that the economic benefits of open data outweigh the cost of not charging for it. Two scholars from the EU, Stefan Kulk and Bastiaan van Loenen, take a dissenting standpoint, offering a nuanced argument on how the line between private and non-private data can be blurred. They say:
Open data may not seem to be personal data on first glance especially when it is anonymized or aggregated. However, it may become personal data by combining it with other publicly available data or when it is de-anonymized.
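To make this point concrete, here is a minimal sketch in Python of how such a combination could work. Everything in it is hypothetical: the two tiny datasets, the field names (pin_code, birth_year, sex), and the reidentify function are invented for illustration and do not come from Kulk and van Loenen's paper or from any real data source.

```python
from collections import defaultdict

# Fields that are not names, but that both datasets happen to share.
QUASI_IDENTIFIERS = ("pin_code", "birth_year", "sex")

# Hypothetical "anonymized" open dataset: names removed, other fields kept.
anonymized_records = [
    {"pin_code": "110001", "birth_year": 1984, "sex": "F", "diagnosis": "diabetes"},
    {"pin_code": "110001", "birth_year": 1990, "sex": "M", "diagnosis": "asthma"},
    {"pin_code": "560034", "birth_year": 1984, "sex": "F", "diagnosis": "hypertension"},
]

# Hypothetical second public dataset that does carry names.
public_register = [
    {"name": "Person A", "pin_code": "110001", "birth_year": 1984, "sex": "F"},
    {"name": "Person B", "pin_code": "110001", "birth_year": 1990, "sex": "M"},
    {"name": "Person C", "pin_code": "110001", "birth_year": 1990, "sex": "M"},
    {"name": "Person D", "pin_code": "560034", "birth_year": 1984, "sex": "F"},
]


def quasi_key(row):
    """Build the combination of shared fields used to link the two datasets."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymized, register):
    """Return {name: diagnosis} for records that match exactly one named person."""
    names_by_key = defaultdict(list)
    for person in register:
        names_by_key[quasi_key(person)].append(person["name"])

    matches = {}
    for record in anonymized:
        candidates = names_by_key.get(quasi_key(record), [])
        if len(candidates) == 1:  # a unique match means the record is no longer anonymous
            matches[candidates[0]] = record["diagnosis"]
    return matches


reidentified = reidentify(anonymized_records, public_register)
for name, diagnosis in reidentified.items():
    print(f"{name}: {diagnosis}")
```

In this toy example, two of the three "anonymous" records link to exactly one named person. The third stays ambiguous only because two people in the register share the same combination of fields.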
Meanwhile, there are arguments against open data even where there are no privacy or security problems.
The Battle for Open Data
The question of access to data is an international and sizable one, and it is rooted in the beginnings of the struggle for open data. In 2014, at the General Conference of the European Consortium for Political Research in Glasgow, Jonathan Gray, a researcher from King's College London, presented what was to be the first comprehensive background, or genealogy as he calls it, of the movement for open data (Gray 2014). As with most grassroots movements, information on the movement for open data is sparse, ironic as that is in this case. The movement began in the late twentieth century, stemming from other "open" movements: open source, open hardware, open content, open knowledge, open web, etc. The United States, followed shortly by the European Union, pioneered open data in the 1990s and 2000s in the belief that geospatial information, or data connected to geographic coordinates that can be handled in real time, has the potential to drive economic growth (Folger 2009). Some information accessibility measures can be traced back as far as 1895 in the United States, when the Printing Law stated that government publications could not be copyrighted. More explicitly, a 1993 US policy stated that charges could not exceed the cost of dissemination. Are these cost-cutting policies meant to benefit the government, or are they motivated by the belief, fundamental to OKI, that open data is important for solving social issues? Whichever it is, although the term has been popularized only over the past two decades, open data has been growing in prominence, especially over the past ten years, which saw the launch of most of the portals in the more than 185 countries that now offer some access to open data. OpenDataSoft, a private company aimed at helping disseminate data effectively and create new applications, offers a publicly available, regularly updated, geotagged map with information on all of the open data sources available (OpenDataSoft).
The Open Data Charter (ODC) has been one of the largest landmarks in the battle for open access to data. The charter began in July of 2013, when G8 leaders signed the G8 Open Data Charter. Since 2013, the charter has undergone many changes thanks to the input of participants around the globe and has morphed into a new charter called the International Open Data Charter, and countries and regions continue to sign on as new members. The 6 main principles of the charter are: open by default, timely and comprehensive, accessible and usable, comparable and interoperable, for improved governance and citizen engagement, and for inclusive development and innovation. As of now, the charter has been adopted by 17 national governments and 35 local governments.
People are coming together from around the world for the battle for open data; since 2005, OKI has been organizing events for people to gather and talk about data and knowledge access. Initiatives have been started around the world. One example is the National Democratic Institute's Open Election Data Initiative, started in 2013, which aims to equip groups with the tools they need to produce election data that meets the criteria for open data.
Barriers to Data
The Congressional Research Service (CRS) in the United States released a report in 2009 saying that "impediments to data sharing such as lack of interoperability between systems, restrictions on use, concerns about data security, and a lack of knowledge about what data exist and where that data can be found could hinder a timely and effective emergency response" (CRS). Even in situations where the data is not "urgent," however that may be judged, the benefits of open data have already been made clear. This quote from CRS covers many of the main barriers that people experience in trying to access data. The example the report cites is Hurricane Katrina, one of the largest natural disasters in US history. In that case, emergency responders' ability to access geospatial data may have saved lives, but the lack of access to it in the immediate area made their problems even worse. In addition to the barriers mentioned by CRS, there are also fees charged for data, membership-only access, use of encryption or other closed technology, copyright laws, patents, search engine restrictions, and time limits, among other restrictions. There is still some debate over whether open data is necessarily free data, and vice versa. Eugene Osovetsky, founder and CTO of WebServius, certainly makes a good point on Quora when he says, "A Modern REST API with self-service signup capabilities, selling some valuable data at $0.001 per record, is more 'open' than data that's sitting on something like Data.gov in a 50GB archived file in some obscure proprietary format - even though the latter is free" (Analytics India). For the average person this is certainly true, but it does raise the question for society of whether technological literacy or financial standing should be valued more highly. Often, though, the two go hand in hand.
The curious reader can read more about the barriers to the creation of open data, starting from the data's collection, in Anneke Zuiderwijk and Marijn Janssen's essay Barriers and Development Directions for the Publication and Usage of Open Data: A Socio-Technical View, which can be found in the book Open Government: Opportunities and Challenges for Public Governance.
Open Data in India
India experiences all of the barriers mentioned in the last section, but in its own unique ways. Some countries have easier access to data than others, and India falls among those where access is harder. Two experiences of mine stand out. In the first, I emailed a researcher who had written a paper using a dataset that was relevant to my own research. He responded that his institution doesn't allow data-sharing. To be clear, he didn't just tell me there was one dataset he couldn't share with me. He told me that his institution has a policy that says they can never share research with the outside. This is certainly a clear-cut barrier. The second barrier I have experienced is with the National Sample Survey (NSS). NSS is one of the biggest and most prominently used datasets for research done in India. The NSS Office has a spreadsheet posted online listing the costs of its data. There are more than 30 purchasable surveys, and the prices differ for different types of researchers. The cheapest tier is for Indian students and the most expensive is for institutions outside of India. The price of the most expensive survey ranges from Rs 16,223 to Rs 49,290 ($243.33 to $739.31). This is a huge amount of money, and a researcher will normally need more than one survey to do their research. Beyond this, the NSS Office doesn't have an effective standardized procedure for obtaining the data. Something else I have noticed is that some primary data collection seems to have been replicated by different organizations without any communication between them. If the organizations worked together, they could support each other and perhaps use money and resources more effectively. Instead, because of a lack of communication and understanding, surveys are replicated.
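The dollar figures quoted for the NSS imply an exchange rate of roughly 66.67 rupees to the dollar. As a rough sketch of how quickly the cost adds up, the Python snippet below recovers that rate from the quoted prices and then prices a project that needs several surveys. The count of three surveys and the function names are assumptions made purely for illustration; actual NSS prices vary by survey and by type of researcher.

```python
# Prices quoted in the text for the most expensive NSS survey
# (cheapest tier, most expensive tier).
PRICE_RANGE_INR = (16223, 49290)
PRICE_RANGE_USD = (243.33, 739.31)

# Hypothetical: number of surveys one research project might need.
N_SURVEYS = 3


def implied_inr_per_usd(price_inr, price_usd):
    """Exchange rate implied by one rupee/dollar price pair."""
    return price_inr / price_usd


def total_cost_usd(price_inr, n_surveys, inr_per_usd):
    """Dollar cost of buying n_surveys, each at the same rupee price."""
    return price_inr * n_surveys / inr_per_usd


# Average the rate implied by the two quoted price pairs.
rates = [
    implied_inr_per_usd(inr, usd)
    for inr, usd in zip(PRICE_RANGE_INR, PRICE_RANGE_USD)
]
inr_per_usd = sum(rates) / len(rates)

low_usd, high_usd = (
    total_cost_usd(price, N_SURVEYS, inr_per_usd) for price in PRICE_RANGE_INR
)

print(f"Implied rate: {inr_per_usd:.2f} INR per USD")
print(f"{N_SURVEYS} surveys at the top price: ${low_usd:,.2f} to ${high_usd:,.2f}")
```

This assumes every survey costs as much as the most expensive one, so it is an upper bound for each tier rather than an estimate of a typical project.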
The Transparency and Accountability Initiative (TAI), an organization "working toward a world where citizens are informed and empowered; governments are open and responsive; and collective action advances the public good" (TAI), facilitated a 2010 report on the status of open data in India written by the Centre for Internet & Society. This slightly outdated 52-page report pre-dates the National Data Sharing & Accessibility Policy of 2012 (NDSAP) and perhaps even contributed to its advent. The government of India has made efforts to make data more accessible, each a step towards an open data policy. The Department of Information and Technology has created the Open Government Data Platform, which holds government data across ministries. This was created as per the requirements of the NDSAP. The government of India has an approved Open Government Data License with guidelines for ensuring the data is not "misused or misinterpreted, and that all users have the same and permanent right to use the data" (meity.gov.in). The objective of the site is threefold. To summarize: (1) to provide the public access to government data, (2) to ensure that there is transparency, and (3) to shift the government paradigm to include citizens in data-related action. Despite these noble aspirations, the Open Government Data Platform struggles to deliver what it set out to do. The site is hard to navigate, the files are sometimes in formats that are complicated to open, and often the files hold very little information, perhaps one variable with 10 cases. Other parts of the government of India have made efforts to release open data, and they all face similar issues.
Conclusion
The author of the same article in Analytics India Magazine says, "While the size of population in India is huge, there is a lack of digital records that primarily hinders the adoption of big data and analytics by the government itself" (Analytics India Magazine). But it's clear that this is a vast oversimplification of the issue. In India the data can be expensive, like the NSS data; it can be hard to find, as in the case of studies being replicated; it can be technologically convoluted, like much of the data on data.gov.in; and it can be inaccessible, as with data-sharing policies, among other issues. In each of these cases, the data is digitized but inaccessible. Lack of digitized data is just one more problem added to the pile, and it may be an exaggeration that the government uses to avoid bearing the cost of providing what the public is asking for. The Transparency and Accountability Initiative released their Open Government Data Study on India in 2010. It has been 8 years now. Where is the open data?
India is not the only country with problems of access to open data. But in a country of more than a billion people, the organizations collecting data, and especially India's government, certainly hold a responsibility to work towards public access to appropriate data. Luckily, the ball is rolling for change. The government has only recently started to make these changes, and organizations including the India Open Data Association (IODA) and the Centre for Internet & Society (the latter has an internship program for any interested undergraduates) are working to make real change as well. Moving forward, the government and other organizations working towards an India with open access data need to acknowledge the particular needs and distinct barriers that India faces in reaching this goal.
References
"A Comprehensive List of All Open Data Portals Around the World." OpenDataSoft. Accessed May 4, 2018. https://www.opendatasoft.com/a-comprehensive-list-of-all-open-data-portals-around-the-world/.
Deoras, Srishti. "What is Open Data and Where Can You Find it?" Analytics India Magazine, April 17, 2017. https://analyticsindiamag.com/what-is-open-data-and-where-can-you-find-it/.
Folger, Peter. Geospatial Information and Geographic Information Systems (GIS): Current Issues and Future Challenges. Washington D.C.: Congressional Research Service, 2009.
Gray, Jonathan. "Towards a Genealogy of Open Data." Presentation at the General Conference of the European Consortium for Political Research, Glasgow, September 3-6, 2014.
Janssen, Katleen. "Open Government Data and the Right to Information: Opportunities and Obstacles." The Journal of Community Informatics 8, no. 2 (2012): 1-10.
Katte, Abhijeet. "Open Data Movement and its Impact on the World." YOURSTORY, October 5, 2017. https://yourstory.com/2017/10/open-data-movement-and-its-impact-on-the-world/.
Kulk, Stefan and Bastiaan van Loenen. "Brave New Open Data World?" International Journal of Spatial Data Infrastructures Research 7 (2012): 196-206.
National Research Council. Successful Response Starts with a Map. Washington D.C.: National Research Council, 2007.
"Open Data Charter." Open Data Charter. Accessed May 4, 2018. https://opendatacharter.net/.
"Open Knowledge International." Open Knowledge International. Accessed May 4, 2018. https://okfn.org/.
"Open Data." Ministry of Electronics and Information Technology. Accessed May 4, 2018. http://meity.gov.in/open-data.
"The Centre for Internet and Society." The Centre for Internet and Society. Accessed May 4, 2018. https://cis-india.org/.
"Transparency and Accountability Initiative." Transparency and Accountability Initiative. Accessed May 4, 2018. http://www.transparency-initiative.org/.
Wright, Glover, Pranesh Prakash, Sunil Abraham, and Nishant Shah. Open Government Data Study: India. London: Transparency & Accountability Initiative, 2010.
Zuiderwijk, Anneke and Marijn Janssen. "Barriers and Development Directions for the Publication and Usage of Open Data: A Socio-Technical View." In Open Government: Opportunities and Challenges for Public Governance, edited by M. Gasco-Hernandez, 115-135. Public Administration and Information Technology 4. New York: Springer Science+Business Media, 2014.