In the Time of COVID-19, Secondary Data Deserves More Time in the Spotlight

22/01/2021

Illustration: Cdd20/pixabay.

Natural observations and data collection have always played important roles in scientific research. Data collection in particular has been an important first step towards answering novel research questions and testing hypotheses – including in ecology, environmental conservation and allied fields. Data is useful to accurately visualise trends and inform decisions.

However, the COVID-19 pandemic has imposed multiple setbacks on the world, and continues to do so even as researchers struggle to cope. Early-career researchers and students were the worst hit, since many of us depend on primary data. PhD students and researchers lost precious data needed to conclude their theses and to meet project objectives, and undergraduate and graduate students are worried their lack of research experience may reflect badly on their future prospects.

Primary data is central to research – but the pandemic is a useful occasion to reflect on what secondary data enables us to do, especially when we’re at risk of having no data at all. Primary data is data we obtain directly from observations, experiments, etc. Secondary data is data obtained from sources that collected primary data and have organised it to be searchable, accessible, etc.

There are many ways to use secondary data in research. For example, it can feed literature reviews and meta-analyses, both of which help distil patterns and information from scientific articles. It can also be used to develop conceptual models and test new statistical methods.

Fantastic secondary data and where to find them

Of course, secondary data can’t ultimately replace its primary counterpart in value – but it can help save the time, resources and effort required to collect primary data. Collecting data is often a time-consuming affair, and frequently requires high financial, human and social capital. With the COVID-19 pandemic still under way, it has also become riskier and more difficult to step out into the field.

In recent years, there has been a rise in citizen science projects in India; some examples include Big4mapping, Biodiversity Atlas India, eBird India, SeasonsWatch and Roadkills. While citizen-driven databases come with their own caveats, it’s possible to produce good, important research while working within these limitations (eBird offers some examples). There are also secondary data platforms, like the Global Biodiversity Information Facility, the IUCN Red List, the US Geological Survey explorer and WorldClim, that lend themselves to coarser-scale research and can even be free of cost. Then there are data repositories like Data Dryad, figshare and Zenodo, which allow users to use data from published studies. Mining data from websites and social media platforms is also popular these days.

Long-term monitoring plots are another valuable source. In India, there are a few long-term forest monitoring programmes but they don’t sufficiently cover all landscapes. Under the Climate Change Action Programme of the Union environment ministry, the government has established ‘long-term ecological observatories’ in six landscapes. A few research institutes, universities and NGOs also have their own long-term monitoring sites. However, the data from these plots is not publicly accessible.

Another source of secondary data in India is the statistics and records maintained by ministries, departments and agencies of the Central and State governments, although accessing them often requires prior permission. Websites like indiastats.com and data.gov.in host a lot of data from different ministries, departments, sectors and regions in India.

Working with secondary data is a challenge in itself and helps develop important skills. By doing so, we not only strengthen our problem-solving skills but also get to work on management problems and on tools and methods of analysis. However, undergraduate and postgraduate curricula don’t pay enough attention to these skills – probably because of the greater significance attached to primary data and a heightened awareness of the challenges of working with secondary data.

Using secondary data for research

In India, scientific research still banks on primary data, and once researchers have completed a project, they often lock up the data and forget about it. As a result, there is a lot of data that scientists could use to produce more results if only it were more widely available. Outside India, and especially in the West, it is more common to share data (as long as its provenance is duly credited). There are also more doctoral theses that predominantly use data from long-term monitoring projects, governmental and non-governmental agencies, social media sites, etc.

Two reasons for this difference are the considerable fear among many scientists in India that they will not be duly credited, and the difficulty of accessing data from government agencies, universities and NGOs. A more fundamental reason could be the absence of a national repository or database, and of a relevant policy and institutional arrangements to facilitate data-sharing.

Ultimately, the pandemic has gifted us an opportunity to rethink and improve the way we conduct research – more so since there are likely to be more pandemics in future. So when we have the time, we ought to shift and transform our working styles, and use what we already have through collaborations and goodwill.

Sakshi Rana and Tanvi Gaur are research scholars with the Wildlife Institute of India.