Scientific advances depend on the availability, accessibility, and reusability of data, software, samples, and data products. Yet large amounts of data on the Earth are poorly preserved or not preserved at all. This proposal will address this problem by supporting the collaboration of a synthesis-science team and a data-science team. We will conduct a transdisciplinary, transnational synthesis-science project in parallel and in partnership with a project on the use and reuse of environmental and socioeconomic data to assess practices for managing and preserving data. The project will provide a unique opportunity for data scientists and synthesis scientists to collaborate in real time toward the goal of improving research outcomes and data sharing.

Our synthesis-science team has expertise in manipulating environmental and socioeconomic data, modelling, and temporal and spatial analysis. In partnership with teams in France, Brazil, the United States, Japan, Australia, and the United Kingdom, the team will examine the socioeconomic effects of natural protected areas (PAs) on local communities. The resulting tools and metrics will enable better prediction and mitigation of the effects of actions that disrupt historical land-use practices and threaten local communities. Our data-science team, which brings together leading environmental data management professionals, data communities (RDA, ESIP), society journals (AGU), and representatives of e-infrastructures for data attribution (e.g., DataCite and ORCID), will develop leading practices on data citation, attribution, credit, and reuse. With these practices as the content, we will create a toolkit and workshop materials and deliver them internationally via face-to-face meetings and webinars. In addition, we will create a tool that lets researchers view their shared data, and its usage and reuse, through widgets and online profiles.

The project results will be useful to approximately 300,000 earth, space, and environmental researchers worldwide. The project will add momentum to the cultural change in the use and reuse of big data in research on real-world problems.