
tsuomela : data-curation   212

[1904.04736] Cold Storage Data Archives: More Than Just a Bunch of Tapes
"The abundance of available sensor and derived data from large scientific experiments, such as earth observation programs, radio astronomy sky surveys, and high-energy physics already exceeds the storage hardware globally fabricated per year. To that end, cold storage data archives are the (often overlooked) spearheads of modern big data analytics in scientific, data-intensive application domains. While high-performance data analytics has received much attention from the research community, the growing number of problems in designing and deploying cold storage archives has only received very little attention. In this paper, we take the first step towards bridging this gap in knowledge by presenting an analysis of four real-world cold storage archives from three different application domains. In doing so, we highlight (i) workload characteristics that differentiate these archives from traditional, performance-sensitive data analytics, (ii) design trade-offs involved in building cold storage systems for these archives, and (iii) deployment trade-offs with respect to migration to the public cloud. Based on our analysis, we discuss several other important research challenges that need to be addressed by the data management community. "
archives  data-curation  big-data  science  computational-science 
april 2019 by tsuomela
The Data Curation Network – A shared staffing model for digital data repositories
"We are the Data Curation Network. As professional data curators, research data librarians, academic library administrators, directors of international data repositories, disciplinary subject experts, and scholars, we represent academic institutions and non-profit societies that make research data available to the public. What we do: Data curators prepare and enrich research data to make them findable, accessible, interoperable and reusable (FAIR). Sharing our data curation staff across DCN partner institutions enables data repositories to collectively, and more effectively, curate a wider variety of data types (e.g., discipline, file format, etc.) than any single institution might offer alone."
publishing  online  scholarly-communication  data-curation 
february 2019 by tsuomela
Binder (beta)
"Have a repository full of Jupyter notebooks? With Binder, open those notebooks in an executable environment, making your code immediately reproducible by anyone, anywhere. "
data-curation  reproducible  python  ipython  programming  notebook  sharing  research  tool  github 
july 2018 by tsuomela
The Datamirror.org Experiment: Preservation Assurance for Federal Research Data – UC3 :: California Digital Library
"In early 2017, UC3 created Datamirror.org as an independent, dynamic, online mirror of Data.gov, the US federal government’s primary research data portal. Developed in collaboration with Code for Science & Society (CSS), a non-profit organization supporting innovative uses of technology for public good, Datamirror was intended to provide additional levels of assurance that the significant research data found at Data.gov remains freely accessible to the scholarly community and the public for open retrieval and reuse. As noted by the government’s Project Open Data initiative, “Data is a valuable national resource and a strategic asset to the U.S. Government, its partners, and the Public.” Thus, Datamirror.org plays a critical role in protecting this valuable resource from risks of data loss or loss of availability due to technological obsolescence, funding constraints, shifting organizational priorities, malicious attack, or inadvertent error. "
data  preservation  data-curation  government  federal 
july 2018 by tsuomela
Frictionless Data: Making Research Data Quality Visible | International Journal of Digital Curation
"There is significant friction in the acquisition, sharing, and reuse of research data. It is estimated that eighty percent of data analysis is invested in the cleaning and mapping of data (Dasu and Johnson, 2003). This friction hampers researchers not well versed in data preparation techniques from reusing an ever-increasing amount of data available within research data repositories. Frictionless Data is an ongoing project at Open Knowledge International focused on removing this friction. We are doing this by developing a set of tools, specifications, and best practices for describing, publishing, and validating data. The heart of this project is the “Data Package”, a containerization format for data based on existing practices for publishing open source software. This paper will report on current progress toward that goal."
research-data  data-curation  analysis  methods 
may 2018 by tsuomela
Shifting to Data Savvy: The Future of Data Science In Libraries - D-Scholarship@Pitt
"The Data Science in Libraries Project is funded by the Institute of Museum and Library Services (IMLS) and led by Matt Burton and Liz Lyon, School of Computing & Information, University of Pittsburgh; Chris Erdmann, North Carolina State University; and Bonnie Tijerina, Data & Society. The project explores the challenges associated with implementing data science within diverse library environments by examining two specific perspectives framed as ‘the skills gap,’ i.e. where librarians are perceived to lack the technical skills to be effective in a data-rich research environment; and ‘the management gap,’ i.e. the ability of library managers to understand and value the benefits of in-house data science skills and to provide organizational and managerial support. This report primarily presents a synthesis of the discussions, findings, and reflections from an international, two-day workshop held in May 2017 in Pittsburgh, where community members participated in a program with speakers, group discussions, and activities to drill down into the challenges of successfully implementing data science in libraries. Participants came from funding organizations, academic and public libraries, nonprofits, and commercial organizations with most of the discussions focusing on academic libraries and library schools."
data-curation  libraries 
april 2018 by tsuomela
DH Curation | A community of practice
"Humanists have data and they need data skills. As the materials and analytical practices of research become increasingly digital, the theoretical knowledge and practical skills of information science, librarianship, and archival science will become ever more vital to humanists and to anyone working with cultural heritage."
digital-humanities  curation  data-curation 
march 2017 by tsuomela
Endangered Data Week - April 17-21, 2017
"Endangered Data Week is a new, collaborative effort, coordinated across campuses, nonprofits, libraries, citizen science initiatives, and cultural heritage institutions, to shed light on public datasets that are in danger of being deleted, repressed, mishandled, or lost. The week's events can promote care for endangered collections by: publicizing the availability of datasets; increasing critical engagement with them, including through visualization and analysis; and by encouraging political activism for open data policies and the fostering of data skills through workshops on curation, documentation and discovery, improved access, and preservation."
data-rescue  data-curation  activism  libraries 
march 2017 by tsuomela
DataLumos
"DataLumos is an ICPSR archive for valuable government data resources. ICPSR has a long commitment to safekeeping and disseminating US government and other social science data. DataLumos accepts deposits of public data resources from the community and recommendations of public data resources that ICPSR itself might add to DataLumos."
data  data-curation  government  crowdsourcing  preservation  archives  repository 
february 2017 by tsuomela
Welcome - Data Refuge
"DataRefuge helps to build refuge for federal data and supports climate and environmental research and advocacy. We are committed to fact-based arguments. DataRefuge preserves the facts we need at a time of ongoing climate change. "
data-curation  politics  government  preservation  archives 
february 2017 by tsuomela
ckan - The open source data portal software
"CKAN is a powerful data management system that makes data accessible – by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available."
data-curation  data-publication  data  management  research-data  open-source  server  catalog  metadata  cataloging 
february 2017 by tsuomela
Water Data Portal
"The International Water Management Institute (IWMI) has been doing research on water for the last 30 years. Being a non-profit research organization, all the research outputs and data used for the research are shared with researchers across the globe as global public goods (GPGs). The Water Data Portal (WDP), following a "one-stop shop" approach, provides access to a large amount of data related to water and agriculture. WDP contains meteorological, hydrological, socio-economic, and spatial data layers, satellite images, as well as hydrological model setups. The data in the WDP, both spatial and non-spatial, are supported by standardized metadata and are available for download by users including academics, scientists, researchers and decision makers. However, access is provided in compliance with copyrights, intellectual property rights and data agreements with our partners."
data-sources  data-curation  preservation  international  weather  meteorology 
november 2016 by tsuomela
The International Data Rescue (I-DARE) Portal | I-DARE
"This International Data Rescue (I-DARE) Portal provides a single point of entry for information on the status of past and present data rescue projects and of data worldwide that still need to be rescued, on best methods and technologies involved in data rescue, and on metadata for data that need to be rescued."
data-sources  data-curation  preservation  international  cooperation 
november 2016 by tsuomela
Data Basin
"Data Basin is a science-based mapping and analysis platform that supports learning, research, and sustainable environmental stewardship."
data-sources  data-curation  climate  environment  weather  meteorology 
november 2016 by tsuomela
Databrarianship: The Academic Data Librarian in Theory and Practice - ALA Store
"With the appearance of big data, open data, and particularly research data curation on many libraries’ radar screens, data service has become a critically important topic for academic libraries. Drawing on the expertise of a diverse community of practitioners, this collection of case studies, original research, survey chapters, and theoretical explorations presents a wide-ranging look at the field of academic data librarianship. By covering the data lifecycle from collection development to preservation, examining the challenges of working with different forms of data, and exploring service models suited to a variety of library types, this volume provides a toolbox of strategies that will allow librarians and administrators to respond creatively and effectively to the data deluge. Edited by Kristi Thompson and Lynda Kellam, Databrarianship: The Academic Data Librarian in Theory and Practice provides advice and insight on data services for all types of academic libraries and will be of interest to library educators."
book  publisher  data  data-curation  libraries  education  academic-lab 
june 2016 by tsuomela
A map of Data Labs in Libraries | The Library Lab
"Inspired by Amanda Goodman’s Map of 3D Printers in Libraries, I’m building a map of data labs in libraries around the world. Hit me on Twitter at @clauersen or in the comment section if you have a lab that should be added. A library data lab is, in this context, defined as a lab that provides data services (like software for statistics, mapping, visualization and data wrangling, and the hardware and skills to support this) and facilitates learning activities within this area. At the Faculty Library of Social Sciences, Copenhagen University Library, we are currently in the process of building a Digital Social Science Lab."
libraries  data  data-curation  education  academic-lab 
june 2016 by tsuomela
NCSA Brown Dog
"Much of the data generated by science, social science, and the humanities is smaller, unstructured, un-curated and thus not easily shared. Taken together, however, this “long-tail” data, both past and present, represents a vast amount of research data with the potential to greatly impact future research in many areas of study. The unstructured, un-curated nature of this data, however, means that once the data is gathered and the research published, the data often never sees the light of day again. In addition, contemporary science relies on digital data and software that evolves and disappears quickly as underlying technology changes. Thus we are entering a period where scientific results are no longer easily reproducible.  Since reproducibility is foundational to scientific discovery, development of a method for easily accessing legacy data and software is essential to maintaining the viability of large bodies of research."
data  research  data-curation 
june 2016 by tsuomela
Data Curation — Council on Library and Information Resources
"Sayeed Choudhury, associate dean for research data management at Johns Hopkins University (JHU) and leader of the Data Conservancy, discusses the "stack model" for data management employed by JHU and discusses the model's components—storage, archiving, preservation, and curation—in the following video."
data-curation  model  service  libraries  curation  preservation  archive  storage 
may 2016 by tsuomela
Zotero for Data Repositories Webinar
"On May 17, 2016, DataCite continued our monthly webinar series with Sebastian Karcher, Associate Director of the Qualitative Data Repository (QDR) at Syracuse University, presenting on Zotero for data repositories. Sebastian is an expert in scholarly referencing and citation workflows and has been a longtime contributor to the Citation Style Language as well as Zotero, the open source reference management software. Sebastian provided insights into how Zotero can be used to fetch metadata from data repositories and demonstrated how repositories can aid integration with reference managers such as Zotero. Example repositories included Dataverse, the Dryad Digital Repository, the UK Data Service, and the Qualitative Data Repository."
data-curation  citation  zotero 
may 2016 by tsuomela
CLTC Scenarios – CLTC
"How might individuals function in a world where literally everything they do online will likely be hacked or stolen? How could the proliferation of networked appliances, vehicles, and devices transform what it means to have a “secure” society? What would be the consequences of almost unimaginably powerful algorithms that predict individual human behavior at the most granular scale? These are among the questions considered through a set of five scenarios developed by the Center for Long-Term Cybersecurity (CLTC), a new research and collaboration center founded at UC Berkeley’s School of Information with support from the Hewlett Foundation. These scenarios are not predictions—it’s impossible to make precise predictions about such a complex set of issues. Rather, the scenarios paint a landscape of future possibilities, exploring how emerging and unknown forces could intersect to reshape the relationship between humans and technology—and what it means to be “secure.”"
big-data  ethics  data  research  scenario  scenario-planning  futures  data-curation  online 
may 2016 by tsuomela
Dissertations and Data
"The keynote provides an overview on the field of research data produced by PhD students, in the context of open science, open access to research results, e-Science and the handling of electronic theses and dissertations. The keynote includes recent empirical results and recommendations for good practice and further research."
research  data-curation  graduate-student  graduate-school  publishing  scholarly-communication 
april 2016 by tsuomela
About Data-Pass | Datapass
"The Data Preservation Alliance for the Social Sciences (Data-PASS) is a voluntary partnership of organizations created to archive, catalog and preserve data used for social science research. Examples of social science data include: opinion polls; voting records; surveys on family growth and income; social network data; government statistics and indices; and GIS data measuring human activity."
data-curation  preservation  social-science  collaboration  catalog  multi-institution 
november 2015 by tsuomela
Home | Qualitative Data Repository
"QDR selects, ingests, curates, archives, manages, durably preserves, and provides access to digital data used in qualitative and multi-method social inquiry.  The repository develops and publicizes common standards and methodologically informed practices for these activities, as well as for the reusing and citing of qualitative data.  Four beliefs underpin the repository's mission: data that can be shared and reused should be; evidence-based claims should be made transparently; teaching is enriched by the use of well-documented data; and rigorous social science requires common understandings of its research methods."
data  data-curation  repository  qualitative  methods  social-science  research-data 
september 2015 by tsuomela
PLOS ONE: Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study
"This study informs efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and type of datasets generated annually by research funded by the U.S. National Institutes of Health (NIH). It focuses on those datasets that are “invisible” or not deposited in a known repository."
data-curation  discovery  access  health  health-care  information-science 
august 2015 by tsuomela
Impact of Social Sciences – Introduction to Open Science: Why data versioning and data care practices are key for science and social science.
"A significant shift in how researchers approach their data is needed if transparent and reproducible research practices are to be broadly advanced. Carly Strasser has put together a useful guide to embracing open science, pitched largely at graduate students. But the tips shared will be of interest far beyond the completion of a PhD. If time is spent up front thinking about file organization, sample naming schemes, backup plans, and quality control measures, many hours of heartache can be averted."
science  data  data-curation  publishing  open-science 
june 2015 by tsuomela
The Hague Declaration
"The Hague Declaration aims to foster agreement about how to best enable access to facts, data and ideas for knowledge discovery in the Digital Age. By removing barriers to accessing and analysing the wealth of data produced by society, we can find answers to great challenges such as climate change, depleting natural resources and globalisation."
data-curation  data  management  sharing  declaration  scholarly-communication 
june 2015 by tsuomela