
juliusbeezer : opendata   77

Insee - Statistiques locales
The Institut national de la statistique et des études économiques collects, produces, analyses, and disseminates information on the French economy and society.
maps  france  opendata  driving 
5 weeks ago by juliusbeezer
Reproducibility: let’s get it right from the start | Nature Communications
we will initially only request that source data be supplied with the revised versions of the manuscript, unless we feel that it would be particularly important for the reviewers to have access to these data at an earlier stage. However, we strongly encourage authors to lead the way in promoting transparency by making sure that their work fulfils or exceeds our criteria right from the start. Here are four simple steps you can take to give your paper the best possible chance of flying through peer review:
editing  peerreview  opendata 
october 2018 by juliusbeezer
Open Science and its Discontents | Ronin Institute
Although there is a spectrum of responses, criticism of open science tends to fall into one of two camps, which I will call “conservative” and “radical”. This terminology is not intended to imply an association with any conventional political labels; the terms are simply used for convenience to indicate the relative degree of comfort with the institutional status quo. Let’s look at these two groups of critiques.

The conservative response to regular timely release of pre-publication data could be best summarized by the phrase: “are you kidding me? why would I do that?” The apotheosis of this notion appeared in an editorial published in the New England Journal of Medicine, which described with some horror the “emergence of a new class of research parasites”. They further concluded that some of these parasites might not only use that data for their own publications, but might seek to examine whether the original study was correct....

Arguments for open science made in response to the conservative critique tend to assume that the release of more data, code, and papers is a pure good in and of itself, and downplay the political economy in which they are embedded.
openscience  openaccess  archiving  opendata  politics  business  sociology 
july 2017 by juliusbeezer
Ten simple rules for responsible big data research
The beneficial possibilities for big data in science and industry are tempered by new challenges facing researchers that often lie outside their training and comfort zone. Social scientists now grapple with data structures and cloud computing, while computer scientists must contend with human subject protocols and institutional review boards (IRBs). While the connection between individual datum and actual human beings can appear quite abstract, the scope, scale, and complexity of many forms of big data creates a rich ecosystem in which human participants and their communities are deeply embedded and susceptible to harm. This complexity challenges any normative set of rules and makes devising universal guidelines difficult.
opendata  sciencepublishing  ethics  research  privacy 
april 2017 by juliusbeezer
The 20% Statistician: Five reasons blog posts are of higher scientific quality than journal articles
I’ve tried to measure blogs and journal articles on some dimensions that, I think, determine their scientific quality. It is my opinion that blogs, on average, score better on some core scientific values, such as open data and code, transparency of the peer review process, egalitarianism, error correction, and open access. It is clear blogs impact the way we think and how science works. For example, Sanjay Srivastava’s pottery barn rule, proposed in a 2012 blog, will be implemented in the journal Royal Society Open Science. This shows blogs can be an important source of scientific communication. If the field agrees with me, we might want to more seriously consider the curation of blogs, to make sure they won’t disappear in the future, and maybe even facilitate assigning DOIs to blogs, and the citation of blog posts.
blogs  sciencepublishing  openscience  openness  opendata 
april 2017 by juliusbeezer
Fueling the Gold Rush: The Greatest Public Datasets for AI – Startup Grind – Medium
It has never been easier to build AI or machine learning-based systems than it is today. The ubiquity of cutting edge open-source tools such as TensorFlow, Torch, and Spark, coupled with the availability of massive amounts of computation power through AWS, Google Cloud, or other cloud providers, means that you can train cutting-edge models from your laptop over an afternoon coffee.

Though not at the forefront of the AI hype train, the unsung hero of the AI revolution is data — lots and lots of labeled and annotated data, curated with the elbow grease of great research groups and companies who recognize that the democratization of data is a necessary step towards accelerating AI.

It’s important to remember that good performance on a dataset doesn’t guarantee a machine learning system will perform well in real product scenarios. Most people in AI forget that the hardest part of building a new AI solution or product is not the AI or algorithms — it’s the data collection and labeling. Standard datasets can be used for validation or as a good starting point for building a more tailored solution.

[curated list of opendata sets follows]
open  opendata  learning  software 
february 2017 by juliusbeezer
Around the Web: Saving Government Data from the Trumpocalypse – Confessions of a Science Librarian
While I’m working on a major update to my Documenting the Donald Trump War on Science: Pre-Inauguration Edition and preparing for the first of the post-inauguration posts, I thought I’d whet everyone’s appetite with a post celebrating all the various efforts to save environmental, climate and various kinds of scientific and other data from potential loss in the Trump presidential era.
openscience  opendata  canada  us  politics  library 
february 2017 by juliusbeezer
The high-tech war on science fraud | Science | The Guardian
Statcheck had read some 50,000 published psychology papers and checked the maths behind every statistical result it encountered. In the space of 24 hours, virtually every academic active in the field in the past two decades had received an email from the program, informing them that their work had been reviewed. Nothing like this had ever been seen before: a massive, open, retroactive evaluation of scientific literature, conducted entirely by computer.

Statcheck’s method was relatively simple, more like the mathematical equivalent of a spellchecker than a thoughtful review, but some scientists saw it as a new form of scrutiny and suspicion, portending a future in which the objective authority of peer review would be undermined by unaccountable and uncredentialed critics.
opendata  psychology  openscience  peerreview 
february 2017 by juliusbeezer
Researchers Are Preparing for Trump to Delete Government Science From the Web | Motherboard
While it’s easy to scrape an HTML website, Paterson and others are worried that, for instance, a NOAA database and tool regularly used by city planners to calculate sea level rise could be pulled offline.

“It’s less the documents, which we can get through alternative means,” he said. “The bigger issue in my mind is the access to databases and analytic software that public dollars paid for which by administrative fiat they may remove. I use the NOAA sea level rise projection database for discussion in my environmental impact assessment class. I use the greenhouse gas emission calculator for analysis of major federal climate actions.”

One of the main concerns is that a Trump presidency doesn’t even have to purposefully take down these tools—many of them will simply break or become useless without being regularly updated.
sciencepublishing  us  politics  opendata  openaccess  openscience  open 
december 2016 by juliusbeezer
Don't Just Build It, They Probably Won't Come: Data Sharing and the Social Life of Data in the Historical Quantitative Social Sciences | International Journal of Humanities and Arts Computing
According to the research, the main hindrances to data sharing in history and related fields are as follows:

• Institutional repositories are perceived to exist to serve the institution and funding bodies, rather than the individual.9

• An institutional repository is not seen as a prestigious outlet for data publication and faculty are not convinced that their work will receive adequate exposure.10

• Not all disciplines have a tradition of using repositories.11

• Depositors are concerned about copyright, plagiarism, and intellectual property rights.12

• Younger scholars share data at low rates due to standards for career advancement that require publication in prestige journals.13

• Depositors find it difficult to contend with restrictive data consistency requirements and incompatible data types.14

• Repositories have inadequate preservation infrastructure and make it difficult to update data.15

• Depositing data requires too much time and effort, and there is a technical learning curve to use repositories properly.16

To summarize, the largest barrier to data sharing in history and related fields is the fear of loss of control over data and the subsequent potential loss of reputation related to data authorship.17
repositories  history  digitalhumanities  opendata 
november 2016 by juliusbeezer
Wiley using fake DOIs to trap web crawlers…and researchers | Scholars Cooperative - Wayne State University Blogs
Smith-Unna described a situation whereby his entire institution was blocked from access to all of Wiley’s materials under the assumption that a legitimate academic information-mining crawl was, in fact, a botnet or some similar sinister process. He goes on to describe the university being contacted by Wiley to determine the source of this “data breach”. At the root of all this confusion? Several DOIs, or Digital Object Identifiers, assigned to resources associated with Wiley products.

A brief aside for any unfamiliar with DOIs: a DOI is a sequence of letters and numbers meant to uniquely identify a particular digital object (hence the name), and DOIs are widely used for scholarly articles published online. They allow any user to quickly and easily navigate to the primary online home of any such article, regardless of whether or not platforms, URLs, or journal names have changed since the item’s publication; this is an invaluable service for researchers and librarians, among others. CrossRef, a not-for-profit association of publishers, handles much of the assignment of DOIs for scholarly materials.

So what was the problem with the Wiley DOIs accessed by Smith-Unna? In short, they were fake: dummy DOIs meant to catch anyone attempting to crawl through, and harvest information on, materials hosted by Wiley online. On the surface, this doesn’t seem like poor practice on behalf of the publisher. However, in addition to blocking access prompted by legitimate scholarly inquiry (as in Smith-Unna’s case above), there are some serious issues with this approach.
repositories  sciencepublishing  opendata 
october 2016 by juliusbeezer
social defense systems – scatterplot
Unlike treatises that declare algorithms universally bad or always good, O’Neil asks three questions to determine whether we should classify a model as a “weapon of math destruction”:

Is the model opaque?
Is it unfair? Does it damage or destroy lives?
Can it scale?

These questions actually eliminate the math entirely. By doing so, O’Neil makes it possible to study WMDs by their characteristics not their content. One need not know anything about the internal workings of the model at all to attempt to answer these three empirical questions. More than any other contribution that O’Neil makes, defining the opacity-damage-scalability schema to identify WMDs as social facts makes the book valuable.

The classification also helped me realize that the failure of many of the WMDs she describes could be mitigated through the application of basic sociological principles.
opendata  sociology  philosophy 
october 2016 by juliusbeezer
OpenTrials: towards a collaborative open database of all available information on all clinical trials | Trials | Full Text
Hosting a broad range of data and documents presents some challenges around curation, especially because different sources of structured data will use different formats and different dictionaries. Although we will exploit available mapping between different data schemas and dictionaries, we do not expect to necessarily make all sources of all structured data on all trials commensurable and presentable side by side. For example, interventions may be described in free text or as structured data using various different dictionaries, and even sample size may be labelled in different ways in different available datasets, not all of which can necessarily be parsed and merged. For simplicity, we are imposing a series of broad categories as our top-level data schema, following the list given above. This is best thought of as a thread of documents on a given trial, where a “document” means either an actual physical document (such as a consent form or a trial report) or a bundle of structured data for a trial (such as the structured results page from an entry in XML format or a row of extracted data with accompanying variable names for a systematic review). This is for ease of managing multiple data sources, providing multiple bundles of structured data about each trial in multiple formats, each of which may be commonly or rarely used.
openmedicine  opendata  sciencepublishing  search  terminology  Dictionary  OA 
october 2016 by juliusbeezer
10 Simple Rules for the Care and Feeding of Scientific Data | Authorea
This article offers a short guide to the steps scientists can take to ensure that their data and associated analyses continue to be of value and to be recognized. In just the past few years, hundreds of scholarly papers and reports have been written on questions of data sharing, data provenance, research reproducibility, licensing, attribution, privacy, and more--but our goal here is not to review that literature. Instead, we present a short guide intended for researchers who want to know why it is important to "care for and feed" data, with some practical advice on how to do that. The set of Appendices at the close of this work offers links to the types of services referred to throughout the text. Boldface lettering below highlights actions one can take to follow the suggested rules.
opendata  openscience 
september 2016 by juliusbeezer
BMJ editor Fiona Godlee takes on corruption in science - Health - CBC News
As the editor of one of the oldest and most influential medical journals, Godlee is leading several campaigns to change the way science is reported, including opening up data for other scientists to review, and digging up data from old and abandoned trials for a second look.

She has strong words about the overuse of drugs, and the influence of industry on the types of questions that scientists ask, and the conclusions that are drawn from the evidence.

"It's not my job to be popular, I'm very clear about that," she says from her office in the historic British Medical Association building in central London.
sciencepublishing  opendata  openmedicine  commenting  agnotology 
april 2016 by juliusbeezer
Open science and trustworthy data | The Psychologist
In our letter (November 2015), we urged the Society’s boards and senior committees to respond to the very serious problems of replicating psychological research that were revealed by the meagre 36 per cent success rate of the Reproducibility Project’s report of 100 attempted replications. In reply, Professor Andy Tolmie commented that ‘low n research may be a more endemic part of the problem than any deliberate attempts at massaging data’. However, low ns were not the problem for the Reproducibility Project because a priori power analyses for the replications indicated that a 92 per cent replication rate was predicted based on the originally reported effect sizes.

The Project’s report (Open Science Collaboration, 2015) noted that the best predictor of replication success was the effect size observed in the replication, which is independent of sample size. Sadly, the average effect size for the replications was less than half of that for the original studies. The report described the original studies as having ‘upwardly biased effect sizes’. It seems likely that the psychology literature reflects questionable research practices that can inflate effect sizes, such as: p-hacking, unreported removal of troublesome data, and capitalising on chance through selective publishing after adjusting a paradigm to produce significant results or reporting a ‘successful’ dependent variable but not those showing smaller effects.
science  sciencepublishing  opendata  psychology 
december 2015 by juliusbeezer
BishopBlog: Who's afraid of Open Data
A move toward making data and analyses open is being promoted in a top-down fashion by several journals, and universities and publishers have been developing platforms to make this possible. But many scientists are resisting this process, and putting forward all kinds of argument against it. I think we have to take such concerns seriously: it is all too easy to mandate new actions for scientists to follow that have unintended consequences and just lead to time-wasting, bureaucracy or perverse incentives. But in this case I don't think the objections withstand scrutiny. Here are the main ones we identified at our meeting:
openness  opendata  openscience 
november 2015 by juliusbeezer
The characteristics of a register | Government Digital Service
what do we mean when we say “register”?

Across government we manage and hold data that we need to deliver services to users and to inform policymaking. We make that data available in a variety of ways — from bespoke online tools and database dumps through to published lists. A question we’re often asked is:

What is a register, how is it more than just a database, a statistical report, or a simple list?

To try and answer this question we’ve started to collect a list of characteristics based on the things we discovered during our early discovery and alpha work.
opendata  archiving  informationmastery 
october 2015 by juliusbeezer
IRUS-UK is a national aggregation service which contains details of all content downloaded from participating UK institutional repositories (IRs). It follows on from the successful PIRUS2 project, which demonstrated how COUNTER-compliant article-level usage statistics could be collected and consolidated from publishers and institutional repositories. IRUS-UK is a Jisc-funded repository and infrastructure service.
repositories  archiving  openaccess  opendata 
september 2015 by juliusbeezer
Edinburgh DataShare – new features for users and depositors | Research Data Blog
Edinburgh DataShare was built as an output of the DISC-UK DataShare project, which explored pathways for academics to share their research data over the Internet at the Universities of Edinburgh, Oxford and Southampton (2007-2009). The repository is based on DSpace, the most widely used open-source repository system globally, and is managed by the Data Library team within Information Services.
repositories  opendata  tools 
may 2015 by juliusbeezer
2. The solution | The One Repo blog
The policy of The One Repo is to accept all objects deposited in the included repositories, including:

Actual manuscripts, with full text available.
Metadata records describing manuscripts that are not available. These are important for at least three reasons. First, in some cases, they describe manuscripts that will become freely available after the expiry of an embargo period; second, such metadata records provide a means of discovering the author and requesting a copy directly – a process that may be facilitated by an “ask author for a copy” button; and third, records of manuscripts that should be available (but are not) are important data for tracking compliance with open-access policies.
Associated data-sets, such as specimen photos, matrices for phylogenetic analysis, databases of observations and survey results.
repositories  openaccess  opendata  tools 
may 2015 by juliusbeezer
Our op-ed on reproducible research in “Le Monde” | Deuxième labo
Last February, Daniele Fanelli, a researcher at the University of Edinburgh specialising in scientific integrity, proposed in the journal Nature that the definition of scientific fraud be broadened to cover any omission or distortion of the information necessary and sufficient to evaluate the validity and significance of a piece of research. Until now, fraud has mostly been fought by trying to make scientists more objective and honest than ordinary mortals, which has obscured the fact that it is not the intrinsic qualities of researchers that make scientific knowledge robust, but the exercise of peer judgement. So if we want to advance good science, what better than to strengthen its capacity for self-correction? Fanelli proposes, for example, that scientific journals adopt charters specifying the set of information necessary and sufficient for a “good” publication. This candour would not do away with the researcher’s autonomy: for instance, he or she would remain free to “fish” for the statistically significant result in the data, on condition of reporting all the statistical tests performed, so that peers can weigh the risk of false positives. The fight against scientific fraud would thus play out more on the terrain of communicating results than on that of researchers’ behaviour. The culture of reproducibility is an ally of scientific integrity.
sciencepublishing  opendata  science 
april 2015 by juliusbeezer
Impact of Social Sciences – It’s the Neoliberalism, Stupid: Why instrumentalist arguments for Open Access, Open Data, and Open Science are not enough.
“Big Data,” “Data Science,” and “Open Data” are now hot topics at universities. Investments are flowing into dedicated centers and programs to establish institutional leadership in all things related to data. I welcome the new Data Science effort at UC Berkeley to explore how to make research data professionalism fit into the academic reward systems. That sounds great! But will these new data professionals have any real autonomy in shaping how they conduct their research and build their careers? Or will they simply be part of an expanding class of harried and contingent employees, hired and fired through the whims of creative destruction fueled by the latest corporate-academic hype-cycle?

But in the current Neoliberal setting, being an entrepreneur requires a singular focus on monetizing innovation. PeerJ and Figshare are nice, since they have business models that are less “evil” than Elsevier’s. But we need to stop fooling ourselves that the only institutions and programs that we can and should sustain are the ones that can turn a profit. For every PeerJ or Figshare (and these are ultimately just as dependent on continued public financing of research as any grant-driven project), we also need more innovative organizations like the Internet Archive, wholly dedicated to the public good and not the relentless pressure to commoditize everything (especially their patrons’ privacy).
scholarly  openaccess  opendata  openscience  politics  education 
april 2015 by juliusbeezer
Freeing the Data: For Clinicians | Free the Data
The Sharing Clinical Reports Project (SCRP) is a volunteer, grass-roots effort to encourage open sharing of genetic variant information. SCRP specifically aims to collect information on BRCA1 and 2 variants and make this information publicly available in the NCBI ClinVar database. SCRP is a component of the International Collaboration for Clinical Genomics (ICCG), a group of laboratories, physicians, genetic counselors, researchers, and others dedicated to raising the standard of patient care by improving the quality of genomic testing.

Free the Data aims to revolutionize how vital information is shared and accessed, in order to further translational research and advance clinical care towards lower costs, higher quality, and more efficient treatments. In addition to the educational materials that you give your patients who are about to have a BRCA1/2 genetic test performed (or have already had one), you can give them this one-pager, asking them to share their data. Alternatively, you can share the data for them.
opendata  medicine  genetics  crowdscience  crowdsourcing 
march 2015 by juliusbeezer
Fostering Open Science, Open Data & Reproducibility - NZ Commons
as the Commissioning Editor of GigaScience, a journal co-published by the BGI, the world’s largest genomics organisation, and the Open Access pioneer BioMed Central. GigaScience publishes open access ‘big-data’ studies from the entire spectrum of life and biomedical sciences, whose goal is to promote open science, transparency and reproducibility. The scope of GigaScience covers the issues of producing and handling large-scale biological and biomedical data, and provides resources and a forum for data producers and the open science community.

At GigaScience, being a true Open Access journal, all our textual content (such as blogs, and open peer reviewer reports) is published under a CC BY 4.0 Attribution licence, and our data is CC0 — maximising its reuse and setting our content free in the commons. This has only allowed us to do great things
opendata  openaccess  openscience  database  journals 
february 2015 by juliusbeezer
Open Humans: Opening Soon!
The Open Humans Network, led by myself and Madeleine Ball, attempts to break down health data silos through an online portal that will connect participants willing to share data about themselves publicly with researchers who are interested in using that public data and contributing their analyses and insight to it. The portal will showcase public health data and facilitate its exploration and download. The Open Humans Network ultimately hopes to revolutionize research by making it easy for anyone to participate in research projects and facilitating highly integrated, longitudinal health data. This portal will consist of three components: individual data profile pages, a public data explorer and a set of design guidelines for researchers seeking a collaborative data-sharing model.
open  opendata  openmedicine  healthcare  medicine  confidentiality  ethics  extrovertbias 
january 2015 by juliusbeezer
Open Humans Network -
Open Humans Network is launching soon. Led by Jason Bobe and Madeleine Ball, OHN attempts to break down health data silos through an online portal that will connect participants willing to share data about themselves publicly with researchers who are interested in using that public data and contributing their analyses and insight to it. The portal will showcase public health data and facilitate its exploration and download. The Open Humans Network ultimately hopes to revolutionize research by making it easy for anyone to participate in research projects and facilitating highly integrated, longitudinal health data.
healthcare  genetics  confidentiality  medicine  open  opendata  openmedicine  openness 
january 2015 by juliusbeezer
Authorea | The Fork Factor: an academic impact factor based on reuse.
we would like to imagine what academia would be like if forking actually mattered in determining a scholar’s reputation and funding. How would you calculate it? Here, we give it a shot. We define the Fork Factor (FF) as:


Where N is the number of forks on your work and L their median length. In order to take into account the reproducibility of research data, the length of forks has a higher weight in the FF formula. Indeed, forks with length equal to one likely represent a failure to reproduce the forked research datum.
citation  opendata  openaccess  open 
january 2015 by juliusbeezer
2015 - The year of open data mandates
According to the JISC and RLUK funded Sherpa Juliet site, globally there are now 34 funders who require data archiving and 16 who encourage it.

While the rise of open access has fundamentally changed the academic publishing landscape, the policies around data are reigniting the conversation around what universities can and should be doing to protect the assets generated at their institution. The main difference between an open access and open data policy is that there is not already a precedent or status quo of how academia deals with the dissemination of research that is not in the form of a traditional ‘paper’ publication.
openaccess  opendata  repositories 
january 2015 by juliusbeezer
Clean sheet: how to release data or statistics in a spreadsheet
Releasing data or statistics in spreadsheets

Follow these simple guidelines to make your data or statistical releases as useful as possible.

Don’t merge cells.
Don’t mix data and metadata in the same sheet.
The first row of a data sheet should contain column headers. None of these headers should be duplicates or blank. The column header should clearly indicate which units are used in that column, where this makes sense.
The remaining rows should contain data, one datum per row. Don’t include aggregate statistics such as TOTAL or AVERAGE. You can put aggregate statistics in a separate sheet, if they are important.
Numbers should just be numbers. Don’t put commas in them, or stars after them, or anything else.
Use standard identifiers: e.g. identify countries using ISO 3166 codes rather than names.
Don’t use colour or other stylistic cues to encode information.
Leave the cell blank if a value is not available.
If you provide pivot tables, make sure the underlying data is available separately too.
If you also want to create a human-friendly presentation of the data, do so by creating another sheet in the same workbook and referencing the appropriate cells in the canonical data sheet.

Created by @robinhouston and @SeanClarke as a counterpoint to the advice of the Government Statistical Service.
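Several of the rules above (non-blank, non-duplicate headers in the first row; numbers stored as plain numbers without commas or stars) can be checked mechanically. A minimal sketch over CSV input using only the Python standard library; the function name, the checks selected, and the sample data are illustrative, not part of the original guidance:

```python
import csv
import io

def check_sheet(csv_text: str) -> list:
    """Report violations of a few of the guidelines for a CSV sheet:
    headers must be non-blank and unique; numeric cells must be bare
    numbers (no thousands commas, no trailing stars)."""
    problems = []
    rows = list(csv.reader(io.StringIO(csv_text)))
    headers = rows[0]
    if any(h.strip() == "" for h in headers):
        problems.append("blank column header")
    if len(set(headers)) != len(headers):
        problems.append("duplicate column headers")
    for r, row in enumerate(rows[1:], start=2):
        for cell in row:
            stripped = cell.replace(",", "").rstrip("*")
            # Flag cells that become numeric once decoration is removed
            if stripped != cell and stripped.replace(".", "", 1).isdigit():
                problems.append(f"row {r}: formatted number {cell!r}")
    return problems

good = "country,population (millions)\nFR,68.2\nDE,84.5\n"
bad = "country,population (millions)\nFR,68*\nDE,\"84,500,000\"\n"
print(check_sheet(good))  # prints []
print(check_sheet(bad))   # flags '68*' and '84,500,000'
```

The same idea extends to the other rules (one datum per row, no TOTAL/AVERAGE rows, blank cells for missing values), at the cost of more heuristics.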
december 2014 by juliusbeezer
Libraries could play key role in managing research data - Data - Research Information
Of the 2,727 repositories listed in OpenDOAR, the Directory of Open Access Repositories, only 131 are currently listed as containing datasets (4.8 per cent). This is not too dissimilar to the results from the same query at the beginning of 2011, which found the proportion to be 4.1 per cent.

Distinguishing between institutional and disciplinary repositories draws a bleaker picture for institutional repositories, where only four per cent are listed as containing datasets in comparison to 11.1 per cent of disciplinary repositories.
repositories  opendata 
december 2014 by juliusbeezer
Confusion over publisher’s pioneering open-data rules : Nature News & Comment
That study included 51 PLoS ONE papers, and found that just 6 of them had shared the data that went into the STRUCTURE study. In a new analysis, Vines found 20 papers that mentioned STRUCTURE and had been published since March 2014 — including one that tracked different varieties of cotton plants in the Caribbean, and another that compared different populations of a particular sparrow across the southern United States (download full data). Eight of the new studies (40%) had shared the genotype data — meaning that a reader would be able to repeat their analysis. The remaining 60% of papers had not made their data available, even though each stated that “all data underlying the findings are fully available without restriction”, in accordance with PLOS’s policy (see ‘Free the data’).
opendata  openaccess 
december 2014 by juliusbeezer
Keeping an eye on the dashboard - Demos Quarterly
Although dashboards are increasingly our analytical window into the world of data, they are not necessarily neutral purveyors of that data. They invariably shape and prioritise the information that is presented. As NYU Professor of Media Lisa Gitelman recently put it, the notion of raw data is an oxymoron, and the dashboard adds another hermeneutic layer to the mix. Which metrics are privileged? Who decides when a particular indicator moves into the red? How regular is the refresh rate, that is, what kind of temporality is built into the dashboard and how does that move us to act? Which metrics are not available, or deliberately left out?

And dashboards can often obscure more than they enlighten, because many of them present data without the user really knowing how it was created. We call this the ‘black box’ problem. Sitting behind a dashboard is a complicated world of data scraping, API calls, word based sampling methods, natural language processing algorithms – and any number of new modes of collection and analysis.
opendata  politics  attention  agnotology  hermeneutics 
october 2014 by juliusbeezer
Joint Declaration of Data Citation Principles - FINAL | FORCE11
Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse.

In support of this assertion, and to encourage good practice, we offer a set of guiding principles for data within scholarly literature, another dataset, or any other research object.
opendata  citation 
september 2014 by juliusbeezer
FAQ | A global registry of research data repositories. The registry covers research data repositories from different academic disciplines, presents repositories for the permanent storage and access of data sets to researchers, funding bodies, publishers and scholarly institutions, and aims to promote a culture of sharing, increased access and better visibility of research data.
repositories  openaccess  opendata  archiving 
september 2014 by juliusbeezer
Implications of Data Sharing for the Neuroimaging Research Community – My thoughts on the new PLOS Data Policy | Neuroscience Community
Brain image data are highly structured, and the processed data that goes into group level analysis is measured in MB, not GB. Given electrophysiology experiments that generate TB’s in a single session, we really can’t use the size of the data as an excuse. The fMRI community is also fortunate to have a widely accepted NIfTI file format to store the image data.
opendata  sciencepublishing  medicine  images 
august 2014 by juliusbeezer
Science in the Open » Blog Archive » Fork, merge and crowd-sourcing data curation
Over the past few weeks there has been a sudden increase in the amount of financial data on scholarly communications in the public domain. This was triggered in large part by the Wellcome Trust releasing data on the prices paid for Article Processing Charges by the institutions it funds. The release of this pretty messy dataset was followed by a substantial effort to clean that data up. This crowd-sourced data curation process has been described by Michelle Brook. Here I want to reflect on the tools that were available to us and how they made some aspects of this collective data curation easy, but also made some other aspects quite hard.
openaccess  opendata  crowdsourcing  statistics  database 
june 2014 by juliusbeezer
PLOS Data Policy: Update - PLOS Biologue
In order to optimise the re-use of data by readers and by data miners, authors of all new manuscripts submitted since March 3, 2014 have included a statement about where the data underlying their description of research can be found. At the time of writing, more than 16,000 sets of authors have included information about data availability with their submission. We have had fewer than 10 enquiries per week from authors who need advice about ‘edge cases’ of data handling and availability – fewer than 1% of authors – and these cases have helped us to further update our FAQ, contributing to a decline in such enquiries over time.
opendata  sciencepublishing 
june 2014 by juliusbeezer
Content Mining will be legal in UK; I inform Cambridge Library and the world of my plans « petermr's blog
the UK government has passed a Statutory Instrument based on the Hargreaves review of copyright exempting certain activities from copyright, especially “data analytics” which covers content mining for facts. This comes into force on 2014-06-01.
I intend to use this to start non-commercial research and to publish the results in an OpenNotebookScience
openaccess  opendata  open  sciencepublishing  scholarly 
may 2014 by juliusbeezer
PLOS ONE: Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results
In this sample of psychology papers, the authors' reluctance to share data was associated with more errors in reporting of statistical results and with relatively weaker evidence (against the null hypothesis). The documented errors are arguably the tip of the iceberg of potential errors and biases in statistical analyses and the reporting of statistical results. It is rather disconcerting that roughly 50% of published papers in psychology contain reporting errors [33] and that the unwillingness to share data was most pronounced when the errors concerned statistical significance.
sciencepublishing  opendata  statistics 
may 2014 by juliusbeezer
Impact of Social Sciences – Global-level data sets may be more highly cited than most journal articles.
I attempted to measure the impact that a few openly accessible data sets have had on scientific research. In my recent paper in Plos One, I analyzed the impact that three freely available oceanographic data sets curated by the US National Oceanographic Data Center have had on oceanographic research by using citations as a measure of impact. Since scientific assessments like the RAE increasingly use citations to journal articles for this purpose, I wanted to do the same for data sets...
My results suggest that all three data sets are more highly cited than most journal articles. Each data set has probably been cited more often than 99% of the journal articles in oceanography that were published during the same years as the data sets. One data set in particular, the World Ocean Atlas and World Ocean Database, has been cited or referenced in over 8,500 journal articles since it was first released in 1982. To put that into perspective, this data set has a citation count over six times higher than any single journal article in oceanography from 1982 to the present.
opendata  citation 
may 2014 by juliusbeezer
Is Elsevier going to take control of us and our data? The Vice-Chancellor of Cambridge thinks so and I’m terrified « petermr's blog
Do you trust Mendeley? Do you trust Elsevier? Do you trust large organisations without independent control (GCHQ, NSA, Google, Facebook)? If you do, stop reading and don’t worry.

In Mendeley, Elsevier has a window onto nearly everything that a scientist is interested in. Every time you read a new paper, Mendeley knows what you are interested in. Mendeley knows your working habits – what time are you spending on your research?

And this isn’t just passive information. Elsevier has Scopus – a database of citations. How does a paper get into this? – Scopus decides, not the scientific world. Scopus can decide what to highlight and what to hold back. Do you know how Journal Impact Factors are calculated? I don’t because it’s a trade secret. Does Scopus’ Advisory Board guarantee transparency of practice? Not that I can see. Since JIF’s now control much academic thinking and planning, those who control them are in a position to influence academic practice.

Does Mendeley have an advisory board? I couldn’t find one.
sciencepublishing  citation  altmetrics  opendata 
may 2014 by juliusbeezer
Critically, when Galileo included the information from those notes in Sidereus Nuncius (Galilei 1610), this integration of text, data and metadata was preserved, as shown in Figure 1. Galileo's work advanced the "Scientific Revolution," and his approach to observation and analysis contributed significantly to the shaping of today's modern "Scientific Method" (Galilei 1618, Drake 1957).

Today most research projects are considered complete when a journal article based on the analysis has been written and published. Trouble is, unlike Galileo's report in Sidereus Nuncius, the amount of real data and data description in modern publications is almost never sufficient to repeat or even statistically verify a study being presented.
opendata  openscience 
april 2014 by juliusbeezer
What would happen if you lost all of your research data? - Digital Science
I was focussed on creating high resolution, 3D time lapse videos of developing crustacean embryos, so all of my work was digital-based. When I lost my laptop and backups, I lost 400GB of data and close to four years of work. As a direct result I ended up getting an MPhil rather than the PhD I’d been working towards. I was hoping to have an illustrious career in science and for a time it seemed like everything would be stopped in its tracks.”
opendata  security  backup 
april 2014 by juliusbeezer
DSHR's Blog: The Half-Empty Archive
Increasingly, the newly created content needs to be ingested from the Web. As we've discussed at two successive IIPC workshops, the Web is evolving from being a set of hyper-linked documents to being a distributed programming environment, from HTML to Javascript. In order to find the links, much of the collected content now needs to be executed as well as simply parsed.
archiving  linkrot  web  internet  security  html5  opendata 
april 2014 by juliusbeezer
Editorial Policies
Open Health Data publishes data papers, which provide a concise description of a dataset and where to find it. Papers will only be accepted for datasets that authors agree to make freely available in a public repository. This means that they have been deposited in a data repository under an open licence (such as a Creative Commons Zero licence), and are therefore freely available to anyone with an internet connection, anywhere in the world.

A data paper is a publication that is designed to make other researchers aware of data that is of potential use to them for scientific and educational purposes. Data papers can describe deposited data from studies that have not been published elsewhere (including replication research) but also from studies that have previously been published in another journal. As such the data paper describes the methods used to create the dataset, its structure, its reuse potential, and a link to its location in a repository. It is important to note that a data paper does not replace a research article, but rather complements it.
opendata  openmedicine  openscience  sciencepublishing  OASPA 
march 2014 by juliusbeezer
EU-funded data repository sitting alongside 25PB/year output of CERN (!)
opendata  openaccess  openscience  eu 
march 2014 by juliusbeezer
9 questions about the new PLoS clarification | Neuropolarbear
2. Does PLoS propose any protections for authors who are worried someone will scoop them on reanalysis of their own data? How about a special vault where the data is posted publicly in one year?
===>Yes, it's all about YOU. Calmer in the comments.
opendata  dccomment 
march 2014 by juliusbeezer
My concerns about PLOS’s new open data policy | Erin C. McKiernan
data sharing sounds great. But what are some of the practical implications of this policy? There are many, but I would like to focus for now on the potential repercussions for researchers in low- to middle-income countries and the diversity of PLOS authorship.

First, a little background: I work in Mexico, where the country spends less than half a percent of its GDP on research.
opendata  mexico 
february 2014 by juliusbeezer
PLOS Pushes Data Sharing | The Scientist Magazine®
Effective next week, PLOS journals will require that submitting authors provide a statement about where their data can be freely accessed upon a paper’s publication. Some researchers may be able to include all relevant data within their manuscripts or as part of the supplementary materials; others will have to direct readers to a public data repository—like GenBank, FigShare, or Dryad—where this information is indexed.
february 2014 by juliusbeezer
Data reuse and the open data citation advantage [PeerJ]
In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available.
opendata  citation  sciencepublishing 
february 2014 by juliusbeezer
Academic Torrents
Making datasets available is necessary for reproducible research. But getting and sharing data can be slow and difficult, especially for large datasets. Academic Torrents solves the problems of both sharing and downloading data by providing a distributed repository for datasets which is fast, scalable, and easy to use.
opendata  archiving  sciencepublishing  scholarly 
february 2014 by juliusbeezer
Data Access for the Open Access Literature: PLOS’s Data Policy | PLOS
As a result, PLOS is now releasing a revised Data Policy that will come into effect on March 1, 2014, in which authors will be required to include a data availability statement in all research articles published by PLOS journals;
"PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception1.

When submitting a manuscript online, authors must provide a Data Availability Statement describing compliance with PLOS’s policy. The data availability statement will be published with the article if accepted.

Refusal to share data and related metadata and methods in accordance with this policy will be grounds for rejection."
sciencepublishing  opendata 
january 2014 by juliusbeezer
PLOS Clinical Trials: Publishing Clinical Trial Results: The Future Beckons
Nice overview of future of biomedical publishing, including prospects for publishing trial data.
medicine  science  sciencepublishing  openaccess  opendata  openscience 
january 2014 by juliusbeezer
Academic bias & biotech failures | LifeSciVC
Venture capitalist runs into limitations of current science process:

"There’s a rich literature on the “Pharma bias” in publications (e.g., Pharma conflicts of interest with academics, clinical trial reporting); in the past 15 months, 63 peer-reviewed articles talk about pharma industry bias according to PubMed.

But what about academic bias? Or the lack of repeatability of academic findings? I couldn’t find a single paper in PubMed over the past few years."
science  sciencepublishing  openaccess  opendata  openscience 
january 2014 by juliusbeezer
Majority of published scientific data not recoverable 20 years later -
Eighty percent of scientific data are lost within two decades, disappearing into old email addresses and obsolete storage devices, a Canadian study indicated.

The finding comes from a study tracking the accessibility of scientific data over time, conducted at the University of British Columbia.

Researchers attempted to collect original research data from a random set of 516 studies published between 1991 and 2011.

While all data sets were available two years after publication, the odds of obtaining the underlying data dropped by 17 per cent per year after that... author Tim Vines said.
link_rot  sciencepublishing  opendata  internet 
january 2014 by juliusbeezer
Research findings: going deeper than the article | PLOS Tech
Europe PMC Database Citations and DataCite are not only our first ALM sources that track research data mentioning a particular PLOS article; these ALM sources are also different from other ALM sources: although there can of course be additional research data that cite an article post-publication, they typically link datasets associated with an article and created by the same research group. These ALM sources discover links from the research data to the article and, in an ideal world, should be consistent with the links from the article to the research data.
opendata  openaccess 
december 2013 by juliusbeezer
The 5 minute guide to scraping data from PDFs | memeburn
There are some web services like cometdocs or pdftoexcelonline that could help you out. Or you could try to build a scraper yourself, but then you have to read Paul Bradshaw’s Scraping for Journalists first.


My favourite tool though is Tabula. Tabula describes itself as “a tool for liberating data tables trapped inside PDF files”. It’s fairly easy to use too. All you have to do is import your PDF, select your data, push a button and there is your spreadsheet! You save the scraped page in CSV and from there you can import it into any spreadsheet program.
opendata  tools  software 
november 2013 by juliusbeezer
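Once Tabula has exported the table to CSV, loading it takes only the standard library; a minimal Python sketch (the sample rows are invented stand-ins for scraped output):

```python
import csv
import io

# Stand-in for the CSV that Tabula exports (rows invented for illustration).
scraped = io.StringIO(
    "country,year,value\n"
    "France,2012,3.4\n"
    "Germany,2012,2.9\n"
)

# Each row becomes a dict keyed by the header row, ready for a
# spreadsheet program or further processing.
rows = list(csv.DictReader(scraped))
```

From there the rows can be written back out with `csv.DictWriter` or imported into any spreadsheet program.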
Diederik Stapel’s Audacious Academic Fraud -
The key to why Stapel got away with his fabrications for so long lies in his keen understanding of the sociology of his field. “I didn’t do strange stuff, I never said let’s do an experiment to show that the earth is flat,” he said. “I always checked — this may be by a cunning manipulative mind — that the experiment was reasonable, that it followed from the research that had come before, that it was just this extra step that everybody was waiting for.” He always read the research literature extensively to generate his hypotheses. “So that it was believable and could be argued that this was the only logical thing you would find,” he said. “Everybody wants you to be novel and creative, but you also need to be truthful and likely. You need to be able to say that this is completely new and exciting, but it’s very likely given what we know so far.”
scholarly  misconduct  opendata 
november 2013 by juliusbeezer
The one true route to good science is … | Dynamic Ecology
The usual form is to attack some now-trendy but supposedly horrendous version of science and then mildly conclude that the way the author does science is the only really good way to do science. In the latest version of this archetype, two esteemed ecologists, David Lindenmayer and Gene Likens (hereafter L&L) penned an almost vitriolic piece attacking “Open-Access Science”, “Big Science” and I don’t know what all else (that I’m going to call for shorthand “new-fangled ecology” for now)*.
opendata  openscience  openaccess  sciencepublishing  database 
november 2013 by juliusbeezer
Data-sharing: Everything on display : Naturejobs
Wolkovich is one of a number of early-career researchers who are enthusiastically posting their work online. They are publishing what one online-repository founder calls small data — experimental results, data sets, papers, posters and other material from individual research groups — as opposed to the 'big data' spawned by large consortia, which usually employ specialists to plan their data storage and release.
opendata  openscience  repositories  database 
august 2013 by juliusbeezer
Peer-review debate should include software - Research Information
At a time when scholarly publishing is debating the issue of data being published alongside papers, this makes an interesting test case. Reinhart and Rogoff’s errors could not have been detected by reading the journal article alone, so proper scrutiny in this case ought to have included the dataset.

But I would argue that the terms of the debate should go beyond data: we ought also to be thinking about software. In my view, the Reinhart and Rogoff story makes this clear.

Reproducibility is one of the main principles of the scientific method. Initially, Herndon and his Amherst colleagues found that they were unable to replicate Reinhart and Rogoff’s results. This was what caused them to request the underlying data, resulting in their subsequent discovery of errors.
science  opendata  openness  statistics  software 
june 2013 by juliusbeezer
What Do We Mean By Small Data | Open Knowledge Foundation Blog
“Small data is the amount of data you can conveniently store and process on a single machine, and in particular, a high-end laptop or server”
openscience  opendata 
may 2013 by juliusbeezer
Share your software early: the Reinhart-Rogoff case
Thomas Herndon, a graduate student, grabbed data from the Reinhart-Rogoff web site and tried to reproduce their results. He couldn’t. He then asked the authors for help. Reinhart and Rogoff shared the Excel spreadsheet they used with Herndon. He then promptly found basic flaws in their data processing. For example, the two professors ran sums over the wrong cells. It seems that they made several odd choices when processing the data. His paper is freely available online.
scholarly  opendata  openaccess  economics 
april 2013 by juliusbeezer
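The class of error Herndon found is easy to reproduce in a few lines; a Python illustration (figures invented, not Reinhart and Rogoff's actual data) of how a summation range that stops one row short silently shifts an average:

```python
# Per-country growth figures (values invented for illustration).
growth = [3.1, 2.4, -0.2, 1.8, 2.7]

# Intended calculation: average growth across all five countries.
correct = sum(growth) / len(growth)

# The Excel-style slip: the summed range stops one row short,
# silently excluding the last country from the average.
flawed = sum(growth[:-1]) / len(growth[:-1])
```

Nothing fails loudly; the spreadsheet happily reports the wrong number, which is why releasing the underlying spreadsheet or code, so that others can re-run the sums, matters.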
How does a country get to open data? What Taiwan can teach us about the evolution of access » Nieman Journalism Lab
Good stuff from US reporter leading data journalism course in Taiwan:
"In the United States, in 2013, it’s widely assumed that governments on all levels should make their data available for public use. But why? How did we get here? And, importantly, how do other countries get there? … “We have a freedom of information law?” was the answer. I had discovered it during my research for the workshop, but just because a law is on the books doesn’t mean anyone knows about or uses it. Also, under Taiwan’s civil law system, there isn’t any case law which tells the courts how to interpret this right. A few veteran reporters did know of the law, but weren’t aware of anyone who’d ever filed a request under its provisions."
opendata  journalism  taiwan  culture 
april 2013 by juliusbeezer
[Open-access] HEFCE (UK) seeking advice on open access & open data research evaluation
I saw a window to puff my blog to the okfn from the UniSussex seminar on OpenData and took it, much good may it do me.
dccomment  blog  opendata 
march 2013 by juliusbeezer
Getting Genetics Done: Stop Hosting Data and Code on your Lab Website
Excellent article on the impermanence of web resources: even in a scientific setting 72% lost after a few years, only one third of corresponding authors replied to email.
"It's a fact that most of us academics move around a fair amount. Often we may not deem a tool we developed or data we collected and released to be worth transporting and maintaining. After some grace period, the resource disappears without a trace."
archiving  openaccess  opendata  openscience 
january 2013 by juliusbeezer
Allen H. Renear (personal web page)
Denton Declaration collaborator's research interests include:

"Ontologies for digital objects. Our statements about digital objects make extensive use of idiom, metaphor, and logical fiction. If these sentences are naively transferred into the world of linked data and semantic technologies much unsound (and possibly harmful) inferencing will ensue and many opportunities will be lost. More robust ontologies are needed."

and publications include:

“What is Text, Really?” Steven J. DeRose, David G. Durand, Elli Mylonas, and Allen H. Renear. Journal of Computing in Higher Education 2:1 3-26 (1990). Reprinted in the ACM/SIGDOC *Journal of Computer Documentation 21:3 1-24 (1997). [ACM]
opendata  text  text_tools  semantic  ontology  sciencepublishing 
november 2012 by juliusbeezer
Open data in public private partnerships: how citizens can become true watchdogs | Open Knowledge Foundation Blog
Villo! makes real-time data available about the status of each station on their servers. This is to help users find out which stations have bikes or free parking spaces. Where’s My Villo?, launched in September to pressure JCDecaux into doing a better job, uses this data to shed light on the magnitude of the problems. By pulling data from Villo!’s servers every five minutes and writing them to a database, Where’s My Villo? computes a daily list of the worst places to find a bike or park one. The website also allows visitors to see how well the station(s) they use most frequently perform, and report their own encounters with problems (from finding a bike to technical problems at stations).
cycling  opendata  politics 
september 2012 by juliusbeezer
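The Where's My Villo? pipeline described above (poll the feed every five minutes, write each snapshot to a database) can be sketched as follows. The feed URL and field names are assumptions for illustration, not the real Villo! schema:

```python
import json
import sqlite3
import time
from urllib.request import urlopen

# Hypothetical endpoint; the real Villo! feed has its own URL and schema.
FEED_URL = "https://example.org/villo/stations.json"

def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS status ("
        "station_id INTEGER, fetched_at REAL, "
        "bikes INTEGER, free_slots INTEGER)"
    )

def record(conn, stations, fetched_at):
    """Append one snapshot of every station to the database."""
    conn.executemany(
        "INSERT INTO status VALUES (?, ?, ?, ?)",
        [(s["id"], fetched_at, s["bikes"], s["free_slots"]) for s in stations],
    )

def poll_forever(conn, interval=300):
    """Fetch the feed every five minutes and log each snapshot."""
    while True:
        with urlopen(FEED_URL) as resp:
            stations = json.load(resp)
        record(conn, stations, time.time())
        conn.commit()
        time.sleep(interval)

# Demo with an in-memory database and a fake snapshot (no network needed):
conn = sqlite3.connect(":memory:")
init_db(conn)
record(conn, [{"id": 1, "bikes": 0, "free_slots": 20}], fetched_at=0.0)
empty_stations = conn.execute(
    "SELECT COUNT(*) FROM status WHERE bikes = 0"
).fetchone()[0]
```

From the accumulated snapshots, the daily "worst stations" list is a single GROUP BY query over the logged rows.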
Truly Open Data - O'Reilly Radar
I'm kicking myself because I've been taking far too narrow an interpretation of "an open source approach". I've been focused on getting people to release data. That's the data analogue of tossing code over the wall, and we know it takes more than a tarball on an FTP server to get the benefits of open source. The same is true of data.
sciencepublishing  internet  opensource  openscience  opendata 
march 2010 by juliusbeezer
