juliusbeezer : archiving   150

71,716 video tapes in 12,094 days | Internet Archive Blogs
Ms. Stokes was a fiercely private African American social justice champion, librarian, political radical, TV producer, feminist, Apple Computer super-fan and collector like few others. Her life and idiosyncratic passions are sensitively explored in the exceedingly well reviewed new documentary, Recorder: The Marion Stokes Project, by Matt Wolf. Having premiered last month at the 2019 Tribeca Film Festival, the film is on tour and will be featured at San Francisco’s Indefest, June 8th & 10th. For those in the Bay Area, please consider joining Internet Archive staff and leadership at the 7:00pm June 10th screening. Advance tickets are available now, seating is limited.

Long before many questioned the media’s motivations and recognized the insidious intentional spread of disinformation, Ms. Stokes was alarmed. In a private herculean effort, she took on the challenge of independently preserving the news record of her times in its most pervasive and persuasive form – TV.
archiving  television  internet  spectacle  history 
june 2019 by juliusbeezer
Clinical Orthopaedics and Related Research, The Bone & Joint... : JBJS
Preprint servers may be perceived by some (and used by less scrupulous investigators) as evidence even though the studies have not gone through peer review; the public may not be able to discern an unreviewed preprint from a seminal article in a leading journal. We are concerned that publishing in a preprint server may be a self-serving move by individuals with secondary-gain incentives and by those whose work is unlikely to withstand serious scrutiny by peer-reviewed journals.
scholarly  sciencepublishing  archiving  preprint  medicine  peerreview 
march 2019 by juliusbeezer
Oh, I Hate The Romans Already! | Do The Right Thing
This urge to divide the world around us, and to sort out which tribe we belong in goes deeper than would benefit our true self interests. Yet like a pair of dysfunctional co-dependents, we just can’t help ourselves, always looking for ways to find fault and a source of antagonism.

Dutch Kids Being Biked To School. Are we the same or different?

There’s always something that makes them just not like us. We ride the way people should ride, and if more people were like us, the world would be a better place.
cycling  blogs  commenting  archiving  linkrot 
october 2018 by juliusbeezer
Open Science and its Discontents | Ronin Institute
Although there is a spectrum of responses, criticism of open-science tends to fall into one of two camps, that I will call “conservative” and “radical”. This terminology is not intended to imply an association with any conventional political labels; the terms are simply used for convenience to indicate the relative degree of comfort with the institutional status quo. Let’s look at these two groups of critiques.

The conservative response to regular, timely release of pre-publication data could best be summarized by the phrase: “are you kidding me? why would I do that?” The apotheosis of this notion appeared in an editorial published in the New England Journal of Medicine, which described with some horror the “emergence of a new class of research parasites”. They further concluded that some of these parasites might not only use that data for their own publications, but might seek to examine whether the original study was correct....

Arguments for open-science made in response to the conservative critique tend to assume that release of more data, code, papers is a pure good in and of itself, and downplay the political economy in which they are embedded.
openscience  openaccess  archiving  opendata  politics  business  sociology 
july 2017 by juliusbeezer
What I learned from predatory publishers | Biochemia Medica
I think preprint servers and overlay journals will play a role. Preprint servers, pioneered by are growing in number and are serving more scholarly fields. I expect this to continue. Compared to high-quality scholarly journals, they are inexpensive to operate – especially since they don’t have to manage peer review or do copyediting. They do minimal vetting, but when they do it, it’s usually done at the researcher level rather than at the paper level. That is to say, they blacklist researchers submitting papers that diverge from the scientific consensus.

One advantage of a move from open-access journals to preprint servers is the elimination of author fees and all the corruption that goes along with them.

Overlay journals in each field will select the best articles appearing in the corresponding preprint servers each month or quarter and will prepare a table of contents listing these and linking to them, an eclectic, ad hoc journal issue. The editorial board of each overlay journal, experts in their field, will select preprints that are methodologically sound, novel, scientific, and of importance to the field.
openaccess  scholarly  sciencepublishing  overlay  archiving 
june 2017 by juliusbeezer
The selfish scientist’s guide to preprint posting – nikokriegeskorte
My lab came around to routine preprint posting for entirely selfish reasons. Our decision was triggered by an experience that drove home the power of preprints. A competing lab had posted a paper closely related to one of our projects as a preprint. We did not post preprints at the time, but we cited their preprint in the paper on our project. Our paper appeared before theirs in the same journal. Although we were first, by a few months, with a peer-reviewed journal paper, they were first with their preprint. Moreover, our competitors could not cite us, because we had not posted a preprint and their paper had already been finalised when ours appeared. Appropriately, they took precedence in the citation graph – with us citing them, but not vice versa.
archiving  citation  sciencepublishing 
may 2017 by juliusbeezer
The Winnower | The selfish scientist’s guide to preprint posting
Posting preprints doesn't only have advantages. It is also risky. What if another group reads the preprint, steals the idea, and publishes it first in a high-impact journal? This could be a personal catastrophe for the first author, with the credit for years of original work diminished to a footnote in the scientific record. Dishonorable scooping of this kind is not unheard of. Even if we believe that our colleagues are all trustworthy and outright stealing is rare, there is a risk of being scooped by honorable competitors...
The advantages of using preprints for science and society are all well and good. However, we also need to think about ourselves. Does preprint posting mean that we give away our results to competitors, potentially suffering a personal cost for the common good? What is the selfish scientist's best move to advance her personal impact and career? There is a risk of getting scooped. However, this risk can be reduced by not posting too early. It turns out that posting a preprint, in addition to publication in a journal, is advisable from a purely selfish perspective, because it brings the following benefits to the authors:
scholarly  archiving  openaccess 
may 2017 by juliusbeezer
Is it OK to cite preprints? Yes, yes it is. | Jabberwocky Ecology
Why hasn’t citing unreviewed work caused the wheels to fall off of science? Because citing appropriate work in the proper context is part of our job. There are good preprints and bad preprints, good reports and bad reports, good data and bad data, good software and bad software, and good papers and bad papers. As Belinda Phipson, Casey Green, Dave Harris and Sebastian Raschka point out it is up to us as the people citing research to make professional judgments about what is good science and should be cited. Casey’s take captures my thoughts on this exactly:
citation  peerreview  archiving 
may 2017 by juliusbeezer
» Blacklists are technically infeasible, practically unreliable and unethical. Period.
We already have plenty of perfectly good Whitelists. Pubmed listing, WoS listing, Scopus listing, DOAJ listing. If you need to check whether a journal is running traditional peer review at an adequate level, use some combination of these according to your needs. Also ensure there is a mechanism for making a case for exceptions, but use Whitelists not Blacklists by default.

Authors should check with services like ThinkCheckSubmit or Quality Open Access Market if they want data to help them decide whether a journal or publisher is legitimate. But above all scholars should be capable of making that decision for themselves. If we aren’t able to make good decisions on the venue to communicate our work then we do not deserve the label “scholar”.
openaccess  peerreview  archiving  scholarly  sciencepublishing 
february 2017 by juliusbeezer
If You See Something, Save Something – 6 Ways to Save Pages In the Wayback Machine | Internet Archive Blogs
In recent days many people have shown interest in making sure the Wayback Machine has copies of the web pages they care about most. These saved pages can be cited, shared, linked to – and they will continue to exist even after the original page changes or is removed from the web.

There are several ways to save pages and whole sites so that they appear in the Wayback Machine. Here are 6 of them.
archiving  linkrot  dccomment 
january 2017 by juliusbeezer
RFC 7089 - HTTP Framework for Time-Based Access to Resource States -- Memento
Abstract The HTTP-based Memento framework bridges the present and past Web. It facilitates obtaining representations of prior states of a given resource by introducing datetime negotiation and TimeMaps. Datetime negotiation is a variation on content negotiation that leverages the given resource's URI and a user agent's preferred datetime. TimeMaps are lists that enumerate URIs of resources that encapsulate prior states of the given resource. The framework also facilitates recognizing a resource that encapsulates a frozen prior state of another resource.
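A minimal Python sketch of the client side of datetime negotiation, assuming a TimeMap in the link format described above. All URIs are hypothetical, and the function names are illustrative only. Picking the memento closest in time to the requested datetime is an assumption made here; a real TimeGate applies its own selection policy.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime_header(when: datetime) -> dict:
    """Build the Accept-Datetime header a client sends to a TimeGate."""
    # Memento uses HTTP-date format in GMT; `when` must be timezone-aware.
    return {"Accept-Datetime": format_datetime(when.astimezone(timezone.utc), usegmt=True)}


def parse_timemap(text: str) -> list:
    """Return (memento URI, capture datetime) pairs from a link-format TimeMap."""
    mementos = []
    # Each entry is <URI> followed by ;key="value" params. Quoted dates contain
    # commas, so entries are located by their angle brackets, not split on ",".
    for match in re.finditer(r"<([^>]*)>([^<]*)", text):
        uri = match.group(1)
        params = dict(re.findall(r'(\w+)="([^"]*)"', match.group(2)))
        # rel may hold several tokens, e.g. "first memento".
        if "memento" in params.get("rel", "").split() and "datetime" in params:
            mementos.append((uri, parsedate_to_datetime(params["datetime"])))
    return mementos


def closest_memento(timemap_text: str, when: datetime) -> str:
    """Pick the memento whose capture time is nearest the requested datetime."""
    return min(parse_timemap(timemap_text), key=lambda m: abs(m[1] - when))[0]


# Hypothetical TimeMap for http://a.example.org/
TIMEMAP = '''<http://a.example.org/>;rel="original",
<http://arxiv.example.net/timemap/http://a.example.org/>;rel="self";type="application/link-format",
<http://arxiv.example.net/web/20000620180259/http://a.example.org/>;rel="first memento";datetime="Tue, 20 Jun 2000 18:02:59 GMT",
<http://arxiv.example.net/web/20050101000000/http://a.example.org/>;rel="memento";datetime="Sat, 01 Jan 2005 00:00:00 GMT",
<http://arxiv.example.net/web/20091027204954/http://a.example.org/>;rel="last memento";datetime="Tue, 27 Oct 2009 20:49:54 GMT"'''

when = datetime(2004, 6, 1, tzinfo=timezone.utc)
print(accept_datetime_header(when))  # {'Accept-Datetime': 'Tue, 01 Jun 2004 00:00:00 GMT'}
print(closest_memento(TIMEMAP, when))  # the 20050101000000 memento
```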
linkrot  archiving 
december 2016 by juliusbeezer
DSHR's Blog: Reference Rot Is Worse Than You Think
The British Library's Andy Jackson analyzed the UK Web Archive and found:

I expected the rot rate to be high, but I was shocked by how quickly link rot and content drift come to dominate the scene. 50% of the content is lost after just one year, with more being lost each subsequent year. However, it’s worth noting that the loss rate is not maintained at 50%/year. If it were, the loss rate after two years would be 75% rather than 60%. This indicates there are some islands of stability, and that any broad ‘average lifetime’ for web resources is likely to be a little misleading.
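The 75% figure follows from compounding: if a constant fraction r of the still-surviving resources disappeared each year, the cumulative loss after n years would be L(n) below. Reading the reported 50% and 60% figures the same way gives the implied second-year loss rate among the resources that survived year one.

```latex
% Constant annual loss rate r, applied to whatever survived the previous year
L(n) = 1 - (1 - r)^{n}, \qquad L(2)\big|_{r = 0.5} = 1 - 0.5^{2} = 0.75

% Reported: L(1) = 0.50 and L(2) = 0.60, so the share of year-one survivors
% lost during year two is
r_2 = \frac{L(2) - L(1)}{1 - L(1)} = \frac{0.60 - 0.50}{1 - 0.50} = 0.20
```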
The robust link proposal also describes a different model of link decoration:

Information is conveyed as follows:

href for the URI that provides the specific state, i.e. the snapshot or resource version;
data-originalurl for the URI of the original resource;
data-versiondate for the datetime of the snapshot or resource version.
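A small Python sketch that renders a link decorated with these three attributes. The function name and example URIs are hypothetical, and using an ISO 8601 date (YYYY-MM-DD) for data-versiondate is an assumption about the expected value format, not something stated above.

```python
from html import escape


def robust_link(snapshot_uri: str, original_uri: str, version_date: str, text: str) -> str:
    """Render an <a> element carrying robust-link decoration (hypothetical helper)."""
    # href points at the archived snapshot; the data-* attributes let a reader
    # (or a script) fall back to the live original or another archive copy.
    # escape() also encodes "&" and quotes so URLs are safe inside attributes.
    return (
        f'<a href="{escape(snapshot_uri)}" '
        f'data-originalurl="{escape(original_uri)}" '
        f'data-versiondate="{escape(version_date)}">{escape(text)}</a>'
    )


print(robust_link(
    "https://archive.example.net/20161201/http://example.com/page",
    "http://example.com/page",
    "2016-12-01",
    "example page",
))
```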
linkrot  archiving 
december 2016 by juliusbeezer
Satoshi Village
NC or ND stipulations provide no additional protection against plagiarism. And anyway, scholarly norms, rather than threats of being sued, are what actually encourage attribution. If you want to maximize citations, choose the license that allows for maximum dissemination.

29.8% of bioRxiv preprints are all rights reserved, placing a large portion of the bioRxiv corpus in the same troublesome legal situation as traditional academic publishing.
openaccess  archiving  sciencepublishing  copyright 
december 2016 by juliusbeezer
Why Your Business Should Still Care About Twitter in 2017 - ManageFlitter Blog
If you’re a brand or a business who needs to have that level of real-time access to an audience, Twitter is still the right way to go. I use it constantly, far more than any other social network for just that reason. It’s the simplest way for me to get in front of people, and I don’t have to bow to Facebook’s algorithms just to make it bloody work in the first place.
twitter  dccomment  archiving 
december 2016 by juliusbeezer
The Internet Archive is building a Canadian copy to protect itself from Trump - The Verge
Kahle estimates it will cost “millions” of dollars to host a copy of the Internet Archive in Canada, but it would shield its data from some American legal action.

The future of privacy and surveillance under the Trump administration remains unpredictable, but the president-elect has shown support for greater law enforcement surveillance powers and legal censorship, including “closing that internet up in some ways” to fight terrorism. “Somebody will say, 'Oh freedom of speech, freedom of speech.' These are foolish people,” he said in a 2015 speech.
archiving  politics  canada  us 
november 2016 by juliusbeezer
WikiLeaks specializes in publishing, curating, and ensuring easy access to full online archives of information that has been censored or suppressed, or is likely to be lost. An understanding of our historical record enables self-determination; publishing and ensuring easy access to full archives, rather than just individual documents, is central to preserving this historical record. Since publishing Cablegate, WikiLeaks has continued to work to make PlusD the most complete online archive of US Department of State documents, adding to the library each year with newly available cables and other documents from the State Department communications system. It can be accessed through a set of specially developed search interfaces at
reading  research  writing  wikileaks  agnotology  attention  journalism  history  archiving 
november 2016 by juliusbeezer
How to share large amounts of research video data with Syncthing | Saul Albert
Firstly, Syncthing is an open source, peer-to-peer file sharing system, which means it is relatively cheap, simple to set up and secure. It is especially good for sharing very large files and collections of files without having to pay or to trust an intermediary to maintain a centralized file server. You run Syncthing on your computer, the person you want to share files with runs it on theirs, and you can set up folders that will synchronize files automatically when both your computers are turned on and connected to the internet. The data is encrypted in transit, and never sits on someone else’s server. Syncthing only requires each user to have a normal computer or laptop, rather than running on a server with each person using a ‘client’. This is more secure, and probably less complex to set up and maintain.
archiving  tools 
july 2016 by juliusbeezer
How the CIA Writes History
When I asked to see the Cram and Applewhite papers, a staff archivist told me both collections had been removed from public view. The CIA, he explained, was reviewing the boxes for “security material.” He said he thought the material would be returned “by the fall” of 2015. When I asked to see the library records for the Cram papers again, I was told the CIA had removed those from public view, too.

“They knew you were coming,” Tim Weiner told me. Author of the best-selling CIA history Legacy of Ashes, Weiner suggested the agency had learned I was writing an Angleton biography and acted preemptively to protect itself.

Perhaps insufficiently paranoid, I hadn’t thought of that possibility, but I can’t dismiss it now. Trade publications reported in January 2015 that I had signed a contract for the Angleton biography. The Cram and Applewhite papers were removed from public view in the spring of 2015, according to one Georgetown employee.
history  archiving  agnotology  us 
april 2016 by juliusbeezer
The Imaginary Journal of Poetic Economics: Dramatic Growth of Open Access March 31, 2016
Update April 12: congratulations to Bielefeld Academic Search Engine (BASE) - and all of the contributing repositories - now over 90 million documents. On the Global Open Access List, BASE's Dirk Pieper estimates that 60% of the content is open access.

There are now 150 publishers of peer-reviewed open access books listed in the Directory of Open Access Books, publishing more than 4,400 open access books. 620 books were published in this quarter alone, a 16% increase in just this quarter. The Directory of Open Access Journals has been adding titles at a net rate of 6 titles per day, 540 journals added this quarter for a total of over 11,000 journals. This is the highest DOAJ growth rate since this series started!
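A quick consistency check on the quarterly figures, assuming a quarter of roughly 90 days and taking the book total as about 4,400 (the post says "more than"), with the 16% measured against the stock at the start of the quarter:

```latex
% Journals: net additions per day times the days in a quarter
6 \times 90 = 540

% Books: this quarter's additions relative to the total before the quarter
\frac{620}{4400 - 620} = \frac{620}{3780} \approx 0.164 \approx 16\%
```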
openaccess  scholarly  archiving  repositories 
april 2016 by juliusbeezer
What do we need to know about the archived web? | Webstory: Peter Webster's blog
A theme that emerged for me in the IIPC web archiving conference in Reykjavik last week was metadata, and specifically: precisely which metadata do users of web archives need in order to understand the material they are using?
...During my own paper I referred to the issue, and was asked by a member of the audience if I could say what such enhanced metadata provision might look like. What I offer here is the first draft of an answer: a five-part scheme of kinds of metadata and documentation that may be needed (or at least, that I myself would need). I could hardly imagine this would meet every user requirement; but it’s a start.

1. Institutional
At the very broadest level, users need to know something of the history of the collecting organisation, and how web archiving has become part of its mission and purpose. I hope to provide an overview of aspects of this on a world scale in this forthcoming article on the recent history of web archiving.
archiving  web 
april 2016 by juliusbeezer
Photographing software | david mcClure
What if – instead of trying to preserve software – we tried to systematically record it like this? What would that look like? What kind of information would we want to capture? What matters, what doesn’t matter? How should it be structured and archived? I guess I’d start by taking a huge battery of screenshots, capturing all possible states of the application and on all different screen sizes – a high-resolution desktop monitor, a laptop, an iPad, a phone. Then, to capture a sense of the lived, in-motion experience of the code, I’d round up a diverse set of contributors and stakeholders – PIs, developers, students, colleagues, target users – and have each record a 20-30 minute screencast of the software, walking through the functionality in detail and talking out an account of what’s there, what’s not there, what works well, what doesn’t work well, how (whether) it fits into some kind of larger technical or intellectual project. Maybe most important – write this, bake it off into a kind of technological-ethnographic account of the software, something that records the body of tacit knowledge about information design, interaction mechanics, and knowledge representation that’s always created in the process of trying to make software that actually works. We’d essentially be “taping” the software, sort of like a Grateful Dead concert.
software  archiving  history  linkrot  ubuntu 
march 2016 by juliusbeezer
Handful of Biologists Went Rogue and Published Directly to Internet - The New York Times
Such postings are known as “preprints’’ to signify their early-stage status, and the 2,048 deposited on three-year-old bioRxiv over the last year represent a barely detectable fraction of the million or so research papers published annually in traditional biomedical journals.
preprint  openaccess  archiving 
march 2016 by juliusbeezer
Should there be greater use of preprint servers for publishing reports of biomedical science? - F1000Research
We know remarkably little, formally, about why researchers do and don’t do the things that they do and don’t do. Some efforts to secure research funding to investigate why researchers don’t publish reports of their research have not been successful (Professor Mary Dixon-Woods, personal communication). If the attractive vision of a more efficient publishing model for the life sciences is to be promoted effectively, research is needed to find answers to the questions raised by Tracz and Lawrence themselves: why are researchers reluctant to post preprints, and will sufficient other researchers post useful and critical comments on them to make the effort worthwhile?
sciencepublishing  medicine  preprint  archiving  scholarly  research 
march 2016 by juliusbeezer
The Internet Archive Turns 20: A Behind The Scenes Look At Archiving The Web - Forbes
The Internet Archive does not crawl all sites equally nor is our crawl frequency strictly a function of how popular a site is.” He goes on to caution “I would expect any researcher would be remiss to not take the fluid nature of the web, and the crawls of the [Internet Archive], into consideration” with respect to interpreting the highly variable nature of the Archive’s recrawl rate.

Though it acts as the general public’s primary gateway to the Archive’s web materials, the Wayback Machine is merely a public interface to a limited subset of all these holdings. Only a portion of what the Archive crawls or receives from external organizations and partners is made available in the Wayback Machine, though as Mr. Graham noted there is at present “no master flowchart of the source of captures that are available via the Wayback Machine” so it is difficult to know what percent of the holdings above can be found through the Wayback Machine’s public interface. Moreover, large portions of the Archive’s holdings carry notices that access to them is restricted, often due to embargoes, license agreements, or other processes and policies of the Archive...
This is in marked contrast to the description that is often portrayed of the Archive by outsiders as a traditional centralized continuous crawl infrastructure, with a large farm of standardized crawlers ingesting the open web and feeding the Wayback Machine akin to what a traditional commercial search engine might do. The Archive has essentially taken the traditional model of a library archive and brought it into the digital era, rather than take the model of a search engine and add a preservation component to it.
archiving  internet  linkrot  dccomment 
january 2016 by juliusbeezer
Elsevier Granted Injunction Against Research Paper 'Pirate Site;' Which Immediately Moves To New Domain To Dodge It | Techdirt
Not officially part of the open-access movement are repositories run by Alexandra Elbakyan, a researcher born and educated in Kazakhstan. Elbakyan's first efforts to liberate documents from behind publisher paywalls were limited to fulfilling requests made by other researchers in online forums. When she saw the demand far exceeded the supply, she automated the process, stashing the documents at
archiving  arxiv  scholarly  finance  dccomment 
december 2015 by juliusbeezer
at Society for Neuroscience | Hypothesis
Researchers clearly saw the value in incorporating into the scientific workflow, particularly during the peer review process, where the ability to use targeted annotations of particular phrases or sentences was seen as a very valuable means to improve the review process for authors, reviewers and editors alike.

Researchers were also excited by the educational and collaborative opportunities of web annotation, and asked whether one could annotate in groups with their colleagues. I am happy to let everyone know that the private group annotation launched on November 3rd. Thanks to our program in education, and our educational director, Jeremy Dean, is, in fact, enjoying robust use in the classroom.

But all of the above activities are carried out privately or semi-privately. What about “the Internet, peer reviewed”? This tag line brought people to the booth, but the possibility of putting a public knowledge layer over the scientific literature and related materials both excited and concerned many neuroscientists. Many recognized that our current methods for reporting scientific findings would benefit from an interactive, public layer where questions could be asked and answered and where additional information could be provided. Those that blogged liked the idea of “blogging in place” on articles or news articles that fell into their area of expertise.
peerreview  commenting  archiving 
november 2015 by juliusbeezer
Fund: On-Demand Web Archiving of Annotated Pages | Hypothesis
Whenever a web page changes or disappears, annotations on the page may no longer be viewable, unless the original content is preserved. The purpose of this project is to ensure that an archival recording is made of the annotated page.

The proposal is to build a simple service which will be triggered when an annotation is made and archive the full page by loading a headless browser through an existing web archiving tool.
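A minimal Python sketch of the trigger described in the proposal. The class and method names are hypothetical; the capture callable stands in for the "existing web archiving tool" driving a headless browser, which is not modelled here. Capturing each page only once is a simplifying assumption; a real service might recapture on every annotation, since the page may have changed in between.

```python
from typing import Callable, Dict


class AnnotationArchiver:
    """Hypothetical hook: archive a page the first time it is annotated."""

    def __init__(self, capture: Callable[[str], str]) -> None:
        # capture(url) returns the location of the archived copy; in practice
        # it would call a web archiving tool that loads the page headlessly.
        self.capture = capture
        self.archived: Dict[str, str] = {}

    def on_annotation(self, page_url: str) -> str:
        # Only the first annotation on a page triggers a capture; later ones
        # reuse the stored copy, so all annotations anchor to the same recording.
        if page_url not in self.archived:
            self.archived[page_url] = self.capture(page_url)
        return self.archived[page_url]


calls = []


def fake_capture(url: str) -> str:
    calls.append(url)
    return "https://archive.example.net/" + url


archiver = AnnotationArchiver(fake_capture)
archiver.on_annotation("http://example.com/paper")
archiver.on_annotation("http://example.com/paper")
print(len(calls))  # 1
```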
annotation  commenting  archiving 
november 2015 by juliusbeezer
How Much Of The Internet Does The Wayback Machine Really Archive? - Forbes
In my opening keynote address at the 2012 IIPC General Assembly at the Library of Congress, I noted that for scholars to be able to use web archives for research, we needed far greater information on how those archives were being constructed. Three and a half years later few major web archives have produced such documentation, especially relating to the algorithms that control what websites their crawlers visit, how they traverse those websites, and how they decide what parts of an infinite web to preserve with their limited resources. In fact, it is entirely unclear how the Wayback Machine has been constructed, given the incredibly uneven landscape it offers of the top one million websites, even over the past year.
archiving  internet  linkrot 
november 2015 by juliusbeezer
The Internet's Dark Ages - The Atlantic
More recently, a researcher at the Internet Archive has been running an analysis on the Wayback Machine to figure out what survives. “It won't be surprising to say that preliminary findings are showing things stick around for much shorter and changing constantly before they disappear,” Scott told me.
Saving something on the web, just as Kevin Vaughan learned from what happened to his work, means not just preserving websites but maintaining the environments in which they first appeared—the same environments that often fail, even when they’re being actively maintained. Rose, looking ahead hundreds of generations from now, suspects “next to nothing” will survive in a useful way. “If we have continuity in our technological civilization, I suspect a lot of the bare data will remain findable and searchable,” he said. “But I suspect almost nothing of the format in which it was delivered will be recognizable.”
archiving  internet  linkrot 
october 2015 by juliusbeezer
7 tips for successful harvesting | CORE
The harvesting process is a two-way relationship, where the content provider and the aggregator need to be able to communicate and have a mutual understanding. For successful harvesting, it is recommended that content providers apply the following best practices (some of the following recommendations relate to harvesting generally, while some are CORE-specific):

Platform: For those who haven’t deployed a repository yet, it is highly advised not to build the repository platform in-house but to choose one of the industry-standard platforms.
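Aggregators commonly harvest repositories over OAI-PMH. Assuming the repository exposes such an endpoint, the Python sketch below reads one page of a ListRecords response and builds the request for the next page from its resumptionToken. The base URL, identifiers, and function name are hypothetical, and this is not a description of CORE's own harvester.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 XML namespace


def read_page(base_url: str, response_xml: str):
    """Return (record identifiers, URL of the next page or None)."""
    root = ET.fromstring(response_xml)
    identifiers = [h.findtext(OAI + "identifier") for h in root.iter(OAI + "header")]
    # An absent or empty resumptionToken means this was the last page.
    token = root.findtext(f".//{OAI}resumptionToken")
    if not token:
        return identifiers, None
    # A resumed request carries only the verb and the token.
    return identifiers, base_url + "?" + urlencode({"verb": "ListRecords", "resumptionToken": token})


PAGE_1 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:repo.example.ac.uk:101</identifier></header></record>
    <record><header><identifier>oai:repo.example.ac.uk:102</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(read_page("https://repo.example.ac.uk/oai", PAGE_1))
```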
repositories  openaccess  archiving 
october 2015 by juliusbeezer
The characteristics of a register | Government Digital Service
what do we mean when we say “register”?

Across government we manage and hold data that we need to deliver services to users and to inform policymaking. We make that data in a variety of ways — from bespoke online tools, dumps of databases, through to published lists. A question we’re often asked is:

What is a register, how is it more than just a database, a statistical report, or a simple list?

To try and answer this question we’ve started to collect a list of characteristics based on the things we discovered during our early discovery and alpha work.
opendata  archiving  informationmastery 
october 2015 by juliusbeezer
E-Journal Preservation Service – Portico
Since 2005, we have worked with publishers and libraries to preserve a rapidly increasing number of e-journals through our E-Journal Preservation Service. This service operates on a community model, through which both publishers and libraries help to defray the ongoing costs of operating the archive, including the IT infrastructure set up to ingest, archive, and migrate the content committed to the archive.

Portico provides access to its library participants when specific conditions or “trigger events” occur, which cause journal titles to no longer be available from the publisher or any other source:
archiving  library  journals  sciencepublishing 
october 2015 by juliusbeezer
Open Document Format: Using Officeshots and ODFAutoTesting for Sustainable Documents |
Officeshots is great if you want to see how a specific ODF will appear to a user or if you want to examine the output ODF file that the application will produce when saving an update to the file. On the other hand, you might like to quickly be able to see if a feature is supported in a specific application, such as LibreOffice 4.3.x. The ODFAutoTests project aims to provide a methodical test of each attribute for each important document element for many versions of many office suites.

Currently, about 10 major document elements are included in the test results. For each feature tested, a screenshot of how that feature renders is presented along with an investigation into the output ODF produced and whether that feature has been preserved in the output document (Figure 3).
tools  text_tools  archiving 
october 2015 by juliusbeezer
News Sniffer Blog » About News Sniffer
News Sniffer monitors news websites and detects when articles change. The versions are viewable and the changes are highlighted.

It currently monitors a few key feeds from the BBC News website, The Guardian, The New York Times, The Independent, The Washington Post and The Intercept.

News Sniffer was written by John Leach. You can contact News Sniffer by email at

You can follow News Sniffer on Twitter too, @news_sniffer

The code for News Sniffer is released as free and open source and is available on GitHub.
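The following is not News Sniffer's implementation; it is a minimal Python sketch of the basic idea of showing what changed between two stored versions of an article, using the standard library's difflib. The function name and sample text are hypothetical.

```python
import difflib


def changed_lines(old: str, new: str) -> list:
    """Return the removed ('-') and added ('+') lines between two versions."""
    diff = list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="before", tofile="after", lineterm="",
    ))
    # diff[0] and diff[1] are the "--- before" / "+++ after" headers; hunk
    # headers start with "@@" and unchanged context lines with a space.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


old = "Minister resigns.\nMore to follow."
new = "Minister denies resigning.\nMore to follow."
print(changed_lines(old, new))
# ['-Minister resigns.', '+Minister denies resigning.']
```

Identical versions produce an empty list, so the same function doubles as a simple change detector.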
news  journalism  tools  archiving 
october 2015 by juliusbeezer
How Canada's Tories destroyed the country's memory, and its capacity to remember / Boing Boing
Canada has a new underground of scientists and statisticians and wonks who've founded a movement called LOCKSS -- "Lots of copies, keep stuff safe" -- who make their own archives of disappeared data, from the libraries of one-of-a-kind docs that have been literally incinerated or sent to dumpsters to the websites that vanish without notice. There's an election this October -- perhaps we can call on them then to restore the country's lost memory.
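The "lots of copies" idea can be illustrated with a simplified audit: hash every copy, treat the majority digest as correct, and flag the copies that disagree so they can be repaired from a good one. This Python sketch is a hypothetical illustration, not the actual LOCKSS polling protocol; the site names are invented, and it assumes a clear majority exists (ties are not resolved meaningfully).

```python
import hashlib
from collections import Counter


def audit_copies(copies: dict):
    """Return (majority content, sorted names of copies that disagree with it)."""
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in copies.items()}
    # The most common digest is taken to be the undamaged version.
    majority, _ = Counter(digests.values()).most_common(1)[0]
    good = next(copies[name] for name, d in digests.items() if d == majority)
    damaged = sorted(name for name, d in digests.items() if d != majority)
    return good, damaged


copies = {
    "ottawa": b"survey 2010",
    "toronto": b"survey 2010",
    "vancouver": b"survey 2O10",  # silently corrupted copy
}
print(audit_copies(copies))  # (b'survey 2010', ['vancouver'])
```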
archiving  library  economics  science  openness 
september 2015 by juliusbeezer
IRUS-UK is a national aggregation service which contains details of all content downloaded from participating UK institutional repositories (IRs). It follows on from the successful PIRUS2 project, which demonstrated how COUNTER-compliant article-level usage statistics could be collected and consolidated from publishers and institutional repositories. IRUS-UK is a Jisc-funded repository and infrastructure service.
repositories  archiving  openaccess  opendata 
september 2015 by juliusbeezer
When using an archive could put it in danger | Webstory: Peter Webster's blog
“At some point after the content in question was removed from the original website, the [Conservative] party added the content in question to their robots.txt file. As the practice of the Internet Archive is to observe robots.txt retrospectively, it began to withhold its copies, which had been made before the party implemented robots.txt on the archive of speeches. Since then, the party has reversed that decision, and the Internet Archive copies are live once again.

As public engagement lead for the UK Web Archive at the time, I was happily able to use the episode to draw attention to holdings of the same content in UKWA that were not retrospectively affected by a change to the robots.txt of the original site.
archiving  agnotology  censorship  privacy 
august 2015 by juliusbeezer
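The retrospective robots.txt practice described in this entry can be sketched with Python's standard `urllib.robotparser`. The sketch applies the site's *current* rules to captures made before those rules existed, so older copies are withheld as well. The domain, paths, capture timestamps and crawler name `ExampleArchiveBot` are all hypothetical, and this illustrates the policy only. It is not the Internet Archive's code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt added to the site *after* the speeches were archived.
current_robots = [
    "User-agent: *",
    "Disallow: /speeches/",
]

# Hypothetical archived captures: (original URL, capture timestamp).
captures = [
    ("https://party.example/speeches/2009-conference.html", "20091008120000"),
    ("https://party.example/speeches/2010-manifesto.html", "20100412093000"),
    ("https://party.example/about.html", "20120301080000"),
]

parser = RobotFileParser()
parser.parse(current_robots)

# Retrospective policy: every capture is checked against today's rules,
# however long ago it was made.
agent = "ExampleArchiveBot"  # hypothetical crawler name
visible = [(url, ts) for url, ts in captures if parser.can_fetch(agent, url)]
withheld = [(url, ts) for url, ts in captures if not parser.can_fetch(agent, url)]

for url, ts in withheld:
    print("withheld:", ts, url)
```

An archive that evaluated rules as of each capture's date would keep both speeches visible, which is the contrast the UK Web Archive example draws.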
ML: About the Macaulay Library
Scientists worldwide use our audio and video recordings to better understand and preserve our planet. Teachers use our sounds and videos to illustrate the natural world and create exciting interactive learning opportunities. We help others depict nature accurately and bring the wonders of animal behavior to the widest possible audience. It is an invaluable resource at your fingertips.
sound  archiving 
august 2015 by juliusbeezer
Freesound - help - Frequently Asked Questions
Freesound aims to create a huge collaborative database of audio snippets, samples, recordings, bleeps, ... released under Creative Commons licenses that allow their reuse. Freesound provides new and interesting ways of accessing these samples, allowing users to:

* browse the sounds in new ways using keywords, a "sounds-like" type of browsing and more
* upload and download sounds to and from the database, under the same Creative Commons license
* interact with fellow sound-artists!

We also aim to create an open database of sounds that can also be used for scientific research. Many audio research institutions have trouble finding correctly licensed audio to test their algorithms. Many have voiced this problem, but so far there hasn't been a solution.
sound  archiving 
august 2015 by juliusbeezer
In the kingdom of the bored, the one-armed bandit is king | ROUGH TYPE
The machine zone is where we spend much of our time these days. It extends well beyond the traditional diversions of media and entertainment and gaming. The machine zone surrounds us. You go for a walk, and you find that what inspires you is not the scenery or the fresh air or the physical pleasure of the exercise, but rather the mounting step count on your smartphone’s exercise app. “If I go just a little farther,” you tell yourself, glancing yet again at the interface, “the app will reward me with a badge.” The mechanism is more than beguiling. The mechanism knows you, and it cares about you. You give it your attention, and it tells you that your attention has not been wasted.
attention  screwmeneutics  goldsmith  drummond  music  archiving 
july 2015 by juliusbeezer
It’s a Mistake to Mistake Content for Content - The Los Angeles Review of Books
“The Complete Works of Morton Feldman.” I was surprised to see it there; I didn’t remember downloading it. Curious, I looked at its date — 2009 — and realized that I must’ve grabbed it during the heyday of MP3 sharity blogs. I opened it to find 79 albums as zipped files. I unzipped three of them, listened to part of one, and closed the folder. I haven’t opened it since.

My experience with Feldman indicates how, in a time when cultural artifacts are abundantly available, our primary focus has migrated from use to acquisition; I have more MP3s than I’ll ever be able to listen to in the next 10 lifetimes, yet I compulsively keep downloading more. In this way our role as librarians and archivists has outpaced our role as cultural consumers.
goldsmith  art  archiving  screwmeneutics 
july 2015 by juliusbeezer
Personal web pages on digital repositories. « Henry Rzepa
Symplectic... allows a researcher to upload the final accepted version of a manuscript. At Imperial College, a digital repository called Spiral serves this purpose and also acts as the front end for collecting informative metadata to enhance discoverability. The final accepted version is then converted by the publisher into a version-of-record. This contains styling unique to the publisher and the content is subjected to further scrutiny by the authors as proof corrections. In an ideal world, these latter changes should also be faithfully propagated back to the final accepted version, as would all the supporting information associated with the article...
I became concerned about the existence of two versions of any given scientific report and that the task of ensuring total fidelity in the content of both versions may negatively impact on the author’s time. Much better if the publisher could grant permission for the author to archive the version-of-record into a digital repository...
In an afternoon I had processed most of my ROMEO green articles. You know how it is sometimes, you do not read the fine print! And so the library soon informed me that archival of ROMEO GREEN was in fact only permitted on the author’s “personal web page”. Spiral, as an institutional repository, does not apparently constitute a personal web page for me and so none of my Symplectic submissions could be accepted for archival there... A repository is designed to hold metadata in a formal and standards-based manner and metadata helps achieve FAIR (findable, accessible, interoperable, reusable). So I asked the Royal Society of Chemistry (as a ROMEO GREEN publisher) whether a personal web page hosted on a digital repository would qualify. I was soon informed that I had proposed a neat solution here, and they couldn’t see an issue.
openaccess  archiving  repositories 
june 2015 by juliusbeezer
British spies betrayed to Russians and Chinese -
A senior Downing Street source said: “It is the case that Russians and Chinese have information. It has meant agents have had to be moved and that knowledge of how we operate has stopped us getting vital information. There is no evidence of anyone being harmed.”
surveillance  privacy  journalism  archiving 
june 2015 by juliusbeezer
Rehashing PIDs without stabbing myself in the eyeball | CrossTech
About once a year somebody suggests that we could replace existing persistent citation identifiers (e.g. DOIs) with some new technology that would fix some of the weaknesses of the current systems. Usually said person is unhappy that current systems like DOI, Handle, Ark,, etc. depend largely on a social element to update the pointers between the identifier and the current location of the resource being identified.
scholarly  archiving  repositories  identity  linkrot 
june 2015 by juliusbeezer
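The "social element" mentioned here is the upkeep of the mapping from identifier to current location. A minimal sketch makes the point. It uses a hypothetical in-memory registry, not the DOI or Handle APIs, and a hypothetical `10.9999` prefix. The identifier itself never changes. Whether a citation keeps working depends entirely on someone calling the update step when content moves.

```python
class Registry:
    """Hypothetical identifier-to-location table, in the spirit of DOI/Handle."""

    def __init__(self) -> None:
        self._targets: dict[str, str] = {}

    def register(self, identifier: str, url: str) -> None:
        self._targets[identifier] = url

    def update(self, identifier: str, new_url: str) -> None:
        # The "social" step: a publisher or repository must remember to do this.
        if identifier not in self._targets:
            raise KeyError(identifier)
        self._targets[identifier] = new_url

    def resolve(self, identifier: str) -> str:
        return self._targets[identifier]


registry = Registry()
registry.register("10.9999/example.123", "https://old-host.example/paper.pdf")

# The publisher migrates platforms; the citation stays valid only if this update happens.
registry.update("10.9999/example.123", "https://new-host.example/articles/123")
print(registry.resolve("10.9999/example.123"))
```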
How much storage do CERN have available?

Zenodo is currently a drop in the ocean. CERN currently stores more than 100PB of physics data from the Large Hadron Collider (LHC), and produces roughly 25PB per year when the LHC is running.
tools  archiving  scholarly 
may 2015 by juliusbeezer
Science Europe Posts “New Principles on Open Access Publisher Services” | LJ INFOdocket
four new common principles on Open Access Publisher Services. The Principles, which were prepared by Science Europe’s Working Group on Open Access to Scientific Publications,
sciencepublishing  openaccess  archiving 
april 2015 by juliusbeezer
Locking the Web Open: Rethinking the World Wide Web
In part, it would be based on peer-to-peer technologies — systems that aren’t dependent on a central host or the policies of one particular country. In peer-to-peer models, those who are using the distributed Web are also providing some of the bandwidth and storage to run it. Instead of one web server per website we would have many. The more people or organizations that are involved in the distributed Web, the safer and faster it will become.
internet  archiving  linkrot  censorship 
april 2015 by juliusbeezer
Planned Obsolescence | Publishing, Technology, and the Future of the Academy | Books | NYU Press
Academic institutions are facing a crisis in scholarly publishing at multiple levels: presses are stressed as never before, library budgets are squeezed, faculty are having difficulty publishing their work, and promotion and tenure committees are facing a range of new ways of working without a clear sense of how to understand and evaluate them.

Planned Obsolescence is both a provocation to think more broadly about the academy’s future and an argument for reconceiving that future in more communally-oriented ways. Facing these issues head-on, Kathleen Fitzpatrick focuses on the technological changes—especially greater utilization of internet publication technologies, including digital archives, social networking tools, and multimedia—necessary to allow academic publishing to thrive into the future. But she goes further, insisting that the key issues that must be addressed are social and institutional in origin.
sciencepublishing  scholarly  digitalhumanities  archiving  publishing 
april 2015 by juliusbeezer
Science in the Open » Blog Archive » Remembering Jean-Claude Bradley
Today I often struggle to sympathise with other people’s fears of what might happen if they open up, precisely because Jean-Claude forced me to confront those fears early on and showed me how ill founded many of them are...

In his constant quest to get more of the research process online as fast as possible Jean-Claude would grab whatever tools were to hand. Wiki platforms, YouTube, Blogger, SecondLife, Preprint servers, GoogleDocs and innumerable other tools were grasped and forced into service, linked together to create a web, a network of information and resources. Sometimes these worked and sometimes they didn’t...

In the hours after we heard the news [of his death] I didn’t realise the importance of preserving Jean-Claude’s work. I think it's important to recognise that it was information management professionals who immediately realised both the importance of preservation and the risks to the record and set in motion the processes necessary to start that work. I remain, like most researchers I suspect, sloppy and lazy about proper preservation, and we need the support of professionals who understand the issues and technical challenges, but also are engaged with preservation of works and media outside the scholarly mainstream if science that is truly on the web is to have a lasting impact. The role of a research institution, if it is to have one in the future, is in part to provide that support, literally to institutionalise the preservation of digital scholarship.
archiving  openscience  opennotebook  sciencepublishing 
april 2015 by juliusbeezer
Trustworthiness: Self-assessment of an Institutional Repository against ISO 16363-2012
Preserving digital objects is more challenging than preserving items on paper. Hardware becomes obsolete, new software replaces old, storage media degrades. In recent years, there has been significant progress made to develop tools and standards to preserve digital media, particularly in the context of institutional repositories. The most widely accepted standard thus far is the Trustworthy Repositories Audit and Certification: Criteria and Checklist (TRAC), which evolved into ISO 16363-2012. Deakin University Library undertook a self-assessment against the ISO 16363 criteria.
repositories  linkrot  library  archiving 
march 2015 by juliusbeezer
The Winnower | DIY Scientific Publishing
This claim that self-archiving is a quick and easy task is one green OA advocates frequently make. Like HEFCE, they insist that depositing a paper in an institutional repository can be undertaken by authors in minutes — or more precisely 40 minutes per year for a highly active researcher.

More recently, a survey commissioned by SPARC Europe and London Higher estimated that it takes more like 48 minutes per output to deposit works in a repository. However, time is not the main issue here: self-archiving is proving sufficiently complicated that the task is having to be handed off to intermediary librarians trained in copyright and metadata issues, rather than being done by researchers themselves.
openaccess  archiving  repositories  finance 
march 2015 by juliusbeezer
hiberlink - Developments
The Hiberlink plugin for Zotero assists authors in proactively archiving at-risk digital content. Zotero is a free, easy-to-use tool that helps with collecting, organising, citing and sharing research sources.

The Hiberlink plugin creates a backup of the reference with an archival service and keeps a record of this backup. Should the original resource change or disappear the plugin allows you to view the content as it was when you referenced it. The archived references can be exported and used to refer to the original resource in other publications. These are compatible with the Memento protocol.
linkrot  zotero  archiving  scholarly 
february 2015 by juliusbeezer
DSHR's Blog: The Evanescent Web
Papers drawing attention to the decay of links in academic papers have quite a history; I blogged about three relatively early ones six years ago. Now Martin Klein and a team from the Hiberlink project have taken the genre to a whole new level with a paper in PLoS One entitled Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. Their dataset is 2-3 orders of magnitude bigger than previous studies, their methods are far more sophisticated, and they study both link rot (links that no longer resolve) and content drift (links that now point to different content). There's a summary on the LSE's blog.

Below the fold, some thoughts on the Klein et al paper.

As regards link rot, they write:
linkrot  archiving  internet  scholarly 
february 2015 by juliusbeezer
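The two failure modes distinguished by Klein et al. can be written as a small classification. This sketch assumes that a SHA-256 hash of the page was stored at citation time and that the current HTTP status and body can be observed. The function name `classify_reference` is hypothetical. An exact hash flags even trivial changes (ads, timestamps) as drift, and real studies use more tolerant similarity measures. Treat this as an illustration of the definitions, not of the paper's method.

```python
import hashlib
from typing import Optional


def classify_reference(cited_sha256: str, status: Optional[int], body: Optional[bytes]) -> str:
    """Classify a cited URL as 'ok', 'link rot' or 'content drift'.

    status is None when the request failed entirely (e.g. a DNS error).
    """
    if status is None or status >= 400 or body is None:
        return "link rot"  # the link no longer resolves
    if hashlib.sha256(body).hexdigest() != cited_sha256:
        return "content drift"  # it resolves, but to different content
    return "ok"


original = b"<html>Dataset v1</html>"
cited = hashlib.sha256(original).hexdigest()
print(classify_reference(cited, 200, original))  # ok
print(classify_reference(cited, 200, b"<html>Domain for sale</html>"))  # content drift
print(classify_reference(cited, 404, b"Not found"))  # link rot
```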
On "Diamond OA," "Platinum OA," "Titanium OA," and "Overlay-Journal OA," Again - Open Access Archivangelism
I have a longstanding problem with the term "overlay journal" that I have rehearsed before. Overlay of what on what?

The notion of an "overlay journal" was first floated by Ginsparg for Arxiv. Arxiv contains authors' unrefereed, unpublished preprints and then their refereed, published postprints. Ginsparg said that eventually journals could turn into "overlays" on the Arxiv deposits, corresponding roughly to the transition from preprint to postprint. The "overlay" would consist of the peer review, revision, and then the journal title as the "tag" certifying the officially accepted version.

But in that sense, all Gold OA journals are "overlay journals" once they have phased out their print edition:

The "overlay" of the peer review service and then the tagging of the officially accepted version could be over a central repository, over distributed institutional repositories, or over the publisher's (OA) website.

Even a non-OA subscription journal would be an "overlay" journal if it had phased out its print edition: The peer review and certification tag would simply be an "overlay" on an online version, regardless of where it was located, and even regardless of whether it was OA or non-OA. (Once we get this far, we see that even for print journals the peer review and certification is just an "overlay").

What I think this reveals is that in the online era (and especially the OA era) the notion of "overlay" is completely redundant: Once we note that the print edition was just a technical detail of the Gutenberg era, we realize that journal publishing consists (and always implicitly consisted) of two components: access-provision and quality-control/certification (peer-review/editing). The latter is always an "overlay" on the former. And once the print edition is gone, it's an overlay on a digital template that can be here, there or everywhere. It is simply a tagged digital file.
overlay  sciencepublishing  openaccess  archiving  repositories 
february 2015 by juliusbeezer
PeerJ Grows Steadily With Papers, Authors | The Scholarly Kitchen
If you ask a scientist if they would publish in PeerJ or any other OA journal that asks anything more than 100 US$ to publish one PDF file if they had to pay from their own pocket (and I don’t mean from their own lab funds, I really mean from their own hard-worked salary), I think you might find an overwhelmingly universal no. In other words, it is precisely because global academia, especially from developed countries, is built up so incorrectly, providing funding willy-nilly to pay for these ridiculous OA fees, that we are seeing anger, frustration and chaos building up. It is precisely because academic institutions have been brain-washed (by the publishers?) into thinking that if they don’t offer their scientists often vast reserves of money to pay for OA fees then their reputations will suffer a serious knock, that we are seeing this constant gaming and reassessing of the risks to profitability by OA journals like PeerJ which are, to me, nothing more than experimental ventures rather than true academic bodies. It is only the elite managers and marketing fellas at the top of the food chain that are always seeing things in a rosy light. This is nothing less than a classical class war of modern times. And at some point, there will be revolt
sciencepublishing  openaccess  scholarly  archiving  repositories  commenting 
february 2015 by juliusbeezer
Never trust a corporation to do a library’s job — The Message — Medium
In 2001, Google made their first acquisition, the Deja archives. The largest collection of Usenet archives, it was relaunched by Google as Google Groups, supplemented with archived messages going back to 1981.
In 2006, Google News Archive launched, with historical news articles dating back 200 years. In 2008, they expanded it to include their own digitization efforts, scanning newspapers that were never online.

In the last five years, starting around 2010, the shifting priorities of Google’s management left these archival projects in limbo, or abandoned entirely.

After a series of redesigns, Google Groups is effectively dead for research purposes. The archives, while still online, have no means of searching by date.
google  search  archiving  internet 
january 2015 by juliusbeezer
Library workers under scrutiny | Local | The Register-Guard | Eugene, Oregon
Two University of Oregon librarians — who likely are also UO archivists — are under investigation in the leak of 22,000 documents sent and received by UO presidents between 2010 and 2014.

The presidential documents were placed in the library’s open archives without redacting student names, which the university assumes violates the Family Educational Rights and Privacy Act.

In early December, an unnamed professor requested — and received — a copy of the archives. So far, he has released only one document, which contained no student names but a revelation about an administrative proposal to disband the University Senate.

The university gave the professor until 5 p.m. Thursday to return the electronic file containing the trove of documents.
archiving  UO  library  security  confidentiality 
january 2015 by juliusbeezer
What the Web Said Yesterday
the Wayback Machine saved a screenshot of Strelkov’s VKontakte post about downing a plane. Two hours and twenty-two minutes later, Arthur Bright, the Europe editor of the Christian Science Monitor, tweeted a picture of the screenshot, along with the message “Grab of Donetsk militant Strelkov’s claim of downing what appears to have been MH17.” By then, Strelkov’s VKontakte page had already been edited: the claim about shooting down a plane was deleted. The only real evidence of the original claim lies in the Wayback Machine.…
When Kahle started the Internet Archive, in 1996, in his attic, he gave everyone working with him a book called “The Vanished Library,” about the burning of the Library of Alexandria. “The idea is to build the Library of Alexandria Two,” he told me. (The Hellenism goes further: there’s a *partial* backup of the Internet Archive in Alexandria, Egypt.)…
The Wayback Machine is humongous, and getting humongouser. You can’t search it the way you can search the Web, because it’s too big and what’s in there isn’t sorted, or indexed, or catalogued in any of the many ways in which a paper archive is organized; it’s not ordered in any way at all, except by URL and by date. To use it, all you can do is type in a URL, and choose the date for it that you’d like to look at. It’s more like a phone book than like an archive. Also, it’s riddled with errors.…
The footnote problem, though, stands a good chance of being fixed. Last year, a tool called was launched. It was developed by the Harvard Library Innovation Lab, and its founding supporters included more than sixty law-school libraries, along with the Harvard Berkman Center for Internet and Society, the Internet Archive, the Legal Information Preservation Alliance, and the Digital Public Library of America. promises “to create citation links that will never break.” It works something like the Wayback Machine’s “Save Page Now.” If you’re writing a scholarly paper and want to use a link in your footnotes, you can create an archived version of the page you’re linking to, a “permalink,” and anyone later reading your footnotes will, when clicking on that link, be brought to the permanently archived version.
internet  archiving  repositories  linkrot 
january 2015 by juliusbeezer
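The Wayback Machine lookup described above ("type in a URL, and choose the date") can be sketched as picking the capture nearest to a requested date. The sketch then builds a replay address of the form `https://web.archive.org/web/<14-digit timestamp>/<original URL>`. The capture list is hypothetical. In practice the archive, or a Memento TimeGate, does this selection on the server side.

```python
from datetime import datetime

WAYBACK_FORMAT = "%Y%m%d%H%M%S"  # 14-digit Wayback-style timestamp


def nearest_capture(captures: list[str], wanted: datetime) -> str:
    """Return the capture timestamp closest in time to `wanted`."""
    return min(
        captures,
        key=lambda ts: abs(datetime.strptime(ts, WAYBACK_FORMAT) - wanted),
    )


def replay_url(original_url: str, timestamp: str) -> str:
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


# Hypothetical capture timestamps for a single page.
captures = ["20140716090000", "20140717153000", "20140720110000"]
ts = nearest_capture(captures, datetime(2014, 7, 17, 18, 0))
print(replay_url("https://example.org/post", ts))
# https://web.archive.org/web/20140717153000/https://example.org/post
```

The only keys are URL and time, which is why the article compares the archive to a phone book rather than a catalogue.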
arXiv Update - January 2015 - CUL Public Wiki - Confluence
arXiv's sustainability plan is founded on and presents a business model for generating revenues. Cornell University Library (CUL), the Simons Foundation, and a global collective of institutional members support arXiv financially. The financial model for 2013-2017 entails three sources of revenues:

CUL provides a cash subsidy of $75,000 per year in support of arXiv's operational costs. In addition, CUL makes an in-kind contribution of all indirect costs, which currently represents 37% of total operating expenses.
The Simons Foundation contributes $50,000 per year in recognition of CUL's stewardship of arXiv. In addition, the Foundation matches $300,000 per year of the funds generated through arXiv membership fees.
Each member institution pledges a five-year funding commitment to support arXiv. Based on institutional usage ranking, the annual fees are set in four tiers from $1,500-$3,000.

In 2014, Cornell raised approximately $341,000 through membership fees from 183 institutions and the total revenue (including CUL and Simons Foundation direct contributions) is around $766,000. We are grateful for Simons Foundation's support. The gift has encouraged long-term community support by lowering arXiv membership fees, making participation affordable to a broader range of institutions. This model aims to ensure that the ultimate responsibility for sustaining arXiv remains with the research communities and institutions that benefit from the service most directly.
arxiv  archiving  repositories  preprint  openaccess  scholarly  oa 
january 2015 by juliusbeezer
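As a quick consistency check on the figures quoted in this entry, the four direct revenue sources for 2014 add up to the stated total. The CUL in-kind contribution of indirect costs is not part of this sum. Dividing membership income by the number of institutions gives an average fee that falls within the quoted $1,500 to $3,000 tier range.

```latex
\underbrace{75{,}000}_{\text{CUL cash}}
+ \underbrace{50{,}000}_{\text{Simons gift}}
+ \underbrace{300{,}000}_{\text{Simons match}}
+ \underbrace{341{,}000}_{\text{membership fees}}
= 766{,}000\ \text{USD},
\qquad
\frac{341{,}000}{183} \approx 1{,}863\ \text{USD per institution}
```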
My thoughts on Generation Open - Ross Mounce
3) What can librarians do to support ECRs in regards to being open?

Go out into departments and speak to people. Give energetic presentations in collaboration with an enthusiastic researcher in that department (sometimes a librarian alone just won’t get listened to). Academics sorely need to know:

* the cost of academic journal subscriptions
* that using journal impact factors to assess an individual’s research is statistically illiterate practice
* the cost of re-using non-open research papers for teaching purposes (licencing)
* What Creative Commons licences are, and why CC BY or CC0 are best for open access
* new research tools that support open research: Zenodo, Dryad, Github, Sparrho, WriteLatex etc…
openaccess  politics  library  archiving  repositories 
december 2014 by juliusbeezer
How Bitcoin's Block Chain Could Stop History Being Rewritten
Assange found a solution in the evolving block-chain technology, which offers a decentralized alternative to centralized time stamping. Centralized time stamping requires trust in a central authority, which leaves it susceptible to third-party alteration and intervention.

Bitcoin’s distributed trust network can offer immunity from central control of any historical record. Assange described the basic premise of this technology as a network of consensus where “you can prove a particular statement, particular consensus and particular contract that happened at a particular time globally and it requires the subversion of every single jurisdiction where people are running bitcoin to overturn that”.
bitcoin  archiving  attention  internet  history  wikileaks 
october 2014 by juliusbeezer
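Below is a minimal sketch of why a chained record resists quiet rewriting. Each entry stores the hash of the previous entry, so altering an earlier entry breaks verification for the chain. The sketch shows only the hash-chaining part. It leaves out Bitcoin's distributed proof-of-work consensus, which stops a forger from simply recomputing every hash and is what makes rewriting require "the subversion of every single jurisdiction". The records are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # fixed starting value for the first link


def entry_hash(prev_hash: str, record: str) -> str:
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records: list[str]) -> list[dict]:
    chain, prev = [], GENESIS
    for record in records:
        h = entry_hash(prev, record)
        chain.append({"prev": prev, "record": record, "hash": h})
        prev = h
    return chain


def verify(chain: list[dict]) -> bool:
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry_hash(prev, entry["record"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


chain = build_chain(["2014-10-01 statement A", "2014-10-02 statement B"])
print(verify(chain))  # True
chain[0]["record"] = "2014-10-01 statement A (edited)"
print(verify(chain))  # False
```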
FAQ | is a global registry of research data repositories. The registry covers research data repositories from different academic disciplines. presents repositories for the permanent storage and access of data sets to researchers, funding bodies, publishers and scholarly institutions. aims to promote a culture of sharing, increased access and better visibility of research data.
repositories  openaccess  opendata  archiving 
september 2014 by juliusbeezer
References, Please by Tim Parks | NYRblog | The New York Review of Books
There is, in short, an absolutely false, energy-consuming, nit-picking attachment to an outdated procedure that now has much more to do with the sad psychology of academe than with the need to guarantee that the research is serious. By all means, on those occasions where a book exists only in paper and where no details about it are available online, then let us use the traditional footnote. Otherwise, why not wipe the slate clean, start again, and find the simplest possible protocol for ensuring that a reader can check a quotation.
citation  scholarly  reference  archiving 
september 2014 by juliusbeezer
Asserting Rights We Don’t Have: Libraries and “Permission to Publish” | Peer to Peer Review
Special rules for special collections:

"these strike me as remarkably weak justifications for imposing an entirely artificial restriction on our patrons’ legal reuse of public domain content—for acting, in short, as if our ownership of physical copies of these documents entitles us to limit and control the use of those documents’ intellectual content. Again: this isn’t an issue of legal rights, strictly speaking—if we own a copy of a document, there’s nothing technically illegal about denying someone physical access to that copy for any reason we care to come up with, even if the document’s intellectual content is in the public domain. This is an issue of professional standards and ethics. As a profession that proclaims loudly and often its support for the free and open sharing of information—one that, in fact, regularly calls for the free distribution and unrestricted reuse of documents arising from publicly funded research—how can we, with a straight face, make people ask our permission to exercise the rights of redistribution and reuse that the law provides them, whether for private, public, commercial, or noncommercial purposes? And as a profession that proclaims its support for principles of intellectual freedom, how can we justify asking patrons to tell us, ahead of time, in what kind of publications and for what purposes they intend to republish public domain content
library  archiving  copyright  publishing 
september 2014 by juliusbeezer
The University of Openess | Conversational Aesthetics
Saul's current home page offers a nice way into old UO material at now hosted at
education  archiving  UO 
august 2014 by juliusbeezer
The Système pour l’Information en Littérature Grise en Europe (System for Information on Grey Literature in Europe) gives you free access to 700,000 bibliographic references for paper grey literature produced in Europe, and makes the documents easier to obtain through export and by showing where they are held.

Grey literature includes technical or research reports, doctoral theses, conference proceedings, official publications, etc.
sciencepublishing  scholarly  eu  archiving  preprint  openaccess 
july 2014 by juliusbeezer
help - Frequently Asked Questions on Public Statistics
the very availability of statistics could have many undesirable consequences. Some segment of submitters might find it highly embarrassing to see how little read is (and consequently how little impact has) their magnum opus on which they've expended so much quality time and effort. Furthermore, we don't wish to encourage people to try somehow to distort public statistics in their favor either by self-generating large numbers of requests, or by repeated replacement of their submission to generate multiple repeat requests.
altmetrics  archiving  arxiv 
july 2014 by juliusbeezer
Which preprint server should I use? | Jabberwocky Ecology | The Weecology Blog
Preprints are rapidly becoming popular in biology as a way to speed up the process of science, get feedback on manuscripts prior to publication, and establish precedence (Desjardins-Proulx et al. 2013). Since biologists are still learning about preprints I regularly get asked which of the available preprint servers to use. Here’s the long-form version of my response.
sciencepublishing  archiving 
july 2014 by juliusbeezer
How to do Twitter research on a shoestring | Poynter.
That source of reliable, inexpensive online access to the Twitter firehose has become almost a Holy Grail for journalism professors in the U.S. and Canada who I surveyed this June using a Google form.
twitter  search  archiving 
june 2014 by juliusbeezer
Thanks to WikiLeaks, public can debate alarming new trade deal | Al Jazeera America
Most of us give no thought to recordkeeping. But having locally available, verifiable and reliable records has played a major and underappreciated role in the creation of wealth, the reduction of violence and making sure the guilty are convicted while the innocent go free.

Requiring business records and setting rules for what they must contain date to before the Code of Hammurabi nearly four millennia ago, as I teach my Syracuse University College of Law students. Standards pertaining to business records were so important that they are found in seven of Hammurabi’s first 12 codes. More appear later among the 282 rules, some of which have been lost to time.

Hammurabi’s Code drew on several thousand years of experience, but the issues remain relevant to banks, insurers, trust companies and warehousing.
politics  law  tisa  wikileaks  security  economics  business  agnotology  history  archiving 
june 2014 by juliusbeezer
15-Second Ultrafast Charging For | CleanTechnica
TOSA, or Trolleybus Optimisation Système Alimentation, is a pilot project launched in Geneva: an electric bus that recharges itself every few stops at special stations. Developed by ABB, the pilot project aims to uncover the most efficient and least costly way to adapt mass transit to vehicle electrification. ABB is also working on software to help cities determine the optimal bus routes and locations for the quick-charge stations; at each one, a connector pops out of the bus’s roof and attaches to a special charger. The potential for this electric bus to save many metro authorities big bucks can’t be overstated.

The quick-charge system is enough to get the 133-passenger bus through a few stops, and Geneva anticipates adopting the system as part of its full-time transit system by 2017. While other bus companies are focused on driving far on as few recharges as possible, the high cost of the big batteries may make such buses too costly for some smaller cities. But the TOSA can have a smaller battery thanks to the quick charge system, keeping costs down, without adding unnecessary downtime for charging.
transport  antiscrape  archiving  alternatif  renewables 
june 2014 by juliusbeezer
Opening the Books: Arfon Smith on How Easy Peer Review can Turn Repositories into Journals
For Arfon Smith, a co-founder of the citizen science site Zooniverse and now resident “science guy” at Github, the question he and a few collaborators wanted to answer was more basic—what makes a journal article, and how can that process be simplified? As an answer, or at least the beginning of one, Smith and his colleagues built The Open Journal, a software prototype built on top of the arXiv, an open physics research repository, that lets researchers and peer reviewers discuss edits inside a browser-based system and publish the results. The result eschews publishers—and copyediting, typesetting, and marketing—altogether, letting researchers bring their work directly to readers with a minimum of fuss. Smith spoke with LJ about the issues The Open Journal can address for researchers and how the model could develop in the future.
openaccess  archiving  overlay  sciencepublishing  scholarly  repositories 
june 2014 by juliusbeezer
Melissa Terras' Blog: Digitisation's Most Wanted
Last month, at a meeting at the National Library of Scotland, an interesting fact flew by me. The NLS has hundreds of thousands of digitised items online, so what do you think is the most popular, and most regularly accessed and/or downloaded? (On most sites it is difficult to distinguish between accessed and downloaded.) Is it the original Robert Burns material? The last letter of Mary Queen of Scots? Or any of the 86,000 maps held in this, one of the best map collections worldwide? No. It is "A grammar and dictionary of the Malay language : with a preliminary dissertation" by John Crawfurd, published in 1852. This is accessed by hundreds of people every month, mostly from Malaysia, partly because it is featured on many product pages providing definitions of Malaysian words. It demonstrates the surprising reach and potential of digitising items and making them freely available online, reaching a worldwide audience far beyond the geographical locale of the library itself. Wonderful.
archiving  repositories  altmetrics  internet  socialmedia 
may 2014 by juliusbeezer
Home Page — spreads 0.4 documentation
spreads is a tool that aims to streamline your book scanning workflow. It takes care of every step: Setting up your capturing devices, handling the capturing process, downloading the images to your machine, post-processing them and finally assembling a variety of output formats.
tools  archiving  bookselling  ebooks 
may 2014 by juliusbeezer
SciTechSociety: Sustainable Long-Term Digital Archives
How do we build long-term digital archives that are economically sustainable and technologically scalable? We could start by building five essential components: selection, submission, preservation, retrieval, and decoding.

Selection may be the least amenable to automation and the least scalable, because the decision whether or not to archive something is a tentative judgment call. Yet, it is a judgment driven by economic factors. When archiving is expensive, content must be carefully vetted. When archiving is cheap, the time and effort spent on selection may cost more than archiving rejected content.
may 2014 by juliusbeezer
The novel is dead (this time it's for real) | Books | The Guardian
I believe the serious novel will continue to be written and read, but it will be an art form on a par with easel painting or classical music: confined to a defined social and demographic group, requiring a degree of subsidy, a subject for historical scholarship rather than public discourse.
writing  literature  archiving 
may 2014 by juliusbeezer
[1309.4016] Who and What Links to the Internet Archive
The Internet Archive's (IA) Wayback Machine is the largest and oldest public web archive and has become a significant repository of our recent history and cultural heritage. Despite its importance, there has been little research about how it is discovered and used. Based on web access logs, we analyze what users are looking for, why they come to IA, where they come from, and how pages link to IA. We find that users request English pages the most, followed by the European languages. Most human users come to web archives because they do not find the requested pages on the live web. About 65% of the requested archived pages no longer exist on the live web. We find that more than 82% of human sessions connect to the Wayback Machine via referrals from other web sites, while only 15% of robots have referrers. Most of the links (86%) from websites are to individual archived pages at specific points in time, and of those 83% no longer exist on the live web.
archiving  linkrot 
april 2014 by juliusbeezer
Keeping Bits Safe - ACM Queue
Our inability to compute how many backup copies we need to achieve a reliability target is something we're just going to have to live with. In practice, we aren't going to have enough backup copies, and stuff will get lost or damaged. This should not be a surprise, but somehow it is. The fact that bits can be copied correctly leads to an expectation that they always will be copied correctly, and then to an expectation that digital storage will be reliable. There is an odd cognitive dissonance between this and people's actual experience of digital storage, which is that loss and damage are routine occurrences.[22]

The fact that storage isn't reliable enough to allow us to ignore the problem of failures is just one aspect of a much bigger problem looming over computing as it continues to scale up. Current long-running petascale high-performance computer applications require complex and expensive checkpoint and restart schemes, because the probability of a failure during execution is so high that restarting from scratch is infeasible. This approach will not scale to the forthcoming generation:
archiving  repositories  linkrot  internet 
april 2014 by juliusbeezer
DSHR's Blog: A Petabyte For A Century
I started using the example of keeping a petabyte of data for a century to illustrate the problem. This post expands on my argument.

Let's start by assuming an organization has a petabyte of data that will be needed in 100 years. They want to buy a preservation system good enough that there will be a 50% chance that at the end of the 100 years every bit in the petabyte will have survived undamaged. This requirement sounds reasonable, but it is actually very challenging.
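A back-of-envelope sketch of why the requirement is so demanding, assuming (my simplification, not necessarily the post's exact model) that every bit fails independently with a constant per-bit half-life h, and that 1 PB = 10^15 bytes = 8×10^15 bits. Each bit then survives t years with probability 2^(-t/h), so all N bits survive with probability 2^(-Nt/h); setting that equal to the target p and solving gives h = Nt / log2(1/p). The function name and constants below are illustrative.

```python
import math

def required_bit_half_life(n_bits: float, years: float, p_all_survive: float) -> float:
    """Per-bit half-life (years) needed for all n_bits to survive `years`
    with probability p_all_survive.

    Assumes independent, memoryless bit loss: P(all survive) = 2**(-N*t/h),
    so h = N*t / log2(1/p).
    """
    return n_bits * years / math.log2(1.0 / p_all_survive)

# Assumptions: decimal petabyte (10**15 bytes), 8 bits per byte.
PETABYTE_BITS = 8 * 10**15
# Approximate age of the universe, used only for scale.
AGE_OF_UNIVERSE_YEARS = 1.38e10

h = required_bit_half_life(PETABYTE_BITS, 100, 0.5)
print(f"required bit half-life: {h:.2e} years")
print(f"about {h / AGE_OF_UNIVERSE_YEARS:.1e} times the age of the universe")
```

Under these assumptions the script prints a required half-life of 8.00e+17 years, roughly 5.8e+07 times the age of the universe. Correlated failures, which are common in real storage, would only make the target harder to meet.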
archiving  internet 
april 2014 by juliusbeezer
DSHR's Blog: DAWN vs. Twitter
A few years ago I reviewed the research on the costs of digital preservation and concluded that, broadly speaking, the breakdown was half to ingest, one third to preservation, and one sixth to access. The relatively small proportion devoted to access was in line with studies of access patterns to preserved data, such as that by my co-author Ian Adams at UCSC, which showed that readers accessed them infrequently. The bulk of the accesses were for integrity checks...
by combining very low-power CPUs with modest amounts of flash storage it was possible to build a network of much larger numbers of much smaller nodes that could process key-value searches as fast as existing hard-disk-based networks at 2 orders of magnitude less power. They called this concept FAWN, for Fast Array of Wimpy Nodes. By analogy, we proposed DAWN, for Durable Array of Wimpy Nodes.
archiving  repositories  internet  twitter 
april 2014 by juliusbeezer