
juliusbeezer : search   125

Get The Research
Latest Priem production: search engine for peer-reviewed literature with open access button
search  openaccess  sciencepublishing 
may 2019 by juliusbeezer
Journal / Author Name Estimator
Have you recently written a paper, but you're not sure to which journal you should submit it? Or maybe you want to find relevant articles to cite in your paper? Or are you an editor, and do you need to find reviewers for a particular paper? Jane can help!

Just enter the title and/or abstract of the paper in the box, and click on 'Find journals', 'Find authors' or 'Find Articles'. Jane will then compare your document to millions of documents in PubMed to find the best matching journals, authors or articles.
editing  journals  search 
january 2019 by juliusbeezer
New Tool for Open-Access Research
A new search engine that aims to connect nonacademics with open-access research will be launched this fall.

Get the Research will connect the public with 20 million open-access scholarly articles. The site will be built by Impactstory -- the nonprofit behind browser extension tool Unpaywall -- in conjunction with the Internet Archive and the British Library.

Funded by an $850,000 grant from Arcadia, the search engine will be a place where “we can tell lay readers, ‘here’s where you can read free, trustworthy research about anything,’” said Jason Priem, Impactstory's co-founder. He added that artificial intelligence techniques will be used to annotate and summarize materials, making them easier to understand.
search  searchengines  openaccess  scholarly 
july 2018 by juliusbeezer
Search this database for inactive patents that are now in the public domain |
As anyone trying to innovate in the open source space can tell you, patents are nearly useless. However, Michigan Tech has released a free inactive patent search for finding public domain intellectual property in the hope of fostering innovation in the open source arena.

Patents were initially written into the U.S. Constitution to enhance innovation. They would enable inventors to earn a return on investment for their efforts in creating new useful artifacts using a 20-year intellectual monopoly. In exchange, the nation benefited from access to the "intellectual property" after 20 years. Back when the most effective means of transportation was the horse, a 20-year innovation cycle did not seem that abhorrent. Since then, the rate of innovation has accelerated substantially. Consider what your cell phone looked like 20 years ago, if you even had one. How about a 20-year-old computer? Many authors now argue that patenting actually slows technological progress overall (regardless of whether we are talking about software patents or nanotechnology patents). The patent system is simply broken.
patents  search 
february 2018 by juliusbeezer
The Lens
The Lens hosts more than 100 million patent records from over 95 different jurisdictions. Our patent searching capability allows use of advanced boolean functions, structured search, biological search, and classification search options to find the most relevant and important patent. Our analysis functions and faceted exploration tools allow drilling down and discovery of new insights, and sharing or embedding these in any website. Learn More
patents  search 
february 2018 by juliusbeezer
Three tips for a content marketing plan that makes your customers the central focus | Search Engine Watch
Here are three tips and tools to help you create a customer-focused content marketing strategy.
1. Be clear on content objectives
search  marketing 
january 2018 by juliusbeezer
Google's true origin partly lies in CIA and NSA research grants for mass surveillance — Quartz
In 1995, one of the first and most promising MDDS grants went to a computer-science research team at Stanford University with a decade-long history of working with NSF and DARPA grants. The primary objective of this grant was “query optimization of very complex queries that are described using the ‘query flocks’ approach.” A second grant—the DARPA-NSF grant most closely associated with Google’s origin—was part of a coordinated effort to build a massive digital library using the internet as its backbone. Both grants funded research by two graduate students who were making rapid advances in web-page ranking, as well as tracking (and making sense of) user queries: future Google cofounders Sergey Brin and Larry Page.

The two intelligence-community managers charged with leading the program met regularly with Brin as his research progressed, and he was an author on several other research papers that resulted from this MDDS grant before he and Page left to form Google.

The grants allowed Brin and Page to do their work and contributed to their breakthroughs in web-page ranking and tracking user queries. Brin didn’t work for the intelligence community—or for anyone else. Google had not yet been incorporated. He was just a Stanford researcher taking advantage of the grant provided by the NSA and CIA through the unclassified MDDS program.
google  search  us  politics 
december 2017 by juliusbeezer
tldr | simplified, community driven man pages
This is a web client for a project called tldr-pages; they are a community effort to simplify the beloved man pages with practical examples.
tools  unix  linux  search  learning 
november 2017 by juliusbeezer
Google makes it harder to search for results from other countries
For a long time, there was an easy way to conduct a Google search in a country other than the one you're in. If you wanted results specific to Japan, for instance, you would visit Google's Japanese domain; for Australian results, its Australian domain -- but this trick no longer works.

Google has announced that it will now always serve up results that are relevant to the country that you're in, regardless of the country code top level domain names (ccTLD) you use. The reason given is a little bizarre.

The search giant says that the change has been introduced because of the way people are using the search engine these days. It says: "around one in five searches on Google is related to location, so providing locally relevant search results is an essential part of serving you the most accurate information."
google  search  searchengines 
october 2017 by juliusbeezer
Learn more about OpenTrials
Open Knowledge is developing Open Trials, an open, online database of information about the world’s clinical research trials. We are funded by The Laura and John Arnold Foundation through the Center for Open Science. The project, which is designed to increase transparency and improve access to research, will be directed by Dr. Ben Goldacre, an internationally known leader on clinical transparency.

OpenTrials is building a collaborative and open linked database for all available structured data and documents on all clinical trials, threaded together by individual trial. With a versatile and expandable data schema, it is initially designed to host and match the following documents and data for each trial:

Registry entries
Links, abstracts, or texts of academic journal papers
Portions of regulatory documents describing individual trials
Structured data on methods and results extracted by systematic reviewers or others
Clinical Study Reports
Additional documents such as blank consent forms, blank case report forms, and protocols

The intention is to create an open, freely re-usable index of all such information, to increase discoverability, facilitate research, identify inconsistent data, enable audits on the availability and completeness of this information, support advocacy for better data and drive standards around open data in evidence-based medicine.
science  medicine  sciencepublishing  openmedicine  openscience  research  search 
october 2017 by juliusbeezer
September 2017 Crawl Archive Now Available – Common Crawl
The crawl archive for September 2017 is now available! The archive is located in the commoncrawl bucket at crawl-data/CC-MAIN-2017-39/. It contains 3.01 billion web pages and over 250 TiB of uncompressed content.
internet  search  searchengines  database 
september 2017 by juliusbeezer
search - ubuntu: find all files changed at certain date - Stack Overflow
You can use a little trick: use the touch command to create two files whose modification dates bracket the date you're searching for.


touch -t 201204080000 dummy1

touch -t 201204082359 dummy2

then you can use find as follows:

find /somedir/ \( -newer dummy1 -and ! -newer dummy2 \)

This should work out well.
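On GNU find, the same day-range can be selected without dummy files using -newermt, which compares a file's modification time against a date string (a GNU extension, not in POSIX find):

```shell
# Sandbox with one file modified inside the target day and one outside it
dir=$(mktemp -d)
touch -t 201204081200 "$dir/inside"
touch -t 201204091200 "$dir/outside"

# Match files newer than the start of 2012-04-08 but not newer than
# the start of 2012-04-09, i.e. modified on the 8th
find "$dir" -type f -newermt 2012-04-08 ! -newermt 2012-04-09
```

This prints only the "inside" file.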
search  unix  linux  tools 
august 2017 by juliusbeezer
Quote for Today from Paul Feyerabend | The Professor's Notes
While the political scientist in me as a rule stops listening when I hear someone described as an “anarchist,” the use of the word in this case carries far different baggage. That said, here’s the quote from his introduction, page 2:
In cases where the scientists’ work affects the public it even should participate: first, because it is a concerned party (many scientific decisions affect public life); secondly, because such participation is the best scientific education the public can get–a full democratization of science (which includes the protection of minorities such as scientists) is not in conflict with science.
science  philosophy  search  citation  publishing 
may 2017 by juliusbeezer
Official Google Webmaster Central Blog: What Crawl Budget Means for Googlebot
Crawling is the entry point for sites into Google's search results. Efficient crawling of a website helps with its indexing in Google Search. Q: Does site speed affect my crawl budget? How about errors? A: Making a site faster improves the users' experience while also increasing crawl rate. For Googlebot a speedy site is a sign of healthy servers, so it can get more content over the same number of connections. On the flip side, a significant number of 5xx errors or connection timeouts signal the opposite, and crawling slows down.
search  searchengines  google 
january 2017 by juliusbeezer
About
A one-stop ecommerce search engine that searches over 150 million books for sale—new, used, rare, out-of-print, and textbooks. We save you time and money by searching every major catalog online, and letting you know which booksellers are offering the best prices and selection. When you find a book you like, you can buy it directly from the original seller; we never charge a markup.

The website is part of the network, produced by a team of high-tech librarians and programmers based in Berkeley, California, and Düsseldorf, Germany. We are heavy readers, and buy several dozen books every year using our own search engine. We enjoy blogging about our work, and advocating for a strong, diverse bookselling industry. The site was launched in 1997 by then-19-year-old UC Berkeley undergraduate Anirvan Chatterjee (personal website). Over the years, both users and the press have discovered why we are one of the most useful resources for bibliophiles online. Whether you collect rare books or buy cheap paperbacks to read on the train, we think you will appreciate our breadth, precision, and unbiased results.
bookselling  search 
december 2016 by juliusbeezer
How 4 public radio stations in California collaborated to cover the election – Poynter
After the primary, we parsed out that data to find out what pages were most frequented in the guide. We found that people were searching L.A. County Superior Court Judge candidates more than any other race in the state. That’s because it’s nearly impossible to find thorough info on those candidates elsewhere. Our voter guide was a start but we could do more.

As a result, KPCC’s daily magazine show Take Two produced a series called “Meet the Judges,” in which they profiled each of the eight candidates. The individual segment pages saw 58,901 unique users. That rivals metrics on KPCC’s voter guide, which saw 74,238 unique users during that same time period (Oct. 18 to Nov. 8).
search  journalism  politics 
december 2016 by juliusbeezer
Reflecting on the Right to be Forgotten
But some Data Protection Authorities argued that people could still find delisted links by searching on a non-European version of Google. So in March 2016, in response to the concerns of a number of Data Protection Authorities, we made some changes. As a result, people using Google from the same country as the person who requested the removal can no longer find the delisted link on any version of Google search.

But one Data Protection Authority, the French Commission Nationale de l'Informatique et des Libertés (the CNIL), has ordered Google to go much further, effectively instructing us to apply the French balance between privacy and free expression in every country by delisting French right to be forgotten removals for users everywhere. Ultimately, we might have to implement French standards on Google search sites from Australia to Zambia and everywhere in between. And any such precedent would open the door to countries around the world, including non-democratic countries, to demand the same global power.
privacy  search  google  france 
december 2016 by juliusbeezer
Doing a Quick Literature Review – Advice for authoring a PhD or academic book – Medium
Instead of freezing our understanding of a field at one time, often indeed a time when we least understand the field, we should see the literature review as a repeated component of any ongoing research. We need more agile ways to surface other relevant research at every stage of our thinking and ‘writing up’, not just at the outset. We also need to consider how researchers actually work now, which is not very well presented by most institutional advice webpages or courses, generally produced by librarians I think, rather than by creative researchers themselves.
search  scholarly  notetaking  reviews 
november 2016 by juliusbeezer
How Google Search Really Works - ReadWrite
As we begin to answer a certain kind of query, people ask more of them. 16% to 20% of queries that get asked every day have never been asked before. We estimate that there have been about 450 billion unique queries asked of Google since 2003. It’s a pretty staggering number.

What makes it really challenging are the things you’ve never seen before, and yet you have to be prepared to answer. And as we get good at answering those things, people will ask us yet new things that we’ve never seen before.

Queries get longer and more complex, and a lot of the ranking changes are handling that length and complexity. That progress happens relatively silently, right? A query that didn’t work yesterday worked today, and a more complex query tomorrow won’t work, and it will work the day after. So that’s the nature of the progress.

We’re also going down the path of understanding entities with things like Freebase [an open structured database of semantic information acquired by Google in 2010]. Understanding the relationships between things.
google  search  searchengines  semantic 
november 2016 by juliusbeezer
Google Search Statistics - Internet Live Stats
Google now processes over 40,000 search queries every second on average (visualize them here), which translates to over 3.5 billion searches per day and 1.2 trillion searches per year worldwide. The chart below shows the number of searches per year throughout Google's history:
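Taking the quoted 40,000 queries/second as given, the daily and yearly totals follow from simple arithmetic (a sanity check, not official figures):

```shell
# Scale the quoted average rate up to daily and yearly totals
per_second=40000
per_day=$(( per_second * 60 * 60 * 24 ))   # 3,456,000,000 (about 3.5 billion)
per_year=$(( per_day * 365 ))              # 1,261,440,000,000 (about 1.26 trillion)
echo "searches per day:  $per_day"
echo "searches per year: $per_year"
```

A flat 40,000/second actually lands just under 3.5 billion a day, so the per-second figure is evidently a rounded average.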
google  search  searchengines 
november 2016 by juliusbeezer
Confessions of a Google Spammer
He took the stage with a big smile, introduced himself and then proceeded with this presentation called “F#$%! Link Building”...
I swore I wouldn’t let Google make us stop hustlin’. But Rand was right. Within 2 months, our entire network of 5,000+ blogs — for which we paid more than $80,000 — was deindexed, dead, simply kaput. Our $100k/mo business was ruined.
search  google  journalism  spam 
november 2016 by juliusbeezer
The Future of Search | chad wellmon
Furthermore, Google’s PageRank technology makes no claims about the internal content of the pages it tracks. It makes no claims about that content’s truth. The value or worth that PageRank measures is the importance of a website as determined by other websites. PageRank measures how well connected the New York Times website is—its popularity, not the accuracy of its information. In fact, some gossip websites have far higher PageRank scores than many other more accurate sites. PageRank levels the standards of legitimacy so that traditional notions of epistemic authority—expertise, cultural and social capital, scholarly peer review—have little place in its calculations.

For some, such as Michael Lynch, a professor of philosophy at the University of Connecticut, Google PageRank represents everything that is wrong with knowledge in the digital age. It is central to what he calls in The Internet of Us “Google-knowing,” not just the way we use Google’s search engine but the way “we are increasingly dependent on knowing” by means of it and other digital technologies. Although Lynch acknowledges the ample benefits of such technologies, he worries that our increasing reliance on them will ultimately “undermine” and weaken other ways of knowing. He is concerned in particular about how “Google-knowing” impedes ways of knowing that require “taking responsibility for our own beliefs” and understanding how “information fits together.”
google  search  philosophy 
november 2016 by juliusbeezer
Official Google Blog: A remedy for your health-related questions: health info in the Knowledge Graph
when my infant son Veer fell off a bed in a hotel in rural Vermont, and I was concerned that he might have a concussion. I wasn’t able to search and quickly find the information I urgently needed (and I work at Google!). Thankfully my son was OK, but the point is this stuff really matters: one in 20 Google searches are for health-related information. And you should find the health information you need more quickly and easily...
We worked with a team of medical doctors (led by our own Dr. Kapil Parakh, M.D., MPH, Ph.D.) to carefully compile, curate, and review this information. All of the gathered facts represent real-life clinical knowledge from these doctors and high-quality medical sources across the web, and the information has been checked by medical doctors at Google and the Mayo Clinic for accuracy.
search  google  health  healthcare  medicine 
november 2016 by juliusbeezer
Introducing oaDOI: resolve a DOI straight to OA - Impactstory blog
Most papers that are free-to-read are available thanks to “green OA” copies posted in institutional or subject repositories. The fact these copies are available for free is fantastic because anyone can read the research, but it does present a major challenge: given the DOI of a paper, how can we find the open version, given there are so many different repositories?

The obvious answer is “Google Scholar” 🙂 And yup, that works great, and given the resources of Google will probably always be the most comprehensive solution. But Google’s interface requires an extra search step, and its data isn’t open for others to build tools on top of.

We made a thing to fix that. Introducing oaDOI:

DOI gets you a paywall page:
oaDOI gets you a PDF:

We look for open copies of articles using the following data sources:
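The service itself is just a DOI-keyed lookup. A minimal sketch of building the resolver URL, where the endpoint path is an assumption based on the launch post (the service later became the Unpaywall API at api.unpaywall.org):

```shell
# Hypothetical oaDOI lookup URL for a given DOI (endpoint path assumed)
doi="10.1038/nature12373"
url="https://api.oadoi.org/${doi}"
echo "$url"
# curl -sL "$url"   # would return the OA location metadata, if one exists
```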
oa  openaccess  search  repositories 
october 2016 by juliusbeezer
OpenTrials: towards a collaborative open database of all available information on all clinical trials | Trials | Full Text
Hosting a broad range of data and documents presents some challenges around curation, especially because different sources of structured data will use different formats and different dictionaries. Although we will exploit available mapping between different data schemas and dictionaries, we do not expect to necessarily make all sources of all structured data on all trials commensurable and presentable side by side. For example, intervention may be described in free text or as structured data using various different dictionaries, and even sample size may be labelled in different ways in different available datasets, not all of which can necessarily be parsed and merged. For simplicity, we are imposing a series of broad categories as our top-level data schema, following the list given above. This is best thought of as a thread of documents on a given trial, where a “document” means either an actual physical document (such as a consent form or a trial report) or a bundle of structured data for a trial (such as a structured results page in XML format or a row of extracted data with accompanying variable names for a systematic review). This is for ease of managing multiple data sources, providing multiple bundles of structured data about each trial in multiple formats, each of which may be commonly or rarely used.
openmedicine  opendata  sciencepublishing  search  terminology  Dictionary  OA 
october 2016 by juliusbeezer
Medical Subject Headings - Home Page
Welcome to Medical Subject Headings!
The NLM's curated medical vocabulary resource.

Our main purpose is to provide a hierarchically-organized terminology for indexing and cataloging of biomedical information such as MEDLINE/PubMed and other NLM databases. We also distribute pharmaceutical information through our RxNorm database, and manage the curation of the UMLS and SNOMED databases.
search  indexing  editing  sciencepublishing 
october 2016 by juliusbeezer
6 Ways to Use Advanced Twitter Search for Increased Influence : Social Media Examiner
Using Twitter’s Advanced Search to monitor keywords and conversations can help you connect with influencers and uncover opportunities for thought leadership.

In this article you’ll discover six ways to use Twitter’s Advanced Search to increase your influence in your industry.
twitter  search 
september 2016 by juliusbeezer
dtSearch – Text Retrieval / Full Text Search Engine
Instantly Search Terabytes of Text
Includes Document Filters
• The dtSearch product line can instantly search terabytes of text across a desktop, network, Internet or Intranet site.
• dtSearch products also serve as tools for publishing instantly searchable large data collections to Web sites or portable media.
• Developers can embed dtSearch’s instant searching and file format support into their own applications.
search  tools  text_tools 
september 2016 by juliusbeezer
How the internet flips elections and alters our thou...
We predicted that the opinions and voting preferences of 2 or 3 per cent of the people in the two bias groups – the groups in which people were seeing rankings favouring one candidate – would shift toward that candidate. What we actually found was astonishing. The proportion of people favouring the search engine’s top-ranked candidate increased by 48.4 per cent, and all five of our measures shifted toward that candidate. What’s more, 75 per cent of the people in the bias groups seemed to have been completely unaware that they were viewing biased search rankings. In the control group, opinions did not shift significantly.

This seemed to be a major discovery. The shift we had produced, which we called the Search Engine Manipulation Effect (or SEME, pronounced ‘seem’),
search  politics  google  facebook  psychology 
march 2016 by juliusbeezer
Power Searching with Google - Course
Google Search makes it amazingly easy to find information. Come learn about the powerful advanced tools we provide to help you find just the right information when the stakes are high.

NOTE: The course is now OPEN!
MOOC  google  search 
february 2016 by juliusbeezer
How To Configure YaCy as an Alternative Search Engine or Site Search Tool | DigitalOcean
YaCy is a project meant to fix the problem of search engine providers using your data for purposes you did not intend. YaCy is a peer-to-peer search engine, meaning that there is no centralized authority or server where your information is stored. It works by connecting to a network of people also running YaCy instances and crawling the web to create a distributed index of sites.

In this guide, we will discuss how to get started with YaCy on an Ubuntu 12.04 VPS instance.
search  ubuntu  linux  freesoftware  searchengines 
february 2016 by juliusbeezer
Introducing Aleph: everything companies tell investors, in one place : OpenOil
Can the world understand oil and mining companies as well as their investors do? We think it’s possible, and we’ve been hard at work building a tool to make it so. Our goal is: everything extractives companies tell investors, in one place.

The result is Aleph, our new search tool, which we are presenting an early version of today.

We’ve taken the documents filed by oil, gas and mining companies in several of the major jurisdictions. We’ve pulled out the full text of each document, indexed it, and made the whole lot available for anyone to search.

These documents contain every important piece of information from extractives companies. That’s not a boast, it’s a simple matter of regulation.
energy  search  searchengines  oil 
december 2015 by juliusbeezer
New Gmail Search Operators
Gmail added a lot of new search operators. Now you can finally filter messages by size, find old messages and mail that has no label.
tools  email  gmail  search 
october 2015 by juliusbeezer
Complete Guide to YouTube Optimization: Everything You Need to Know
Now that your channel is all set, let’s move on to optimizing your videos themselves.
youtube  video  search 
august 2015 by juliusbeezer
The digital language divide
which articles relate to different places in separate language editions on Wikipedia. The dominant language – English – has the densest information and greatest geographical spread. However, if you explore what the world looks like if you speak Hebrew or Arabic, a very different picture is painted. There are huge information vacuums in non-dominant languages, where people, places and cultures are swallowed into the dark...
In a case study of the West Bank, searching for “restaurant” locally in Hebrew, Arabic and English brought back different results for each language... only 11% of people are multilingual on Twitter (pdf), and 15% on Wikipedia (pdf); these multilingual individuals are more active, writing more tweets and creating and editing more Wikipedia content. These people, he believes, could potentially challenge the Balkanisation of information and discussion online. Whether it is translating and bringing foreign concepts into different language editions on Wikipedia, or moving breaking local news stories to new language communities and different geographies, they have the power to be influential.
internet  language  web  search  google  wikipedia  exclusion 
june 2015 by juliusbeezer
1. The Problem | The One Repo blog
Although these repositories in aggregate make an enormous amount of research freely available, the fragmentation of this knowledge across 4000 repositories makes much of it effectively undiscoverable, and therefore useless. In practice, IRs form an archipelago of isolated islands rather than a continent of discoverable knowledge.
scholarly  search  repositories  openaccess 
may 2015 by juliusbeezer
Google wants to rank websites based on facts not links - 28 February 2015 - New Scientist
Google's search engine currently uses the number of incoming links to a web page as a proxy for quality, determining where it appears in search results. So pages that many other sites link to are ranked higher. This system has brought us the search engine as we know it today, but the downside is that websites full of misinformation can rise up the rankings, if enough people link to them.

A Google research team is adapting that model to measure the trustworthiness of a page, rather than its reputation across the web. Instead of counting incoming links, the system – which is not yet live – counts the number of incorrect facts within a page. "A source that has few false facts is considered to be trustworthy," says the team. The score they compute for each page is its Knowledge-Based Trust score.

The software works by tapping into the Knowledge Vault, the vast store of facts that Google has pulled off the internet. Facts the web unanimously agrees on are considered a reasonable proxy for truth.
google  search  searchengines  ontology 
march 2015 by juliusbeezer
Organize Your Life with Nepomuk | Linux Journal
Stuart Jarvis is a scientist and member of KDE's Marketing Working Group. He divides his time between losing data files, graphs and papers and finding them again
linux  search  funny 
march 2015 by juliusbeezer
Physician guidelines for Googling patients need revisions -- ScienceDaily
in what circumstances is it appropriate for a doctor to research a patient using online search engines?

"Googling a patient can undermine the trust between a patient and his or her provider, but in some cases it might be ethically justified," Baker says. "Healthcare providers need guidance on when they should do it and how they should deal with what they learn."

With regard to future guidelines, Baker and her co-authors suggest 10 situations that may justify patient-targeted Googling:
medicine  ethics  search  google  confidentiality 
march 2015 by juliusbeezer
Yelp Product & Engineering Blog | Analyzing the Web For the Price of a Sandwich
I geek out about the Common Crawl. It’s an open source crawl of huge parts of the Internet, accessible for anyone to use. You have full access to the HTML and text of billions of web pages. What’s more, you can scan the entire thing, tens of terabytes, for just a few bucks on Amazon EC2. These days they’re releasing a new dataset every month. It’s awesome.
internet  search  tools  software  programming  python 
march 2015 by juliusbeezer
Google Feud
Fun quizzes based on common google search requests
google  search 
march 2015 by juliusbeezer
An online community that deletes itself once it's indexed by Google - Boing Boing
Unindexed is an online community that anyone can contribute to; it runs a back-end process that continuously scours Google for signs that it has been indexed, and securely erases itself once it discovers evidence of same.
search  google  funny  internet  web  linkrot 
march 2015 by juliusbeezer
What a beheading feels like: The science, the gruesome spectacle — and why we can’t look away -
The al-Qaeda-linked site that first posted the video was closed down by the Malaysian company that hosted it two days after Berg’s execution because of the overwhelming traffic to the site. Alfred Lim, senior officer of the company, said it had been closed down ‘because it had attracted a sudden surge of massive traffic that is taking up too much bandwidth and causing inconven­ience to our other clients’...
The Berg beheading footage remained the most popular internet search in the United States for a week, and the second most popular throughout the month of May, runner up only to ‘American Idol.’
‘The point of terrorism is to strike fear and cause havoc – and that doesn’t happen unless you have media to support that action and show it to as many people as you can,’ said one analyst interviewed by the Los Angeles Times shortly after Nick Berg’s execution. These mur­derers post their videos on the internet because they know that the news media will be forced to follow the crowd. Television news pro­grammes either become redundant by refusing to air videos that are freely available online, or else they do exactly what the murderers want and show the footage to a wider audience.
internet  search  video  crime  law  bandwidth  networking  networktheory  television  attention 
february 2015 by juliusbeezer
Constantly tweaking: How The Guardian continues to develop its in-house analytics system » Nieman Journalism Lab
Among the more recent additions to Ophan is a function that will email staffers when certain traffic conditions are met, Ian Saleh, The Guardian’s audience development editor explained, showing off his own email alerts....
Among his long list of alerts, Saleh will get an email when a page on The Guardian US site gets more than 1,000 views per minute via a referral from the Drudge Report for at least one minute or when a page gets at least 500 views per minute from an unknown site for a minute or more.
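Alert rules like Saleh's can be modelled as simple threshold predicates over referrer and traffic rate. A minimal sketch; the names and rule shape are illustrative assumptions, not The Guardian's actual Ophan code, which is not public:

```python
# Hypothetical Ophan-style traffic alert rules (illustrative only).
RULES = [
    {"referrer": "drudgereport.com", "threshold": 1000},  # views/min for 1 min+
    {"referrer": "unknown", "threshold": 500},
]

def should_alert(views_per_minute, referrer, rules=RULES):
    """Fire when the referrer matches a rule and the sustained
    views-per-minute rate meets that rule's threshold."""
    return any(
        referrer == rule["referrer"] and views_per_minute >= rule["threshold"]
        for rule in rules
    )
```

In a real system the rate would be computed over a sliding one-minute window of pageview events before being passed to a check like this.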
guardian  journalism  internet  search 
january 2015 by juliusbeezer
Never trust a corporation to do a library’s job — The Message — Medium
In 2001, Google made their first acquisition, the Deja archives. The largest collection of Usenet archives, Google relaunched it as Google Groups, supplemented with archived messages going back to 1981.
In 2006, Google News Archive launched, with historical news articles dating back 200 years. In 2008, they expanded it to include their own digitization efforts, scanning newspapers that were never online.

In the last five years, starting around 2010, the shifting priorities of Google’s management left these archival projects in limbo, or abandoned entirely.

After a series of redesigns, Google Groups is effectively dead for research purposes. The archives, while still online, have no means of searching by date.
google  search  archiving  internet 
january 2015 by juliusbeezer
Google Scholar Help
Crawl Guidelines

Google Scholar uses automated software, known as "robots" or "crawlers", to fetch your files for inclusion in the search results. It operates similarly to regular Google search. Your website needs to be structured in a way that makes it possible to "crawl" it in this manner. In particular, automatic crawlers need to be able to discover and fetch the URLs of all your articles, as well as to periodically refresh their content from your website.

1. File formats

Your files need to be either in the HTML or in the PDF format. PDF files must have searchable text, i.e., you must be able to search for and find words in the document using Adobe Acrobat Reader.

Each file must not exceed 5MB in size. To index larger files, or to index scanned images of pages that require OCR, please upload them to Google Book Search.

[via Ross Mounce]
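The format and size rules above are easy to pre-check before exposing files to the crawler. A minimal sketch using only the standard library; the function name and return shape are my own, not part of any Google tool, and the searchable-text requirement for PDFs would need a PDF parser to verify:

```python
import os

MAX_BYTES = 5 * 1024 * 1024  # Google Scholar's stated 5 MB per-file limit

def scholar_eligible(path, size_bytes):
    """Rough eligibility check against the crawl guidelines:
    file must be HTML or PDF and no larger than 5 MB."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in {".html", ".htm", ".pdf"}:
        return False, "unsupported format (need HTML or PDF)"
    if size_bytes > MAX_BYTES:
        return False, "over 5 MB; guidelines suggest Google Book Search instead"
    return True, "ok"
```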
search  google  scholarly 
january 2015 by juliusbeezer
Common Crawl - FAQs
Our mission is to democratize access to web information by producing and maintaining an open repository of web crawl data that is universally accessible. We store the crawl data on Amazon’s S3 service, allowing it to be bulk downloaded as well as directly accessed for map-reduce processing in EC2.
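Each crawl's archives are published behind a gzipped path listing at a predictable URL. A small helper, assuming the publicly documented `crawl-data` layout on `data.commoncrawl.org`:

```python
def warc_paths_url(crawl_id):
    """Build the listing URL for a crawl's WARC files, assuming the
    documented Common Crawl path layout (crawl_id like 'CC-MAIN-2015-06')."""
    return f"https://data.commoncrawl.org/crawl-data/{crawl_id}/warc.paths.gz"
```

Each line of that listing is a relative path to one large WARC archive, so the corpus can be fetched file by file (or addressed on S3 for EC2 map-reduce jobs) rather than downloaded in one piece.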
search  web 
january 2015 by juliusbeezer
Is Google Now a Publisher Offering Other Publishers an Inadequate Deal? | The Scholarly Kitchen
Google has always had “the ability to use editorial judgment to modify search results.” We humble users would wish it so: the price of decent search results is eternal vigilance against SEO, content farms, spammers, etc etc, not to mention paywall publishers who want to show up in search, but then don’t want to abide by the convention of the web that the content then be freely downloadable (so-called “cloaking”).

I don’t know if you remember the days of AskJeeves, OpenDirectory, and AltaVista, Kent: often bizarrely irrelevant results, with spammy paid links indistinguishably mixed in (I seem to remember Coors bought the word ‘beer’ on one search engine), and all that after a wait over dialup.

The appearance of Google beta search in 1999 quite literally transformed the utility of the web. And if another search engine comes along that serves our needs better, we are but one click away from changing our allegiance, and Google knows it. I for one would love to use a search engine that strongly deprecated in its search rankings any publisher not in conformance with an ideal web: presenting open access, freely and fully downloadable, archivable content, alongside a responsive and honest commenting system. Life is short, I have no shortage of materials to read, and I’d rather favour those that play nicely with my attention. I certainly don’t want my search results spammed with paywalled stuff I can’t afford and won’t be buying. Keep it for your hundreds of subscribers!

So I agree with Mike: if you find your old business model isn’t working on the web, remove your content: the rest of us will just have to get by on the few crumbs that are left.
search  jbcomment  google  publishing 
january 2015 by juliusbeezer
Homicide Watch D.C. uses clues in site search queries to ID homicide victim | Poynter.
Laura Amico, editor of Homicide Watch D.C., describes how she used site analytics to identify a homicide victim — again. Early Sunday morning, she saw a police department news alert stating that a juvenile male had been killed. She wrote an initial post. When she looked at Google Analytics, she saw a few different search queries that seemed to be related to the killing: People were searching for information on a killing the night before on the same street as in the news alert. After an hour of searching on Twitter and Facebook, she thought she had found the victim, a 17-year-old with a name similar to the one people were searching for.
search  google  twitter  facebook  journalism 
december 2014 by juliusbeezer
The Convenience Factor in Information Seeking
While participants saw information evaluation as important, the most important factors in determining what resources they used were the amount of weight given to an assignment and the time allocated to work on it... What really struck me, though, was the emphasis of study respondents on the convenience factor. “Convenience trumps all other reasons for selecting and using a source.” The anticipated amount of trouble it would take to find what people wanted was the major determinant of what sort of search tool they would use... I’m concerned, because, if I have anything resembling an information literacy philosophy, it is this: Dumbing down the research process in the interest of convenience is almost always a poor choice, especially when we have the option of educating researchers to excel. Search engines promote convenience. In fact, convenience is their main marketing tool. The resulting elephant in the room is the fact that convenience only breeds a desire for more convenience, not greater skill. Despite our efforts, information seekers continue to prefer Google to library databases...So maybe we should move our information literacy efforts into our users’ arena by teaching them how to optimize Google and Google Scholar, find dissertation archives, and create good search strategies. I already do that in my graduate credit courses but with a crafty twist: First, I have students use the library’s databases to do a search assignment. Then they do the same assignment with Google Scholar. Their reaction? Almost universally, it is that Google Scholar sucks.
informationmastery  search  google  scholarly 
december 2014 by juliusbeezer
Google Scholar pioneer on search engine’s future : Nature News & Comment
'Scholarly' is what everybody else in the scholarly field considers scholarly. It sounds like a recursive definition but it does settle down. We crawl the whole web, and for a new blog, for example, you see what the connections are to the rest of scholarship that you already know about. If many people cite it, or if it cites many people, it is probably scholarly. There is no one magic formula: you bring evidence to bear from many features.
scholarly  google  search  searchengines 
november 2014 by juliusbeezer
The Gentleman Who Made Scholar — Backchannel — Medium
“It’s pretty much everything — every major to medium size publisher in the world, scholarly books, patents, judicial opinions, small, most small journals…. It would take work to find something that’s not indexed.” (One serious estimate places the index at 160 million documents as of May 2014.) But like it or not, the niche reality was reinforced after Larry Page took over as CEO in 2011, and adopted an approach of “more wood behind fewer arrows.” Scholar was not discarded — it still commands huge respect at Google which, after all, is largely populated by former academics—but clearly shunted to the back end of the quiver.
google  scholarly  search  searchengines  sciencepublishing 
october 2014 by juliusbeezer
Devonian Times » We’re opting out of Google Analytics
In the last few years we used Google Analytics on our website. It helps us to learn more about our visitors, which pages they look at, and where they come from. But, being a Google product, Analytics also comes with many privacy concerns. Starting today we’ve removed all Analytics code from our website and replaced it with Piwik.

Contrary to Analytics, which runs on Google’s servers, Piwik is installed locally. We host it, we own the database it uses, and we control the data it collects. It doesn’t share any data with anyone outside of our company. Also, Piwik respects when you don’t want to be tracked (click here to learn how to activate this feature of modern web browsers).

Please feel free to install the excellent and free Ghostery web browser extension, too.
search  google  privacy  surveillance  software  tools 
september 2014 by juliusbeezer
Tool For Thought
So the proper unit for this kind of exploratory, semantic search is not the file, but rather something else, something I don't quite have a word for: a chunk or cluster of text, something close to those little quotes that I've assembled in DevonThink. If I have an eBook of Manual DeLanda's on my hard drive, and I search for "urban ecosystem" I don't want the software to tell me that an entire book is related to my query. I want the software to tell me that these five separate paragraphs from this book are relevant. Until the tools can break out those smaller units on their own, I'll still be assembling my research library by hand in DevonThink.
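The chunk-level search the author wants can be approximated by treating paragraphs, not files, as the retrieval unit. A toy sketch (naive whole-word matching, no stemming; a real tool like DevonThink does far more):

```python
def paragraph_search(text, query):
    """Rank paragraphs, rather than whole documents, by how many
    query terms each contains; drop paragraphs with no matches."""
    terms = set(query.lower().split())
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    scored = [(len(terms & set(p.lower().split())), p) for p in paragraphs]
    return [p for score, p in
            sorted(scored, key=lambda s: s[0], reverse=True) if score > 0]
```

Run against a book-length file, this returns the handful of relevant paragraphs instead of a single yes/no verdict on the whole document.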
search  evocatext  writing  tools  software 
september 2014 by juliusbeezer
Ask Slashdot: Software To Organise a Heterogeneous Mix of Files? - Slashdot
I've had a quick glance at Evernote, thebrain, Nepomuk (I'm loving KDE4 so far after switching a week or so ago), OpenKM and FreeMind and these seem promising. I've still to look at emacs' org-mode, and when I do I will try to put my vi prejudices aside ;-) Some of the other suggestions are rather good but aren't really what I'm looking for as they are either fully cloud-based (eg Google Docs, Wave) or one platform only (eg Sharepoint) or too expensive (hire a secretary...
KDE's Dolphin file manager, coupled with Akonadi and Strigi (built-in, and seamlessly integrated) does everything that you are asking for
tools  search  linux  commenting 
september 2014 by juliusbeezer
Inside DuckDuckGo, Google's Tiniest, Fiercest Competitor ⚙ Co.Labs ⚙ code + community
"Every year, we've grown 200-500%," Weinberg says. "The numbers keep getting bigger." As of early February, DuckDuckGo was seeing more than 4 million search queries per day. One year ago, that number had just barely broken 1 million.
search  duckduckgo  privacy  goldsmith  google 
september 2014 by juliusbeezer
Important Facts about Unbubble
Unbubble is a search engine that delivers particularly neutral search results. For this purpose, we pull results from many search engines simultaneously, and assess the neutrality of these sources. Search engines that use other sources like Unbubble are called “meta search engines”. The special thing about Unbubble is that we have developed our system from the very beginning to be neutral and store very little data.
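One simple, source-neutral way a meta search engine can merge several engines' results is round-robin interleaving with deduplication, so no single source dominates the top of the page. A sketch of that idea only; Unbubble's actual merging and neutrality scoring are not public:

```python
from itertools import zip_longest

def interleave(*result_lists):
    """Merge ranked result lists round-robin, keeping the first
    occurrence of each URL and dropping duplicates."""
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```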
search  searchengines  web  privacy  surveillance  europe 
july 2014 by juliusbeezer
Labtimes: Interviews with: Altmetrics specialist Martin Fenner
Altmetrics can help with this discovery process but for the most part, these kinds of tools are still missing. It is one of the things that is on our development roadmap for the next 12 months.

PLoS ONE publishes about 3,000 papers a month. Even if you filter by subject area, you cannot read a table of contents that long. One way to improve this is to use article-level metrics as a filter, to generate a table of contents small enough for people to read it every week.

At PLoS ONE, you're saying you're suffering from information overload?

Fenner: Absolutely. PLoS ONE publishes everything that is solid science and doesn't pay any attention to the perceived impact a submitted manuscript will have. One consequence is obviously that PLoS ONE publishes a lot of papers, another one that it publishes good papers, excellent papers - but also not so good papers. PLoS ONE makes it easier for the author to publish but it makes it more difficult for the reader to find the most relevant content - a reader can't read all papers he finds interesting in PLoS ONE, he needs tools to help him find the most relevant papers.
sciencepublishing  scholarly  altmetrics  search  overlay  blogs  internet 
july 2014 by juliusbeezer
EU's right to be forgotten: Guardian articles have been hidden by Google | James Ball | Comment is free |
The Guardian has no form of appeal against parts of its journalism being made all but impossible for most of Europe's 368 million to find. The strange aspect of the ruling is all the content is still there: if you click the links in this article, you can read all the "disappeared" stories on this site. No one has suggested the stories weren't true, fair or accurate. But still they are made hard for anyone to find.
search  censorship  eu  google 
july 2014 by juliusbeezer
How to do Twitter research on a shoestring | Poynter.
That source of reliable, inexpensive online access to the Twitter firehose has become almost a Holy Grail for journalism professors in the U.S. and Canada who I surveyed this June using a Google form.
twitter  search  archiving 
june 2014 by juliusbeezer
On MetaFilter Being Penalized By Google: An Explainer
Stories such as what’s happening with MetaFilter aren’t new. Google’s penalties have hit sites small and large for years. But often when those sites are hit, there’s something in them that doesn’t draw a great deal of sympathy.

You can (and I have) dig into some “small business” that claims to have done absolutely nothing wrong only to discover they’d been buying links or doing other things that many would agree were unsavory. Last week, I spent several hours looking into one such case that at first seemed all innocent but turned out to have layers and layers of garbage.

As for big businesses, after many called for Google to do something about “content farms,”
google  search  censorship  attention 
june 2014 by juliusbeezer
The decline and fall of Microsoft Academic Search : Nature News Blog
A team of Spanish researchers who study science communication at the University of Granada, led by Emilio Delgado López-Cózar, decided to compare Google Scholar and MAS. They discovered — to their surprise — that Microsoft’s product had been failing to efficiently index scholarly documents since around 2011.
search  google  scholarly 
may 2014 by juliusbeezer
Google Penalizes Bad Machine Translation - K International
Google classifies automatically translated content as “automatically generated content,” which violates their webmaster guidelines.

That means that poorly translated content could seriously impact your rankings. Also, as Ariel Hochstadt pointed out in Search Engine Land, if you’ve monetized your site using AdSense, your account could be disabled for including “websites with gibberish content that makes no sense or seems auto-generated.”

Ironically, Google itself has started using automatically generated content on its own properties, like the Google Play store. However, as Search Engine Land points out, it appears that Google is using some sort of new and improved Google Translate that’s not available to the general public.
google  search  translation  web 
may 2014 by juliusbeezer
Spotify: how a busy songwriter you've never heard of makes it work for him | Media |
a lot of people search Spotify for celebrities. So, he invented a band called Papa Razzi and the Photogs, which has so far released 22 albums. Each consists of 20 to 40 tracks about every kind of star imaginable, from classical literature to Hollywood. There is an album called The Life of This Legendary American Music Man Is Quite Good, Yes, which includes tracks dedicated to Jon Hamm, Donald Trump's sons and ex-first lady Laura Bush... He's able to indulge this passion three days a week, while working another three days as a carer, because in 2013 alone his back catalogue returned an income of $23,000 (£13,800). 60% of his sales come from MP3 downloads and the rest from streaming.
music  internet  search  searchengines 
january 2014 by juliusbeezer
Forget Obamacare, remember when Syria was going to ruin Obama’s term? Remember Syria??
To the extent that Google searches are a reasonable proxy for public interest, what we find, not surprisingly, is that Syria was supplanted in the public interest by – among other things – the government shutdown and issues related to healthcare.
google  search  politics  sociology 
december 2013 by juliusbeezer
Ideas: Who to Believe: Auto Speed and Accident Mortality
I wrote: "I guess the key study your search missed is Leaf and Preusser's 1999 literature review, which concludes that the risk of death and injury does indeed rise with the speed of the colliding vehicle.
Those of us fortunate enough to have had a basic scientific education find this an unsurprising result: the kinetic energy of a moving object increases with the square of its velocity. The more energy imparted to the body in the event of a collision, the greater the damage."

If this sounds slightly snarky, then it is. This was a very low quality blogpost and attracted a very low quality discussion. But as I'd found this post as part of my own search to nail the 20mph--10%, 40mph--90% collision mortality statistic (I realise I'd never seen it formalised in a scientific publication, or if I had, I've forgotten), and I thought maybe some reader might be saved from ignorance by encountering a comment. (Hence the extended tagging after the dccomment tag).
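The physics behind the comment is the square law of kinetic energy: doubling collision speed quadruples the energy that must be dissipated by the struck body. A quick check (the 1500 kg mass is an illustrative assumption; this demonstrates the physics, not the cited mortality statistics):

```python
MPH_TO_MS = 0.44704  # metres per second per mile per hour

def kinetic_energy(mass_kg, speed_mph):
    """KE = 1/2 m v^2, with speed given in mph for convenience."""
    v = speed_mph * MPH_TO_MS
    return 0.5 * mass_kg * v ** 2

# Doubling speed quadruples the energy, whatever the vehicle's mass:
ratio = kinetic_energy(1500, 40) / kinetic_energy(1500, 20)
```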
road_safety  safety  cycling  dccomment  commenting  agnotology  search  economics  law 
december 2013 by juliusbeezer
Epicodus — How a Developer Learned Not to Be Racist and Sexist
I wanted to learn about racism and sexism, so I read some articles and picked up some books. I didn’t have a lot of experience with the topics, so I knew I needed to get some other perspectives. Turns out there were a lot of points of view I had never thought about, and a lot of people with very different experiences from mine.
politics  culture  racism  screwmeneutics  search 
december 2013 by juliusbeezer
Could Google and the NSA Make Whistleblowers Disappear? | The Nation
Panicking Nation columnist: "As we edge toward a fully digital world, such things may soon be possible, not in science fiction but in our world—and at the push of a button. In fact, the earliest prototypes of a new kind of “disappearance” are already being tested. We are closer to a shocking, dystopian reality that might once have been the stuff of futuristic novels than we imagine. Welcome to the memory hole."

corrected in comments

{This is a nice outline of potential and actual threats to freedom of expression online, but it is too pessimistic. It ignores the multichannel nature of most people's communication, and the fact that a system that represents the verity of the universe faithfully is just too useful not to defend to the death, even—especially—in business.

As for being lowly ranked in a Google search: well, such a fate may be painful for a Nation columnist, but we humble internauts have long been accustomed to that sensation. But we carry on, because we know that they who only watch and read, also serve. And we email the odd link to a friend we know will appreciate it.}
[Comment copied in full here because Nation's commenting system, hosted by Disqus, seems very dodgy indeed, though admittedly in my javascript-free, 'honey, I broke the internet' viewing mode.]
politics  search  google  commenting  attention  dccomment 
december 2013 by juliusbeezer
BitCoin meets Google Trends and Wikipedia: Quantifyi... [Sci Rep. 2013] - PubMed - NCBI
We show that not only are the search queries and the prices connected but there also exists a pronounced asymmetry between the effect of an increased interest in the currency while being above or below its trend value.
finance  gold  google  search  bitcoin 
december 2013 by juliusbeezer
Not all citations are equal: identifying key citations automatically
LeMire is looking for a razor to cut through huge numbers of scholarly citations to the 'good stuff', but I find the simplifications they have made over-simplifications, and thought I'd say so: he has a deep/shallow citation concept which is almost certainly false.
dccomment  citation  search 
november 2013 by juliusbeezer
Program :: Society of the Query
> Pascal Jürgens (GE)
Measuring Personalization: An Experimental Framework for Testing Technological Black Boxes
Search engines vastly enhance people’s daily lives by making information more accessible. At the same time, they harbor an enormous potential for influencing users. Personalized search results further expand this potential because they explicitly aim at maximizing the relevance of delivered content with regard to selection decisions. Despite their relevance, these technologies have rarely been subject to social scientific scrutiny – mainly because they operate as black boxes and their effects can only be observed in the field, where confounding variables abound. Building on a method developed by Feuz, Fuller, and Stalder, the goal is to create synthetic user profiles and stimulate personalization. By programmatically simulating realistic user behavior, this method performs hypothesis tests against unknown algorithms such as Google’s personalization. Our results indicate that although personalization of search results does occur, its effects (as of now) are too weak to produce a true ‘Filter Bubble’ in which two users receive truly distinct content.
google  search  bubble 
november 2013 by juliusbeezer
Outsell Inc. - The world's only research and advisory firm focused solely on media, information, and technology.
Thomson Reuters’ Web of Knowledge and Google Scholar are announcing a major new partnership between their services... When Google Scholar users at the participating institutions hit the Scholar search results page, they see a new Web of Science link directly in the results, under the article preview, as part of Scholar’s familiar navigation bar. On Web of Science, subscribers now can move directly from a Web of Science record to a Scholar search on the same item.
google  citation  search 
november 2013 by juliusbeezer