
kintopp : ocr   44

kraken — kraken 2.0.5-4-gbb42ba5 documentation
kraken is a turn-key OCR system forked from ocropus. It is intended to rectify a number of issues while preserving (mostly) functional equivalence. If you already have a model trained for ocropus, you can expect it to work with kraken without the fuss of the original ocropus tools. via Pocket
deep  ocr  tools 
7 weeks ago by kintopp
A Scalable Handwritten Text Recognition System
Many studies on (offline) Handwritten Text Recognition (HTR) systems have focused on building state-of-the-art models for line recognition on small corpora. However, adding HTR capability to a large-scale multilingual OCR system poses new challenges. This paper addresses three problems in building such systems: data, efficiency, and integration. First, one of the biggest challenges is obtaining sufficient amounts of high-quality training data. We address this problem by using online handwriting data collected for a large-scale production online handwriting recognition system. We describe our image data generation pipeline and study how online data can be used to build HTR models. We show that the data improve the models significantly when only a small number of real images is available, which is usually the case for HTR models. This enables us to support a new script at substantially lower cost. Second, we propose a line recognition model based on neural networks without recurrent connections. The model achieves accuracy comparable to LSTM-based models while allowing for better parallelism in training and inference. Finally, we present a simple way to integrate HTR models into an OCR system. Together, these constitute a solution for bringing HTR capability into a large-scale OCR system.
analysis  handwriting  ocr  tools 
12 weeks ago by kintopp
Image and Ground Truth Resources - IMPACT Centre of Competence
The Impact Centre of Competence dataset contains more than half a million representative text-based images compiled by a number of major European libraries. via Pocket
datasets  digitization  images  ocr  text  ml 
may 2019 by kintopp
Modern Tool for Old Texts - Universität Würzburg
Historians and other humanities scholars often have to deal with difficult research objects: centuries-old printed works that are hard to decipher and often in an unsatisfactory state of conservation. via Pocket
germany  ocr  tools 
may 2019 by kintopp
OCR-D · GitHub
via Pocket
ocr  tools 
may 2019 by kintopp
Our Search for the Best OCR Tool, and What We Found - Features - Source: An OpenNews project
Heavily redacted documents such as Carter Page’s FISA warrant are notoriously challenging for OCR tools. via Pocket
ocr  review  tools 
may 2019 by kintopp
Manuscripts are among the most important witnesses to our shared European cultural heritage. Despite large-scale digitization, the wealth of their content remains largely inaccessible: current handwritten text recognition technology is not accurate enough to allow full-text search. via Pocket
ml  ocr  recognition  text 
may 2019 by kintopp
Reading the First Books – Multilingual, Early-Modern OCR for Primeros Libros
Reading the First Books: Multilingual, Early-Modern OCR for Primeros Libros is a two-year, multi-university effort to develop tools for the automatic transcription of early modern printed books. via Pocket
americas  history  ocr  spain  tools 
april 2019 by kintopp
Christian Reul / OCR4all_Web · GitLab
Provides OCR (optical character recognition) services through web applications.
api  ocr  software  tools 
march 2019 by kintopp
Why You (A Humanist) Should Care About Optical Character Recognition · Ryan Cordell
Yesterday David Smith and I announced the release of “A Research Agenda for Historical and Multilingual Optical Character Recognition,” a report funded by the Andrew W. Mellon Foundation and conducted in consultation with the NEH’s Office of Digital Humanities and the Library of Congress. via Pocket
methodology  ocr 
march 2019 by kintopp
Experiments with early modern manuscripts and computer-aided transcription - The Collation
Guest post by Minyue Dai, Carrie Yang, Reeve Ingle, and Meaghan J. Brown. Hundreds of years ago, scholars might spend hours in a library searching through thousands of pages to find a useful paragraph. Things get much easier when we can work with digitized text. via Pocket
google  handwriting  ocr 
december 2018 by kintopp
DATeCH International Conference 2019 - Call for Papers - IMPACT Centre of Competence
The International DATeCH (Digital Access to Textual Cultural Heritage) conference brings together researchers and practitioners seeking innovative approaches for the creation, transformation and exploitation of historical documents in digital form. via Pocket
analysis  belgium  cfp  culture  handwriting  nlp  ocr  recognition  text 
november 2018 by kintopp
In Codice Ratio
In Codice Ratio is a research project that aims at developing novel methods and tools to support content analysis and knowledge discovery from large collections of historical documents. via Pocket
handwriting  manuscripts  ocr  recognition  text 
may 2018 by kintopp
An Automated Approach for Geocoding Tabular Itineraries
Authors: Rui Santos (INESC-ID, Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal); Patricia Murrieta-Flores (Digital Humanities Research Center, University of Chester, Chester, United Kingdom); Bruno Martins (INESC-ID, Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal). via Pocket
geo  history  itineraries  nlp  ocr  text 
may 2018 by kintopp
What is dhSegment? | dhSegment
dhSegment is a generic approach to historical document processing. It relies on a convolutional neural network to do the heavy lifting of predicting pixelwise characteristics; simple image processing operations are then provided to extract the components of interest (boxes, polygons, lines, masks, …). via Pocket
analysis  images  manuscripts  ocr  deep 
april 2018 by kintopp
ocr - Applying TextRecognize on alpha-numerical table - Mathematica Stack Exchange
Here is a way of extracting the positions of the various characters in your image by using ImageCorrelate. Define the image to be worked on. via Pocket
ocr  wolfram 
september 2017 by kintopp
GitHub - tmbdev/ocropy: Python-based tools for document analysis and OCR
OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do some image preprocessing, and possibly also train new models. via Pocket
ocr  python  tools  ml 
september 2017 by kintopp
Zotero | Groups > ocr-tools
via Pocket
bibliography  ocr  resources 
september 2017 by kintopp
GitHub - kba/awesome-ocr: Links to awesome OCR projects
This list contains links to great software tools and libraries and literature related to Optical Character Recognition (OCR). Contributions are welcome, as is feedback. via Pocket
ocr  resources 
september 2017 by kintopp
TSX by cziaarm
TSX is a web interface for the crowdsourced transcription of digitised handwritten material. TSX was developed as part of the tranScriptorium project and uses the Transkribus web services to manage transcripts and to access digitised images and HTR tool outputs. via Pocket
crowdsourcing  handwriting  ocr  tools  transcription 
september 2017 by kintopp
Historical and Multilingual OCR
Northeastern University announces a grant from the Andrew W. Mellon Foundation to study the current state of optical character recognition (OCR) for historical and multilingual documents and to outline future directions for research in this area. via Pocket
history  language  ocr 
august 2017 by kintopp
Manuscripts are among the most important witnesses to our shared European cultural heritage. Despite large-scale digitization, the wealth of their content remains largely inaccessible: current handwritten text recognition technology is not accurate enough to allow full-text search. via Pocket
ocr  recognition  text  ml 
june 2017 by kintopp
Satellite workshops. Tuesday, May 30th: 9:00–16:30, The journey from physical to digital and advancements in culture heritage digitisation; 9:00–18:00, TRACER tutorial for computational text reuse detection; 13:00–17:00, TextGrid user workshop. Wednesday, May 31st: 9:00–16:30, Handwritte… via Pocket
analysis  conference  germany  history  ocr  tools  text 
may 2017 by kintopp
Image Analysis for Archival Discovery
Image Analysis for Archival Discovery (Aida) responds to two issues within the digital humanities and digital library communities: First, that we leverage little of the information potential of the millions of images that we are creating as we digitize the cultural record. via Pocket
analysis  images  ocr  tools 
may 2017 by kintopp
Nidaba — nidaba 0.9.7-6-g5a8da99 documentation
Nidaba is an open source distributed optical character recognition pipeline that makes it easy to preprocess, OCR, and postprocess scans of text documents in a multitude of ways. via Pocket
ocr  tools 
may 2017 by kintopp
Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning | Dropbox Tech Blog
In this post we will take you behind the scenes on how we built a state-of-the-art Optical Character Recognition (OCR) pipeline for our mobile document scanner. via Pocket
analysis  deep  images  ocr  report  ml 
april 2017 by kintopp
GitHub - tesseract4java/tesseract4java: Java GUI and Tools for Tesseract OCR
A graphical user interface for the Tesseract OCR engine. The program has been introduced in the Master’s thesis “Analyses and Heuristics for the Improvement of Optical Character Recognition Results for Fraktur Texts” by Paul Vorbach (German). via Pocket
gui  ocr  tools 
march 2017 by kintopp
Transkribus: Text Recognition, Transcription and Information Extraction | Günter Mühlberger - YouTube
10/10/2016 | Day Meeting | What should be in your Digital Toolbox? The Linnean Society of London is the world’s oldest active biological society. Founded in 1788, the Society takes its name from the Swedish naturalist Carl Linnaeus (1707–1778). via Pocket
demos  ocr  transcription  video 
january 2017 by kintopp
DATeCH 2017
There is rising concern about optimizing the resources available for the creation, transformation, and dissemination of digitised textual content. via Pocket
cfp  conference  ocr 
december 2016 by kintopp
GitHub - tberg12/ocular: Ocular is a state-of-the-art historical OCR system.
Ocular is a state-of-the-art historical OCR system. Continued development of Ocular is supported in part by a Digital Humanities Implementation Grant from the National Endowment for the Humanities for the project Reading the First Books: Multilingual, Early-Modern OCR for Primeros Libros. via Pocket
ocr  tools 
november 2016 by kintopp
Presentations from the READ partners now available! | READ Project
The READ project was launched in January 2016 at the ‘Technology meets Scholarship’ conference at the Hessian State Archives in Marburg (Germany). This conference was organised by the co:op (community as opportunity – the creative archives’ and users’ network) project. via Pocket
conference  europe  infrastructure  ocr  recognition  text 
august 2016 by kintopp
Texas A&M's Initiative for Digital Humanities, Media, and Culture (IDHMC) is very pleased to announce eMOP, the Early Modern OCR Project. via Pocket
ocr  resources  tools 
may 2016 by kintopp
First international co:op convention | co:op
Technology meets Scholarship, or how Handwritten Text Recognition will Revolutionize Access to Archival Collections. via Pocket
conference  handwriting  ocr  recognition 
november 2015 by kintopp
The Berkeley NLP Group
Improved Typesetting Models for Historical OCR. Taylor Berg-Kirkpatrick and Dan Klein. ACL 2014. Unsupervised Transcription of Historical Documents. Taylor Berg-Kirkpatrick, Greg Durrett, and Dan Klein. ACL 2013. via Pocket
ocr  tools 
june 2015 by kintopp