NovelTM Datasets for English-Language Fiction, 1700-2009 | hc:26955 | Humanities CORE
This report describes a collection of 210,305 volumes of fiction that researchers are encouraged to borrow for their own work. Alternatively, readers can simply browse the report as a description of English-language fiction in HathiTrust Digital Library. For instance, how does the proportion of fiction written by British authors or by women change across time? We also divide nineteenth- and twentieth-century fiction into seven subsets with different emphases (for instance, one where men and women are represented equally, and one composed only of the most prominent and widely held books). Comparing the pictures produced by these different samples allows us to assess the fragility of recent quantitative arguments about literary history. Preprint version of an article to appear in the Journal of Cultural Analytics.
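Questions like "how does the proportion of fiction by women change across time?" reduce to a grouped ratio over volume metadata. The sketch below assumes each volume is a dict with a `year` field and some attribute field; the field names (`year`, `gender`) and the function `proportion_by_decade` are hypothetical illustrations, not the actual columns or code of the NovelTM datasets. Volumes whose attribute is unknown are left out of the denominator, which is one of several defensible choices.

```python
from collections import defaultdict


def proportion_by_decade(volumes, field, value):
    """Return {decade: share of volumes whose `field` equals `value`}.

    Hypothetical schema: each volume is a dict with an integer "year" and
    an attribute under `field`. Volumes where that attribute is missing
    (None or absent) are skipped entirely, so they do not count in the
    denominator for their decade.
    """
    totals = defaultdict(int)  # volumes with a known attribute, per decade
    hits = defaultdict(int)    # volumes matching `value`, per decade
    for vol in volumes:
        attr = vol.get(field)
        if attr is None:
            continue  # unknown attribute: exclude from this decade's total
        decade = vol["year"] // 10 * 10
        totals[decade] += 1
        if attr == value:
            hits[decade] += 1
    # Sorted decades make the time series easy to read or plot.
    return {decade: hits[decade] / totals[decade] for decade in sorted(totals)}
```

For example, given four volumes from the 1840s and 1850s where one 1840s volume has no recorded gender, the function reports the share of women among the volumes whose gender is known in each decade.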
Arcadia Fund | Protecting endangered culture and nature and promoting open access
Arcadia serves humanity by preserving endangered cultural heritage and ecosystems. We protect complexity and work against the entropy of ravaged and thereby starkly simplified natural environments and globalized cultures. Innovation and change occur best in already complex systems. Once memories, knowledge, skills, variety, and intricacy disappear – once the old complexities are lost – they are hard to replicate or replace. Arcadia aims to return to people both their memories and their natural surroundings.
[1701.07396] LAREX - A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books
A semi-automatic open-source tool for layout analysis on early printed books is presented. LAREX uses a rule-based connected-components approach that is very fast, easy for the user to understand, and allows intuitive manual correction where necessary. The PageXML format is used to support integration into existing OCR workflows. Evaluations showed that LAREX provides an efficient and flexible way to segment pages of early printed books.
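The core of a connected-components approach is grouping touching foreground pixels of a binarized page into blobs, then applying simple rules to those blobs. A minimal sketch, assuming the page is already binarized into a list of 0/1 rows: `connected_components` does a breadth-first flood fill, and `classify_regions` applies one hypothetical rule (discard blobs smaller than `min_area` pixels as noise, keep the rest as bounding boxes). These names and the rule are illustrative assumptions, not LAREX's actual rules or code.

```python
from collections import deque


def connected_components(binary, connectivity=8):
    """Group foreground pixels (value 1) into connected components.

    `binary` is a list of equal-length rows of 0/1 values. Returns a list
    of components, each a list of (row, col) pixel coordinates, in the
    order their first pixel is met during a row-major scan.
    """
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    if connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:  # 4-connectivity: only horizontal and vertical neighbours
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for dr, dc in offsets:
                    nr, nc = pr + dr, pc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and binary[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right) for a list of (row, col) pixels."""
    rs = [p[0] for p in pixels]
    cs = [p[1] for p in pixels]
    return (min(rs), min(cs), max(rs), max(cs))


def classify_regions(binary, min_area=3):
    """Hypothetical rule: blobs under `min_area` pixels are noise; keep the rest."""
    return [bounding_box(comp) for comp in connected_components(binary)
            if len(comp) >= min_area]
```

In practice a library routine such as `scipy.ndimage.label` would replace the hand-written flood fill; the explicit version is shown only to make the grouping step visible. A real tool would also need rules for merging blobs into text blocks and images, and would write the resulting regions out as PageXML coordinates.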
