Computing and Visualizing the 19th-Century Literary Genome | Digital Humanities 2012
Analyzing ~3,600 English-language books published between 1780 and 1900 in terms of the relative closeness of their "style", and constructing network diagrams from the resulting distances. "I employ the tools and techniques of stylometry, corpus linguistics, machine learning, and network analysis to measure influence in a corpus of late 18th- and 19th-century novels. ... relative frequencies of every word and mark of punctuation are calculated ... thematic data includes information about the percentages of each theme/topic found in each text. I combine these two categories of data – stylistic and thematic – to create ‘book signals’ composed of 592 unique feature measurements. The ‘Euclidian’ metric is then used to calculate every book’s distance from every other book in the corpus. ... reveals that works by female authors (colored light gray) and male authors (black) are more stylistically and thematically homogeneous within their respective gender classes..."
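The distance step described in the quote can be illustrated with a minimal sketch. It is not the author's code: the book names, the three-value vectors (standing in for the 592 measurements), and the helper names `euclidean`, `distance_matrix`, and `nearest` are all hypothetical, chosen only to show how a Euclidean distance between every pair of "book signals" might be computed.

```python
import math

# Hypothetical "book signals": each book maps to a feature vector.
# The source describes 592 features per book; three made-up values
# per book are used here purely for illustration.
signals = {
    "book_a": [0.06, 0.02, 0.10],
    "book_b": [0.06, 0.05, 0.14],
    "book_c": [0.18, 0.07, 0.10],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(books):
    """Distance from every book to every other book, keyed by title pair."""
    titles = list(books)
    return {
        (s, t): euclidean(books[s], books[t])
        for s in titles
        for t in titles
    }


def nearest(books, title):
    """Title of the book closest to `title`, excluding the book itself."""
    others = (t for t in books if t != title)
    return min(others, key=lambda t: euclidean(books[title], books[t]))


if __name__ == "__main__":
    for (s, t), d in sorted(distance_matrix(signals).items()):
        if s < t:  # print each unordered pair once
            print(f"{s} - {t}: {d:.3f}")
    print("closest to book_a:", nearest(signals, "book_a"))
```

A network diagram could then be drawn by treating each book as a node and linking the pairs with the smallest distances; how the original study selects which pairs to link is not stated in the excerpt.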
stylometrics  gender  literature  style  via:bruce_sterling 
