How to be Curious Instead of Contrarian About COVID-19: Eight Data Science Lessons From ‘Coronavirus Perspective’ (Epstein 2020)
"It is an order of magnitude less effort to spam poorly constructed hypotheticals than it is to deconstruct them. This review took a substantial amount of time, and in the meantime the original piece was poorly revised, several interviews and a podcast were released, and a second post trying to cover for the first went live.15 More will no doubt soon continue to move the goal posts and argument. In a world where actual life or death policy analysis is being treated like a high school debate round, the only strategic move is to step back, slow down, and draw methodological lessons for our students and colleagues that will apply to a broad set of current and future analyses."
yesterday by cshalizi
The Transmission Dynamics of Human Immunodeficiency Virus (HIV) [and Discussion] (May and Anderson, 1988)
"The paper first reviews data on HIV infections and AIDS disease among homosexual men, heterosexuals, intravenous (IV) drug abusers and children born to infected mothers, in both developed and developing countries. We survey such information as is currently available about the distribution of incubation times that elapse between HIV infection and the appearance of AIDS, about the fraction of those infected with HIV who eventually go on to develop AIDS, about time-dependent patterns of infectiousness and about distributions of rates of acquiring new sexual or needle-sharing partners. With this information, models for the transmission dynamics of HIV are developed, beginning with deliberately oversimplified models and progressing - on the basis of the understanding thus gained - to more complex ones. Where possible, estimates of the model's parameters are derived from the epidemiological data, and predictions are compared with observed trends. We also combine these epidemiological models with demographic considerations to assess the effects that heterosexually-transmitted HIV/AIDS may eventually have on rates of population growth, on age profiles and on associated economic and social indicators, in African and other countries. The degree to which sexual or other habits must change to bring the basic reproductive rate', R0, of HIV infections below unity is discussed. We conclude by outlining some research needs, both in the refinement and development of models and in the collection of epidemiological data."

--- This is (apparently) the first paper which considered degree heterogeneity as a factor in determining the epidemic threshold in an SIR model (section 4.1), while admitting that the uncorrelated degree assumption is inaccurate (*)

*: "By assuming that partners are chosen randomly (apart from the activity levels characterized by the weighting factor $i$), we may be overestimating the contacts of less active individuals with those in more active categories, and thus overestimating the spread of infection among such less active sub-groups. Conversely, the transmission probability $\beta$ may be higher for longer-lasting partnerships (despite the data in figure 4), so that use of a constant $\beta$ may tend to underestimate the spread of infection among less active people. The net effect of these countervailing refinements is hard to guess." [pp. 583--584]
in_NB  epidemics_on_networks  epidemic_models  epidemiology  aids  may.robert_m.  have_read
6 weeks ago by cshalizi
[cond-mat/0205439] Epidemic threshold in structured scale-free networks
"We analyze the spreading of viruses in scale-free networks with high clustering and degree correlations, as found in the Internet graph. For the Suscetible-Infected-Susceptible model of epidemics the prevalence undergoes a phase transition at a finite threshold of the transmission probability. Comparing with the absence of a finite threshold in networks with purely random wiring, our result suggests that high clustering and degree correlations protect scale-free networks against the spreading of viruses. We introduce and verify a quantitative description of the epidemic threshold based on the connectivity of the neighborhoods of the hubs."

--- Initially, I found the focus on the average degree of a node's neighbors, their <k^nn>, very puzzling --- as a measure of how many secondary infections you could produce, this would seem to involve a lot of over-counting when the network is clustered. But looking at their figure 2 clarifies: <k^nn|k>, the average degree of neighbors conditional on ego's degree, is a _decreasing_ function of ego's degree in their model. (If ego's degree is 1 or 2, it looks like the average degree of ego's neighbors is in the 100s [!], while as ego's degree goes to infinity, <k^nn> tends to a constant _smaller_ than the average degree.) So this is a hub-and-spoke system where each hub has a huge number of ties to very low-degree nodes, but there are enough non-hubs tied to multiple hubs, or hub-hub ties, to keep things connected. And then it makes sense that the crucial step in an epidemic is what happens once a hub is infected.

ETA: In fact, Moreno and Vazquez (cond-mat/0210362) observe that the model generates, basically, a linear chain of stars (lovely phrase!), and that's where all this weird behavior comes from.
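
A small sketch of the quantity discussed above: it computes <k^nn|k>, the mean neighbor degree conditional on ego's degree, on a toy "linear chain of stars". The construction (`chain_of_stars`, five hubs in a path with 100 private leaves each) and the function names are illustrative assumptions, not the generative model from the paper.

```python
from collections import defaultdict


def knn_by_degree(edges):
    """Map each degree k to the mean neighbor degree of degree-k nodes."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    deg = {node: len(s) for node, s in nbrs.items()}

    sums = defaultdict(float)   # sum over degree-k nodes of their mean neighbor degree
    counts = defaultdict(int)   # number of degree-k nodes
    for node, s in nbrs.items():
        k = deg[node]
        sums[k] += sum(deg[m] for m in s) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


def chain_of_stars(n_hubs, leaves_per_hub):
    """Hypothetical toy graph: hubs joined in a path, each with its own leaves."""
    edges = [(("hub", i), ("hub", i + 1)) for i in range(n_hubs - 1)]
    for i in range(n_hubs):
        edges += [(("hub", i), ("leaf", i, j)) for j in range(leaves_per_hub)]
    return edges


knn = knn_by_degree(chain_of_stars(n_hubs=5, leaves_per_hub=100))
for k, value in knn.items():
    print(k, round(value, 2))
```

On this toy graph the degree-1 leaves see neighbors of degree about 100, while the hubs see an average neighbor degree of roughly 2 to 3, which is the decreasing pattern described for figure 2.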

(Despite the last tag, I think this model is so anti-social that it's not worth mentioning in the paper with DA and HF, but maybe if I ever write that review...)
6 weeks ago by cshalizi
Influential node ranking via randomized spanning trees - ScienceDirect
"Networks portraying a diversity of interactions among individuals serve as the substrates(media) of information dissemination. One of the most important problems is to identify the influential nodes for the understanding and controlling of information diffusion and disease spreading. However, most existing works on identification of efficient nodes for influence minimization focused on centrality measures. In this work, we capitalize on the structural properties of a random spanning forest to identify the influential nodes. Specifically, the node importance is simply ranked by the aggregated degree of a node in the spanning forest, which reveals both local and global connection patterns. Our analysis on real networks indicates that manipulating the nodes with high aggregated degrees in the random spanning forest shows better performance in controlling spreading processes, compared to previously used importance criteria, including degree centrality, betweenness centrality, and random walk based indices, leading to less influenced population. We further show the characteristics of the proposed measure and the comparison with benchmarks."

--- Degree in a random (depth-first) spanning tree is a cute centrality measure, but it's got to be strongly related to eigenvector centrality (which they only mention in the last sentence). Last tag is because "what is this a Monte Carlo estimate of?" might make a good project...
6 weeks ago by cshalizi
[cond-mat/0007048] Resilience of the Internet to random breakdowns
"A common property of many large networks, including the Internet, is that the connectivity of the various nodes follows a scale-free power-law distribution, P(k)=ck^-a. We study the stability of such networks with respect to crashes, such as random removal of sites. Our approach, based on percolation theory, leads to a general condition for the critical fraction of nodes, p_c, that need to be removed before the network disintegrates. We show that for a<=3 the transition never takes place, unless the network is finite. In the special case of the Internet (a=2.5), we find that it is impressively robust, where p_c is approximately 0.99."
6 weeks ago by cshalizi
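A rough numerical sketch for the Internet-resilience abstract above (cond-mat/0007048). It assumes the usual threshold for random node removal on an uncorrelated network, p_c = 1 - 1/(\kappa - 1) with \kappa = <k^2>/<k>; the abstract itself does not state the formula, so treat that, the truncated discrete power law, and the names `critical_fraction`, `k_min`, `k_max` as assumptions.

```python
def critical_fraction(a, k_min, k_max):
    """Fraction of nodes removed at random before the giant component vanishes.

    Degrees follow P(k) proportional to k**(-a) on k_min..k_max (an assumed
    truncation); the normalizing constant cancels in the ratio below.
    """
    ks = range(k_min, k_max + 1)
    weights = [k ** (-a) for k in ks]
    mean_k = sum(k * w for k, w in zip(ks, weights))
    mean_k2 = sum(k * k * w for k, w in zip(ks, weights))
    kappa = mean_k2 / mean_k
    if kappa <= 2.0:
        # No giant component to begin with, so nothing needs removing.
        return 0.0
    return 1.0 - 1.0 / (kappa - 1.0)


for cutoff in (10**2, 10**3, 10**4):
    print(cutoff, round(critical_fraction(2.5, 1, cutoff), 3))
```

With a = 2.5 the threshold creeps toward 1 as the degree cutoff `k_max` grows, which is the sense in which the transition only exists for finite networks.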
Immunization and epidemic dynamics in complex networks | SpringerLink
"We study the behavior of epidemic spreading in networks, and, in particular, scale free networks. We use the Susceptible-Infected-Removed (SIR) epidemiological model. We give simulation results for the dynamics of epidemic spreading. By mapping the model into a static bond-percolation model we derive analytical results for the total number of infected individuals. We study this model with various immunization strategies, including random, targeted and acquaintance immunization."
6 weeks ago by cshalizi
[cond-mat/0107066] Immunization of complex networks
"Complex networks such as the sexual partnership web or the Internet often show a high degree of redundancy and heterogeneity in their connectivity properties. This peculiar connectivity provides an ideal environment for the spreading of infective agents. Here we show that the random uniform immunization of individuals does not lead to the eradication of infections in all complex networks. Namely, networks with scale-free properties do not acquire global immunity from major epidemic outbreaks even in the presence of unrealistically high densities of randomly immunized individuals. The absence of any critical immunization threshold is due to the unbounded connectivity fluctuations of scale-free networks. Successful immunization strategies can be developed only by taking into account the inhomogeneous connectivity properties of scale-free networks. In particular, targeted immunization schemes, based on the nodes' connectivity hierarchy, sharply lower the network's vulnerability to epidemic attacks."
6 weeks ago by cshalizi
Spreading dynamics in complex networks - IOPscience
"Searching for influential spreaders in complex networks is an issue of great significance for applications across various domains, ranging from epidemic control, innovation diffusion, viral marketing, and social movement to idea propagation. In this paper, we first display some of the most important theoretical models that describe spreading processes, and then discuss the problem of locating both the individual and multiple influential spreaders respectively. Recent approaches in these two topics are presented. For the identification of privileged single spreaders, we summarize several widely used centralities, such as degree, betweenness centrality, PageRank, k-shell, etc. We investigate the empirical diffusion data in a large scale online social community—LiveJournal. With this extensive dataset, we find that various measures can convey very distinct information of nodes. Of all the users in the LiveJournal social network, only a small fraction of them are involved in spreading. For the spreading processes in LiveJournal, while degree can locate nodes participating in information diffusion with higher probability, k-shell is more effective in finding nodes with a large influence. Our results should provide useful information for designing efficient spreading strategies in reality."

--- Eh, the measure of "influence" is just the size of the reachable set. (They don't actually track the dynamics of anything.)
6 weeks ago by cshalizi
Rumor propagation with heterogeneous transmission in social networks - IOPscience
"Rumor models consider that information transmission occurs with the same probability between each pair of nodes. However, this assumption is not observed in social networks, which contain influential spreaders. To overcome this limitation, we assume that central individuals have a higher capacity to convince their neighbors than peripheral subjects. From extensive numerical simulations we find that spreading is improved in scale-free networks when the transmission probability is proportional to the PageRank, degree, and betweenness centrality. In addition, the results suggest that spreading can be controlled by adjusting the transmission probabilities of the most central nodes. Our results provide a conceptual framework for understanding the interplay between rumor propagation and heterogeneous transmission in social networks."

--- Preferentially suppressing the infectiousness of central nodes is very effective, whether we measure centrality by betweenness, degree or PageRank (and in particular PageRank looks a bit more effective than degree, but not by much).
6 weeks ago by cshalizi
Stochastic Rumours (Daley and Kendall, 1965)
"The superficial similarity between rumours and epidemics breaks down on closer scrutiny; a feature peculiar to the rumour-spreading situation leads to striking qualitative differences in the behaviour of the two phenomena whether one uses a stochastic model or the associated deterministic model. A preliminary account is given here of a new procedure, “the principle of the diffusion of arbitrary constants”, which can be used to study the variance of the fluctuations of the sample trajectory in the stochastic model about the unique trajectory in the associated deterministic approximation. Numerical evidence (based on Monte Carlo and other calculations) is given to illustrate the effectiveness of the “principle” in the present application."

--- The difference in the model is that they assume people stop spreading the rumor on encountering someone who's already heard it (i.e., they add reactions I+I -> 2R, I+R -> 2R). This makes it really hard for the rumor to ever reach _everyone_. I am not sure that this makes sense for all rumors, but some version of "why say something everyone knows?" is sensible for a lot of cultural transmission.
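
A minimal well-mixed simulation of the mechanism just described, offered as a sketch rather than the paper's own procedure. Assumptions: one initial spreader, pairs meeting uniformly at random, only the sequence of state changes is simulated (no clock), and the names `daley_kendall`, `x`, `y`, `z` (ignorant, spreading, stifled) are illustrative.

```python
import random


def daley_kendall(n, rng):
    """Run one rumor to extinction; return the fraction who never heard it."""
    x, y, z = n - 1, 1, 0  # ignorant, spreaders, stiflers
    while y > 0:
        w_spread = x * y            # spreader meets ignorant: ignorant starts spreading
        w_pair = y * (y - 1) / 2    # spreader meets spreader: both stop (I+I -> 2R)
        w_stifle = y * z            # spreader meets stifler: spreader stops (I+R -> 2R)
        u = rng.random() * (w_spread + w_pair + w_stifle)
        if u < w_spread:
            x -= 1
            y += 1
        elif u < w_spread + w_pair:
            y -= 2
            z += 2
        else:
            y -= 1
            z += 1
    return x / n


rng = random.Random(0)
fractions = [daley_kendall(2000, rng) for _ in range(20)]
print(round(sum(fractions) / len(fractions), 3))
```

In runs of this sketch roughly a fifth of the population never hears the rumor, in line with the classic large-population figure of about 20%.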
6 weeks ago by cshalizi
A comparative analysis of approaches to network-dismantling | Scientific Reports
"Estimating, understanding, and improving the robustness of networks has many application areas such as bioinformatics, transportation, or computational linguistics. Accordingly, with the rise of network science for modeling complex systems, many methods for robustness estimation and network dismantling have been developed and applied to real-world problems. The state-of-the-art in this field is quite fuzzy, as results are published in various domain-specific venues and using different datasets. In this study, we report, to the best of our knowledge, on the analysis of the largest benchmark regarding network dismantling. We reimplemented and compared 13 competitors on 12 types of random networks, including ER, BA, and WS, with different network generation parameters. We find that network metrics, proposed more than 20 years ago, are often non-dominating competitors, while many recently proposed techniques perform well only on specific network types. Besides the solution quality, we also investigate the execution time. Moreover, we analyze the similarity of competitors, as induced by their node rankings. We compare and validate our results on real-world networks. Our study is aimed to be a reference for selecting a network dismantling method for a given network, considering accuracy requirements and run time constraints."
6 weeks ago by cshalizi
Phys. Rev. E 65, 056109 (2002) - Attack vulnerability of complex networks
"We study the response of complex networks subject to attacks on vertices and edges. Several existing complex network models as well as real-world networks of scientific collaborations and Internet traffic are numerically investigated, and the network performance is quantitatively measured by the average inverse geodesic length and the size of the largest connected subgraph. For each case of attacks on vertices and edges, four different attacking strategies are used: removals by the descending order of the degree and the betweenness centrality, calculated for either the initial network or the current network during the removal procedure. It is found that the removals by the recalculated degrees and betweenness centralities are often more harmful than the attack strategies based on the initial network, suggesting that the network structure changes as important vertices or edges are removed. Furthermore, the correlation between the betweenness centrality and the degree in complex networks is studied."

--- And by "have read", I mean "have memories of this paper that are themselves old enough to vote..."
6 weeks ago by cshalizi
Phys. Rev. E 84, 061911 (2011) - Suppressing epidemics with a limited amount of immunization units
"The way diseases spread through schools, epidemics through countries, and viruses through the internet is crucial in determining their risk. Although each of these threats has its own characteristics, its underlying network determines the spreading. To restrain the spreading, a widely used approach is the fragmentation of these networks through immunization, so that epidemics cannot spread. Here we develop an immunization approach based on optimizing the susceptible size, which outperforms the best known strategy based on immunizing the highest-betweenness links or nodes. We find that the network's vulnerability can be significantly reduced, demonstrating this on three different real networks: the global flight network, a school friendship network, and the internet. In all cases, we find that not only is the average infection probability significantly suppressed, but also for the most relevant case of a small and limited number of immunization units the infection probability can be reduced by up to 55% ."

--- The improvements look small (but non-spurious) for the real-world networks; they get the biggest improvement for Erdos-Renyi. This suggests to me that high betweenness centrality is pretty good after all...
6 weeks ago by cshalizi
People Are Less Gullible Than You Think – Reason.com
"We aren't gullible: By default we veer on the side of being resistant to new ideas. In the absence of the right cues, we reject messages that don't fit with our preconceived views or pre-existing plans. To persuade us otherwise takes long-established, carefully maintained trust, clearly demonstrated expertise, and sound arguments. Science, the media, and other institutions that spread accurate but often counterintuitive messages face an uphill battle, as they must transmit these messages and keep them credible along great chains of trust and argumentation. Quasi-miraculously, these chains connect us to the latest scientific discoveries and to events on the other side of the planet. We can only hope for new means of extending and strengthening these ever-fragile links."
have_read  mercier.hugo  cognition  collective_cognition  persuasion  via:?
7 weeks ago by cshalizi
After Carbon Democracy | Dissent Magazine
"Capitalism is at the heart of the climate challenge."
No, no, no.
(1) Look at the environmental record of the USSR, or of pre-Deng China. Soviet Earth would be facing ~ as big a climate crisis as Neoliberal Earth (only with Comrade Mann in the role of Sakharov at best).
(2) Maintaining our _current_ sized economies _with our current technologies_ would get us cooked, so it's not _economic growth_ that's the problem.

Purdy knows better.
7 weeks ago by cshalizi
Hollywood’s Next Great Studio Head Will Be a Computer
Evidence that data-mining social media is actually better at prediction than 1930s-vintage audience research is conspicuously absent from this.
Also, it misses the equilibrium point: suppose data-analytics firm X can improve predictions about how popular a film will be, and this would be worth $Y to a studio. A risk-neutral studio will pay up to $Y-\epsilon for this information, and be no better off. (And, of course, predictions are _also_ an experience good.)
9 weeks ago by cshalizi
The Internet of Beefs
Exaggerated and one-sided, but with some elements of truth.
10 weeks ago by cshalizi
The Secretive Company That Might End Privacy as We Know It - The New York Times
The only thing which is beyond my undergrad class this semester is computing the feature vectors. (And honestly I wonder how good that is.)

--- Because it's 2020: _Of course_ it's backed by a Giuliani crony who markets it to right-wing police officials. _Of course_ Thiel is involved. _Of course_ the founder appears dumbfounded when pressed on how it might be misused. _Of course_ there's no independent verification or even a notion of false positives.
privacy  information_retrieval  image_processing  pattern_recognition  have_read  to_teach:data-mining
10 weeks ago by cshalizi
Counterexamples to "The Blessings of Multiple Causes" by Wang and Blei
"This brief note is meant to complement our previous comment on "The Blessings of Multiple Causes" by Wang and Blei (2019). We provide a more succinct and transparent explanation of the fact that the deconfounder does not control for multi-cause confounding. The argument given in Wang and Blei (2019) makes two mistakes: (1) attempting to infer independence conditional on one variable from independence conditional on a different, unrelated variable, and (2) attempting to infer joint independence from pairwise independence. We give two simple counterexamples to the deconfounder claim"

--- Sadly, I find this convincing. But the method often works --- so why?
10 weeks ago by cshalizi
[1912.02729] Rademacher complexity and spin glasses: A link between the replica and statistical theories of learning
"Statistical learning theory provides bounds of the generalization gap, using in particular the Vapnik-Chervonenkis dimension and the Rademacher complexity. An alternative approach, mainly studied in the statistical physics literature, is the study of generalization in simple synthetic-data models. Here we discuss the connections between these approaches and focus on the link between the Rademacher complexity in statistical learning and the theories of generalization for typical-case synthetic models from statistical physics, involving quantities known as Gardner capacity and ground state energy. We show that in these models the Rademacher complexity is closely related to the ground state energy computed by replica theories. Using this connection, one may reinterpret many results of the literature as rigorous Rademacher bounds in a variety of models in the high-dimensional statistics limit. Somewhat surprisingly, we also show that statistical learning theory provides predictions for the behavior of the ground-state energies in some full replica symmetry breaking models."
to:NB  learning_theory  statistics  have_read  krzakala.florent  Krzakala  zdeborova.lenka
11 weeks ago by cshalizi
PsyArXiv Preprints | The Generalizability Crisis
"Most theories and hypotheses in psychology are verbal in nature, yet their evaluation overwhelmingly relies on inferential statistical procedures. The validity of the move from qualitative to quantitative analysis depends on the verbal and statistical expressions of a hypothesis being closely aligned—that is, that the two must refer to roughly the same set of hypothetical observations. Here I argue that most inferential statistical tests in psychology fail to meet this basic condition. I demonstrate how foundational assumptions of the "random effects" model used pervasively in psychology impose far stronger constraints on the generalizability of results than most researchers appreciate. Ignoring these constraints dramatically inflates false positive rates and routinely leads researchers to draw sweeping verbal generalizations that lack any meaningful connection to the statistical quantities they are putatively based on. I argue that the routine failure to consider the generalizability of one's conclusions from a statistical perspective lies at the root of many of psychology's ongoing problems (e.g., the replication crisis), and conclude with a discussion of several potential avenues for improvement."
to:NB  yarkoni.tal  measurement  social_science_methodology  social_measurement  psychometrics  psychology  scientific_method  have_read  loud_and_prolonged_applause
december 2019 by cshalizi
To Chain The Beast
"There is now a booming cottage industry of work on “virtual social warfare”, “hostile social manipulation”, and similar new terms for phenomena studied under previous names in the 2000s, 1990s, and 1980s as waves of informatization created new social realities. And in turn the 2000s, 1990s, and 1980s terminology recapitulated important features of Cold War denial and deception and active measures, and so forth. One might humorously conjecture that running on a hamster wheel of terminology is in and of itself a successful information attack on our collective information processing and decision-making systems. But that is for another conversation. Under whatever name, how shall we analyze it and deal with it? A big problem is accounting for what seem like competing interpretations of the same underlying events. Are they doing it for the lulz? Is it a Russian plot? And how do we know if we’re just attributing too much signal to what could just be enormous amounts of noise?"
december 2019 by cshalizi
Planning Without Prices (G. M. Heal, 1969)
Yet Another Lange-ian Central Planning Board:

The CPB sets a utility function in terms of levels of final goods. It also allocates raw materials and intermediate goods. Every firm must report to the CPB the marginal productivity of every resource for making every good; the CPB re-allocates goods towards firms with above-average productivity --- basically gradient ascent. (There is a slight complication here to avoid negative allocations.) This converges to a stationary point of the utility function. The claimed innovations over Lange are (a) no prices, just quantities (except that the CPB needs to use partial derivatives of the utility function that act just like prices for its internal work), (b) could handle non-convexity [sort of --- it'll converge to local maxima very happily], (c) along the path to the stationary point, we always stay inside the feasible set, and (d) the utility function is increasing along the path. The author sets the most store by (c) and (d), and so I'd characterize it as kin to an interior-point method, though without (say) a constraint-enforcing barrier penalty. The informational advantage over Kantorovich-style central planning is that the CPB doesn't have to know all the production functions, it just (!) needs to know every firm's marginal productivity for each possible input, which the firm will report honestly because reasons. (The computational and political difficulties of deciding on an economy-wide utility function are as usual unaddressed.)
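
A toy sketch of the reallocation rule summarized above, for one scarce input only; it is not Heal's actual procedure. Assumptions, all illustrative: an additive utility over firms with production functions a_i * log(1 + x_i), honest reports of marginal products, a fixed step size, and the names `reallocate`, `marginal`, `coeffs`. Firms sitting at zero with below-average marginal product are left out of the average, which is one simple way to avoid negative allocations.

```python
import math


def reallocate(x, marginal, step=0.05, iters=2000, tol=1e-12):
    """Shift a fixed total of one input toward above-average marginal products."""
    x = list(x)
    n = len(x)
    for _ in range(iters):
        mp = [marginal[i](x[i]) for i in range(n)]  # reported marginal products
        active = list(range(n))
        while True:
            avg = sum(mp[i] for i in active) / len(active)
            # Drop firms that are already at zero and would be asked to give up more.
            keep = [i for i in active if x[i] > tol or mp[i] >= avg]
            if len(keep) == len(active):
                break
            active = keep
        direction = [0.0] * n
        for i in active:
            direction[i] = mp[i] - avg  # sums to zero, so total supply is conserved
        t = step
        for i in active:
            if direction[i] < 0:
                t = min(t, x[i] / -direction[i])  # shorten the step to stay feasible
        x = [max(0.0, x[i] + t * direction[i]) for i in range(n)]
    return x


coeffs = [1.0, 2.0, 0.1]
marginal = [lambda v, a=a: a / (1.0 + v) for a in coeffs]


def utility(x):
    return sum(a * math.log1p(v) for a, v in zip(coeffs, x))


start = [1.0, 1.0, 1.0]
final = reallocate(start, marginal)
print([round(v, 3) for v in final], round(utility(final) - utility(start), 3))
```

The allocation stays non-negative and sums to the original supply at every step, matching property (c); with a small enough step on these concave toy functions the utility also rises along the path, as in (d). With non-concave production functions the same loop would simply stop at whatever local maximum it reaches.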

--- N.B., the last tag (and my emphasis on what's _not_ here) is because someone pointed me at this (and an earlier paper by Malinvaud, cited by Heal) as disposing of everything I wrote about the difficulties of central planning.
have_read  economics  optimization  distributed_systems  re:in_soviet_union_optimization_problem_solves_you  shot_after_a_fair_trial  in_NB
december 2019 by cshalizi
Online and Social Media Data As an Imperfect Continuous Panel Survey
"There is a large body of research on utilizing online activity as a survey of political opinion to predict real world election outcomes. There is considerably less work, however, on using this data to understand topic-specific interest and opinion amongst the general population and specific demographic subgroups, as currently measured by relatively expensive surveys. Here we investigate this possibility by studying a full census of all Twitter activity during the 2012 election cycle along with the comprehensive search history of a large panel of Internet users during the same period, highlighting the challenges in interpreting online and social media activity as the results of a survey. As noted in existing work, the online population is a non-representative sample of the offline world (e.g., the U.S. voting population). We extend this work to show how demographic skew and user participation is non-stationary and difficult to predict over time. In addition, the nature of user contributions varies substantially around important events. Furthermore, we note subtle problems in mapping what people are sharing or consuming online to specific sentiment or opinion measures around a particular topic. We provide a framework, built around considering this data as an imperfect continuous panel survey, for addressing these issues so that meaningful insight about public interest and opinion can be reliably extracted from online and social media data."
to:NB  have_read  social_measurement  social_science_methodology  re:social_networks_as_sensor_networks  social_media  networked_life  hofman.jake  to_teach:data-mining
november 2019 by cshalizi
The relationship between external variables and common factors | SpringerLink
"A theorem is presented which gives the range of possible correlations between a common factor and an external variable (i.e., a variable not included in the test battery factor analyzed). Analogous expressions for component (and regression component) theory are also derived. Some situations involving external correlations are then discussed which dramatize the theoretical differences between components and common factors."
in_NB  have_read  factor_analysis  inference_to_latent_objects  psychometrics  statistics  re:g_paper
november 2019 by cshalizi
Factor indeterminacy in the 1930's and the 1970's some interesting parallels | SpringerLink
"The issue of factor indeterminacy, and its meaning and significance for factor analysis, has been the subject of considerable debate in recent years. Interestingly, the identical issue was discussed widely in the literature of the late 1920's and early 1930's, but this early discussion was somehow lost or forgotten during the development and popularization of multiple factor analysis. There are strong parallels between the arguments in the early literature, and those which have appeared in recent papers. Here I review the history of this early literature, briefly survey the more recent work, and discuss these parallels where they are especially illuminating."
in_NB  psychometrics  factor_analysis  inference_to_latent_objects  have_read  a_long_time_ago  re:g_paper
november 2019 by cshalizi
Some new results on factor indeterminacy | SpringerLink
"Some relations between maximum likelihood factor analysis and factor indeterminacy are discussed. Bounds are derived for the minimum average correlation between equivalent sets of correlated factors which depend on the latent roots of the factor intercorrelation matrix ψ. Empirical examples are presented to illustrate some of the theory and indicate the extent to which it can be expected to be relevant in practice."
in_NB  have_read  a_long_time_ago  factor_analysis  low-rank_approximation  statistics  re:g_paper
november 2019 by cshalizi
Seeing Like a Finite State Machine — Crooked Timber
(The title makes me wonder what "seeing like a push-down stack machine" would entail, but well said...)
machine_learning  authoritarianism  farrell.henry  kith_and_kin  have_read  re:democratic_cognition
november 2019 by cshalizi
Random Forests (Breiman, 2001)
"Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression."
to:NB  have_read  breiman.leo  ensemble_methods  decision_trees  random_forests  to_teach:data-mining  machine_learning  statistics  prediction
november 2019 by cshalizi
[1911.00535] Think-aloud interviews: A tool for exploring student statistical reasoning
"As statistics educators revise introductory courses to cover new topics and reach students from more diverse academic backgrounds, they need assessments to test if new teaching strategies and new curricula are meeting their goals. But assessing student understanding of statistics concepts can be difficult: conceptual questions are difficult to write clearly, and students often interpret questions in unexpected ways and give answers for unexpected reasons. Assessment results alone also do not clearly indicate the reasons students pick specific answers.
"We describe think-aloud interviews with students as a powerful tool to ensure that draft questions fulfill their intended purpose, uncover unexpected misconceptions or surprising readings of questions, and suggest new questions or further pedagogical research. We have conducted more than 40 hour-long think-aloud interviews to develop over 50 assessment questions, and have collected pre- and post-test assessment data from hundreds of introductory statistics students at two institutions.
"Think-alouds and assessment data have helped us refine draft questions and explore student misunderstandings. Our findings include previously under-reported statistical misconceptions about sampling distributions and causation. These results suggest directions for future statistics education research and show how think-aloud interviews can be effectively used to develop assessments and improve our understanding of student learning."
to:NB  have_read  heard_the_talk  kith_and_kin  statistics  cognitive_science  education  protocol_analysis  expertise
november 2019 by cshalizi
[1911.02656] Invariance and identifiability issues for word embeddings
"Word embeddings are commonly obtained as optimizers of a criterion function f of a text corpus, but assessed on word-task performance using a different evaluation function g of the test data. We contend that a possible source of disparity in performance on tasks is the incompatibility between classes of transformations that leave f and g invariant. In particular, word embeddings defined by f are not unique; they are defined only up to a class of transformations to which f is invariant, and this class is larger than the class to which g is invariant. One implication of this is that the apparent superiority of one word embedding over another, as measured by word task performance, may largely be a consequence of the arbitrary elements selected from the respective solution sets. We provide a formal treatment of the above identifiability issue, present some numerical examples, and discuss possible resolutions."
to:NB  word_embeddings  text_mining  natural_language_processing  model_selection  to_teach:data-mining  have_read  linear_algebra  oopsies
november 2019 by cshalizi
[1911.02639] Word Embedding Algorithms as Generalized Low Rank Models and their Canonical Form
"Word embedding algorithms produce very reliable feature representations of words that are used by neural network models across a constantly growing multitude of NLP tasks. As such, it is imperative for NLP practitioners to understand how their word representations are produced, and why they are so impactful.
"The present work presents the Simple Embedder framework, generalizing the state-of-the-art existing word embedding algorithms (including Word2vec (SGNS) and GloVe) under the umbrella of generalized low rank models. We derive that both of these algorithms attempt to produce embedding inner products that approximate pointwise mutual information (PMI) statistics in the corpus. Once cast as Simple Embedders, comparison of these models reveals that these successful embedders all resemble a straightforward maximum likelihood estimate (MLE) of the PMI parametrized by the inner product (between embeddings). This MLE induces our proposed novel word embedding model, Hilbert-MLE, as the canonical representative of the Simple Embedder framework.
"We empirically compare these algorithms with evaluations on 17 different datasets. Hilbert-MLE consistently observes second-best performance on every extrinsic evaluation (news classification, sentiment analysis, POS-tagging, and supersense tagging), while the first-best model depends varying on the task. Moreover, Hilbert-MLE consistently observes the least variance in results with respect to the random initialization of the weights in bidirectional LSTMs. Our empirical results demonstrate that Hilbert-MLE is a very consistent word embedding algorithm that can be reliably integrated into existing NLP systems to obtain high-quality results."
to:NB  have_read  text_mining  natural_language_processing  word_embeddings  information_theory  to_teach:data-mining  low-rank_approximation
november 2019 by cshalizi
[1412.4643] Wrong side of the tracks: Big Data and Protected Categories
"When we use machine learning for public policy, we find that many useful variables are associated with others on which it would be ethically problematic to base decisions. This problem becomes particularly acute in the Big Data era, when predictions are often made in the absence of strong theories for underlying causal mechanisms. We describe the dangers to democratic decision-making when high-performance algorithms fail to provide an explicit account of causation. We then demonstrate how information theory allows us to degrade predictions so that they decorrelate from protected variables with minimal loss of accuracy. Enforcing total decorrelation is at best a near-term solution, however. The role of causal argument in ethical debate urges the development of new, interpretable machine-learning algorithms that reference causal mechanisms."
in_NB  have_read  algorithmic_fairness  information_theory  kith_and_kin  dedeo.simon  to_teach:data-mining  re:prediction_without_racism  to_teach:statistics_of_inequality_and_discrimination
november 2019 by cshalizi
Beyond Social Contagion: Associative Diffusion and the Emergence of Cultural Variation - Amir Goldberg, Sarah K. Stein, 2018
"Network models of diffusion predominantly think about cultural variation as a product of social contagion. But culture does not spread like a virus. We propose an alternative explanation we call associative diffusion. Drawing on two insights from research in cognition—that meaning inheres in cognitive associations between concepts, and that perceived associations constrain people’s actions—we introduce a model in which, rather than beliefs or behaviors, the things being transmitted between individuals are perceptions about what beliefs or behaviors are compatible with one another. Conventional contagion models require the assumption that networks are segregated to explain cultural variation. We show, in contrast, that the endogenous emergence of cultural differentiation can be entirely attributable to social cognition and does not require a segregated network or a preexisting division into groups. Moreover, we show that prevailing assumptions about the effects of network topology do not hold when diffusion is associative."

--- Preprint version: https://web.stanford.edu/~amirgo/docs/beyond.pdf

(I'm not sure that this _is_ really an alternative explanation. Or, rather, it would be an explanation for cultural polarization within a densely-connected community, but not an explanation for associations between cultural traits and social identities. Also, I think their conclusion that small-world networks lead to less "meaningful" cultural differentiation than do scale-free networks may be an artifact of the way they're using mutual information. If there was one community and everyone in it enacted the same practices, they'd get an MI of 0, but that wouldn't make them meaningless....)
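--- A minimal sketch of the mutual-information point, using a generic plug-in MI on a table of counts. This is an assumption for illustration, not necessarily the measure the paper uses; rows stand for people and columns for practices.

```python
import numpy as np

def mutual_information(joint):
    """Plug-in mutual information (in bits) from a table of joint counts."""
    p = np.asarray(joint, dtype=float)
    p = p / p.sum()
    px = p.sum(axis=1, keepdims=True)   # row marginal
    py = p.sum(axis=0, keepdims=True)   # column marginal
    mask = p > 0                        # treat 0 * log(0) as 0
    return float(np.sum(p[mask] * np.log2(p[mask] / (px @ py)[mask])))

# Everyone enacts the same practice: the practice variable is constant, so MI is 0.
uniform = [[1, 0], [1, 0], [1, 0]]
# Two people with perfectly distinct practices: MI is 1 bit.
split = [[1, 0], [0, 1]]
print(mutual_information(uniform), mutual_information(split))
```

A constant variable carries no information about anything, so a perfectly uniform culture scores zero regardless of how meaningful its practices are.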
to:NB  social_influence  contagion  homophily  cultural_transmission  cultural_differences  sociology  re:do-institutions-evolve  have_read
november 2019 by cshalizi
Reducing Coastal Risk on the East and Gulf Coasts | The National Academies Press
"Hurricane- and coastal-storm-related losses have increased substantially during the past century, largely due to increases in population and development in the most susceptible coastal areas. Climate change poses additional threats to coastal communities from sea level rise and possible increases in strength of the largest hurricanes. Several large cities in the United States have extensive assets at risk to coastal storms, along with countless smaller cities and developed areas. The devastation from Superstorm Sandy has heightened the nation's awareness of these vulnerabilities. What can we do to better prepare for and respond to the increasing risks of loss?
"Reducing Coastal Risk on the East and Gulf Coasts reviews the coastal risk-reduction strategies and levels of protection that have been used along the United States East and Gulf Coasts to reduce the impacts of coastal flooding associated with storm surges. This report evaluates their effectiveness in terms of economic return, protection of life safety, and minimization of environmental effects. According to this report, the vast majority of the funding for coastal risk-related issues is provided only after a disaster occurs. This report calls for the development of a national vision for coastal risk management that includes a long-term view, regional solutions, and recognition of the full array of economic, social, environmental, and life-safety benefits that come from risk reduction efforts. To support this vision, Reducing Coastal Risk states that a national coastal risk assessment is needed to identify those areas with the greatest risks that are high priorities for risk reduction efforts. The report discusses the implications of expanding the extent and levels of coastal storm surge protection in terms of operation and maintenance costs and the availability of resources.
"Reducing Coastal Risk recommends that benefit-cost analysis, constrained by acceptable risk criteria and other important environmental and social factors, be used as a framework for evaluating national investments in coastal risk reduction. The recommendations of this report will assist engineers, planners and policy makers at national, regional, state, and local levels to move from a nation that is primarily reactive to coastal disasters to one that invests wisely in coastal risk reduction and builds resilience among coastal communities."
october 2019 by cshalizi
The Varieties Of The Technological Control Problem
My own take appears here (by link) towards the end, but it's nonetheless very good.
(My take is of course very much indebted to Wiener; note in particular the last word of his title, _God and Golem, Inc._, and his interest in _social_ systems as cybernetic systems.)
cybernetics  artificial_intelligence  autonomous_technics  wiener.norbert  the_nightmare_from_which_we_are_trying_to_awake  have_read  to:blog
october 2019 by cshalizi
[1910.08350] A Mutual Information Maximization Perspective of Language Representation Learning
"We show state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a sentence). Our formulation provides an alternative perspective that unifies classical word embedding models (e.g., Skip-gram) and modern contextual embeddings (e.g., BERT, XLNet). In addition to enhancing our theoretical understanding of these methods, our derivation leads to a principled framework that can be used to construct new self-supervised tasks. We provide an example by drawing inspirations from related methods based on mutual information maximization that have been successful in computer vision, and introduce a simple self-supervised objective that maximizes the mutual information between a global sentence representation and n-grams in the sentence. Our analysis offers a holistic view of representation learning methods to transfer knowledge and translate progress across multiple domains (e.g., natural language processing, computer vision, audio processing)."

--- This would have been very useful to read _before_ explaining word2vec et al. to The Kids yesterday.
to:NB  have_read  information_theory  natural_language_processing  text_mining  to_teach:data-mining
october 2019 by cshalizi
[1910.05438] Comment on "Blessings of Multiple Causes"
"The premise of the deconfounder method proposed in "Blessings of Multiple Causes" by Wang and Blei, namely that a variable that renders multiple causes conditionally independent also controls for unmeasured multi-cause confounding, is incorrect. This can be seen by noting that no fact about the observed data alone can be informative about ignorability, since ignorability is compatible with any observed data distribution. Methods to control for unmeasured confounding may be valid with additional assumptions in specific settings, but they cannot, in general, provide a checkable approach to causal inference, and they do not, in general, require weaker assumptions than the assumptions that are commonly used for causal inference. While this is outside the scope of this comment, we note that much recent work on applying ideas from latent variable modeling to causal inference problems suffers from similar issues."

--- I need to sort out which side I agree with here...
to:NB  have_read  causal_inference  factor_analysis  statistics  kith_and_kin  shpitser.ilya  ogburn.elizabeth
october 2019 by cshalizi
[1910.06386] All of Linear Regression
"Least squares linear regression is one of the oldest and widely used data analysis tools. Although the theoretical analysis of the ordinary least squares (OLS) estimator is as old, several fundamental questions are yet to be answered. Suppose regression observations (X1,Y1),…,(Xn,Yn)∈ℝd×ℝ (not necessarily independent) are available. Some of the questions we deal with are as follows: under what conditions, does the OLS estimator converge and what is the limit? What happens if the dimension is allowed to grow with n? What happens if the observations are dependent with dependence possibly strengthening with n? How to do statistical inference under these kinds of misspecification? What happens to the OLS estimator under variable selection? How to do inference under misspecification and variable selection?
"We answer all the questions raised above with one simple deterministic inequality which holds for any set of observations and any sample size. This implies that all our results are a finite sample (non-asymptotic) in nature. In the end, one only needs to bound certain random quantities under specific settings of interest to get concrete rates and we derive these bounds for the case of independent observations. In particular, the problem of inference after variable selection is studied, for the first time, when d, the number of covariates increases (almost exponentially) with sample size n. We provide comments on the `right'' statistic to consider for inference under variable selection and efficient computation of quantiles."
to:NB  regression  statistics  have_read  re:TALR  to_teach:linear_models
october 2019 by cshalizi
Personality and fatal diseases: Revisiting a scientific scandal - Anthony J Pelosi, 2019
"During the 1980s and 1990s, Hans J Eysenck conducted a programme of research into the causes, prevention and treatment of fatal diseases in collaboration with one of his protégés, Ronald Grossarth-Maticek. This led to what must be the most astonishing series of findings ever published in the peer-reviewed scientific literature with effect sizes that have never otherwise been encounterered in biomedical research. This article outlines just some of these reported findings and signposts readers to extremely serious scientific and ethical criticisms that were published almost three decades ago. Confidential internal documents that have become available as a result of litigation against tobacco companies provide additional insights into this work. It is suggested that this research programme has led to one of the worst scientific scandals of all time. A call is made for a long overdue formal inquiry."

--- But everything he did on IQ is scientifically unimpeachable, I'm sure.
october 2019 by cshalizi
The Style Maven Astrophysicists of Silicon Valley | WIRED
"Understanding latent style involves other physics principles too. Moody’s team uses something called eigenvector decomposition, a concept from quantum mechanics, to tease apart the overlapping “notes” in an individual’s style, sort of like “plucking a guitar string and listening for the multiple notes overlayed.” "

--- Oh for crying out loud. I like to think that this is the journalist's cluelessness, rather than the ex-physicist's.
have_read  data_mining  physics  principal_components  utter_stupidity  singular_value_decomposition_rules_everything_around_me  to_teach:data-mining  fashion
october 2019 by cshalizi
[1901.00403] Can You Trust This Prediction? Auditing Pointwise Reliability After Learning
"To use machine learning in high stakes applications (e.g. medicine), we need tools for building confidence in the system and evaluating whether it is reliable. Methods to improve model reliability often require new learning algorithms (e.g. using Bayesian inference to obtain uncertainty estimates). An alternative is to audit a model after it is trained. In this paper, we describe resampling uncertainty estimation (RUE), an algorithm to audit the pointwise reliability of predictions. Intuitively, RUE estimates the amount that a prediction would change if the model had been fit on different training data. The algorithm uses the gradient and Hessian of the model's loss function to create an ensemble of predictions. Experimentally, we show that RUE more effectively detects inaccurate predictions than existing tools for auditing reliability subsequent to training. We also show that RUE can create predictive distributions that are competitive with state-of-the-art methods like Monte Carlo dropout, probabilistic backpropagation, and deep ensembles, but does not depend on specific algorithms at train-time like these methods do."

--- I haven't read the paper, but I am going to now use this box to sketch how an idiot would tackle this problem. (I do not mean that the authors are idiots.) Since we're fitting our abyssal learning system by optimizing some loss function, the usual asymptotics for minimization apply (http://bactra.org/weblog/1017.html), and the variance matrix of the parameters $\theta$ is ($n^{-1}$ times) the sandwich covariance matrix $h^{-1} j h^{-1}$, where $h$ is the Hessian of the loss function and $j$ is the covariance matrix of the gradient. Now the prediction we make at point $x$ is $f(x;\theta)$. This has some gradient w.r.t. the parameters at the point estimate, say $g(x)$. Taylor-expand the prediction around the point estimate, stopping at first order. Applying the usual algebra for variances tells us the variance of the prediction will be $g(x) \cdot n^{-1} h^{-1} j h^{-1} g(x)$. This --- linearization plus variance algebra --- is "propagation of error" or "the delta method".
I am now going to make two predictions about the paper, which I have not read:
(1) The bit about "gradient and Hessian" in the abstract is a sign that they're talking about the sandwich covariance matrix.
(2) Their uncertainties-in-predictions are either propagation-of-error variances, _or_ they do not compare to them.
If, on reading, I am wrong about either prediction, I will eat my crow here.
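--- A minimal sketch of the propagation-of-error calculation above, assuming (for illustration only, this is not the paper's setting) a linear predictor $f(x;\theta) = x \cdot \theta$ fit by least squares with per-point loss $\frac{1}{2}(y_i - x_i \cdot \theta)^2$. Here $h$ and $j$ are per-observation averages, so the parameter variance is $n^{-1} h^{-1} j h^{-1}$; the function name is hypothetical.

```python
import numpy as np

def sandwich_prediction_variance(X, y, x_new):
    """Delta-method variance of the prediction x_new . theta_hat."""
    n = X.shape[0]
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ theta
    # Per-point gradients of 0.5 * (y_i - x_i . theta)^2 with respect to theta.
    grads = -resid[:, None] * X
    h = X.T @ X / n             # average Hessian of the loss
    # Second-moment matrix of the gradients; the mean gradient is zero at the
    # least-squares optimum, so this is their covariance matrix.
    j = grads.T @ grads / n
    h_inv = np.linalg.inv(h)
    param_cov = h_inv @ j @ h_inv / n   # n^{-1} h^{-1} j h^{-1}
    g = x_new                   # gradient of x_new . theta with respect to theta
    return float(g @ param_cov @ g)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=200)
print(sandwich_prediction_variance(X, y, np.array([1.0, 0.5])))
```

For a nonlinear model the only changes would be the per-point gradients, the Hessian, and $g(x)$; in this linear special case the result coincides with the usual heteroskedasticity-robust (HC0) variance.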

--- ETA after reading: OK, I need to eat a _little_ crow. They assume the loss is a sum of IID point-by-point terms, meaning the gradient is too, and so the over-all loss gradient can be written as a sum of point-wise gradients, say $l_1, \ldots, l_n$. They then sample points with replacement (as in the bootstrap), and perturb the parameter estimate by a first-order Taylor series using the appropriate $l_i$'s. (I'm not 100% sold on this step --- given that the influence of any one data point on the parameter estimate is small, still, replacing 1/3 of them isn't necessarily a local perturbation.) Then they repredict with the new parameters, and take the variances of the repredictions over many resamplings. (I don't see why --- they could just get a confidence interval for each prediction.)
to:NB  prediction  statistics  halbert_white_died_for_your_sins  via:arsyed  have_read  uncertainty_for_neural_networks
september 2019 by cshalizi
What College Admissions Offices Really Want - The New York Times
I would be very interested to know how CMU's admissions office navigates this. (Also: how good are those models?)
september 2019 by cshalizi
[1908.04358] Graph hierarchy and spread of infections
"Trophic levels and hence trophic coherence can be defined only on networks with well defined sources, trophic analysis of networks had been restricted to the ecological domain until now. Trophic coherence, a measure of a network's hierarchical organisation, has been shown to be linked to a network's structural and dynamical aspects. In this paper we introduce hierarchical levels, which is a generalisation of trophic levels, that can be defined on any simple graph and we interpret it as a network influence metric. We discuss how our generalisation relates to the previous definition and what new insights our generalisation shines on the topological and dynamical aspects of networks. We also show that the mean of hierarchical differences correlates strongly with the topology of the graph. Finally, we model an epidemiological dynamics and show how the statistical properties of hierarchical differences relate to the incidence rate and how it affects the spreading process in a SIS model."
september 2019 by cshalizi
[1906.00232] Kernel Instrumental Variable Regression
"Instrumental variable regression is a strategy for learning causal relationships in observational data. If measurements of input X and output Y are confounded, the causal relationship can nonetheless be identified if an instrumental variable Z is available that influences X directly, but is conditionally independent of Y given X and the unmeasured confounder. The classic two-stage least squares algorithm (2SLS) simplifies the estimation problem by modeling all relationships as linear functions. We propose kernel instrumental variable regression (KIV), a nonparametric generalization of 2SLS, modeling relations among X, Y, and Z as nonlinear functions in reproducing kernel Hilbert spaces (RKHSs). We prove the consistency of KIV under mild assumptions, and derive conditions under which the convergence rate achieves the minimax optimal rate for unconfounded, one-stage RKHS regression. In doing so, we obtain an efficient ratio between training sample sizes used in the algorithm's first and second stages. In experiments, KIV outperforms state of the art alternatives for nonparametric instrumental variable regression. Of independent interest, we provide a more general theory of conditional mean embedding regression in which the RKHS has infinite dimension."
september 2019 by cshalizi
On Whorfian Socioeconomics by Thomas B. Pepinsky :: SSRN
"Whorfian socioeconomics is an emerging interdisciplinary field of study that holds that linguistic structures explain differences in beliefs, values, and opinions across communities. Its core empirical strategy is to document a correlation between the presence or absence of a linguistic feature in a survey respondent’s language, and her/his responses to survey questions. This essay demonstrates — using the universe of linguistic features from the World Atlas of Language Structures and a wide array of responses from the World Values Survey — that such an approach produces highly statistically significant correlations in a majority of analyses, irrespective of the theoretical plausibility linking linguistic features to respondent beliefs. These results raise the possibility that correlations between linguistic features and survey responses are actually spurious. The essay concludes by showing how two simple and well-understood statistical fixes can more accurately reflect uncertainty in these analyses, reducing the temptation for analysts to create implausible Whorfian theories to explain spurious linguistic correlations."
in_NB  linguistics  economics  social_science_methodology  pepinsky.thomas_b.  debunking  evisceration  have_read  to_teach:linear_models  have_sent_gushing_fanmail  to:blog  to_teach:data_over_space_and_time
september 2019 by cshalizi
[1909.02330] McDiarmid-Type Inequalities for Graph-Dependent Variables and Stability Bounds
"A crucial assumption in most statistical learning theory is that samples are independently and identically distributed (i.i.d.). However, for many real applications, the i.i.d. assumption does not hold. We consider learning problems in which examples are dependent and their dependency relation is characterized by a graph. To establish algorithm-dependent generalization theory for learning with non-i.i.d. data, we first prove novel McDiarmid-type concentration inequalities for Lipschitz functions of graph-dependent random variables. We show that concentration relies on the forest complexity of the graph, which characterizes the strength of the dependency. We demonstrate that for many types of dependent data, the forest complexity is small and thus implies good concentration. Based on our new inequalities we are able to build stability bounds for learning from graph-dependent data."
september 2019 by cshalizi
BishopBlog: Responding to the replication crisis: reflections on Metascience2019
"Another major concern I had was the widespread reliance on proxy indicators of research quality. One talk that exemplified this was Yang Yang's presentation on machine intelligence approaches to predicting replicability of studies. He started by noting that non-replicable results get cited just as much as replicable ones: a depressing finding indeed, and one that motivated the study he reported. His talk was clever at many levels. It was ingenious to use the existing results from the Reproducibility Project as a database that could be mined to identify characteristics of results that replicated. I'm not qualified to comment on the machine learning approach, which involved using ngrams extracted from texts to predict a binary category of replicable or not. But implicit in this study was the idea that the results from this exercise could be useful in future in helping us identify, just on the basis of textual analysis, which studies were likely to be replicable.
"Now, this seems misguided on several levels. For a start, as we know from the field of medical screening, the usefulness of a screening test depends on the base rate of the condition you are screening for, the extent to which the sample you develop the test on is representative of the population, and the accuracy of prediction. I would be frankly amazed if the results of this exercise yielded a useful screener. But even if they did, then Goodhart's law would kick in: as soon as researchers became aware that there was a formula being used to predict how replicable their research was, they'd write their papers in a way that would maximise their score. One can even imagine whole new companies springing up who would take your low-scoring research paper and, for a price, revise it to get a better score. I somehow don't think this would benefit science. In defence of this approach, it was argued that it would allow us to identify characteristics of replicable work, and encourage people to emulate these. But this seems back-to-front logic. Why try to optimise an indirect, weak proxy for what makes good science (ngram characteristics of the write-up) rather than optimising, erm, good scientific practices."
september 2019 by cshalizi
[1908.09635] A Survey on Bias and Fairness in Machine Learning
"With the widespread use of AI systems and applications in our everyday lives, it is important to take fairness issues into consideration while designing and engineering these types of systems. Such systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that the decisions do not reflect discriminatory behavior toward certain groups or populations. We have recently seen work in machine learning, natural language processing, and deep learning that addresses such challenges in different subdomains. With the commercialization of these systems, researchers are becoming aware of the biases that these applications can contain and have attempted to address them. In this survey we investigated different real-world applications that have shown biases in various ways, and we listed different sources of biases that can affect AI applications. We then created a taxonomy for fairness definitions that machine learning researchers have defined in order to avoid the existing bias in AI systems. In addition to that, we examined different domains and subdomains in AI showing what researchers have observed with regard to unfair outcomes in the state-of-the-art methods and how they have tried to address them. There are still many future directions and solutions that can be taken to mitigate the problem of bias in AI systems. We are hoping that this survey will motivate researchers to tackle these issues in the near future by observing existing work in their respective fields."
in_NB  algorithmic_fairness  prediction  machine_learning  lerman.kristina  galstyan.aram  to_teach:data-mining  have_read  to_teach:statistics_of_inequality_and_discrimination
august 2019 by cshalizi
[1908.08741] A relation between log-likelihood and cross-validation log-scores
"It is shown that the log-likelihood of a hypothesis or model given some data is equivalent to an average of all leave-one-out cross-validation log-scores that can be calculated from all subsets of the data. This relation can be generalized to any k-fold cross-validation log-scores."

--- This sounds funny, because leave-one-out is (asymptotically) equivalent to the robustified AIC (= Takeuchi information criterion).

--- ETA after reading: The algebra looks legit, but kinda pointless.
statistics  likelihood  cross-validation  have_read  shot_after_a_fair_trial  not_worth_putting_in_notebooks
august 2019 by cshalizi
[1908.06319] Locally Linear Embedding and fMRI feature selection in psychiatric classification
"Background: Functional magnetic resonance imaging (fMRI) provides non-invasive measures of neuronal activity using an endogenous Blood Oxygenation-Level Dependent (BOLD) contrast. This article introduces a nonlinear dimensionality reduction (Locally Linear Embedding) to extract informative measures of the underlying neuronal activity from BOLD time-series. The method is validated using the Leave-One-Out-Cross-Validation (LOOCV) accuracy of classifying psychiatric diagnoses using resting-state and task-related fMRI. Methods: Locally Linear Embedding of BOLD time-series (into each voxel's respective tensor) was used to optimise feature selection. This uses Gauß' Principle of Least Constraint to conserve quantities over both space and time. This conservation was assessed using LOOCV to greedily select time points in an incremental fashion on training data that was categorised in terms of psychiatric diagnoses. Findings: The embedded fMRI gave highly diagnostic performances (> 80%) on eleven publicly-available datasets containing healthy controls and patients with either Schizophrenia, Attention-Deficit Hyperactivity Disorder (ADHD), or Autism Spectrum Disorder (ASD). Furthermore, unlike the original fMRI data before or after using Principal Component Analysis (PCA) for artefact reduction, the embedded fMRI furnished significantly better than chance classification (defined as the majority class proportion) on ten of eleven datasets. Interpretation: Locally Linear Embedding appears to be a useful feature extraction procedure that retains important information about patterns of brain activity distinguishing among psychiatric cohorts."

--- Last tag is because I plan to teach LLE and this might make a good example or assignment, if I like how it was actually done.

--- ETA: It's... not horrible (though the writing is bad and far too pretentious), but not very insightful, and too complicated to make a good teaching example.
to:NB  locally_linear_embedding  classifiers  fmri  dimension_reduction  have_read  to_teach:data-mining
august 2019 by cshalizi
[1302.0890] Local Log-linear Models for Capture-Recapture
"Log-linear models are often used to estimate the size of a closed population using capture-recapture data. When capture probabilities are related to auxiliary covariates, one may select a separate model based on each of several post-strata. We extend post-stratification to its logical extreme by selecting a local log-linear model for each observed unit, while smoothing to achieve stability. Our local models serve a dual purpose: In addition to estimating the size of the population, we estimate the rate of missingness as a function of covariates. A simulation demonstrates the superiority of our method when the generating model varies over the covariate space. Data from the Breeding Bird Survey is used to illustrate the method."
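
--- For orientation, a sketch of the simplest log-linear capture-recapture estimate: two lists, assumed independent, which reduces to the Lincoln-Petersen estimator. This is not the paper's local method; the function and argument names are hypothetical.

```python
def two_list_population_estimate(both, only_first, only_second):
    """Estimate closed-population size from two lists under independence.

    Under the independence log-linear model, the fitted count for the
    unobserved cell (missed by both lists) is only_first * only_second / both.
    """
    if both <= 0:
        raise ValueError("need at least one unit captured on both lists")
    missed = only_first * only_second / both
    return both + only_first + only_second + missed
```

With 20 units on both lists, 30 only on the first and 40 only on the second, the fitted missing cell is 60 and the estimated population is 150, the same as (50 * 60) / 20 from the usual Lincoln-Petersen formula.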

--- When did the title change from "Smooth Poststratification"?
to:NB  have_read  surveys  smoothing  statistics  estimation  kurtz.zachary  kith_and_kin
august 2019 by cshalizi
[1908.06456] Harmonic Analysis of Symmetric Random Graphs
"Following Ressel (1985,2008) this note attempts to understand graph limits (Lovasz and Szegedy 2006) in terms of harmonic analysis on semigroups (Berg et al. 1984), thereby providing an alternative derivation of de Finetti's theorem for random exchangeable graphs."

--- SL has been hinting about this for years (it's the natural combination of his 70s--80s work on "extremal point" models, sufficiency, and semi-groups with his recent interest in graph limits and graphons), so I'm very excited to read this.

--- ETA after reading: It's everything one might hope; isomorphism classes of graphs show up as the natural sufficient statistics in a generalized exponential family, etc.
in_NB  have_read  graph_limits  analysis  probability  lauritzen.steffen
august 2019 by cshalizi
[1901.00555] An Introductory Guide to Fano's Inequality with Applications in Statistical Estimation
"Information theory plays an indispensable role in the development of algorithm-independent impossibility results, both for communication problems and for seemingly distinct areas such as statistics and machine learning. While numerous information-theoretic tools have been proposed for this purpose, the oldest one remains arguably the most versatile and widespread: Fano's inequality. In this chapter, we provide a survey of Fano's inequality and its variants in the context of statistical estimation, adopting a versatile framework that covers a wide range of specific problems. We present a variety of key tools and techniques used for establishing impossibility results via this approach, and provide representative examples covering group testing, graphical model selection, sparse linear regression, density estimation, and convex optimization."
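
--- For reference, a sketch of the standard form of the bound: if V is uniform over M hypotheses and is guessed from data X, then P(error) >= 1 - (I(V;X) + log 2) / log M. The function name and its arguments are my own, with mutual information taken in nats.

```python
import math


def fano_error_lower_bound(mutual_information_nats, num_hypotheses):
    """Fano lower bound on the error probability of any test among
    num_hypotheses equally likely hypotheses, given I(V; X) in nats."""
    if num_hypotheses < 2:
        raise ValueError("need at least two hypotheses")
    bound = 1 - (mutual_information_nats + math.log(2)) / math.log(num_hypotheses)
    # The raw expression can go negative, in which case the bound is vacuous.
    return max(0.0, bound)
```

The bound is only informative when the mutual information is small relative to log M, which is why applications typically construct a large, well-separated set of hypotheses that remain hard to tell apart.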
in_NB  information_theory  minimax  statistics  estimation  have_read  re:HEAS
august 2019 by cshalizi
Back to the Future: Review of Bit by Bit by Matt Salganik
"When I heard a few years ago that Salganik was writing a textbook, I was surprised and a little disappointed that this would be a distraction from his cutting edge research in areas like information cascades and respondent driven sampling. I was a fool. Just as chapter 5 of the book describes how computational approaches can enable mass collaboration on research projects by spreading the work from credentialed experts to masses of people with low or unknown skill, Bit by Bit itself will do more for computational social science by spreading the heretofore tacit knowledge of the field than a top researcher could accomplish directly. I strongly recommend Bit by Bit and fully expect it will be the standard methods textbook for computational social science until advances in the field render it dated. If we are lucky, we will benefit from a new edition every five to ten years so the book can keep pace with a rapidly evolving field. However for now it is incredibly current and I highly recommend it to any social scientist who teaches, practices, or aspires to practice or even just understand computational social science."
august 2019 by cshalizi
Confabulation in the humanities - Matthew Lincoln, PhD
Now, realize that this doesn't _just_ apply to interpreting quantitative analyses, but also to more traditionally-humanistic explanations...
august 2019 by cshalizi
[1901.10861] A Simple Explanation for the Existence of Adversarial Examples with Small Hamming Distance
"The existence of adversarial examples in which an imperceptible change in the input can fool well trained neural networks was experimentally discovered by Szegedy et al in 2013, who called them "Intriguing properties of neural networks". Since then, this topic had become one of the hottest research areas within machine learning, but the ease with which we can switch between any two decisions in targeted attacks is still far from being understood, and in particular it is not clear which parameters determine the number of input coordinates we have to change in order to mislead the network. In this paper we develop a simple mathematical framework which enables us to think about this baffling phenomenon from a fresh perspective, turning it into a natural consequence of the geometry of ℝ^n with the L_0 (Hamming) metric, which can be quantitatively analyzed. In particular, we explain why we should expect to find targeted adversarial examples with Hamming distance of roughly m in arbitrarily deep neural networks which are designed to distinguish between m input classes."
august 2019 by cshalizi
Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors
"Context-predicting models (more commonly known as embeddings or neural language models) are the new kids on the distributional semantics block. Despite the buzz surrounding these models, the literature is still lacking a systematic comparison of the predictive models with classic, count-vector-based distributional semantic approaches. In this paper, we perform such an extensive evaluation, on a wide range of lexical semantics tasks and across many parameter settings. The results, to our own surprise, show that the buzz is fully justified, as the context-predicting models obtain a thorough and resounding victory against their count-based counterparts."
to:NB  have_read  natural_language_processing  text_mining  word2vec  data_mining  to_teach:data-mining
august 2019 by cshalizi
[1402.3722] word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method
"The word2vec software of Tomas Mikolov and colleagues (this https URL ) has gained a lot of traction lately, and provides state-of-the-art word embeddings. The learning models behind the software are described in two research papers. We found the description of the models in these papers to be somewhat cryptic and hard to follow. While the motivations and presentation may be obvious to the neural-networks language-modeling crowd, we had to struggle quite a bit to figure out the rationale behind the equations.
"This note is an attempt to explain equation (4) (negative sampling) in "Distributed Representations of Words and Phrases and their Compositionality" by Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado and Jeffrey Dean."
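
--- For reference, a sketch of the per-pair negative-sampling objective that the note sets out to explain: log sigma(c . w) plus the sum over k sampled negatives n_i of log sigma(-n_i . w). Here the expectation over the noise distribution is replaced by whatever negative vectors the caller supplies; all names are illustrative and not taken from the word2vec code.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dot(u, v):
    """Inner product of two equal-length sequences of floats."""
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_objective(word_vec, context_vec, negative_vecs):
    """Objective for one (word, context) pair with sampled negative contexts.

    Larger is better: the observed context should score high against the
    word vector, and each sampled negative context should score low.
    """
    positive_term = log_sigmoid(dot(context_vec, word_vec))
    negative_term = sum(log_sigmoid(-dot(neg, word_vec)) for neg in negative_vecs)
    return positive_term + negative_term
```

With all vectors at zero every sigmoid equals 1/2, so the objective starts at (k + 1) log(1/2) and training pushes it up toward 0.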
to:NB  natural_language_processing  text_mining  statistics  neural_networks  data_mining  word2vec  have_read  to_teach:data-mining
august 2019 by cshalizi