How to be Curious Instead of Contrarian About COVID-19: Eight Data Science Lessons From ‘Coronavirus Perspective’ (Epstein 2020)

yesterday by cshalizi

"It is an order of magnitude less effort to spam poorly constructed hypotheticals than it is to deconstruct them. This review took a substantial amount of time, and in the meantime the original piece was poorly revised, several interviews and a podcast were released, and a second post trying to cover for the first went live. More will no doubt soon continue to move the goal posts and argument. In a world where actual life or death policy analysis is being treated like a high school debate round, the only strategic move is to step back, slow down, and draw methodological lessons for our students and colleagues that will apply to a broad set of current and future analyses."

evisceration
bad_data_analysis
epidemiology
epstein.richard
utter_stupidity
coronavirus_pandemic_of_2019--
have_read
anti-contrarianism

The Transmission Dynamics of Human Immunodeficiency Virus (HIV) [and Discussion] (May and Anderson, 1988)

6 weeks ago by cshalizi

"The paper first reviews data on HIV infections and AIDS disease among homosexual men, heterosexuals, intravenous (IV) drug abusers and children born to infected mothers, in both developed and developing countries. We survey such information as is currently available about the distribution of incubation times that elapse between HIV infection and the appearance of AIDS, about the fraction of those infected with HIV who eventually go on to develop AIDS, about time-dependent patterns of infectiousness and about distributions of rates of acquiring new sexual or needle-sharing partners. With this information, models for the transmission dynamics of HIV are developed, beginning with deliberately oversimplified models and progressing - on the basis of the understanding thus gained - to more complex ones. Where possible, estimates of the model's parameters are derived from the epidemiological data, and predictions are compared with observed trends. We also combine these epidemiological models with demographic considerations to assess the effects that heterosexually-transmitted HIV/AIDS may eventually have on rates of population growth, on age profiles and on associated economic and social indicators, in African and other countries. The degree to which sexual or other habits must change to bring the `basic reproductive rate', R0, of HIV infections below unity is discussed. We conclude by outlining some research needs, both in the refinement and development of models and in the collection of epidemiological data."

--- This is (apparently) the first paper which considered degree heterogeneity as a factor in determining the epidemic threshold in an SIR model (section 4.1), while admitting that the uncorrelated degree assumption is inaccurate (*)

*: "By assuming that partners are chosen randomly (apart from the activity levels characterized by the weighting factor $i$), we may be overestimating the contacts of less active individuals with those in more active categories, and thus overestimating the spread of infection among such less active sub-groups. Conversely, the transmission probability $\beta$ may be higher for longer-lasting partnerships (despite the data in figure 4), so that use of a constant $\beta$ may tend to underestimate the spread of infection among less active people. The net effect of these countervailing refinements is hard to guess." [pp. 583--584]
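A minimal sketch of why degree heterogeneity matters for the threshold, assuming the standard proportionate-mixing form R0 = beta * D * (m + sigma^2/m), where m and sigma^2 are the mean and variance of the partner-acquisition rate. The function names and the toy populations are illustrative assumptions, not taken from the paper.

```python
from statistics import mean

def effective_contact_rate(partner_rates):
    """Return <i^2>/<i>, which equals m + variance/m under random (proportionate) mixing."""
    m = mean(partner_rates)
    second_moment = mean(r * r for r in partner_rates)
    return second_moment / m

def basic_reproductive_rate(beta, duration, partner_rates):
    """R0 = beta * D * c, with c the effective contact rate above (assumed form)."""
    return beta * duration * effective_contact_rate(partner_rates)

# Two hypothetical populations with the same mean rate (2 partners per unit time).
homogeneous = [2.0] * 100
heterogeneous = [1.0] * 90 + [11.0] * 10

c_hom = effective_contact_rate(homogeneous)    # variance is zero, so c equals the mean
c_het = effective_contact_rate(heterogeneous)  # the variance term inflates c
```

With the same mean, the heterogeneous population has a larger effective contact rate, so a smaller transmission probability is enough to push R0 above one.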

in_NB
epidemics_on_networks
epidemic_models
epidemiology
aids
may.robert_m.
have_read

[cond-mat/0205439] Epidemic threshold in structured scale-free networks

6 weeks ago by cshalizi

"We analyze the spreading of viruses in scale-free networks with high clustering and degree correlations, as found in the Internet graph. For the Susceptible-Infected-Susceptible model of epidemics the prevalence undergoes a phase transition at a finite threshold of the transmission probability. Comparing with the absence of a finite threshold in networks with purely random wiring, our result suggests that high clustering and degree correlations protect scale-free networks against the spreading of viruses. We introduce and verify a quantitative description of the epidemic threshold based on the connectivity of the neighborhoods of the hubs."

--- Initially, I found the focus on the average degree of a node's neighbors, their <k^nn>, very puzzling --- as a measure of how many secondary infections you could produce, this would seem to involve a lot of over-counting when the network is clustered. But looking at their figure 2 clarifies: <k^nn|k>, the average degree of neighbors conditional on ego's degree, is a _decreasing_ function of ego's degree in their model. (If ego's degree is 1 or 2, it looks like the average degree of ego's neighbors is in the 100s [!], while as ego's degree goes to infinity, <k^nn> tends to a constant _smaller_ than the average degree.) So this is a hub-and-spoke system where each hub has a huge number of ties to very low-degree nodes, but there are enough non-hubs tied to multiple hubs, or hub-hub ties, to keep things connected. And then it makes sense that the crucial step in an epidemic is what happens once a hub is infected.
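The conditional quantity <k^nn|k> discussed above can be computed directly from an edge list. This is a small sketch on a made-up "two joined stars" graph (a hypothetical toy, not the paper's model), just to show the decreasing pattern in the simplest possible case.

```python
from collections import defaultdict

def neighbor_degree_by_degree(edges):
    """Map each degree k to the average neighbor degree of nodes with degree k."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = {node: len(nbrs) for node, nbrs in adj.items()}
    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, nbrs in adj.items():
        k = deg[node]
        # average degree of this node's neighbors
        sums[k] += sum(deg[m] for m in nbrs) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

# Two hubs, each with five degree-1 spokes, joined by a single hub-hub tie.
edges = [("A", f"a{i}") for i in range(5)]
edges += [("B", f"b{i}") for i in range(5)]
edges += [("A", "B")]
knn = neighbor_degree_by_degree(edges)
```

Here the degree-1 nodes see only a hub, while each hub sees mostly degree-1 nodes, so the average neighbor degree falls as ego's degree rises.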

ETA: In fact, Moreno and Vazquez (cond-mat/0210362) observe that the model generates, basically, a linear chain of stars (lovely phrase!), and that's where all this weird behavior comes from.

(Despite the last tag, I think this model is so anti-social that it's not worth mentioning in the paper with DA and HF, but maybe if I ever write that review...)

in_NB
have_read
epidemics_on_networks
networks
re:do-institutions-evolve

Influential node ranking via randomized spanning trees - ScienceDirect

6 weeks ago by cshalizi

"Networks portraying a diversity of interactions among individuals serve as the substrates(media) of information dissemination. One of the most important problems is to identify the influential nodes for the understanding and controlling of information diffusion and disease spreading. However, most existing works on identification of efficient nodes for influence minimization focused on centrality measures. In this work, we capitalize on the structural properties of a random spanning forest to identify the influential nodes. Specifically, the node importance is simply ranked by the aggregated degree of a node in the spanning forest, which reveals both local and global connection patterns. Our analysis on real networks indicates that manipulating the nodes with high aggregated degrees in the random spanning forest shows better performance in controlling spreading processes, compared to previously used importance criteria, including degree centrality, betweenness centrality, and random walk based indices, leading to less influenced population. We further show the characteristics of the proposed measure and the comparison with benchmarks."

--- Degree in a random (depth-first) spanning tree is a cute centrality measure, but it's got to be strongly related to eigenvector centrality (which they only mention in the last sentence). Last tag is because "what is this a Monte Carlo estimate of?" might make a good project...
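A rough sketch of the kind of score described in the note above: average a node's degree over many randomized depth-first spanning trees. This follows the note's description rather than the paper's exact spanning-forest procedure, assumes a connected undirected graph stored as adjacency lists, and all names are hypothetical.

```python
import random
from collections import defaultdict

def random_dfs_tree_degrees(adj, rng):
    """Node degrees in one randomized depth-first spanning tree of a connected graph."""
    root = rng.choice(sorted(adj))
    tree_deg = {node: 0 for node in adj}
    visited = {root}
    stack = [root]
    while stack:
        node = stack[-1]
        unvisited = [m for m in adj[node] if m not in visited]
        if not unvisited:
            stack.pop()  # dead end: backtrack
            continue
        nxt = rng.choice(unvisited)  # random choice of the next tree edge
        visited.add(nxt)
        tree_deg[node] += 1
        tree_deg[nxt] += 1
        stack.append(nxt)
    return tree_deg

def aggregated_tree_degree(adj, n_trees=200, seed=0):
    """Monte Carlo average of tree degree over n_trees random DFS trees."""
    rng = random.Random(seed)
    total = defaultdict(int)
    for _ in range(n_trees):
        for node, d in random_dfs_tree_degrees(adj, rng).items():
            total[node] += d
    return {node: total[node] / n_trees for node in adj}

# A star has exactly one spanning tree, so the scores are just the degrees.
star = {"hub": ["a", "b", "c", "d"], "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"]}
scores = aggregated_tree_degree(star, n_trees=50)
```

On less trivial graphs the average depends on how the trees are sampled, which is exactly the "what is this a Monte Carlo estimate of?" question.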

in_NB
have_read
epidemics_on_networks
network_data_analysis
to_teach:baby-nets

[cond-mat/0007048] Resilience of the Internet to random breakdowns

6 weeks ago by cshalizi

"A common property of many large networks, including the Internet, is that the connectivity of the various nodes follows a scale-free power-law distribution, P(k)=ck^-a. We study the stability of such networks with respect to crashes, such as random removal of sites. Our approach, based on percolation theory, leads to a general condition for the critical fraction of nodes, p_c, that need to be removed before the network disintegrates. We show that for a<=3 the transition never takes place, unless the network is finite. In the special case of the Internet (a=2.5), we find that it is impressively robust, where p_c is approximately 0.99."
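The "general condition" mentioned in the abstract is usually written as p_c = 1 - 1/(kappa - 1), with kappa = <k^2>/<k> computed from the original degree distribution. A minimal sketch of evaluating it on a degree sequence follows; the function name and the example sequences are illustrative assumptions.

```python
def critical_removal_fraction(degrees):
    """Critical fraction of randomly removed nodes, p_c = 1 - 1/(kappa - 1)."""
    n = len(degrees)
    k1 = sum(degrees) / n                  # <k>
    k2 = sum(k * k for k in degrees) / n   # <k^2>
    kappa = k2 / k1
    if kappa <= 2.0:
        return 0.0  # no giant component even before any removal
    return 1.0 - 1.0 / (kappa - 1.0)

# Hypothetical degree sequences: a 3-regular network and one with a few huge hubs.
regular = [3] * 1000
hubby = [2] * 990 + [200] * 10

pc_regular = critical_removal_fraction(regular)
pc_hubby = critical_removal_fraction(hubby)
```

A handful of very high-degree nodes inflates <k^2>, which pushes p_c towards one; this is the mechanism behind the robustness claim for heavy-tailed degree distributions.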

in_NB
networks
have_read
re:do-institutions-evolve

Immunization and epidemic dynamics in complex networks | SpringerLink

6 weeks ago by cshalizi

"We study the behavior of epidemic spreading in networks, and, in particular, scale free networks. We use the Susceptible-Infected-Removed (SIR) epidemiological model. We give simulation results for the dynamics of epidemic spreading. By mapping the model into a static bond-percolation model we derive analytical results for the total number of infected individuals. We study this model with various immunization strategies, including random, targeted and acquaintance immunization."

in_NB
have_read
epidemics_on_networks
re:do-institutions-evolve
re:do_not_adjust_your_receiver

[cond-mat/0107066] Immunization of complex networks

6 weeks ago by cshalizi

"Complex networks such as the sexual partnership web or the Internet often show a high degree of redundancy and heterogeneity in their connectivity properties. This peculiar connectivity provides an ideal environment for the spreading of infective agents. Here we show that the random uniform immunization of individuals does not lead to the eradication of infections in all complex networks. Namely, networks with scale-free properties do not acquire global immunity from major epidemic outbreaks even in the presence of unrealistically high densities of randomly immunized individuals. The absence of any critical immunization threshold is due to the unbounded connectivity fluctuations of scale-free networks. Successful immunization strategies can be developed only by taking into account the inhomogeneous connectivity properties of scale-free networks. In particular, targeted immunization schemes, based on the nodes' connectivity hierarchy, sharply lower the network's vulnerability to epidemic attacks."

in_NB
have_read
networks
epidemics_on_networks
re:do-institutions-evolve

Spreading dynamics in complex networks - IOPscience

6 weeks ago by cshalizi

"Searching for influential spreaders in complex networks is an issue of great significance for applications across various domains, ranging from epidemic control, innovation diffusion, viral marketing, and social movement to idea propagation. In this paper, we first display some of the most important theoretical models that describe spreading processes, and then discuss the problem of locating both the individual and multiple influential spreaders respectively. Recent approaches in these two topics are presented. For the identification of privileged single spreaders, we summarize several widely used centralities, such as degree, betweenness centrality, PageRank, k-shell, etc. We investigate the empirical diffusion data in a large scale online social community—LiveJournal. With this extensive dataset, we find that various measures can convey very distinct information of nodes. Of all the users in the LiveJournal social network, only a small fraction of them are involved in spreading. For the spreading processes in LiveJournal, while degree can locate nodes participating in information diffusion with higher probability, k-shell is more effective in finding nodes with a large influence. Our results should provide useful information for designing efficient spreading strategies in reality."

--- Eh, the measure of "influence" is just the size of the reachable set. (They don't actually track the dynamics of anything.)

in_NB
networks
social_influence
have_read
re:do-institutions-evolve

Rumor propagation with heterogeneous transmission in social networks - IOPscience

6 weeks ago by cshalizi

"Rumor models consider that information transmission occurs with the same probability between each pair of nodes. However, this assumption is not observed in social networks, which contain influential spreaders. To overcome this limitation, we assume that central individuals have a higher capacity to convince their neighbors than peripheral subjects. From extensive numerical simulations we find that spreading is improved in scale-free networks when the transmission probability is proportional to the PageRank, degree, and betweenness centrality. In addition, the results suggest that spreading can be controlled by adjusting the transmission probabilities of the most central nodes. Our results provide a conceptual framework for understanding the interplay between rumor propagation and heterogeneous transmission in social networks."

--- Preferentially suppressing the infectiousness of central nodes is very effective, whether we measure centrality by betweenness, degree or pagerank (and in particular pagerank looks a bit more effective than degree, but not by much).

in_NB
epidemics_on_networks
have_read
re:do-institutions-evolve

Stochastic Rumours (Daley and Kendall, 1965)

6 weeks ago by cshalizi

"The superficial similarity between rumours and epidemics breaks down on closer scrutiny; a feature peculiar to the rumour-spreading situation leads to striking qualitative differences in the behaviour of the two phenomena whether one uses a stochastic model or the associated deterministic model. A preliminary account is given here of a new procedure, “the principle of the diffusion of arbitrary constants”, which can be used to study the variance of the fluctuations of the sample trajectory in the stochastic model about the unique trajectory in the associated deterministic approximation. Numerical evidence (based on Monte Carlo and other calculations) is given to illustrate the effectiveness of the “principle” in the present application."

--- The difference in the model is that they assume people stop spreading the rumor on encountering someone who's already heard it (i.e., they add reactions I+I -> 2R, I+R -> 2R). This makes it really hard for the rumor to ever reach _everyone_. I am not sure that this makes sense for all rumors, but some version of "why say something everyone knows?" is sensible for a lot of cultural transmission.
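A small stochastic sketch of the reactions in the note above, assuming a well-mixed population where every unordered pair meets at the same rate; only meetings involving a spreader change anything. The function name, population size and seed are arbitrary choices for illustration.

```python
import random

def daley_kendall(n, seed=0):
    """Run the rumour until no spreaders remain; return the fraction who never heard it."""
    rng = random.Random(seed)
    ignorant, spreaders, stiflers = n - 1, 1, 0
    while spreaders > 0:
        # Weights are numbers of unordered pairs of each type that include a spreader.
        w_si = spreaders * ignorant                 # spreader meets ignorant
        w_ss = spreaders * (spreaders - 1) / 2      # spreader meets spreader
        w_sr = spreaders * stiflers                 # spreader meets stifler
        u = rng.random() * (w_si + w_ss + w_sr)
        if u < w_si:
            ignorant -= 1       # the ignorant hears it and starts spreading
            spreaders += 1
        elif u < w_si + w_ss:
            spreaders -= 2      # I + I -> 2R
            stiflers += 2
        else:
            spreaders -= 1      # I + R -> 2R
            stiflers += 1
    return ignorant / n

never_heard = daley_kendall(5000, seed=1)
```

In the deterministic limit of this model the rumour stops with roughly a fifth of the population never having heard it, in contrast to an ordinary SIR epidemic with a high reproductive rate, which leaves almost no one untouched.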

in_NB
epidemiology_of_representations
epidemic_models
have_read
stochastic_processes

A comparative analysis of approaches to network-dismantling | Scientific Reports

6 weeks ago by cshalizi

"Estimating, understanding, and improving the robustness of networks has many application areas such as bioinformatics, transportation, or computational linguistics. Accordingly, with the rise of network science for modeling complex systems, many methods for robustness estimation and network dismantling have been developed and applied to real-world problems. The state-of-the-art in this field is quite fuzzy, as results are published in various domain-specific venues and using different datasets. In this study, we report, to the best of our knowledge, on the analysis of the largest benchmark regarding network dismantling. We reimplemented and compared 13 competitors on 12 types of random networks, including ER, BA, and WS, with different network generation parameters. We find that network metrics, proposed more than 20 years ago, are often non-dominating competitors, while many recently proposed techniques perform well only on specific network types. Besides the solution quality, we also investigate the execution time. Moreover, we analyze the similarity of competitors, as induced by their node rankings. We compare and validate our results on real-world networks. Our study is aimed to be a reference for selecting a network dismantling method for a given network, considering accuracy requirements and run time constraints."

in_NB
networks
have_read
re:do-institutions-evolve

Phys. Rev. E 65, 056109 (2002) - Attack vulnerability of complex networks

6 weeks ago by cshalizi

"We study the response of complex networks subject to attacks on vertices and edges. Several existing complex network models as well as real-world networks of scientific collaborations and Internet traffic are numerically investigated, and the network performance is quantitatively measured by the average inverse geodesic length and the size of the largest connected subgraph. For each case of attacks on vertices and edges, four different attacking strategies are used: removals by the descending order of the degree and the betweenness centrality, calculated for either the initial network or the current network during the removal procedure. It is found that the removals by the recalculated degrees and betweenness centralities are often more harmful than the attack strategies based on the initial network, suggesting that the network structure changes as important vertices or edges are removed. Furthermore, the correlation between the betweenness centrality and the degree in complex networks is studied."

--- And by "have read", I mean "have memories of this paper that are themselves old enough to vote..."

in_NB
have_read
epidemics_on_networks
re:do-institutions-evolve

Phys. Rev. E 84, 061911 (2011) - Suppressing epidemics with a limited amount of immunization units

6 weeks ago by cshalizi

"The way diseases spread through schools, epidemics through countries, and viruses through the internet is crucial in determining their risk. Although each of these threats has its own characteristics, its underlying network determines the spreading. To restrain the spreading, a widely used approach is the fragmentation of these networks through immunization, so that epidemics cannot spread. Here we develop an immunization approach based on optimizing the susceptible size, which outperforms the best known strategy based on immunizing the highest-betweenness links or nodes. We find that the network's vulnerability can be significantly reduced, demonstrating this on three different real networks: the global flight network, a school friendship network, and the internet. In all cases, we find that not only is the average infection probability significantly suppressed, but also for the most relevant case of a small and limited number of immunization units the infection probability can be reduced by up to 55%."

--- The improvements look small (but non-spurious) for the real-world networks; they get the biggest improvement for Erdos-Renyi. This suggests to me that high betweenness centrality is pretty good after all...

in_NB
have_read
epidemics_on_networks
re:do-institutions-evolve

People Are Less Gullible Than You Think – Reason.com

7 weeks ago by cshalizi

"We aren't gullible: By default we veer on the side of being resistant to new ideas. In the absence of the right cues, we reject messages that don't fit with our preconceived views or pre-existing plans. To persuade us otherwise takes long-established, carefully maintained trust, clearly demonstrated expertise, and sound arguments. Science, the media, and other institutions that spread accurate but often counterintuitive messages face an uphill battle, as they must transmit these messages and keep them credible along great chains of trust and argumentation. Quasi-miraculously, these chains connect us to the latest scientific discoveries and to events on the other side of the planet. We can only hope for new means of extending and strengthening these ever-fragile links."

have_read
mercier.hugo
cognition
collective_cognition
persuasion
via:?

After Carbon Democracy | Dissent Magazine

7 weeks ago by cshalizi

"Capitalism is at the heart of the climate challenge."

No, no, no.

(1) Look at the environmental record of the USSR, or of pre-Deng China. Soviet Earth would be facing ~ as big a climate crisis as Neoliberal Earth (only with Comrade Mann in the role of Sakharov at best).

(2) Maintaining our _current_ sized economies _with our current technologies_ would get us cooked, so it's not _economic growth_ that's the problem.

Purdy knows better.

climate_change
environmentalism
progressive_forces
have_read
honestly_disappointed

Hollywood’s Next Great Studio Head Will Be a Computer

9 weeks ago by cshalizi

Evidence that data-mining social media is actually better at prediction than 1930s-vintage audience research is conspicuously absent from this.

Also, it misses the equilibrium point: suppose data-analytics firm X can improve predictions about how popular a film will be, and this would be worth $Y to a studio. A risk-neutral studio will pay up to $Y-\epsilon for this information, and be no better off. (And, of course, predictions are _also_ an experience good.)

movies
marketing
data_mining
have_read
shot_after_a_fair_trial

The Internet of Beefs

10 weeks ago by cshalizi

Exaggerated and one-sided, but with some elements of truth.

networked_life
social_media
cultural_criticism
have_read

The Secretive Company That Might End Privacy as We Know It - The New York Times

10 weeks ago by cshalizi

The only thing which is beyond my undergrad class this semester is computing the feature vectors. (And honestly I wonder how good that is.)

--- Because it's 2020: _Of course_ it's backed by a Giuliani crony who markets it to right-wing police officials. _Of course_ Thiel is involved. _Of course_ the founder appears dumbfounded when pressed on how it might be misused. _Of course_ there's no independent verification or even a notion of false positives.

privacy
information_retrieval
image_processing
pattern_recognition
have_read
to_teach:data-mining

Counterexamples to "The Blessings of Multiple Causes" by Wang and Blei

10 weeks ago by cshalizi

"This brief note is meant to complement our previous comment on "The Blessings of Multiple Causes" by Wang and Blei (2019). We provide a more succinct and transparent explanation of the fact that the deconfounder does not control for multi-cause confounding. The argument given in Wang and Blei (2019) makes two mistakes: (1) attempting to infer independence conditional on one variable from independence conditional on a different, unrelated variable, and (2) attempting to infer joint independence from pairwise independence. We give two simple counterexamples to the deconfounder claim."

--- Sadly, I find this convincing. But the method often works --- so why?

to:NB
have_read
causal_inference
probability
ogburn.elizabeth

[1912.02729] Rademacher complexity and spin glasses: A link between the replica and statistical theories of learning

11 weeks ago by cshalizi

"Statistical learning theory provides bounds of the generalization gap, using in particular the Vapnik-Chervonenkis dimension and the Rademacher complexity. An alternative approach, mainly studied in the statistical physics literature, is the study of generalization in simple synthetic-data models. Here we discuss the connections between these approaches and focus on the link between the Rademacher complexity in statistical learning and the theories of generalization for typical-case synthetic models from statistical physics, involving quantities known as Gardner capacity and ground state energy. We show that in these models the Rademacher complexity is closely related to the ground state energy computed by replica theories. Using this connection, one may reinterpret many results of the literature as rigorous Rademacher bounds in a variety of models in the high-dimensional statistics limit. Somewhat surprisingly, we also show that statistical learning theory provides predictions for the behavior of the ground-state energies in some full replica symmetry breaking models."

to:NB
learning_theory
statistics
have_read
krzakala.florent
Krzakala
zdeborova.lenka

PsyArXiv Preprints | The Generalizability Crisis

december 2019 by cshalizi

"Most theories and hypotheses in psychology are verbal in nature, yet their evaluation overwhelmingly relies on inferential statistical procedures. The validity of the move from qualitative to quantitative analysis depends on the verbal and statistical expressions of a hypothesis being closely aligned—that is, that the two must refer to roughly the same set of hypothetical observations. Here I argue that most inferential statistical tests in psychology fail to meet this basic condition. I demonstrate how foundational assumptions of the "random effects" model used pervasively in psychology impose far stronger constraints on the generalizability of results than most researchers appreciate. Ignoring these constraints dramatically inflates false positive rates and routinely leads researchers to draw sweeping verbal generalizations that lack any meaningful connection to the statistical quantities they are putatively based on. I argue that the routine failure to consider the generalizability of one's conclusions from a statistical perspective lies at the root of many of psychology's ongoing problems (e.g., the replication crisis), and conclude with a discussion of several potential avenues for improvement."

to:NB
yarkoni.tal
measurement
social_science_methodology
social_measurement
psychometrics
psychology
scientific_method
have_read
loud_and_prolonged_applause

To Chain The Beast

december 2019 by cshalizi

"There is now a booming cottage industry of work on “virtual social warfare”, “hostile social manipulation”, and similar new terms for phenomena studied under previous names in the 2000s, 1990s, and 1980s as waves of informatization created new social realities. And in turn the 2000s, 1990s, and 1980s terminology recapitulated important features of Cold War denial and deception and active measures, and so forth. One might humorously conjecture that running on a hamster wheel of terminology is in and of itself a successful information attack on our collective information processing and decision-making systems. But that is for another conversation. Under whatever name, how shall we analyze it and deal with it? A big problem is accounting for what seem like competing interpretations of the same underlying events. Are they doing it for the lulz? Is it a Russian plot? And how do we know if we’re just attributing too much signal to what could just be enormous amounts of noise?"

networked_life
deceiving_us_has_become_an_industrial_process
have_read

Planning Without Prices (G. M. Heal, 1969)

december 2019 by cshalizi

Yet Another Lange-ian Central Planning Board:

The CPB sets a utility function in terms of levels of final goods. It also allocates raw materials and intermediate goods. Every firm must report to the CPB the marginal productivity of every resource for making every good; the CPB re-allocates goods towards firms with above-average productivity --- basically gradient ascent. (There is a slight complication here to avoid negative allocations.) This converges to a stationary point of the utility function. The claimed innovations over Lange are (a) no prices, just quantities (except that the CPB needs to use partial derivatives of the utility function that act just like prices for its internal work), (b) could handle non-convexity [sort of --- it'll converge to local maxima very happily], (c) along the path to the stationary point, we always stay inside the feasible set, and (d) the utility function is increasing along the path. The author sets the most store by (c) and (d), and so I'd characterize it as kin to an interior-point method, though without (say) a constraint-enforcing barrier penalty. The informational advantage over Kantorovich-style central planning is that the CPB doesn't have to know all the production functions, it just (!) needs to know every firm's marginal productivity for each possible input, which the firm will report honestly because reasons. (The computational and political difficulties of deciding on an economy-wide utility function are as usual unaddressed.)

--- N.B., the last tag (and my emphasis on what's _not_ here) is because someone pointed me at this (and an earlier paper by Malinvaud, cited by Heal) as disposing of everything I wrote about the difficulties of central planning.

have_read
economics
optimization
distributed_systems
re:in_soviet_union_optimization_problem_solves_you
shot_after_a_fair_trial
in_NB

december 2019 by cshalizi

Online and Social Media Data As an Imperfect Continuous Panel Survey

november 2019 by cshalizi

"There is a large body of research on utilizing online activity as a survey of political opinion to predict real world election outcomes. There is considerably less work, however, on using this data to understand topic-specific interest and opinion amongst the general population and specific demographic subgroups, as currently measured by relatively expensive surveys. Here we investigate this possibility by studying a full census of all Twitter activity during the 2012 election cycle along with the comprehensive search history of a large panel of Internet users during the same period, highlighting the challenges in interpreting online and social media activity as the results of a survey. As noted in existing work, the online population is a non-representative sample of the offline world (e.g., the U.S. voting population). We extend this work to show how demographic skew and user participation is non-stationary and difficult to predict over time. In addition, the nature of user contributions varies substantially around important events. Furthermore, we note subtle problems in mapping what people are sharing or consuming online to specific sentiment or opinion measures around a particular topic. We provide a framework, built around considering this data as an imperfect continuous panel survey, for addressing these issues so that meaningful insight about public interest and opinion can be reliably extracted from online and social media data."

to:NB
have_read
social_measurement
social_science_methodology
re:social_networks_as_sensor_networks
social_media
networked_life
hofman.jake
to_teach:data-mining
november 2019 by cshalizi

The relationship between external variables and common factors | SpringerLink

november 2019 by cshalizi

"A theorem is presented which gives the range of possible correlations between a common factor and an external variable (i.e., a variable not included in the test battery factor analyzed). Analogous expressions for component (and regression component) theory are also derived. Some situations involving external correlations are then discussed which dramatize the theoretical differences between components and common factors."

in_NB
have_read
factor_analysis
inference_to_latent_objects
psychometrics
statistics
re:g_paper
november 2019 by cshalizi

Factor indeterminacy in the 1930's and the 1970's some interesting parallels | SpringerLink

november 2019 by cshalizi

"The issue of factor indeterminacy, and its meaning and significance for factor analysis, has been the subject of considerable debate in recent years. Interestingly, the identical issue was discussed widely in the literature of the late 1920's and early 1930's, but this early discussion was somehow lost or forgotten during the development and popularization of multiple factor analysis. There are strong parallels between the arguments in the early literature, and those which have appeared in recent papers. Here I review the history of this early literature, briefly survey the more recent work, and discuss these parallels where they are especially illuminating."

in_NB
psychometrics
factor_analysis
inference_to_latent_objects
have_read
a_long_time_ago
re:g_paper
november 2019 by cshalizi

Some new results on factor indeterminacy | SpringerLink

november 2019 by cshalizi

"Some relations between maximum likelihood factor analysis and factor indeterminacy are discussed. Bounds are derived for the minimum average correlation between equivalent sets of correlated factors which depend on the latent roots of the factor intercorrelation matrix ψ. Empirical examples are presented to illustrate some of the theory and indicate the extent to which it can be expected to be relevant in practice."

in_NB
have_read
a_long_time_ago
factor_analysis
low-rank_approximation
statistics
re:g_paper
november 2019 by cshalizi

Seeing Like a Finite State Machine — Crooked Timber

november 2019 by cshalizi

(The title makes me wonder what "seeing like a push-down stack machine" would entail, but well said...)

machine_learning
authoritarianism
farrell.henry
kith_and_kin
have_read
re:democratic_cognition
november 2019 by cshalizi

Random Forests | SpringerLink

november 2019 by cshalizi

"Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression."

to:NB
have_read
breiman.leo
ensemble_methods
decision_trees
random_forests
to_teach:data-mining
machine_learning
statistics
prediction
november 2019 by cshalizi

[1911.00535] Think-aloud interviews: A tool for exploring student statistical reasoning

november 2019 by cshalizi

"As statistics educators revise introductory courses to cover new topics and reach students from more diverse academic backgrounds, they need assessments to test if new teaching strategies and new curricula are meeting their goals. But assessing student understanding of statistics concepts can be difficult: conceptual questions are difficult to write clearly, and students often interpret questions in unexpected ways and give answers for unexpected reasons. Assessment results alone also do not clearly indicate the reasons students pick specific answers.

"We describe think-aloud interviews with students as a powerful tool to ensure that draft questions fulfill their intended purpose, uncover unexpected misconceptions or surprising readings of questions, and suggest new questions or further pedagogical research. We have conducted more than 40 hour-long think-aloud interviews to develop over 50 assessment questions, and have collected pre- and post-test assessment data from hundreds of introductory statistics students at two institutions.

"Think-alouds and assessment data have helped us refine draft questions and explore student misunderstandings. Our findings include previously under-reported statistical misconceptions about sampling distributions and causation. These results suggest directions for future statistics education research and show how think-aloud interviews can be effectively used to develop assessments and improve our understanding of student learning."

to:NB
have_read
heard_the_talk
kith_and_kin
statistics
cognitive_science
education
protocol_analysis
expertise

november 2019 by cshalizi

[1911.02656] Invariance and identifiability issues for word embeddings

november 2019 by cshalizi

"Word embeddings are commonly obtained as optimizers of a criterion function f of a text corpus, but assessed on word-task performance using a different evaluation function g of the test data. We contend that a possible source of disparity in performance on tasks is the incompatibility between classes of transformations that leave f and g invariant. In particular, word embeddings defined by f are not unique; they are defined only up to a class of transformations to which f is invariant, and this class is larger than the class to which g is invariant. One implication of this is that the apparent superiority of one word embedding over another, as measured by word task performance, may largely be a consequence of the arbitrary elements selected from the respective solution sets. We provide a formal treatment of the above identifiability issue, present some numerical examples, and discuss possible resolutions."
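
(A small numerical illustration of the mismatch the abstract describes. The specific choices are my assumptions, not necessarily the paper's: the training criterion $f$ is taken to depend on the embeddings only through word-context inner products, and the evaluation $g$ is taken to use cosine similarity between word vectors. Under those assumptions any invertible linear map leaves $f$ alone, while only a smaller class, such as rotations, leaves $g$ alone.)

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))        # hypothetical word vectors
C = rng.normal(size=(5, 3))        # hypothetical context vectors

A = np.diag([1.0, 5.0, 0.2])       # invertible but not orthogonal
W2 = W @ A
C2 = C @ np.linalg.inv(A).T        # compensating transform on the contexts

def cosine_matrix(E):
    # Pairwise cosine similarities between the rows of E.
    unit = E / np.linalg.norm(E, axis=1, keepdims=True)
    return unit @ unit.T

# Inner products, and so anything that depends only on them, are unchanged.
same_inner = np.allclose(W @ C.T, W2 @ C2.T)
# Cosine similarities between words generally are not.
same_cosine = np.allclose(cosine_matrix(W), cosine_matrix(W2))

# An orthogonal transform, by contrast, preserves both.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
same_cosine_rotated = np.allclose(cosine_matrix(W), cosine_matrix(W @ Q))

print(same_inner, same_cosine, same_cosine_rotated)
```

So two embeddings that are equally good by the training criterion can score differently on a similarity task, purely because of which member of the solution set the optimizer happened to return.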

to:NB
word_embeddings
text_mining
natural_language_processing
model_selection
to_teach:data-mining
have_read
linear_algebra
oopsies
november 2019 by cshalizi

[1911.02639] Word Embedding Algorithms as Generalized Low Rank Models and their Canonical Form

november 2019 by cshalizi

"Word embedding algorithms produce very reliable feature representations of words that are used by neural network models across a constantly growing multitude of NLP tasks. As such, it is imperative for NLP practitioners to understand how their word representations are produced, and why they are so impactful.

"The present work presents the Simple Embedder framework, generalizing the state-of-the-art existing word embedding algorithms (including Word2vec (SGNS) and GloVe) under the umbrella of generalized low rank models. We derive that both of these algorithms attempt to produce embedding inner products that approximate pointwise mutual information (PMI) statistics in the corpus. Once cast as Simple Embedders, comparison of these models reveals that these successful embedders all resemble a straightforward maximum likelihood estimate (MLE) of the PMI parametrized by the inner product (between embeddings). This MLE induces our proposed novel word embedding model, Hilbert-MLE, as the canonical representative of the Simple Embedder framework.

"We empirically compare these algorithms with evaluations on 17 different datasets. Hilbert-MLE consistently observes second-best performance on every extrinsic evaluation (news classification, sentiment analysis, POS-tagging, and supersense tagging), while the first-best model depends varying on the task. Moreover, Hilbert-MLE consistently observes the least variance in results with respect to the random initialization of the weights in bidirectional LSTMs. Our empirical results demonstrate that Hilbert-MLE is a very consistent word embedding algorithm that can be reliably integrated into existing NLP systems to obtain high-quality results."
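
(To make "inner products that approximate PMI" concrete, here is a minimal sketch. The co-occurrence counts are made up, and the truncated SVD is a stand-in low-rank fit of my choosing; it is not Hilbert-MLE, SGNS, or GloVe.)

```python
import numpy as np

def pmi_matrix(counts):
    # Pointwise mutual information log p(i,j) / (p(i) p(j)) from co-occurrence counts.
    counts = np.asarray(counts, dtype=float)
    p_ij = counts / counts.sum()
    p_i = p_ij.sum(axis=1, keepdims=True)
    p_j = p_ij.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):   # zero counts would give -inf
        return np.log(p_ij / (p_i * p_j))

# Hypothetical word-by-context co-occurrence counts (all positive for simplicity).
counts = np.array([[10, 2, 1, 1],
                   [2, 8, 1, 2],
                   [1, 1, 9, 3],
                   [1, 2, 3, 7]])
pmi = pmi_matrix(counts)

# Rank-k factorization: rows of W and C play the role of word and context embeddings.
k = 2
U, s, Vt = np.linalg.svd(pmi)
W = U[:, :k] * np.sqrt(s[:k])
C = Vt[:k].T * np.sqrt(s[:k])

approx_error = np.linalg.norm(pmi - W @ C.T)
print(approx_error)
```

The different embedders in the abstract would then correspond, roughly, to different loss functions for measuring how well $W C^T$ matches the PMI matrix.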

to:NB
have_read
text_mining
natural_language_processing
word_embeddings
information_theory
to_teach:data-mining
low-rank_approximation

november 2019 by cshalizi

[1412.4643] Wrong side of the tracks: Big Data and Protected Categories

november 2019 by cshalizi

"When we use machine learning for public policy, we find that many useful variables are associated with others on which it would be ethically problematic to base decisions. This problem becomes particularly acute in the Big Data era, when predictions are often made in the absence of strong theories for underlying causal mechanisms. We describe the dangers to democratic decision-making when high-performance algorithms fail to provide an explicit account of causation. We then demonstrate how information theory allows us to degrade predictions so that they decorrelate from protected variables with minimal loss of accuracy. Enforcing total decorrelation is at best a near-term solution, however. The role of causal argument in ethical debate urges the development of new, interpretable machine-learning algorithms that reference causal mechanisms."

in_NB
have_read
algorithmic_fairness
information_theory
kith_and_kin
dedeo.simon
to_teach:data-mining
re:prediction_without_racism
to_teach:statistics_of_inequality_and_discrimination
november 2019 by cshalizi

Beyond Social Contagion: Associative Diffusion and the Emergence of Cultural Variation - Amir Goldberg, Sarah K. Stein, 2018

november 2019 by cshalizi

"Network models of diffusion predominantly think about cultural variation as a product of social contagion. But culture does not spread like a virus. We propose an alternative explanation we call associative diffusion. Drawing on two insights from research in cognition—that meaning inheres in cognitive associations between concepts, and that perceived associations constrain people’s actions—we introduce a model in which, rather than beliefs or behaviors, the things being transmitted between individuals are perceptions about what beliefs or behaviors are compatible with one another. Conventional contagion models require the assumption that networks are segregated to explain cultural variation. We show, in contrast, that the endogenous emergence of cultural differentiation can be entirely attributable to social cognition and does not require a segregated network or a preexisting division into groups. Moreover, we show that prevailing assumptions about the effects of network topology do not hold when diffusion is associative."

--- Preprint version: https://web.stanford.edu/~amirgo/docs/beyond.pdf

(I'm not sure that this _is_ really an alternative explanation. Or, rather, it would be an explanation for cultural polarization within a densely-connected community, but not an explanation for associations between cultural traits and social identities. Also, I think their conclusion that small-world networks lead to less "meaningful" cultural differentiation than do scale-free networks may be an artifact of the way they're using mutual information. If there was one community and everyone in it enacted the same practices, they'd get an MI of 0, but that wouldn't make them meaningless....)
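
(The point about an MI of 0 is a generic property of mutual information, and can be checked with a plug-in estimate on made-up data. This does not reproduce the paper's own measure, which I am not reconstructing here; the two toy populations are hypothetical.)

```python
import numpy as np

def mutual_information(x, y):
    # Plug-in mutual information, in bits, between two discrete sequences.
    x = np.asarray(x)
    y = np.asarray(y)
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:
                p_a = np.mean(x == a)
                p_b = np.mean(y == b)
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return float(mi)

# One community in which everyone enacts the same two practices.
uniform_a = [1] * 100
uniform_b = [1] * 100

# Two equal-sized camps whose practices are perfectly aligned with camp membership.
split_a = [0] * 50 + [1] * 50
split_b = [0] * 50 + [1] * 50

mi_uniform = mutual_information(uniform_a, uniform_b)
mi_split = mutual_information(split_a, split_b)
print(mi_uniform, mi_split)
```

With no variation there is nothing for mutual information to register, however rich the shared practices are, so a low value cannot by itself distinguish "meaningless" from "unanimous".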

to:NB
social_influence
contagion
homophily
cultural_transmission
cultural_differences
sociology
re:do-institutions-evolve
have_read

november 2019 by cshalizi

Reducing Coastal Risk on the East and Gulf Coasts | The National Academies Press

october 2019 by cshalizi

"Hurricane- and coastal-storm-related losses have increased substantially during the past century, largely due to increases in population and development in the most susceptible coastal areas. Climate change poses additional threats to coastal communities from sea level rise and possible increases in strength of the largest hurricanes. Several large cities in the United States have extensive assets at risk to coastal storms, along with countless smaller cities and developed areas. The devastation from Superstorm Sandy has heightened the nation's awareness of these vulnerabilities. What can we do to better prepare for and respond to the increasing risks of loss?

"Reducing Coastal Risk on the East and Gulf Coasts reviews the coastal risk-reduction strategies and levels of protection that have been used along the United States East and Gulf Coasts to reduce the impacts of coastal flooding associated with storm surges. This report evaluates their effectiveness in terms of economic return, protection of life safety, and minimization of environmental effects. According to this report, the vast majority of the funding for coastal risk-related issues is provided only after a disaster occurs. This report calls for the development of a national vision for coastal risk management that includes a long-term view, regional solutions, and recognition of the full array of economic, social, environmental, and life-safety benefits that come from risk reduction efforts. To support this vision, Reducing Coastal Risk states that a national coastal risk assessment is needed to identify those areas with the greatest risks that are high priorities for risk reduction efforts. The report discusses the implications of expanding the extent and levels of coastal storm surge protection in terms of operation and maintenance costs and the availability of resources.

"Reducing Coastal Risk recommends that benefit-cost analysis, constrained by acceptable risk criteria and other important environmental and social factors, be used as a framework for evaluating national investments in coastal risk reduction. The recommendations of this report will assist engineers, planners and policy makers at national, regional, state, and local levels to move from a nation that is primarily reactive to coastal disasters to one that invests wisely in coastal risk reduction and builds resilience among coastal communities."

to:NB
books:noted
downloaded
climate_change
disasters
re:coastal_risks
have_read

october 2019 by cshalizi

The Varieties Of The Technological Control Problem

october 2019 by cshalizi

My own take appears here (by link) towards the end, but it's nonetheless very good.

(My take is of course very much indebted to Wiener; note in particular the last word of his title, _God and Golem, Inc._, and his interest in _social_ systems as cybernetic systems.)

cybernetics
artificial_intelligence
autonomous_technics
wiener.norbert
the_nightmare_from_which_we_are_trying_to_awake
have_read
to:blog

october 2019 by cshalizi

[1910.08350] A Mutual Information Maximization Perspective of Language Representation Learning

october 2019 by cshalizi

"We show state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a sentence). Our formulation provides an alternative perspective that unifies classical word embedding models (e.g., Skip-gram) and modern contextual embeddings (e.g., BERT, XLNet). In addition to enhancing our theoretical understanding of these methods, our derivation leads to a principled framework that can be used to construct new self-supervised tasks. We provide an example by drawing inspirations from related methods based on mutual information maximization that have been successful in computer vision, and introduce a simple self-supervised objective that maximizes the mutual information between a global sentence representation and n-grams in the sentence. Our analysis offers a holistic view of representation learning methods to transfer knowledge and translate progress across multiple domains (e.g., natural language processing, computer vision, audio processing)."

--- This would have been very useful to read _before_ explaining word2vec et al. to The Kids yesterday.

to:NB
have_read
information_theory
natural_language_processing
text_mining
to_teach:data-mining

october 2019 by cshalizi

[1910.05438] Comment on "Blessings of Multiple Causes"

october 2019 by cshalizi

"The premise of the deconfounder method proposed in "Blessings of Multiple Causes" by Wang and Blei, namely that a variable that renders multiple causes conditionally independent also controls for unmeasured multi-cause confounding, is incorrect. This can be seen by noting that no fact about the observed data alone can be informative about ignorability, since ignorability is compatible with any observed data distribution. Methods to control for unmeasured confounding may be valid with additional assumptions in specific settings, but they cannot, in general, provide a checkable approach to causal inference, and they do not, in general, require weaker assumptions than the assumptions that are commonly used for causal inference. While this is outside the scope of this comment, we note that much recent work on applying ideas from latent variable modeling to causal inference problems suffers from similar issues."

--- I need to sort out which side I agree with here...

to:NB
have_read
causal_inference
factor_analysis
statistics
kith_and_kin
shpitser.ilya
ogburn.elizabeth

october 2019 by cshalizi

[1910.06386] All of Linear Regression

october 2019 by cshalizi

"Least squares linear regression is one of the oldest and widely used data analysis tools. Although the theoretical analysis of the ordinary least squares (OLS) estimator is as old, several fundamental questions are yet to be answered. Suppose regression observations $(X_1,Y_1),\ldots,(X_n,Y_n) \in \mathbb{R}^d \times \mathbb{R}$ (not necessarily independent) are available. Some of the questions we deal with are as follows: under what conditions, does the OLS estimator converge and what is the limit? What happens if the dimension is allowed to grow with n? What happens if the observations are dependent with dependence possibly strengthening with n? How to do statistical inference under these kinds of misspecification? What happens to the OLS estimator under variable selection? How to do inference under misspecification and variable selection?

"We answer all the questions raised above with one simple deterministic inequality which holds for any set of observations and any sample size. This implies that all our results are a finite sample (non-asymptotic) in nature. In the end, one only needs to bound certain random quantities under specific settings of interest to get concrete rates and we derive these bounds for the case of independent observations. In particular, the problem of inference after variable selection is studied, for the first time, when d, the number of covariates increases (almost exponentially) with sample size n. We provide comments on the ``right'' statistic to consider for inference under variable selection and efficient computation of quantiles."

to:NB
regression
statistics
have_read
re:TALR
to_teach:linear_models

october 2019 by cshalizi

Personality and fatal diseases: Revisiting a scientific scandal - Anthony J Pelosi, 2019

october 2019 by cshalizi

"During the 1980s and 1990s, Hans J Eysenck conducted a programme of research into the causes, prevention and treatment of fatal diseases in collaboration with one of his protégés, Ronald Grossarth-Maticek. This led to what must be the most astonishing series of findings ever published in the peer-reviewed scientific literature with effect sizes that have never otherwise been encountered in biomedical research. This article outlines just some of these reported findings and signposts readers to extremely serious scientific and ethical criticisms that were published almost three decades ago. Confidential internal documents that have become available as a result of litigation against tobacco companies provide additional insights into this work. It is suggested that this research programme has led to one of the worst scientific scandals of all time. A call is made for a long overdue formal inquiry."

--- But everything he did on IQ is scientifically unimpeachable, I'm sure.

to:NB
have_read
eysneck.hans_j.
utter_stupidity
bad_science
epidemiology
psychology

october 2019 by cshalizi

The Style Maven Astrophysicists of Silicon Valley | WIRED

october 2019 by cshalizi

"Understanding latent style involves other physics principles too. Moody’s team uses something called eigenvector decomposition, a concept from quantum mechanics, to tease apart the overlapping “notes” in an individual’s style, sort of like “plucking a guitar string and listening for the multiple notes overlayed.” "

--- Oh for crying out loud. I like to think that this is the journalist's cluelessness, rather than the ex-physicist's.

have_read
data_mining
physics
principal_components
utter_stupidity
singular_value_decomposition_rules_everything_around_me
to_teach:data-mining
fashion

october 2019 by cshalizi

[1901.00403] Can You Trust This Prediction? Auditing Pointwise Reliability After Learning

september 2019 by cshalizi

"To use machine learning in high stakes applications (e.g. medicine), we need tools for building confidence in the system and evaluating whether it is reliable. Methods to improve model reliability often require new learning algorithms (e.g. using Bayesian inference to obtain uncertainty estimates). An alternative is to audit a model after it is trained. In this paper, we describe resampling uncertainty estimation (RUE), an algorithm to audit the pointwise reliability of predictions. Intuitively, RUE estimates the amount that a prediction would change if the model had been fit on different training data. The algorithm uses the gradient and Hessian of the model's loss function to create an ensemble of predictions. Experimentally, we show that RUE more effectively detects inaccurate predictions than existing tools for auditing reliability subsequent to training. We also show that RUE can create predictive distributions that are competitive with state-of-the-art methods like Monte Carlo dropout, probabilistic backpropagation, and deep ensembles, but does not depend on specific algorithms at train-time like these methods do."

--- I haven't read the paper, but I am going to now use this box to sketch how an idiot would tackle this problem. (I do not mean that the authors are idiots.) Since we're fitting our abyssal learning system by optimizing some loss function, the usual asymptotics for minimization apply (http://bactra.org/weblog/1017.html), and the variance matrix of the parameters $\theta$ is ($n^{-1}$ times) the sandwich covariance matrix $h^{-1} j h^{-1}$, where $h$ is the Hessian of the loss function and $j$ is the covariance matrix of the gradient. Now the prediction we make at point $x$ is $f(x;\theta)$. This has some gradient w.r.t. the parameters at the point estimate, say $g(x)$. Taylor-expand the prediction around the point estimate, stopping at first order. Applying the usual algebra for variances tells us the variance of the prediction will be $g(x) \cdot n^{-1} h^{-1} j h^{-1} g(x)$. This --- linearization plus variance algebra --- is "propagation of error" or "the delta method".
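
A minimal sketch of the propagation-of-error recipe just described, in Python. It assumes a squared-error loss and a linear predictor $f(x;\theta) = x \cdot \theta$, so that $g(x) = x$; the function name and the synthetic data are illustrative assumptions, not anything taken from the paper.

```python
import numpy as np

def sandwich_prediction_variance(X, y, x_new):
    """Delta-method variance of a least-squares prediction at x_new.

    Assumes per-point loss 0.5 * (y_i - x_i . theta)^2 (an illustrative choice).
    """
    n = X.shape[0]
    theta = np.linalg.lstsq(X, y, rcond=None)[0]   # point estimate
    resid = y - X @ theta
    scores = -resid[:, None] * X                   # per-point loss gradients
    h = X.T @ X / n                                # Hessian of the mean loss
    j = scores.T @ scores / n                      # gradient covariance (mean is 0 at the optimum)
    h_inv = np.linalg.inv(h)
    param_cov = h_inv @ j @ h_inv / n              # n^{-1} h^{-1} j h^{-1}
    g = x_new                                      # gradient of x . theta w.r.t. theta
    return theta, float(g @ param_cov @ g)

# Usage on made-up data: intercept plus one covariate.
rng = np.random.default_rng(0)
X_demo = np.column_stack([np.ones(500), rng.normal(size=500)])
y_demo = X_demo @ np.array([1.0, 2.0]) + rng.normal(size=500)
theta_hat, pred_var = sandwich_prediction_variance(X_demo, y_demo, np.array([1.0, 0.5]))
```

For a nonlinear model, $g(x)$ and $h$ would have to come from automatic differentiation or finite differences rather than from these closed forms.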

I am now going to make two predictions about the paper, which I have not read:

(1) The bit about "gradient and Hessian" in the abstract is a sign that they're talking about the sandwich covariance matrix.

(2) Their uncertainties-in-predictions are either propagation-of-error variances, _or_ they do not compare to them.

If, on reading, I am wrong about either prediction, I will eat my crow here.

--- ETA after reading: OK, I need to eat a _little_ crow. They assume the loss is a sum of IID point-by-point terms, meaning the gradient is too, and so the over-all loss gradient can be written as a sum of point-wise gradients, say $l_1, \ldots, l_n$. They then sample points with replacement (as in the bootstrap), and perturb the parameter estimate by a first-order Taylor series using the appropriate $l_i$'s. (I'm not 100% sold on this step --- given that the influence of any one data point on the parameter estimate is small, still, replacing 1/3 of them isn't necessarily a local perturbation.) Then they repredict with the new parameters, and take the variances of the repredictions over many resamplings. (I don't see why --- they could just get a confidence interval for each prediction.)
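
A rough Python sketch of the resample-and-perturb step as summarized in the note above. This is one reading of that summary, not the paper's own code. It assumes a squared-error loss with a linear predictor, and it takes the "first-order Taylor series" to mean a single Newton step, $\theta_b = \hat{\theta} - h^{-1} \bar{l}_b$, where $\bar{l}_b$ is the mean of the resampled point-wise gradients. The function name and defaults are hypothetical.

```python
import numpy as np

def one_step_resampled_predictions(X, y, x_new, n_resamples=500, seed=0):
    """Repredict at x_new under bootstrap-style one-step parameter perturbations.

    Assumes per-point loss 0.5 * (y_i - x_i . theta)^2 (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    theta = np.linalg.lstsq(X, y, rcond=None)[0]   # point estimate
    scores = -(y - X @ theta)[:, None] * X         # point-wise gradients l_1, ..., l_n
    h_inv = np.linalg.inv(X.T @ X / n)             # inverse Hessian of the mean loss
    preds = np.empty(n_resamples)
    for b in range(n_resamples):
        idx = rng.integers(0, n, size=n)           # sample points with replacement
        theta_b = theta - h_inv @ scores[idx].mean(axis=0)  # one Newton step, no refit
        preds[b] = x_new @ theta_b                 # repredict with perturbed parameters
    return preds
```

The variance of the returned array is the kind of quantity the note describes; `np.percentile(preds, [2.5, 97.5])` would give the percentile interval the note suggests instead. In this linear setting the resampling variance targets the same quantity as the delta-method formula, up to Monte Carlo error.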

to:NB
prediction
statistics
halbert_white_died_for_your_sins
via:arsyed
have_read
uncertainty_for_neural_networks

september 2019 by cshalizi

What College Admissions Offices Really Want - The New York Times

september 2019 by cshalizi

I would be very interested to know how CMU's admissions office navigates this. (Also: how good are those models?)

education
academia
class_struggles_in_america
have_read
september 2019 by cshalizi

[1908.04358] Graph hierarchy and spread of infections

september 2019 by cshalizi

"Trophic levels and hence trophic coherence can be defined only on networks with well defined sources, trophic analysis of networks had been restricted to the ecological domain until now. Trophic coherence, a measure of a network's hierarchical organisation, has been shown to be linked to a network's structural and dynamical aspects. In this paper we introduce hierarchical levels, which is a generalisation of trophic levels, that can be defined on any simple graph and we interpret it as a network influence metric. We discuss how our generalisation relates to the previous definition and what new insights our generalisation shines on the topological and dynamical aspects of networks. We also show that the mean of hierarchical differences correlates strongly with the topology of the graph. Finally, we model an epidemiological dynamics and show how the statistical properties of hierarchical differences relate to the incidence rate and how it affects the spreading process in a SIS model."

in_NB
epidemics_on_networks
re:do-institutions-evolve
have_read
shot_after_a_fair_trial
september 2019 by cshalizi

[1906.00232] Kernel Instrumental Variable Regression

september 2019 by cshalizi

"Instrumental variable regression is a strategy for learning causal relationships in observational data. If measurements of input X and output Y are confounded, the causal relationship can nonetheless be identified if an instrumental variable Z is available that influences X directly, but is conditionally independent of Y given X and the unmeasured confounder. The classic two-stage least squares algorithm (2SLS) simplifies the estimation problem by modeling all relationships as linear functions. We propose kernel instrumental variable regression (KIV), a nonparametric generalization of 2SLS, modeling relations among X, Y, and Z as nonlinear functions in reproducing kernel Hilbert spaces (RKHSs). We prove the consistency of KIV under mild assumptions, and derive conditions under which the convergence rate achieves the minimax optimal rate for unconfounded, one-stage RKHS regression. In doing so, we obtain an efficient ratio between training sample sizes used in the algorithm's first and second stages. In experiments, KIV outperforms state of the art alternatives for nonparametric instrumental variable regression. Of independent interest, we provide a more general theory of conditional mean embedding regression in which the RKHS has infinite dimension."

to:NB
instrumental_variables
kernel_estimators
regression
nonparametrics
causal_inference
statistics
re:ADAfaEPoV
have_read
september 2019 by cshalizi

On Whorfian Socioeconomics by Thomas B. Pepinsky :: SSRN

september 2019 by cshalizi

"Whorfian socioeconomics is an emerging interdisciplinary field of study that holds that linguistic structures explain differences in beliefs, values, and opinions across communities. Its core empirical strategy is to document a correlation between the presence or absence of a linguistic feature in a survey respondent’s language, and her/his responses to survey questions. This essay demonstrates — using the universe of linguistic features from the World Atlas of Language Structures and a wide array of responses from the World Values Survey — that such an approach produces highly statistically significant correlations in a majority of analyses, irrespective of the theoretical plausibility linking linguistic features to respondent beliefs. These results raise the possibility that correlations between linguistic features and survey responses are actually spurious. The essay concludes by showing how two simple and well-understood statistical fixes can more accurately reflect uncertainty in these analyses, reducing the temptation for analysts to create implausible Whorfian theories to explain spurious linguistic correlations."

in_NB
linguistics
economics
social_science_methodology
pepinsky.thomas_b.
debunking
evisceration
have_read
to_teach:linear_models
have_sent_gushing_fanmail
to:blog
to_teach:data_over_space_and_time
september 2019 by cshalizi

[1909.02330] McDiarmid-Type Inequalities for Graph-Dependent Variables and Stability Bounds

september 2019 by cshalizi

"A crucial assumption in most statistical learning theory is that samples are independently and identically distributed (i.i.d.). However, for many real applications, the i.i.d. assumption does not hold. We consider learning problems in which examples are dependent and their dependency relation is characterized by a graph. To establish algorithm-dependent generalization theory for learning with non-i.i.d. data, we first prove novel McDiarmid-type concentration inequalities for Lipschitz functions of graph-dependent random variables. We show that concentration relies on the forest complexity of the graph, which characterizes the strength of the dependency. We demonstrate that for many types of dependent data, the forest complexity is small and thus implies good concentration. Based on our new inequalities we are able to build stability bounds for learning from graph-dependent data."

to:NB
learning_theory
dependence_measures
have_read
september 2019 by cshalizi

BishopBlog: Responding to the replication crisis: reflections on Metascience2019

september 2019 by cshalizi

"Another major concern I had was the widespread reliance on proxy indicators of research quality. One talk that exemplified this was Yang Yang's presentation on machine intelligence approaches to predicting replicability of studies. He started by noting that non-replicable results get cited just as much as replicable ones: a depressing finding indeed, and one that motivated the study he reported. His talk was clever at many levels. It was ingenious to use the existing results from the Reproducibility Project as a database that could be mined to identify characteristics of results that replicated. I'm not qualified to comment on the machine learning approach, which involved using ngrams extracted from texts to predict a binary category of replicable or not. But implicit in this study was the idea that the results from this exercise could be useful in future in helping us identify, just on the basis of textual analysis, which studies were likely to be replicable.

"Now, this seems misguided on several levels. For a start, as we know from the field of medical screening, the usefulness of a screening test depends on the base rate of the condition you are screening for, the extent to which the sample you develop the test on is representative of the population, and the accuracy of prediction. I would be frankly amazed if the results of this exercise yielded a useful screener. But even if they did, then Goodhart's law would kick in: as soon as researchers became aware that there was a formula being used to predict how replicable their research was, they'd write their papers in a way that would maximise their score. One can even imagine whole new companies springing up who would take your low-scoring research paper and, for a price, revise it to get a better score. I somehow don't think this would benefit science. In defence of this approach, it was argued that it would allow us to identify characteristics of replicable work, and encourage people to emulate these. But this seems back-to-front logic. Why try to optimise an indirect, weak proxy for what makes good science (ngram characteristics of the write-up) rather than optimising, erm, good scientific practices."

text_mining
have_read
track_down_references
reproducibility
science_as_a_social_process
"Now, this seems misguided on several levels. For a start, as we know from the field of medical screening, the usefulness of a screening test depends on the base rate of the condition you are screening for, the extent to which the sample you develop the test on is representative of the population, and the accuracy of prediction. I would be frankly amazed if the results of this exercise yielded a useful screener. But even if they did, then Goodhart's law would kick in: as soon as researchers became aware that there was a formula being used to predict how replicable their research was, they'd write their papers in a way that would maximise their score. One can even imagine whole new companies springing up who would take your low-scoring research paper and, for a price, revise it to get a better score. I somehow don't think this would benefit science. In defence of this approach, it was argued that it would allow us to identify characteristics of replicable work, and encourage people to emulate these. But this seems back-to-front logic. Why try to optimise an indirect, weak proxy for what makes good science (ngram characteristics of the write-up) rather than optimising, erm, good scientific practices."

september 2019 by cshalizi

[1908.09635] A Survey on Bias and Fairness in Machine Learning

august 2019 by cshalizi

"With the widespread use of AI systems and applications in our everyday lives, it is important to take fairness issues into consideration while designing and engineering these types of systems. Such systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that the decisions do not reflect discriminatory behavior toward certain groups or populations. We have recently seen work in machine learning, natural language processing, and deep learning that addresses such challenges in different subdomains. With the commercialization of these systems, researchers are becoming aware of the biases that these applications can contain and have attempted to address them. In this survey we investigated different real-world applications that have shown biases in various ways, and we listed different sources of biases that can affect AI applications. We then created a taxonomy for fairness definitions that machine learning researchers have defined in order to avoid the existing bias in AI systems. In addition to that, we examined different domains and subdomains in AI showing what researchers have observed with regard to unfair outcomes in the state-of-the-art methods and how they have tried to address them. There are still many future directions and solutions that can be taken to mitigate the problem of bias in AI systems. We are hoping that this survey will motivate researchers to tackle these issues in the near future by observing existing work in their respective fields."

in_NB
algorithmic_fairness
prediction
machine_learning
lerman.kristina
galstyan.aram
to_teach:data-mining
have_read
to_teach:statistics_of_inequality_and_discrimination
august 2019 by cshalizi

[1908.08741] A relation between log-likelihood and cross-validation log-scores

august 2019 by cshalizi

"It is shown that the log-likelihood of a hypothesis or model given some data is equivalent to an average of all leave-one-out cross-validation log-scores that can be calculated from all subsets of the data. This relation can be generalized to any k-fold cross-validation log-scores."

--- This sounds funny, because leave-one-out is (asymptotically) equivalent to the robustified AIC (= Takeuchi information criterion).

--- ETA after reading: The algebra looks legit, but kinda pointless.

statistics
likelihood
cross-validation
have_read
shot_after_a_fair_trial
not_worth_putting_in_notebooks

august 2019 by cshalizi

[1908.06319] Locally Linear Embedding and fMRI feature selection in psychiatric classification

august 2019 by cshalizi

"Background: Functional magnetic resonance imaging (fMRI) provides non-invasive measures of neuronal activity using an endogenous Blood Oxygenation-Level Dependent (BOLD) contrast. This article introduces a nonlinear dimensionality reduction (Locally Linear Embedding) to extract informative measures of the underlying neuronal activity from BOLD time-series. The method is validated using the Leave-One-Out-Cross-Validation (LOOCV) accuracy of classifying psychiatric diagnoses using resting-state and task-related fMRI. Methods: Locally Linear Embedding of BOLD time-series (into each voxel's respective tensor) was used to optimise feature selection. This uses Gauß' Principle of Least Constraint to conserve quantities over both space and time. This conservation was assessed using LOOCV to greedily select time points in an incremental fashion on training data that was categorised in terms of psychiatric diagnoses. Findings: The embedded fMRI gave highly diagnostic performances (> 80%) on eleven publicly-available datasets containing healthy controls and patients with either Schizophrenia, Attention-Deficit Hyperactivity Disorder (ADHD), or Autism Spectrum Disorder (ASD). Furthermore, unlike the original fMRI data before or after using Principal Component Analysis (PCA) for artefact reduction, the embedded fMRI furnished significantly better than chance classification (defined as the majority class proportion) on ten of eleven datasets. Interpretation: Locally Linear Embedding appears to be a useful feature extraction procedure that retains important information about patterns of brain activity distinguishing among psychiatric cohorts."

--- Last tag is because I plan to teach LLE and this might make a good example or assignment, if I like how it was actually done.

--- ETA: It's... not horrible (though the writing is bad and far too pretentious), but not very insightful, and too complicated to make a good teaching example.

to:NB
locally_linear_embedding
classifiers
fmri
dimension_reduction
have_read
to_teach:data-mining

august 2019 by cshalizi

[1302.0890] Local Log-linear Models for Capture-Recapture

august 2019 by cshalizi

"Log-linear models are often used to estimate the size of a closed population using capture-recapture data. When capture probabilities are related to auxiliary covariates, one may select a separate model based on each of several post-strata. We extend post-stratification to its logical extreme by selecting a local log-linear model for each observed unit, while smoothing to achieve stability. Our local models serve a dual purpose: In addition to estimating the size of the population, we estimate the rate of missingness as a function of covariates. A simulation demonstrates the superiority of our method when the generating model varies over the covariate space. Data from the Breeding Bird Survey is used to illustrate the method."

--- When did the title change from "Smooth Poststratification"?

to:NB
have_read
surveys
smoothing
statistics
estimation
kurtz.zachary
kith_and_kin

august 2019 by cshalizi

[1908.06456] Harmonic Analysis of Symmetric Random Graphs

august 2019 by cshalizi

"Following Ressel (1985,2008) this note attempts to understand graph limits (Lovasz and Szegedy 2006} in terms of harmonic analysis on semigroups (Berg et al. 1984), thereby providing an alternative derivation of de Finetti's theorem for random exchangeable graphs."

--- SL has been hinting about this for years (it's the natural combination of his 70s--80s work on "extremal point" models, sufficiency, and semi-groups with his recent interest in graph limits and graphons), so I'm very excited to read this.

--- ETA after reading: It's everything one might hope; isomorphism classes of graphs show up as the natural sufficient statistics in a generalized exponential family, etc.

in_NB
have_read
graph_limits
analysis
probability
lauritzen.steffen

august 2019 by cshalizi

[1901.00555] An Introductory Guide to Fano's Inequality with Applications in Statistical Estimation

august 2019 by cshalizi

"Information theory plays an indispensable role in the development of algorithm-independent impossibility results, both for communication problems and for seemingly distinct areas such as statistics and machine learning. While numerous information-theoretic tools have been proposed for this purpose, the oldest one remains arguably the most versatile and widespread: Fano's inequality. In this chapter, we provide a survey of Fano's inequality and its variants in the context of statistical estimation, adopting a versatile framework that covers a wide range of specific problems. We present a variety of key tools and techniques used for establishing impossibility results via this approach, and provide representative examples covering group testing, graphical model selection, sparse linear regression, density estimation, and convex optimization."

in_NB
information_theory
minimax
statistics
estimation
have_read
re:HEAS
august 2019 by cshalizi

Back to the Future: Review of Bit by Bit by Matt Salganik

august 2019 by cshalizi

"When I heard a few years ago that Salganik was writing a textbook, I was surprised and a little disappointed that this would be a distraction from his cutting edge research in areas like information cascades and respondent driven sampling. I was a fool. Just as chapter 5 of the book describes how computational approaches can enable mass collaboration on research projects by spreading the work from credentialed experts to masses of people with low or unkown skill, Bit by Bit itself will do more for computational social science by spreading the heretofore tacit knowledge of the field than a top researcher could accomplish directly. I strongly recommend Bit by Bit and fully expect it will be the standard methods textbook for computational social science until advances in the field render it dated. If we are lucky, we will benefit from a new edition every five to ten years so the book can keep pace with a rapidly evolving field. However for now it is incredibly current and I highly recommend it to any social scientist who teaches, practices, or aspires to practice or even just understand computational social science."

book_reviews
have_read
social_science_methodology
sociology
rossman.gabriel
august 2019 by cshalizi

Confabulation in the humanities - Matthew Lincoln, PhD

august 2019 by cshalizi

Now, realize that this doesn't _just_ apply to interpreting quantitative analyses, but also to more traditionally-humanistic explanations...

data_analysis
humanities
everything_is_obvious_once_you_know_the_answer
to_teach
via:?
have_read
august 2019 by cshalizi

[1901.10861] A Simple Explanation for the Existence of Adversarial Examples with Small Hamming Distance

august 2019 by cshalizi

"The existence of adversarial examples in which an imperceptible change in the input can fool well trained neural networks was experimentally discovered by Szegedy et al in 2013, who called them "Intriguing properties of neural networks". Since then, this topic had become one of the hottest research areas within machine learning, but the ease with which we can switch between any two decisions in targeted attacks is still far from being understood, and in particular it is not clear which parameters determine the number of input coordinates we have to change in order to mislead the network. In this paper we develop a simple mathematical framework which enables us to think about this baffling phenomenon from a fresh perspective, turning it into a natural consequence of the geometry of ℝn with the L0 (Hamming) metric, which can be quantitatively analyzed. In particular, we explain why we should expect to find targeted adversarial examples with Hamming distance of roughly m in arbitrarily deep neural networks which are designed to distinguish between m input classes."

in_NB
adversarial_examples
have_read
august 2019 by cshalizi

Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors

august 2019 by cshalizi

"Context-predicting models (more commonly known as embeddings or neural language models) are the new kids on the distributional semantics block. Despite the buzz surrounding these models, the literature is still lacking a systematic comparison of the predictive models with classic, count-vector-based distributional semantic approaches. In this paper, we perform such an extensive evaluation, on a wide range of lexical semantics tasks and across many parameter settings. The results, to our own surprise, show that the buzz is fully justified, as the context-predicting models obtain a thorough and resounding victory against their count-based counterparts."

to:NB
have_read
natural_language_processing
text_mining
word2vec
data_mining
to_teach:data-mining
august 2019 by cshalizi

[1402.3722] word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method

august 2019 by cshalizi

"The word2vec software of Tomas Mikolov and colleagues (this https URL ) has gained a lot of traction lately, and provides state-of-the-art word embeddings. The learning models behind the software are described in two research papers. We found the description of the models in these papers to be somewhat cryptic and hard to follow. While the motivations and presentation may be obvious to the neural-networks language-modeling crowd, we had to struggle quite a bit to figure out the rationale behind the equations.

"This note is an attempt to explain equation (4) (negative sampling) in "Distributed Representations of Words and Phrases and their Compositionality" by Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado and Jeffrey Dean."

to:NB
natural_language_processing
text_mining
statistics
neural_networks
data_mining
word2vec
have_read
to_teach:data-mining
"This note is an attempt to explain equation (4) (negative sampling) in "Distributed Representations of Words and Phrases and their Compositionality" by Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado and Jeffrey Dean."

august 2019 by cshalizi
