Predictions 2019: Customer Experience Comes Under Fire
Forrester's dour 2019 #customer #experience forecast: unrealistic investments, shoddy business cases, and price wars in a race to the bottom.
customer  experience  adoption  transformation  prediction  analysis  measurement  2018  investment  results  business  case  demand  csr19  analyst  price  relationship 
4 hours ago by csrollyson
[1906.05473] Selective prediction-set models with coverage guarantees
"Though black-box predictors are state-of-the-art for many complex tasks, they often fail to properly quantify predictive uncertainty and may provide inappropriate predictions for unfamiliar data. Instead, we can learn more reliable models by letting them either output a prediction set or abstain when the uncertainty is high. We propose training these selective prediction-set models using an uncertainty-aware loss minimization framework, which unifies ideas from decision theory and robust maximum likelihood. Moreover, since black-box methods are not guaranteed to output well-calibrated prediction sets, we show how to calculate point estimates and confidence intervals for the true coverage of any selective prediction-set model, as well as a uniform mixture of K set models obtained from K-fold sample-splitting. When applied to predicting in-hospital mortality and length-of-stay for ICU patients, our model outperforms existing approaches on both in-sample and out-of-sample age groups, and our recalibration method provides accurate inference for prediction set coverage."
to:NB  prediction  statistics 
6 days ago by cshalizi
[1906.04711] ProPublica's COMPAS Data Revisited
"In this paper I re-examine the COMPAS recidivism score and criminal history data collected by ProPublica in 2016, which has fueled intense debate and research in the nascent field of `algorithmic fairness' or `fair machine learning' over the past three years. ProPublica's COMPAS data is used in an ever-increasing number of studies to test various definitions and methodologies of algorithmic fairness. This paper takes a closer look at the actual datasets put together by ProPublica. In particular, I examine the distribution of defendants across COMPAS screening dates and find that ProPublica made an important data processing mistake when it created some of the key datasets most often used by other researchers. Specifically, the datasets built to study the likelihood of recidivism within two years of the original COMPAS screening date. As I show in this paper, ProPublica made a mistake implementing the two-year sample cutoff rule for recidivists in such datasets (whereas it implemented an appropriate two-year sample cutoff rule for non-recidivists). As a result, ProPublica incorrectly kept a disproportionate share of recidivists. This data processing mistake leads to biased two-year recidivism datasets, with artificially high recidivism rates. This also affects the positive and negative predictive values. On the other hand, this data processing mistake does not impact some of the key statistical measures highlighted by ProPublica and other researchers, such as the false positive and false negative rates, nor the overall accuracy."
to:NB  data_sets  crime  prediction  to_teach:data-mining 
6 days ago by cshalizi
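The mistake described above is a sample-cutoff asymmetry. A toy sketch of the *correct* rule, with invented field names and a hypothetical data-collection end date, to make the bias mechanism concrete:

```python
from datetime import date, timedelta

TWO_YEARS = timedelta(days=2 * 365)
DATA_END = date(2016, 4, 1)  # hypothetical end of the data-collection window

def two_year_sample(defendants):
    """Apply the two-year cutoff symmetrically: keep a defendant only if
    at least two years of follow-up were observable after screening.
    Skipping this check for recidivists (the mistake described above)
    keeps a disproportionate share of them, inflating recidivism rates."""
    sample = []
    for d in defendants:
        if DATA_END - d["screening"] < TWO_YEARS:
            continue  # too little follow-up: exclude recidivists and non-recidivists alike
        recid = (d["recid"] is not None
                 and d["recid"] - d["screening"] <= TWO_YEARS)
        sample.append((d["id"], recid))
    return sample
```

Note that a re-offense more than two years after screening counts as a non-recidivist in the two-year outcome, and that false positive/negative rates survive the original error while predictive values do not.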
Interview with Donald Knuth | InformIT
Andrew Binstock and Donald Knuth converse on the success of open source, the problem with multicore architecture, the disappointing lack of interest in literate programming, the menace of reusable code, and that urban legend about winning a programming contest with a single compilation.
nibble  interview  giants  expert-experience  programming  cs  software  contrarianism  carmack  oss  prediction  trends  linux  concurrency  desktop  comparison  checking  debugging  stories  engineering  hmm  idk  algorithms  books  debate  flux-stasis  duplication  parsimony  best-practices  writing  documentation  latex  intricacy  structure  hardware  caching  workflow  editors  composition-decomposition  coupling-cohesion  exposition  technical-writing  thinking 
6 days ago by nhaliday
Predicting history
Can events be accurately described as historic at the time they are happening? Claims of this sort are in effect predictions about the evaluations of future historians; that is, that they will regard the events in question as significant. Here we provide empirical evidence in support of earlier philosophical arguments [1] that such claims are likely to be spurious and that, conversely, many events that will one day be viewed as historic attract little attention at the time. We introduce a conceptual and methodological framework for applying machine learning prediction models to large corpora of digitized historical archives. We find that although such models can correctly identify some historically important documents, they tend to overpredict historical significance while also failing to identify many documents that will later be deemed important, where both types of error increase monotonically with the number of documents under consideration. On balance, we conclude that historical significance is extremely difficult to predict, consistent with other recent work on intrinsic limits to predictability in complex social systems [2,3]. However, the results also indicate the feasibility of developing ‘artificial archivists’ to identify potentially historic documents in very large digital corpora.
7 days ago by mgaldino
[1901.06758] A deep learning approach to real-time parking occupancy prediction in spatio-temporal networks incorporating multiple spatio-temporal data sources
A deep learning model is applied for predicting block-level parking occupancy in real time. The model leverages Graph-Convolutional Neural Networks (GCNN) to extract the spatial relations of traffic flow in large-scale networks, and utilizes Recurrent Neural Networks (RNN) with Long-Short Term Memory (LSTM) to capture the temporal features. In addition, the model is capable of taking multiple heterogeneously structured traffic data sources as input, such as parking meter transactions, traffic speed, and weather conditions. The model performance is evaluated through a case study in Pittsburgh downtown area. The proposed model outperforms other baseline methods including multi-layer LSTM and Lasso with an average testing MAPE of 10.6% when predicting block-level parking occupancies 30 minutes in advance. The case study also shows that, in general, the prediction model works better for business areas than for recreational locations. We found that incorporating traffic speed and weather information can significantly improve the prediction performance. Weather data is particularly useful for improving predicting accuracy in recreational areas.
machine-learning  city-planning  data-analysis  looking-to-see  prediction  deep-learning  to-write-about  consider:data-sourcing 
8 days ago by Vaguery
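The 10.6% MAPE figure above can be made concrete. A minimal sketch of the metric plus a naive persistence baseline that any learned model should beat; the paper's GCNN/LSTM architecture is not reproduced here, and these function names are my own:

```python
def mape(actual, predicted):
    """Mean absolute percentage error, the score reported for the parking models."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def persistence_baseline(series, horizon):
    """Naive 'occupancy stays the same' forecast `horizon` steps ahead.
    Returns (predictions, actuals) aligned for scoring."""
    return series[:-horizon], series[horizon:]
```

With 30-minute-ahead prediction, `horizon` would be the number of observation intervals spanning 30 minutes.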
Sequential sampling strategy for extreme event statistics in nonlinear dynamical systems | PNAS
We develop a method for the evaluation of extreme event statistics associated with nonlinear dynamical systems from a small number of samples. From an initial dataset of design points, we formulate a sequential strategy that provides the “next-best” data point (set of parameters) that when evaluated results in improved estimates of the probability density function (pdf) for a scalar quantity of interest. The approach uses Gaussian process regression to perform Bayesian inference on the parameter-to-observation map describing the quantity of interest. We then approximate the desired pdf along with uncertainty bounds using the posterior distribution of the inferred map. The next-best design point is sequentially determined through an optimization procedure that selects the point in parameter space that maximally reduces uncertainty between the estimated bounds of the pdf prediction. Since the optimization process uses only information from the inferred map, it has minimal computational cost. Moreover, the special form of the metric emphasizes the tails of the pdf. The method is practical for systems where the dimensionality of the parameter space is of moderate size and for problems where each sample is very expensive to obtain. We apply the method to estimate the extreme event statistics for a very high-dimensional system with millions of degrees of freedom: an offshore platform subjected to 3D irregular waves. It is demonstrated that the developed approach can accurately determine the extreme event statistics using a limited number of samples.

--- My suspicions might be due to ignorance of what they are doing here, but it looks too good to be true.
extreme_values  nonlinear_dynamics  prediction  fokker-planck  i_remain_skeptical 
12 days ago by rvenkat
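A schematic of the sequential "next-best point" loop the abstract describes, with a toy distance-to-nearest-design-point surrogate standing in for the GP posterior uncertainty; the surrogate, the `tail_weight` hook, and all names are illustrative assumptions, not the paper's actual acquisition criterion:

```python
def surrogate_uncertainty(x, design):
    """Toy stand-in for GP posterior std: grows with distance to the
    nearest already-evaluated design point."""
    return min(abs(x - d) for d in design)

def next_best_point(design, candidates, tail_weight=lambda x: 1.0):
    """Pick the candidate that maximally reduces predictive uncertainty;
    `tail_weight` mimics the paper's emphasis on the tails of the pdf."""
    return max(candidates, key=lambda x: surrogate_uncertainty(x, design) * tail_weight(x))

def sequential_design(evaluate, design, candidates, rounds):
    """Greedy acquisition loop: evaluate the next-best point, add it to
    the design, repeat. `evaluate` is the expensive simulation."""
    design = list(design)
    for _ in range(rounds):
        x = next_best_point(design, [c for c in candidates if c not in design])
        evaluate(x)
        design.append(x)
    return design
```

The cheapness claim in the abstract corresponds to the fact that the selection step only queries the surrogate, never the expensive `evaluate`.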
[AI] - Training AIs to predict judges' rulings is banned
In a truly unusual "first", the French courts have decided that it is illegal to build systems that analyze individual judges in order to predict their leanings on various issues.
This is a fairly popular legal-tech trend, aimed mainly at choosing the judge (where possible) and the forum whose leanings sit closest to the desired ruling. That road, it seems, is now closed in France.
AI  prediction  privacy  top 
14 days ago by mgpf
Generating Accurate Personalized Predictions of Future Behavior: A Smoking Exemplar
"Time stamps from ESM surveys were used to calculate the time of day, day of the week, and continuous time—where the last datum was, in turn, used to calculate 12-hr and 24-hr cycles. Each individual’s time series was split into sequential training and testing sections, so that trained models could be tested on future observations. Prediction models were trained on the first 75% of the individual’s data and tested on the last 25%. Predictions of future behavior were made on a person by person basis. Two prediction algorithms were employed, elastic net regularization and naïve Bayes classification. Sample-wide area under the curve was nearly 80%, with some models demonstrating perfect prediction accuracies. Sensitivity and specificity were between 0.78 and 0.81 across the two approaches. Importantly, prediction models were based on a lagged data structure. Thus, in addition to supporting the prediction accuracy of our models with out-of-sample tests in time-forward data, the models themselves were time-lagged, such that each prediction was for the subsequent measurement. Such a system could be the basis for mobile, just-in-time interventions for substance use, as models that accurately predict future behavior could ostensibly be used for delivering personalized interventions at empirically-indicated moments of need."
statistics  prediction  timeseries  psychology 
14 days ago by aapl
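The design in the excerpt (time-lagged features, sequential 75/25 split, sensitivity/specificity around 0.78-0.81) can be sketched minimally; this is an illustrative stand-in for the study's pipeline, not its actual code, and the helper names are invented:

```python
def lagged_split(series, train_frac=0.75):
    """Pair each observation with the next one (time-lagged design) and
    split sequentially, so the test block lies strictly in the future
    of the training block, as in the study above."""
    pairs = list(zip(series[:-1], series[1:]))
    cut = int(len(pairs) * train_frac)
    return pairs[:cut], pairs[cut:]

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)
```

The key property is that each prediction target is the *subsequent* measurement, which is what makes the out-of-sample test genuinely time-forward.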
Have your best days as an actor passed? These scientists 'predict' it with the help of IMDb | De Volkskrant
Digital encyclopedia

For their analysis they used data on more than 1.5 million actors and nearly 900,000 actresses from the well-known Internet Movie Database (IMDb), a digital encyclopedia that tracks information on film and television productions from all over the world. The figures used run from 1888 to 2016.

In that mountain of data they discovered further patterns. Success turned out to be the strongest predictor of more success: whoever is popular tends to become even more popular and gets more and more roles. 'The numbers also showed the inequality between men and women,' says Williams. Among careers longer than one year, women's were shorter on average. Women's most successful year also came more often at the start of their careers.

IMDB  movies  actors  career  statistics  prediction  science 
15 days ago by dominomaster
