How Machine Learning Can (And Can't) Help Journalists - Global Investigative Journalism Network
“There are probably relatively few circumstances under which reporters are going to need … to acquire machine learning – it’s really where you’ve got a classification task,” says Peter Aldhous, a reporter on the science desk at BuzzFeed News.
ml  dj  ij  ai  t 
6 hours ago by paulbradshaw
Neural Networks Are Essentially Polynomial Regression | Mad (Data) Scientist
* We present a very simple, informal mathematical argument that neural networks (NNs) are in essence polynomial regression (PR). We refer to this as NNAEPR.
* NNAEPR implies that we can use our knowledge of the “old-fashioned” method of PR to gain insight into how NNs — widely viewed somewhat warily as a “black box” — work inside.
* One such insight is that the outputs of an NN layer will be prone to multicollinearity, with the problem becoming worse with each successive layer. This in turn may explain why convergence issues often develop in NNs. It also suggests that NN users tend to use overly large networks.
* NNAEPR suggests that one may abandon using NNs altogether, and simply use PR instead.

[I'd like to reread more carefully & try reproducing some results myself... In the comments I see Matloff vehemently arguing with commenters about the interpretation of their "informal mathematical argument." But in principle, even if I'm not convinced yet, I'm partial to the desire to paint Deep Learning as just a fancy screen over much-simpler things like polynomial regression.]
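For anyone wanting a starting point for that kind of reproduction, here is a minimal sketch of plain polynomial regression fitted by least squares. It is not the authors' code: the toy target `tanh(2x)`, the noise level, the chosen degrees, and all function names are assumptions made up for illustration. It also prints the condition number of the polynomial design matrix, one rough way to see collinearity among the columns grow with degree (the post's multicollinearity claim is about NN layer outputs, which this sketch does not test).

```python
import numpy as np


def polynomial_features(x, degree):
    """Return the columns [1, x, x^2, ..., x^degree] for a 1-D input."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, degree + 1, increasing=True)


def fit_polynomial_regression(x, y, degree):
    """Ordinary least-squares fit of a degree-`degree` polynomial."""
    X = polynomial_features(x, degree)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict(coef, x):
    """Evaluate the fitted polynomial at the points in x."""
    return polynomial_features(x, len(coef) - 1) @ coef


# Toy data (assumption): a smooth nonlinear target plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = np.tanh(2.0 * x) + rng.normal(scale=0.05, size=x.size)

coef = fit_polynomial_regression(x, y, degree=5)
mse = float(np.mean((predict(coef, x) - y) ** 2))

# Condition number of the design matrix at a low and a high degree.
cond_low = float(np.linalg.cond(polynomial_features(x, 2)))
cond_high = float(np.linalg.cond(polynomial_features(x, 10)))

print(f"training MSE: {mse:.4f}")
print(f"condition number, degree 2: {cond_low:.1f}")
print(f"condition number, degree 10: {cond_high:.1f}")
```

Comparing this against a small NN trained on the same data would be the obvious next step, with the caveat that a one-dimensional toy problem says little about the general claim.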
statistics  ML 
19 hours ago by civilstat
Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints | BMC Medical Research Methodology | Full Text
We found that a stable AUC was reached by LR [logistic regression] at approximately 20 to 50 events per variable, followed by CART, SVM, NN and RF models. Optimism decreased with increasing sample sizes and the same ranking of techniques. The RF, SVM and NN models showed instability and a high optimism even with >200 events per variable.

Modern modelling techniques such as SVM, NN and RF may need over 10 times as many events per variable to achieve a stable AUC and a small optimism than classical modelling techniques such as LR. This implies that such modern techniques should only be used in medical prediction problems if very large data sets are available.
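As a rough illustration of the "events per variable" figure quoted above, the sketch below computes it for a binary outcome. The definition used here (count of the less frequent outcome class divided by the number of candidate predictors) is an assumption about the usual convention, and the function name and example numbers are hypothetical, not taken from the paper.

```python
def events_per_variable(outcomes, n_predictors):
    """Events per variable (EPV) for a binary outcome.

    Assumed definition: size of the less frequent outcome class
    divided by the number of candidate predictors.
    """
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    events = sum(1 for outcome in outcomes if outcome)
    non_events = len(outcomes) - events
    return min(events, non_events) / n_predictors


# Hypothetical data set: 1000 patients, 150 events, 10 candidate predictors.
outcomes = [1] * 150 + [0] * 850
epv = events_per_variable(outcomes, n_predictors=10)
print(epv)
```

Under this definition the hypothetical data set sits below the 20 to 50 events per variable at which the excerpt reports a stable AUC for logistic regression, and far below the more than 200 at which RF, SVM and NN were still unstable.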
statistics  ML 
19 hours ago by civilstat
Troubling Trends in Machine Learning Scholarship - Lipton and Steinhardt
We encourage authors to ask “what worked?” and “why?”, rather than just “how well?”. Except in extraordinary cases, raw headline numbers provide limited value for scientific progress absent insight into what drives them. Insight does not necessarily mean theory. Three practices that are common in the strongest empirical papers are error analysis, ablation studies, and robustness checks (to e.g. choice of hyper-parameters, as well as ideally to choice of dataset). These practices can be adopted by everyone and we advocate their wide-spread use.


When writing, we recommend asking the following question: Would I rely on this explanation for making predictions or for getting a system to work? This can be a good test of whether a theorem is being included to please reviewers or to convey actual insight. It also helps check whether concepts and explanations match our own internal mental model.


Reviewers can set better incentives by asking: “Might I have accepted this paper if the authors had done a worse job?” For instance, a paper describing a simple idea that leads to improved performance, together with two negative results, should be judged more favorably than a paper that combines three ideas together (without ablation studies) yielding the same improvement.
statistics  ML 
19 hours ago by civilstat
about datalab
For this quarter our team objectives are:

* Make it easy for BBC teams to rapidly develop and deploy Machine Learning engines
* Provide great recommendations across multiple products beyond BBC+
bbc  datalab  ml  ai 
23 hours ago by paulbradshaw
Road Map for Choosing Between Statistical Modeling and Machine Learning | Statistical Thinking
ML and AI have had their greatest successes in high signal:noise situations, e.g., visual and sound recognition, language translation, and playing games with concrete rules. What distinguishes these is quick feedback while training, and availability of the answer. Things are different in the low signal:noise world of medical diagnosis and human outcomes. A great use of ML is in pattern recognition to mimic radiologists’ expert image interpretations. For estimating the probability of a positive biopsy given symptoms, signs, risk factors, and demographics, not so much.


There are many current users of ML algorithms who falsely believe that one can make reliable predictions from complex datasets with a small number of observations. Statisticians are pretty good at knowing the limitations caused by the effective sample size, and to stop short of trying to incorporate model complexity that is not supported by the information content of the sample.
statistics  ML 
23 hours ago by civilstat
A Better Lesson – Rodney Brooks
Nice... on the ML community's sleight of hand when it comes to making claims about AI...
ML  principles  bestpractice  TM358 
23 hours ago by psychemedia
SOD - An Embedded, Modern Computer Vision and Machine Learning Library
Saw this and it made me think I want a tech called FuKD - FUnctional Knowledge Devices... Because we will be.
IoT  embedded  imageAnalysis  deepLearning  ML 
yesterday by psychemedia
