

[1705.06908] Unbiased estimates for linear regression via volume sampling
"For a full rank n×d matrix X with n≥d, consider the task of solving the linear least squares problem, where we try to predict a response value for each of the n rows of X. Assume that obtaining the responses is expensive and we can only afford to attain the responses for a small subset of rows. We show that a good approximate solution to this least squares problem can be obtained from just d responses, where d is the dimension. Concretely, if the rows are in general position and if a subset of d rows is chosen proportional to the squared volume spanned by those rows, then the expected total square loss (on all n rows) of the least squares solution found for the subset is exactly d+1 times the minimum achievable total loss. We provide lower bounds showing that the factor of d+1 is optimal, and any iid row sampling procedure requires Ω(d log d) responses to achieve a finite factor guarantee. Moreover, the least squares solution obtained for the volume sampled subproblem is an unbiased estimator of the optimal solution based on all n responses.
Our methods lead to general matrix expectation formulas for volume sampling which go beyond linear regression. In particular, we propose a matrix estimator for the pseudoinverse X⁺, computed from a small subset of rows of the matrix X. The estimator is unbiased and, surprisingly, its covariance also has a closed form: it equals a specific factor times X⁺(X⁺)⊤. We believe that these new formulas establish a fundamental connection between linear least squares and volume sampling. Our analysis for computing matrix expectations is based on reverse iterative volume sampling, a technique which also leads to a new algorithm for volume sampling that is faster than the state of the art by a factor of n²."
papers  regression  active-learning 
4 hours ago by arsyed
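The unbiasedness claim in the abstract can be checked exactly for a small problem: enumerate every size-d row subset, weight each by its squared volume det(X_S)², and average the subset solutions. A minimal brute-force sketch (illustration only; the paper's actual sampler avoids this exponential enumeration):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, d = 7, 3
X = rng.standard_normal((n, d))   # rows in general position (a.s. for Gaussian X)
y = rng.standard_normal(n)

# Enumerate all size-d row subsets and their squared-volume weights det(X_S)^2.
subsets = [list(S) for S in itertools.combinations(range(n), d)]
weights = np.array([np.linalg.det(X[S]) ** 2 for S in subsets])
probs = weights / weights.sum()   # volume-sampling distribution over subsets

# Least-squares solution from each subset (X_S is d×d, so just solve),
# and its exact expectation under volume sampling.
sols = np.array([np.linalg.solve(X[S], y[S]) for S in subsets])
expected = probs @ sols

# Unbiasedness: E[w_S] equals the full least-squares solution on all n rows.
w_full = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(expected, w_full))  # True
```

This verifies the expectation identity deterministically rather than by Monte Carlo, which keeps the check free of sampling noise.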
Everything is a Model | Delip Rao
Favorite tweet: "Forgot where I got this link, glad I read it. Will have to look at the paper" — Wayne Nixalo (@WNixalo) January 23, 2018
IFTTT  twitter  favorite  papers  compsci  ml  learning  algorithms 
23 hours ago by tswaterman
[1707.07270] MatchZoo: A Toolkit for Deep Text Matching
In recent years, deep neural models have been widely adopted for text matching tasks, such as question answering and information retrieval, showing improved performance compared with previous methods. In this paper, we introduce the MatchZoo toolkit, which aims to facilitate designing, comparing, and sharing deep text matching models. Specifically, the toolkit provides a unified data preparation module for different text matching problems, a flexible layer-based model construction process, and a variety of training objectives and evaluation metrics. In addition, the toolkit implements two schools of representative deep text matching models, namely representation-focused models and interaction-focused models. Finally, users can easily modify existing models, and create and share their own models for text matching in MatchZoo.
DNN  text-classification  IR  toolkit  papers 
yesterday by foodbaby
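The "representation-focused" school the abstract mentions encodes each text into a fixed-length vector independently and then compares the two vectors. A toy sketch of that idea (random word embeddings and cosine similarity stand in for a trained encoder; this is not MatchZoo's API):

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 16
vocab = {}  # word -> random embedding, built lazily (stand-in for learned embeddings)

def embed(text):
    """Represent a text as the mean of its per-word embedding vectors."""
    vecs = []
    for w in text.lower().split():
        if w not in vocab:
            vocab[w] = rng.standard_normal(EMB_DIM)
        vecs.append(vocab[w])
    return np.mean(vecs, axis=0)

def match_score(a, b):
    """Representation-focused matching: encode each text independently,
    then compare the two fixed-length vectors with cosine similarity."""
    va, vb = embed(a), embed(b)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

q = "how to train a neural network"
print(match_score(q, q))  # identical texts score ~1
```

Interaction-focused models differ in that they compare the texts word-by-word first (an interaction matrix) and only then aggregate, rather than compressing each text into a single vector up front.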
Online Experimentation Diagnosis and Troubleshooting Beyond AA Validation - Semantic Scholar
Online experiments are frequently used at internet companies to evaluate the impact of new designs, features, or code changes on user behavior. Though the experiment design is straightforward in theory, in practice, there are many problems that can complicate the interpretation of results and render any conclusions about changes in user behavior invalid. Many of these problems are difficult to detect and often go unnoticed. Acknowledging and diagnosing these issues can prevent experiment owners from making decisions based on fundamentally flawed data. When conducting online experiments, data quality assurance is a top priority before attributing the impact to changes in user behavior. While some problems can be detected by running AA tests before introducing the treatment, many problems do not emerge during the AA period, and appear only during the AB period. Prior work on this topic has not addressed troubleshooting during the AB period. In this paper, we present lessons learned from experiments on various internet consumer products at Yahoo, as well as diagnostic and remedy procedures. Most of the examples and troubleshooting procedures presented here are generic to online experimentation at other companies. Some, such as traffic splitting problems and outlier problems have been documented before, but others have not previously been described in the literature.
ab-testing  papers  experience 
2 days ago by foodbaby
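One of the "traffic splitting problems" the paper alludes to is a sample-ratio mismatch: the realized treatment/control counts disagree with the configured split, which invalidates downstream comparisons. A standard diagnostic (not from the paper itself) is a chi-square goodness-of-fit test on the counts; the numbers below are hypothetical:

```python
from scipy.stats import chisquare

# Hypothetical user counts from an experiment configured for a 50/50 split.
observed = [50310, 49050]
total = sum(observed)
expected = [total * 0.5, total * 0.5]

stat, p = chisquare(f_obs=observed, f_exp=expected)
# A tiny p-value means the realized split is inconsistent with 50/50 --
# a sample-ratio mismatch, so results from this experiment are suspect.
print(f"chi2={stat:.2f}, p={p:.4g}, srm={'YES' if p < 0.001 else 'no'}")
```

Because this check needs only assignment counts, it can run continuously during the A/B period, catching splitting bugs that an A/A test run beforehand would miss.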
Overlapping experiment infrastructure: more, better, faster experimentation - Semantic Scholar
At Google, experimentation is practically a mantra; we evaluate almost every change that potentially affects what our users experience. Such changes include not only obvious user-visible changes such as modifications to a user interface, but also more subtle changes such as different machine learning algorithms that might affect ranking or content selection. Our insatiable appetite for experimentation has led us to tackle the problems of how to run more experiments, how to run experiments that produce better decisions, and how to run them faster. In this paper, we describe Google's overlapping experiment infrastructure that is a key component to solving these problems. In addition, because an experiment infrastructure alone is insufficient, we also discuss the associated tools and educational processes required to use it effectively. We conclude by describing trends that show the success of this overall experimental environment. While the paper specifically describes the experiment system and experimental processes we have in place at Google, we believe they can be generalized and applied by any entity interested in using experimentation to improve search engines and other web applications.
ab-testing  google  papers 
2 days ago by foodbaby
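The core mechanism behind an overlapping experiment infrastructure is layered traffic assignment: each layer hashes users into buckets with its own salt, experiments within a layer own disjoint bucket ranges, and layers assign independently so one user can be in several experiments at once. A minimal sketch of that idea (layer names and bucket counts are made up for illustration):

```python
import hashlib

NUM_BUCKETS = 1000

def bucket(user_id, layer):
    """Deterministically map a user to one of NUM_BUCKETS buckets,
    independently per layer, via a layer-salted hash."""
    h = hashlib.sha256(f"{layer}:{user_id}".encode()).hexdigest()
    return int(h, 16) % NUM_BUCKETS

# Each experiment owns a disjoint bucket range within its layer, so
# experiments in the same layer never share users, while different layers
# overlap: a user can be in at most one experiment per layer simultaneously.
layers = {
    "ui":      {"blue_button": range(0, 500), "red_button": range(500, 1000)},
    "ranking": {"new_ranker": range(0, 300)},  # 30% of traffic; rest is control
}

def assignments(user_id):
    out = {}
    for layer, experiments in layers.items():
        b = bucket(user_id, layer)
        for exp, bucket_range in experiments.items():
            if b in bucket_range:
                out[layer] = exp
    return out

print(assignments("user42"))
```

Salting the hash per layer is what makes the layers statistically independent: a user's bucket in the "ui" layer says nothing about their bucket in the "ranking" layer.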
[1710.08864] One pixel attack for fooling deep neural networks
"Recent research has revealed that the output of Deep Neural Networks (DNN) can be easily altered by adding relatively small perturbations to the input vector. In this paper, we analyze an attack in an extremely limited scenario where only one pixel can be modified. For that we propose a novel method for generating one-pixel adversarial perturbations based on differential evolution. It requires less adversarial information and can fool more types of networks. The results show that 70.97% of the natural images can be perturbed to at least one target class by modifying just one pixel with 97.47% confidence on average. Thus, the proposed attack explores a different take on adversarial machine learning in an extremely limited scenario, showing that current DNNs are also vulnerable to such low-dimension attacks."
papers  neural-net  adversarial-examples 
3 days ago by arsyed
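The attack in the abstract is black-box: differential evolution searches over a single pixel's coordinates and value, using only the model's output confidence as feedback. A toy sketch of that loop, with a fixed random linear classifier standing in for the real CNNs the paper attacks:

```python
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)
H = W = 8                       # toy 8x8 grayscale "image"
image = rng.random((H, W))

# Stand-in classifier: a fixed random linear softmax over the flattened image.
# (Hypothetical; the paper attacks trained CNNs on real image datasets.)
Wcls = rng.standard_normal((3, H * W))
def class_probs(img):
    logits = Wcls @ img.ravel()
    e = np.exp(logits - logits.max())
    return e / e.sum()

orig_label = int(np.argmax(class_probs(image)))

def fitness(z):
    """z = (row, col, value): modify one pixel, return the confidence of the
    original class -- differential evolution minimizes this score."""
    r, c, v = int(z[0]), int(z[1]), z[2]
    perturbed = image.copy()
    perturbed[r, c] = v
    return class_probs(perturbed)[orig_label]

result = differential_evolution(fitness,
                                bounds=[(0, H - 1), (0, W - 1), (0, 1)],
                                seed=0, maxiter=30, tol=1e-6)
r, c, v = int(result.x[0]), int(result.x[1]), result.x[2]
adv = image.copy()
adv[r, c] = v
print(orig_label, int(np.argmax(class_probs(adv))))
```

Note that the optimizer never sees gradients: it only queries `class_probs`, which is what makes the method applicable when the attacker has query access but not the model internals.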
