**papers**

Visual Discovery at Pinterest

yesterday by tarakc02

Over the past three years Pinterest has experimented with several visual search and recommendation services, including Related Pins (2014), Similar Looks (2015), Flashlight (2016) and Lens (2017). This paper presents an overview of our visual discovery engine powering these services, and shares the rationales behind our technical and product decisions such as the use of object detection and interactive user interfaces. We conclude that this visual discovery engine significantly improves engagement in both search and recommendation tasks.

papers
ui
ux
object-recognition
neuralnets
recommender-systems
recommendations
pinterest
discovery
yesterday by tarakc02

How to read and understand a scientific paper: a guide for non-scientists

2 days ago by teffalump

yeah, #1 - read intro not abstract ---- do this, this becomes really obvious after reading a few when the abstract still makes no sense to you, lol

papers
research
science
education
reference
2 days ago by teffalump

[1706.04601] Provable benefits of representation learning

2 days ago by mraginsky

There is general consensus that learning representations is useful for a variety of reasons, e.g. efficient use of labeled data (semi-supervised learning), transfer learning and understanding hidden structure of data. Popular techniques for representation learning include clustering, manifold learning, kernel-learning, autoencoders, Boltzmann machines, etc.

To study the relative merits of these techniques, it's essential to formalize the definition and goals of representation learning, so that they are all become instances of the same definition. This paper introduces such a formal framework that also formalizes the utility of learning the representation. It is related to previous Bayesian notions, but with some new twists. We show the usefulness of our framework by exhibiting simple and natural settings -- linear mixture models and loglinear models, where the power of representation learning can be formally shown. In these examples, representation learning can be performed provably and efficiently under plausible assumptions (despite being NP-hard), and furthermore: (i) it greatly reduces the need for labeled data (semi-supervised learning) and (ii) it allows solving classification tasks when simpler approaches like nearest neighbors require too much data (iii) it is more powerful than manifold learning methods.

papers
to-read
representation-learning
To study the relative merits of these techniques, it's essential to formalize the definition and goals of representation learning, so that they are all become instances of the same definition. This paper introduces such a formal framework that also formalizes the utility of learning the representation. It is related to previous Bayesian notions, but with some new twists. We show the usefulness of our framework by exhibiting simple and natural settings -- linear mixture models and loglinear models, where the power of representation learning can be formally shown. In these examples, representation learning can be performed provably and efficiently under plausible assumptions (despite being NP-hard), and furthermore: (i) it greatly reduces the need for labeled data (semi-supervised learning) and (ii) it allows solving classification tasks when simpler approaches like nearest neighbors require too much data (iii) it is more powerful than manifold learning methods.

2 days ago by mraginsky

[1707.05199] Online codes for analog signals

2 days ago by mraginsky

We revisit a classical scenario in communication theory: a source is generating a waveform which we sample at regular intervals; we wish to transform the signal in such a way as to minimize distortion in its reconstruction, despite noise. The transformation must be online (also called causal), in order to enable real-time signaling. The noise model we consider is adversarial $\ell_1$-bounded; this is the "atomic norm" convex relaxation of the standard adversary model in discrete-alphabet communications, namely sparsity (low Hamming weight). We require that our encoding not increase the power of the original signal.

In the "block coding" setting such encoding is possible due to the existence of large almost-Euclidean sections in $\ell_1$ spaces (established in the work of Dvoretzky, Milman, Ka\v{s}in, and Figiel, Lindenstrauss and Milman).

Our main result is that an analogous result is achievable even online. Equivalently, we show a "lower triangular" version of $\ell_1$ Dvoretzky theorems. In terms of communication, the result has the following form: If the signal is a stream of reals $x_1,\ldots$, one per unit time, which we encode causally into $\rho$ (a constant) reals per unit time (forming altogether an output stream $\mathcal{E}(x)$), and if the adversarial noise added to this encoded stream up to time $s$ is a vector $\vec{y}$, then at time $s$ the decoder's reconstruction of the input prefix $x_{[s]}$ is accurate in a time-weighted $\ell_2$ norm, to within $s^{-1/2+\delta}$ (any $\delta>0$) times the adversary's noise as measured in a time-weighted $\ell_1$ norm. The time-weighted decoding norm forces increasingly accurate reconstruction of the distant past, while the time-weighted noise norm permits only vanishing effect from noise in the distant past.

Encoding is linear, and decoding is performed by an LP analogous to those used in compressed sensing.

papers
to-read
random-matrices
communication
information-theory
analog-information-theory
In the "block coding" setting such encoding is possible due to the existence of large almost-Euclidean sections in $\ell_1$ spaces (established in the work of Dvoretzky, Milman, Ka\v{s}in, and Figiel, Lindenstrauss and Milman).

Our main result is that an analogous result is achievable even online. Equivalently, we show a "lower triangular" version of $\ell_1$ Dvoretzky theorems. In terms of communication, the result has the following form: If the signal is a stream of reals $x_1,\ldots$, one per unit time, which we encode causally into $\rho$ (a constant) reals per unit time (forming altogether an output stream $\mathcal{E}(x)$), and if the adversarial noise added to this encoded stream up to time $s$ is a vector $\vec{y}$, then at time $s$ the decoder's reconstruction of the input prefix $x_{[s]}$ is accurate in a time-weighted $\ell_2$ norm, to within $s^{-1/2+\delta}$ (any $\delta>0$) times the adversary's noise as measured in a time-weighted $\ell_1$ norm. The time-weighted decoding norm forces increasingly accurate reconstruction of the distant past, while the time-weighted noise norm permits only vanishing effect from noise in the distant past.

Encoding is linear, and decoding is performed by an LP analogous to those used in compressed sensing.

2 days ago by mraginsky

[1703.03400] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

2 days ago by arsyed

"We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies."

papers
meta-learning
2 days ago by arsyed

[1701.04079] Agent-Agnostic Human-in-the-Loop Reinforcement Learning

2 days ago by arsyed

"Providing Reinforcement Learning agents with expert advice can dramatically improve various aspects of learning. Prior work has developed teaching protocols that enable agents to learn efficiently in complex environments; many of these methods tailor the teacher's guidance to agents with a particular representation or underlying learning scheme, offering effective but specialized teaching procedures. In this work, we explore protocol programs, an agent-agnostic schema for Human-in-the-Loop Reinforcement Learning. Our goal is to incorporate the beneficial properties of a human teacher into Reinforcement Learning without making strong assumptions about the inner workings of the agent. We show how to represent existing approaches such as action pruning, reward shaping, and training in simulation as special cases of our schema and conduct preliminary experiments on simple domains."

papers
reinforcement-learning
human-in-the-loop
2 days ago by arsyed

Stack-up-and-routing-optimization-by-understanding-micro-scale-PCB-effects.pdf

2 days ago by eriwst

Stack-up and routing optimization by understanding micro-scale PCB effects

pcb
design
FR4
engineering
laminates
dielectric
materials
high-speed
papers
2 days ago by eriwst

[1706.00764] Hyperparameter Optimization: A Spectral Approach

2 days ago by arsyed

"We give a simple, fast algorithm for hyperparameter optimization inspired by techniques from the analysis of Boolean functions. We focus on the high-dimensional regime where the canonical example is training a neural network with a large number of hyperparameters. The algorithm - an iterative application of compressed sensing techniques for orthogonal polynomials - requires only uniform sampling of the hyperparameters and is thus easily parallelizable. Experiments for training deep nets on Cifar-10 show that compared to state-of-the-art tools (e.g., Hyperband and Spearmint), our algorithm finds significantly improved solutions, in some cases matching what is attainable by hand-tuning. In terms of overall running time (i.e., time required to sample various settings of hyperparameters plus additional computation time), we are at least an order of magnitude faster than Hyperband and even more so compared to Bayesian Optimization. We also outperform Random Search 5X. Additionally, our method comes with provable guarantees and yields the first quasi-polynomial time algorithm for learning decision trees under the uniform distribution with polynomial sample complexity, the first improvement in over two decades."

papers
optimization
hyperparameter
2 days ago by arsyed

A Mathematical Theory of Communication

5 days ago by ngm01

Claude Shannon 1948

math
information
papers
theory
5 days ago by ngm01

[1707.02476] Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks

5 days ago by arsyed

"Deep neural networks (DNNs) have excellent representative power and are state of the art classifiers on many tasks. However, they often do not capture their own uncertainties well making them less robust in the real world as they overconfidently extrapolate and do not notice domain shift. Gaussian processes (GPs) with RBF kernels on the other hand have better calibrated uncertainties and do not overconfidently extrapolate far from data in their training set. However, GPs have poor representational power and do not perform as well as DNNs on complex domains. In this paper we show that GP hybrid deep networks, GPDNNs, (GPs on top of DNNs and trained end-to-end) inherit the nice properties of both GPs and DNNs and are much more robust to adversarial examples. When extrapolating to adversarial examples and testing in domain shift settings, GPDNNs frequently output high entropy class probabilities corresponding to essentially "don't know". GPDNNs are therefore promising as deep architectures that know when they don't know."

papers
neural-net
gaussian-processes
5 days ago by arsyed