Vaguery : machine-learning   981

 « earlier
Emergent Tool Use from Multi-Agent Interaction
We’ve observed agents discovering progressively more complex tool use while playing a simple game of hide-and-seek. Through training in our new simulated hide-and-seek environment, agents build a series of six distinct strategies and counterstrategies, some of which we did not know our environment supported. The self-supervised emergent complexity in this simple environment further suggests that multi-agent co-adaptation may one day produce extremely complex and intelligent behavior.
agent-based  artificial-life  machine-learning  collective-behavior  looking-to-see  rather-interesting  to-write-about  to-simulate  consider:a-sufficiently-complex-environment
2 days ago by Vaguery
Hugging Face GPT With Clojure - Squid's Blog
You can now interop with ANY python library.

I know. It’s overwhelming. It took a bit for me to come to grips with it too.

Let’s take an example of something that I’ve always wanted to do and have struggled with mightly finding a way to do it in Clojure:
I want to use the latest cutting edge GPT2 code out there to generate text.

Right now, that library is Hugging Face Transformers.

Get ready. We will wrap that sweet hugging face code in Clojure parens!
machine-learning  Clojure  interop  natural-language-processing  rather-interesting  library  to-understand  to-follow-along  to-write-about  consider:scraping-for-data
2 days ago by Vaguery
Integrating Deep Learning With clojure.spec - Squid's Blog
clojure.spec allows you to write specifications for data and use them for validation. It also provides a generative aspect that allows for robust testing as well as an additional way to understand your data through manual inspection. The dual nature of validation and generation is a natural fit for deep learning models that consist of paired discriminator/generator models.
Clojure  clojure.spec  machine-learning  to-understand  to-do  consider:representation  consider:structural-types
2 days ago by Vaguery
Regularization Improves the Robustness of Learned Sequence-to-Expression Models | bioRxiv
Understanding of the gene regulatory activity of enhancers is a major problem in regulatory biology. The nascent field of sequence-to-expression modelling seeks to create quantitative models of gene expression based on regulatory DNA (cis) and cellular environmental (trans) contexts. All quantitative models are defined partially by numerical parameters, and it is common to fit these parameters to data provided by existing experimental results. However, the relative paucity of experimental data appropriate for this task, and lacunae in our knowledge of all components of the systems, results in problems often being under-specified, which in turn may lead to a situation where wildly different model parameterizations perform similarly well on training data. It may also lead to models being fit to the idiosyncrasies of the training data, without representing the more general process (overfitting).

In other contexts where parameter-fitting is performed, it is common to apply regularization to reduce overfitting. We systematically evaluated the efficacy of three types of regularization in improving the generalizability of trained sequence-to-expression models. The evaluation was performed in two types of cross-validation experiments: one training on D. melanogaster data and predicting on orthologous enhancers from related species, and the other cross-validating between four D. melanogaster neurogenic ectoderm enhancers, which are thought to be under control of the same transcription factors. We show that training with a combination of noise-injection, L1, and L2 regularization can drastically reduce overfitting and improve the generalizability of learned sequence-to-expression models. These results suggest that it may be possible to mitigate the tendency of sequence-to-expression models to overfit available data, thus improving predictive power and potentially resulting in models that provide better insight into underlying biological processes.
bioinformatics  systems-biology  nonlinear-dynamics  machine-learning  regularization  statistics  numerical-methods  heuristics  to-write-about  to-simulate  consider:symbolic-regression  consider:robustness
2 days ago by Vaguery
[1807.06414] Combining a Context Aware Neural Network with a Denoising Autoencoder for Measuring String Similarities
Measuring similarities between strings is central for many established and fast growing research areas including information retrieval, biology, and natural language processing. The traditional approach for string similarity measurements is to define a metric over a word space that quantifies and sums up the differences between characters in two strings. The state-of-the-art in the area has, surprisingly, not evolved much during the last few decades. The majority of the metrics are based on a simple comparison between character and character distributions without consideration for the context of the words. This paper proposes a string metric that encompasses similarities between strings based on (1) the character similarities between the words including. Non-Standard and standard spellings of the same words, and (2) the context of the words. Our proposal is a neural network composed of a denoising autoencoder and what we call a context encoder specifically designed to find similarities between the words based on their context. The experimental results show that the resulting metrics succeeds in 85.4\% of the cases in finding the correct version of a non-standard spelling among the closest words, compared to 63.2\% with the established Normalised-Levenshtein distance. Besides, we show that words used in similar context are with our approach calculated to be similar than words with different contexts, which is a desirable property missing in established string metrics.
feature-construction  machine-learning  clustering  natural-language-processing  rather-interesting  to-simulate  to-write-about  consider:feature-discovery
5 days ago by Vaguery
[1812.08434] Graph Neural Networks: A Review of Methods and Applications
Lots of learning tasks require dealing with graph data which contains rich relation information among elements. Modeling physics system, learning molecular fingerprints, predicting protein interface, and classifying diseases require a model to learn from graph inputs. In other domains such as learning from non-structural data like texts and images, reasoning on extracted structures, like the dependency tree of sentences and the scene graph of images, is an important research topic which also needs graph reasoning models. Graph neural networks (GNNs) are connectionist models that capture the dependence of graphs via message passing between the nodes of graphs. Unlike standard neural networks, graph neural networks retain a state that can represent information from its neighborhood with arbitrary depth. Although the primitive GNNs have been found difficult to train for a fixed point, recent advances in network architectures, optimization techniques, and parallel computation have enabled successful learning with them. In recent years, systems based on variants of graph neural networks such as graph convolutional network (GCN), graph attention network (GAT), gated graph neural network (GGNN) have demonstrated ground-breaking performance on many tasks mentioned above. In this survey, we provide a detailed review over existing graph neural network models, systematically categorize the applications, and propose four open problems for future research.
machine-learning  representation  review  rather-interesting  to-write-about  to-simulate  consider:genetic-programming  consider:functional-transformations
5 days ago by Vaguery
High-dimensional Time Series Clustering via Cross-Predictability
The key to time series clustering is how to characterize the similarity between any two time series. In this paper, we explore a new similarity metric called “cross-predictability”: the degree to which a future value in each time series is predicted by past values of the others. However, it is challenging to estimate such cross-predictability among time series in the high-dimensional regime, where the number of time series is much larger than the length of each time series. We address this challenge with a sparsity assumption: only time series in the same cluster have significant cross-predictability with each other. We demonstrate that this approach is computationally attractive, and provide a theoretical proof that the proposed algorithm will identify the correct clustering structure with high probability under certain conditions. To the best of our knowledge, this is the first practical high-dimensional time series clustering algorithm with a provable guarantee. We evaluate with experiments on both synthetic data and real-world data, and results indicate that our method can achieve more than 80% clustering accuracy on real-world data, which is 20% higher than the state-of-art baselines.
performance-networks  clustering  time-series  yeah-I-probably-shoulda-published-in-1999  machine-learning  feature-construction  ah-well
5 days ago by Vaguery
[2001.09212] PCGRL: Procedural Content Generation via Reinforcement Learning
We investigate how reinforcement learning can be used to train level-designing agents. This represents a new approach to procedural content generation in games, where level design is framed as a game, and the content generator itself is learned. By seeing the design problem as a sequential task, we can use reinforcement learning to learn how to take the next action so that the expected final level quality is maximized. This approach can be used when few or no examples exist to train from, and the trained generator is very fast. We investigate three different ways of transforming two-dimensional level design problems into Markov decision processes and apply these to three game environments.
machine-learning  transfer-learning  generative-models  content-generation  rather-interesting  to-simulate  to-write-about  consider:feature-discovery
16 days ago by Vaguery
[1802.10038] Improving OCR Accuracy on Early Printed Books by combining Pretraining, Voting, and Active Learning
We combine three methods which significantly improve the OCR accuracy of OCR models trained on early printed books: (1) The pretraining method utilizes the information stored in already existing models trained on a variety of typesets (mixed models) instead of starting the training from scratch. (2) Performing cross fold training on a single set of ground truth data (line images and their transcriptions) with a single OCR engine (OCRopus) produces a committee whose members then vote for the best outcome by also taking the top-N alternatives and their intrinsic confidence values into account. (3) Following the principle of maximal disagreement we select additional training lines which the voters disagree most on, expecting them to offer the highest information gain for a subsequent training (active learning). Evaluations on six early printed books yielded the following results: On average the combination of pretraining and voting improved the character accuracy by 46% when training five folds starting from the same mixed model. This number rose to 53% when using different models for pretraining, underlining the importance of diverse voters. Incorporating active learning improved the obtained results by another 16% on average (evaluated on three of the six books). Overall, the proposed methods lead to an average error rate of 2.5% when training on only 60 lines. Using a substantial ground truth pool of 1,000 lines brought the error rate down even further to less than 1% on average.
OCR  digital-humanities  digitization  text-processing  image-processing  machine-learning  data-cleaning  the-mangle-in-practice  modeling  rather-interesting  to-write-about  to-simulate  consider:stochastic-resonance
26 days ago by Vaguery
[1811.03690] Machine Learning Characterization of Structural Defects in Amorphous Packings of Dimers and Ellipses
Structural defects within amorphous packings of symmetric particles can be characterized using a machine learning approach that incorporates structure functions of radial distances and angular arrangement. This yields a scalar field, \emph{softness}, that correlates with the probability that a particle is about to rearrange. However, when particle shapes are elongated, as in the case of dimers and ellipses, we find the standard structure functions produce imprecise softness measurements. Moreover, ellipses exhibit deformation profiles in stark contrast to circular particles. In order to account for effects of orientation and alignment, we introduce new structure functions to recover predictive performance of softness, as well as provide physical insight to local and extended dynamics. We study a model disordered solid, a bidisperse two-dimensional granular pillar, driven by uniaxial compression and composed entirely of monomers, dimers, or ellipses. We demonstrate how the computation of softness via support vector machine extends to dimers and ellipses with the introduction of new orientational structure functions. Then, we highlight the spatial extent of rearrangements and defects, as well as their cross-correlation, for each particle shape. Finally, we demonstrate how an additional machine learning algorithm, recursive feature elimination, provides an avenue to better understand how softness arises from particular structural aspects. We identify the most crucial structure functions in determining softness and discuss their physical implications.
physics  granular-materials  machine-learning  neural-networks  rather-interesting  approximation  feature-extraction  to-simulate  to-write-about  consider:genetic-programming  consider:feature-discovery  consider:re-representation
4 weeks ago by Vaguery
[1805.03963] Monotone Learning with Rectified Wire Networks
We introduce a new neural network model, together with a tractable and monotone online learning algorithm. Our model describes feed-forward networks for classification, with one output node for each class. The only nonlinear operation is rectification using a ReLU function with a bias. However, there is a rectifier on every edge rather than at the nodes of the network. There are also weights, but these are positive, static, and associated with the nodes. Our "rectified wire" networks are able to represent arbitrary Boolean functions. Only the bias parameters, on the edges of the network, are learned. Another departure in our approach, from standard neural networks, is that the loss function is replaced by a constraint. This constraint is simply that the value of the output node associated with the correct class should be zero. Our model has the property that the exact norm-minimizing parameter update, required to correctly classify a training item, is the solution to a quadratic program that can be computed with a few passes through the network. We demonstrate a training algorithm using this update, called sequential deactivation (SDA), on MNIST and some synthetic datasets. Upon adopting a natural choice for the nodal weights, SDA has no hyperparameters other than those describing the network structure. Our experiments explore behavior with respect to network size and depth in a family of sparse expander networks.
machine-learning  neural-networks  representation  algorithms  to-simulate  to-write-about  consider:performance-measures  rather-interesting
4 weeks ago by Vaguery
Extending the “Open-Closed Principle” to Automated Algorithm Configuration | Semantic Scholar
Metaheuristics are an effective and diverse class of optimization algorithms: a means of obtaining solutions of acceptable quality for otherwise intractable problems. The selection, construction, and configuration of a metaheuristic for a given problem has historically been a manually intensive process based on experience, experimentation, and reasoning by metaphor. More recently, there has been interest in automating the process of algorithm configuration. In this article, we identify shared state as an inhibitor of progress for such automation. To solve this problem, we introduce the Automated Open-Closed Principle (AOCP), which stipulates design requirements for unintrusive reuse of algorithm frameworks and automated assembly of algorithms from an extensible palette of components. We demonstrate how the AOCP enables a greater degree of automation than previously possible via an example implementation
hey-I-know-this-guy  functional-programming  genetic-programming  machine-learning  to-write-about  to-simulate
4 weeks ago by Vaguery
Stochastic synthesis of recursive functions made easy with bananas, lenses, envelopes and barbed wire | Semantic Scholar
Stochastic synthesis of recursive functions has historically proved difficult, not least due to issues of non-termination and the often ad hoc methods for addressing this. This article presents a general method of implicit recursion which operates via an automatically-derivable decomposition of datatype structure by cases, thereby ensuring well-foundedness. The method is applied to recursive functions of long-standing interest and the results outperform recent work which combines two leading approaches and employs ‘human in the loop’ to define the recursion structure. We show that stochastic synthesis with the proposed method on benchmark functions is effective even with random search, motivating a need for more difficult recursive benchmarks in future
hey-I-know-this-guy  genetic-programming  programming-language  formal-languages  machine-learning  to-write-about  to-simulate  functional-programming
4 weeks ago by Vaguery
Computing Receptive Fields of Convolutional Neural Networks
While deep neural networks have overwhelmingly established state-of-the-art results in many artificial intelligence problems, they can still be difficult to develop and debug. Recent research on deep learning understanding has focused on feature visualization , theoretical guarantees , model interpretability , and generalization .

In this work, we analyze deep neural networks from a complementary perspective, focusing on convolutional models. We are interested in understanding the extent to which input signals may affect output features, and mapping features at any part of the network to the region in the input that produces them. The key parameter to associate an output feature to an input region is the receptive field of the convolutional network, which is defined as the size of the region in the input that produces the feature.
deep-learning  machine-learning  neural-networks  dissection  rather-interesting  structure-function  how-things-work  to-write-about  to-do
4 weeks ago by Vaguery
[1910.14481] Continual Unsupervised Representation Learning
machine-learning  reinvented-wheels  continual-learning  neural-networks  unsupervised-learning  algorithms  to-write-about  to-simulate  consider:genetic-programming  revive:1998-paper
4 weeks ago by Vaguery
[1910.11997] Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
Mellotron is a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data. By explicitly conditioning on rhythm and continuous pitch contours from an audio signal or music score, Mellotron is able to generate speech in a variety of styles ranging from read speech to expressive speech, from slow drawls to rap and from monotonous voice to singing voice. Unlike other methods, we train Mellotron using only read speech data without alignments between text and audio. We evaluate our models using the LJSpeech and LibriTTS datasets. We provide F0 Frame Errors and synthesized samples that include style transfer from other speakers, singers and styles not seen during training, procedural manipulation of rhythm and pitch and choir synthesis.
speech-synthesis  natural-language-processing  style-transfer  machine-learning  neural-networks  rather-interesting  consider:performance-measures  to-simulate  time-series  dimension-reduction
4 weeks ago by Vaguery
[1905.05264] Programming multi-level quantum gates in disordered computing reservoirs via machine learning and TensorFlow
Novel machine learning computational tools open new perspectives for quantum information systems. Here we adopt the open-source programming library TensorFlow to design multi-level quantum gates including a computing reservoir represented by a random unitary matrix. In optics, the reservoir is a disordered medium or a multi-modal fiber. We show that trainable operators at the input and the readout enable one to realize multi-level gates. We study various qudit gates, including the scaling properties of the algorithms with the size of the reservoir. Despite an initial low slop learning stage, TensorFlow turns out to be an extremely versatile resource for designing gates with complex media, including different models that use spatial light modulators with quantized modulation levels.
quantums  quantum-computing  machine-learning  reinvented-wheels  genetic-programming  to-write-about  snidely
5 weeks ago by Vaguery
[1610.07717] Distributed and parallel time series feature extraction for industrial big data applications
The all-relevant problem of feature selection is the identification of all strongly and weakly relevant attributes. This problem is especially hard to solve for time series classification and regression in industrial applications such as predictive maintenance or production line optimization, for which each label or regression target is associated with several time series and meta-information simultaneously. Here, we are proposing an efficient, scalable feature extraction algorithm for time series, which filters the available features in an early stage of the machine learning pipeline with respect to their significance for the classification or regression task, while controlling the expected percentage of selected but irrelevant features. The proposed algorithm combines established feature extraction methods with a feature importance filter. It has a low computational complexity, allows to start on a problem with only limited domain knowledge available, can be trivially parallelized, is highly scalable and based on well studied non-parametric hypothesis tests. We benchmark our proposed algorithm on all binary classification problems of the UCR time series classification archive as well as time series from a production line optimization project and simulated stochastic processes with underlying qualitative change of dynamics.
time-series  feature-extraction  machine-learning  algorithms  to-understand  compare:pareto-gp
5 weeks ago by Vaguery
[1808.07202] A Survey on Food Computing
Food is very essential for human life and it is fundamental to the human experience. Food-related study may support multifarious applications and services, such as guiding the human behavior, improving the human health and understanding the culinary culture. With the rapid development of social networks, mobile networks, and Internet of Things (IoT), people commonly upload, share, and record food images, recipes, cooking videos, and food diaries, leading to large-scale food data. Large-scale food data offers rich knowledge about food and can help tackle many central issues of human society. Therefore, it is time to group several disparate issues related to food computing. Food computing acquires and analyzes heterogenous food data from disparate sources for perception, recognition, retrieval, recommendation, and monitoring of food. In food computing, computational approaches are applied to address food related issues in medicine, biology, gastronomy and agronomy. Both large-scale food data and recent breakthroughs in computer science are transforming the way we analyze food data. Therefore, vast amounts of work has been conducted in the food area, targeting different food-oriented tasks and applications. However, there are very few systematic reviews, which shape this area well and provide a comprehensive and in-depth summary of current efforts or detail open problems in this area. In this paper, we formalize food computing and present such a comprehensive overview of various emerging concepts, methods, and tasks. We summarize key challenges and future directions ahead for food computing. This is the first comprehensive survey that targets the study of computing technology for the food area and also offers a collection of research studies and technologies to benefit researchers and practitioners working in different food-related fields.
representation  domain-specific  rather-interesting  rather-odd  engineering-criticism  review  machine-learning  ontology
6 weeks ago by Vaguery
[1902.08171] A Dictionary Based Generalization of Robust PCA
We analyze the decomposition of a data matrix, assumed to be a superposition of a low-rank component and a component which is sparse in a known dictionary, using a convex demixing method. We provide a unified analysis, encompassing both undercomplete and overcomplete dictionary cases, and show that the constituent components can be successfully recovered under some relatively mild assumptions up to a certain global sparsity level. Further, we corroborate our theoretical results by presenting empirical evaluations in terms of phase transitions in rank and sparsity for various dictionary sizes.
dimension-reduction  algorithms  machine-learning  inverse-problems  to-write-about  to-simulate  consider:feature-discovery
6 weeks ago by Vaguery
[1904.05916] Synthetic Examples Improve Generalization for Rare Classes
The ability to detect and classify rare occurrences in images has important applications - for example, counting rare and endangered species when studying biodiversity, or detecting infrequent traffic scenarios that pose a danger to self-driving cars. Few-shot learning is an open problem: current computer vision systems struggle to categorize objects they have seen only rarely during training, and collecting a sufficient number of training examples of rare events is often challenging and expensive, and sometimes outright impossible. We explore in depth an approach to this problem: complementing the few available training images with ad-hoc simulated data.
Our testbed is animal species classification, which has a real-world long-tailed distribution. We analyze the effect of different axes of variation in simulation, such as pose, lighting, model, and simulation method, and we prescribe best practices for efficiently incorporating simulated data for real-world performance gain. Our experiments reveal that synthetic data can considerably reduce error rates for classes that are rare, that as the amount of simulated data is increased, accuracy on the target class improves, and that high variation of simulated data provides maximum performance gain.
machine-learning  training  the-mangle-in-practice  pedagogy  well-duh  toy-problems  image-processing  what-is-it-like-to-be-a-NN?
6 weeks ago by Vaguery
[1601.00670] Variational Inference: A Review for Statisticians
One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this paper, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of densities and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this class of algorithms.
machine-learning  statistics  feature-construction  to-understand  algorithms
6 weeks ago by Vaguery
[1905.10546] Protecting the Protected Group: Circumventing Harmful Fairness
Machine Learning (ML) algorithms shape our lives. Banks use them to determine if we are good borrowers; IT companies delegate them recruitment decisions; police apply ML for crime-prediction, and judges base their verdicts on ML. However, real-world examples show that such automated decisions tend to discriminate against protected groups. This potential discrimination generated a huge hype both in media and in the research community. Quite a few formal notions of fairness were proposed, which take a form of constraints a "fair" algorithm must satisfy. We focus on scenarios where fairness is imposed on a self-interested party (e.g., a bank that maximizes its revenue). We find that the disadvantaged protected group can be worse off after imposing a fairness constraint. We introduce a family of \textit{Welfare-Equalizing} fairness constraints that equalize per-capita welfare of protected groups, and include \textit{Demographic Parity} and \textit{Equal Opportunity} as particular cases. In this family, we characterize conditions under which the fairness constraint helps the disadvantaged group. We also characterize the structure of the optimal \textit{Welfare-Equalizing} classifier for the self-interested party, and provide an algorithm to compute it. Overall, our \textit{Welfare-Equalizing} fairness approach provides a unified framework for discussing fairness in classification in the presence of a self-interested party.
fairness  multiobjective-optimization  algorithms  machine-learning  technocracy  rather-interesting  the-mangle-in-practice  to-write-about  consider:not-using-the-same-framework-for-all-your-goals
7 weeks ago by Vaguery
[1905.13195] Modeling Uncertainty by Learning a Hierarchy of Deep Neural Connections
Modeling uncertainty in deep neural networks, despite recent important advances, is still an open problem. Bayesian neural networks are a powerful solution, where the prior over network weights is a design choice, often a normal distribution or other distribution encouraging sparsity. However, this prior is agnostic to the generative process of the input data, which might lead to unwarranted generalization for out-of-distribution tested data. We suggest the presence of a confounder for the relation between the input data and the discriminative function given the target label. We propose an approach for modeling this confounder by sharing neural connectivity patterns between the generative and discriminative networks. This approach leads to a new deep architecture, where networks are sampled from the posterior of local causal structures, and coupled into a compact hierarchy. We demonstrate that sampling networks from this hierarchy, proportionally to their posterior, is efficient and enables estimating various types of uncertainties. Empirical evaluations of our method demonstrate significant improvement compared to state-of-the-art calibration and out-of-distribution detection methods.
machine-learning  statistics  meta-modeling  feature-construction  model-interrogation  neural-networks  to-write-about  to-simulate
7 weeks ago by Vaguery
[1905.03694] Supervised Online Hashing via Hadamard Codebook Learning
In recent years, binary code learning, a.k.a hashing, has received extensive attention in large-scale multimedia retrieval. It aims to encode high-dimensional data points to binary codes, hence the original high-dimensional metric space can be efficiently approximated via Hamming space. However, most existing hashing methods adopted offline batch learning, which is not suitable to handle incremental datasets with streaming data or new instances. In contrast, the robustness of the existing online hashing remains as an open problem, while the embedding of supervised/semantic information hardly boosts the performance of the online hashing, mainly due to the defect of unknown category numbers in supervised learning. In this paper, we proposed an online hashing scheme, termed Hadamard Codebook based Online Hashing (HCOH), which aims to solve the above problems towards robust and supervised online hashing. In particular, we first assign an appropriate high-dimensional binary codes to each class label, which is generated randomly by Hadamard codes to each class label, which is generated randomly by Hadamard codes. Subsequently, LSH is adopted to reduce the length of such Hadamard codes in accordance with the hash bits, which can adapt the predefined binary codes online, and theoretically guarantee the semantic similarity. Finally, we consider the setting of stochastic data acquisition, which facilitates our method to efficiently learn the corresponding hashing functions via stochastic gradient descend (SGD) online. Notably, the proposed HCOH can be embedded with supervised labels and it not limited to a predefined category number. Extensive experiments on three widely-used benchmarks demonstrate the merits of the proposed scheme over the state-of-the-art methods. The code is available at this https URL.
representation  hashing  computer-science  rather-odd  machine-learning  to-understand  indirection
7 weeks ago by Vaguery
[1906.07848] Symbolic regression by uniform random global search
Symbolic regression (SR) is a data analysis problem where we search for the mathematical expression that best fits a numerical dataset. It is a global optimization problem. The most popular approach to SR is by genetic programming (SRGP). It is a common paradigm to compare an algorithm's performance to that of random search, but the data comparing SRGP to random search is lacking. We describe a novel algorithm for SR, namely SR by uniform random global search (SRURGS), also known as pure random search. We conduct experiments comparing SRURGS with SRGP using 100 randomly generated equations. Our results suggest that a SRGP is faster than SRURGS in producing equations with good R^2 for simple problems. However, our experiments suggest that SRURGS is more robust than SRGP, able to produce good output in more challenging problems. As SRURGS is arguably the simplest global search algorithm, we believe it should serve as a control algorithm against which other symbolic regression algorithms are compared. SRURGS has only one tuning parameter, and is conceptually very simple, making it a useful tool in solving SR problems. The method produces random equations, which is useful for the generation of symbolic regression benchmark problems. We have released well documented and open-source python code, currently under formal peer-review, so that interested researchers can deploy the tool in practice.
genetic-programming  symbolic-regression  machine-learning  algorithms  guessing  to-write-about
7 weeks ago by Vaguery
[1907.02260] On Explaining Machine Learning Models by Evolving Crucial and Compact Features
Feature construction can substantially improve the accuracy of Machine Learning (ML) algorithms. Genetic Programming (GP) has been proven to be effective at this task by evolving non-linear combinations of input features. GP additionally has the potential to improve ML explainability since explicit expressions are evolved. Yet, in most GP works the complexity of evolved features is not explicitly bound or minimized though this is arguably key for explainability. In this article, we assess to what extent GP still performs favorably at feature construction when constructing features that are (1) Of small-enough number, to enable visualization of the behavior of the ML model; (2) Of small-enough size, to enable interpretability of the features themselves; (3) Of sufficient informative power, to retain or even improve the performance of the ML algorithm. We consider a simple feature construction scheme using three different GP algorithms, as well as random search, to evolve features for five ML algorithms, including support vector machines and random forest. Our results on 21 datasets pertaining to classification and regression problems show that constructing only two compact features can be sufficient to rival the use of the entire original feature set. We further find that a modern GP algorithm, GP-GOMEA, performs best overall. These results, combined with examples that we provide of readable constructed features and of 2D visualizations of ML behavior, lead us to positively conclude that GP-based feature construction still works well when explicitly searching for compact features, making it extremely helpful to explain ML models.
7 weeks ago by Vaguery
[1909.13719] RandAugment: Practical automated data augmentation with a reduced search space
Recent work has shown that data augmentation has the potential to significantly improve the generalization of deep learning models. Recently, automated augmentation strategies have led to state-of-the-art results in image classification and object detection. While these strategies were optimized for improving validation accuracy, they also led to state-of-the-art results in semi-supervised learning and improved robustness to common corruptions of images. An obstacle to a large-scale adoption of these methods is a separate search phase which increases the training complexity and may substantially increase the computational cost. Additionally, due to the separate search phase, these approaches are unable to adjust the regularization strength based on model or dataset size. Automated augmentation policies are often found by training small models on small datasets and subsequently applied to train larger models. In this work, we remove both of these obstacles. RandAugment has a significantly reduced search space which allows it to be trained on the target task with no need for a separate proxy task. Furthermore, due to the parameterization, the regularization strength may be tailored to different model and dataset sizes. RandAugment can be used uniformly across different tasks and datasets and works out of the box, matching or surpassing all previous automated augmentation approaches on CIFAR-10/100, SVHN, and ImageNet. On the ImageNet dataset we achieve 85.0% accuracy, a 0.6% increase over the previous state-of-the-art and 1.0% increase over baseline augmentation. On object detection, RandAugment leads to 1.0-1.3% improvement over baseline augmentation, and is within 0.3% mAP of AutoAugment on COCO. Finally, due to its interpretable hyperparameter, RandAugment may be used to investigate the role of data augmentation with varying model and dataset size. Code is available online.
machine-learning  data-synthesis  training-data  define-your-terms  rather-interesting  coevolution  to-write-about  reinvented-wheels-but=deep=now
9 weeks ago by Vaguery
[1910.06985] How a minimal learning agent can infer the existence of unobserved variables in a complex environment
According to a mainstream position in contemporary cognitive science and philosophy, the use of abstract compositional concepts is both a necessary and a sufficient condition for the presence of genuine thought. In this article, we show how the ability to develop and utilise abstract conceptual structures can be achieved by a particular kind of learning agents. More specifically, we provide and motivate a concrete operational definition of what it means for these agents to be in possession of abstract concepts, before presenting an explicit example of a minimal architecture that supports this capability. We then proceed to demonstrate how the existence of abstract conceptual structures can be operationally useful in the process of employing previously acquired knowledge in the face of new experiences, thereby vindicating the natural conjecture that the cognitive functions of abstraction and generalisation are closely related.
Keywords: concept formation, projective simulation, reinforcement learning, transparent artificial intelligence, theory formation, explainable artificial intelligence (XAI)
machine-learning  representation  rather-interesting  to-understand  conceptualization  cause-and-effect
9 weeks ago by Vaguery
[1802.09904] Algorithmic Causal Deconvolution of Intertwined Programs and Networks by Generative Mechanism
Complex data usually results from the interaction of objects produced by different generating mechanisms. Here we introduce a universal, unsupervised and parameter-free model-oriented approach, based upon the seminal concept of algorithmic probability, that decomposes an observation into its most likely algorithmic generative sources. Our approach uses a causal calculus to infer model representations. We demonstrate its ability to deconvolve interacting mechanisms regardless of whether the resultant objects are strings, space-time evolution diagrams, images or networks. While this is mostly a conceptual contribution and a novel framework, we provide numerical evidence evaluating the ability of our methods to separate data from observations produced by discrete dynamical systems such as cellular automata and complex networks. We think that these separating techniques can contribute to tackling the challenge of causation, thus complementing other statistically oriented approaches.
machine-learning  dynamical-systems  inverse-problems  rather-interesting  system-identification  to-simulate  to-try  to-write-about
12 weeks ago by Vaguery
[1811.05031] A Review of automatic differentiation and its efficient implementation
Derivatives play a critical role in computational statistics, examples being Bayesian inference using Hamiltonian Monte Carlo sampling and the training of neural networks. Automatic differentiation is a powerful tool to automate the calculation of derivatives and is preferable to more traditional methods, especially when differentiating complex algorithms and mathematical functions. The implementation of automatic differentiation however requires some care to insure efficiency. Modern differentiation packages deploy a broad range of computational techniques to improve applicability, run time, and memory management. Among these techniques are operation overloading, region based memory, and expression templates. There also exist several mathematical techniques which can yield high performance gains when applied to complex algorithms. For example, semi-analytical derivatives can reduce by orders of magnitude the runtime required to numerically solve and differentiate an algebraic equation. Open problems include the extension of current packages to provide more specialized routines, and efficient methods to perform higher-order differentiation.
algorithms  numerical-methods  machine-learning  automatic-differentiation  gradient-methods  to-understand  review
12 weeks ago by Vaguery
[1902.10834] An evolutionary model that satisfies detailed balance
We propose a class of evolutionary models that involves an arbitrary exchangeable process as the breeding process and different selection schemes. In those models, a new genome is born according to the breeding process, and then a genome is removed according to the selection scheme that involves fitness. Thus the population size remains constant. The process evolves according to a Markov chain, and, unlike in many other existing models, the stationary distribution -- so called mutation-selection equilibrium -- can be easily found and studied. The behaviour of the stationary distribution when the population size increases is our main object of interest. Several phase-transition theorems are proved.
machine-learning  statistical-mechanics  Markov-models  dynamical-systems  looking-under-the-tractable-lamp-post  to-simulate  to-write-about  consider:lexicase
12 weeks ago by Vaguery
[1909.04240] Neural reparameterization improves structural optimization
Structural optimization is a popular method for designing objects such as bridge trusses, airplane wings, and optical devices. Unfortunately, the quality of solutions depends heavily on how the problem is parameterized. In this paper, we propose using the implicit bias over functions induced by neural networks to improve the parameterization of structural optimization. Rather than directly optimizing densities on a grid, we instead optimize the parameters of a neural network which outputs those densities. This reparameterization leads to different and often better solutions. On a selection of 116 structural optimization tasks, our approach produces the best design 50% more often than the best baseline method.
engineering-design  machine-learning  neural-networks  rather-odd  rather-interesting  you-can-use-a-hammer-to-drive-a-screw-if-you-try-real-hard
november 2019 by Vaguery
[1910.07291] Newton vs the machine: solving the chaotic three-body problem using deep neural networks
Since its formulation by Sir Isaac Newton, the problem of solving the equations of motion for three bodies under their own gravitational force has remained practically unsolved. Currently, the solution for a given initialization can only be found by performing laborious iterative calculations that have unpredictable and potentially infinite computational cost, due to the system's chaotic nature. We show that an ensemble of solutions obtained using an arbitrarily precise numerical integrator can be used to train a deep artificial neural network (ANN) that, over a bounded time interval, provides accurate solutions at fixed computational cost and up to 100 million times faster than a state-of-the-art solver. Our results provide evidence that, for computationally challenging regions of phase-space, a trained ANN can replace existing numerical solvers, enabling fast and scalable simulations of many-body systems to shed light on outstanding phenomena such as the formation of black-hole binary systems or the origin of the core collapse in dense star clusters.
machine-learning  deep-learning  numerical-methods  rather-interesting  approximation  symbolic-regression  to-write-about  to-simulate  consider:genetic-programming
november 2019 by Vaguery
[1909.13768] Backpropagation in the Simply Typed Lambda-calculus with Linear Negation
Backpropagation is a classic automatic differentiation algorithm computing the gradient of functions specified by a certain class of simple, first-order programs, called computational graphs. It is a fundamental tool in several fields, most notably machine learning, where it is the key for efficiently training (deep) neural networks. Recent years have witnessed the quick growth of a research field called differentiable programming, the aim of which is to express computational graphs more synthetically and modularly by resorting to actual programming languages endowed with flow control operators and higher-order combinators, such as map and fold. In this paper, we extend the backpropagation algorithm to a paradigmatic example of such a programming language: we define a compositional program transformation from the simply-typed lambda-calculus to itself augmented with a notion of linear negation, and prove that this computes the gradient of the source program with the same efficiency as first-order backpropagation. The transformation is completely effect-free and thus provides a purely logical understanding of the dynamics of backpropagation.
machine-learning  lambda-calculus  automatic-differentiation  can't-wait-to-understand  to-understand  generalization  I-guess?
november 2019 by Vaguery
[1803.08202] Person Following by Autonomous Robots: A Categorical Overview
A wide range of human-robot collaborative applications in diverse domains such as manufacturing, health care, the entertainment industry, and social interactions, require an autonomous robot to follow its human companion. Different working environments and applications pose diverse challenges by adding constraints on the choice of sensors, the degree of autonomy, and dynamics of a person-following robot. Researchers have addressed these challenges in many ways and contributed to the development of a large body of literature. This paper provides a comprehensive overview of the literature by categorizing different aspects of person-following by autonomous robots. Also, the corresponding operational challenges are identified based on various design choices for ground, underwater, and aerial scenarios. In addition, state-of-the-art methods for perception, planning, control, and interaction are elaborately discussed and their applicability in varied operational scenarios are presented. Then, some of the prominent methods are qualitatively compared, corresponding practicalities are illustrated, and their feasibility is analyzed for various use-cases. Furthermore, several prospective application areas are identified, and open problems are highlighted for future research.
november 2019 by Vaguery
[1910.02822] A mathematical theory of cooperative communication
Cooperative communication plays a central role in theories of human cognition, language, development, and culture, and is increasingly relevant in human-algorithm and robot interaction. Existing models are algorithmic in nature and do not shed light on the statistical problem solved in cooperation or on constraints imposed by violations of common ground. We present a mathematical theory of cooperative communication that unifies three broad classes of algorithmic models as approximations of Optimal Transport (OT). We derive a statistical interpretation for the problem approximated by existing models in terms of entropy minimization, or likelihood maximizing, plans. We show that some models are provably robust to violations of common ground, even supporting online, approximate recovery from discovered violations, and derive conditions under which other models are provably not robust. We do so using gradient-based methods which introduce novel algorithmic-level perspectives on cooperative communication. Our mathematical approach complements and extends empirical research, providing strong theoretical tools derivation of a priori constraints on models and implications for cooperative communication in theory and practice.
hey-I-know-this-guy  cognition  communication  collective-behavior  machine-learning  to-understand  to-write-about  consider:representation  consider:feature-discovery  consider:relevance-theory  compare-n-contrast
october 2019 by Vaguery
[1607.01759] Bag of Tricks for Efficient Text Classification
This paper explores a simple and efficient baseline for text classification. Our experiments show that our fast text classifier fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation. We can train fastText on more than one billion words in less than ten minutes using a standard multicore~CPU, and classify half a million sentences among~312K classes in less than a minute.
natural-language-processing  text-mining  classification  machine-learning  heuristics  representation  rather-interesting  neural-networks  to-simulate  consider:feature-discovery
september 2019 by Vaguery
A wide range of defenses have been proposed to harden neural networks against adversarial attacks. However, a pattern has emerged in which the majority of adversarial defenses are quickly broken by new attacks. Given the lack of success at generating robust defenses, we are led to ask a fundamental question: Are adversarial attacks inevitable? This paper analyzes adversarial examples from a theoretical perspective, and identifies fundamental bounds on the susceptibility of a classifier to adversarial attacks. We show that, for certain classes of problems, adversarial examples are inescapable. Using experiments, we explore the implications of theoretical guarantees for real-world problems and discuss how factors such as dimensionality and image complexity limit a classifier's robustness against adversarial examples.
september 2019 by Vaguery
[1905.12864] Interpretable Adversarial Training for Text
Generating high-quality and interpretable adversarial examples in the text domain is a much more daunting task than it is in the image domain. This is due partly to the discrete nature of text, partly to the problem of ensuring that the adversarial examples are still probable and interpretable, and partly to the problem of maintaining label invariance under input perturbations. In order to address some of these challenges, we introduce sparse projected gradient descent (SPGD), a new approach to crafting interpretable adversarial examples for text. SPGD imposes a directional regularization constraint on input perturbations by projecting them onto the directions to nearby word embeddings with highest cosine similarities. This constraint ensures that perturbations move each word embedding in an interpretable direction (i.e., towards another nearby word embedding). Moreover, SPGD imposes a sparsity constraint on perturbations at the sentence level by ignoring word-embedding perturbations whose norms are below a certain threshold. This constraint ensures that our method changes only a few words per sequence, leading to higher quality adversarial examples. Our experiments with the IMDB movie review dataset show that the proposed SPGD method improves adversarial example interpretability and likelihood (evaluated by average per-word perplexity) compared to state-of-the-art methods, while suffering little to no loss in training performance.
natural-language-processing  machine-learning  coevolution  adversarial-learning  rather-interesting  consider:looking-to-see  consider:feature-construction  the-mangle-in-practice
september 2019 by Vaguery
[1902.00415] Normalized Wasserstein Distance for Mixture Distributions with Applications in Adversarial Learning and Domain Adaptation
Understanding proper distance measures between distributions is at the core of several learning tasks such as generative models, domain adaptation, clustering, etc. In this work, we focus on {\it mixture distributions} that arise naturally in several application domains where the data contains different sub-populations. For mixture distributions, established distance measures such as the Wasserstein distance do not take into account imbalanced mixture proportions. Thus, even if two mixture distributions have identical mixture components but different mixture proportions, the Wasserstein distance between them will be large. This often leads to undesired results in distance-based learning methods for mixture distributions. In this paper, we resolve this issue by introducing {\it Normalized Wasserstein} distance. The key idea is to introduce mixture proportions as optimization variables, effectively normalizing mixture proportions in the Wasserstein formulation. Using the proposed normalized Wasserstein distance, instead of the vanilla one, leads to significant gains working with mixture distributions with imbalanced mixture proportions. We demonstrate effectiveness of the proposed distance in GANs, domain adaptation, adversarial clustering and hypothesis testing over mixture of Gaussians, MNIST, CIFAR-10, CelebA and VISDA datasets.
statistics  distance  metrics  performance-measure  numerical-methods  dimension-reduction  to-understand  machine-learning  to-simulate
september 2019 by Vaguery
[1904.08770] An Empirical Evaluation of Text Representation Schemes on Multilingual Social Web to Filter the Textual Aggression
This paper attempt to study the effectiveness of text representation schemes on two tasks namely: User Aggression and Fact Detection from the social media contents. In User Aggression detection, The aim is to identify the level of aggression from the contents generated in the Social media and written in the English, Devanagari Hindi and Romanized Hindi. Aggression levels are categorized into three predefined classes namely: Non-aggressive, Overtly Aggressive, and Covertly Aggressive. During the disaster-related incident, Social media like, Twitter is flooded with millions of posts. In such emergency situations, identification of factual posts is important for organizations involved in the relief operation. We anticipated this problem as a combination of classification and Ranking problem. This paper presents a comparison of various text representation scheme based on BoW techniques, distributed word/sentence representation, transfer learning on classifiers. Weighted F1 score is used as a primary evaluation metric. Results show that text representation using BoW performs better than word embedding on machine learning classifiers. While pre-trained Word embedding techniques perform better on classifiers based on deep neural net. Recent transfer learning model like ELMO, ULMFiT are fine-tuned for the Aggression classification task. However, results are not at par with pre-trained word embedding model. Overall, word embedding using fastText produce best weighted F1-score than Word2Vec and Glove. Results are further improved using pre-trained vector model. Statistical significance tests are employed to ensure the significance of the classification results. In the case of lexically different test Dataset, other than training Dataset, deep neural models are more robust and perform substantially better than machine learning classifiers.
text-analysis  text-processing  sentiment-analysis  natural-language-processing  social-media  to-write-about  consider:feature-discovery  machine-learning
september 2019 by Vaguery
[1809.02942] Cellular automata as convolutional neural networks
Deep learning techniques have recently demonstrated broad success in predicting complex dynamical systems ranging from turbulence to human speech, motivating broader questions about how neural networks encode and represent dynamical rules. We explore this problem in the context of cellular automata (CA), simple dynamical systems that are intrinsically discrete and thus difficult to analyze using standard tools from dynamical systems theory. We show that any CA may readily be represented using a convolutional neural network with a network-in-network architecture. This motivates our development of a general convolutional multilayer perceptron architecture, which we find can learn the dynamical rules for arbitrary CA when given videos of the CA as training data. In the limit of large network widths, we find that training dynamics are strongly stereotyped across replicates, and that common patterns emerge in the structure of networks trained on different CA rulesets. We train ensembles of networks on randomly-sampled CA, and we probe how the trained networks internally represent the CA rules using an information-theoretic technique based on distributions of layer activation patterns. We find that CA with simpler rule tables produce trained networks with hierarchical structure and layer specialization, while more complex CA tend to produce shallower representations---illustrating how the underlying complexity of the CA's rules influences the specificity of these internal representations. Our results suggest how the entropy of a physical process can affect its representation when learned by neural networks.
cellular-automata  dynamical-systems  rather-interesting  to-write-about  to-simulate  machine-learning  representation  feature-construction  information-theory  consider:NN-complexity-as-constructed-features
september 2019 by Vaguery
[1908.02894] How much data is sufficient to learn high-performing algorithms?
Algorithms for scientific analysis typically have tunable parameters that significantly influence computational efficiency and solution quality. If a parameter setting leads to strong algorithmic performance on average over a set of typical problem instances, that parameter setting---ideally---will perform well in the future. However, if the set of typical problem instances is small, average performance will not generalize to future performance. This raises the question: how large should this set be? We answer this question for any algorithm satisfying an easy-to-describe, ubiquitous property: its performance is a piecewise-structured function of its parameters. We are the first to provide a unified sample complexity framework for algorithm parameter configuration; prior research followed case-by-case analyses. We present applications from diverse domains, including biology, political science, and economics.
machine-learning  looking-to-see  learning-from-data  rather-interesting  to-write-about  to-simulate  consider:performance-measures  consider:lexicase  consider:symbolic-regression
september 2019 by Vaguery
[PDF] A brief review of image denoising algorithms and beyond
The recent advances in hardware and imaging systems made the digital cameras ubiquitous. Although the development of hardware has steadily improved the quality of images for the last several decades, image degradation is unavoidable due to the many factors affecting the image acquisition process and the subsequent post- processing. Image denoising, which aims to reconstruct a high quality image from its degraded observation, is a classical yet still very active topic in the area of low- level computer vision. It represents an important building block in real applications such as digital photography, medical image analysis, remote sensing, surveillance and digital entertainment. Also, image denoising constitutes an ideal test bed for evaluating image prior modeling methods. In this paper, we briefly review recent progresses in image denoising. We firstly present an overview of prior modeling approaches used in image denoising task. Then, we review conventional sparse representation based denoising algorithms, low-rank based denoising algorithms and recently proposed deep neural networks based approaches. At last, we discuss some emerging topics and open problems about image denoising.
image-processing  numerical-methods  approximation  rather-interesting  algorithms  machine-learning  to-write-about  to-simulate
september 2019 by Vaguery
[1710.02196] Porcupine Neural Networks: (Almost) All Local Optima are Global
Neural networks have been used prominently in several machine learning and statistics applications. In general, the underlying optimization of neural networks is non-convex which makes their performance analysis challenging. In this paper, we take a novel approach to this problem by asking whether one can constrain neural network weights to make its optimization landscape have good theoretical properties while at the same time, be a good approximation for the unconstrained one. For two-layer neural networks, we provide affirmative answers to these questions by introducing Porcupine Neural Networks (PNNs) whose weight vectors are constrained to lie over a finite set of lines. We show that most local optima of PNN optimizations are global while we have a characterization of regions where bad local optimizers may exist. Moreover, our theoretical and empirical results suggest that an unconstrained neural network can be approximated using a polynomially-large PNN.
neural-networks  machine-learning  constraint-satisfaction  multiobjective-optimization  rather-interesting  diversity  to-write-about  to-simulate  consider:feature-discovery  consider:packing-order
september 2019 by Vaguery
september 2019 by Vaguery
[1701.07396] LAREX - A semi-automatic open-source Tool for Layout Analysis and Region Extraction on Early Printed Books
A semi-automatic open-source tool for layout analysis on early printed books is presented. LAREX uses a rule based connected components approach which is very fast, easily comprehensible for the user and allows an intuitive manual correction if necessary. The PageXML format is used to support integration into existing OCR workflows. Evaluations showed that LAREX provides an efficient and flexible way to segment pages of early printed books.
OCR  digitization  machine-learning  algorithms  image-processing  image-segmentation  digital-humanities  to-understand  to-write-about  to-simulate
september 2019 by Vaguery
[1711.08042] "I know it when I see it". Visualization and Intuitive Interpretability
Most research on the interpretability of machine learning systems focuses on the development of a more rigorous notion of interpretability. I suggest that a better understanding of the deficiencies of the intuitive notion of interpretability is needed as well. I show that visualization enables but also impedes intuitive interpretability, as it presupposes two levels of technical pre-interpretation: dimensionality reduction and regularization. Furthermore, I argue that the use of positive concepts to emulate the distributed semantic structure of machine learning models introduces a significant human bias into the model. As a consequence, I suggest that, if intuitive interpretability is needed, singular representations of internal model states should be avoided.
august 2019 by Vaguery
[1908.01817] Sparsity Emerges Naturally in Neural Language Models
Concerns about interpretability, computational resources, and principled inductive priors have motivated efforts to engineer sparse neural models for NLP tasks. If sparsity is important for NLP, might well-trained neural models naturally become roughly sparse? Using the Taxi-Euclidean norm to measure sparsity, we find that frequent input words are associated with concentrated or sparse activations, while frequent target words are associated with dispersed activations but concentrated gradients. We find that gradients associated with function words are more concentrated than the gradients of content words, even controlling for word frequency.
deep-learning  interpretability  approximation  sparse-matrices  to-write-about  machine-learning  yeah-but-it's-still-a-constrained-ontology  consider:genetic-programming
august 2019 by Vaguery
[1908.00971] Physical machine learning outperforms "human learning" in Quantum Chemistry
Two types of approaches to modeling molecular systems have demonstrated high practical efficiency. Density functional theory (DFT), the most widely used quantum chemical method, is a physical approach predicting energies and electron densities of molecules. Recently, numerous papers on machine learning (ML) of molecular properties have also been published. ML models greatly outperform DFT in terms of computational costs, and may even reach comparable accuracy, but they are missing physicality - a direct link to Quantum Physics - which limits their applicability. Here, we propose an approach that combines the strong sides of DFT and ML, namely, physicality and low computational cost. We derive general equations for exact electron densities and energies that can naturally guide applications of ML in Quantum Chemistry. Based on these equations, we build a deep neural network that can compute electron densities and energies of a wide range of organic molecules not only much faster, but also closer to exact physical values than current versions of DFT. In particular, we reached a mean absolute error in energies of molecules with up to eight non-hydrogen atoms as low as 0.9 kcal/mol relative to CCSD(T) values, noticeably lower than those of DFT (approaching ~2 kcal/mol) and ML (~1.5 kcal/mol) methods. A simultaneous improvement in the accuracy of predictions of electron densities and energies suggests that the proposed approach describes the physics of molecules better than DFT functionals developed by "human learning" earlier. Thus, physics-based ML offers exciting opportunities for modeling, with high-theory-level quantum chemical accuracy, of much larger molecular systems than currently possible.
emergent-design  machine-learning  neural-networks  quantums  molecular-design  complexology  to-write-about
august 2019 by Vaguery
[1810.08237] Large-scale Hierarchical Alignment for Data-driven Text Rewriting
We propose a simple unsupervised method for extracting pseudo-parallel monolingual sentence pairs from comparable corpora representative of two different text styles, such as news articles and scientific papers. Our approach does not require a seed parallel corpus, but instead relies solely on hierarchical search over pre-trained embeddings of documents and sentences. We demonstrate the effectiveness of our method through automatic and extrinsic evaluation on text simplification from the normal to the Simple Wikipedia. We show that pseudo-parallel sentences extracted with our method not only supplement existing parallel data, but can even lead to competitive performance on their own.
natural-language-processing  text-mining  translation  rather-interesting  algorithms  representation  machine-learning  clustering  unsupervised-learning  to-understand
august 2019 by Vaguery
[1907.08226] Who is Afraid of Big Bad Minima? Analysis of Gradient-Flow in a Spiked Matrix-Tensor Model
Gradient-based algorithms are effective for many machine learning tasks, but despite ample recent effort and some progress, it often remains unclear why they work in practice in optimising high-dimensional non-convex functions and why they find good minima instead of being trapped in spurious ones.
Here we present a quantitative theory explaining this behaviour in a spiked matrix-tensor model.
Our framework is based on the Kac-Rice analysis of stationary points and a closed-form analysis of gradient-flow originating from statistical physics. We show that there is a well defined region of parameters where the gradient-flow algorithm finds a good global minimum despite the presence of exponentially many spurious local minima.
We show that this is achieved by surfing on saddles that have strong negative direction towards the global minima, a phenomenon that is connected to a BBP-type threshold in the Hessian describing the critical points of the landscapes.
machine-learning  optimization  fitness-landscapes  metaheuristics  representation  to-understand  topology  dynamical-systems  consider:performance-measures  consider:better-faster
august 2019 by Vaguery
[1807.01672] Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization
Adversarial self-play in two-player games has delivered impressive results when used with reinforcement learning algorithms that combine deep neural networks and tree search. Algorithms like AlphaZero and Expert Iteration learn tabula-rasa, producing highly informative training data on the fly. However, the self-play training strategy is not directly applicable to single-player games. Recently, several practically important combinatorial optimisation problems, such as the travelling salesman problem and the bin packing problem, have been reformulated as reinforcement learning problems, increasing the importance of enabling the benefits of self-play beyond two-player games. We present the Ranked Reward (R2) algorithm which accomplishes this by ranking the rewards obtained by a single agent over multiple games to create a relative performance metric. Results from applying the R2 algorithm to instances of a two-dimensional and three-dimensional bin packing problems show that it outperforms generic Monte Carlo tree search, heuristic algorithms and integer programming solvers. We also present an analysis of the ranked reward mechanism, in particular, the effects of problem instances with varying difficulty and different ranking thresholds.
machine-learning  reinforcement-learning  algorithms  self-play  mechanism-design  optimization  performance-measure  to-understand  operations-research  metaheuristics
august 2019 by Vaguery
[1804.02584] Random Order Contention Resolution Schemes
Contention resolution schemes have proven to be an incredibly powerful concept which allows to tackle a broad class of problems. The framework has been initially designed to handle submodular optimization under various types of constraints, that is, intersections of exchange systems (including matroids), knapsacks, and unsplittable flows on trees. Later on, it turned out that this framework perfectly extends to optimization under uncertainty, like stochastic probing and online selection problems, which further can be applied to mechanism design.
We add to this line of work by showing how to create contention resolution schemes for intersection of matroids and knapsacks when we work in the random order setting. More precisely, we do know the whole universe of elements in advance, but they appear in an order given by a random permutation. Upon arrival we need to irrevocably decide whether to take an element or not. We bring a novel technique for analyzing procedures in the random order setting that is based on the martingale theory. This unified approach makes it easier to combine constraints, and we do not need to rely on the monotonicity of contention resolution schemes.
Our paper fills the gaps, extends, and creates connections between many previous results and techniques. The main application of our framework is a k+4+ε approximation ratio for the Bayesian multi-parameter unit-demand mechanism design under the constraint of k matroids intersection, which improves upon the previous bounds of 4k−2 and e(k+1). Other results include improved approximation ratios for stochastic k-set packing and submodular stochastic probing over arbitrary non-negative submodular objective function, whereas previous results required the objective to be monotone.
operations-research  approximation  subproblems  rather-interesting  to-understand  online-learning  mechanism-design  machine-learning  uncertainty
august 2019 by Vaguery
[1711.04226] AON: Towards Arbitrarily-Oriented Text Recognition
Recognizing text from natural images is a hot research topic in computer vision due to its various applications. Despite the enduring research of several decades on optical character recognition (OCR), recognizing texts from natural images is still a challenging task. This is because scene texts are often in irregular (e.g. curved, arbitrarily-oriented or seriously distorted) arrangements, which have not yet been well addressed in the literature. Existing methods on text recognition mainly work with regular (horizontal and frontal) texts and cannot be trivially generalized to handle irregular texts. In this paper, we develop the arbitrary orientation network (AON) to directly capture the deep features of irregular texts, which are combined into an attention-based decoder to generate character sequence. The whole network can be trained end-to-end by using only images and word-level annotations. Extensive experiments on various benchmarks, including the CUTE80, SVT-Perspective, IIIT5k, SVT and ICDAR datasets, show that the proposed AON-based method achieves the-state-of-the-art performance in irregular datasets, and is comparable to major existing methods in regular datasets.
OCR  image-processing  image-segmentation  machine-learning  invariant-models  generalization  to-write-about  to-simulate
august 2019 by Vaguery
[1908.00868] Machine Learning as Ecology
Machine learning methods have had spectacular success on numerous problems. Here we show that a prominent class of learning algorithms - including Support Vector Machines (SVMs) -- have a natural interpretation in terms of ecological dynamics. We use these ideas to design new online SVM algorithms that exploit ecological invasions, and benchmark performance using the MNIST dataset. Our work provides a new ecological lens through which we can view statistical learning and opens the possibility of designing ecosystems for machine learning.
machine-learning  analogies  via:cshalizi  but-I-am-less-skeptical  a-good-analogy-is-a-blessing-to-the-mind  to-read
august 2019 by Vaguery
[1412.8615] On Randomized Algorithms for Matching in the Online Preemptive Model
We investigate the power of randomized algorithms for the maximum cardinality matching (MCM) and the maximum weight matching (MWM) problems in the online preemptive model. In this model, the edges of a graph are revealed one by one and the algorithm is required to always maintain a valid matching. On seeing an edge, the algorithm has to either accept or reject the edge. If accepted, then the adjacent edges are discarded. The complexity of the problem is settled for deterministic algorithms.
Almost nothing is known for randomized algorithms. A lower bound of 1.693 is known for MCM with a trivial upper bound of 2. An upper bound of 5.356 is known for MWM. We initiate a systematic study of the same in this paper with an aim to isolate and understand the difficulty. We begin with a primal-dual analysis of the deterministic algorithm due to McGregor. All deterministic lower bounds are on instances which are trees at every step. For this class of (unweighted) graphs we present a randomized algorithm which is 2815-competitive. The analysis is a considerable extension of the (simple) primal-dual analysis for the deterministic case. The key new technique is that the distribution of primal charge to dual variables depends on the "neighborhood" and needs to be done after having seen the entire input. The assignment is asymmetric: in that edges may assign different charges to the two end-points. Also the proof depends on a non-trivial structural statement on the performance of the algorithm on the input tree.
The other main result of this paper is an extension of the deterministic lower bound of Varadaraja to a natural class of randomized algorithms which decide whether to accept a new edge or not using independent random choices.
algorithms  online-algorithms  dynamic-programming  machine-learning  graph-theory  to-understand  mechanism-design  to-write-about  something-in-it
august 2019 by Vaguery
[1801.03462] Online Maximum Matching with Recourse
We study the online maximum matching problem in a model in which the edges are associated with a known recourse parameter k. An online algorithm for this problem has to maintain a valid matching while edges of the underlying graph are presented one after the other. At any moment the algorithm can decide to include an edge into the matching or to exclude it, under the restriction that at most k such actions per edge take place, where k is typically a small constant. This problem was introduced and studied in the context of general online packing problems with recourse by Avitabile et al. [AMP13], whereas the special case k=2 was studied by Boyar et al. [BFKL17].
In the first part of this paper, we consider the {\em edge arrival} model, in which an arriving edge never disappears from the graph. Here, we first show an improved analysis on the performance of the algorithm AMP given in [AMP13], by exploiting the structure of the matching problem. In addition, we extend the result of [BFKL17] and show that the greedy algorithm has competitive ratio 3/2 for every even k and ratio 2 for every odd k. Moreover, we present and analyze an improvement of the greedy algorithm which we call L-Greedy, and we show that for small values of k it outperforms the algorithm of [AMP13]. In terms of lower bounds, we show that no deterministic algorithm better than 1+1/(k−1) exists, improving upon the lower bound of 1+1/k shown in [AMP13].
The second part of the paper is devoted to the \emph{edge arrival/departure model}, which is the fully dynamic variant of online matching with recourse. The analysis of L-Greedy and AMP carry through in this model; moreover we show a lower bound of k2−3k+6k2−4k+7 for all even k≥4. For k∈{2,3}, the competitive ratio is 3/2.
online-algorithms  online-learning  rather-interesting  incremental-algorithms  decision-making  algorithms  machine-learning  combinatorics  to-understand  to-follow-downstream  to-write-about  ReQ  consider:backtracking  consider:performance-measures
august 2019 by Vaguery
[1608.02153] OCR of historical printings with an application to building diachronic corpora: A case study using the RIDGES herbal corpus
This article describes the results of a case study that applies Neural Network-based Optical Character Recognition (OCR) to scanned images of books printed between 1487 and 1870 by training the OCR engine OCRopus [@breuel2013high] on the RIDGES herbal text corpus [@OdebrechtEtAlSubmitted]. Training specific OCR models was possible because the necessary *ground truth* is available as error-corrected diplomatic transcriptions. The OCR results have been evaluated for accuracy against the ground truth of unseen test sets. Character and word accuracies (percentage of correctly recognized items) for the resulting machine-readable texts of individual documents range from 94% to more than 99% (character level) and from 76% to 97% (word level). This includes the earliest printed books, which were thought to be inaccessible by OCR methods until recently. Furthermore, OCR models trained on one part of the corpus consisting of books with different printing dates and different typesets *(mixed models)* have been tested for their predictive power on the books from the other part containing yet other fonts, mostly yielding character accuracies well above 90%. It therefore seems possible to construct generalized models trained on a range of fonts that can be applied to a wide variety of historical printings still giving good results. A moderate postcorrection effort of some pages will then enable the training of individual models with even better accuracies. Using this method, diachronic corpora including early printings can be constructed much faster and cheaper than by manual transcription. The OCR methods reported here open up the possibility of transforming our printed textual cultural heritage into electronic text by largely automatic means, which is a prerequisite for the mass conversion of scanned books.
OCR  image-processing  natural-language-processing  algorithms  machine-learning  rather-interesting  commodity-software  digital-humanities  to-write-about  consider:swarms  consider:stochastic-resonance
august 2019 by Vaguery
[1506.08415] PLG2: Multiperspective Processes Randomization and Simulation for Online and Offline Settings
Process mining represents an important field in BPM and data mining research. Recently, it has gained importance also for practitioners: more and more companies are creating business process intelligence solutions. The evaluation of process mining algorithms requires, as any other data mining task, the availability of large amount of real-world data. Despite the increasing availability of such datasets, they are affected by many limitations, in primis the absence of a "gold standard" (i.e., the reference model).
This paper extends an approach, already available in the literature, for the generation of random processes. Novelties have been introduced throughout the work and, in particular, they involve the complete support for multiperspective models and logs (i.e., the control-flow perspective is enriched with time and data information) and for online settings (i.e., generation of multiperspective event streams and concept drifts). The proposed new framework is able to almost entirely cover the spectrum of possible scenarios that can be observed in the real-world. The proposed approach is implemented as a publicly available Java application, with a set of APIs for the programmatic execution of experiments.
process-mining  learning-by-watching  machine-learning  statistics  modeling  dynamical-systems  what-gets-measured  to-write-about  representation  discrete-event-simulators  to-simulate  time-series
july 2019 by Vaguery
[1804.02101] Modeling Popularity in Asynchronous Social Media Streams with Recurrent Neural Networks
Understanding and predicting the popularity of online items is an important open problem in social media analysis. Considerable progress has been made recently in data-driven predictions, and in linking popularity to external promotions. However, the existing methods typically focus on a single source of external influence, whereas for many types of online content such as YouTube videos or news articles, attention is driven by multiple heterogeneous sources simultaneously - e.g. microblogs or traditional media coverage. Here, we propose RNN-MAS, a recurrent neural network for modeling asynchronous streams. It is a sequence generator that connects multiple streams of different granularity via joint inference. We show RNN-MAS not only to outperform the current state-of-the-art Youtube popularity prediction system by 17%, but also to capture complex dynamics, such as seasonal trends of unseen influence. We define two new metrics: promotion score quantifies the gain in popularity from one unit of promotion for a Youtube video; the loudness level captures the effects of a particular user tweeting about the video. We use the loudness level to compare the effects of a video being promoted by a single highly-followed user (in the top 1% most followed users) against being promoted by a group of mid-followed users. We find that results depend on the type of content being promoted: superusers are more successful in promoting Howto and Gaming videos, whereas the cohort of regular users are more influential for Activism videos. This work provides more accurate and explainable popularity predictions, as well as computational tools for content producers and marketers to allocate resources for promotion campaigns.
social-networks  prediction  machine-learning  marketing  cart-before-the-horse  to-write-about  to-simulate  consider:performance-measures  consider:false-positives
july 2019 by Vaguery
[1811.01910] Evolutionary Data Measures: Understanding the Difficulty of Text Classification Tasks
Classification tasks are usually analysed and improved through new model architectures or hyperparameter optimisation but the underlying properties of datasets are discovered on an ad-hoc basis as errors occur. However, understanding the properties of the data is crucial in perfecting models. In this paper we analyse exactly which characteristics of a dataset best determine how difficult that dataset is for the task of text classification. We then propose an intuitive measure of difficulty for text classification datasets which is simple and fast to calculate. We show that this measure generalises to unseen data by comparing it to state-of-the-art datasets and results. This measure can be used to analyse the precise source of errors in a dataset and allows fast estimation of how difficult a dataset is to learn. We searched for this measure by training 12 classical and neural network based models on 78 real-world datasets, then use a genetic algorithm to discover the best measure of difficulty. Our difficulty-calculating code ( this https URL ) and datasets ( this http URL ) are publicly available.
july 2019 by Vaguery
[1811.00430] GA Based Q-Attack on Community Detection
Community detection plays an important role in social networks, since it can help to naturally divide the network into smaller parts so as to simplify network analysis. However, on the other hand, it arises the concern that individual information may be over-mined, and the concept community deception thus is proposed to protect individual privacy on social networks. Here, we introduce and formalize the problem of community detection attack and develop efficient strategies to attack community detection algorithms by rewiring a small number of connections, leading to individual privacy protection. In particular, we first give two heuristic attack strategies, i.e., Community Detection Attack (CDA) and Degree Based Attack (DBA), as baselines, utilizing the information of detected community structure and node degree, respectively. And then we propose a Genetic Algorithm (GA) based Q-Attack, where the modularity Q is used to design the fitness function. We launch community detection attack based on the above three strategies against three modularity based community detection algorithms on two social networks. By comparison, our Q-Attack method achieves much better attack effects than CDA and DBA, in terms of the larger reduction of both modularity Q and Normalized Mutual Information (NMI). Besides, we find that the adversarial networks obtained by Q-Attack on a specific community detection algorithm can be still effective on others, no matter whether they are modularity based or not, indicating its strong transferability.
july 2019 by Vaguery
[1906.07774] Information matrices and generalization
This work revisits the use of information criteria to characterize the generalization of deep learning models. In particular, we empirically demonstrate the effectiveness of the Takeuchi information criterion (TIC), an extension of the Akaike information criterion (AIC) for misspecified models, in estimating the generalization gap, shedding light on why quantities such as the number of parameters cannot quantify generalization. The TIC depends on both the Hessian of the loss H and the covariance of the gradients C. By exploring the similarities and differences between these two matrices as well as the Fisher information matrix F, we study the interplay between noise and curvature in deep models. We also address the question of whether C is a reasonable approximation to F, as is commonly assumed.
machine-learning  neural-networks  generalization  performance-measure  rather-interesting  to-understand  to-generalize  (hah)  to-write-about
july 2019 by Vaguery
[1712.07822] Geometrical Insights for Implicit Generative Modeling
Learning algorithms for implicit generative models can optimize a variety of criteria that measure how the data distribution differs from the implicit model distribution, including the Wasserstein distance, the Energy distance, and the Maximum Mean Discrepancy criterion. A careful look at the geometries induced by these distances on the space of probability measures reveals interesting differences. In particular, we can establish surprising approximate global convergence guarantees for the 1-Wasserstein distance,even when the parametric generator has a nonconvex parametrization.
machine-learning  representation  approximation  define-your-terms  looking-to-see  rather-interesting  mathematical-programming  to-write-about  may-not-apply-outside-the-silo
july 2019 by Vaguery
[1802.07481] Celer: a Fast Solver for the Lasso with Dual Extrapolation
Convex sparsity-inducing regularizations are ubiquitous in high-dimensional machine learning, but solving the resulting optimization problems can be slow. To accelerate solvers, state-of-the-art approaches consist in reducing the size of the optimization problem at hand. In the context of regression, this can be achieved either by discarding irrelevant features (screening techniques) or by prioritizing features likely to be included in the support of the solution (working set techniques). Duality comes into play at several steps in these techniques. Here, we propose an extrapolation technique starting from a sequence of iterates in the dual that leads to the construction of improved dual points. This enables a tighter control of optimality as used in stopping criterion, as well as better screening performance of Gap Safe rules. Finally, we propose a working set strategy based on an aggressive use of Gap Safe screening rules. Thanks to our new dual point construction, we show significant computational speedups on multiple real-world problems.
machine-learning  statistics  horse-races  performance-measure  computational-complexity  to-read  algorithms
july 2019 by Vaguery
How the Transformers broke NLP leaderboards - Hacking semantics
If leaderboards are to highlight the actual progress, we need to incentivize new architectures rather than teams outspending each other. Obviously, huge pretrained models are valuable, but unless the authors show that their system consistently behaves differently from its competition with comparable data & compute, it is not clear whether they are presenting a model or a resource.

Furthermore, much of this research is not reproducible: nobody is going to spend \$250,000 just to repeat XLNet training. Given the fact that its ablation study showed only 1-2% gain over BERT in 3 datasets out of 4 (Yang et al., 2019), we don’t actually know for sure that its masking strategy is more successful than BERT’s.

At the same time, the development of leaner models is dis-incentivized, as their task is fundamentally harder and the leaderboard-oriented community only rewards the SOTA. That, in its turn, prices out of competitions academic teams, which will not result in students becoming better engineers when they graduate.

Last but not the least, huge DL models are often overparametrized (Frankle & Carbin, 2019; Wu, Fan, Baevski, Dauphin, & Auli, 2019). As an example, the smaller version of BERT achieves better scores on a number of syntax-testing experiments than the larger one (Goldberg, 2019). The fact that DL models require a lot of compute is not necessarily a bad thing in itself, but wasting compute is not ideal for the environment (Strubell, Ganesh, & McCallum, 2019).
benchmarking  machine-learning  horse-races  performance-measure  multiobjective-optimization  (use-it)
july 2019 by Vaguery
Krazy Kat Comics
This page goes into detail on how I used Machine Learning to find hundreds of Krazy Kat comics that are now in the public domain.

As a result of this project, several hundred high resolution scans of Krazy Kat comics are now easily available online, including a comic that I couldn't find in any published book!

What follows is a detailed description of what I did to find these comics in online newspaper archives.
OCR  archives  newspapers  rather-interesting  digitization  to-write-about  deep-learning  machine-learning  image-processing
july 2019 by Vaguery
[1811.06678] The Potential of Learned Index Structures for Index Compression
Inverted indexes are vital in providing fast key-word-based search. For every term in the document collection, a list of identifiers of documents in which the term appears is stored, along with auxiliary information such as term frequency, and position offsets. While very effective, inverted indexes have large memory requirements for web-sized collections. Recently, the concept of learned index structures was introduced, where machine learned models replace common index structures such as B-tree-indexes, hash-indexes, and bloom-filters. These learned index structures require less memory, and can be computationally much faster than their traditional counterparts. In this paper, we consider whether such models may be applied to conjunctive Boolean querying. First, we investigate how a learned model can replace document postings of an inverted index, and then evaluate the compromises such an approach might have. Second, we evaluate the potential gains that can be achieved in terms of memory requirements. Our work shows that learned models have great potential in inverted indexing, and this direction seems to be a promising area for future research.
indexing  machine-learning  approximation  summarization  feature-construction  rather-interesting  compression  to-write-about
july 2019 by Vaguery
A survey of Boolean matching techniques for library binding
When binding a logic network to a set of cells, a fundamental problem is recognizing whether a cell can implement a portion of the network. Boolean matching means solving this task using a formalism based on Boolean algebra. In its simplest form, Boolean matching can be posed as a tautology check. We review several approaches to Boolean matching as well as to its generalization to cases involving don't care conditions and its restriction to specific libraries such as those typical of anti-fuse based FPGAs. We then present a general formulation of Boolean matching supporting multiple-output logic cells.
Boolean-functions  inverse-problems  rather-odd  machine-learning  information-theory  engineering-design  representation  nudge-targets  to-understand
june 2019 by Vaguery
[1708.04597] An Efficient NPN Boolean Matching Algorithm Based on Structural Signature and Shannon Expansion
An efficient pairwise Boolean matching algorithm to solve the problem of matching single-output specified Boolean functions under input negation and/or input permutation and/or output negation (NPN) is proposed in this paper. We present the Structural Signature (SS) vector, which is composed of a 1st signature value, two symmetry marks, and a group mark. As a necessary condition for NPN Boolean matching, the structural signature is more effective than is the traditional signature. Two Boolean functions, f and g, may be equivalent when they have the same SS vector. The symmetry mark can distinguish symmetric variables and asymmetric variables and search multiple variable mappings in a single variable-mapping search operation, which reduces the search space significantly. Updating the SS vector using Shannon decomposition provides benefits in distinguishing unidentified variables, and the group mark and the phase collision check discover incorrect variable mappings quickly, which also speeds up the NPN Boolean matching process. Using the algorithm proposed in this paper, we tested both equivalent and non-equivalent matching peeds on the MCNC benchmark circuit sets and the random circuit sets. In the experiment, our algorithm is two times faster than competitors when testing equivalent circuits and averages at least one hundred times faster when testing non-equivalent circuits. The experimental results show that our approach is highly effective in solving the NPN Boolean matching problem.
jargon-overlap-warning  Boolean-functions  inverse-problems  rather-interesting  machine-learning  algorithms  to-understand  no-really-why-are-they-using-these-terms-it-hurts
june 2019 by Vaguery
[1906.07983] Explanations can be manipulated and geometry is to blame
Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods which is disconcerting for both of these purposes. Namely, we show that explanations can be manipulated arbitrarily by applying visually hardly perceptible perturbations to the input that keep the network's output approximately constant. We establish theoretically that this phenomenon can be related to certain geometrical properties of neural networks. This allows us to derive an upper bound on the susceptibility of explanations to manipulations. Based on this result, we propose effective mechanisms to enhance the robustness of explanations.
neural-networks  machine-learning  well-duh  adversarial-testing  metaheuristics  not-surprising-really-now-izzit?
june 2019 by Vaguery
[1803.08625] A Concept Learning Tool Based On Calculating Version Space Cardinality
In this paper, we proposed VeSC-CoL (Version Space Cardinality based Concept Learning) to deal with concept learning on extremely imbalanced datasets, especially when cross-validation is not a viable option. VeSC-CoL uses version space cardinality as a measure for model quality to replace cross-validation. Instead of naive enumeration of the version space, Ordered Binary Decision Diagram and Boolean Satisfiability are used to compute the version space. Experiments show that VeSC-CoL can accurately learn the target concept when computational resource is allowed.
machine-learning  data-analysis  representation  to-understand  error  head-scratcher
june 2019 by Vaguery
[1810.00481] Two new results about quantum exact learning
We present two new results about exact learning by quantum computers. First, we show how to exactly learn a k-Fourier-sparse n-bit Boolean function from O(k1.5(logk)2) uniform quantum examples for that function. This improves over the bound of Θ˜(kn) uniformly random classical examples (Haviv and Regev, CCC'15). Our main tool is an improvement of Chang's lemma for the special case of sparse functions. Second, we show that if a concept class  can be exactly learned using Q quantum membership queries, then it can also be learned using O(Q2logQlog||) classical membership queries. This improves the previous-best simulation result (Servedio and Gortler, SICOMP'04) by a logQ-factor.
quantums  quantum-computing  machine-learning  classification  algorithms  to-understand
june 2019 by Vaguery
[1904.01681] Augmented Neural ODEs
We show that Neural Ordinary Differential Equations (ODEs) learn representations that preserve the topology of the input space and prove that this implies the existence of functions Neural ODEs cannot represent. To address these limitations, we introduce Augmented Neural ODEs which, in addition to being more expressive models, are empirically more stable, generalize better and have a lower computational cost than Neural ODEs.
neural-networks  machine-learning  representation  rather-interesting  generalization  numerical-methods  to-write-about  consider:genetic-programming
june 2019 by Vaguery
[1901.06758] A deep learning approach to real-time parking occupancy prediction in spatio-temporal networks incorporating multiple spatio-temporal data sources
A deep learning model is applied for predicting block-level parking occupancy in real time. The model leverages Graph-Convolutional Neural Networks (GCNN) to extract the spatial relations of traffic flow in large-scale networks, and utilizes Recurrent Neural Networks (RNN) with Long-Short Term Memory (LSTM) to capture the temporal features. In addition, the model is capable of taking multiple heterogeneously structured traffic data sources as input, such as parking meter transactions, traffic speed, and weather conditions. The model performance is evaluated through a case study in Pittsburgh downtown area. The proposed model outperforms other baseline methods including multi-layer LSTM and Lasso with an average testing MAPE of 10.6\% when predicting block-level parking occupancies 30 minutes in advance. The case study also shows that, in generally, the prediction model works better for business areas than for recreational locations. We found that incorporating traffic speed and weather information can significantly improve the prediction performance. Weather data is particularly useful for improving predicting accuracy in recreational areas.
machine-learning  city-planning  data-analysis  looking-to-see  prediction  deep-learning  to-write-about  consider:data-sourcing
june 2019 by Vaguery
[1503.02406] Deep Learning and the Information Bottleneck Principle
Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information theoretic limits of the DNN and obtain finite sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generalization bound and by the network's simplicity. We argue that both the optimal architecture, number of layers and features/connections at each layer, are related to the bifurcation points of the information bottleneck tradeoff, namely, relevant compression of the input layer with respect to the output layer. The hierarchical representations at the layered network naturally correspond to the structural phase transitions along the information curve. We believe that this new insight can lead to new optimality bounds and deep learning algorithms.
neural-networks  information-theory  representation  define-your-terms  to-understand  modeling-is-not-mathematics  machine-learning  to-generalize
may 2019 by Vaguery
[1804.02998] Anomaly Detection for Industrial Big Data
As the Industrial Internet of Things (IIoT) grows, systems are increasingly being monitored by arrays of sensors returning time-series data at ever-increasing 'volume, velocity and variety' (i.e. Industrial Big Data). An obvious use for these data is real-time systems condition monitoring and prognostic time to failure analysis (remaining useful life, RUL). (e.g. See white papers by this http URL, and output of the NASA Prognostics Center of Excellence (PCoE).) However, as noted by Agrawal and Choudhary 'Our ability to collect "big data" has greatly surpassed our capability to analyze it, underscoring the emergence of the fourth paradigm of science, which is data-driven discovery.' In order to fully utilize the potential of Industrial Big Data we need data-driven techniques that operate at scales that process models cannot. Here we present a prototype technique for data-driven anomaly detection to operate at industrial scale. The method generalizes to application with almost any multivariate dataset based on independent ordinations of repeated (bootstrapped) partitions of the dataset and inspection of the joint distribution of ordinal distances.
anomaly-detection  machine-learning  dimension-reduction  statistics  to-write-about  algorithms
may 2019 by Vaguery
[1811.06838] The Trace Criterion for Kernel Bandwidth Selection for Support Vector Data Description
Support vector data description (SVDD) is a popular anomaly detection technique. The SVDD classifier partitions the whole data space into an inlier region, which consists of the region near the training data, and an outlier region, which consists of points away from the training data. The computation of the SVDD classifier requires a kernel function, for which the Gaussian kernel is a common choice. The Gaussian kernel has a bandwidth parameter, and it is important to set the value of this parameter correctly for good results. A small bandwidth leads to overfitting such that the resulting SVDD classifier overestimates the number of anomalies, whereas a large bandwidth leads to underfitting and an inability to detect many anomalies. In this paper, we present a new unsupervised method for selecting the Gaussian kernel bandwidth. Our method, which exploits the low-rank representation of the kernel matrix to suggest a kernel bandwidth value, is competitive with existing bandwidth selection methods.
machine-learning  anomaly-detection  outlier-detection  algorithms  parametrization  approximation  performance-measure  unsupervised-learning  to-understand  heuristics
april 2019 by Vaguery
per page:    204080120160

Copy this bookmark: