Performance Evaluation in Machine Learning: The Good, The Bad, The Ugly and The Way Forward
"This paper gives an overview of some ways in which our understanding of performance evaluation measures for machine-learned classifiers has improved over the last twenty years. I also highlight a range of areas where this understanding is still lacking, leading to ill-advised practices in classifier evaluation. This suggests that in order to make further progress we need to develop a proper measurement theory of machine learning. I then demonstrate by example what such a measurement theory might look like and what kinds of new results it would entail. Finally, I argue that key properties such as classification ability and data set difficulty are unlikely to be directly observable, suggesting the need for latent-variable models and causal inference."
machine-learning  evaluation  measurement 
9 days ago by arsyed
[1901.11373] Learning and Evaluating General Linguistic Intelligence
We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly. Using this definition, we analyze state-of-the-art natural language understanding models and conduct an extensive empirical investigation to evaluate them against these criteria through a series of experiments that assess the task-independence of the knowledge being acquired by the learning process. In addition to task performance, we propose a new evaluation metric based on an online encoding of the test data that quantifies how quickly an existing agent (model) learns a new task. Our results show that while the field has made impressive progress in terms of model architectures that generalize to many tasks, these models still require a lot of in-domain training examples (e.g., for fine tuning, training task-specific modules), and are prone to catastrophic forgetting. Moreover, we find that far from solving general tasks (e.g., document question answering), our models are overfitting to the quirks of particular datasets (e.g., SQuAD). We discuss missing components and conjecture on how to make progress toward general linguistic intelligence.
evaluation  nlp  nlu 
10 days ago by arsyed
Questions for a new technology.
"Given that coordination and communication swamp all other costs in modern software development it is a pressing area to invest in, especially as your team scales."
development  evaluation  management  technology  business  questions 
11 days ago by garrettc
Questions for a new technology. | Kellan Elliott-McCrea
Good questions to ask yourself or your team before jumping on The New Thing. Like Dr. Wave’s metaphor of a nail in the head: some things are more painful to change than to live with (and constant change is even worse) so adopting the new thing must be done judiciously.
technology  adoption  evaluation 
13 days ago by dlkinney
My alma mater is seeking a consultant (based anywhere) to conduct an of its…
evaluation  from twitter
14 days ago by sdp
Questions for a new technology. | Kellan Elliott-McCrea
1. What problem are we trying to solve?
2. How could we solve the problem with our current tech stack?
3. Are we clear on what new costs we are taking on with the new technology?
4. What about our current stack makes solving this problem in a cost-effective manner difficult?
5. If this new tech is a replacement for something we currently do, are we committed to moving everything to this new technology in the future?
6. Who do we know and trust who uses this tech? Have we talked to them about it? What did they say about it? What don’t they like about it?
7. What’s a low risk way to get started?
8. Have you gotten a mixed discipline group of senior folks together and thrashed out each of the above points? Where is that documented?
via:swillison  advise  evaluation  software 
15 days ago by leeomara
"7GUIs defines seven tasks that represent typical challenges in GUI programming. In addition, 7GUIs provides a recommended set of evaluation dimensions."
programming  guis  frameworks  evaluation  dopost 
22 days ago by niksilver
