python "debugger"
Bit like a blend of pdb and println'ing everything
python  debugging 
yesterday by tobym
Dask.distributed — Dask.distributed 1.26.0+2.g716142e8 documentation
Dask.distributed is a lightweight library for distributed computing in Python. It extends both the concurrent.futures and dask APIs to moderate sized clusters.
python  distributed 
14 days ago by tobym
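Note on the Dask.distributed entry above: the description says it extends the concurrent.futures API. As a rough sketch of that interface, the snippet below uses only the standard library's ThreadPoolExecutor. Dask.distributed's Client offers similarly named submit and map methods, but the function `square`, the helper `run_tasks`, and the inputs are illustrative assumptions, not taken from the Dask documentation.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Toy task standing in for real work (hypothetical example)."""
    return x * x


def run_tasks(values):
    # Standard-library executor; with Dask.distributed, a Client object
    # plays a similar role but schedules the tasks across a cluster.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(square, v) for v in values]
        # result() blocks until each future has finished.
        return [f.result() for f in futures]


results = run_tasks([1, 2, 3, 4])
print(results)
```

Results come back in submission order because the futures are collected in the order they were created.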
cuML and Dask hyperparameter optimization
cuML emulates the scikit-learn API. GridSearchCV or RandomizedSearchCV are used to define the search space for hyper-parameters. Dask-ML improves the efficiency of that search. cuML is a drop-in replacement for scikit-learn. Still in development, but even larger speed gains are expected soon; the direction is already known.

By rapidsai.
python  datascience 
14 days ago by tobym
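Note on the hyperparameter entry above: an exhaustive grid search of the kind GridSearchCV performs can be sketched in plain Python. This is not the scikit-learn, cuML, or Dask-ML API; `grid_search`, `toy_score`, and the parameter names are hypothetical, and `toy_score` stands in for a cross-validated model score.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(depth, rate):
    # Hypothetical score surface that peaks at depth=4, rate=0.1.
    return -((depth - 4) ** 2) - (rate - 0.1) ** 2


grid = {"depth": [2, 4, 8], "rate": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

Each combination is scored independently, which is why this kind of search parallelizes well across workers.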
Open GPU Data Science | RAPIDS
Open source software libraries to execute e2e data science and analytics pipelines entirely on GPUs.
gpu  python  nvidia 
14 days ago by tobym
Network Graphs — HoloViews
HoloViews + Datashader can show a relatively large dataset in a reasonable amount of time.
network  graph  visual  dataviz  visualization  python  notebook  bigdata 
17 days ago by tobym
Altair: Declarative Visualization in Python — Altair 2.4.1 documentation
Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite, and the source is available on GitHub.
python  visualization  analytics  dataviz 
22 days ago by tobym
A Grammar of Graphics for Python — plotnine 0.5.1+38.g0c9f85b documentation
plotnine is an implementation of a grammar of graphics in Python, based on ggplot2. The grammar allows users to compose plots by explicitly mapping data to the visual objects that make up the plot.

Plotting with a grammar is powerful: it makes custom (and otherwise complex) plots easy to think about and then create, while the simple plots remain simple.
python  dataviz 
22 days ago by tobym
PyViz 0.10.0 documentation
PyViz is a coordinated effort to make data visualization in Python easier to use, easier to learn, and more powerful.

Focusing on interactive plotting in web browsers, PyViz provides:

High-level tools that make it easier to apply Python plotting libraries to your data.
A comprehensive tutorial showing how to use the available tools together to do a wide range of different tasks.
A Conda metapackage "pyviz" that makes it simple to install matching versions of libraries that work well together.
Sample datasets to work with.
python  dataviz  visualization 
22 days ago by tobym
Datashader 0.6.9 documentation
Turns even the largest data into images, accurately

Datashader is a graphics pipeline system for creating meaningful representations of large datasets quickly and flexibly. Datashader breaks the creation of images into a series of explicit steps that allow computations to be done on intermediate representations. This approach allows accurate and effective visualizations to be produced automatically without trial-and-error parameter tuning, and also makes it simple for data scientists to focus on particular data and relationships of interest in a principled way.
python  big-data  visualization  dataviz 
22 days ago by tobym
Welcome — Magic-Wormhole 0.11.2+75.ga5e011f.dirty documentation
Easily and securely send files from one location to another. Requires a transit and relay server; a public one is in the code. Source:
encryption  python  filesharing 
9 weeks ago by tobym
Gunicorn - Python WSGI HTTP Server for UNIX
Gunicorn 'Green Unicorn' is a Python WSGI HTTP Server for UNIX. It uses a pre-fork worker model. The Gunicorn server is broadly compatible with various web frameworks, simply implemented, light on server resources, and fairly speedy.
web  server  python 
9 weeks ago by tobym
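Note on the Gunicorn entry above: a WSGI server runs any callable that follows the WSGI convention. The sketch below is a minimal such callable using no framework; the file name `hello.py` mentioned in the comment and the stub call at the end are assumptions for illustration.

```python
def app(environ, start_response):
    """Minimal WSGI application: takes the request environ, returns body chunks."""
    body = b"Hello from a WSGI app\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Assuming this file is saved as hello.py (hypothetical name), Gunicorn
# could serve it with:  gunicorn hello:app

# Call the app directly with a stub start_response to see what it returns.
captured = []
response = app(
    {"REQUEST_METHOD": "GET"},
    lambda status, headers: captured.append(status),
)
print(captured, response)
```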
hmmlearn — hmmlearn 0.2.1 documentation
Simple algorithms and models to learn Hidden Markov Models. API similar to scikit-learn, just adapted for sequence data. BSD license.

Implements the Baum-Welch algorithm to estimate model parameters from just the observed data. This is how RenTech started off crushing it in the markets.
hidden-markov-model  datascience  python  hmm  finance  algorithm 
11 weeks ago by tobym
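Note on the hmmlearn entry above: Baum-Welch repeatedly runs forward and backward passes over the observations. The sketch below shows only the forward pass, which computes the likelihood of an observation sequence under a discrete HMM. It is plain Python, not the hmmlearn API, and the two-state model parameters are made up for illustration.

```python
def forward_likelihood(start, trans, emit, observations):
    """Return P(observations) under a discrete HMM via the forward pass.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    observations must be a non-empty list of symbol indices.
    """
    n_states = len(start)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [start[i] * emit[i][observations[0]] for i in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)


# Made-up two-state, two-symbol model for illustration only.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]

likelihood = forward_likelihood(start, trans, emit, [0, 1, 0])
print(likelihood)
```

For long sequences a real implementation would rescale alpha or work in log space to avoid underflow; that is omitted here for brevity.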
Pattern | CLiPS
Pattern is a web mining module for the Python programming language.

It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and <canvas> visualization.
nlp  web  python  mining 
11 weeks ago by tobym
TextBlob: Simplified Text Processing — TextBlob 0.15.2 documentation
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

Uses NLTK and pattern.

Noun phrase extraction
Part-of-speech tagging
Sentiment analysis
Classification (Naive Bayes, Decision Tree)
Language translation and detection powered by Google Translate
Tokenization (splitting text into words and sentences)
Word and phrase frequencies
Word inflection (pluralization and singularization) and lemmatization
Spelling correction
Add new models or languages through extensions
WordNet integration
python  nlp 
11 weeks ago by tobym
Mars is a tensor-based unified framework for large-scale data computation
NumPy API (a bunch of it, at least) re-implemented to support distributed computation; by Alibaba. Runs on a single node with a threaded scheduler, or on a cluster with thousands of nodes. Simple cluster setup with 1 master, 1 web API, and n workers.
python  distributed  numpy  computing  cluster 
january 2019 by tobym
Chainer: A flexible framework for neural networks
A Powerful, Flexible, and Intuitive Framework for Neural Networks

Chainer is a Python-based deep learning framework aiming at flexibility. It provides automatic differentiation APIs based on the define-by-run approach (a.k.a. dynamic computational graphs) as well as object-oriented high-level APIs to build and train neural networks. It also supports CUDA/cuDNN using CuPy for high performance training and inference.

Compare with tensorflow, pytorch, keras, CNTK.
machinelearning  python  framework  library  ai  ml  neuralnetwork 
december 2018 by tobym
lolviz - visualize python data structures in jupyter, via graphviz's dot
A simple Python data-structure visualization tool for lists of lists, lists, dictionaries; primarily for use in Jupyter notebooks / presentations
jupyter  python  visualization  datastructure 
december 2018 by tobym
Dask: Scalable analytics in Python
Natively scale Python. Sort of like PySpark, but more Python-native. Scales from one to thousands of nodes. Works with NumPy, pandas, and scikit-learn.

Use this with a notebook and datashader to quickly and interactively visualize huge datasets.
bigdata  python  analytics  parallel 
october 2018 by tobym
seaborn: statistical data visualization — seaborn 0.9.0 documentation
Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
python  statistics  visualization  library  dataviz  charts 
october 2018 by tobym
Welcome to Bokeh — Bokeh 0.13.0 documentation
Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.
python  visualization  dataviz  bokeh  streaming 
october 2018 by tobym
Shapely — Shapely 1.2 and 1.3 documentation
Shapely is a BSD-licensed Python package for manipulation and analysis of planar geometric objects. It is based on the widely deployed GEOS (the engine of PostGIS) and JTS (from which GEOS is ported) libraries. Shapely is not concerned with data formats or coordinate systems, but can be readily integrated with packages that are.
python  gis 
october 2018 by tobym
Folium — Folium 0.6.0+26.gd67cc26 documentation
folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the leaflet.js library. Manipulate your data in Python, then visualize it on a Leaflet map via folium.

Overlay support with GeoJSON and TopoJSON.
python  visualization  javascript  maps  geo  dataviz 
october 2018 by tobym
streamlit · PyPI
pip install streamlit
python -m streamlit help

Python library for producing "live" reports, kind of like a notebook but not interactive. You import streamlit into your report code, run python computation and use streamlit to format the output (charts, code blocks, intuitively formatted numpy arrays and pandas dataframes). When you run it, streamlit starts a proxy and opens the local browser, pointing to that proxy, and shows your code/report that way.
charts  python 
july 2018 by tobym
Babel — Babel 2.6.0 documentation
Babel is a collection of tools for internationalizing Python applications.
python  i18n 
july 2018 by tobym
Spyder - Documentation — Spyder 3 documentation
IDE for Python with a graphical frontend to pdb for debugging. Breakpoint support and the normal stuff. Not sure how it compares to IntelliJ with the Python plugin or PyCharm.
python  ide  debugging 
july 2018 by tobym
spotify/annoy: Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
python  c++  graph  algorithm  library 
july 2018 by tobym
Welcome to PyTables’ documentation! — PyTables 3.4.4 documentation
PyTables is a package for managing hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data.
python  data  gui  hdf5  viewer  tool  library 
july 2018 by tobym
Overview | ViTables
ViTables is a component of the PyTables family. It is a GUI for browsing and editing files in both PyTables and HDF5 formats. It is developed using Python and PyQt5 (the Python bindings to Qt), so it can run on any platform that supports these components.

ViTables capabilities include easy navigation through the data hierarchy, displaying of real data and its associated metadata, a simple, yet powerful, browsing of multidimensional data and much more.

As a viewer, one of the greatest strengths of ViTables is its ability to display very large datasets. Tables with a billion rows (and beyond) are navigated stunningly fast and with very low memory requirements. So, if you ever need to browse huge tables, don’t hesitate: ViTables is your choice.

If you need a customized browser for managing your HDF5 data, ViTables is an excellent starting point.
python  bigdata  datascience  gui  tool  hdf5  data 
july 2018 by tobym
Deep Universal Probabilistic Programming

Markov models, bayesian regression, variational autoencoders

Uses PyTorch backend.
python  probabilistic  programming  probability  bayesian  markov 
june 2018 by tobym
ShutIt | Automation framework for programmers
Originally written to manage complex Docker builds.
automation  python  bash  deployment  docker 
may 2018 by tobym
Datasette Facets
Instantly publish structured data to the internet with a JSON API, plus a web UI that supports facets, search, and plugins (for mapping, for example).
json  api  data  sqlite  csv  cool  tool  utility  web  python 
may 2018 by tobym
Numba — Numba
Python module to JIT your code to native CPU or GPU instructions so array-oriented and math-heavy Python code runs super fast.
python  numerical  performance  library 
may 2018 by tobym
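Note on the Numba entry above: the typical usage pattern is to decorate a numeric, loop-heavy function so that it is compiled on first call. The sketch below assumes Numba's `njit` decorator and falls back to an ordinary no-op decorator when Numba is not installed, so it runs either way; `sum_of_squares` is a hypothetical example function.

```python
try:
    from numba import njit  # compiles the function to machine code on first call
except ImportError:
    # Fallback so the sketch still runs without Numba (at plain Python speed).
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    # A tight numeric loop is the kind of code a JIT compiler handles well.
    total = 0.0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(1000))
```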
mypy - Optional Static Typing for Python
PEP 484-conforming, experimental static type checker for Python. Compare with Facebook's pyre.
python  static  types 
may 2018 by tobym
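Note on the mypy entry above: PEP 484 annotations are ordinary Python syntax that a static checker reads without running the program. A minimal sketch follows; the function `first_long_word` is a made-up example.

```python
from typing import List, Optional


def first_long_word(words: List[str], min_length: int) -> Optional[str]:
    """Return the first word with at least min_length characters, else None."""
    for word in words:
        if len(word) >= min_length:
            return word
    return None


found = first_long_word(["a", "type", "checker"], 5)
print(found)

# A static checker such as mypy would be expected to flag a call like the
# one below, because a str is passed where an int is annotated:
# first_long_word(["a", "b"], "5")
```

The annotations have no effect at runtime; the interpreter runs the function the same way with or without them.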
Prophet | Prophet is a forecasting procedure implemented in R and Python. It is fast and provides completely automated forecasts that can be tuned by hand by data scientists and analysts.
Prophet is a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily periodicity data with at least one year of historical data. Prophet is robust to missing data, shifts in the trend, and large outliers.

Prophet is open source software released by Facebook’s Core Data Science team.
statistics  python  R  metrics  forecasting  monitoring  tool 
may 2018 by tobym
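Note on the Prophet entry above: one common way to write an additive model of this kind is as a sum of components. The symbol names below are labels chosen for illustration: g(t) for the non-linear trend, s(t) for the yearly and weekly seasonality, h(t) for holiday effects, and an error term for whatever the components do not explain.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Because the components are added, each one can be inspected or tuned separately, which fits the description of forecasts that analysts can adjust by hand.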
Graphlab Create™. Fast, Scalable Machine Learning Modeling in Python. | Turi
Simple development of custom machine learning models in Python. Originally Graphlab. Open-sourced by Apple.
ai  platform  machinelearning  ml  cv  python  notebook 
may 2018 by tobym
Dash by Plotly - Plotly
Build beautiful web-based interfaces in Python
Dash is a Python framework for building analytical web applications. No JavaScript required.
Built on top of Plotly.js, React, and Flask, Dash ties modern UI elements like dropdowns, sliders, and graphs to your analytical Python code.
dashboard  charts  python  visualization  framework 
march 2018 by tobym
Diamond is a python daemon that collects system metrics and publishes them to Graphite (and others). It is capable of collecting cpu, memory, network, i/o, load and disk metrics. Additionally, it features an API for implementing custom collectors for gathering metrics from almost any source.
python  stats 
march 2018 by tobym
Python library for parallelizable ensemble learning.
python  library  ml  machinelearning 
february 2018 by tobym
igraph – Network analysis software
igraph is a collection of network analysis tools with the emphasis on efficiency, portability and ease of use. igraph is open source and free. igraph can be programmed in R, Python and C/C++.
graph  analysis  python 
january 2018 by tobym
An open source Python framework for automated feature engineering. Automatically creates features from temporal and relational datasets.
ml  python  library  machinelearning 
november 2017 by tobym
Home Assistant
Home Assistant is an open-source home automation platform running on Python 3. Track and control all devices at home and automate control. Perfect to run on a Raspberry Pi.
home  automation  iot  python 
october 2017 by tobym
Dat Project
Dat is the distributed data sharing tool. Use the desktop app, command line tool, and JavaScript library. P2P data sharing.

Has some slight overlap with Quilt.
data  sharing  python  dataset 
september 2017 by tobym
Diaoul/subliminal: Subliminal - Subtitles, faster than your thoughts
command-line tool to download subtitles; fast, simple, and correct
subtitles  movies  cli  python  video 
september 2017 by tobym
pq 1.5 : Python Package Index
A transactional queue system for PostgreSQL written in Python.
postgresql  python  queue  postgres 
september 2017 by tobym
jeffknupp/sandman2: Automatically generate a RESTful API service for your legacy database. No code required!
Automatically generate a RESTful API service for your legacy database. No code required
python  api  rest 
september 2017 by tobym
