tsuomela : scraping   15

OutWit - Harvest The Web
"OutWit Hub dissects Web pages into their different elements. As the program knows how to navigate from page to page in sequences of results, it can automatically extract quantities of information objects and organize them into usable collections."
web-archive  downloads  software  automation  scraping 
november 2015 by tsuomela
tabula.technology
"If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is — there's no easy way to copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface. Tabula works on Mac, Windows and Linux."
data  transformation  scripting  scraping  pdf  extraction 
march 2015 by tsuomela
Scrapy | An open source web scraping framework for Python
"Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing."
web-programming  programming  python  library  scraping  data-collection  online 
march 2011 by tsuomela
Scrubyt Documentation - File: README
A simple-to-learn yet powerful web extraction framework written in Ruby. Navigate the Web, then extract, query, transform, and save relevant data from the pages you are interested in, using a concise, easy-to-use DSL.
programming  library  ruby  web  scraping  data-collection 
december 2008 by tsuomela
(theinfo)
This is a site for large data sets and the people who love them: the scrapers and crawlers who collect them, the academics and geeks who process them, the designers and artists who visualize them.
data-mining  data  collection  analysis  visualization  scraping  via:vaguery 
january 2008 by tsuomela
perl.com: Data Munging with Sprog
A GUI front end for web scraping with Perl.
search  web  scraping  perl  gui 
june 2005 by tsuomela
