Daily Shaarli
09/01/19
This is a set of command line utilities for manipulating large tabular data files: files of numeric and text data commonly found in machine learning, data mining, and similar environments. The tools cover filtering, sampling, statistics, joins, and more.
These tools are especially useful when working with large data sets. They run faster than other tools providing similar functionality, often by significant margins. See Performance Studies for comparisons with other tools.
They perform data manipulation and statistical calculations on tab delimited data. They are intended for large files: larger than is ideal for loading entirely into memory in an application like R, but not so big as to necessitate moving to Hadoop or similar distributed compute environments. The features supported are useful both for standalone analysis and for preparing data for use in R, Pandas, and similar toolkits.
From eBay.
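To make the "too large to load into memory" point concrete, here is a minimal Python sketch of computing statistics over tab delimited data one row at a time. It is not how the eBay tools are implemented, and it does not use their command line interface; the function name `summarize_column`, the column name `score`, and the sample rows are all hypothetical, chosen only for illustration.

```python
import csv
import io
import math


def summarize_column(lines, column, delimiter="\t"):
    """Stream rows and return count, mean, and sample standard deviation.

    Uses Welford's online algorithm, so only a few running values are
    kept in memory no matter how many rows the input has.
    """
    # QUOTE_NONE treats quote characters as ordinary data, which suits plain TSV.
    reader = csv.DictReader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for row in reader:
        value = float(row[column])
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
    stdev = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return {"count": count, "mean": mean, "stdev": stdev}


# Hypothetical in-memory sample; a real run would pass an open file object.
sample = io.StringIO("id\tscore\n1\t2.0\n2\t4.0\n3\t6.0\n")
stats = summarize_column(sample, "score")
print(stats)
```

Because the reader consumes its input lazily, the same function works on a file opened with `open(path, newline="")` without reading the whole file first.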
AutoOut is an automated outlier detection and treatment tool that lets you build more accurate models without writing a single line of code. With its simple, easy-to-use interface you can detect and treat outliers in your dataset, which can help improve your final model.
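For readers unfamiliar with what "detect and treat" means in practice, here is a small Python sketch of one common approach: flag values outside the interquartile-range fences, then clip them to those fences. This is a generic textbook technique, not necessarily what AutoOut does internally; the function names, the multiplier `k=1.5`, and the sample data are assumptions made for illustration.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences at k times the IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Detect values outside the fences and treat them by clipping to the fences."""
    low, high = iqr_bounds(values, k)
    flagged = [v for v in values if v < low or v > high]
    treated = [min(max(v, low), high) for v in values]
    return flagged, treated


# Hypothetical sample with one obvious outlier.
data = [10, 12, 11, 13, 12, 11, 95]
flagged, treated = clip_outliers(data)
print("flagged:", flagged)
print("treated:", treated)
```

Clipping is only one treatment option; dropping the flagged rows or replacing them with the median are equally common, and the right choice depends on the dataset.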
At the heart of decentralized systems today is a demoralizing irony. Vast resources (intellect, equipment, and energy) go into avoiding centralized control and creating "trustless" systems like Bitcoin. But hapless users then defeat the whole purpose of these systems by handing over their private keys to centralized entities like Coinbase.
Wouldn't it be nice if there were a truly decentralized system that could do the impossible? That is, a system that could:
- Make key management easier for ordinary users.
- Manage secret keys for transparent objects without secret state, like smart contracts.
- Operate seamlessly when nodes come and go.