Search: [dataset] - Toolleeo's Links

OpenRefine

OpenRefine (previously Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.

OpenRefine always keeps your data private on your own computer until YOU want to share or collaborate. Your private data never leaves your computer unless you want it to. (It works by running a small server on your computer, and you use your web browser to interact with it.)

homepage · data_science · software · opensource · dataset

Mon Mar 23 20:15:31 2020 * · permalink

https://openrefine.org/

The Fortran 77 codes for the open-loop and the closed-loop simulations for the Tennessee Eastman process (TEP).

The Fortran 77 codes for the open-loop and the closed-loop simulations for the Tennessee Eastman process (TEP) as well as the training and testing data files used for evaluating the data-driven methods (PCA, PLS, FDA, and CVA).

software · simulation · coding_lang:fortran · research · automation · dataset

Wed Mar 11 21:55:57 2020 · permalink

https://github.com/camaramm/tennessee-eastman-profBraatz

Million Song Dataset

dataset · homepage

Tue Oct 29 21:20:17 2019 * · permalink

http://millionsongdataset.com/

The Dataverse Project

The Dataverse Project - Dataverse.org

research · science · dataset

Sun Oct 6 21:53:43 2019 * · permalink

https://dataverse.org/#

Introducing the CodeSearchNet challenge

GitHub announces the CodeSearchNet Challenge and releases a large dataset for natural language processing and machine learning.

article · machine_learning · dataset

Sat Sep 28 18:38:16 2019 * · permalink

https://github.blog/2019-09-26-introducing-the-codesearchnet-challenge/

frictionlessdata | A Python library for working with Data Packages

python · library · machine_learning · file_format · json · csv · dataset · data · source_code · coding_lang:python

Fri Sep 6 14:07:40 2019 · permalink

https://github.com/frictionlessdata/datapackage-py

Estimating the success of re-identifications in incomplete datasets using generative models | Nature Communications

Anonymization has been the main means of addressing privacy concerns in sharing medical and socio-demographic data. Here, the authors estimate the likelihood that a specific person can be re-identified in heavily incomplete datasets, casting doubt on the adequacy of current anonymization practices.

machine_learning · dataset · paper

Mon Jul 29 02:58:54 2019 · permalink

https://www.nature.com/articles/s41467-019-10933-3

Fuels – Historical archive of prices charged and of the station registry

Archive of the datasets published since March 2015, grouped by quarter, provided by MISE.

To search and view the prices charged in real time and to look up stations, see the Osservatorio prezzi carburanti website.

The data are in .csv format. Because of their considerable size, the files are compressed as tar.gz.
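
Since the files are CSVs packed as tar.gz, they can be read without unpacking them to disk. The following is a minimal sketch using only the Python standard library. The member name, column names, and `;` delimiter are assumptions made for the demo, not details taken from the MISE archive, so it builds a tiny archive in memory instead of downloading a real one.

```python
import csv
import io
import tarfile


def iter_csv_rows(archive, delimiter=","):
    """Yield (member_name, row_dict) for every .csv file in an open tar archive."""
    for member in archive:
        if not (member.isfile() and member.name.endswith(".csv")):
            continue
        raw = archive.extractfile(member)
        # Wrap the binary stream so csv can read it as text.
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        for row in csv.DictReader(text, delimiter=delimiter):
            yield member.name, row


def build_demo_archive():
    """Build a small in-memory tar.gz with one hypothetical CSV file."""
    payload = "station_id;price\n1;1.50\n2;1.62\n".encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="demo_prices.csv")  # hypothetical name
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


with tarfile.open(fileobj=build_demo_archive(), mode="r:gz") as tar:
    rows = list(iter_csv_rows(tar, delimiter=";"))

for name, row in rows:
    print(name, row)
```

For a downloaded archive, `tarfile.open(path, mode="r:gz")` would take the place of the in-memory buffer; the delimiter and encoding should be checked against the actual files.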

dataset · locale:it

Fri Apr 19 09:18:00 2019 * · permalink

https://www.mise.gov.it/index.php/it/open-data/elenco-dataset/2036944-carburanti-archivio-prezzi

How to Use Correlation to Understand the Relationship Between Variables

There may be complex and unknown relationships between the variables in your dataset.

It is important to discover and quantify the degree to which variables in your dataset are dependent upon each other. This knowledge can help you better prepare your data to meet the expectations of machine learning algorithms, such as linear regression, whose performance will degrade with the presence of these interdependencies.

In this tutorial, you will discover that correlation is the statistical summary of the relationship between variables and how to calculate it for different types of variables and relationships.

After completing this tutorial, you will know:

How to calculate a covariance matrix to summarize the linear relationship between two or more variables.
How to calculate Pearson’s correlation coefficient to summarize the linear relationship between two variables.
How to calculate Spearman’s correlation coefficient to summarize the monotonic relationship between two variables.
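
The three quantities listed above can be sketched in plain Python. This is an illustrative sketch, not the tutorial's own code: the helper names and the sample data are made up, and the covariance uses the sample (n - 1) convention.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def covariance_matrix(columns):
    """Covariance of every pair of columns, as a nested list."""
    return [[covariance(a, b) for b in columns] for a in columns]


def pearson(xs, ys):
    """Pearson's r: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]  # monotonic in x, but not linear

print(covariance_matrix([x, y]))
print(pearson(x, y))
print(spearman(x, y))
```

Because `y` increases with `x` without being a linear function of it, Spearman's coefficient reaches 1 while Pearson's stays below 1, which is the distinction between monotonic and linear relationships.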

statistics · dataset · tutorial

Sun Oct 14 20:51:42 2018 * · permalink

https://machinelearningmastery.com/how-to-use-correlation-to-understand-the-relationship-between-variables/

How to Model Human Activity From Smartphone Data

Human activity recognition is the problem of classifying sequences of accelerometer data recorded by specialized harnesses or smartphones into known well-defined movements.

It is a challenging problem given the large number of observations produced each second, the temporal nature of the observations, and the lack of a clear way to relate accelerometer data to known movements.

Classical approaches to the problem involve hand-crafting features from the time series data based on fixed-size windows and training machine learning models, such as ensembles of decision trees. The difficulty is that this feature engineering requires deep expertise in the field.
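
As a rough illustration of the classical approach, the sketch below cuts a one-dimensional signal into fixed-size windows and computes a few summary features per window. The signal values, window size, step, and feature set are all made up for the example; real pipelines typically use many more features chosen with domain knowledge.

```python
from statistics import mean, pstdev


def sliding_windows(samples, size, step):
    """Yield consecutive fixed-size windows; a trailing partial window is dropped."""
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]


def window_features(window):
    """Simple hand-crafted features for one window of accelerometer values."""
    return {
        "mean": mean(window),
        "std": pstdev(window),
        "min": min(window),
        "max": max(window),
    }


# Made-up single-axis readings: near rest first, then movement.
signal = [0.0, 0.1, 0.0, -0.1, 1.0, 1.2, 0.9, 1.1]

features = [window_features(w) for w in sliding_windows(signal, size=4, step=2)]

for row in features:
    print(row)
```

Each feature row would then be paired with an activity label and passed to a classifier such as an ensemble of decision trees.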

Recently, deep learning methods such as recurrent neural networks and one-dimensional convolutional neural networks, or CNNs, have been shown to provide state-of-the-art results on challenging activity recognition tasks with little or no data feature engineering.

In this tutorial, you will discover the ‘Activity Recognition Using Smartphones’ dataset for time series classification and how to load and explore the dataset in order to make it ready for predictive modeling.

machine_learning · human_activity_recognition · blog · article · sensors · dataset · coding_lang:python

Sat Oct 6 17:14:20 2018 * · permalink

https://machinelearningmastery.com/how-to-model-human-activity-from-smartphone-data/