Search: [statistics] - Toolleeo's Links

Principal Component Analysis

PCA is a linear dimensionality reduction technique. Many non-linear dimensionality reduction techniques exist, but linear methods are more mature, if more limited.
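As a rough illustration of the description above (not the linked article's implementation), the first principal component can be found by power iteration on the sample covariance matrix. The function name, the starting vector, and the toy data are assumptions made for this sketch; real code would normally call a linear algebra library instead.

```python
import math
import random

def first_principal_component(rows, iterations=200):
    """Return (direction, variance, scores) for the leading principal component."""
    n, d = len(rows), len(rows[0])
    # Centre each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply cov and renormalise.
    # Assumption: the start vector is not orthogonal to the leading direction.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance captured along v is the quadratic form v^T cov v.
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    # One-dimensional projection of every centred row onto v.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, variance, scores

# Toy data (hypothetical): 200 points stretched along the direction (1, 1).
random.seed(0)
rows = []
for _ in range(200):
    t = random.gauss(0, 1)
    rows.append((t, t + 0.1 * random.gauss(0, 1)))

direction, variance, scores = first_principal_component(rows)
print(direction, variance)
```

The two-dimensional points collapse to one score each, and the recovered direction lies close to the diagonal along which the data was generated.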

article · statistics · algorithm · methodology · analytics · data_science

Wed Oct 9 17:46:37 2019 * · permalink

http://www.oranlooney.com/post/ml-from-scratch-part-6-pca/

scc | Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations

scc (Sloc, Cloc and Code) is a very fast, accurate code counter with complexity calculations and COCOMO estimates, written in pure Go.

programming · statistics · software · opensource · coding_lang:go

Tue Oct 1 07:12:31 2019 · permalink

https://github.com/boyter/scc/

Processing 40 TB of code from ~10 million projects with a dedicated server and Go for $100 | Ben E. C. Boyter

article · statistics · language · programming

Tue Oct 1 07:08:49 2019 · permalink

https://boyter.org/posts/an-informal-survey-of-10-million-github-bitbucket-gitlab-projects/

Goodhart’s Law: Are Academic Metrics Being Gamed?

Publishing a paper in academia is challenging, stimulating, and a bit baffling. Challenging because the research might fail. Stimulating because research may start assuming one outcome and finish with a totally different one. Baffling because after the paper is written and ready, I have to find it a home for

research · article · statistics

Wed Sep 25 18:21:48 2019 · permalink

https://thegradient.pub/over-optimization-of-academic-publishing-metrics/

AutoOut | Automated Outlier Detection and Treatment Tool

AutoOut is an automated outlier detection and treatment tool that allows you to get better models with even better accuracy without writing a single line of code. With its simple, easy-to-use interface, you can detect and treat outliers in your dataset, which can help improve your final model.

software · opensource · tools · machine_learning · statistics · coding_lang:python · source_code

Sun Sep 1 19:55:11 2019 * · permalink

https://github.com/MateLabs/AutoOut

git-stats - Git add-on to get statistics of a repository

"Local git statistics including GitHub-like contributions calendars."

#cli-app · versioning · git · statistics · software · opensource · homepage

Sun Aug 25 14:50:01 2019 * · permalink

https://github.com/IonicaBizau/git-stats

Logistic Regression from Bayes' Theorem

In this post we’ll explore how we can derive logistic regression from Bayes’ Theorem. Starting with Bayes’ Theorem, we’ll work our way to computing the log odds of our problem and then arrive at the inverse logit function. After reading this post you’ll have a much stronger intuition for how logistic

machine_learning · statistics · bayesian · techniques · algorithm · article

Thu Jun 13 15:17:24 2019 * · permalink

https://www.countbayesie.com/blog/2019/6/12/logistic-regression-from-bayes-theorem

Significant Pattern Mining for Time Series - Christian Bock

In the midst of the deep learning hype, p-values might not be the hottest topic in data science. However, association mapping remains a fundamental tool to justify and underpin scientific conclusions. Inspired by an approach for time series classification based on predictive subsequences (i.e., shapelets [1]), we developed S3M, a method that identifies short time series subsequences that are statistically associated with a class or phenotype while tackling the multiple hypothesis problem.

time_series · science · research · article · statistics · machine_learning

Thu Jun 13 06:58:10 2019 * · permalink

https://christian.bock.ml/posts/significant_shapelets/

BATgrowth - Monitoring Brave Browser adoption

When you first start reading about Brave, you learn that it is a new reward system for publishers and a new advertising model.

You may have wondered how many publishers there are, and who they are.

batgrowth.com scrapes the web to list websites that are BAT publishers.

cryptocurrency · list · webservice · browser · statistics

Sat Nov 3 14:55:53 2018 * · permalink

https://batgrowth.com/

Using regression trees for forecasting double-seasonal time series with trend in R - Peter Laurinec

You will learn in this post how to:

decompose double-seasonal time series
detrend time series
model and forecast double-seasonal time series with trend
use two types of simple regression trees
set important hyperparameters related to regression trees

machine_learning · time_series · forecasting · R · article · blog · statistics

Fri Nov 2 14:10:10 2018 * · permalink

https://petolau.github.io/Regression-trees-for-forecasting-time-series-in-R/

Statistical forecasting: notes on regression and time series analysis

This web site contains notes and materials for an advanced elective course on statistical forecasting that is taught at the Fuqua School of Business, Duke University. It covers linear regression and time series forecasting models as well as general principles of thoughtful data analysis.

The time series material is illustrated with output produced by Statgraphics, a statistical software package that is highly interactive and has good features for testing and comparing models, including a parallel-model forecasting procedure that I designed many years ago.

The material on multivariate data analysis and linear regression is illustrated with output produced by RegressIt, a free Excel add-in which I also designed. However, these notes are platform-independent. Any statistical software package ought to provide the analytical capabilities needed for the various topics covered here.

statistics · time_series · forecasting · research · data_science · 5_stars

Tue Oct 16 16:59:54 2018 * · permalink

http://people.duke.edu/~rnau/411home.htm

How to Use Correlation to Understand the Relationship Between Variables

There may be complex and unknown relationships between the variables in your dataset.

It is important to discover and quantify the degree to which variables in your dataset are dependent upon each other. This knowledge can help you better prepare your data to meet the expectations of machine learning algorithms, such as linear regression, whose performance will degrade with the presence of these interdependencies.

In this tutorial, you will discover that correlation is the statistical summary of the relationship between variables and how to calculate it for different types of variables and relationships.

After completing this tutorial, you will know:

How to calculate a covariance matrix to summarize the linear relationship between two or more variables.
How to calculate the Pearson’s correlation coefficient to summarize the linear relationship between two variables.
How to calculate the Spearman’s correlation coefficient to summarize the monotonic relationship between two variables.
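The three quantities listed above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the tutorial's own code; the helper names and the toy data are assumptions. Spearman's coefficient is computed here as Pearson's coefficient applied to average ranks.

```python
import math

def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson(x, y):
    """Pearson's r: covariance scaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy data (hypothetical): y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

print(covariance(x, y))
print(pearson(x, y))
print(spearman(x, y))
```

On this data Spearman's coefficient is 1 because the relationship is perfectly monotonic, while Pearson's coefficient falls below 1 because the relationship is not a straight line.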

statistics · dataset · tutorial

Sun Oct 14 20:51:42 2018 * · permalink

https://machinelearningmastery.com/how-to-use-correlation-to-understand-the-relationship-between-variables/

ROC curves calculator

A receiver operating characteristic (ROC) is a graph that illustrates the performance of a binary classifier as its discrimination threshold (cutoff) is changed.

The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various cutoff settings. The true positive rate is also known as sensitivity; the false positive rate is also known as the fall-out and is calculated as (1 - specificity).

The ROC curve is thus a plot of the true positive rate (TPR) versus the false positive rate (FPR). It can also be generated by plotting the cumulative distribution function (the area under the probability distribution from −∞ to the discrimination threshold) of the detection probability on the y-axis against the cumulative distribution function of the false-alarm probability on the x-axis.
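The construction described above can be sketched as follows: sweep the cutoff over the observed scores, record one (FPR, TPR) point per cutoff, and integrate with the trapezoidal rule. This is a minimal sketch, not the linked calculator's code; the function names, labels, and scores are made up for illustration.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, from the strictest cutoff to the loosest."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # cutoff above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # An item is predicted positive when its score reaches the cutoff.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy classifier output (hypothetical): 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
print(points)
print(auc(points))
```

With these toy values, eight of the nine positive/negative pairs are ranked correctly, so the area under the curve is 8/9.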

statistics · web · math · science · data_science

Sat Jul 7 10:43:07 2018 * · permalink

https://kennis-research.shinyapps.io/ROC-Curves/

Basketball - About Box Plus/Minus (BPM)

Box Plus/Minus (BPM) is a box score-based metric for evaluating basketball players' quality and contribution to the team. It is the latest version of a stat previously called Advanced Statistical Plus/Minus; it is NOT a version of Adjusted Plus/Minus, which is a play-by-play regression metric.

statistics · basketball · sport · article

Tue Oct 13 05:59:11 2015 * · permalink

http://www.basketball-reference.com/about/bpm.html

NBA Real Plus-Minus

Glossary

GP: Games Played
MPG: Minutes Per Game
ORPM: Player's estimated on-court impact on team offensive performance, measured in points scored per 100 offensive possessions
DRPM: Player's estimated on-court impact on team defensive performance, measured in points allowed per 100 defensive possessions
RPM: Player's estimated on-court impact on team performance, measured in net point differential per 100 offensive and defensive possessions. RPM takes into account teammates, opponents and additional factors
WAR: The estimated number of team wins attributable to each player, based on RPM

statistics · basketball · sport · article

Tue Oct 13 05:58:35 2015 * · permalink

http://espn.go.com/nba/statistics/rpm/_/sort/RPM

Journal Metrics: Research analytics redefined | Home

Elsevier introduces IPP, SNIP & SJR: A new perspective in journal metrics for researchers and publishers

research · science · paper · statistics · work

Thu Jul 24 19:03:36 2014 · permalink

http://www.journalmetrics.com/

OpenHub

Discover, Track and Compare Open Source.

web · statistics

Fri Jul 21 14:49:01 2006 · permalink

https://www.openhub.net/