PCA is a linear dimensionality reduction technique. Many non-linear dimensionality reduction techniques exist, but linear methods are more mature, if more limited.
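To make "linear" concrete, here is a minimal sketch in plain Python that finds only the first principal component and projects the data onto it. It uses power iteration on the covariance matrix rather than the SVD or full eigendecomposition that libraries typically use; the function names, the toy data, and the iteration count are all assumptions made for illustration.

```python
import math

def first_principal_component(rows, iterations=200):
    """Estimate the direction of greatest variance by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Assumption: the all-ones start vector is not orthogonal to the answer,
    # and the data are not all identical (otherwise the norm below is zero).
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(rows, means, v):
    """Reduce each row to one number: its coordinate along direction v."""
    return [sum((r[j] - means[j]) * v[j] for j in range(len(v))) for r in rows]

# Toy data lying close to the line y = 2x (hypothetical values).
rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]
means, direction = first_principal_component(rows)
scores = project(rows, means, direction)
```

The projection is a dot product with a fixed direction, which is what makes the method linear: it can capture a straight-line trend like the one above, but not a curved one.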

Sloc, Cloc and Code: scc is a very fast, accurate code counter with complexity calculations and COCOMO estimates, written in pure Go.

Publishing a paper in academia is challenging, stimulating, and a bit baffling. Challenging because the research might fail. Stimulating because research may start assuming one outcome and finish with a totally different one. Baffling because after the paper is written and ready, I have to find it a home for

AutoOut is an automated outlier detection and treatment tool that lets you build more accurate models without writing a single line of code. With its simple, easy-to-use interface you can detect and treat outliers in your dataset, which can help improve your final model.

"Local git statistics including GitHub-like contributions calendars."

In this post we’ll explore how we can derive logistic regression from Bayes’ Theorem. Starting with Bayes’ Theorem we’ll work our way to computing the log odds of our problem and then arrive at the inverse logit function. After reading this post you’ll have a much stronger intuition for how logistic
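The route described above can be sketched in a few lines. The notation here ($H$ for the hypothesis, $D$ for the observed data, $x$ for the features, $\beta$ for the coefficients) is an assumption for illustration and may not match the post's own symbols.

```latex
\begin{align*}
P(H \mid D) &= \frac{P(D \mid H)\,P(H)}{P(D)}
  && \text{(Bayes' Theorem)} \\
\frac{P(H \mid D)}{P(\lnot H \mid D)}
  &= \frac{P(D \mid H)}{P(D \mid \lnot H)} \cdot \frac{P(H)}{P(\lnot H)}
  && \text{(posterior odds; } P(D) \text{ cancels)} \\
\log \frac{P(H \mid D)}{1 - P(H \mid D)}
  &= \log \frac{P(D \mid H)}{P(D \mid \lnot H)} + \log \frac{P(H)}{P(\lnot H)}
  && \text{(log odds)} \\
\log \frac{P(H \mid D)}{1 - P(H \mid D)} &= \beta_0 + \beta^{\top} x = z
  && \text{(assume the log odds are linear in } x\text{)} \\
P(H \mid D) &= \frac{1}{1 + e^{-z}}
  && \text{(inverse logit)}
\end{align*}
```

The only modelling assumption is the fourth line; everything else follows from Bayes' Theorem and algebra.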

In the midst of the deep learning hype, p-values might not be the hottest topic in data science. However, association mapping remains a fundamental tool to justify and underpin scientific conclusions. Inspired by an approach for time series classification based on predictive subsequences (i.e., shapelets [1]), we developed S3M, a method that identifies short time series subsequences that are statistically associated with a class or phenotype while tackling the multiple hypothesis testing problem.

When you first start reading about Brave, you learn that it is a new reward system for publishers and a new advertising model.

You may have wondered how many publishers there are, and who they are.

batgrowth.com scrapes the web to list websites that are BAT publishers.

You will learn in this post how to:

- decompose double-seasonal time series
- detrend time series
- model and forecast double-seasonal time series with trend
- use two types of simple regression trees
- set important hyperparameters related to regression trees

This web site contains notes and materials for an advanced elective course on statistical forecasting that is taught at the Fuqua School of Business, Duke University. It covers linear regression and time series forecasting models as well as general principles of thoughtful data analysis.

The time series material is illustrated with output produced by Statgraphics, a statistical software package that is highly interactive and has good features for testing and comparing models, including a parallel-model forecasting procedure that I designed many years ago.

The material on multivariate data analysis and linear regression is illustrated with output produced by RegressIt, a free Excel add-in which I also designed. However, these notes are platform-independent. Any statistical software package ought to provide the analytical capabilities needed for the various topics covered here.

A receiver operating characteristic (ROC) is a graph that illustrates the performance of a binary classifier as its discrimination threshold (cutoff) is changed.

The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various cutoff settings. The true positive rate is also known as sensitivity. The false positive rate is also known as fall-out and can be calculated as 1 - specificity.

The ROC curve is thus a plot of the true positive rate (TPR) against the false positive rate (FPR). It can also be generated by plotting the cumulative distribution function (the area under the probability distribution from −∞ to the discrimination threshold) of the detection probability on the y-axis against the cumulative distribution function of the false-alarm probability on the x-axis.
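The construction described above can be sketched in plain Python: sweep the cutoff from high to low, and at each setting record the FPR and TPR. The function names and the toy scores and labels are assumptions made for illustration. In practice a library routine such as scikit-learn's `roc_curve` would normally be used.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per cutoff, sweeping from high to low.

    An example is predicted positive when its score is >= the cutoff.
    Labels are 1 for positive and 0 for negative.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]  # cutoff above every score: nothing predicted positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve, using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)  # 8/9 for this toy data
```

The curve always runs from (0, 0) to (1, 1). A classifier no better than chance sits near the diagonal, and a better one bows toward the top-left corner.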

Elsevier introduces IPP, SNIP & SJR: A new perspective in journal metrics for researchers and publishers

Discover, Track and Compare Open Source.