PCA is a linear dimensionality reduction technique. Many non-linear dimensionality reduction techniques exist, but linear methods are more mature, if more limited.
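Here is a minimal sketch of what "linear" means in this case. PCA can be computed from the singular value decomposition of the mean-centered data matrix. The principal directions are the right singular vectors, and each sample is projected onto them. The function name `pca` and the toy data are illustrative assumptions, not taken from the linked post.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int):
    """Project the rows of X onto the top principal components.

    Returns (scores, components, explained_variance_ratio).
    """
    # PCA is defined on mean-centered features.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    scores = X_centered @ components.T
    # Sample variance along each direction is S**2 / (n - 1).
    explained = S**2 / (X.shape[0] - 1)
    ratio = explained[:n_components] / explained.sum()
    return scores, components, ratio

# Toy data: two strongly correlated columns plus one low-variance noise column.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200), rng.normal(scale=0.1, size=200)])
scores, components, ratio = pca(X, n_components=1)
print(scores.shape, ratio)
```

Because the method is linear, one direction captures almost all of the variance in this toy example. Structure that curves through the data would need one of the non-linear techniques instead.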
Sloc, Cloc and Code: scc is a very fast, accurate code counter with complexity calculations and COCOMO estimates, written in pure Go.
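For context on the COCOMO part, the basic COCOMO model uses a power law to turn a line count into effort and schedule estimates. The sketch below assumes the textbook "organic" project coefficients (a = 2.4, b = 1.05, c = 2.5, d = 0.38). scc's own defaults and its cost calculation may differ, and the function name is hypothetical.

```python
def basic_cocomo_organic(sloc: int) -> tuple[float, float, float]:
    """Basic COCOMO, organic mode (assumed coefficients).

    Returns (effort in person-months, schedule in months, average staff).
    """
    kloc = sloc / 1000                # COCOMO works in thousands of lines
    effort = 2.4 * kloc ** 1.05       # person-months
    schedule = 2.5 * effort ** 0.38   # calendar months
    return effort, schedule, effort / schedule

effort, schedule, staff = basic_cocomo_organic(1000)
print(f"{effort:.2f} person-months over {schedule:.2f} months with {staff:.2f} people")
```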
Publishing a paper in academia is challenging, stimulating, and a bit baffling. Challenging because the research might fail. Stimulating because research may start assuming one outcome and finish with a totally different one. Baffling because after the paper is written and ready, I have to find it a home for
AutoOut is an automated outlier detection and treatment tool that helps you build more accurate models without writing a single line of code. With its simple, easy-to-use interface, you can detect and treat outliers in your dataset, which can help improve your final model.
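The description does not say how AutoOut detects or treats outliers. As a hedged illustration of what "detect and treat" commonly means, the sketch below flags values outside Tukey's interquartile-range fences and treats them by capping (clipping) them to those fences. The function names and the fence multiplier `k = 1.5` are assumptions for the example.

```python
import numpy as np

def iqr_fences(values, k=1.5):
    """Tukey fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def detect_and_cap(values, k=1.5):
    """Flag points outside the fences (detection) and cap them at the nearest fence (treatment)."""
    values = np.asarray(values, dtype=float)
    low, high = iqr_fences(values, k)
    is_outlier = (values < low) | (values > high)
    capped = np.clip(values, low, high)
    return is_outlier, capped

data = [10, 12, 11, 13, 12, 11, 95]
is_outlier, capped = detect_and_cap(data)
print(is_outlier, capped)
```

Capping keeps the row in the dataset while limiting its influence on the model. Dropping the row is the other common treatment.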
"Local git statistics including GitHub-like contributions calendars."
In this post we’ll explore how we can derive logistic regression from Bayes’ Theorem. Starting with Bayes’ Theorem, we’ll work our way to computing the log odds of our problem and then arrive at the inverse logit function. After reading this post you’ll have a much stronger intuition for how logistic
In the midst of the deep learning hype, p-values might not be the hottest topic in data science. However, association mapping remains a fundamental tool to justify and underpin scientific conclusions. Inspired by an approach for time series classification based on predictive subsequences (i.e., shapelets [1]), we developed S3M, a method that identifies short time series subsequences that are statistically associated with a class or phenotype while tackling the multiple hypothesis testing problem.
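To make the idea concrete, here is a heavily simplified sketch. It is not S3M's actual algorithm. Every length-`m` subsequence is treated as a candidate, and a series "contains" a candidate when some window lies within a Euclidean distance threshold of it. Each candidate's presence or absence is tested against the class with a two-sided Fisher's exact test, and multiple testing is handled with a plain Bonferroni correction. All names, the distance threshold, and the synthetic data are assumptions for illustration.

```python
import math
import numpy as np

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    def prob(x):  # hypergeometric probability of the table whose top-left cell is x
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)
    p_obs = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    # Sum every table no more likely than the observed one (tiny tolerance for float ties).
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9)))

def significant_subsequences(series_list, labels, length, threshold, alpha=0.05):
    """Return (hits, cutoff): candidates whose presence is associated with the class label."""
    windows = [np.lib.stride_tricks.sliding_window_view(s, length) for s in series_list]
    candidates = [w for ws in windows for w in ws]  # every subsequence is one hypothesis
    cutoff = alpha / len(candidates)                 # Bonferroni correction
    n_pos = sum(y == 1 for y in labels)
    n_neg = len(labels) - n_pos
    hits = []
    for pattern in candidates:
        # A series "contains" the pattern if some window is within `threshold` of it.
        present = [np.linalg.norm(ws - pattern, axis=1).min() <= threshold for ws in windows]
        a = int(sum(bool(p) and y == 1 for p, y in zip(present, labels)))
        b = int(sum(bool(p) and y == 0 for p, y in zip(present, labels)))
        p_value = fisher_two_sided(a, b, n_pos - a, n_neg - b)
        if p_value <= cutoff:
            hits.append((pattern, p_value))
    return hits, cutoff

# Synthetic data: class-1 series carry a spike at a random position; class-0 series are noise.
rng = np.random.default_rng(0)
spike = np.array([0.0, 1.0, 3.0, 1.0, 0.0])
series_list, labels = [], []
for k in range(40):
    s = rng.normal(scale=0.05, size=20)
    if k < 20:
        pos = int(rng.integers(0, 16))
        s[pos:pos + 5] += spike
    series_list.append(s)
    labels.append(1 if k < 20 else 0)

hits, cutoff = significant_subsequences(series_list, labels, length=5, threshold=0.5)
print(len(hits), "significant candidates; Bonferroni cutoff", cutoff)
```

Plain Bonferroni becomes very conservative as the number of candidates grows, and that growth is exactly the multiple hypothesis problem the method is designed to address.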
When you first start reading about Brave, you learn that it is a new reward system for publishers and a new advertising model.
You may have wondered how many publishers there are, and who they are.
batgrowth.com scrapes the web to list websites that are BAT publishers.
You will learn in this post how to:
- decompose double-seasonal time series
- detrend time series
- model and forecast double-seasonal time series with trend
- use two types of simple regression trees
- set important hyperparameters of regression trees
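The first two steps can be sketched with a naive additive decomposition. First, remove a least-squares linear trend. Then estimate the shorter seasonal profile (for example, daily) by averaging over its phase, and estimate the longer one (for example, weekly) from what remains. The sketch assumes hourly data with periods 24 and 168 and at least one full long period. It is not the post's own code, which may use different tools, and it leaves out the regression-tree modeling.

```python
import numpy as np

def detrend_linear(y):
    """Subtract a least-squares straight line; return (detrended, trend)."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, deg=1)
    trend = slope * t + intercept
    return y - trend, trend

def seasonal_profile(x, period):
    """Mean of x at each phase of `period`, repeated back to the length of x."""
    phase = np.arange(len(x)) % period
    means = np.array([x[phase == p].mean() for p in range(period)])
    return means[phase]

def decompose_double_seasonal(y, short_period=24, long_period=168):
    """Naive additive split: y = trend + short season + long season + remainder."""
    detrended, trend = detrend_linear(np.asarray(y, dtype=float))
    season_short = seasonal_profile(detrended, short_period)               # e.g. daily
    season_long = seasonal_profile(detrended - season_short, long_period)  # e.g. weekly
    remainder = detrended - season_short - season_long
    return trend, season_short, season_long, remainder

# Four weeks of synthetic hourly data: trend + daily cycle + weekly cycle + noise.
rng = np.random.default_rng(1)
t = np.arange(4 * 168)
y = (0.01 * t + np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 168)
     + rng.normal(scale=0.05, size=t.size))
trend, daily, weekly, remainder = decompose_double_seasonal(y)
```

The detrended series (`y - trend`) is what a regression tree would then be fitted on, since trees cannot extrapolate a trend beyond the training range.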
This web site contains notes and materials for an advanced elective course on statistical forecasting that is taught at the Fuqua School of Business, Duke University. It covers linear regression and time series forecasting models as well as general principles of thoughtful data analysis.
The time series material is illustrated with output produced by Statgraphics, a statistical software package that is highly interactive and has good features for testing and comparing models, including a parallel-model forecasting procedure that I designed many years ago.
The material on multivariate data analysis and linear regression is illustrated with output produced by RegressIt, a free Excel add-in which I also designed. However, these notes are platform-independent. Any statistical software package ought to provide the analytical capabilities needed for the various topics covered here.
There may be complex and unknown relationships between the variables in your dataset.
It is important to discover and quantify the degree to which variables in your dataset are dependent upon each other. This knowledge can help you better prepare your data to meet the expectations of machine learning algorithms, such as linear regression, whose performance can degrade in the presence of these interdependencies.
In this tutorial, you will discover that correlation is the statistical summary of the relationship between variables, and how to calculate it for different types of variables and relationships.
After completing this tutorial, you will know:
- How to calculate a covariance matrix to summarize the linear relationship between two or more variables.
- How to calculate the Pearson’s correlation coefficient to summarize the linear relationship between two variables.
- How to calculate the Spearman’s correlation coefficient to summarize the monotonic relationship between two variables.
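Below is a minimal NumPy sketch of the three quantities, assuming sample statistics (`ddof=1`). Covariance comes from `np.cov`. Pearson's r is the covariance divided by the product of the standard deviations. Spearman's rho is Pearson's r computed on ranks, with tied values given their average rank. The helper names are hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson's r: sample covariance divided by the product of sample standard deviations."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cov_xy = np.cov(x, y)[0, 1]  # np.cov uses ddof=1 by default
    return cov_xy / (x.std(ddof=1) * y.std(ddof=1))

def average_ranks(v):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    v = np.asarray(v, dtype=float)
    ranks = np.empty(len(v))
    ranks[np.argsort(v, kind="mergesort")] = np.arange(1, len(v) + 1)
    for value in np.unique(v):
        tied = v == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3  # monotonic but not linear
cov_matrix = np.cov(np.column_stack([x, y]), rowvar=False)  # 2x2; columns are variables
print(cov_matrix)
print(pearson(x, y), spearman(x, y))
```

For this monotonic but curved relationship, Spearman's rho is exactly 1, while Pearson's r falls short of 1 because the relationship is not linear.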
A receiver operating characteristic (ROC) curve is a graph that illustrates the performance of a binary classifier as its discrimination threshold (cutoff) is varied.
The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various cutoff settings. The true positive rate is also known as sensitivity, and the false positive rate is known as the fall-out, calculated as (1 - specificity).
The ROC curve is thus a plot of the true positive rate (TPR) versus the false positive rate (FPR). Equivalently, it can be generated by plotting the cumulative distribution function (the area under the probability distribution from -∞ to the discrimination threshold) of the detection probability on the y-axis against the cumulative distribution function of the false-alarm probability on the x-axis.
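The threshold sweep can be sketched directly. For each cutoff, predict positive when the score is at least the cutoff, then record TPR (sensitivity) and FPR (1 - specificity). The area under the resulting curve follows from the trapezoidal rule. The function names and the toy labels and scores are assumptions for illustration.

```python
import numpy as np

def roc_points(y_true, scores):
    """(fpr, tpr, thresholds) for a cutoff at every distinct score, plus +inf (nothing positive)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))  # high to low
    pos, neg = scores[y_true], scores[~y_true]
    tpr = np.array([(pos >= t).mean() for t in thresholds])  # sensitivity
    fpr = np.array([(neg >= t).mean() for t in thresholds])  # fall-out = 1 - specificity
    return fpr, tpr, thresholds

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule (fpr is non-decreasing here)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr, thresholds = roc_points(y_true, scores)
print(fpr, tpr, auc_trapezoid(fpr, tpr))
```

Lowering the cutoff can only add predicted positives, so both rates rise monotonically from (0, 0) to (1, 1). A random classifier traces the diagonal, which has an area of 0.5.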
Box Plus/Minus (BPM) is a box score-based metric for evaluating basketball players' quality and contribution to the team. It is the latest version of a stat previously called Advanced Statistical Plus/Minus; it is NOT a version of Adjusted Plus/Minus, which is a play-by-play regression metric.
Glossary
- GP: Games Played
- MPG: Minutes Per Game
- ORPM: Player's estimated on-court impact on team offensive performance, measured in points scored per 100 offensive possessions
- DRPM: Player's estimated on-court impact on team defensive performance, measured in points allowed per 100 defensive possessions
- RPM: Player's estimated on-court impact on team performance, measured in net point differential per 100 offensive and defensive possessions. RPM takes into account teammates, opponents and additional factors
- WAR: The estimated number of team wins attributable to each player, based on RPM
Elsevier introduces IPP, SNIP & SJR: A new perspective in journal metrics for researchers and publishers
Discover, Track and Compare Open Source.