Name Collision of the Year: Vector | Crunchy Data Blog

I can’t get through a Zoom call, a conference talk, or an afternoon scroll through LinkedIn without hearing about vectors. Do you feel like the term vector is everywhere this year? It is. Vector actually means several different things, and it's confusing. Vector can mean AI data, GIS locations, digital graphics, a type of query optimization, and more. The terms and uses are related, sure. They all stem from the same original concept. However, their practical applications are quite different.

So “Vector” is my choice for this year’s name collision of the year.

blog · machine_learning · article

Tue Dec 31 07:16:45 2024 * · permalink

https://www.crunchydata.com/blog/name-collision-of-the-year-vector

silero-models: Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

machine_learning · text-processing · AI · opensource · source_code

Mon Jun 20 20:19:01 2022 * · permalink

https://github.com/snakers4/silero-models

Semantle - Guess the secret word

Each guess must be a word. Semantle will tell you how semantically similar it thinks your word is to the secret word. Unlike that other word game, it's not about the spelling; it's about the meaning. The similarity value comes from Word2vec.
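Word2vec similarity between two words is usually computed as the cosine similarity of their embedding vectors. The sketch below assumes that is the measure in play here; the three-dimensional `toy_vectors` are hypothetical stand-ins for real Word2vec embeddings, which have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy "embeddings", made up for illustration only.
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

secret = "cat"
for guess in ("dog", "car"):
    score = cosine_similarity(toy_vectors[guess], toy_vectors[secret])
    print(f"{guess}: {score:.3f}")
```

With these made-up vectors, "dog" scores closer to "cat" than "car" does, which is the kind of feedback the game gives after each guess.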

games · online · free · machine_learning

Tue Jun 7 17:21:52 2022 * · permalink

https://semantle.com/

faiss - A library for efficient similarity search and clustering of dense vectors

machine_learning · library · software · source_code · opensource · math

Tue Dec 14 21:25:26 2021 * · permalink

https://github.com/facebookresearch/faiss

haystack - An open source NLP framework that leverages Transformer models

library · machine_learning · NLP · deep_learning · software · opensource · source_code

Fri Dec 10 06:08:35 2021 * · permalink

https://github.com/deepset-ai/haystack

Logistic Regression from scratch

Learn how to use the Logistic Regression model to classify unseen data.
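As a rough idea of what "from scratch" can look like, here is a minimal sketch of binary logistic regression trained with batch gradient descent, using only the standard library. It is not the code from the linked tutorial; the function names, learning rate, epoch count, and toy data are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(xs, ys, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(xs), len(xs[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # derivative of the log loss w.r.t. the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Classify x as 1 if the predicted probability is at least 0.5."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Toy one-feature data set (hypothetical): small values are class 0.
xs = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
ys = [0, 0, 0, 1, 1, 1]
weights, bias = train(xs, ys)
print(predict(weights, bias, [0.5]), predict(weights, bias, [4.5]))
```

The two points passed to `predict` are not in the training set, matching the "unseen data" use case in the description.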

tutorial · machine_learning · algorithm · article

Thu Jun 25 20:33:38 2020 * · permalink

https://philippmuens.com/logistic-regression-from-scratch/

Unsupervised Translation of Programming Languages

A transcompiler, also known as source-to-source translator, is a system that converts source code from a high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the source code abstract syntax tree. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive.
Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy.
Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin.

research · programming · machine_learning · neural_networks · paper

Tue Jun 9 23:34:08 2020 · permalink

https://arxiv.org/abs/2006.03511

active-semi-supervised-clustering - Active semi-supervised clustering algorithms for scikit-learn

Semi-supervised clustering

Seeded-KMeans
Constrained-KMeans
COP-KMeans
Pairwise constrained K-Means (PCK-Means)
Metric K-Means (MK-Means)
Metric pairwise constrained K-Means (MPCK-Means)
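To give a feel for the simplest method in the list above, here is a minimal sketch of Seeded-KMeans as it is commonly described: the initial centroids are the means of a few labeled seed points, and ordinary k-means iterations follow. This is not the library's API; `seeded_kmeans`, its parameters, and the toy data are hypothetical.

```python
def seeded_kmeans(points, seeds, n_iter=20):
    """Cluster `points`, starting from centroids computed from `seeds`.

    points: list of equal-length tuples.
    seeds:  dict mapping a cluster id to a non-empty list of labeled points.
    """
    def mean(pts):
        return tuple(sum(coord) / len(pts) for coord in zip(*pts))

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Seeding step: one centroid per labeled group.
    centroids = {k: mean(v) for k, v in seeds.items()}
    assignment = []
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        assignment = [
            min(centroids, key=lambda k: sqdist(p, centroids[k]))
            for p in points
        ]
        # Recompute each centroid from its current members.
        for k in centroids:
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = mean(members)
    return assignment, centroids


# Hypothetical toy data: two well-separated groups, one seed each.
points = [(0, 0), (0, 1), (1, 0), (9, 9), (10, 9), (9, 10)]
seeds = {0: [(0, 0)], 1: [(9, 9)]}
assignment, centroids = seeded_kmeans(points, seeds)
print(assignment)
```

In this sketch the seeds only influence initialization and may be reassigned later; Constrained-KMeans, as usually described, differs by keeping the seed labels fixed throughout.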

Active learning of pairwise clustering

Explore & Consolidate
Min-max
Normalized point-based uncertainty (NPU) method

machine_learning · algorithm · source_code · library · coding_lang:python · opensource

Sun Apr 26 19:22:47 2020 * · permalink

https://github.com/datamole-ai/active-semi-supervised-clustering

The Illustrated FixMatch for Semi-Supervised Learning

Deep Learning has shown very promising results in the field of Computer Vision. But when applying it to practical domains such as medical imaging, lack of labeled data is a major challenge.

In practical settings, labeling data is a time-consuming and expensive process. Though you may have a lot of images, only a small portion of them can be labeled due to resource constraints. In such settings, how can we leverage the remaining unlabeled images along with the labeled images to improve the performance of our model? The answer is semi-supervised learning.

FixMatch is a recent semi-supervised approach by Sohn et al. from Google Brain that improved the state of the art in semi-supervised learning (SSL). It is a simpler combination of previous methods such as UDA and ReMixMatch. In this post, we will understand the concept of FixMatch and also see how it got 78% median accuracy and 84% maximum accuracy on CIFAR-10 with just 10 labeled images.
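The core of FixMatch, as I understand it from the paper, is a confidence-thresholded pseudo-label loss on unlabeled images: the model's prediction on a weakly augmented image becomes a hard label only if it is confident enough, and the prediction on a strongly augmented version of the same image is trained toward that label. The sketch below illustrates just that loss term on plain probability lists; the function name is hypothetical, and the 0.95 threshold is an assumption taken from my recollection of the paper's default.

```python
import math


def fixmatch_unlabeled_loss(weak_probs, strong_probs, threshold=0.95):
    """Average masked cross-entropy over a batch of unlabeled examples.

    weak_probs[i]:   class probabilities for the weakly augmented image i.
    strong_probs[i]: class probabilities for the strongly augmented image i.
    """
    if not weak_probs:
        return 0.0
    total = 0.0
    for p_weak, p_strong in zip(weak_probs, strong_probs):
        confidence = max(p_weak)
        if confidence >= threshold:
            # Confident prediction: use its argmax as a hard pseudo-label.
            pseudo_label = p_weak.index(confidence)
            # Clamp to avoid log(0) on degenerate predictions.
            total += -math.log(max(p_strong[pseudo_label], 1e-12))
        # Low-confidence examples contribute nothing to the loss.
    return total / len(weak_probs)


# Hypothetical batch: only the first example passes the threshold.
weak = [[0.97, 0.02, 0.01], [0.50, 0.30, 0.20]]
strong = [[0.80, 0.10, 0.10], [0.20, 0.50, 0.30]]
loss = fixmatch_unlabeled_loss(weak, strong)
print(loss)
```

In a full training loop this term would be added to the usual supervised cross-entropy on the labeled images; the augmentations and the model itself are left out of the sketch.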

machine_learning · research · techniques · science · article

Fri Apr 3 19:24:52 2020 * · permalink

https://amitness.com/2020/03/fixmatch-semi-supervised/

Deep Learning Algorithms - The Complete Guide | AI Summer

All the essential Deep Learning Algorithms you need to know including models used in Computer Vision and Natural Language Processing.

machine_learning · deep_learning · tutorial · docs · AI

Sun Mar 8 08:19:55 2020 * · permalink

https://theaisummer.com/Deep-Learning-Algorithms/

Datasaur.ai - The AI Toolbox

Manage your entire data labeling workflow with a single tool.

machine_learning · homepage · software

Sat Mar 7 21:54:40 2020 * · permalink

https://datasaur.ai/

PRML algorithms implemented in Python

Python code implementing algorithms described in Bishop's book "Pattern Recognition and Machine Learning"

machine_learning · algorithm · programming · coding_lang:python · software · source_code

Tue Mar 3 20:46:09 2020 * · permalink

https://github.com/ctgk/PRML

orange-scripts - Scripts for the Python Script Orange widget

Scripts for the Python Script Orange widget.

source_code · opensource · software · coding_lang:python · machine_learning · library

Thu Jan 30 16:37:22 2020 * · permalink

https://github.com/biolab/orange-scripts

Notebook on conversational model

In this task we will try our first approach at training a conversational model.

machine_learning · text_generation · AI · notebook · NLP

Tue Jan 21 20:45:29 2020 * · permalink

https://colab.research.google.com/drive/1iHcQ8_K0cfRE3v8QX6FMKAzdSSGtf5IX

The Secretive Company That Might End Privacy as We Know It

Clearview AI devised a groundbreaking facial recognition app. You take a picture of a person, upload it and get to see public photos of that person, along with links to where those photos appeared. The system — whose backbone is a database of more than three billion images that Clearview claims to have scraped from Facebook, YouTube, Venmo and millions of other websites — goes far beyond anything ever constructed by the United States government or Silicon Valley giants.

machine_learning · article · AI · privacy · issue

Sun Jan 19 12:45:04 2020 * · permalink

https://dnyuz.com/2020/01/18/the-secretive-company-that-might-end-privacy-as-we-know-it/

A distributional code for value in dopamine-based reinforcement learning

Analyses of single-cell recordings from mouse ventral tegmental area are consistent with a model of reinforcement learning in which the brain represents possible future rewards not as a single mean of stochastic outcomes, as in the canonical model, but instead as a probability distribution.

article · machine_learning · algorithm · research · neural_networks

Sat Jan 18 03:47:05 2020 * · permalink

https://www.nature.com/articles/s41586-019-1924-6

karateclub - Unsupervised machine learning library for graphs

Karate Club is an unsupervised machine learning extension library for NetworkX.

Karate Club consists of state-of-the-art methods to do unsupervised learning on graph structured data. To put it simply, it is a Swiss Army knife for small-scale graph mining research. First, it provides network embedding techniques at the node and graph level. Second, it includes a variety of overlapping and non-overlapping community detection methods. Implemented methods cover a wide range of network science (NetSci, Complenet), data mining (ICDM, CIKM, KDD), artificial intelligence (AAAI, IJCAI) and machine learning (NeurIPS, ICML, ICLR) conferences, workshops, and pieces from prominent journals.

graph · library · machine_learning · software · opensource · source_code

Sun Jan 12 17:41:23 2020 * · permalink

https://github.com/benedekrozemberczki/karateclub

A list of beginner-friendly NLP projects

This article is designed to serve as a directory of software projects built on NLP (natural language processing), that anyone — even someone without ML experience — can build.

article · machine_learning · NLP

Wed Dec 25 19:36:48 2019 * · permalink

https://towardsdatascience.com/a-list-of-beginner-friendly-nlp-projects-using-pre-trained-models-dc4768b4bec0

Deepfake Bot Submissions to Federal Public Comment Websites Cannot Be Distinguished from Human Submissions | Technology Science

Federal public comment websites currently are unable to detect Deepfake Text once submitted. I created a computer program (a bot) that generated and submitted 1,001 deepfake comments regarding a Medicaid reform waiver to a federal public comment website, stopping submission when these comments comprised more than half of all submitted comments. I then formally withdrew the bot comments.

article · text_generation · machine_learning · AI · science · tech · society

Fri Dec 20 16:37:18 2019 * · permalink

https://techscience.org/a/2019121801/

Document Clustering with Python

A guide to document clustering with Python

machine_learning · text_mining · research · python · jupyter

Sun Dec 15 15:27:24 2019 * · permalink

http://brandonrose.org/clustering_mobile