127 private links
Provides an implementation of today's most used tokenizers, with a focus on performance and versatility.
Main features:
- Train new vocabularies and tokenize, using today's most used tokenizers.
- Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server's CPU.
- Easy to use, but also extremely versatile.
- Designed for research and production.
- Normalization comes with alignments tracking. It's always possible to get the part of the original sentence that corresponds to a given token.
- Does all the pre-processing: Truncate, Pad, add the special tokens your model needs.
CleverCSV provides a drop-in replacement for the Python csv package with improved dialect detection for messy CSV files. It also provides a handy command line tool that can standardize a messy file or generate Python code to import it.
Karate Club is an unsupervised machine learning extension library for NetworkX.
Karate Club consists of state-of-the-art methods to do unsupervised learning on graph structured data. To put it simply it is a Swiss Army knife for small-scale graph mining research. First, it provides network embedding techniques at the node and graph level. Second, it includes a variety of overlapping and non-overlapping commmunity detection methods. Implemented methods cover a wide range of network science (NetSci, Complenet), data mining (ICDM, CIKM, KDD), artificial intelligence (AAAI, IJCAI) and machine learning (NeurIPS, ICML, ICLR) conferences, workshops, and pieces from prominent journals.
A fast, efficient universal vector embedding utility package.
Collections: double linked list, deque, RBtree, channels.
A collection of tkinter made widgets. Contribute to Dogeek/tkinter-pp development by creating an account on GitHub.
TextDistance, a python library for comparing distance between two or more sequences by many algorithms.
Features:
- 30+ algorithms
- Pure python implementation
- Simple usage
- More than two sequences comparing
- Some algorithms have more than one implementation in one class.
- Optional
numpy
usage for maximum speed.
Python3 implementation of the Schwartz-Hearst algorithm for extracting abbreviation-definition pairs.
Super fast list of dicts to pre-formatted tables converter library for Python 2/3.
Arcade is an easy-to-learn Python library for creating 2D video games.
It is ideal for people learning to program, or developers that want to code a 2D game without learning a complex framework.
TOM (TOpic Modeling) is a Python 3 library for topic modeling and browsing, licensed under the MIT license.
Its objective is to allow for an efficient analysis of a text corpus from start to finish, via the discovery of latent topics. To this end, TOM features functions for preparing and vectorizing a text corpus. It also offers a common interface for two topic models (namely LDA using either variational inference or Gibbs sampling, and NMF using alternating least-square with a projected gradient method), and implements three state-of-the-art methods for estimating the optimal number of topics to model a corpus. What is more, TOM constructs an interactive Web-based browser that makes it easy to explore a topic model and the related corpus.
pyLDAvis
is a python library for interactive topic model visualization. It is a port of the fabulous R package by Carson Sievert and Kenny Shirley. They did the hard work of crafting an effective visualization. pyLDAvis
makes it easy to use the visualiziation from Python and, in particular, Jupyter notebooks.
To learn more about the method behind the visualization, it is possible to read the original paper explaining it.
This notebook provides a quick overview of how to use pyLDAvis
.
SMACH is a task-level architecture for rapidly creating complex robot behavior. At its core, SMACH is a ROS-independent Python library to build hierarchical state machines. SMACH is a new library that takes advantage of very old concepts in order to quickly create robust robot behavior with maintainable and modular code.
Karpet is a tiny library with just a few dependencies for fetching coins/tokens metrics data the internet.
It can provide following data:
- coin/token historical price data (no limits)
- google trends for the given list of keywords (longer period than official API)
- twitter scraping for the given keywords (no limits)
- much more info about crypto coins/tokens (no rate limits)
A community driven open source initiative to create a JSON based standard for resumes.
The set of all points closest to a given point in a point set than to all other points in the set is an interesting spatial structure called a Voronoi Polygon for the point. The union of all the Voronoi polygons for a point set is called Voronoi Tessellation.
Many applications have been found based on the neighbourhood information provided by this tessellation. The dual of Voronoi tessellation is Delaunay Tessellation, also referred to as Delaunay Triangulation or Triangulated Irregular Network (TIN), which are lines drawn between points where their Voronoi polygons have an edge in common.
Delaunay tessellation is the most fundamental neighbourhood structure because many other important neighbourhood structures, such as, Gabriel Graph, Relative Neighbourhood Graph and Minimal Spanning Tree, can be derived from it.
License music from the only royalty free audio library dedicated to retro music.