Search: [text-processing] - Toolleeo's Links

silero-models - Pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

machine_learning · text-processing · AI · opensource · source_code

Mon Jun 20 20:19:01 2022 * · permalink

https://github.com/snakers4/silero-models

hck - A sharp cut clone

A close-to-drop-in replacement for cut that can use a regex delimiter instead of a fixed string. It also lets you specify the order of the output columns using the same column-selection syntax as cut.
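A minimal Python sketch of the idea (splitting on a regex delimiter and emitting fields in a chosen order). It does not use hck itself; the function name `cut_fields` and its parameters are hypothetical and chosen only for illustration.

```python
import re


def cut_fields(line, delimiter=r"\s+", fields=(1,), out_sep="\t"):
    """Split `line` on a regex delimiter and return the requested
    1-based fields, in the order they were requested."""
    parts = re.split(delimiter, line.strip())
    # Silently skip field numbers that do not exist on this line.
    picked = [parts[i - 1] for i in fields if 1 <= i <= len(parts)]
    return out_sep.join(picked)


# Field 3 is printed before field 1, separated by a tab.
print(cut_fields("alice   30  berlin", fields=(3, 1)))
```

Unlike a fixed-string delimiter, the default `\s+` pattern treats any run of whitespace as a single separator.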

#cli-app · opensource · source_code · text-processing · coding_lang:rust · software

Sat Jul 17 09:10:31 2021 * · permalink

https://github.com/sstadick/hck

glow - Render markdown on the CLI, with pizzazz!

markdown · software · #cli-app · visualization · terminal · source_code · coding_lang:go · text-processing

Fri Jul 10 23:25:04 2020 * · permalink

https://github.com/charmbracelet/glow

structured-text-tools - A list of command line tools for manipulating structured text data

tools · list · text-processing · markdown · csv

Sun May 31 19:49:26 2020 * · permalink

https://github.com/dbohdan/structured-text-tools

brok - Find broken links in text documents

text-processing · web · tools · software · opensource · source_code · coding_lang:haskell

Mon Apr 20 20:05:18 2020 * · permalink

https://github.com/smallhadroncollider/brok

rapidfuzz - Rapid fuzzy string matching in Python and C++ using the Levenshtein Distance
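For reference, the Levenshtein distance can be sketched in plain Python as below. This is the textbook dynamic-programming version, not rapidfuzz's implementation or API; `levenshtein` and `similarity` are hypothetical helper names.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalize the distance to a 0..1 score (1.0 means identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))
```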

opensource · software · library · text_manipulation · text-processing · source_code

Mon Mar 30 20:59:26 2020 * · permalink

https://github.com/rhasspy/rapidfuzz

jc - Serializes the output of command line tools to JSON

This tool serializes the output of popular GNU/Linux command line tools and file types to structured JSON output.

This allows piping of output to tools like jq.
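The general idea can be sketched in Python as follows. This is not jc's code or interface: the sample text is made up to resemble whitespace-separated tabular output, and `table_to_records` is a hypothetical helper.

```python
import json

# Made-up sample that imitates tabular command output.
SAMPLE = """\
Filesystem Size Used Avail
/dev/sda1 50G 20G 30G
tmpfs 2G 0 2G
"""


def table_to_records(text):
    """Turn a whitespace-separated table with a header row into a list
    of dicts keyed by the lower-cased column names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [h.lower() for h in lines[0].split()]
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


records = table_to_records(SAMPLE)
print(json.dumps(records, indent=2))
```

Once the output is JSON, each field can be addressed by name instead of by column position.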

#cli-app · json · opensource · software · command_line · text-processing · source_code

Tue Feb 18 21:07:20 2020 * · permalink

https://github.com/kellyjonbrazil/jc

Headliner - Easy training and deployment of seq2seq models

At Axel Springer, Europe’s largest digital publishing house, we own a lot of news articles from various media outlets such as Welt, Bild, Business Insider and many more. Arguably, the most important part of a news article is its title, and it is not surprising that journalists tend to spend a fair amount of their time coming up with a good one. For this reason, it was an interesting research question for us at Axel Springer AI whether we could create an NLP model that generates quality headlines from Welt news articles (see Figure 1). This could, for example, serve our journalists as inspiration for creating SEO titles, which our journalists often don’t have time for (in fact we’re working together with our colleagues from SPRING and AWS on creating an SEO title generator).

text_mining · data_mining · article · text-processing · text_generation

Thu Feb 6 15:51:19 2020 * · permalink

https://medium.com/axel-springer-tech/headliner-easy-training-and-deployment-of-seq2seq-models-2a26508b4dae

Parsr - Transforms PDF, Documents and Images into Enriched Structured Data

Parsr is a minimal-footprint document (image, PDF) cleaning, parsing and extraction toolchain which generates readily available, organized and usable data for data scientists and developers.

It provides users with a clean, structured and label-enriched information set for ready-to-use applications such as data entry, document analysis automation, archival, and many others.

Currently, Parsr can perform:

Document Hierarchy Regeneration - Words, Lines and Paragraphs
Headings Detection
Table Detection and Reconstruction
Lists Detection
Text Order Detection
Named Entity Recognition (Dates, Percentages, etc)
Key-Value Pair Detection (for the extraction of specific form-based entries)
Page Number Detection
Header-Footer Detection
Link Detection
Whitespace Removal

framework · software · opensource · source_code · coding_lang:python · text_manipulation · text-processing

Tue Jan 14 05:49:38 2020 * · permalink

https://github.com/axa-group/Parsr

textdistance | Text comparison algorithms

TextDistance is a Python library for comparing the distance between two or more sequences using many algorithms.

Features:

30+ algorithms
Pure Python implementation
Simple usage
Comparison of more than two sequences
Some algorithms have more than one implementation in one class.
Optional NumPy usage for maximum speed.

source_code · coding_lang:python · opensource · text-processing · algorithm · library · software

Wed Oct 30 21:51:30 2019 * · permalink

https://github.com/life4/textdistance

ripgrep-all - grep in text files but also search in PDFs, E-Books, office documents, zip, tar.gz, etc.

rga is a line-oriented search tool that allows you to look for a regex in a multitude of file types.

rga wraps the awesome ripgrep and enables it to search in pdf, docx, sqlite, jpg, movie subtitles (mkv, mp4), etc.

#cli-app · software · opensource · file_management · search · text-processing · source_code · coding_lang:rust

Mon Aug 26 18:13:58 2019 * · permalink

https://github.com/phiresky/ripgrep-all

q - Run SQL-like queries on CSV/TSV files

Executes SQL-like queries on CSV/TSV tabular data files; each tabular file is treated as a database table, with support for SQL constructs such as WHERE, GROUP BY and JOIN.
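A rough Python sketch of the "file as table" idea, using only the standard library (`csv` and an in-memory `sqlite3` database). It is an illustration under stated assumptions, not a description of how q works internally; the CSV data, the `staff` table name and `query_csv` are invented for the example.

```python
import csv
import io
import sqlite3

# Invented sample data standing in for a CSV file on disk.
CSV_TEXT = """name,dept,salary
ann,eng,120
bob,eng,100
cy,ops,90
"""


def query_csv(csv_text, table, sql):
    """Load CSV text into an in-memory SQLite table named `table`
    (columns taken from the header row) and run `sql` against it."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f'"{name}"' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Values are loaded as text, so numeric columns are cast explicitly.
result = query_csv(
    CSV_TEXT,
    "staff",
    "SELECT dept, COUNT(*), SUM(CAST(salary AS INTEGER)) "
    "FROM staff GROUP BY dept ORDER BY dept",
)
print(result)
```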

#cli-app · text-processing · software · opensource · source_code · tools · terminal · search · filter · csv · SQL · category:text_processing

Sun Aug 25 14:54:03 2019 * · permalink

http://harelba.github.io/q/

pick - Fuzzy selection among a list of options

Utility that allows users to choose one option from a set of choices using an interface with fuzzy search functionality.

#cli-app · text-processing · software · opensource · source_code · tools · terminal · search · filter · category:text_processing

Sun Aug 25 14:54:00 2019 * · permalink

https://github.com/calleerlandsson/pick

percol - Interactive selection of lines coming from the standard input

A Python script that

1) receives input lines from stdin or a file,
2) lists the input lines and waits for input that filters/selects the line(s),
3) outputs the selected line(s) to stdout;

Can be used to add interactivity to many regular shell commands.
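The three steps above can be sketched non-interactively in Python. This is only an outline of the read, filter, select and output flow, not percol's code; the function names are hypothetical, and a `StringIO` object stands in for stdin so the example runs unattended.

```python
import io
import re


def read_lines(stream):
    """Step 1: receive input lines from a stream (stdin or a file)."""
    return [line.rstrip("\n") for line in stream]


def filter_lines(lines, query):
    """Step 2: keep lines that contain the query, ignoring case."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]


def select(lines, indices):
    """Step 2, continued: pick lines by position, as a user would
    with the cursor in an interactive session."""
    return [lines[i] for i in indices]


stdin_stub = io.StringIO("git status\ngit commit\nls -la\n")
candidates = filter_lines(read_lines(stdin_stub), "GIT")
chosen = select(candidates, [1])

# Step 3: output the selected line(s) to stdout.
print("\n".join(chosen))
```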

#cli-app · text-processing · software · opensource · source_code · tools · terminal · search · filter · coding_lang:python · category:text_processing

Sun Aug 25 14:53:57 2019 * · permalink

https://github.com/mooz/percol

jq - JSON query

jq is a sed-like processor for JSON data; it can be used to process JSON files and data streams and to perform operations like those that cat, sed, grep and awk allow on regular text files.

#cli-app · text-processing · software · opensource · source_code · tools · terminal · search · filter · json · category:text_processing

Sun Aug 25 14:53:54 2019 * · permalink

https://stedolan.github.io/jq/

grc - Colorize the standard input according to a regex

grc (Generic Colouriser) can be configured to parse a given text stream and colorize it according to regular expressions written in configuration files; different patterns can be associated with different file types.
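A small Python sketch of regex-driven colorizing with ANSI escape codes. The rule table here is invented and lives in code rather than in a configuration file, so it only illustrates the principle and says nothing about grc's actual configuration format.

```python
import re

# Invented rules: (pattern, ANSI color code). 31 is red, 33 is yellow.
RULES = [
    (re.compile(r"\bERROR\b"), "\033[31m"),
    (re.compile(r"\bWARN\b"), "\033[33m"),
]
RESET = "\033[0m"


def colorize(line, rules=RULES):
    """Wrap every match of each rule's pattern in its color code."""
    for pattern, color in rules:
        line = pattern.sub(
            lambda m, c=color: f"{c}{m.group(0)}{RESET}", line
        )
    return line


for text in ["ERROR disk full", "WARN low memory", "all good"]:
    print(colorize(text))
```

Rules are applied one after another, so a pattern that could match the escape sequences themselves would need extra care; the patterns above cannot.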

#cli-app · text-processing · software · opensource · source_code · tools · terminal · search · filter · category:text_processing

Sun Aug 25 14:53:51 2019 * · permalink

https://github.com/pengwynn/grc

ccat - A cat command with colorized output

#cli-app · text-processing · software · opensource · source_code · tools · terminal · category:text_processing

Sun Aug 25 14:53:46 2019 * · permalink

https://github.com/jingweno/ccat

ripgrep - Recursively searches directories for a regex pattern

ripgrep is a line-oriented search tool that recursively searches your current directory for a regex pattern while respecting your gitignore rules.
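A simplified Python sketch of a recursive, line-oriented regex search. It is far slower than ripgrep and does not read gitignore rules; as a crude stand-in, it only skips hidden directories. `search_tree` is a hypothetical name.

```python
import os
import re


def search_tree(root, pattern):
    """Walk `root` recursively and return (path, line_number, line)
    for every line that matches the regex `pattern`."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk skips them.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            hits.append((path, number, line.rstrip("\n")))
            except (UnicodeDecodeError, OSError):
                # Skip unreadable files and files that are not UTF-8 text.
                continue
    return hits
```

Calling `search_tree(".", r"TODO")` would list matching lines under the current directory.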

ripgrep has first class support on Windows, macOS and Linux, with binary downloads available for every release.

ripgrep is similar to other popular search tools like The Silver Searcher, ack and grep.

#cli-app · command_line · tools · search · opensource · coding_lang:rust · software · file_management · text-processing

Sun Oct 14 15:03:31 2018 * · permalink

https://github.com/BurntSushi/ripgrep

fzf - The fuzzy finder

(FuZzy Finder) is a general-purpose command-line finder with fuzzy search/filter capabilities; good integration with vim.
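One common notion of fuzzy matching is subsequence matching: the query characters must appear in the candidate in order, but not necessarily next to each other. The Python sketch below shows that idea only; fzf's real matching and ranking are more elaborate, and the shortest-first ordering here is an assumption made for the example.

```python
def fuzzy_match(query, candidate):
    """True if the characters of `query` occur in `candidate` in the
    same order (case-insensitive), with any gaps in between."""
    remaining = iter(candidate.lower())
    # `ch in remaining` consumes the iterator up to the first match,
    # which enforces the left-to-right ordering.
    return all(ch in remaining for ch in query.lower())


def fuzzy_filter(query, candidates):
    """Keep matching candidates, shortest first."""
    matches = [c for c in candidates if fuzzy_match(query, c)]
    return sorted(matches, key=len)


print(fuzzy_filter("mkf", ["Makefile", "main.py", "mkfs.ext4"]))
```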

#cli-app · text-processing · software · opensource · source_code · tools · terminal · search · filter · coding_lang:go

Fri Mar 7 07:39:26 2014 * · permalink

https://github.com/junegunn/fzf