Large Language Models (LLMs) are on fire, capturing public attention with their ability to provide seemingly impressive completions to user prompts (NYT coverage). They are a delicate combination of a radically simple algorithm, massive amounts of data, and massive computing power. They are trained by playing a guess-the-next-word game with themselves over and over again. Each time, the model looks at a partial sentence and guesses the following word. If it guesses correctly, it updates its parameters to reinforce its confidence; otherwise, it learns from the error and makes a better guess next time.
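To make the guess-the-next-word game concrete, here is a minimal, self-contained sketch of next-word training, a toy bigram model in PyTorch. This is an illustration of the objective only, not any particular LLM's actual training code:

```python
# A toy sketch of the guess-the-next-word objective: a bigram model
# trained with cross-entropy (illustrative only, not a real LLM).
import torch
import torch.nn as nn

corpus = "the model looks at a partial sentence and guesses the following word".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# (context, next-word) pairs: here the context is just the previous word
xs = torch.tensor([idx[w] for w in corpus[:-1]])
ys = torch.tensor([idx[w] for w in corpus[1:]])

model = nn.Sequential(nn.Embedding(len(vocab), 32), nn.Linear(32, len(vocab)))
opt = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    opt.zero_grad()
    logits = model(xs)          # guess a distribution over the next word
    loss = loss_fn(logits, ys)  # penalize wrong guesses
    loss.backward()             # learn from the error...
    opt.step()                  # ...by updating the parameters
```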
In recent years, large-scale transformer-based language models have become the pinnacle of neural networks used in NLP tasks. They grow in scale and complexity every month, but training such models requires millions of dollars, top experts, and years of development. That's why only major IT companies have access to this state-of-the-art technology. Yet researchers and developers all over the world need access to these solutions; without new research, progress could wane. The only way to avoid this is by sharing best practices with the developer community.
We’ve been using the YaLM family of language models in our Alice voice assistant and in Yandex Search for more than a year now.
Interactively analyze NLP models for model understanding in an extensible, framework-agnostic interface.
BigBird is shown to dramatically improve performance across long-context NLP tasks, producing SOTA results in question answering and summarization.
LanguageTool is a free proofreading tool for English, German, Spanish, Russian, and more than 20 other languages.
The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in NLP algorithms, neural architectures, and distributed machine learning systems. The content is based on our past and potential future engagements with customers as well as collaboration with partners, researchers, and the open source community.
In this task, we take a first approach to training a conversational model.
A fast, efficient universal vector embedding utility package.
This article is designed to serve as a directory of NLP (natural language processing) software projects that anyone, even someone without ML experience, can build.
Python3 implementation of the Schwartz-Hearst algorithm for extracting abbreviation-definition pairs.
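For a feel of what the algorithm does, here is a simplified sketch of its core rule, matching the short form backwards through a candidate long form. This is not the linked implementation and omits the paper's handling of non-alphanumeric short-form characters:

```python
# Simplified illustration of the Schwartz-Hearst long-form match:
# every short-form character must appear in order (scanning backwards),
# and the first one must begin a word in the long form.
def best_long_form(short, candidate):
    s, c = len(short) - 1, len(candidate) - 1
    while s >= 0:
        ch = short[s].lower()
        while c >= 0 and (candidate[c].lower() != ch or
                          (s == 0 and c > 0 and candidate[c - 1].isalnum())):
            c -= 1
        if c < 0:
            return None          # no valid long form found
        s -= 1
        c -= 1
    return candidate[c + 1:]     # long form starts at the matched word

print(best_long_form("NLP", "methods of natural language processing"))
# -> "natural language processing"
```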
With the increasing number of scientific publications, analyzing the trends and the state of the art in a given scientific field is becoming a very time-consuming and tedious task. In response to urgent information needs that the existing systematic-review model does not serve well, several other review types have emerged, namely rapid reviews and scoping reviews.
The paper proposes an NLP-powered tool that automates most of the review process by automatically analyzing articles indexed in the IEEE Xplore, PubMed, and Springer digital libraries. We demonstrate the applicability of the toolkit by analyzing articles related to Enhanced Living Environments and Ambient Assisted Living, in accordance with the PRISMA methodology. The relevant articles were processed by the NLP toolkit to identify articles that contain up to 20 properties clustered into 4 logical groups.
The analysis showed increasing attention from the scientific community towards Enhanced and Assisted Living environments over the last 10 years and revealed several trends in the specific research topics that fall within this scope. The case study demonstrates that the NLP toolkit can ease and speed up the review process and surface valuable insights from the surveyed articles even without manually reading most of them. Moreover, it pinpoints the most relevant articles, those containing the most properties, and thereby significantly reduces the manual work, while also generating informative tables, charts, and graphs.
In research and news articles, keywords are an important component: they provide a concise representation of the article’s content. Keywords also play a crucial role in locating the article in information retrieval systems and bibliographic databases, and in search engine optimization. They also help categorize the article into the relevant subject or discipline.
Conventional approaches to keyword extraction involve manually assigning keywords based on the article’s content and the authors’ judgment. This takes considerable time and effort, and may not select the most appropriate keywords. With the emergence of Natural Language Processing (NLP), keyword extraction has become both effective and efficient.
In this article, we combine the two: we’ll apply NLP to a collection of articles (more on this below) to extract keywords.
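As a taste of what such a pipeline can look like, here is a minimal sketch of one common approach, ranking each document's terms by TF-IDF weight with scikit-learn. The article's own pipeline may well differ:

```python
# One common keyword-extraction approach (a sketch, not the article's
# exact pipeline): rank each document's terms by TF-IDF weight.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "Keywords provide a concise representation of an article's content.",
    "NLP makes keyword extraction effective as well as efficient.",
]

vec = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
tfidf = vec.fit_transform(docs)
terms = vec.get_feature_names_out()

for i in range(len(docs)):
    row = tfidf[i].toarray().ravel()
    top = row.argsort()[::-1][:3]        # three highest-weighted terms
    print([terms[j] for j in top])
```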
Toolkit for Text Generation and Beyond (asyml/texar).
Natural language processing (NLP) is an exciting field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages.
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
Features
- Noun phrase extraction
- Part-of-speech tagging
- Sentiment analysis
- Classification (Naive Bayes, Decision Tree)
- Language translation and detection powered by Google Translate
- Tokenization (splitting text into words and sentences)
- Word and phrase frequencies
- Parsing
- n-grams
- Word inflection (pluralization and singularization) and lemmatization
- Spelling correction
- Add new models or languages through extensions
- WordNet integration
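A few of these features in action, using TextBlob’s documented API (after installing the package and fetching its corpora with `python -m textblob.download_corpora`):

```python
from textblob import TextBlob

blob = TextBlob("TextBlob makes common NLP tasks feel simple. I love it!")

print(blob.tags)          # part-of-speech tags, e.g. ('TextBlob', 'NNP')
print(blob.noun_phrases)  # noun phrases found in the text
print(blob.sentiment)     # Sentiment(polarity=..., subjectivity=...)
print(blob.words)         # tokenized words
print(blob.sentences)     # tokenized sentences
```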
“Don’t think of the overwhelming majority of the impossible.”
“Grew up your bliss and the world.”
“what we would end create, creates the ground and you are the one to warm it”
“look and give up in miracles”
All the quotes above were generated by a computer, using a program that consists of fewer than 20 lines of Python code.
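The program itself isn’t reproduced here, but quote generators of this size are typically word-level Markov chains. A plausible sketch in roughly that budget (`quotes.txt` is a hypothetical corpus file, and the approach is an assumption, not the article’s confirmed method):

```python
# A plausible reconstruction: a tiny word-level Markov chain generator.
import random
from collections import defaultdict

words = open("quotes.txt").read().split()
chain = defaultdict(list)
for a, b in zip(words, words[1:]):
    chain[a].append(b)               # record which words follow which

word = random.choice(words)
out = [word]
for _ in range(12):                  # sample a short "quote"
    word = random.choice(chain[word]) if chain[word] else random.choice(words)
    out.append(word)
print(" ".join(out))
```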