In recent years, large-scale transformer-based language models have become the pinnacle of neural networks used in NLP tasks. They grow in scale and complexity every month, but training such models requires millions of dollars, the best experts, and years of development. That’s why only major IT companies have access to this state-of-the-art technology. However, researchers and developers all over the world need access to these solutions; without new research, progress in the field could wane. The only way to avoid this is by sharing best practices with the developer community.
We’ve been using the YaLM family of language models in our Alice voice assistant and in Yandex Search for more than a year now.
Reinforcement Learning (RL) is better seen as a “fine-tuning” paradigm that can add capabilities to general-purpose pretrained models, rather than a paradigm that can bootstrap intelligence from scratch.
A transcompiler, also known as a source-to-source translator, is a system that converts source code from one high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the source code abstract syntax tree. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive.
Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy.
Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin.
Since the 1940s, electric guitarists, keyboardists, and other instrumentalists have been using effects pedals, devices that modify the sound of the original audio source. Typical effects include distortion, compression, chorus, reverb, and delay. Early effects pedals consisted of basic analog circuits, often along with vacuum tubes, which were later replaced with transistors. Although many pedals today apply effects digitally with modern signal processing techniques, many purists argue that the sound of analog pedals cannot be replaced by their digital counterparts. We’ll follow a deep learning approach to see if we can use machine learning to replicate the sound of an iconic analog effect pedal, the Ibanez Tube Screamer. This post will be mostly a reproduction of the work done by Alec Wright et al. in Real-Time Guitar Amplifier Emulation with Deep Learning [1].

[1] Alec Wright et al., “Real-Time Guitar Amplifier Emulation with Deep Learning.”
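For a rough idea of what this looks like in practice, here is a minimal sketch (my own illustration, not the post’s or the paper’s code) of the recurrent approach: a small LSTM maps the dry guitar signal to the pedal’s wet output and is trained on paired recordings with an error-to-signal-ratio loss. All shapes, sizes, and the toy data below are assumptions.

```python
# Minimal sketch of LSTM-based pedal emulation (illustrative, not the original code).
import torch
import torch.nn as nn

class PedalEmulator(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x: (batch, samples, 1) dry signal
        h, _ = self.lstm(x)
        # predict the wet signal as the dry signal plus a learned correction
        return self.out(h) + x

def esr_loss(pred, target, eps=1e-8):
    # error-to-signal ratio, a common loss in amp/pedal modelling
    return torch.sum((target - pred) ** 2) / (torch.sum(target ** 2) + eps)

# toy training step on random "dry"/"wet" pairs standing in for real recordings
model = PedalEmulator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
dry = torch.randn(8, 2048, 1)
wet = torch.tanh(3 * dry)          # crude stand-in for a clipping pedal
opt.zero_grad()
loss = esr_loss(model(dry), wet)
loss.backward()
opt.step()
```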
Computer simulations are invaluable tools for scientific discovery. However, accurate simulations are often slow to execute, which limits their applicability to extensive parameter exploration, large-scale data analysis, and uncertainty quantification. A promising route to accelerate simulations by building fast emulators with machine learning requires large training datasets, which can be prohibitively expensive to obtain with slow simulations. Here we present a method based on neural architecture search to build accurate emulators even with limited training data. The method successfully accelerates simulations by up to 2 billion times in 10 scientific cases including astrophysics, climate science, biogeochemistry, high energy density physics, fusion energy, and seismology, using the same super-architecture, algorithm, and hyperparameters. Our approach also inherently provides emulator uncertainty estimation, adding further confidence in their use. We anticipate this work will accelerate research involving expensive simulations, allow more extensive parameter exploration, and enable new, previously unfeasible computational discovery.
Analyses of single-cell recordings from mouse ventral tegmental area are consistent with a model of reinforcement learning in which the brain represents possible future rewards not as a single mean of stochastic outcomes, as in the canonical model, but instead as a probability distribution.
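A toy sketch of the underlying idea (my own illustration, not the study’s model): a population of value estimators, each updating with a different asymmetry between positive and negative prediction errors, converges to different expectiles of the reward distribution rather than to a single mean. The reward distribution and learning rates below are made up for illustration.

```python
# Toy expectile-style sketch of distributional value learning (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# stochastic reward: 1.0 with probability 0.3, else 0.0
def sample_reward():
    return 1.0 if rng.random() < 0.3 else 0.0

asymmetries = np.linspace(0.1, 0.9, 9)   # tau values, one per "unit"
values = np.zeros_like(asymmetries)      # each unit's reward prediction
lr = 0.02

for _ in range(20000):
    r = sample_reward()
    delta = r - values                   # prediction errors, one per unit
    # positive errors scaled by tau, negative errors by (1 - tau)
    step = np.where(delta > 0, asymmetries, 1 - asymmetries) * delta
    values += lr * step

# units with tau = 0.5 approximate the mean reward (0.3); low/high tau units
# settle at lower/higher expectiles, jointly encoding the distribution
print(values)
```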
Facebook AI has developed the first neural network that uses symbolic reasoning to solve advanced mathematics problems.
This article discusses GPT-2 and BERT models, as well as using knowledge distillation to create highly accurate models with fewer parameters than their teachers.
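As a rough illustration of the distillation objective (an assumed generic recipe, not necessarily the article’s exact setup): the student is trained on a weighted mix of the usual hard-label loss and a soft-label loss that matches the teacher’s temperature-scaled output distribution.

```python
# Generic knowledge-distillation loss sketch (illustrative, not the article's code).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # soft targets: KL divergence between softened student and teacher distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # hard targets: ordinary cross-entropy on the true labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# toy usage with random logits standing in for real model outputs
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```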
A neuroevolution game experiment.
A simple walkthrough of what RNNs are, how they work, and how to build one from scratch in Python.
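In the same spirit, a from-scratch vanilla RNN forward pass fits in a few lines of NumPy (a generic sketch with made-up sizes, not the article’s code):

```python
# Minimal from-scratch RNN forward pass in NumPy (illustrative sizes).
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size, output_size = 3, 16, 2
Wxh = rng.normal(scale=0.1, size=(hidden_size, input_size))
Whh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
Why = rng.normal(scale=0.1, size=(output_size, hidden_size))
bh = np.zeros(hidden_size)
by = np.zeros(output_size)

def rnn_forward(inputs):
    # inputs: list of vectors, one per time step
    h = np.zeros(hidden_size)
    for x in inputs:
        h = np.tanh(Wxh @ x + Whh @ h + bh)   # recurrent state update
    return Why @ h + by                        # readout from the last state

sequence = [rng.normal(size=input_size) for _ in range(5)]
print(rnn_forward(sequence))
```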
Your new best friend built with an artificial neural network - olivia-ai/olivia
This article focuses on using a deep LSTM neural network architecture to provide multidimensional time series forecasting using Keras and TensorFlow - specifically on stock market datasets, to provide momentum indicators of stock price.
The following article sections will briefly touch on LSTM neuron cells, give a toy example of predicting a sine wave, and then walk through the application to a stochastic time series. The article assumes a basic working knowledge of simple deep neural networks.
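A bare-bones version of that kind of setup might look like the following (illustrative window size, layer sizes, and random stand-in data, not the article’s actual model): slide a fixed-length window over a multidimensional series and train a stacked LSTM to predict the next value of one target feature.

```python
# Sketch of windowed multidimensional forecasting with a stacked LSTM in Keras.
import numpy as np
from tensorflow import keras

window, n_features = 50, 4
# random walk standing in for e.g. price/volume columns of a stock series
series = np.random.randn(1000, n_features).cumsum(axis=0)

X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:, 0]                      # next-step value of feature 0

model = keras.Sequential([
    keras.Input(shape=(window, n_features)),
    keras.layers.LSTM(64, return_sequences=True),
    keras.layers.LSTM(32),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)

prediction = model.predict(series[-window:][np.newaxis])  # one-step forecast
```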
License plate detection is a common use case that has been solved (somewhat) several times, but we felt that we could provide something better than the current options.
Time Series Forecasting with the Long Short-Term Memory Network in Python - Machine Learning Mastery
The Long Short-Term Memory recurrent neural network has the promise of learning long sequences of observations. It seems a perfect match for time series forecasting, and in fact, it may be. In this tutorial, you will discover how to develop an LSTM forecast model for a one-step univariate time series forecasting problem. After completing this …
Course materials and notes for Stanford class CS231n: Convolutional Neural Networks for Visual Recognition.