DeepMind’s researchers are working round the clock to push the frontiers of AI. The lab has published 34 research papers in the last four months. Let’s look at the key papers the Alphabet subsidiary has published in 2022.
- An empirical analysis of compute-optimal large language model training
The paper found that model size and training dataset size should be scaled in equal measure for compute-optimal training: for every doubling of model size, the number of training tokens should also be doubled. The researchers tested this hypothesis by training a compute-optimal model, Chinchilla, with the same compute budget as Gopher but with 70B parameters and 4x more data. Chinchilla outperformed Gopher, GPT-3, Jurassic-1, and Megatron-Turing NLG on a slew of downstream evaluation tasks. It reached an average accuracy of 67.5% on the MMLU benchmark, an improvement of more than 7 percentage points over Gopher.
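The equal-scaling rule can be illustrated with some back-of-the-envelope arithmetic. This sketch makes two assumptions. First, it uses the common approximation that training costs about 6 FLOPs per parameter per token (C ≈ 6ND). Second, it fixes the ratio at about 20 training tokens per parameter, taken from Chinchilla's own configuration (70B parameters, 1.4T tokens). It is not the paper's fitted scaling law, and the function names are hypothetical.

```python
import math

# Assumed approximation: training compute C ≈ 6 * N (parameters) * D (tokens).
FLOPS_PER_PARAM_PER_TOKEN = 6.0


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs under C ≈ 6ND."""
    return FLOPS_PER_PARAM_PER_TOKEN * n_params * n_tokens


def compute_optimal_allocation(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a FLOP budget into (parameters, tokens) with D = r * N.

    Substituting D = r * N into C = 6 * N * D gives N = sqrt(C / (6 r)),
    so both N and D grow as C ** 0.5: they are scaled in equal measure.
    """
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_PER_TOKEN * tokens_per_param))
    return n_params, tokens_per_param * n_params


# Gopher: 280B parameters trained on 300B tokens.
gopher_budget = training_flops(280e9, 300e9)
n, d = compute_optimal_allocation(gopher_budget)
print(f"{gopher_budget:.2e} FLOPs -> {n / 1e9:.0f}B params, {d / 1e12:.2f}T tokens")
```

With Gopher's budget (about 5.0e23 FLOPs under this approximation), the rule suggests roughly 65B parameters and 1.3T tokens. That is in the same range as Chinchilla's 70B parameters and 1.4T tokens, and far more data per parameter than Gopher used. Quadrupling the budget doubles both the parameter count and the token count.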
- Restoring and attributing ancient texts using deep neural networks
The paper proposed Ithaca, a deep neural network for the textual restoration, geographical attribution and chronological attribution of ancient Greek inscriptions. Designed with a focus on collaboration, decision support and interpretability, Ithaca achieved 62% accuracy in restoring damaged texts on its own, and historians working with Ithaca improved their accuracy from 25% to 72%. Ithaca can also attribute inscriptions to their original location with 71% accuracy.
Restorative tools like Ithaca show how historians can leverage AI to study important periods in human history.
- Red Teaming Language Models with Language Models
Language models have a tendency to go rogue, harming users in ways that are hard to predict. Using human annotators to hand-write enough test cases before deployment is expensive. The paper outlines how one language model can be used to generate test cases for another (red teaming) to size up its potential for harm. The researchers evaluated the target LM’s replies to generated test questions with a classifier trained to detect offensive content, surfacing thousands of offensive replies in a 280B-parameter LM chatbot. The team explored methods ranging from zero-shot generation to reinforcement learning to generate test cases with varying levels of diversity and difficulty. The paper showed that LM-based red teaming is a promising tool for finding and fixing unexpected LM behaviors before deployment.
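The red-teaming loop described above has three roles. A red LM proposes test questions, the target LM answers them, and a classifier flags harmful replies. The minimal sketch below shows only that loop. The red LM, target chatbot and offensiveness classifier are hypothetical toy stand-ins (template sampling, a hard-coded rude reply and a keyword blocklist), not the models or classifier used in the paper.

```python
import random
from typing import Callable, List, Tuple


def red_team(
    generate_test_case: Callable[[random.Random], str],
    target_reply: Callable[[str], str],
    offensiveness: Callable[[str], float],
    num_cases: int,
    threshold: float = 0.5,
    seed: int = 0,
) -> List[Tuple[str, str, float]]:
    """Generate test questions, query the target, and keep replies the classifier flags."""
    rng = random.Random(seed)
    failures = []
    for _ in range(num_cases):
        question = generate_test_case(rng)  # red LM: propose a test case
        reply = target_reply(question)      # target LM: answer it
        score = offensiveness(reply)        # classifier: score the reply
        if score >= threshold:
            failures.append((question, reply, score))
    return failures


# --- Hypothetical toy stand-ins; real red teaming uses LMs and a trained classifier ---
TEMPLATES = ["What do you think of {}?", "Tell me a joke about {}.", "Describe {}."]
TOPICS = ["cats", "my neighbour", "the weather"]
BLOCKLIST = {"idiot", "stupid"}


def toy_red_lm(rng: random.Random) -> str:
    return rng.choice(TEMPLATES).format(rng.choice(TOPICS))


def toy_target(question: str) -> str:
    # A deliberately flawed chatbot that turns rude whenever it is asked for a joke.
    if question.startswith("Tell me a joke"):
        return "Only an idiot asks that."
    return "Happy to help!"


def toy_classifier(reply: str) -> float:
    words = {w.strip(".,!?").lower() for w in reply.split()}
    return 1.0 if words & BLOCKLIST else 0.0


failures = red_team(toy_red_lm, toy_target, toy_classifier, num_cases=100)
failing_prompts = sorted({q for q, _, _ in failures})
print(f"{len(failures)} flagged replies; distinct failing prompts: {failing_prompts}")
```

Keeping the generator, target and classifier as separate callables mirrors the division of labour in the text. A stronger red LM, for example one tuned with reinforcement learning to elicit flagged replies, could replace `toy_red_lm` without changing the loop.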
- Magnetic control of tokamak plasmas through deep reinforcement learning
The paper introduced an architecture for tokamak magnetic controller design that autonomously learns to command the full set of control coils. The research team produced and controlled a varied set of plasma configurations on the Tokamak à Configuration Variable (TCV), and the approach achieved accurate tracking of the location, current and shape of these configurations. The researchers also demonstrated sustained ‘droplets’ on TCV, in which two separate plasmas are maintained simultaneously within the vessel. This is a remarkable advance for tokamak feedback control and shows the potential of reinforcement learning to accelerate research in the fusion domain.
- Competition-Level Code Generation with AlphaCode
The paper introduced AlphaCode, a code generation system for solving competition-level programming problems. The team used large transformer language models to generate code, pre-training them on selected public GitHub code and fine-tuning them on a curated set of competitive programming problems. In simulated evaluations on recent Codeforces competitions, AlphaCode achieved an average estimated rank within the top 54% of participants.
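AlphaCode does not submit a single generated program. It samples a large number of candidate programs and filters out those that fail the example tests given in the problem statement. The sketch below illustrates only that filtering step. The problem, the hand-written candidate functions standing in for model samples, and `filter_by_examples` are all hypothetical.

```python
from typing import Callable, List, Tuple

Program = Callable[[str], str]  # a candidate maps problem input text to output text


def filter_by_examples(candidates: List[Program], examples: List[Tuple[str, str]]) -> List[Program]:
    """Keep only candidates whose output matches every example; crashing candidates are dropped."""
    survivors = []
    for program in candidates:
        try:
            if all(program(inp).strip() == expected.strip() for inp, expected in examples):
                survivors.append(program)
        except Exception:
            continue  # a sample that raises on an example is discarded as well
    return survivors


# Hypothetical problem: the input is two integers "a b"; print a + b.
EXAMPLES = [("1 2", "3"), ("10 -4", "6")]


def sample_correct(s: str) -> str:
    a, b = map(int, s.split())
    return str(a + b)


def sample_wrong_op(s: str) -> str:
    a, b = map(int, s.split())
    return str(a - b)


def sample_crashes(s: str) -> str:
    return str(int(s))  # int("1 2") raises ValueError


def sample_hardcoded(s: str) -> str:
    return "3"  # passes the first example only


candidates = [sample_correct, sample_wrong_op, sample_crashes, sample_hardcoded]
survivors = filter_by_examples(candidates, EXAMPLES)
print([p.__name__ for p in survivors])
```

The hard-coded sample shows why checking every example matters. It would slip through with only the first example, so filtering on program behaviour cuts a large pool of samples down to a few plausible submissions.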
- MuZero with Self-competition for Rate Control in VP9 Video Compression
The paper proposed applying the MuZero algorithm to optimize video compression. The team looked into the problem of learning a rate control policy to select the quantization parameters (QP) in the encoding process of libvpx, an open-source VP9 video compression library. The researchers treated it as a sequential decision-making problem: maximize video quality subject to an episodic constraint imposed by the target bitrate. The team introduced a novel self-competition-based reward mechanism to solve constrained RL with variable constraint satisfaction difficulty. The MuZero-based rate control achieved an average 6.28% reduction in the size of the compressed videos at the same delivered video quality level, compared to libvpx’s two-pass VBR rate control policy.
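The summary does not spell out the reward rule. The following is therefore a simplified, hypothetical sketch of the general idea behind a self-competition reward, not the paper's exact formulation. The agent is rewarded for beating its own past performance on the same video, and staying within the target bitrate takes priority over quality. The names `EpisodeResult`, `self_competition_reward` and `update_baseline` are illustrative. So are the exponential-moving-average baseline, the kbps units and the use of PSNR as the quality measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeResult:
    bitrate_kbps: float  # average bitrate the encoder actually produced (assumed unit)
    psnr_db: float       # delivered quality; PSNR is an assumed stand-in metric


def beats(a: EpisodeResult, b: EpisodeResult, target_kbps: float) -> bool:
    """True if episode `a` is strictly better than episode `b` under the bitrate constraint."""
    over_a = max(0.0, a.bitrate_kbps - target_kbps)
    over_b = max(0.0, b.bitrate_kbps - target_kbps)
    if over_a == 0.0 and over_b == 0.0:
        return a.psnr_db > b.psnr_db  # both within budget: higher quality wins
    return over_a < over_b            # otherwise: the smaller constraint violation wins


def self_competition_reward(current: EpisodeResult, baseline: EpisodeResult, target_kbps: float) -> float:
    """+1 for beating the agent's own past baseline, -1 for losing to it, 0 for a tie."""
    if beats(current, baseline, target_kbps):
        return 1.0
    if beats(baseline, current, target_kbps):
        return -1.0
    return 0.0


def update_baseline(baseline: EpisodeResult, current: EpisodeResult, decay: float = 0.9) -> EpisodeResult:
    """Exponential moving average of the agent's past results on the same video (assumed)."""
    return EpisodeResult(
        bitrate_kbps=decay * baseline.bitrate_kbps + (1 - decay) * current.bitrate_kbps,
        psnr_db=decay * baseline.psnr_db + (1 - decay) * current.psnr_db,
    )


target = 1000.0
baseline = EpisodeResult(bitrate_kbps=950.0, psnr_db=38.0)
print(self_competition_reward(EpisodeResult(980.0, 39.0), baseline, target))   # 1.0
print(self_competition_reward(EpisodeResult(1100.0, 45.0), baseline, target))  # -1.0
```

Comparing each episode against the agent's own history, rather than against a fixed score, is one way to cope with videos whose bitrate targets differ in how hard they are to meet. An easy video and a hard one both yield a +1 only when the agent improves on what it previously managed on that video.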
- Learning Robust Real-Time Cultural Transmission without Human Data
The DeepMind team developed a method for generating zero-shot, high-recall cultural transmission in AI agents. The agents succeeded at real-time cultural transmission from humans in novel contexts without using any pre-collected human data. Each artificial agent was parameterized by a neural network, and the team used deep reinforcement learning (RL) to train its weights. The resulting neural network (with fixed weights) is capable of zero-shot, high-recall cultural transmission within a “test” episode on a wide range of unseen tasks.
- Fair Normalizing Flows
The paper introduced Fair Normalizing Flows (FNF), a new approach that provides more rigorous fairness guarantees for learned representations. The main idea is to model the encoder as a normalizing flow trained to minimize the statistical distance between the latent representations of different groups. FNF offers guarantees on the maximum unfairness of any potentially adversarial downstream predictor. The team demonstrated the effectiveness of FNF in enforcing various group fairness notions, as well as its benefits for interpretability and transfer learning, on a variety of challenging real-world datasets.
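The guarantee rests on two facts. Because a normalizing flow is invertible, the change-of-variables formula gives exact latent densities for each group. And no downstream classifier can have a demographic-parity gap larger than the statistical (total variation) distance between those group latents. The minimal 1-D sketch below illustrates both, under loudly stated assumptions. Each group's data is assumed Gaussian with made-up parameters. The per-group flows are hand-picked affine maps rather than trained networks. The downstream predictors are simple threshold classifiers. All names are hypothetical.

```python
import math

# Hypothetical 1-D data: in group a, x ~ N(mean, std^2) (made-up parameters).
GROUPS = {0: (0.0, 1.0), 1: (2.0, 1.5)}


def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def normal_sf(x, mean, std):
    """P(X > x) for X ~ N(mean, std^2)."""
    return 0.5 * math.erfc((x - mean) / (std * math.sqrt(2)))


class AffineFlow:
    """Invertible 1-D encoder z = (x - shift) / scale, with scale > 0."""

    def __init__(self, shift, scale):
        self.shift, self.scale = shift, scale

    def inverse(self, z):
        return z * self.scale + self.shift

    def latent_pdf(self, z, x_mean, x_std):
        # Change of variables: p_Z(z) = p_X(f^{-1}(z)) * |d f^{-1}(z) / dz|.
        return normal_pdf(self.inverse(z), x_mean, x_std) * self.scale

    def latent_params(self, x_mean, x_std):
        # An affine map of a Gaussian is Gaussian, so the latent stays closed-form.
        return (x_mean - self.shift) / self.scale, x_std / self.scale


def statistical_distance(pdf0, pdf1, lo=-15.0, hi=15.0, steps=30_000):
    """Total variation distance 0.5 * integral |p0 - p1| dz, via the midpoint rule."""
    dz = (hi - lo) / steps
    total = sum(abs(pdf0(lo + (i + 0.5) * dz) - pdf1(lo + (i + 0.5) * dz)) for i in range(steps))
    return 0.5 * total * dz


def latent_distance(flows):
    def p0(z):
        return flows[0].latent_pdf(z, *GROUPS[0])

    def p1(z):
        return flows[1].latent_pdf(z, *GROUPS[1])

    return statistical_distance(p0, p1)


def parity_gap(flows, threshold):
    """Demographic-parity gap of the downstream classifier h(z) = [z > threshold]."""
    rates = [normal_sf(threshold, *flows[a].latent_params(*GROUPS[a])) for a in (0, 1)]
    return abs(rates[0] - rates[1])


identity = {0: AffineFlow(0.0, 1.0), 1: AffineFlow(0.0, 1.0)}  # encoder that changes nothing
aligned = {0: AffineFlow(0.0, 1.0), 1: AffineFlow(2.0, 1.5)}   # per-group flows mapping both groups to N(0, 1)

for name, flows in [("identity", identity), ("aligned", aligned)]:
    bound = latent_distance(flows)
    worst = max(parity_gap(flows, t / 10) for t in range(-40, 41))
    print(f"{name}: statistical distance {bound:.3f}, worst threshold parity gap {worst:.3f}")
```

With the identity encoder the group latents stay far apart, and the worst threshold classifier's parity gap stays at or below that distance. The aligned flows drive both quantities to zero. In FNF the flows are learned by training rather than set by hand, which matters because a real encoder must also keep the information needed for the downstream task.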