Demystifying the Lottery Ticket Hypothesis in Deep Learning

Why lottery tickets are the next big thing in training neural networks


Published in Towards Data Science · 4 min read · Mar 3, 2022

Training neural networks is expensive. OpenAI’s GPT-3 has an estimated training cost of $4.6M, even using the lowest-cost cloud GPUs on the market. It’s no wonder that Frankle and Carbin’s 2019 Lottery Ticket Hypothesis started a gold rush in research, drawing attention from top academic minds and tech giants like Facebook and Microsoft. In the paper, they present empirical evidence for the existence of winning (lottery) tickets: subnetworks of a neural network that can be trained to perform as well as the original network at a fraction of its size. In this post, I’ll cover how this works, why it is revolutionary, and the state of the research.

Traditional wisdom says that neural networks are best pruned after training, not at the start. By pruning weights, neurons, or other components, the resulting neural network is smaller, faster, and consumes fewer resources during inference. When done right, accuracy is unaffected while the network can shrink many-fold.
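As a concrete reference point, here is a minimal sketch of one-shot magnitude pruning in NumPy. This is my own illustration, not code from any of the papers discussed: after training, the lowest-magnitude weights are masked out.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, fraction: float) -> np.ndarray:
    """Return a binary mask that removes the `fraction` smallest-magnitude weights."""
    flat = np.abs(weights).ravel()
    k = int(fraction * flat.size)                  # number of weights to prune
    if k == 0:
        return np.ones_like(weights)
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return (np.abs(weights) > threshold).astype(weights.dtype)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))          # stand-in for a trained weight matrix
mask = magnitude_prune(w, fraction=0.5)
pruned = w * mask                    # survivors keep their trained values
```

In a real deployment the mask is applied layer by layer (or globally across all layers) to a trained model, and the sparse network is then fine-tuned briefly to recover any lost accuracy.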

Flipping traditional wisdom on its head, we can ask: could we have pruned the network before training and achieved the same result? In other words, was the information in the pruned components necessary for the network to learn, even if not to represent what it learned?

The Lottery Ticket Hypothesis focuses on pruning weights and offers empirical evidence that certain pruned subnetworks can be trained from the start to achieve performance similar to the entire network. How? Iterative Magnitude Pruning (IMP).
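IMP alternates training, pruning, and rewinding. The loop below is a simplified, framework-free sketch of that procedure; the `train_fn` stand-in and the parameter names are my own, whereas the real method trains a full network with SGD at each round.

```python
import numpy as np

def iterative_magnitude_pruning(init_weights, train_fn, rounds=3, prune_rate=0.2):
    """Train, prune the smallest surviving weights, rewind the survivors
    to their original initialization, and repeat."""
    mask = np.ones_like(init_weights)
    weights = init_weights.copy()
    for _ in range(rounds):
        trained = train_fn(weights * mask)           # train the masked network
        survivor_mags = np.abs(trained)[mask == 1]
        k = int(prune_rate * survivor_mags.size)     # prune 20% of survivors
        if k > 0:
            threshold = np.partition(survivor_mags, k - 1)[k - 1]
            mask[(np.abs(trained) <= threshold) & (mask == 1)] = 0
        weights = init_weights.copy()                # rewind to initialization
    return mask, weights * mask                      # candidate winning ticket

# toy usage: "training" is a stand-in that just scales the weights
rng = np.random.default_rng(1)
w0 = rng.normal(size=(10, 10))
ticket_mask, ticket = iterative_magnitude_pruning(w0, train_fn=lambda w: 1.1 * w)
```

The rewind on the last line of the loop is the crucial step: the surviving weights go back to their exact initial values, not to a fresh random draw.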

When this was tried historically, the pruned network’s weights were reinitialized randomly, and performance dropped off quickly.

The key difference with IMP is that the surviving weights are returned to their original initialization values. When trained, these subnetworks matched the original network’s performance in the same training time, even at high levels of pruning.


This suggests that lottery tickets exist as an intersection of a specific subnetwork and its initial weights. They are “winning the lottery,” so to speak: the match of that architecture and those weights performs as well as the entire network. Does this hold for bigger models?

For bigger models, it does not hold with the same approach. To probe sensitivity to noise, Frankle and Carbin duplicated the pruned networks and trained the copies with differently ordered data. IMP succeeds where linear mode connectivity exists: a rare phenomenon in which the copies converge to the same local minimum. For small networks, this happens naturally. For large networks, it does not. So what to do?

Starting with a smaller learning rate and increasing it over time (warmup) makes IMP work for large models, since sensitivity to the initial noise from the data is lessened. The other finding is that rewinding the pruned network’s weights to their values at an early training iteration, rather than at initialization, works as well; for example, to the weights at the 10th iteration of a 1,000-iteration training run.
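Rewinding only requires saving one extra checkpoint early in training. A hypothetical sketch (the function name and the `step_fn` stand-in for an optimizer step are my own):

```python
import numpy as np

def train_with_rewind(w0, step_fn, total_steps=1000, rewind_step=10):
    """Run training while saving a checkpoint at an early step; pruned
    survivors are later rewound to that checkpoint instead of to step 0."""
    w, checkpoint = w0.copy(), None
    for step in range(1, total_steps + 1):
        w = step_fn(w)                     # one optimization step (stand-in)
        if step == rewind_step:
            checkpoint = w.copy()          # weights to rewind survivors to
    return w, checkpoint

# toy usage with a dummy update rule
final, ckpt = train_with_rewind(np.zeros((2, 2)), step_fn=lambda w: w + 0.01,
                                total_steps=100, rewind_step=10)
```

After pruning based on `final`, the surviving weights would be reset to `ckpt` before retraining, rather than to their step-0 values.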

These results have held steady across architectures as different as transformers, LSTMs, CNNs, and reinforcement learning architectures.

While the paper demonstrated that these lottery tickets exist, it does not provide a way to identify them without first training the full network. Hence the gold rush in characterizing their properties and determining whether they can be identified before training. They are also inspiring work on heuristics for pruning early, since current heuristics focus on pruning after training.

One Ticket to Win Them All (2019) shows that lottery tickets encode information that is invariant to dataset and optimizer. The authors successfully transfer lottery tickets between networks trained on different datasets (e.g., from CIFAR-10 to ImageNet).

A key factor was the relative size of the training datasets: a ticket generated on a larger dataset than the destination network’s transferred better; otherwise it performed similarly or worse.


Drawing Early-Bird Tickets (2019): This paper shows that lottery tickets can be found early in training. At each training epoch, a pruning mask is computed; when the Hamming distance between consecutive masks falls below a threshold, training stops and the network is pruned.
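The stopping criterion can be sketched in a few lines. This is a simplified illustration of the idea (the actual paper compares each new mask against a FIFO queue of recent masks, not just the previous one):

```python
import numpy as np

def mask_distance(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Fraction of positions where two binary pruning masks disagree
    (normalized Hamming distance)."""
    return float(np.mean(mask_a != mask_b))

# toy usage: stop once consecutive masks have stabilized
prev_mask = np.array([1, 1, 0, 0, 1, 1, 0, 1])
curr_mask = np.array([1, 1, 0, 0, 1, 0, 0, 1])
stop = mask_distance(prev_mask, curr_mask) < 0.2   # masks differ in 1/8 positions
```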

Pruning Neural Networks Without Any Data by Iteratively Conserving Synaptic Flow (2020): This paper computes pruning at initialization with no data, and it outperforms existing state-of-the-art pruning-at-initialization algorithms. The technique aims to maximize critical compression: the maximum pruning that can occur without collapsing performance, which requires preventing entire layers from being pruned away. To do so, the method scores every weight by its “synaptic flow” and reevaluates the scores after each pruning round.
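To make the score concrete, here is a closed-form sketch of the synaptic-flow saliency for a toy two-layer linear network. This is my own hand-derived illustration under that restricted setting; the actual SynFlow algorithm uses automatic differentiation to score arbitrary architectures.

```python
import numpy as np

def synflow_scores(w1: np.ndarray, w2: np.ndarray):
    """Synaptic-flow saliency for a two-layer linear network.

    SynFlow feeds an all-ones input through the absolute-valued weights,
    R = 1^T |W2| |W1| 1, and scores each weight by |w * dR/dw|.  For this
    tiny case both gradients can be written out by hand."""
    a1, a2 = np.abs(w1), np.abs(w2)                      # |W1|: (h, d), |W2|: (o, h)
    g1 = np.outer(a2.sum(axis=0), np.ones(a1.shape[1]))  # dR/d|W1|
    g2 = np.outer(np.ones(a2.shape[0]), a1.sum(axis=1))  # dR/d|W2|
    return a1 * g1, a2 * g2                              # per-weight scores

s1, s2 = synflow_scores(np.array([[-1.0, 2.0], [3.0, -4.0]]),
                        np.array([[1.0, 1.0]]))
# the lowest-scoring weights are pruned, scores are recomputed, and the loop
# repeats -- the rescoring is what keeps any single layer from being emptied
```

Note that a weight’s score depends on the magnitudes of the weights in the other layer it connects through, which is why pruning one round at a time and rescoring avoids emptying a layer.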

The existence of small subnetworks in neural architectures that can be trained to perform as well as the entire neural network is opening a world of possibilities for efficient training. In the process, researchers are learning a lot about how neural networks learn and what is necessary for learning. And who knows? One day soon we may be able to prune our networks before training, saving time, compute, and energy.


