
Lossless and Near-Lossless Compression for Foundation Models

EasyChair Preprint no. 12943, version 2

Versions: 1, 2
12 pages. Date: May 3, 2024


As models grow in size and are deployed at ever larger scale, their sheer size burdens the infrastructure, demanding more network bandwidth and more storage to accommodate them. While there is a vast literature on reducing model sizes, we investigate a more traditional type of compression: lossless compression, which compresses the model to a smaller form and is coupled with a decompression algorithm that restores it exactly to its original form. Somewhat surprisingly, we show that specific lossless compression techniques can yield significant network and storage savings on popular models, at times reducing model size by over 50%. We investigate the source of model compressibility and introduce specialized compression variants tailored for models that further increase the effectiveness of compression. We also categorize models into compressibility groups and introduce a tunable lossy compression technique that can further reduce size, even for the group of less compressible models, with little to no effect on model accuracy. Finally, we explore the usefulness of delta compression for checkpointing and model variations. We estimate that these methods could save over an exabyte per month of network traffic downloaded from a large model hub like Hugging Face.

Keyphrases: AI HUB, Compression, Foundation Models, LLM Models, lossless compression, lossy compression

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:12943,
  author = {Moshik Hershcovitch and Leshem Choshen and Andrew Wood and Ilias Ennmouri and Peter Chin and Swaminathan Sundararaman and Danny Harnik},
  title = {Lossless and Near-Lossless Compression for Foundation Models},
  howpublished = {EasyChair Preprint no. 12943},
  year = {EasyChair, 2024}}