SPA: Towards a Computational Friendly Cloud-Base and on-Devices Collaboration Seq2seq Personalized Generation with Causal Inference

EasyChair Preprint 15343, version 1

12 pages
Date: November 1, 2024

Abstract

Large language models (LLMs) have shown outstanding performance on a wide range of tasks, including question answering. However, LLMs require substantial memory on low-resource devices, and, more critically, computational speed on these devices is severely limited. In this paper, we propose SPA (Side Plugin Adaptation), a lightweight architecture for fast on-device inference under strict on-device computation and memory constraints. Compared with other on-device seq2seq generation approaches, SPA performs fast and stable inference under low-resource constraints, achieving cost efficiency. Our method establishes an interaction between a pretrained LLM on the cloud and additive parameters on the device, providing both the general knowledge of the pretrained LLM and personalized features. Furthermore, SPA offers a framework that keeps feature-based parameters on low-computation devices while leaving the parameters containing general information on high-computation devices.
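The abstract does not give implementation details, but the cloud/device split it describes can be illustrated with a minimal sketch: a heavy frozen transformation stands in for the cloud-side pretrained LLM, and a small low-rank "side plugin" stands in for the on-device personalized parameters. All names, shapes, and the low-rank form are hypothetical choices for illustration, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only.
d_model, rank = 8, 2

# Cloud side: a frozen, parameter-heavy transformation
# (a stand-in for a pretrained LLM layer with general knowledge).
W_cloud = rng.standard_normal((d_model, d_model))

# Device side: a small low-rank side plugin, the only personalized part,
# cheap enough to store and run on a low-resource device.
A = rng.standard_normal((d_model, rank)) * 0.01
B = rng.standard_normal((rank, d_model)) * 0.01

def cloud_forward(x):
    # Heavy, general-knowledge computation stays on the cloud.
    return x @ W_cloud

def device_forward(h_cloud, x):
    # Lightweight personalized correction added on-device.
    return h_cloud + x @ A @ B

x = rng.standard_normal((1, d_model))
h = device_forward(cloud_forward(x), x)
print(h.shape)  # (1, 8)
```

Note that the device-side plugin holds far fewer parameters than the cloud model (here 2 x 8 x 2 = 32 versus 64, and the gap grows quickly with model size), which is the memory argument the abstract makes.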

Keyphrases: Cloud-device Collaboration, Personalized LLM, Inference Acceleration

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:15343,
  author    = {Yanming Liu and Xinyue Peng and Jiannan Cao and Le Dai and Xingzu Liu and Ruilin Nong and Weihao Liu and Songhang Deng},
  title     = {SPA: Towards a Computational Friendly Cloud-Base and on-Devices Collaboration Seq2seq Personalized Generation with Causal Inference},
  howpublished = {EasyChair Preprint 15343},
  year      = {EasyChair, 2024}}