DYNAMIC EARLY-EXIT VS. STATIC STRUCTURED PRUNING: A COMPARATIVE TRADE-OFF STUDY FOR ULTRA-LIGHTWEIGHT EDGE INFERENCE

Authors

  • Mujtaba K. Tahir, Department of Computer Science, Faculty of Computing, Nigerian Army University Biu (NAUB), Borno State
  • Zainab D. Marmara
  • Aisha B. Muhammad
  • Maryam Sani

Abstract

Deploying deep learning models on edge devices requires carefully balancing predictive accuracy against stringent constraints on latency, memory, and energy consumption. Although static compression and dynamic inference techniques have been explored independently, their comparative performance under identical hardware conditions remains underexamined. This work provides a unified hardware-aware evaluation of structured pruning and dynamic early-exit inference on a MobileNetV2 backbone. Experiments were conducted on a Broadcom BCM2711 SoC (Raspberry Pi 4), with all results averaged over ten independent trials to ensure statistical reliability. The 70% structured pruning strategy achieves a deterministic memory footprint of 4.0 MB, while the dynamic early-exit approach yields a lower mean latency of 10.86 ± 2.10 ms and flexible post-deployment energy savings of up to 51.7% by tuning the confidence threshold (tau). An ablation study highlights how tau mediates the trade-off between speed and accuracy and offers a practical guideline for selecting optimization strategies based on device-specific constraints. These findings advance the understanding of hybrid edge optimization and provide actionable insights for deploying lightweight models in real-world scenarios.
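The early-exit mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `early_exit_infer`, the representation of backbone stages and exit heads as plain callables, and the toy network in the usage example are all illustrative assumptions. The core idea, however, matches the abstract: after each intermediate exit head, inference stops as soon as the softmax confidence reaches the threshold tau.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_infer(x, stages, heads, tau=0.9):
    """Run backbone stages in sequence; after each stage, evaluate the
    attached exit head and stop early if max softmax probability >= tau.
    The last head acts as the final classifier (always returns there).
    Returns (class probabilities, index of the exit taken)."""
    h = x
    for i, (stage, head) in enumerate(zip(stages, heads)):
        h = stage(h)                      # forward through one backbone stage
        probs = softmax(head(h))          # cheap classifier at this depth
        if probs.max() >= tau or i == len(stages) - 1:
            return probs, i               # confident enough (or final exit)

# Toy two-stage "network": identity-style stages with linear exit heads.
stages = [lambda h: h * 2, lambda h: h + 1]
heads = [lambda h: np.array([h.sum(), 0.0]),
         lambda h: np.array([0.0, 10.0])]

x = np.array([3.0])
_, exit_low_tau = early_exit_infer(x, stages, heads, tau=0.9)    # exits at head 0
_, exit_high_tau = early_exit_infer(x, stages, heads, tau=0.999) # must run deeper
```

Lowering tau makes the model exit earlier on average (less compute, lower latency and energy), at the cost of accepting less-confident predictions; raising it pushes more inputs through the full backbone. This is the post-deployment knob behind the trade-off the ablation study examines.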

Published

2026-03-30

Section

ARTICLES