DYNAMIC EARLY-EXIT VS. STATIC STRUCTURED PRUNING: A COMPARATIVE TRADE-OFF STUDY FOR ULTRA-LIGHTWEIGHT EDGE INFERENCE
Abstract
Deploying deep learning models on edge devices requires carefully balancing predictive accuracy against stringent constraints on latency, memory, and energy consumption. Although static compression and dynamic inference techniques have been explored independently, their comparative performance under identical hardware conditions remains underexamined. This work provides a unified hardware-aware evaluation of structured pruning and dynamic early-exit inference on a MobileNetV2 backbone. Experiments were conducted on a Broadcom BCM2711 SoC (Raspberry Pi 4), with all results averaged over ten independent trials to ensure statistical reliability. The 70% structured pruning strategy achieves a deterministic memory footprint of 4.0 MB, while the dynamic early-exit approach yields a lower mean latency of 10.86 ± 2.10 ms and post-deployment energy savings of up to 51.7% through tuning of the confidence threshold (tau). An ablation study shows how tau mediates the trade-off between speed and accuracy, and a practical guideline is offered for selecting optimization strategies based on device-specific constraints. These findings advance the understanding of hybrid edge optimization and provide actionable insights for deploying lightweight models in real-world scenarios.
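The early-exit mechanism described above can be illustrated with a minimal sketch of the confidence-threshold decision rule: an intermediate classifier head produces logits, and the model returns that head's prediction only when its top softmax probability reaches tau, otherwise falling back to the full network. All names here (`early_exit`, `final_fn`) are illustrative assumptions, not the paper's implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit(exit_logits, final_fn, tau=0.8):
    """Confidence-gated early exit (illustrative sketch).

    Returns (prediction, confidence, exited_early). If the early
    head's top softmax probability >= tau, its argmax is returned
    and the rest of the network is skipped; otherwise final_fn()
    runs the full model and supplies the prediction.
    """
    probs = softmax(exit_logits)
    conf = max(probs)
    if conf >= tau:
        return probs.index(conf), conf, True
    return final_fn(), conf, False
```

Lowering tau lets more inputs exit at the intermediate head, cutting mean latency and energy at some cost in accuracy; raising tau routes more inputs through the full network, which is the trade-off the ablation study quantifies.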
License
Copyright (c) 2026 Science World Journal

This work is licensed under a Creative Commons Attribution 4.0 International License.