Adaptive AI Task Partitioning and Offloading in Heterogeneous Edge-Cloud Networks

Published:

The use of AI on resource-constrained IoT devices has grown significantly in recent years. However, most existing solutions for partitioning and offloading AI inference tasks across edge-cloud networks rely on static methods: the partition is fixed before deployment and does not adapt to changes at runtime. Many of these solutions are also evaluated in simulated environments rather than on real hardware. This thesis addresses these gaps by designing and implementing an adaptive partitioning and offloading framework that dynamically determines where to split a neural network’s layers across a three-node heterogeneous network. The framework was built in Python and tested on a physical testbed consisting of a Raspberry Pi 5 as the edge device, a laptop as the fog node, and a high-performance GPU desktop PC as the cloud. Three CNN models were used for evaluation: VGG-16, AlexNet, and MobileNetV2. The framework profiles the model at startup, measures the link conditions between nodes, and periodically re-evaluates the partition to react to changes in the environment. Compared to a static partitioning baseline, the framework achieved energy reductions of 35.82% for VGG-16, 35.70% for AlexNet, and 27.09% for MobileNetV2, and reduced end-to-end latency by 6.34%, 22.92%, and 14.20%, respectively. These results indicate that adaptive partitioning can reduce energy consumption on resource-constrained devices while maintaining acceptable latency in a real heterogeneous edge-cloud network.
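To make the idea of choosing a split across three nodes concrete, the following is a minimal Python sketch, not the thesis's actual implementation. It assumes a simplified cost model that estimates latency only (the thesis also targets energy), and it ignores the cost of returning the result. All names (`estimate_latency`, `best_split`, `layer_times`, `out_sizes`, `bandwidths`) and all numbers are hypothetical and chosen purely for illustration.

```python
from itertools import combinations_with_replacement


def estimate_latency(split, layer_times, out_sizes, input_size, bandwidths):
    """Estimate one inference's latency in seconds (assumed cost model).

    split = (i, j): layers [0, i) run on the edge, [i, j) on the fog,
    and [j, n) on the cloud.
    layer_times[node][k]: seconds for layer k on node (0=edge, 1=fog, 2=cloud).
    out_sizes[k]: bytes produced by layer k; input_size: bytes of the raw input.
    bandwidths = (edge_to_fog, fog_to_cloud) in bytes per second.
    """
    i, j = split
    n = len(out_sizes)
    compute = (sum(layer_times[0][:i])
               + sum(layer_times[1][i:j])
               + sum(layer_times[2][j:]))

    def crossing(k):
        # Size of the tensor that crosses a boundary placed before layer k.
        return input_size if k == 0 else out_sizes[k - 1]

    transfer = 0.0
    if i < n:  # some layers run past the edge, so data crosses edge -> fog
        transfer += crossing(i) / bandwidths[0]
    if j < n:  # some layers run on the cloud, so data crosses fog -> cloud
        transfer += crossing(j) / bandwidths[1]
    return compute + transfer


def best_split(layer_times, out_sizes, input_size, bandwidths):
    """Exhaustively search all (i, j) with 0 <= i <= j <= n."""
    n = len(out_sizes)
    candidates = combinations_with_replacement(range(n + 1), 2)
    return min(
        candidates,
        key=lambda s: estimate_latency(
            s, layer_times, out_sizes, input_size, bandwidths
        ),
    )


# Hypothetical profile of a 3-layer model: slow edge, faster fog, fastest cloud.
times = [[0.4, 0.4, 0.4], [0.1, 0.1, 0.1], [0.01, 0.01, 0.01]]
sizes = [1000, 500, 10]  # bytes produced by each layer

fast_links = best_split(times, sizes, 2000, (1e9, 1e9))  # -> (0, 0): all on cloud
slow_links = best_split(times, sizes, 2000, (100, 100))  # -> (3, 3): all on edge
```

In a runtime setting, an adaptive scheme of this kind would re-measure the link bandwidths at intervals and call `best_split` again, so that the chosen partition moves as conditions change; a static baseline would call it once before deployment.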

Recommended citation: A. A. Deng and E. Butkus, “Adaptive AI task partitioning and offloading in heterogeneous edge-cloud networks: REAP—Runtime Energy-Aware Adaptive Partitioning and Offloading Framework,” M.Sc. thesis, Dept. Comput. Syst. Sci., Stockholm Univ., Stockholm, Sweden, 2026.