The edge-to-cloud compute continuum has gained popularity in recent years as a means of collecting and analyzing data generated by Internet of Things (IoT) devices at the network edge while ensuring low latency, high scalability, and privacy preservation. This continuum of computing resources, features, and services, which spans from the edge to the cloud, can be leveraged in application domains such as smart cities, industrial IoT, and smart healthcare. However, many scenarios in which this technology could be successfully applied remain unexplored. This chapter investigates how the compute continuum can support speaker tracking in smart spaces, such as smart homes, offices, and public venues, with a particular focus on multimodal systems that leverage both audio and visual data. We assess the effectiveness of the edge-to-cloud continuum in supporting such systems through a simulation-based experimental evaluation performed with the iFogSim toolkit. Our findings reveal that edge-cloud integration improves application performance in terms of network usage and latency compared to a centralized solution that relies solely on cloud computing.