Generative AI for Short Sound Message Transmission in the Internet of Things
Manuele Favero, Alessandro Buratto, Leonardo Badia, Sergio Canazza
2025
Abstract
We leverage the latest advancements in generative AI for music creation to develop an automated system that produces short sound messages. These sound-based messages, referred to as Transmit In Sound codes (TIScodes), are brief 5-second audio sequences that carry digital information and can be recognized by a dedicated smartphone application in an Internet of Audio Things (IoAuT) scenario. We describe the methodologies of the TIScode pipeline, which comprises generation, transmission, and finally reception and decoding. For the generation phase, we use MusicGen, a state-of-the-art autoregressive transformer model, and we introduce a channel coding system based on the quantization of sound features and of high-level features extracted through convolutional neural networks (CNNs). The extracted features are mapped to a unique bitmap for each TIScode, simplifying the decoding process. For the recognition phase, we present an algorithm that combines sound feature analysis with frequency-based peak analysis to enhance detection accuracy. Experimental results, obtained through simulation and field tests, demonstrate the effectiveness of the system in retrieving the digital information encoded within the sound messages.
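As a rough illustration of the channel-coding idea mentioned in the abstract — quantizing extracted features into a unique per-TIScode bitmap and decoding by nearest-bitmap matching — the following Python sketch is hypothetical: the function names, min-max normalization, and bit widths are assumptions for illustration, not the paper's actual scheme.

```python
import numpy as np

def features_to_bitmap(features, bits_per_feature=4):
    """Quantize a feature vector into a fixed-length bitmap (sketch).

    Each feature is min-max normalized to [0, 1], quantized to
    2**bits_per_feature levels, and each level index is serialized
    as a fixed-width bit string.
    """
    f = np.asarray(features, dtype=float)
    span = f.max() - f.min()
    norm = (f - f.min()) / span if span > 0 else np.zeros_like(f)
    n_levels = 2 ** bits_per_feature
    levels = np.minimum((norm * n_levels).astype(int), n_levels - 1)
    bits = [int(b) for lv in levels
            for b in format(lv, f"0{bits_per_feature}b")]
    return np.array(bits, dtype=np.uint8)

def nearest_tiscode(bitmap, codebook):
    """Decode by picking the codebook entry at minimum Hamming distance."""
    distances = [int(np.sum(bitmap != np.asarray(c))) for c in codebook]
    return int(np.argmin(distances))
```

For example, `features_to_bitmap([0.0, 0.5, 1.0], bits_per_feature=2)` yields the 6-bit map `[0, 0, 1, 0, 1, 1]`; a received bitmap corrupted in a few positions still decodes to the closest codebook entry.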
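The frequency-based peak analysis used in the recognition phase can be sketched, under assumptions, as FFT-based peak picking; the function name and parameters below are illustrative, and the paper's actual detector additionally combines this with learned sound-feature analysis.

```python
import numpy as np

def dominant_peaks(signal, sample_rate, n_peaks=3):
    """Return the frequencies (Hz) of the strongest spectral peaks (sketch).

    Computes the magnitude spectrum with a real FFT and selects the
    n_peaks bins with the largest magnitude.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    top = np.argsort(spectrum)[-n_peaks:][::-1]  # strongest first
    return freqs[top]
```

For instance, a one-second 440 Hz tone sampled at 8 kHz returns 440.0 Hz as its strongest peak.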