Generative AI for Short Sound Message Transmission in the Internet of Things

Manuele Favero; Alessandro Buratto; Leonardo Badia; Sergio Canazza
2025

Abstract

We leverage the latest advancements in generative AI for music creation to develop an automated system producing short sound messages. These sound-based messages, referred to as Transmit In Sound code (TIScode), are brief 5-second audio sequences that carry digital information. They can be recognized by a specific smartphone application in an Internet of Audio Things (IoAuT) scenario. We describe the methodologies of the TIScode pipeline, which includes generation, transmission, and ultimately, reception and decoding. For the generation phase, we use MusicGen, a state-of-the-art autoregressive transformer model, and we introduce a channel coding system based on the quantization of sound features and high-level features extracted through convolutional neural networks (CNNs). The extracted features are mapped to create a unique bitmap for each TIScode, simplifying the decoding process. We present an algorithm for the recognition phase, combining sound feature analysis with frequency-based peak analysis to enhance detection accuracy. Experimental results, obtained through simulation and field tests, demonstrate the effectiveness of the system in retrieving the digital information encoded within sound messages.
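
As a rough illustration of the generation phase summarized above, the sketch below produces a 5-second clip with MusicGen through Meta's audiocraft library. The text prompt, checkpoint name, and output file name are placeholder assumptions for illustration, not the settings used in the paper.

```python
# Minimal sketch of the generation phase: a 5-second audio clip produced
# with MusicGen via the audiocraft library. Prompt and checkpoint are
# illustrative placeholders, not the paper's actual configuration.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=5)  # TIScodes are 5-second sequences

# Hypothetical text prompt conditioning the generated sound message
descriptions = ["short bright electronic jingle with a clear melodic motif"]
wav = model.generate(descriptions)  # tensor of shape [batch, channels, samples]

# Save the clip; loudness normalization keeps playback levels consistent
audio_write("tiscode_example", wav[0].cpu(), model.sample_rate,
            strategy="loudness", loudness_compressor=True)
```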
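The paper's actual feature set, quantizer, and CNN are not reproduced here; the following sketch only illustrates, under stated assumptions, how quantized spectral features could be packed into a bitmap and how frequency-based peak analysis might support detection, using librosa and scipy as stand-ins for the pipeline described in the abstract.

```python
# Illustrative sketch (not the paper's coder): quantize a few spectral
# features of a received clip into a bitmap, and locate dominant frequency
# peaks to support detection. Bit depths and thresholds are placeholders.
import numpy as np
import librosa
from scipy.signal import find_peaks

def features_to_bitmap(path, n_bits=4):
    y, sr = librosa.load(path, sr=None, duration=5.0)

    # Coarse descriptors of the clip (a stand-in for the CNN features)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr).mean()
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=4).mean(axis=1)
    feats = np.concatenate(([centroid, rolloff], mfcc))

    # Uniform quantization of each feature into 2**n_bits levels,
    # then concatenation of the binary codewords into one bitmap
    lo, hi = feats.min(), feats.max()
    levels = ((feats - lo) / (hi - lo + 1e-9) * (2**n_bits - 1)).astype(int)
    bits = [int(b) for lvl in levels for b in np.binary_repr(lvl, width=n_bits)]
    return np.array(bits, dtype=np.uint8)

def dominant_peaks(path, top_k=5):
    # Frequency-based peak analysis: strongest spectral peaks of the clip
    y, sr = librosa.load(path, sr=None, duration=5.0)
    spectrum = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    peaks, props = find_peaks(spectrum, height=spectrum.max() * 0.1)
    strongest = peaks[np.argsort(props["peak_heights"])[-top_k:]]
    return freqs[strongest]
```

In the system described by the paper, the bitmap is derived from CNN-extracted features and the peak analysis is combined with the feature match during recognition; the functions above merely indicate the shape of such a decoder.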
2025 IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN)
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3547902