All of the examples can be heard here.
Natural language processing already powers the voices of home assistants, and text-based data can be used to generate song lyrics. But most existing techniques require people to prepare transcriptions and label text-based training data, which takes a lot of time.
AudioLM, described in a non-peer-reviewed paper last month, requires no transcription or labeling. Instead, sound databases are fed into the program, and machine learning compresses the audio files into sound snippets, or tokens. This tokenized training data is then fed into a model that uses natural language processing to learn the sound's patterns.
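As a rough illustration of what turning audio into tokens means, the sketch below quantizes short waveform frames into discrete IDs with k-means clustering. This is only a conceptual stand-in, not the method from the paper, and every name here (tokenize_audio, frame_length, vocab_size) is made up for the example.

```python
# Conceptual sketch: compress raw audio into a sequence of discrete
# "tokens" by clustering short frames. AudioLM's real pipeline uses
# learned neural compression; k-means is a simple stand-in.
import numpy as np
from sklearn.cluster import KMeans

def tokenize_audio(waveform: np.ndarray, frame_length: int = 320,
                   vocab_size: int = 256) -> np.ndarray:
    # Split the waveform into fixed-size frames.
    n_frames = len(waveform) // frame_length
    frames = waveform[: n_frames * frame_length].reshape(n_frames, frame_length)
    # Cluster the frames so each one maps to a discrete token ID.
    # (Assumes n_frames >= vocab_size, i.e. enough audio to cluster.)
    codebook = KMeans(n_clusters=vocab_size, n_init=10).fit(frames)
    return codebook.labels_  # token sequence, e.g. [17, 203, 4, ...]
```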
To generate audio, a short snippet of sound is fed into AudioLM, which then predicts what comes next. The process is similar to the way language models predict which words typically follow one another.
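The prediction step can be illustrated the same way. The sketch below uses a simple bigram frequency model over tokens, far simpler than the neural language models this kind of system relies on, to extend a short token prompt; train_bigrams and continue_tokens are hypothetical names invented for this example.

```python
# Conceptual sketch: learn which token tends to follow which, then
# extend a short prompt one token at a time. A bigram counter stands
# in for the much more powerful sequence model AudioLM actually uses.
from collections import Counter, defaultdict
import random

def train_bigrams(token_sequences):
    # Count how often each token follows each other token.
    follow = defaultdict(Counter)
    for seq in token_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            follow[prev][nxt] += 1
    return follow

def continue_tokens(prompt, follow, n_new=100):
    # Start from the prompt and repeatedly sample a likely next token.
    out = list(prompt)
    for _ in range(n_new):
        candidates = follow.get(out[-1])
        if not candidates:
            break  # no observed continuation for this token
        tokens, counts = zip(*candidates.items())
        out.append(random.choices(tokens, weights=counts)[0])
    return out
```

In the real system, the predicted token sequence is then converted back into sound; this prompt-then-continue loop is what produces the continuations described next.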
The audio clips released by the team sound natural. Piano music generated with AudioLM, in particular, sounds more fluid than piano music generated with existing artificial intelligence techniques.