Culture Unveiled: Under The Wing

June 7, 2024 · 7 min read

Founder, Wade Digital

Image: A Culture ship descends in the cover art for Culture Unveiled

Space opera concept album brought to life with genAI

As someone who has been experimenting with generative AI, I have been very interested in what's out there for audio generation. We have LLMs for written text, Stable Diffusion and DALL-E for images, and plenty of speech generation models, but music generation has been a bit more obscure.

A couple of commercial products are available, and I recently began experimenting with one of them, called Udio. I have been having so much fun with it that I want to share a weekend project I put together using it.

The result is Culture Unveiled: Under The Wing, a 40-minute, 10-track album which serves as a spiritual cousin to Iain M. Banks' revered Culture series. It combines a variety of musical genres to explore the profound themes of discovery, transformation, and unity found in the Culture series. Each track is designed to take listeners on a journey through the narrative arc of Earth's integration into the interstellar community under the protective and enlightening influence of the Culture.

Iain M. Banks' Culture series introduces us to an advanced civilization where super-intelligent machines, known as Minds, oversee a utopian society. It's a post-scarcity world where humans indulge in the freedoms that only superior technology can provide. While parts of Banks' novella The State of the Art are set on Earth during the 1970s, Culture agents don't reveal themselves during that time, and leave our less advanced civilization to progress unhindered.

In Culture Unveiled, things have changed. Humanity is under threat from another alien civilization, one of the Culture's peers, and so the Culture has revealed itself in order to take humanity under its wing and protect it from this existential threat.

My aim here is both to show the current state of the art of these generative AI tools and to engage in a bit of memetic engineering of our own culture. These tools have caused a massive immune response among certain segments of the populace due to their impact on jobs, their energy use, and other ethical concerns, and I've been holding on to Banks' series as an alternative utopian vision. We are in for a wild ride in these coming years, and society needs to prepare for the shifts that generative AI will bring. This project not only showcases the creative potential of these technologies but also serves as a commentary on the transformative impact of AI on our society.

Technical Stuff

Udio's audio model generates 30-second music clips based on genre and style tags, and it can either generate lyrics or use custom ones. By default, Udio generates two outputs per run, and you can extend a track an additional 30 seconds at a time. So our basic workflow was to enter our prompts, pick our favorite of the two outputs, extend it, and repeat. While not a hard constraint, I did try to limit myself to picking two of these extensions for each track, which would wind up with names like 'Dawn of Gaia ext v1.2.2.1.1.1.1.1' or 'Synthetic Awakening ext v1.2.1.1.2.2.2.1' before I finished them. I also relentlessly deleted ancestors to make sure I moved forward with the project, which began in fits and starts a few weeks ago and culminated in a 12-hour Sunday session in which we did most of the generations.
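The pick-and-extend loop described above can be sketched in a few lines of Python. This is only an illustration, not Udio's API: `Track` and `extend` are hypothetical names, and the code assumes that each number in a name like 'ext v1.2.2.1.1.1.1.1' records which of the two outputs was kept at that step. That is my reading of the names, not a documented scheme.

```python
from dataclasses import dataclass

CLIP_SECONDS = 30      # each generation or extension adds 30 seconds
OUTPUTS_PER_RUN = 2    # Udio returns two candidates per run by default


@dataclass(frozen=True)
class Track:
    """Hypothetical model of one track: a title plus the pick made at each run."""
    title: str
    picks: tuple = ()

    @property
    def name(self) -> str:
        # Assumed naming: one number per kept candidate, joined with dots.
        return f"{self.title} ext v" + ".".join(str(p) for p in self.picks)

    @property
    def seconds(self) -> int:
        # Every kept candidate contributes one 30-second clip.
        return CLIP_SECONDS * len(self.picks)

    @property
    def candidates_heard(self) -> int:
        # Two candidates are generated for every pick that was kept.
        return OUTPUTS_PER_RUN * len(self.picks)


def extend(track: Track, pick: int) -> Track:
    """Keep candidate `pick` (1 or 2) from the latest run and append it."""
    if pick not in range(1, OUTPUTS_PER_RUN + 1):
        raise ValueError(f"pick must be between 1 and {OUTPUTS_PER_RUN}")
    return Track(track.title, track.picks + (pick,))


track = Track("Dawn of Gaia")
for choice in (1, 2, 2, 1, 1, 1, 1, 1):
    track = extend(track, choice)

print(track.name, track.seconds, track.candidates_heard)
```

Under that assumption, eight picks give a four-minute track, which lines up with a 40-minute album of 10 tracks, and it means listening to at least 16 candidates per track.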

The plot, track listing, lyrics and thematic prompts were all generated using ChatGPT. Most of what we did, engineering-wise, was extend our preferred generations with our custom prompts and lyrics for each section. Then it was a matter of rolling the dice and waiting for the results. Udio has various knobs you can dial in for coherence, context windows, and other factors, which we wound up abusing liberally. Our prompts changed a great deal from section to section, which is how we managed some of the stark genre transitions on the album. Very rarely did we have to throw anything out and go back. There are no external edits; everything here was generated using Udio's user interface. I can only imagine the power of a similar pipeline that can leverage APIs and production-quality audio engineering tools.

Image: A Culture ship descends towards Earth

Future Shock

Let me say that Udio's audio engine is really good. Sure, there are some artifacts (happy accidents) that I left in, but I am really happy with how this turned out. I had so much fun doing this project; it was an extremely addictive experience. Right now it takes less than two minutes to generate 30 seconds of audio, but I am sure that in a few months they'll be doing two minutes of audio in 30 seconds. I am already imagining a future where music can be generated in real time, based on users' preferences. Albums on demand.
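To put rough numbers on that guess, here is a back-of-the-envelope sketch. The function names are made up for illustration, the "today" figure treats "less than two minutes" as exactly two minutes (so it is a lower bound), and the projected figure is my speculation from above, not a measurement.

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of waiting."""
    return audio_seconds / wall_seconds


def can_stream(factor: float) -> bool:
    """Real-time playback needs at least one second of audio per second of waiting."""
    return factor >= 1.0


today = realtime_factor(30, 120)      # 30 s of audio in (at most) 2 minutes
projected = realtime_factor(120, 30)  # guessed: 2 minutes of audio in 30 s
speedup = projected / today           # how much faster generation would need to get

print(today, can_stream(today))
print(projected, can_stream(projected))
print(speedup)
```

In other words, generation today runs at about a quarter of playback speed, and the guess above amounts to a sixteenfold speedup, which would be comfortably past the point where music could be generated as fast as it is played.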

As an amateur musician, I find this technology amazing. Udio's model can create basically anything you can imagine, and it and its competitors are going to change how music is generated and consumed in the future. I spent a great deal of ~~~effort~~~ time crafting these tunes, listening to each generation, often from start to finish, just to choose which of the two was more interesting. And practically everything Udio does is interesting. It pained me to choose between certain versions of the output. The stereo outputs are richly layered and unconstrained by the actual limits of instrumentation or human dexterity. What starts as a simple guitar arpeggio on Eclipse of Shadows becomes its own counterpoint, and builds until the brutality of the impending enemy is unleashed as a pair of thrash metal guitars. The sax solo in Cosmic Diplomacy is virtuosic, playing over polyrhythmic bass and drums and a cacophony of piano chords. There were certain sections of songs that made me laugh or grin uncontrollably when I first heard them.

Udio's lyric modeling is still a bit hit or miss. The model sometimes ignored the lyrics, and getting it to stick to the script that I wanted could be difficult. Getting rhythm and meter right was easiest with the repetitive dance tracks, and hardest with the more new-age synth tracks. A bit more polish on the syllabic pronunciation is probably needed, but honestly I enjoyed the wrongness of it all, the disorientation into this liminal space of sound. And indeed, part of my goal with this project is to explore the Culture Shock that we are going to experience as these generative AI tools become part of our world.

Plucked from yesterday,
Into a tomorrow we can scarcely imagine,
The fabric of society,
Unraveled and rewoven before our eyes.

New voices echo in the market square,
Ideas once foreign, now laid bare.
Machines that think, and ships that soar,
A world unmade, to be reborn.

-Culture Shock
