ElevenLabs and Y7 Unite for Sci-Fi Film

Bridging Art and AI: The Making of 'Report 5923'

We're excited to share our collaboration with Y7: a unique hour-long sci-fi film titled Report 5923. Below, the artists of Y7 tell the story of how they made it, exploring themes of sound, sonic warfare and audio-as-virus while weaving in philosophical and theoretical elements. Our role at ElevenLabs was to support the project by helping integrate art with AI. Read on for Y7's account of their creative process and how they brought Report 5923 to life.

Report 5923 is an hour-long sci-fi film made predominantly with AI, using a wide array of different tools and methods. The film follows the protagonist, Shevek, on her journey between three different planets whilst she compiles what seems to be an ethnographic report. Sound, sonic warfare and audio-as-virus are recurring themes throughout the story, which more broadly deals with notions of world-building and techno-optimism. The work attempts to deploy ideas we have come across in philosophical and theoretical works that we love, particularly those of Gilles Deleuze & Félix Guattari.

Report 5923 trailer

It was first presented as a work-in-progress for FACT, a gallery and cinema in Liverpool, UK, who asked us to present work at the end of a two-day workshop in June '23 dedicated to supporting artists, researchers and curators. The programme, titled 'Turning Together', took its name from speculative fiction author Ursula K. Le Guin's understanding of the 'mother-tongue' as a way of communicating rooted in listening and relating to one another. After the screening, once ElevenLabs caught wind of our use of their tools both in Report and in our wider practice, we were lucky enough to quickly secure funding from them towards the film's completion.

In response to FACT's referencing of Le Guin, we decided to fine-tune an OpenAI GPT-3.5 model on her novel The Dispossessed with a view to co-writing a script with AI. Fine-tuning is different from interacting with ChatGPT: with fine-tuning you are essentially getting the model to specialise in a new dataset on top of the general linguistic knowledge it has already learnt. Once trained, your new model can produce new text in the style of your dataset, and you can control how closely it sticks to the original through a parameter called temperature: the higher the temperature, the more fractured and random the text output will be; the lower the temperature, the more likely it is to repeat excerpts of the dataset verbatim. It's about finding a happy medium. Think of the fine-tuned model as an extraction of Le Guin's vibe. It's a new kind of fan-fiction in this sense. We've collectively turned the noun 'Ursula K. Le Guin' into a verb. We can now Le Guin as much as we could paint, sculpt or sing.
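For readers curious about the mechanics, a minimal sketch of that workflow using OpenAI's Python SDK might look like the following. This is an illustration rather than our exact pipeline: the file name, model ID and prompt are all placeholders.

```python
# A minimal sketch of fine-tuning GPT-3.5 on a text dataset and sampling
# from the result. File names, IDs and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the training data: a JSONL file of chat-formatted excerpts
# prepared from the source text.
training_file = client.files.create(
    file=open("dispossessed_excerpts.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch a fine-tuning job on top of the base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)

# Once the job has finished, sample from the specialised model.
# Temperature controls how closely the output sticks to the dataset:
# lower values stay close to the source, higher values fracture it.
response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo:my-org::placeholder",  # ID of the finished job
    messages=[{"role": "user", "content": "Continue the ethnographic report."}],
    temperature=0.9,
)
print(response.choices[0].message.content)
```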

So, having experimented with different temperatures, the outlines of a story started to emerge. The process of co-writing with AI feels somewhat comparable to a William-Burroughs-by-way-of-David-Bowie cut-up technique: we began to make links between different snippets of text outputs; sometimes the AI would spark ideas in us that we would feed straight back to it, and sometimes we would feed in relevant passages of text from writers we love. In the end, it became hard to distinguish who wrote what and where the ideas had come from, although this is arguably not dissimilar from traditional authorship! If pushed, we would estimate there is roughly a 60/40 split of the writing credits in our favour. The overall story arc is not something the AI was capable of coming up with. This would technically be possible with ChatGPT, but when you get into the structure of storytelling it quickly reveals itself as very formulaic and weirdly over-reliant on happy endings.
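To give a flavour of that loop, here is a toy sketch of sampling the same scene at several temperatures and feeding the growing draft back in as context. The model ID and seed text are illustrative placeholders, not our production script.

```python
# A toy version of the cut-up loop: generate continuations of a scene at
# several temperatures, then feed the accumulated draft back as context.
from openai import OpenAI

client = OpenAI()
MODEL = "ft:gpt-3.5-turbo:my-org::placeholder"  # placeholder fine-tuned model

draft = "Shevek stepped onto the platform and listened."
for temperature in (0.4, 0.8, 1.2):
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": f"Continue this scene:\n{draft}"}],
        temperature=temperature,
        max_tokens=120,
    )
    fragment = response.choices[0].message.content
    # In practice we would cut the fragment up, keep only the striking
    # lines, and splice in passages of our own before continuing.
    draft += "\n" + fragment

print(draft)
```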

Simultaneous with the development of the script was the story's visualisation using AI tools (predominantly Midjourney and Runway's Gen-2). One of the main obstacles we faced was combating what Shumon Basar has termed 'the mid-ness of Midjourney': a built-in inclination towards kitsch DeviantArt aesthetics found in a lot of text-to-content tools, which also often comes hand-in-hand with misogynistic and infantilising depictions of women. The first way we tackled this was by littering our prompts with technical photographic terminology, so as to steer us away from heavily stylised imagery. One of the major impacts this had on Report was that it led us to change the main character, Shevek, from a young woman to an old woman. When prompted, Midjourney will often portray older women as objects of abject horror, which we felt offered much richer, more subversive and complex aesthetic ground for our protagonist, a choice supported not least by Le Guin's claim in 'The Space Crone' that older women would be the ideal earthly representatives for intergalactic travel.
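To illustrate the kind of steering we mean, a hypothetical before-and-after might look something like this (invented examples, not prompts from the film):

```
# Without photographic grounding, prone to kitsch stylisation:
an old woman on a station platform, sci-fi, epic, detailed

# Littered with technical photographic terminology:
an old woman on a station platform, documentary photograph,
85mm lens, f/2.8, Kodak Portra 400, overcast natural light,
shallow depth of field --style raw
```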

Our ethos when working with AI can often (but not exclusively) be to lean into glitches and breaks; to try and create moments where AI forgets to mask or mimic, where we can steer, prompt and jailbreak it to a place in which it stops regurgitating the stylistic trappings it’s been programmed towards and begins to output material that feels like it is backpropagating its own hallucinations; like it is behaving more like itself than it is supposed to.

AI was further utilised to bring Report to life sonically: text-to-audio tools and raw-audio neural networks helped us conjure everything from the foley of a busy station platform and the sounds of a tape machine playing, to the synths, abstracted vocals and polyrhythmic drum patterns of the soundtrack. We then used ElevenLabs' speech synthesis tools to narrate our story and bring our characters to life. Report 5923 is an amalgam of neural networks arranged by ourselves, and one we hope you enjoy watching as much as we enjoyed making!
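For anyone wanting to try the narration step themselves, a minimal sketch using the ElevenLabs Python SDK might look like the following; the voice ID, line of script and file name are placeholders rather than the ones used in the film.

```python
# A minimal sketch of turning a line of script into narration with the
# ElevenLabs Python SDK. Voice ID, text and file name are placeholders.
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")

# Convert one line of the script to speech with a chosen voice.
audio = client.text_to_speech.convert(
    voice_id="YOUR_VOICE_ID",
    model_id="eleven_multilingual_v2",
    text="Report 5923, day one. The platform is never silent.",
)

# The SDK yields the audio as chunks of bytes; write them out as an MP3.
with open("narration.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
```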
