Synthetic media (also known as AI-generated media, media produced by generative AI, or personalized media, and colloquially as deepfakes) is a catch-all term for the artificial production, manipulation, and modification of data and media by automated means, especially through the use of artificial intelligence. It typically refers to the automated production of content or cultural works (e.g., text, images, sound, or video) within a set of human-prompted parameters. Synthetic media as a field has grown rapidly since the creation of generative adversarial networks, primarily through the rise of deepfakes as well as music synthesis, text generation, human image synthesis, and speech synthesis. Though experts use the term "synthetic media," the media often refer to individual methods by their own terminology rather than as synthetic media, frequently using "deepfakes" as a shorthand (e.g., "deepfakes for text" for natural-language generation, or "deepfakes for voices" for neural voice cloning). Significant attention turned to the field in 2017 when Motherboard reported on the emergence of AI-altered pornographic videos into which the faces of famous actresses had been inserted. Potential hazards of synthetic media include the spread of misinformation, further loss of trust in institutions such as media and government, the mass automation of creative and journalistic jobs, and a retreat into AI-generated fantasy worlds. Synthetic media is an applied form of artificial imagination.
Despite the technical capabilities of these early machines, however, none were capable of generating original content; they were entirely dependent upon their mechanical designs.
In 1960, the Soviet researcher R. Kh. Zaripov published the world's first paper on algorithmic music composition, using the "Ural-1" computer.
In 1965, inventor Ray Kurzweil premiered a piano piece created by a computer that was capable of pattern recognition in various compositions. The computer was able to analyze and use these patterns to create novel melodies. It debuted on Steve Allen's I've Got a Secret program, and stumped the hosts until film star Harry Morgan guessed Kurzweil's secret.
Artificial neural networks were used to model certain aspects of creativity as early as 1989. Peter Todd (1989) first trained a neural network to reproduce musical melodies from a training set of musical pieces, then used a change algorithm to modify the network's input parameters. The network was then able to randomly generate new music, though in a highly uncontrolled manner (Todd, P. M., and Loy, D. G. (Eds.) (1991). Music and connectionism. Cambridge, MA: MIT Press).
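Todd's approach of training a network on existing melodies and then perturbing it to produce new ones can be illustrated with a minimal sketch. The toy melody, the one-layer next-note predictor, and the weight-jitter step below are illustrative assumptions, not Todd's actual setup:

```python
import numpy as np

# Toy melody as integer pitch classes (illustrative assumption)
melody = [0, 2, 4, 5, 4, 2, 0, 2, 4, 4, 5, 7]
vocab = 12  # pitch classes

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (vocab, vocab))  # next-note logits per current note

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Train a one-layer next-note predictor by gradient descent
for _ in range(500):
    for cur, nxt in zip(melody, melody[1:]):
        p = softmax(W[cur])
        grad = p.copy()
        grad[nxt] -= 1.0          # cross-entropy gradient
        W[cur] -= 0.1 * grad

# "Perturb and sample": jitter the learned weights, then generate
W_new = W + rng.normal(0, 0.5, W.shape)
note, generated = melody[0], []
for _ in range(8):
    note = int(rng.choice(vocab, p=softmax(W_new[note])))
    generated.append(note)
print(generated)  # a novel, loosely controlled variation on the training melody
```

The perturbation is what makes the output "highly uncontrolled": the new melody inherits the statistics of the training piece only approximately.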
In 2014, Ian Goodfellow and his colleagues developed a new class of machine learning systems: generative adversarial networks (GANs). Two neural networks contest with each other in a game (in the sense of game theory, often but not always in the form of a zero-sum game). Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proven useful for semi-supervised learning, fully supervised learning, and reinforcement learning. In a 2016 seminar, Yann LeCun described GANs as "the coolest idea in machine learning in the last twenty years".
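The adversarial game can be sketched in a few lines. The following toy example is a hypothetical one-dimensional setup (not Goodfellow's original experiment): a linear generator G(z) = a·z + b tries to fool a logistic discriminator D(x) = sigmoid(w·x + c) into accepting its samples as draws from a real Gaussian distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
REAL_MEAN, REAL_STD = 3.0, 0.5          # target 1-D data distribution

# Generator G(z) = a*z + b ; Discriminator D(x) = sigmoid(w*x + c)
a, b, w, c = 1.0, 0.0, 0.1, 0.0
lr, batch = 0.05, 64

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(2000):
    # --- discriminator step: ascend log D(real) + log(1 - D(fake)) ---
    real = rng.normal(REAL_MEAN, REAL_STD, batch)
    fake = a * rng.normal(0, 1, batch) + b
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))
    # --- generator step: ascend log D(fake) (non-saturating loss) ---
    z = rng.normal(0, 1, batch)
    fake = a * z + b
    grad_fake = (1 - sigmoid(w * fake + c)) * w   # d log D / d fake
    a += lr * np.mean(grad_fake * z)
    b += lr * np.mean(grad_fake)

samples = a * rng.normal(0, 1, 1000) + b
print(round(float(samples.mean()), 2))  # drifts toward REAL_MEAN
```

At equilibrium the discriminator can no longer tell the two distributions apart, which is the sense in which the generator "learns the statistics of the training set."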
In 2017, Google unveiled transformers, a new type of neural network architecture specialized for language modeling that enabled rapid advancements in natural language processing. Transformers proved capable of high levels of generalization, allowing networks such as GPT-3 and Jukebox from OpenAI to synthesize text and music respectively at a level approaching humanlike ability. There have been some attempts to use GPT-3 and GPT-2 for screenplay writing, resulting in both dramatic (the Italian short film Frammenti di Anime Meccaniche, written by GPT-2) and comedic narratives (the short film Solicitors by YouTube creator Calamity AI, written by GPT-3).
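The core operation of the transformer architecture is scaled dot-product self-attention. A minimal sketch of a single causally masked attention head follows; the token count, embedding size, and weight matrices are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention with a causal mask."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # pairwise token affinities
    # causal mask: each token may attend only to itself and earlier tokens,
    # which is what lets the model be trained as a left-to-right language model
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -1e9
    return softmax(scores) @ V                # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                   # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

Because of the causal mask, the first token can attend only to itself, so its output is exactly its own value vector; later tokens mix information from everything before them.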
The term deepfakes originated around the end of 2017 from a Reddit user named "deepfakes". He, along with others in the Reddit community r/deepfakes, shared deepfakes they had created; many videos involved celebrities' faces swapped onto the bodies of actresses in pornographic videos, while non-pornographic content included many videos with actor Nicolas Cage's face swapped into various movies. In December 2017, Samantha Cole published an article about r/deepfakes in Vice Media that drew the first mainstream attention to deepfakes being shared in online communities. Six weeks later, Cole wrote in a follow-up article about the large increase in AI-assisted fake pornography. In February 2018, r/deepfakes was banned by Reddit for sharing involuntary pornography. Other websites have also banned the use of deepfakes for involuntary pornography, including the social media platform Twitter and the pornography site Pornhub. However, some websites have not yet banned deepfake content, including 4chan and 8chan.
Non-pornographic deepfake content continues to grow in popularity with videos from YouTube creators such as Ctrl Shift Face and Shamook. A mobile application, Impressions, was launched for iOS in March 2020. The app provides a platform for users to deepfake celebrity faces into videos in a matter of minutes.
Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output.
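Concatenative synthesis of this kind can be sketched as follows. The unit database here uses stand-in sine tones rather than real recordings, and the word-level units and crossfade length are illustrative assumptions; real systems select among many candidate units and smooth prosody as well:

```python
import numpy as np

SR = 16000  # sample rate, Hz

def fake_recording(freq, dur=0.3):
    """Stand-in for a recorded speech unit (a short sine tone)."""
    t = np.linspace(0, dur, int(SR * dur), endpoint=False)
    return 0.5 * np.sin(2 * np.pi * freq * t)

# Hypothetical word-level unit database: word -> stored waveform
unit_db = {"hello": fake_recording(220), "world": fake_recording(330)}

def synthesize(text, crossfade=0.01):
    """Concatenate stored units, crossfading at joins to reduce clicks."""
    n_fade = int(SR * crossfade)
    ramp = np.linspace(0, 1, n_fade)
    out = np.zeros(0)
    for word in text.lower().split():
        unit = unit_db[word].copy()
        if len(out) >= n_fade:
            unit[:n_fade] *= ramp              # fade in the new unit
            out[-n_fade:] *= ramp[::-1]        # fade out the previous unit
            out = np.concatenate(
                [out[:-n_fade], out[-n_fade:] + unit[:n_fade], unit[n_fade:]]
            )
        else:
            out = np.concatenate([out, unit])
    return out

audio = synthesize("hello world")
```

The trade-off the paragraph describes falls out of the unit size: a database of whole words or sentences (as here) sounds natural within its domain but cannot say anything outside it, while phone- or diphone-sized units can produce any utterance at some cost in clarity at the joins.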
Virtual assistants such as Siri and Alexa have the ability to turn text into audio and synthesize speech.
In 2016, Google DeepMind unveiled WaveNet, a deep generative model of raw audio waveforms that could learn which waveforms best resembled human speech as well as musical instrumentation. Some projects offer real-time generation of synthetic speech using deep learning, such as 15.ai, a web application text-to-speech tool developed by an MIT research scientist.
Deepfakes have been used to misrepresent well-known politicians in videos. In separate videos, the face of the Argentine President Mauricio Macri has been replaced by the face of Adolf Hitler, and Angela Merkel's face has been replaced with Donald Trump's.
In June 2019, a downloadable Windows and Linux application called DeepNude was released which used neural networks, specifically generative adversarial networks, to remove clothing from images of women. The app had both a paid and unpaid version, the paid version costing $50. On June 27 the creators removed the application and refunded consumers.
The US Congress held a Senate hearing discussing the widespread impacts of synthetic media, including deepfakes, describing them as having the "potential to be used to undermine national security, erode public trust in our democracy and other nefarious reasons."
In 2019, voice cloning technology was used to successfully impersonate a chief executive's voice and demand a fraudulent transfer of €220,000. The case raised concerns about the lack of encryption methods over telephones as well as the unconditional trust often given to voice and to media in general.
Starting in November 2019, multiple social media networks began banning synthetic media used for purposes of manipulation in the lead-up to the 2020 United States presidential election.
In 2024, Elon Musk shared a parody video without clarifying that it was satire, even as he raised concerns about AI in politics. The shared video featured an AI-generated voice of Kamala Harris saying things she never said. A few lines from the video's transcript include: "I, Kamala Harris, am your Democrat candidate for president because Joe Biden finally exposed his senility at the debate." The voice then calls Harris a "diversity hire" who does not know "the first thing about running the country".
These are some examples of synthetic media potentially affecting public reaction to celebrities, political parties, organizations, and businesses. The potential to harm their image and reputation is concerning. Synthetic media may also erode social trust in public and private institutions, making it harder to maintain confidence in their ability to verify or authenticate "true" over "fake" content. Citron (2019) lists the public officials who may be most affected: "elected officials, appointed officials, judges, juries, legislators, staffers, and agencies." Even private institutions will have to develop awareness of and policy responses to this new media form, particularly those with a wider impact on society. Citron (2019) further states, "religious institutions are an obvious target, as are politically engaged entities ranging from Planned Parenthood to the NRA." Indeed, researchers are concerned that synthetic media may deepen and extend the social hierarchies and class differences that gave rise to them in the first place. The major concern surrounding synthetic media is that it is not only a matter of proving that something is false, but also of proving that something is authentic. For example, a recent study found that two out of three cybersecurity professionals noticed deepfakes used as part of disinformation against businesses in 2022, apparently a 13% increase from the previous year.
Advanced text-generating internet bots could potentially be used to manipulate social media platforms through tactics such as astroturfing.
Deep reinforcement learning-based natural-language generators could potentially be used to create advanced chatbots that could imitate natural human speech.
One use case for natural-language generation is to generate or assist with writing novels and short stories, while other potential developments include stylistic editors that emulate professional writers.
Image synthesis tools may be able to streamline or even completely automate the creation of certain aspects of visual illustrations, such as animated cartoons, comic books, and political cartoons. Because the automation process takes away the need for teams of designers, artists, and others involved in the making of entertainment, costs could plunge to virtually nothing and allow for the creation of "bedroom multimedia franchises" where singular people can generate results indistinguishable from the highest budget productions for little more than the cost of running their computer. Character and scene creation tools will no longer be based on premade assets, thematic limitations, or personal skill but instead based on tweaking certain parameters and giving enough input.
A combination of speech synthesis and deepfakes has been used to automatically redub an actor's speech into multiple languages without the need for reshoots or language classes. It can also be used by companies for employee onboarding, eLearning, explainer and how-to videos.
An increase in cyberattacks has also been feared due to methods of phishing, catfishing, and social hacking being more easily automated by new technological methods.
Natural-language generation bots mixed with image synthesis networks may theoretically be used to clog search results, filling search engines with trillions of otherwise useless but legitimate-seeming blogs, websites, and marketing spam.
There has been speculation about deepfakes being used to create digital actors for future films. Digitally constructed or altered humans have already been used in films, and deepfakes could contribute new developments in the near future. Amateur deepfake technology has already been used to insert faces into existing films, such as the insertion of Harrison Ford's young face onto Han Solo's face in Solo: A Star Wars Story, and techniques similar to those used by deepfakes were used for the portrayal of Princess Leia in Rogue One.
GANs can be used to create photos of imaginary fashion models, with no need to hire a model, photographer, makeup artist, or pay for a studio and transportation. GANs can be used to create fashion advertising campaigns including more diverse groups of models, which may increase intent to buy among people resembling the models or family members. GANs can also be used to create portraits, landscapes and album covers. The ability for GANs to generate photorealistic human bodies presents a challenge to industries such as fashion modeling, which may be at heightened risk of being automated.
In 2019, Dadabots unveiled an AI-generated stream of death metal which remains ongoing with no pauses.
Musical artists and their respective brands may also conceivably be generated from scratch, including AI-generated music, videos, interviews, and promotional material. Conversely, existing music can be completely altered at will, such as changing lyrics, singers, instrumentation, and composition. In 2018, using a WaveNet-based process for musical timbre transfer, researchers were able to shift entire genres from one to another. Through the use of artificial intelligence, old bands and artists may be "revived" to release new material without pause, which may even include "live" concerts and promotional images.
Neural network-powered photo manipulation also has the potential to support problematic behavior by various state actors, not just totalitarian and autocratic regimes.
A sufficiently technically competent government or community may use synthetic media to engage in a rewrite of history using various synthetic technologies, fabricating history and personalities as well as changing ways of thinking, a form of potential epistemicide. Even in otherwise rational and democratic societies, certain social and political groups may use synthetic media to craft cultural, political, and scientific filter bubbles that greatly reduce or even altogether undermine the ability of the public to agree on basic objective facts. Conversely, the existence of synthetic media may be used to discredit factual news sources and scientific facts as "potentially fabricated."