CBN Friday Special丨“AI Stefanie Sun” takes social media by storm, unnervingly good but likely ille

2023-05-19 21:01:01 Source: 21st Century Business Herald

Singaporean singer Stefanie Sun would probably never have imagined that she would bask in the limelight again in this manner. The sudden revival of interest in Sun, who hasn’t released an album since 2017, comes not from the artist having another moment of genius.

Her voice is the centerpiece of a series of music videos that have recently generated a buzz on Bilibili.com, a Gen-Z-dominated Chinese video platform. But she hasn't sung these songs. They were generated entirely by artificial intelligence software.

Such videos are generated with so-vits-svc fork, an open-source program hosted on GitHub that enables anyone to train their own AI model to speak in any voice and language.

One of the uploaders has garnered millions of views on the few AI-generated videos he posted. Many viewers left comments saying they were astonished by the fidelity of Sun's AI-generated voice.

Sun's distinctive timbre and the way she articulates while singing give her voice a grainy quality and a unique breathiness. These qualities seem to suit the AI model well, which is probably why videos using Sun's voice have gained the most views.

In addition to generating Sun's voice, the software has also replicated the voices of other famous singers including pop diva Faye Wong and Singapore's JJ Lin.

Meanwhile, on the social media platform Weibo, the hashtag “I didn’t expect that the first jobless person after AI came out was Stefanie Sun” garnered a whopping 4.32 million views, and another hashtag entitled “Expert detailed explanation for why AI Sun is infringement” amassed a staggering 4.46 million views. Many netizens discussed the potential benefits and legal drawbacks of AI-generated music.

Fans in reminiscence

Sun is no stranger to Chinese audiences. Rising to fame in the early 2000s with her hit albums, she quickly became one of the most popular singers in China. Her unique voice, coupled with her charming personality, won the hearts of millions of fans across the Chinese-speaking world.

Despite her popularity, Sun has not held a live concert in China since her last appearance at the end of 2019. This has left many fans disappointed, especially since she has released no new album since 2017, only several singles.

During the pandemic, Sun hosted several online concerts. A one-hour online concert in 2022 orchestrated by Douyin, the Chinese version of TikTok, saw more than 240 million viewers. Though Sun last week appeared at a music festival in Central China's Changsha, fans are far from satisfied as she only sang a couple of songs.

Reminiscing about the golden age of Mandarin pop music, tech-savvy Chinese internet users took the liberty of mimicking Sun’s voice using singing voice conversion, a deep learning method that lets a user deliver one person’s singing in another person’s voice, and swapping it into a compilation of Mandarin pop classics.

A search for “AI孙燕姿” (“AI Stefanie Sun”) yields hundreds of videos on Bilibili uploaded within the last month. The most popular ones have amassed over one million views. WeChat Index, which tracks keywords across the super app’s social and content ecosystem, shows that the term’s trending score skyrocketed to 50,000 on May 5 from zero just two days before.

Among them are a cover of Fa Ru Xue, or "Hair Like Snow", in which Sun's voice was digitally inserted in place of the original singer, Jay Chou, and a cover of "Rainy Day" by the music group Nan Quan Mama. Both have drawn over 1 million views on the platform since April 14, making for a surreal experience for music fans.

Generative AI has found adoption in helping to fill people’s emotional void, whether it’s used for remembering deceased loved ones or, in deepfake Sun’s case, addressing the dearth of good Mandopop today. As one AI product manager tweeted: “It’s like Sun’s fans have suddenly entered the festival mode.”

AI singers seemed to spring up almost overnight. Where did they come from so suddenly?

In fact, deepfake singing captivated audiences in the West slightly before AI Stefanie Sun became popular. The first sign of trouble came in February, when DJ David Guetta announced that the sample of Eminem’s voice he’d played during a recent live set had been created with AI. In March, the electronic hip-hop duo AllttA shared the track “Savages,” in which a human rapper trades verses with an AI Jay-Z.

And then, most famously, in early April, an anonymous producer released an original song called “Heart on My Sleeve” featuring AI vocals modeled on those of Drake and the Weeknd. “Heart on My Sleeve” was streamed by tens of millions of people, some of whom noted that they liked it better than recent singles by the actual Drake and Weeknd.

TikTok and YouTube are now flooded with music by AI clones, including covers of “Get Lucky,” by AI Michael Jackson, “Party in the U.S.A.,” by AI Ariana Grande, “Song 2,” by AI Kurt Cobain, and “Kill Bill,” by AI Rihanna.

The software that makes this possible is called SoftVC VITS Singing Voice Conversion, or So-Vits-SVC. It’s free, open source, and can run locally on any computer with a decent GPU. When it launched in March, it was buggy and required coding ability to use, but it’s been getting easier as updated versions arrive almost daily. If you just want to create a simple cover song, there are now websites that automate most of the process.

Two months ago, AI voice-cloning technology barely existed. Now it’s forcing the music industry to consider such tricky questions as whether pop stars own the sounds produced by their own larynges and if we even need flesh-and-blood pop stars at all anymore.

Fake singing, which was already rampant, becomes even harder to detect; the barrier to entering the singing profession falls; and infringements involving a singer's voice become more frequent.

AI singers for a good cause?

Many viewers were shocked by the AI-generated songs, commenting that it is too difficult to distinguish their idols' voices from the AI versions, while some embraced a technology that gives people a different way to enjoy their favorite music.

Restoring the voices of deceased singers such as Teresa Teng, Leslie Cheung, and Michael Jackson lets audiences mourn them. For example, to commemorate the 22nd anniversary of Teresa Teng's death, the Japanese program "Kin SMA" used holographic projection technology to "resurrect" the late Chinese singer, providing comfort to her fans and loved ones.

Some musicians are open to "AI cloning." Canadian singer Grimes has openly welcomed people using AI to imitate her voice for creative purposes, saying creators are free to use her voice without penalty and that she will split royalties from any successful recordings with them.

Chinese Taiwanese crooner Sandee Chan revealed on social media last month that her new song "Teach Me How to Be Your Lover," released on March 14, was actually sung by "AI Sandee Chan".

The news shocked the music industry because no one had noticed it before. In fact, some fans even commented that Chan's voice seemed to have "regained her youth," and her singing level in the number was better than her recent performances.

A well-trained "AI singer" can deceive the vast majority of people. Chan later revealed her intention, saying that she had hoped the song could make people who care about creativity think.

If the AI era is inevitable, then maybe creators should care less about "whether we will be replaced" and more about "what else we can do," she pointed out.

Possible copyright infringement

As ChatGPT, an AI chatbot developed by OpenAI, takes the technology world by storm, AI-related products and services have grown rapidly. While they have made people's lives and work more efficient, they have also created new problems.

Legal professionals have expressed concerns about the possibility of rights infringements. AI-generated songs allegedly infringe upon the copyright of singers, lyricists and composers, even as some AI zealots argue that they play the songs for free just for fun.

It's hard for the producers of AI-generated songs to explain that they didn't make the songs for profit, because uploading the videos on such a big streaming platform can legally be deemed as business behavior.

Lawyers said using AI to simulate a singer's voice without permission, and then mass-sharing it, meets the definition of infringement according to the Chinese Copyright Law. The AI-generated songs may even be considered as trademark infringement, if a singer successfully registers a trademark through his or her unique or recognizable voice.

Moreover, music videos featuring AI-generated voices may also have infringed on the names, portraits and voice rights of those singers. Using celebrities' names and imitating their voices through AI could also create unfair competition, because the act could mislead audiences and confuse the public.

Douyin was the quickest to address the legal implications of the explosion of AI content. The ByteDance-owned company published a guideline on AI-generated content last week, which is largely based on China’s new synthetic technology regulation.

Content uploaders should mark AI-generated content with “distinguishing labels” and are responsible for the “consequences” of such content, the short video platform’s guideline reads. Any content that infringes on copyrights is prohibited and subject to “severe punishment” once detected by the platform.

The question is, then, whether songs made with tools that mimic singers’ voices without their consent violate the artists’ rights. Sun hasn’t publicly responded to the dozens of songs created using her AI voice.

Last November, the Cyberspace Administration of China (CAC), China’s Ministry of Industry and Information Technology, and the Ministry of Public Security jointly issued regulations on the management of deep synthesis in internet information services. These regulations make institutional arrangements to regulate the application of deep synthesis technology.

On April 11, the CAC released the draft Administrative Measures for Generative AI Services to solicit public opinion. This is a meaningful start toward promoting the healthy development and standardized application of generative AI technologies.

The 21-article draft clarified that the country supports innovation, promotion, use and international cooperation on AI, but underscored that actions will be taken if AI-generated products, including texts, images, voices and videos, are found to have infringed on people's images, reputations, privacy or business secrets.

After the draft regulations are finalized and implemented, they will help better define the rights and interests, as well as liabilities and obligations, of the AI platform companies and the original intellectual property rights owners of the songs, pictures, films, literature works and other intellectual products from which the AI companies can seek profits through “re-creation”.

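Overnight, “AI Stefanie Sun” went viral.

If you ask who the most prolific singer in Mandopop has been lately, “AI Stefanie Sun” is surely on the list. There are now more than a thousand videos related to “AI Stefanie Sun” on Bilibili, with covers spanning folk songs, nursery rhymes, anime theme songs, pop songs and more. The covers of “Hair Like Snow” and “Rainy Day” have each passed 1 million views, those of “Bandao Tiehe” (“Peninsula Iron Box”) and “Ai Zai Xiyuan Qian” (“Love Before B.C.”) have passed 600,000, and other covers have also drawn sizable audiences.

An uploader known by the pseudonym Ziyu has made only four “AI Stefanie Sun” audio tracks so far, yet they have been played more than 1.5 million times in total. “Rainy Day” alone has passed 1 million plays and been bookmarked by nearly 20,000 users. He said that a song can be synthesized simply by using the model to replace the original dry vocal, much like the bow-tie voice changer worn by the title character of Detective Conan, and that an experienced user can finish the whole process in under two hours. Typing keywords such as “AI singer” or “voice cloning” into a search engine readily turns up video or text tutorials.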

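“AI Stefanie Sun” works by using AI to extract the timbre of the singer Stefanie Sun and then cover other songs with it. This “timbre conversion” technique requires preparing a dataset in advance: many high-quality recordings of Sun's songs, plus video material such as her interviews and livestreams. The higher the quality and the larger the quantity, the better the result. After detail work such as removing breath sounds, followed by a series of complex training steps, users end up with their own “AI Stefanie Sun.”

The lower technical barrier has brought a wave of covers by “AI Cyndi Wang” and “AI Jay Chou” across major platforms. As one netizen put it: without the singers themselves ever opening their mouths, they have effortlessly “taken over” half of the Mandopop scene.

For now, however, no other AI cover videos match “AI Stefanie Sun” in views. Why has Sun's voice, of all voices, become so popular when imitated by AI?

One Bilibili creator, replying to viewers, said he had also tried timbre conversion with singers such as Jay Chou, JJ Lin and Faye Wong, but the results were not as good as with Sun. He believes the pop diva's voice has “a clear grainy texture and a versatile, distinctive timbre,” which makes it better suited to covering other songs.

Some say that the singer who was all the rage in 2003 was Stefanie Sun, and the voice that is all the rage in 2023 is “AI Stefanie Sun.” Sun's biggest rival may be her own voice from 20 years ago.

For fans who have long hoped to see Sun perform live but never had the chance, AI has come to the rescue. They use AI themselves to generate the songs they want to hear, including some that Sun herself has never sung. Of course, today's AI can only imitate timbre and cannot fully reproduce all of Sun's vocal techniques, but it can at least replicate her voice, giving fans ample room for derivative creation.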

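AI singers debut en masse

AI singers may seem to be everywhere now, but they are not a new phenomenon; they can be traced back to the birth of AI-driven virtual idols. In 2007, Hatsune Miku, the original virtual idol, was built on the VOCALOID2 voice synthesis engine using samples from the voice actress Fujita. Five years later, Luo Tianyi, China's first Chinese-language virtual idol, debuted with a voice initially synthesized on the VOCALOID3 engine, replicating Hatsune Miku's operating model.

Today's popular AI voice-swapping software, by contrast, delivers a “high-fidelity imitation of a personal concert.” This year in particular, with concert tickets expensive and selling out in seconds and supply failing to meet demand, fans and uploaders, helped along by deliberate promotion, have carried out a kind of “collusion” and carnival. What excites fans is not only curiosity about the new technology and the “crossover collisions” between superstar singers, but also a “compensatory” psychology toward nostalgia-tinted idols who have not released new songs in a long time.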

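Overseas, AI singers truly took off last month. “Heart on My Sleeve,” in which AI imitates the well-known singer Drake, was uploaded to major music platforms. It drew more than 15 million views on TikTok and more than 600,000 plays on the streaming platform Spotify.

The song was released by a creator called Ghostwriter997, with its style and voices modeled on Drake and The Weeknd. It soon caught the attention of Universal Music Group, the label behind both artists. James Murtagh-Hopkins, a vice president at the company, said in a statement: “Using our artists' voices for training in order to generate content violates both our agreements and copyright law.”

Following Universal Music's complaint, the AI-voiced “Heart on My Sleeve” was taken down one after another from Spotify, Apple Music, YouTube, Amazon Music and other major platforms. In addition, Universal Music Group asked Spotify and other streaming platforms to cut off AI companies' access to its music, to prevent developers from using copyrighted songs to train AI models.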

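As “AI Stefanie Sun” surged in popularity, “AI Cyndi Wang” and “AI Jay Chou” also appeared on Bilibili. The rise of AI covers is tied to voice-imitation applications becoming better and easier to use than before. Search online for keywords such as “AI cover” or “voice cloning” and tutorials abound, most of which mention So-VITS-SVC, an open-source AI project that uses a timbre conversion algorithm.

The principle is the same as training AI image generators on a painter's works to imitate that painter's style: “feeding” the AI more of a singer's publicly available audio yields more accurate training results, allowing it to reach a similarity of more than 80 percent in timbre imitation.

With this voice training model, producing an AI cover takes just three steps. First, use audio software to separate a song's accompaniment from its vocals, and split the vocal audio into short segments of 5 to 15 seconds. Second, use the processed dry vocal files to train a model of the target timbre with the programs in So-VITS-SVC. Finally, run inference with that model on the file to be converted to obtain the AI cover. Netizens who have followed the tutorials say that even a novice can, after a few hours of study, produce an AI cover that sounds 30 to 50 percent like the original singer.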
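The segmenting part of the first step can be sketched in Python. This is a minimal illustration under stated assumptions, not the tooling that So-VITS-SVC or the tutorials actually use: the function name plan_segments and its parameters are hypothetical, and it only computes cut points, assuming the vocal track has already been separated and its duration is known. Real tools may cut at pauses in the singing rather than at equal intervals.

```python
import math


def plan_segments(total_seconds, min_len=5.0, max_len=15.0):
    """Return (start, end) times, in seconds, of equal-length segments.

    Hypothetical helper: uses the fewest segments that keep each one
    at or below max_len, which also keeps each one at or above min_len.
    """
    if total_seconds < min_len:
        return []  # too short to yield even one usable segment
    count = math.ceil(total_seconds / max_len)
    length = total_seconds / count
    return [(i * length, (i + 1) * length) for i in range(count)]


# A 200-second vocal track is planned as 14 segments of about 14.3 seconds.
segments = plan_segments(200.0)
print(len(segments), segments[0])
```

Equal-length cuts keep the sketch short; the only property it is meant to show is that every planned segment stays inside the 5-to-15-second window the article describes.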

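Platforms have also opened the door wide for these uploaders. From April 28 to June 24, Bilibili's music section is running a “Virtual Voice Creation Program”: uploaders can submit under the topic “AI Virtual Voice Lab,” and the platform will give high-quality submissions traffic support and event rewards. The campaign has so far drawn 100 million views and more than 340,000 discussion posts.

But soon after “AI singers” took off on Bilibili, Rcell, the developer of Sovits, issued an urgent notice. It said that many models trained with svc tools (including Sovits, diff-svc and others) on unauthorized datasets of celebrities, well-known artists and public figures had recently appeared on Bilibili and other platforms, and that these works were drawing heavy traffic and stirring up sensitive topics. To avoid more serious legal problems, the Sovits repository has been deleted.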

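Blessing or curse?

At the music festival in Changsha, a fan asked Sun about “AI Stefanie Sun.” She simply smiled and said she had “heard about it.” Her management company also said: “We have not engaged a lawyer to handle the matter at present.”

Through model training and post-processing, AI is made to cover other singers' songs in Sun's voice. Is this a creative act by creators out to delight fans in the wake of a technological breakthrough, or a way of cashing in through technology that in fact infringes on the singer's “voice rights”? Opinions online are divided. How should this unstoppable wave of AI be faced?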

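One lawyer pointed out that, first, “AI Stefanie Sun” borrows the singer's name. Under the law, a citizen's right to their name is an important personality right and is strictly protected. Using “AI Stefanie Sun” to name an AI singer without Sun's permission clearly infringes on her right to her name and damages her goodwill. Second, if “AI Stefanie Sun” covers songs without permission, that may infringe on others' copyright. For example, its covers of “Rainy Day” and “Hair Like Snow” have passed 1 million views on Bilibili and brought considerable traffic benefits, which already raises suspicion of infringement. Beyond creators, if a platform recommends AI songs or places them in charts, the platform has a duty to label them and inform visitors, which helps avoid confusion.

Although some creators insist that they act purely out of interest and earn nothing commercially, their conduct still constitutes infringement.

Some legal experts said AI covers mainly reproduce singers' voices and involve imitating the original singer's voice, which typically implicates the rights performers hold in their voices. Moreover, whether conduct constitutes infringement is usually not judged by view counts or by whether money changes hands, so AI covers are still infringing even when they are not made for profit.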

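On copyright, the singer Grimes has publicly said she is “happy for people to use her voice.” That is not unconditional, however: she asks for a 50 percent share of royalties. With AI booming, there are bound to be more dazzling applications that pose hard questions for law, regulation and even ethics. How to balance regulation and innovation will clearly have to be worked out in practice.

Besides, technology itself is neither good nor evil; what matters is the user's intent. Some netizens note that AI can restore the voices of late singers such as Teresa Teng and Leslie Cheung, letting listeners remember them and granting idols a kind of “digital immortality.” The boom in AI singers means, to some extent, that singers' timbres have gained eternal life. Even if it is only an imitation of timbre, that is exciting enough.

But vigilance is also needed against abuse of AI for crimes such as fraud using forged voices. “AI singers” may serve the public better, or may become accomplices to criminals; in the end it comes down to the choices users make.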

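If AI singers are for now largely a matter of fans amusing themselves, Sandee Chan has actively embraced the “singer AI.” In March this year, Chan announced on social media that her new song “Teach Me How to Be Your Lover,” released on March 14, was sung by her own AI model, with cover art also generated by AI. She spent a year “tuning” the model, and the pronunciation, breathing and harmonies were all done entirely by AI. In the comments, one listener marveled: “No wonder I thought the Princess's vocals had regained their youth. It turns out AI sang it.”

In her view, if the AI era is bound to arrive, what creators should care about is perhaps not “whether we will be replaced” but “what else we can do.” Her experiment showed that the producer is at the core of music creation.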

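Whose slice of the pie is at stake?

For the music companies standing behind the singers, the picture is more complicated: AI singers cut directly into their “slice of the pie.”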

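Abroad, opponents of AI have already begun taking steps to defend their interests. Last year, the Recording Industry Association of America (RIAA) submitted a list of AI developers to the U.S. government, claiming that their scraping of copyrighted works to train models was “unauthorized and infringes our members' rights.”

Michael Nash, executive vice president at Universal Music, went further in an op-ed, writing that AI music is diluting the market, making original works harder to find, and infringing on artists' legitimate right to be paid for their work.

On March 16, more than 30 organizations including the RIAA jointly launched the “Human Artistry Campaign” to ensure that AI does not replace or “erode” human culture and art.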

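Regulation of AI has in fact been advancing as well. On April 11, the Cyberspace Administration of China released the draft Administrative Measures for Generative AI Services for public comment, setting up “fences” around privacy, intellectual property, training data, unfair competition and more. The key points are that training data must be lawful and must not infringe intellectual property, and that providers of AI tools should bear the responsibilities of a producer of the generated content. This suggests that formal rules governing AI technology may be issued before long.

On May 9, Douyin published an initiative on AI-generated content. It prohibits using generative AI to create or publish infringing content, including but not limited to violations of portrait rights and intellectual property, with strict penalties once discovered. It also requires publishers to label AI-generated content conspicuously with a uniform official “watermark.” On purely AI-driven livestream hosts, the platform said: “When livestreaming with a registered virtual persona, real-time interaction must be driven by a real person; interaction driven entirely by AI is not allowed.” This means that virtual hosts driven by real people are permitted, while purely AI-driven hosts are banned.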

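After AI image generation and ChatGPT kept workers anxious, it is now singers' turn. But the harder the wave of technology comes rolling in, the more people need to stay clear-headed: the “human” is the core of a work. AI can simulate a voice, but it cannot simulate the meaning and spirit of music; it can only serve as a tool or medium.

In the 23 years since her debut, Sun has released 14 albums and a number of singles, an estimated 200 songs or so. It is precisely because singers like her release work slowly and sparingly, leaving fans waiting anxiously for new songs, that “AI singers” have found a market, offering fans an “AI substitute.”

However, at an average pace of more than 40 songs a day, more than 14,000 works could be produced in a year and 140,000 in 10 years. As quantity grows, works will go from “scarce” to “sufficient” or even “excessive,” and by then it is unclear whether people will still look forward to works by the singers themselves.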
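The arithmetic behind these figures can be checked in a few lines of Python. The sketch takes exactly 40 songs a day as the baseline, an assumption made here for illustration, since the article says only "more than 40."

```python
SONGS_PER_DAY = 40  # assumed baseline; the article says "more than 40"
DAYS_PER_YEAR = 365

per_year = SONGS_PER_DAY * DAYS_PER_YEAR
per_decade = per_year * 10

print(per_year)    # 14600
print(per_decade)  # 146000
```

Both results are consistent with the article's rounded figures of more than 14,000 a year and about 140,000 in a decade.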

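Human singers may not be replaced by AI right away, but letting AI singers grow unchecked will breed problems and latent crises that directly dampen creators' motivation. Rules and initiatives are what make healthy development of the AI industry possible. On the basis of respect for the ethics of science and technology, humanism and the rule of law, AI legislation still needs continual improvement to keep pace with the technological revolution, so that as the public gallops freely across the new frontier of AI, it does not forget to put on the reins of law, regulation and ethics.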

Executive Editor: Sonia YU

Editor: LI Yanxia

Host: Stephanie LI

Writer: Stephanie LI

Sound Editor: Stephanie LI

Graphic Designer: ZHENG Wenjing, LIAO Yuanni

Produced by 21st Century Business Herald Dept. of Overseas News.

Presented by SFC

Editorial Board: YU Xiaona

Planning and Editing: LI Yanxia

Host: LI Yingliang

Writer: LI Yingliang