avatars4all
Eyal Gruss
eyalgruss@gmail.com, eyalgruss.com
Tel-Aviv, Israel


Abstract
We present an environment [1] for running First Order Motion Model [2] on a live webcam feed, in the browser, over Google Colaboratory. This allows novice users to experience almost real-time live head puppeteering, or so-called "deep fake avatars", with no need for dedicated hardware, software installation or technical know-how. A rich GUI allows extensive control of model and media options, as well as some unique innovations, including fast auto-calibration and a Muppets generator [3]. This notebook, and other accompanying notebooks, serve in practice as educational, creative and activist tools.

                                        Keywords
deepfakes, avatars, Google Colab


1. Main

With the advance of the Coronavirus pandemic in the beginning of 2020, the majority of human social activity was forced online into the virtual realm. Only a few months earlier, First Order Motion Model (FOMM) [2] had been released, introducing one-shot video-driven image animation. It was soon followed by [4], a real-time environment for FOMM offering "Avatars for Zoom, Skype and other video-conferencing apps". Is the time ripe to claim the once promised cybernetic utopia? Could we at last shed our physical shells and be whoever we want to be in Zoom-space?

    People come to the Oasis for all the things they can do, but they stay
    because of all the things they can be: tall, beautiful, scary, a different
    sex, a different species, live action, cartoon, it's all your call.
    [Ready Player One film, 2018]

The repository [1] contains a few Colab notebooks that attempt to make the technology accessible to all. Requiring only a browser and a Google account, these notebooks can be operated with one click ("run all"). However, they are also flexible tools, allowing users to use and manipulate their own selected media. The live webcam environment is based on WebSocket, similar to [5]. To the author's best knowledge, it is the fastest purely online solution for live FOMM avatars, as well as one of very few real-time webcam Colab implementations.
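As an illustration of this setup, the following Python sketch (an assumption for exposition, not the repository's actual code) shows a WebSocket server that receives base64-encoded JPEG webcam frames from the browser, runs an animation callback on each frame, and streams the rendered result back. In Colab the chosen port must additionally be exposed to the browser (e.g. via the Colab proxy), which is omitted here.

    # Hedged sketch of a browser-to-Colab webcam loop over WebSocket.
    import asyncio
    import base64

    import cv2
    import numpy as np
    import websockets


    def animate(frame):
        # Placeholder for the FOMM forward pass: driver frame in, animated avatar frame out.
        return frame


    async def handler(websocket, path=None):
        # Each incoming message is a data-URL carrying a base64-encoded JPEG webcam frame.
        async for message in websocket:
            jpg = base64.b64decode(message.split(",")[-1])
            frame = cv2.imdecode(np.frombuffer(jpg, np.uint8), cv2.IMREAD_COLOR)
            rendered = animate(frame)
            ok, buf = cv2.imencode(".jpg", rendered)
            if ok:
                await websocket.send("data:image/jpeg;base64," + base64.b64encode(buf).decode())


    async def main():
        async with websockets.serve(handler, "0.0.0.0", 8765):
            await asyncio.Future()  # serve until interrupted


    # As a script: asyncio.run(main()); in a notebook cell, use "await main()" instead.
    asyncio.run(main())

The browser side would capture webcam frames to a canvas and send them over the same socket with a few lines of JavaScript, which keeps all heavy computation on the Colab GPU.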
The GUI in figure 1 shows a multitude of controls for zooming, calibration, switching between avatars, generating new avatars, and various model and display parameters. A novel fast auto-calibration mode, which works in real time, finds the best alignment between driver and avatar based on model keypoints (rather than facial landmarks).
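A minimal sketch of this idea follows, assuming access to the FOMM keypoint detector; the names (kp_detector, AutoCalibrator) and the mean-distance criterion are illustrative assumptions rather than the repository's exact implementation. The driver frame whose keypoints come closest to the avatar's is kept as the calibration reference for relative motion transfer.

    # Hedged sketch of keypoint-based auto-calibration.
    import torch


    def keypoint_distance(kp_a, kp_b):
        # Mean L2 distance between the 2D keypoint sets returned by the FOMM
        # keypoint detector ('value' has shape [1, num_kp, 2] in [-1, 1] coordinates).
        return torch.mean(torch.norm(kp_a["value"] - kp_b["value"], dim=-1)).item()


    class AutoCalibrator:
        """Tracks the driver frame whose keypoints best match the avatar's."""

        def __init__(self, kp_detector, avatar_tensor):
            self.kp_detector = kp_detector
            with torch.no_grad():
                self.kp_avatar = kp_detector(avatar_tensor)
            self.best = float("inf")
            self.kp_initial = None  # calibration reference for relative motion

        def update(self, driver_tensor):
            with torch.no_grad():
                kp_driver = self.kp_detector(driver_tensor)
            d = keypoint_distance(kp_driver, self.kp_avatar)
            if d < self.best:  # better aligned than anything seen so far
                self.best = d
                self.kp_initial = kp_driver
            return self.kp_initial

Because the criterion is computed on the model's own keypoints, no separate facial-landmark detector is needed and the check adds little overhead per frame.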
Following Avatarify [4], which inspired this project, the user can generate new avatars using the StyleGAN-based "This Person Does Not Exist" website [6]. Taking the idea further, one can also generate avatars specifically of men, women, boys and girls [7], Waifus [8], Fursonas [9] and Muppets [3], the latter developed especially for this project by Doron Adler in collaboration with the author. One can also drag and drop local or web images onto the GUI to upload new avatars, as inspired by [10].
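For illustration, a hedged sketch of pulling a freshly generated face to use as a new avatar is shown below; the site's URL and response behavior may change over time, and resizing to 256x256 merely matches the resolution of the pretrained FOMM checkpoints.

    # Hedged sketch: fetch a generated face and prepare it as an avatar source image.
    from io import BytesIO

    import requests
    from PIL import Image


    def fetch_generated_avatar(url="https://thispersondoesnotexist.com", size=256):
        resp = requests.get(url, headers={"User-Agent": "avatars4all-demo"}, timeout=10)
        resp.raise_for_status()
        img = Image.open(BytesIO(resp.content)).convert("RGB")
        return img.resize((size, size))  # FOMM pretrained models expect a fixed-size source


    avatar = fetch_generated_avatar()
    avatar.save("avatar.png")

The sibling sites [7, 8, 9, 3] can be queried in the same way, which is what makes one-click avatar generation from the GUI practical.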
Other innovations include: an exaggeration factor slider that amplifies keypoint motions; an option to take your own snapshot and puppeteer it, reminiscent of Nvidia Maxine [11], which may help in understanding the mechanism; an optional post-processing step for the offline video pipeline, applying Wav2Lip [12] after FOMM to fix the lip sync; and a combination of Wav2Lip with speaker diarization for automatic animated skit creation from audio ("Wav2Skit").
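The exaggeration idea can be sketched as scaling the driver's keypoint displacement before it is applied to the avatar's keypoints, in the spirit of the relative motion transfer of the FOMM demo; the function below is an illustrative assumption (ignoring jacobians and adaptive movement scale), not the repository's exact code.

    def exaggerate(kp_source, kp_driving, kp_driving_initial, factor=1.5):
        # Relative motion transfer, with the driver's displacement from the
        # calibration frame amplified by an exaggeration factor before being
        # added to the avatar's (source) keypoints.
        kp_new = {k: v for k, v in kp_source.items()}
        displacement = kp_driving["value"] - kp_driving_initial["value"]
        kp_new["value"] = kp_source["value"] + factor * displacement
        return kp_new

A factor of 1.0 reproduces ordinary relative motion, while larger values produce more theatrical, caricature-like movement of the avatar.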
These tools were the basis for several workshops and tutorials at international festivals and conferences in 2020, including Suoja/Shelter, South Africa NAF, ADAF, Reclaim Futures, Fubar, ISEA, Technarte, EVA London, Piksel, Stuttgarter Filmwinter, Dorot-Con and MozFest [1]. They are now being introduced in elementary and middle schools in Israel through the Pisga-Cyber excellence program [13], a pleasantly surprising first real-world usage of the described system.
Broader impact and ethical implications

This is a dangerous time. The ability to synthesize and manipulate media is improving by the day: in the quality of the outcomes, in the mediums, modalities and conditions dealt with, in the required compute and data resources, and in the availability and accessibility of the technology. We are in the midst of a transition period, where these facilities are still accessible mostly to the tech savvy and to those with the means to hire them. It may not be long before we have ubiquitous and seamless smartphone apps that can create perfect deep fakes. However, it is the author's opinion that precisely in this interim it is imperative to liberate and democratize the technology.

The advancement of technology cannot be stopped. AI and synthetic media, like electricity, fire and other technologies, can be used for good and for bad. They can be used both to infringe one's privacy and to protect it. They can be used to bully and harass, or to promote self-expression and self-acceptance. Fake news is not a new problem. Blood libels have existed throughout the last millennia. Photomontage has been used to fake photographs since as early as 1857. Videos are harder to fake, but Hollywood, Disney and government agencies have been doing so for the last century. Contemporary examples show that it is enough to change the label on an image, or to slightly edit an audiovisual recording, to achieve a strong effect. The solution is education: making the technology accessible to educators, artists and journalists, as well as to the general public, will serve to raise awareness, healthy skepticism and critical thinking toward media and the spectrum of contemporary possibilities in media creation and manipulation.
Figure 1: GUI for live webcam avatar in Colab. The author (left) is puppeteering a generated Muppet.
References
 [1] E. Gruss, avatars4all, 2020. URL: https://github.com/eyaler/avatars4all.
 [2] A. Siarohin, S. Lathuilière, S. Tulyakov, E. Ricci, N. Sebe, First order motion model for image animation, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 7137–7147. URL: https://aliaksandrsiarohin.github.io/first-order-model-website.
 [3] D. Adler, E. Gruss, This muppet does not exist, 2020. URL: https://thismuppetdoesnotexist.com.
 [4] A. Aliev, K. Iskakov, Avatarify, 2020. URL: https://github.com/alievk/avatarify.
 [5] a2kiti, Webcam google colab, 2020. URL: https://github.com/a2kiti/webCamGoogleColab.
 [6] 2020. URL: https://thispersondoesnotexist.com.
 [7] 2020. URL: https://fakeface.rest.
 [8] 2020. URL: https://www.thiswaifudoesnotexist.net.
 [9] 2020. URL: https://thisfursonadoesnotexist.com.
[10] 2020. URL: https://terryky.github.io/tfjs_webgl_app/face_landmark.
[11] 2020. URL: https://developer.nvidia.com/MAXINE.
[12] K. R. Prajwal, R. Mukhopadhyay, V. Namboodiri, C. V. Jawahar, A lip sync expert is all you need for speech to lip generation in the wild, 2020. URL: http://bhaasha.iiit.ac.in/lipsync. arXiv:2008.10010.
[13] 2020. URL: https://pisgacyber.co.il.