I’ve been a fairly regular visitor to Twitch recently, mostly to catch my favorite chess streamers, Eric Rosen and Daniel Naroditsky. When I saw the viewer counts just from the list on the side, I was quite surprised. Apparently, the daily average number of people on Twitch is 15 million, which is staggering. This made me wonder what’s next for the researchers and engineers who build models to emulate human characteristics - WaveNet and the John Legend voice from Google and DeepMind, human-like chess from this paper, and so on. This is where I believe Twitch can come into play.
Current state-of-the-art methods already cover each of the subdomains required to emulate a human streaming on Twitch: there are RL agents that play games, GANs that generate humans and characters (VTubers), chatbots that generate text, and near-perfect text-to-speech engines. Frankly, a couple of engineers passionate about combining the current SOTA models could build a system that streams and tries to keep the audience entertained, at least for a few minutes. This could potentially become a new benchmark for human-level tasks, one where each model is expected to perform well on its existing benchmark and also to work well in a pipeline with the other models. Achieving this could be similar to how Tesla currently trains their networks (check out this video), but across different domains, although not all of these domains need to be combined at once.
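To make the "pipeline of models" idea a bit more concrete, here is a minimal sketch of how the stages might be chained. Everything in it is an assumption for illustration: `Frame`, `chatbot_stage`, `tts_stage`, `avatar_stage`, and `run_pipeline` are hypothetical names, and each stage is a trivial stub standing in for a real model (a chatbot, a text-to-speech engine, a GAN-driven avatar). It only shows the shape of the plumbing, not how any actual system works.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    """One unit of stream output, filled in stage by stage (hypothetical)."""
    chat_message: str      # incoming viewer message
    reply_text: str = ""   # what the chatbot stage decides to say
    audio: bytes = b""     # placeholder for synthesized speech
    avatar: str = "idle"   # placeholder for the rendered avatar state


Stage = Callable[[Frame], Frame]


def chatbot_stage(frame: Frame) -> Frame:
    # Stub: a real system would call a text-generation model here.
    frame.reply_text = f"Good question about '{frame.chat_message}'!"
    return frame


def tts_stage(frame: Frame) -> Frame:
    # Stub: a real system would synthesize a waveform; we just encode the text.
    frame.audio = frame.reply_text.encode("utf-8")
    return frame


def avatar_stage(frame: Frame) -> Frame:
    # Stub: a real system would render a generated face or character.
    frame.avatar = "talking" if frame.audio else "idle"
    return frame


def run_pipeline(stages: List[Stage], frame: Frame) -> Frame:
    """Pass the frame through each stage in order."""
    for stage in stages:
        frame = stage(frame)
    return frame


if __name__ == "__main__":
    stages = [chatbot_stage, tts_stage, avatar_stage]
    print(run_pipeline(stages, Frame(chat_message="e4 or d4?")))
```

Keeping every stage behind the same `Frame -> Frame` signature means a stub can be swapped for a real model one at a time, and each model can still be evaluated on its own benchmark as well as inside the pipeline.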
This is ambitious in a lot of ways, but an incremental approach toward it could possibly unlock newer models and techniques. One could start with a simple GAN plus a chatbot and text-to-speech, where the model is fine-tuned to talk about a random topic for the day, and then build up from there, bottom-up. This is certainly exciting to think about, and I am by no means an expert on any of it, so if you have thoughts on this topic, feel free to comment and we can discuss!