3 methodologies for automated video game highlight detection and capture

With the advent of livestreaming, gaming has transformed from a consumer product often dismissed as a toy into a legitimate medium for entertainment and competition.
Since Amazon acquired Twitch in 2014, the platform's average concurrent viewership has grown from 250,000 to more than 3 million, and YouTube Live and Facebook Gaming are following similar trajectories.

This explosion in viewership has created an ecosystem of support products. Today's professional streamers push technology further to improve production value and automate the repetitive parts of the video production process.

Making it in the online streaming industry is hard work. Full-time creators often put in eight to 12 hours per day, and 24-hour marathon streams are a common tactic for grabbing viewers' attention.

But those hours spent in front of the keyboard and camera are only half the streaming grind. A channel's growth is fueled by a consistent presence on YouTube and social media, which draws new viewers to the live streams where they can purchase monthly subscriptions, donate, and view ads.

Distilling the most compelling five to ten minutes of content from eight hours of raw video is a time-consuming task. The biggest streamers hire social media managers and video editors to handle it, but part-time and growing streamers rarely have the time or the budget to outsource the work, and reviewing every minute of footage yourself is impossible when streaming already competes with the rest of your life.

Computer vision analysis of the game UI

Automated tools can find these key moments within a broadcast, and several startups compete in this niche. What differentiates them is how they approach the problem, and many of those approaches map onto a classic computer science dichotomy: hardware versus software.

Athenascope was the first company to implement this idea at scale. Backed by $2.5 million in venture capital funding and a team of Silicon Valley Big Tech alumni, it developed a computer vision system that recognizes highlight clips within longer recordings.

The principle is similar to self-driving cars, except that instead of reading nearby traffic signs through cameras, the tool captures the gamer's screen and recognizes the interface indicators that communicate important in-game events: kills and deaths, goals and saves, wins and losses.
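To make the idea concrete, here is a minimal sketch of one way such indicator recognition could work, using OpenCV template matching to spot a known UI icon in a captured frame. The file names and the 0.8 match threshold are illustrative assumptions, not a description of Athenascope's actual implementation.

```python
# Sketch: flag a frame as a candidate highlight if a known UI icon
# (e.g., a kill marker) appears in it, via template matching.
import cv2


def frame_contains_icon(frame_path: str, icon_path: str,
                        threshold: float = 0.8) -> bool:
    """Return True if the UI icon appears anywhere in the frame."""
    frame = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)
    icon = cv2.imread(icon_path, cv2.IMREAD_GRAYSCALE)
    # Slide the icon template across the frame and score each position.
    scores = cv2.matchTemplate(frame, icon, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, _ = cv2.minMaxLoc(scores)
    return best_score >= threshold


if __name__ == "__main__":
    # Hypothetical inputs: one extracted video frame and one icon crop.
    if frame_contains_icon("frame_001.png", "kill_marker.png"):
        print("Possible highlight: kill indicator detected")
```

Run over frames sampled from a recording, a detector like this yields timestamps around which clips can be cut.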

These visual cues are the same ones that inform the player about what's happening in the game. Modern game UIs render this information in high contrast, unobscured and clear, and it usually sits at predictable, fixed locations on the screen at all times. Computer vision techniques like optical character recognition (OCR), which reads text from images, can take advantage of this predictability and clarity.
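For instance, because the UI placement is fixed, an OCR pass only has to look at one small crop of the screen. Below is a minimal sketch assuming OpenCV and pytesseract are installed; the scoreboard coordinates are a hypothetical region for a 1920x1080 frame, and a real tool would calibrate them per game and resolution.

```python
# Sketch: read high-contrast UI text (e.g., a score readout) from a
# fixed screen region using OCR.
import cv2
import pytesseract

# Hypothetical bounding box (x, y, width, height) of the score readout;
# fixed UI placement makes a hard-coded crop like this workable.
SCORE_REGION = (860, 20, 200, 60)


def read_score(frame) -> str:
    """Crop the score region from a BGR frame and OCR its text."""
    x, y, w, h = SCORE_REGION
    crop = frame[y:y + h, x:x + w]
    # High-contrast UI text OCRs well after simple Otsu thresholding.
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Treat the crop as a single line of text (--psm 7).
    return pytesseract.image_to_string(binary, config="--psm 7").strip()


if __name__ == "__main__":
    frame = cv2.imread("frame_001.png")
    print("Scoreboard text:", read_score(frame))
```

Tracking how a value like this changes between frames is one way kills, goals, or score swings could be surfaced as highlight candidates.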

The stakes here are lower than in self-driving cars, too: a false positive from this system produces nothing worse than a less-exciting-than-average video clip, not a car crash.