Facebook wants machines to see the world through our eyes

Facebook AI Research (FAIR), a division of Facebook, has partnered with 13 universities to create the largest-ever data set of first-person video, built specifically to train deep-learning image recognition models. The data set will help AIs to better control robots that interact directly with humans or to interpret images from smart glasses. Kristen Grauman, FAIR's project leader, said that machines will only be able to help us in daily life if they can see the world through our eyes.
This tech could be used to assist people with disabilities at home or to support them as they learn. Michael Ryoo, a computer scientist at Google Brain and Stony Brook University, New York, said that the video in this data set is closer to how people see the world.

However, the potential for misuse is clear and alarming. The research was funded by Facebook, a social media giant recently accused by the US Senate of putting profits above people's well-being, an accusation confirmed by MIT Technology Review's own investigations.

Facebook and other Big Tech companies have a business model that aims to extract as much information as possible from online behavior and then sell it to advertisers. The AI described in the project could expand that reach to people's offline behavior. It will reveal what objects you have around your home, your favorite activities, who you spend time with, where your gaze lingers longest, and much more.

Grauman says privacy is a key issue when you move from exploratory research to a product. This project could be a good example of that kind of work.


The largest previous set of first-person video contains 100 hours of footage of people in the kitchen. Ego4D contains 3,025 hours of video that 855 people recorded at 73 locations in nine countries, including the US, the UK, India, Japan, Singapore, Saudi Arabia, and Colombia.

Participants varied in age and background. Some were chosen for their visually interesting occupations, such as mechanics, bakers, carpenters, or landscapers.

Most previous data sets consisted of short, semi-scripted clips that lasted only a few seconds. For Ego4D, participants wore head-mounted cameras for up to 10 hours per day and recorded first-person video of their unscripted daily activities, including walking down a street, reading, shopping, playing board games, and playing with pets. The footage includes audio and data about the location of participants' gazes, as well as multiple perspectives. Ryoo says it is the first data set of its kind.