Some top 100,000 websites collect everything you type—before you hit submit

When you sign up for a newsletter, make a hotel reservation, or check out online, you probably take for granted that if you mistype your email address three times or change your mind, it doesn't matter. Nothing happens until you hit the button. Maybe not. A surprising number of websites are collecting some or all of your data as you type it into a digital form, according to new research.

Researchers from the University of Lausanne and the University of Leuven crawled and analyzed the top 100,000 websites. They found that 1,844 websites gathered an EU user's email address without consent, and 2,950 logged a US user's email in some form. Many of the sites seemingly don't intend to conduct the data logging, but they incorporate third-party marketing and analytics services that cause the behavior.

The researchers found 52 websites in which third parties, including the Russian tech giant Yandex, were incidentally collecting password data before submission. All 52 instances have since been resolved after the group disclosed their findings.

If there is a Submit button on a form, the reasonable expectation is that it will submit your data when you click it. We thought we'd find a few hundred websites where your email is collected before you submit, but this was far more than we expected.

The researchers, who will present their findings at the Usenix security conference in August, say they were inspired to investigate leaky forms by media reports. The behavior is similar to so-called keyloggers, which are typically malicious programs that log everything a target types. Users probably don't expect to have their information keylogged on a mainstream top-1,000 site. The researchers observed a few variations of the behavior. Some sites log data as it is typed.

In some cases, when you click the next field, they collect the previous one: you click the password field and they collect the email. In others, you click anywhere and they collect all the information immediately.

The researchers say the regional differences may be related to companies being more cautious about user tracking, and possibly integrating with fewer third parties, because of the EU's General Data Protection Regulation. But they emphasize that this is just one possibility, and the study didn't examine explanations for the disparity.

The researchers found that one explanation for some of the unexpected data collection may be the difficulty of telling a submit action apart from other user actions on certain websites. But from a privacy perspective, this isn't an adequate justification.

The group also made a discovery about two invisible marketing trackers, from Meta and TikTok, that sites embed on their pages to track users across the web and show them ads. Both claimed in their documentation that customers could turn on automatic advanced matching, which would prompt data collection when a user submitted a form. Instead, the researchers found that the tracking pixels were grabbing hashed email addresses, an obscured version of email addresses used to identify web users across platforms, before submission. For US users, 8,438 sites may have been leaking data to Meta, Facebook's parent company, and 7,379 sites may have been affected for EU users. For TikTok, the group found 154 sites for US users.
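To see why a hashed email address still works as a cross-site identifier, consider the minimal sketch below. It assumes SHA-256 over a trimmed, lowercased address, which is a common convention but is not specified in the research as described here; the function name `hashed_email` is hypothetical.

```python
import hashlib


def hashed_email(address: str) -> str:
    """Return the hex SHA-256 digest of a normalized email address."""
    # Assumption: normalization means trimming whitespace and lowercasing.
    normalized = address.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Two unrelated sites that see the same address derive the same value,
# so the hash can link one person's activity across both.
site_a = hashed_email("Alice@Example.com ")
site_b = hashed_email("alice@example.com")
print(site_a == site_b)  # True
```

The hash hides the address from a casual reader, but because the same input always produces the same output, it identifies the user just as reliably as the address itself.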

The researchers filed a bug report with Meta on March 25 and the company quickly assigned an engineer to the case, but the group has not heard an update since. The researchers notified TikTok on April 21 and have not heard back. WIRED asked Meta and TikTok to comment on the findings.

The privacy risks for users are that they can be tracked across different websites, across different sessions, and across mobile and desktop. An email address can't be cleared the way you clear your cookies. It is a very powerful identifier.

As tech companies look to phase out cookie-based tracking in a nod to privacy concerns, marketers and other analysts will rely more and more on static IDs like phone numbers and email addresses.

Because the findings indicate that deleting data in a form before submitting it may not be enough to protect you from all collection, the researchers created a Firefox extension called LeakInspector to detect rogue form collection. They hope their findings will raise awareness of the issue, not only among regular web users but also among website developers and administrators, who can check whether their own systems or any third parties they use are collecting data from forms without consent.
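For developers who want to audit their own pages, one general approach is to type a known test value into a form and then search outgoing request payloads for that value in plain and encoded forms. The sketch below illustrates the idea only. It is not LeakInspector's implementation; the function names and the list of encodings are assumptions chosen for illustration.

```python
import base64
import hashlib
from urllib.parse import quote


def encodings_of(value: str) -> dict[str, str]:
    """Build a few common encodings of a typed value (illustrative list)."""
    raw = value.encode("utf-8")
    return {
        "plain": value,
        "url-encoded": quote(value, safe=""),
        "base64": base64.b64encode(raw).decode("ascii"),
        "md5": hashlib.md5(raw).hexdigest(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def find_leaks(typed_value: str, request_payloads: list[str]) -> list[tuple[int, str]]:
    """Return (payload index, encoding label) for every match found."""
    needles = encodings_of(typed_value)
    hits = []
    for index, payload in enumerate(request_payloads):
        for label, needle in needles.items():
            if needle in payload:
                hits.append((index, label))
    return hits
```

In practice the payloads would come from requests captured before the form is submitted, for example with browser developer tools. A match in any of them suggests the typed value left the page early.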

In an already crowded online field, leaking forms are just one more type of data collection to be wary of.

This story originally appeared on wired.com.