Censored medical images found in the LAION-5B data set used to train AI. The black bars and distortion have been added.

A California-based artificial intelligence artist who goes by the name Lapine discovered that private medical photos taken by her doctor in 2013 are referenced in the LAION-5B image set, a collection of links to images on the web; subsets of the data are downloaded to train AI image synthesis models.

Lapine found the photos through Have I Been Trained, a site that lets artists check whether their work appears in the LAION-5B data set. Instead of doing a text search, she uploaded a photo of herself using the site's reverse image search feature and was surprised to find a set of two before-and-after medical photos of her face that had only been authorized for private use by her doctor.
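Have I Been Trained's internals are not described here, but reverse image search over a large image collection is commonly built by embedding every indexed image with a vision model and returning the nearest neighbors of the query image's embedding. The sketch below, using the open source CLIP model and hypothetical file paths, illustrates that general approach; it is not the site's actual implementation.

```python
# Illustrative only: embed indexed images and a query image with CLIP, then
# rank matches by cosine similarity. Not Have I Been Trained's implementation.
import torch
import clip                      # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def embed(path: str) -> torch.Tensor:
    image = preprocess(Image.open(path)).unsqueeze(0).to(device)
    with torch.no_grad():
        vec = model.encode_image(image)
    return vec / vec.norm(dim=-1, keepdim=True)   # normalize for cosine similarity

# Index: embeddings of images already in the collection (hypothetical paths).
index_paths = ["dataset/img_000.jpg", "dataset/img_001.jpg"]
index = torch.cat([embed(p) for p in index_paths])

# Query: the uploaded photo; higher cosine similarity means a closer match.
query = embed("my_photo.jpg")
scores = (query @ index.T).squeeze(0)
best = scores.argmax().item()
print(f"closest match: {index_paths[best]} (similarity {scores[best]:.3f})")
```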

🚩My face is in the #LAION dataset. In 2013 a doctor photographed my face as part of clinical documentation. He died in 2018 and somehow that image ended up somewhere online and then ended up in the dataset- the image that I signed a consent form for my doctor- not for a dataset. pic.twitter.com/TrvjdZtyjD

— Lapine (@LapineDeLaTerre) September 16, 2022

Lapine has a genetic condition that, she said in an interview, affects everything from her skin to her bones. After many rounds of mouth and jaw surgeries, she underwent a small set of procedures to restore her facial appearance, and the photos in question come from her last set of procedures with that doctor.

According to Lapine, the surgeon who possessed the photos died of cancer in 2018, and she suspects that the images left his practice's custody after that. She likened it to receiving stolen property: someone took the image from her doctor's files, and it ended up on the Internet.

Lapine prefers not to reveal her identity for medical privacy reasons. Ars has confirmed that there are medical images of her in the LAION data set. During our search for Lapine's photos, we also discovered thousands of similar patient medical record photos in the data set, each of which may have a questionable ethical or legal status.

Her name is not linked to the photos, but she is upset that private medical images have been baked into a product without any form of consent or recourse. It's bad enough to have a photo leak, she said, but now it's part of a product; that applies to anyone's photos, medical record or not, and the potential for abuse is high.


Who watches the watchers?

According to its website, LAION aims to make large-scale machine learning models, data sets, and related code available to the general public, and its data can be used in many different projects.

Some of the images in the LAION data set were used to train Stable Diffusion's ability to generate images from text descriptions. LAION doesn't host the images themselves; the data set consists of pointers to images hosted elsewhere on the web, and LAION says researchers must download the images from those various locations in order to use them in a project.
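Because the data set ships as metadata (image URLs paired with captions) rather than as images, anyone who wants to train on it must fetch each picture from its original host. The sketch below illustrates that step in Python; the column names and file paths are assumptions for illustration, not LAION's documented schema.

```python
# Minimal sketch: a LAION-style data set ships metadata (image URL plus caption),
# not the images themselves, so training pipelines download each image from its
# original host. Column names and paths here are assumptions for illustration.
import pandas as pd
import requests

def download_images(metadata_path: str, out_dir: str, limit: int = 10) -> None:
    df = pd.read_parquet(metadata_path)            # one metadata shard
    for i, row in df.head(limit).iterrows():
        url = row["URL"]                           # assumed column name; captions sit in a "TEXT" column
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
        except requests.RequestException:
            continue                               # dead link: the original host controls availability
        with open(f"{out_dir}/{i:08d}.jpg", "wb") as f:
            f.write(resp.content)

download_images("laion_shard_0000.parquet", "images")
```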

The LAION data set is replete with potentially sensitive images collected from the Internet, such as these, which are now being integrated into commercial machine learning products. Black bars have been added by Ars for privacy purposes.

Under these conditions, responsibility for an image's inclusion in the LAION set becomes a game of pass the buck. A friend of Lapine's asked how to remove her images from the set in the #safety-and-privacy channel of LAION's Discord server; the answer she received was that the best way to remove an image from the Internet is to ask the hosting website to stop hosting it, since LAION does not host any of the images itself.

In the United States, scraping publicly available data from the Internet appears to be legal, as the results of a court case affirm. So is the fault mostly the deceased doctor's? Or does it lie with the site that hosts Lapine's images on the web?

LAION did not respond when Ars contacted it for comment. European citizens can request that information be removed from its database to comply with the EU's GDPR laws, but only if a photo of a person is associated with a name in the image's metadata. Thanks to services such as PimEyes, however, it has become trivial to associate someone's face with a name.

Lapine understands that the chain of custody over her private images failed, but she still wants them removed from the LAION data set. She would like a way for anyone to request that their image be removed; the fact that the images were scraped from the web, she argues, does not mean they were public information.