On Wednesday, December 20, researchers at Stanford University revealed the presence of child pornography in a database of images used to train artificial intelligence.
According to their investigation, at least 1,679 illegal images of sexual abuse of minors were found in the database, called LAION-5B. It is used in particular to train artificial intelligence models such as the hugely popular image generator Stable Diffusion and Imagen, Google's image generator.
The database temporarily disabled
The images in LAION-5B come mainly from publicly accessible online archives. As for the child pornography images, the researchers said they flagged all of the illegal content they found.
LAION, the German non-profit organization behind the database, told Bloomberg that it had temporarily taken it offline to verify that its contents comply with the law, citing a "zero tolerance policy" towards illegal content.
A spokesperson for Stable Diffusion also pointed out that the artificial intelligence model is trained on a filtered version of the database and that the platform's terms of use prohibit prompts that could lead to illegal content, including keywords related to child pornography.
An uncertain impact
The researchers nevertheless expressed concern about the impact these images may already have had. "These models are great at learning concepts from a small number of images. And we know that some of these images appear dozens of times in the database," they told Bloomberg.
Concretely, this means that artificial intelligence models trained on these images could internalize sexualized representations of children, introducing serious biases into the images they generate.
According to the Stanford researchers, this is the first time a database such as LAION-5B has been examined so closely. Further work should show whether it contains other problematic content.
Source: BFM TV