Chatbots are flourishing as ChatGPT's success grows. Since going mainstream in late 2022, the artificial intelligence created by OpenAI has attracted at least 200 million users. That wave has clearly not gone unnoticed, and it is now being emulated.
Following this trend, a group of researchers from South Korea set out to rethink the chatbot by linking it to the dark web, that vast and hard-to-access part of the internet where illegal activities abound.
The model, named DarkBERT, is intended to help “cope with cyber threats on the dark web”, according to the report published on May 18, 2023 by the group of South Korean researchers, spotted by 01net and consulted by Tech&Co.
DarkBERT is based on existing language models. “We compared DarkBERT to other widely used language models, such as BERT (Google) and RoBERTa (Meta), which have been trained on data collected from the ‘surface’ Internet, to verify the effectiveness of DarkBERT in the textual domain of the dark web,” the researchers say in the report.
Discarded sensitive data
DarkBERT was trained on 5.3 gigabytes of data from the dark web. Some of that data, such as sensitive personal data, was deliberately left out.
The sensitive information mentioned by the researchers consists mainly of stolen personal data, which is resold at exorbitant prices on dedicated dark web forums: identity documents, as well as financial or medical information. Because it is difficult to access, the dark web is an ideal haven for illicit activities, ranging from the sale of weapons and drugs to the resale of this valuable data.
“As new forums emerge every day, massive human resources are required to manually identify each threat. Automating the detection of potential threats could significantly reduce the workload of cybersecurity experts,” the researchers say.
The model, which is still being tested, is promising, according to the group of researchers. “Our study demonstrates that DarkBERT exceeds current language models and could serve as a valuable resource for future research on the dark web,” they state at the end of the report.
Source: BFM TV
