Researchers at the Korea Advanced Institute of Science and Technology (KAIST) have created DarkBERT, an AI model for finding valuable scientific information on the dark web. The model includes special filters intended to shield users from unwanted content.
DarkBERT is based on the RoBERTa architecture, introduced in 2019. That architecture has seen something of a renaissance, with researchers finding that it delivers more performance than originally believed. To train the model, the researchers crawled the dark web through the Tor anonymity network, then filtered the raw data using deduplication, category balancing, and data preprocessing to build the training dataset. The result was DarkBERT, a model able to analyze dark web content and extract useful information from it.
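The filtering steps mentioned above can be sketched in a few lines. This is purely illustrative: the function, field names, and cap parameter below are hypothetical stand-ins, since the article does not publish the actual DarkBERT pipeline.

```python
from collections import defaultdict
import hashlib
import random

def build_corpus(pages, per_category_cap, seed=0):
    """Hypothetical sketch of the two filtering steps the article names:
    exact deduplication, then category balancing. Each page is assumed
    to be a dict with "text" and "category" keys."""
    # Deduplication: keep only the first page with a given content hash.
    seen, unique = set(), []
    for page in pages:
        digest = hashlib.sha256(page["text"].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)

    # Category balancing: cap each category at a fixed number of pages
    # so that no single category dominates the training data.
    by_category = defaultdict(list)
    for page in unique:
        by_category[page["category"]].append(page)

    rng = random.Random(seed)
    balanced = []
    for pages_in_cat in by_category.values():
        rng.shuffle(pages_in_cat)
        balanced.extend(pages_in_cat[:per_category_cap])
    return balanced
```

Real-world crawls would also need the "data preprocessing" step the researchers mention (text cleaning, language detection, removal of sensitive identifiers), which is omitted here.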
One of the key capabilities of large language models (LLMs) is language understanding. The dark web uses a very specific mix of languages for business communications, and DarkBERT was trained on that mix. The study found DarkBERT to be superior to other major language models on dark web text, which should allow security researchers and law enforcement agencies to delve deeper into the dark web.
As with other LLMs, this does not mean that work on DarkBERT is complete. According to the researchers, they intend to continue training and optimizing the model to improve its results.