In case you were worried that the current crop of generative AIs is too nice and friendly, scientists have got you covered – a new language model has been trained on the worst part of the internet, the Dark Web.

Given perhaps the funniest name yet, DarkBERT (yes, that's actually its name) is a generative AI trained entirely on the Dark Web so it can be compared with a vanilla counterpart. The team behind it – reporting their findings in a preprint paper that is yet to undergo peer review – wanted to understand whether using the Dark Web as a dataset would give an AI better context on the language used there, making it more valuable to people wishing to trawl the Dark Web for research and to law enforcement fighting cyber crime.

It also did an extensive trawl of a place most humans don't really want to go and indexed its various areas, so thanks for taking one for the team, DarkBERT.

The Dark Web is an area of the internet that Google and other search engines ignore, preventing the vast majority of people from going there. It is only accessible using specialized software called Tor (or similar), and as such has gained quite the reputation for what goes on there. Urban legends have talked of torture rooms, contract killers, and all manner of horrific crimes, but the truth is that most of it is just scams and other ways to steal your data without the safety of browser security, which we all take very much for granted. Still, the Dark Web is supposedly used by cybercrime networks to talk anonymously, making it an extremely important target for law enforcement.

A team from South Korea hooked up a language model to trawl through the Dark Web using Tor and process the raw data it found, creating a model that could make better sense of the terminology used there. Once done, they compared how it performed against existing models researchers had created previously, including RoBERTa and BERT.
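To give a feel for what pitting one BERT-style model against another looks like in practice, here is a minimal sketch – not the researchers' code – using the Hugging Face transformers library and its public bert-base-uncased and roberta-base checkpoints, the "vanilla" models mentioned above. It runs the standard masked-word prediction task these models are commonly evaluated on; the example sentence is purely illustrative.

```python
from transformers import pipeline

# Illustrative sketch only (not from the DarkBERT paper): compare how two
# general-purpose masked-language models fill in a blanked-out word.
for model_name, mask_token in [("bert-base-uncased", "[MASK]"),
                               ("roberta-base", "<mask>")]:
    fill = pipeline("fill-mask", model=model_name)
    # Each model uses its own mask token; the sentence is a made-up example.
    sentence = f"The vendor sells stolen {mask_token} on the marketplace."
    print(model_name)
    # Print each model's top three guesses with their confidence scores.
    for guess in fill(sentence)[:3]:
        print(f"  {guess['token_str']!r} (score {guess['score']:.3f})")
```

The idea behind DarkBERT is that a model pretrained on Dark Web text should make better guesses than general-purpose models like these when the vocabulary is specific to that corner of the internet.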

The findings presented in the preprint showed that DarkBERT outperformed the others across all datasets, though it was close. As all the AIs were built from a similar model, it is expected that they would have similar performance, but DarkBERT stood out on the Dark Web specifically.

So, what will DarkBERT be used for? Hopefully it won't be given the nuclear launch codes, but the team anticipates it will be a powerful tool for scanning the Dark Web for cybersecurity threats, as well as keeping tabs on forums to identify illicit activity.

Let's just hope this doesn't give OpenAI any ideas.

The preprint, which is a preliminary version of a study that has not yet been peer-reviewed, can be found on the arXiv.