A detailed description of the latest update to Gmail's spam filters has appeared on the Google developer blog. The company called it “one of the biggest security updates in recent years”.
The update integrates a new text vectorization system, the Resilient & Efficient Text Vectorizer (RETVec), into the mail service. According to the developers, it effectively detects spam, including emails stuffed with special characters, emoji, typos and other elements that humans could still read but that spam filters struggled to detect. Among other things, the new algorithm reportedly recognizes messages with homoglyphs, i.e. characters that look nearly identical but are in fact different characters (for example, the Latin "o" and the Cyrillic "о").
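To show why homoglyphs defeat simple filters, here is a minimal sketch of a naive keyword filter. The blocklist and the `naive_filter` function are hypothetical illustrations, not part of Gmail or RETVec. The spoofed string swaps the Latin "o" for the Cyrillic "о" (U+043E), so it looks the same on screen but no longer matches the blocklisted word.

```python
import unicodedata

# Hypothetical blocklist for a naive keyword filter (illustration only).
BLOCKLIST = {"viagra", "lottery"}

def naive_filter(text: str) -> bool:
    """Return True if any blocklisted word appears verbatim in the text."""
    return any(word in BLOCKLIST for word in text.lower().split())

genuine = "win the lottery"
# Same text, but the "o" in "lottery" is the Cyrillic small letter o (U+043E).
spoofed = "win the l\u043ettery"

print(naive_filter(genuine))       # True: exact match on "lottery"
print(naive_filter(spoofed))       # False: the homoglyph breaks the match
print(unicodedata.name("o"))       # LATIN SMALL LETTER O
print(unicodedata.name("\u043e"))  # CYRILLIC SMALL LETTER O
```

Exact string matching compares code points, not shapes, which is the gap a model trained on manipulated text is meant to close.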
According to Google, RETVec is trained to recognize text that has been manipulated in any way, including inserted or deleted characters, typos and homoglyphs. It relies on an advanced character encoder that can efficiently encode any character or word representable in UTF-8. As a result, the algorithm works out of the box in more than 100 languages.
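The point of a character-level encoder is that it needs no fixed vocabulary, so text in any script gets a representation. The sketch below illustrates that idea under stated assumptions: it encodes each character as the binary digits of its Unicode code point and pads every word to a fixed length. The constants `ENCODING_BITS` and `MAX_CHARS` and both functions are hypothetical choices for this example, not RETVec's actual code or parameters. The resilience to typos and homoglyphs described above comes from how the model is trained, which this snippet does not attempt.

```python
from typing import List

ENCODING_BITS = 24  # assumption: enough bits for any code point (max U+10FFFF needs 21)
MAX_CHARS = 16      # assumption: fixed word length; longer words are truncated

def encode_char(ch: str) -> List[int]:
    """Encode one character as the bits of its code point, most significant first."""
    cp = ord(ch)
    return [(cp >> bit) & 1 for bit in reversed(range(ENCODING_BITS))]

def encode_word(word: str) -> List[List[int]]:
    """Encode a word as MAX_CHARS rows of ENCODING_BITS bits, zero-padded."""
    rows = [encode_char(ch) for ch in word[:MAX_CHARS]]
    rows += [[0] * ENCODING_BITS for _ in range(MAX_CHARS - len(rows))]
    return rows

# Latin, homoglyph-spoofed and Chinese words all get the same fixed shape.
for word in ["spam", "sp\u0430m", "垃圾邮件"]:
    vec = encode_word(word)
    print(word, len(vec), len(vec[0]))  # each prints a 16 x 24 shape
```

Because the input is just code points, no word list or language-specific preprocessing is needed; a learned model on top of such an encoding is what would map near-identical inputs to similar vectors.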
RETVec apparently works in many ways like a human reader. The algorithm is built on the TensorFlow machine learning framework and relies on the visual “similarity” of words to determine their meaning, rather than on the exact characters that make them up. According to Google, replacing Gmail's previous text vectorizer with RETVec improved the spam detection rate by 38% over the baseline and reduced the false positive rate by 19.4%. At the same time, the model's Tensor Processing Unit (TPU) usage dropped by 83%, making this one of the largest updates to Gmail's security system in recent years.