Google has shared the details about an upgraded technique it’s using to improve spam detection on its free email service, Gmail. In the latest Google Security blog post, the tech giant notes that this is “one of the largest defense upgrades” that Gmail has received in recent years. The company claims that its latest model is capable of better text identification and has improved spam detection by 38%.
How spammers bypassed Google earlier
To identify harmful content such as phishing attacks, inappropriate comments and scams, systems like Gmail, YouTube and Google Play rely on text classification models, the company notes. Such content is harder for machine learning models to classify because spammers use adversarial text manipulations to evade these classifiers. For example, attackers have used homoglyphs, invisible characters and keyword stuffing to bypass Google’s defenses.
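To see why these manipulations work, consider the minimal Python sketch below. The blocklist, the `naive_filter` function and the sample messages are hypothetical and are not Google's classifier; they only illustrate how a homoglyph or an invisible character can defeat exact word matching.

```python
import unicodedata

# Hypothetical blocklist and filter, for illustration only.
BLOCKLIST = {"free", "prize"}


def naive_filter(text: str) -> bool:
    """Return True if any whitespace-separated word is on the blocklist."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return any(w in BLOCKLIST for w in words)


def strip_invisible(text: str) -> str:
    """Drop Unicode format characters (category Cf), such as zero-width spaces."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


plain = "Claim your free prize now"
# Cyrillic 'е' (U+0435) stands in for Latin 'e'.
homoglyph = "Claim your fr\u0435\u0435 priz\u0435 now"
# Zero-width spaces (U+200B) are hidden inside the trigger words.
invisible = "Claim your f\u200bree pr\u200bize now"

for label, message in [("plain", plain), ("homoglyph", homoglyph), ("invisible", invisible)]:
    print(label, naive_filter(message), naive_filter(strip_invisible(message)))
```

In this sketch, stripping format characters recovers the message with invisible characters, but the homoglyph version still slips through, which suggests why hand-written preprocessing rules alone struggle to keep up with such attacks.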
What is RETVec
To help make these text classifiers more robust and efficient, Google has developed a new multilingual text vectorizer called RETVec (Resilient & Efficient Text Vectorizer). It helps spam filter models classify text more accurately while significantly cutting computational cost. The company has also shared how it’s using RETVec to help protect Gmail inboxes.
How RETVec is improving Gmail’s spam filters
Over the past year, Google has extensively tested RETVec to evaluate its usefulness and has found it to be highly effective for security and anti-abuse applications. The company replaced Gmail’s previous text vectorizer with RETVec, which improved the service’s spam detection rate by 38% and reduced the false positive rate by 19.4%.

Moreover, using RETVec reduced the model’s TPU usage by 83%. It works on every language “and all UTF-8 characters” and doesn’t need any text preprocessing. This makes it ideal for on-device, web, and large-scale text classification deployments.
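As a rough illustration of how a vectorizer can handle any character without a vocabulary or preprocessing step, the Python sketch below turns each character's Unicode code point into a fixed-width bit vector. This is a simplified assumption about the general approach, not RETVec's actual implementation; the function names, the 24-bit width and the 16-character word length are chosen here only for the example.

```python
from typing import List


def encode_char(ch: str, bits: int = 24) -> List[int]:
    """Encode one character as the binary digits of its code point, lowest bit first."""
    code_point = ord(ch)
    return [(code_point >> i) & 1 for i in range(bits)]


def encode_word(word: str, max_chars: int = 16, bits: int = 24) -> List[List[int]]:
    """Encode a word as a fixed-size grid: one row per character, zero-padded."""
    rows = [encode_char(ch, bits) for ch in word[:max_chars]]
    padding = [[0] * bits for _ in range(max_chars - len(rows))]
    return rows + padding


# Latin, Cyrillic, CJK and emoji all go through the same code path.
for sample in ["free", "fr\u0435\u0435", "\u65e5\u672c\u8a9e", "\U0001F600"]:
    grid = encode_word(sample)
    print(repr(sample), len(grid), len(grid[0]))
```

Because every Unicode code point fits in the chosen bit width, no character is ever "out of vocabulary" in this scheme. In a real system, a trained model would sit on top of such an encoding to make the final representation resilient to typos and character substitutions.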
Google claims “models trained with RETVec exhibit faster inference speed due to its compact representation.” The company adds that these “smaller models reduce computational costs and decrease latency, which is critical for large-scale applications and on-device models.” Models trained with RETVec can be converted to TFLite for mobile and edge devices. The open-source model is available on GitHub.




