Gmail spam detection has received its ‘largest defense upgrades in recent years’

Blocking email spam is a constant, ever-evolving battle, and Gmail’s latest technique boosts its spam detection rate by 38% thanks to text classification that is more resilient to manipulation.

Spammers often use homoglyphs (characters that look similar to actual letters), invisible characters, keyword stuffing, and other “adversarial text manipulations” to bypass Gmail’s text classification models that identify phishing attacks, scams, and other harmful content.
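To make these manipulations concrete, here is a minimal Python sketch of how a homoglyph and an invisible character can slip past a naive keyword check. The tiny `HOMOGLYPHS` table and the function names are hypothetical and exist only for illustration. The hand-written normalization is the kind of preprocessing a classifier would otherwise need, and is not how Gmail or RETVec works.

```python
import unicodedata

# Hypothetical, tiny homoglyph table for illustration only.
# Real confusable-character tables are far larger.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic small letter a
    "\u0435": "e",  # Cyrillic small letter ie
    "\u043e": "o",  # Cyrillic small letter o
    "\u0440": "p",  # Cyrillic small letter er
}


def naive_match(text, keyword):
    """Plain substring check, as a simplistic filter might do."""
    return keyword in text.lower()


def normalize(text):
    """Drop invisible format characters and map known homoglyphs to ASCII."""
    # Unicode category "Cf" covers format characters such as the zero-width space.
    visible = (ch for ch in text if unicodedata.category(ch) != "Cf")
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in visible).lower()


def robust_match(text, keyword):
    """Substring check after normalization."""
    return keyword in normalize(text)


clean = "Claim your free prize now"
# "free" contains a Cyrillic "е"; "prize" hides a zero-width space (U+200B).
obfuscated = "Claim your fr\u0435e pr\u200bize now"

for keyword in ("free", "prize"):
    print(keyword, naive_match(obfuscated, keyword), robust_match(obfuscated, keyword))
```

Both strings look identical on screen, yet the naive check only fires on the clean one. A lookup table like this also has to be maintained by hand, and it misses any character it does not list.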

Google is countering with RETVec (Resilient & Efficient Text Vectorizer). Open sourced by Google Research, this approach “helps models achieve state-of-the-art classification performance and drastically reduces computational cost,” while supporting “every language and all UTF-8 characters without the need for text preprocessing.” This makes it ideal for on-device, web, and other large-scale use cases:

  • “Models trained with RETVec can be seamlessly converted to TFLite for mobile and edge devices, as a result of a native implementation in TensorFlow Text. For web application model deployment, we provide a TensorflowJS layer implementation that is available on Github and you can check out a demo web page running a RETVec-based model.”
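One way to see how a vectorizer could cover every UTF-8 character without preprocessing is to encode each character directly from its Unicode code point, so there is no vocabulary to fall out of. The Python sketch below illustrates that general idea only. The 24-bit width, the 16-character word length, and the function names are assumptions made for this example, not details from the article, and this is not RETVec's actual implementation.

```python
BITS = 24        # assumed width; every Unicode code point fits (max is 0x10FFFF)
MAX_CHARS = 16   # assumed fixed number of characters kept per word


def encode_char(ch, bits=BITS):
    """Return the code point of `ch` as a list of bits, least significant first."""
    code = ord(ch)
    return [(code >> i) & 1 for i in range(bits)]


def encode_word(word, max_chars=MAX_CHARS, bits=BITS):
    """Encode a word as a fixed-size grid of bits, truncating or zero-padding."""
    rows = [encode_char(ch, bits) for ch in word[:max_chars]]
    padding = [[0] * bits for _ in range(max_chars - len(rows))]
    return rows + padding


def differing_bits(a, b):
    """Count the positions at which two encoded words differ."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


plain = encode_word("spam")
spoofed = encode_word("sp\u0430m")  # Cyrillic "а" in place of Latin "a"

print(len(plain), len(plain[0]))       # grid shape: characters x bits
print(differing_bits(plain, spoofed))  # the two words are distinct inputs
```

Any character, in any script, yields a vector of the same size, so a downstream embedding model always receives a well-formed input. Learning that look-alike words should be treated the same is left to that trained model, which this sketch does not attempt.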

In Gmail, RETVec has improved the “spam detection rate over the baseline by 38%,” while reducing both the false positive rate (by 19.4%) and Tensor Processing Unit usage (by 83%).

Google explains where the gains come from: “RETVec achieves these improvements by sporting a very lightweight word embedding model (~200k parameters), allowing us to reduce the Transformer model’s size at equal or better performance, and having the ability to split the computation between the host and TPU in a network and memory efficient manner.”

Google says it has “battle-tested RETVec extensively” over the past year “and found it to be highly effective for security and anti-abuse applications.”

For developers, Google adds: “If you would like to use RETVec for your own use cases or research, we created a tutorial to help you get started.”

Author

Abner Li

Editor-in-chief. Interested in the minutiae of Google and Alphabet. Tips/talk: abner@9to5g.com