How Google deals with spam emails
On average, an employee spends up to 11 hours per week (out of 40 working hours) reading and handling email. The roughly 14.5 million spam emails sent and received every day do not make this any easier. Fortunately, Gmail's 1.5 billion monthly users (including employees of the 5 million businesses using G Suite, now Google Workspace) can rest easy thanks to Google's technology. Through applied technology and security features built into the product, Gmail has long been a leader in tackling spam. On average, fewer than 0.1% of the emails in a user's inbox are spam. So how does Google deal with spam emails?
1/ Early detection of phishing emails thanks to Machine Learning
As early as 2015, Google confirmed that Gmail could block 99.9% of spam, phishing emails, and malware before they reached users' inboxes. Besides blocking spam, Gmail must also make sure that legitimate, "clean" emails reach the mailbox; the share of emails mistakenly sent to the spam folder is below 0.05%. To achieve this, Google built an Artificial Neural Network: a system of connected computing units that partially simulates the activity of the neural system in the human brain.
In short, Machine Learning helps Google prevent spam by letting the system detect general patterns of activity in volumes of data too large for humans to review. Machine Learning also adapts quickly to the new tricks that spammers use, while still personalizing results for each user by drawing on that user's behavioral data. Personalization is essential in spam classification because each person's definition of spam is different: a message may be spam to one person but important and useful to another.
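To make the idea of learning patterns from labeled mail concrete, here is a minimal sketch of a single artificial "neuron" (logistic regression over words), written in plain Python. It is not Gmail's model, which the article does not describe in detail; the class name `TinySpamNeuron`, the training messages, and the learning rate are all assumptions invented for illustration.

```python
import math
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return text.lower().split()


class TinySpamNeuron:
    """A single logistic neuron over bag-of-words features (illustrative only)."""

    def __init__(self, learning_rate=0.5):
        self.weights = defaultdict(float)  # one weight per word seen in training
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, text):
        """Return the estimated probability (0..1) that the text is spam."""
        words = set(tokenize(text))
        z = self.bias + sum(self.weights.get(w, 0.0) for w in words)
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, examples, epochs=50):
        """Fit the weights with plain stochastic gradient descent."""
        for _ in range(epochs):
            for text, is_spam in examples:
                error = self.score(text) - (1.0 if is_spam else 0.0)
                for w in set(tokenize(text)):
                    self.weights[w] -= self.learning_rate * error
                self.bias -= self.learning_rate * error


# Hypothetical training data, invented for this example.
EXAMPLES = [
    ("win a free prize now", True),
    ("cheap pills free shipping", True),
    ("claim your free lottery prize", True),
    ("meeting agenda for monday", False),
    ("please review the attached report", False),
    ("lunch on monday with the team", False),
]

model = TinySpamNeuron()
model.train(EXAMPLES)
print(model.score("free prize inside") > 0.5)     # spam-like wording
print(model.score("monday meeting report") > 0.5)  # work-like wording
```

A production system would use far richer features and many layers of such units, but the principle is the same: the weights are learned from examples rather than written by hand, so retraining on fresh examples lets the filter follow new spammer tricks.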
2/ Warning about potentially harmful links thanks to Google Safe Browsing
To detect harmful links, the Machine Learning system works with Google Safe Browsing to select some incoming emails and delay them for additional phishing analysis. If you are worried about too many emails being delayed, Google has stated that this affects fewer than 0.05% of all emails and that the delay lasts no more than four minutes.
Google Safe Browsing is a service that provides browsers such as Chrome, Firefox, and Safari, as well as Internet Service Providers, with a list of URLs that host malicious content, malware, or phishing pages designed to steal information. By employing techniques such as reputation and similarity analysis of URLs, Gmail can issue new warnings for potentially dangerous links. This model can adapt faster than traditional methods and will improve over time.
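The two ideas mentioned above, reputation and similarity analysis, can be illustrated with a small sketch. It does not call the real Safe Browsing service; the host lists, the `check_link` function, and the 0.8 threshold are hypothetical values chosen for this example, and the string comparison from Python's standard `difflib` module stands in for whatever similarity measure Gmail actually uses.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical data, invented for this example.
KNOWN_BAD_HOSTS = {"malware.example.net"}
TRUSTED_HOSTS = {"paypal.com", "google.com"}


def host_of(url):
    """Extract the lowercase hostname from a URL ('' if there is none)."""
    return (urlparse(url).hostname or "").lower()


def check_link(url, similarity_threshold=0.8):
    """Return 'blocked', 'suspicious', or 'ok' for a link."""
    host = host_of(url)
    if host in KNOWN_BAD_HOSTS:
        return "blocked"  # reputation: the host is on the blocklist
    if host in TRUSTED_HOSTS:
        return "ok"
    for trusted in TRUSTED_HOSTS:
        # similarity: a near-miss of a trusted name hints at a lookalike domain
        if SequenceMatcher(None, host, trusted).ratio() >= similarity_threshold:
            return "suspicious"
    return "ok"


print(check_link("http://malware.example.net/download"))
print(check_link("https://paypa1.com/login"))
print(check_link("https://google.com/"))
```

Here `paypa1.com` differs from `paypal.com` by one character, so it is flagged even though it appears on no blocklist. This is the kind of case where similarity analysis can warn about a link that a pure list lookup would miss.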
3/ Use TensorFlow to detect potential threats in attachments
Simply put, TensorFlow is an open-source software library built and developed by Google. It provides strong support for the mathematical operations used in machine learning and deep learning.
To be clear, Google has applied Machine Learning for a long time and has used TensorFlow since May 2017. However, only once the company put this technology to use at large scale, with a broader spam classification, could Gmail detect millions of additional "leftover" spam messages. Thanks to this "new protection", built with TensorFlow, Google can block an additional 100 million spam messages every day.
So what is this leftover spam? Mostly image-based spam. In image spam, the message text is embedded in an image file attached to the email, a type of image that email clients display directly to the user. By placing text inside images, spammers can evade filtering tools that rely on analyzing text or on scanning images with Optical Character Recognition (OCR). Spammers can also obfuscate the text, turning letters into meaningless characters, to avoid being read by OCR or to fool signature detection algorithms, and so land directly in the user's mailbox. But you can now rest assured that the TensorFlow-based protection can detect and deal with sophisticated spam emails like this.
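To see why obfuscation defeats simple keyword or signature matching, consider this minimal sketch. It assumes the text has already been extracted from the image (the OCR step itself is not shown), and it uses a hand-written character map rather than a learned model, so it illustrates the problem and not Gmail's TensorFlow solution. The map, the keyword list, and the function names are hypothetical.

```python
import re

# Hypothetical lookalike map and keyword list, invented for this example.
LOOKALIKES = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)
SPAM_KEYWORDS = {"viagra", "lottery", "free"}


def normalize(text):
    """Undo simple character substitutions and strip symbols inside words."""
    text = text.lower().translate(LOOKALIKES)
    # Drop punctuation that spammers insert between letters, e.g. "f.r.e.e"
    return re.sub(r"[^a-z\s]", "", text)


def naive_hits(text):
    """Keyword matching on the raw text."""
    return {w for w in text.lower().split() if w in SPAM_KEYWORDS}


def normalized_hits(text):
    """Keyword matching after normalization."""
    return {w for w in normalize(text).split() if w in SPAM_KEYWORDS}


extracted = "Win the L0TTERY, get FR33 v1agra"
print(naive_hits(extracted))       # the raw match finds nothing
print(normalized_hits(extracted))  # the normalized match finds all three keywords
```

A fixed map like this is easy for spammers to work around, which is one reason a model that learns from examples is better suited to the task than hand-maintained rules.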
4/ Classifying spam emails based on the user's own behavior
Gmail has long relied on user behavior to tell whether an email is safe or spam, and it feeds this data back into retraining its artificial intelligence system.
You have probably marked emails as "Report spam" or "Not spam" yourself many times. These actions are far from meaningless: they inform the Machine Learning email filtering algorithms, which gradually get better at detecting unwanted emails based on your own habits. This also explains why the same promotional email or newsletter lands in the inbox for some people but not for others. Your actions also help refine Google's phishing detection model, as these algorithms learn from your behavior and update in real time.
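The effect of per-user feedback can be sketched with simple counting. The example below is an assumption-laden illustration, not Gmail's algorithm: the `FeedbackFilter` class, the smoothing prior, and the 0.6 threshold are all hypothetical. It shows how the same newsletter can end up in different folders for different users once each has reported it differently.

```python
from collections import defaultdict


class FeedbackFilter:
    """Keeps per-user spam / not-spam counts for each sender (illustrative only)."""

    def __init__(self, spam_prior=0.5, prior_weight=2.0):
        self.spam_prior = spam_prior      # belief before any feedback arrives
        self.prior_weight = prior_weight  # how many reports the prior is worth
        # counts[user][sender] = [spam_reports, not_spam_reports]
        self.counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def report(self, user, sender, is_spam):
        """Record one 'Report spam' (True) or 'Not spam' (False) click."""
        self.counts[user][sender][0 if is_spam else 1] += 1

    def spam_probability(self, user, sender):
        """Smoothed share of this user's reports that marked the sender as spam."""
        spam, ham = self.counts[user][sender]
        numerator = spam + self.spam_prior * self.prior_weight
        return numerator / (spam + ham + self.prior_weight)

    def folder(self, user, sender, threshold=0.6):
        """Decide where the next message from this sender goes for this user."""
        if self.spam_probability(user, sender) >= threshold:
            return "spam"
        return "inbox"


f = FeedbackFilter()
newsletter = "news@shop.example"
f.report("alice", newsletter, is_spam=True)
f.report("alice", newsletter, is_spam=True)
f.report("bob", newsletter, is_spam=False)
print(f.folder("alice", newsletter))  # alice reported it twice as spam
print(f.folder("bob", newsletter))    # bob marked it as not spam
```

Each click updates the counts immediately, so the next decision for that user already reflects it. A real system would combine such signals with many others instead of relying on sender counts alone.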
In today's "cat and mouse" game in cyberspace, spam is a problem that requires the good guys to stay one (or more) steps ahead of hackers. For now, we can feel optimistic, because Machine Learning and AI have taken spam filtering to the next level, at least if you are using Google's Gmail.
Source: Gimasys