You’ve just checked your corporate Gmail inbox, and there’s an email from someone you don’t recognize. You open it, and it seems…off. You take a screenshot (or better yet, download the email) and send it to your security team. Hopefully, they will be able to tell you whether or not the email is legit.
But how does the security team figure out if the email is phishing or not?
There is inherent risk involved with misclassifying an email reported as phishing. A malicious email that is identified as legitimate will almost certainly cause a security incident, whereas an urgent, legitimate email identified as malicious could also cause significant impact to a business, as in the case of a missed invoice. Therefore, the security team must classify emails correctly or risk heavy consequences.
To do this, a security team looks at many pieces of information in the email. Typically, one or two red flags are sufficient to classify the email as malicious, and everyone can move on with their lives. In some cases, however, it can take significantly more effort to correctly classify an email.
Phishing campaigns always have a goal of some kind, typically to convince the victim to:
- Disclose sensitive information
- Perform some harmful action (like transfer money)
- Download a malicious file
It is important to keep these objectives in mind when reviewing a phishing email. Generally speaking, if the email doesn’t push the recipient toward any of these actions, it is probably not dangerous. While this generalization may not hold true in advanced phishing attacks that exploit browser vulnerabilities or set up relationships for future phishing attacks, it does hold true for the vast majority of phishing emails.
The tricky part is that many legitimate emails do ask for money (e.g. invoices), sensitive information, and/or software installation. So how does one determine what is legitimate and what is malicious?
Before discussing anything else, it should be noted that emails claiming to be from internal sources can easily be verified by confirming with the sender via a different means of communication, such as an instant message through Slack or Teams, or a quick call to a phone number obtained from the company’s HR system. This is easy for the average employee to do, and should be a standard practice when receiving internal emails that solicit sensitive information.
Sometimes, an email can be confidently classified as phishing based only on what is visible in the interface provided by your email provider (probably Gmail or Outlook). These kinds of red flags are easily detectable by someone with no technical background, and employees can be trained to look for them. The vast majority of phishing emails I’ve run across are detectable with information in this category.
If the sender claims to be representing an organization, their email address domain (everything after the @) should match the domain of the organization. If it is a lookalike domain (looks similar but isn’t identical), then it’s almost certainly phishing. It’s also possible to see how long the domain has been registered for using tools like Whois – if the domain was registered last night and the sender claims to be from Twitter, it’s not legitimate.
For example, the domain of the email address support@google.com is “google.com”. A Whois lookup shows the domain was registered in 1997.
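The sender-domain comparison described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function names `sender_domain` and `domain_matches` are made up for this example, and the check only compares strings. It does not prove who actually sent the message, since the From header itself can be forged.

```python
from email.utils import parseaddr


def sender_domain(from_header: str) -> str:
    """Return the lowercased domain of the address in a From: header value."""
    _display_name, address = parseaddr(from_header)
    if "@" not in address:
        return ""
    # Everything after the last "@" is the domain.
    return address.rpartition("@")[2].lower()


def domain_matches(from_header: str, expected_domain: str) -> bool:
    """True if the sender's domain is the expected domain or a subdomain of it."""
    domain = sender_domain(from_header)
    expected = expected_domain.lower()
    return domain == expected or domain.endswith("." + expected)


print(domain_matches("Google Support <support@google.com>", "google.com"))  # True
print(domain_matches("Google Support <support@g00gle.com>", "google.com"))  # False
```

A lookalike such as `g00gle.com` fails the comparison because the match is exact rather than visual, which is the point of checking the domain character by character.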
For blatant phish, there are typically red flags present in the body of the email.
Grammar and spelling: Large organizations typically have extensive proofreading and review processes before using email templates (such as password resets and security alerts). Consistently terrible grammar and/or spelling in an email claiming to be from a large organization is typically indicative of a phishing attempt.
Gift cards: If the sender is asking the recipient to buy gift cards, it’s often a scam.
Free money: If the sender promises the recipient large amounts of money for free, it is probably malicious.
Weird attachments: Encrypted ZIP files are a common method of sending malicious attachments, because email scanners do not have a way to scan the contents for anything malicious. Other common examples include HTML files and encrypted PDFs.
One of the most common payloads to send in a phishing email is a link to a website. Links take the victim to malicious websites that entice them to input login credentials, download malware, or even occasionally exploit out-of-date browsers (please, please, please, keep your browser up-to-date). Sometimes it is easy to determine that the link is legitimate by just looking at the domain.
As a refresher, the domain of a link is the bit between the first `//` and the next `/`. I’ve bolded the domain in the following example:
The last two bits of the domain are the most important – in this case, nytimes.com. If this is a well-recognized domain or the domain you expect, then the link is probably safe. Be aware that attackers often try to make their domains look like legitimate ones by swapping letters or using subdomains. Examples of lookalike domains might include:
- nyt1mes.com (number one instead of letter i)
- nytimes.com.evil.com (domain is evil.com)
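To make the “last two bits” rule concrete, here is a minimal Python sketch that pulls the hostname out of a link and keeps its last two labels. The function names and the example paths are assumptions made for illustration. The two-label rule is also a simplification: it gives the wrong answer for suffixes like `.co.uk`, where a lookup against the Public Suffix List would be needed.

```python
from urllib.parse import urlsplit


def hostname(url: str) -> str:
    """Return the lowercased hostname (the part between '//' and the next '/')."""
    return (urlsplit(url).hostname or "").lower()


def last_two_labels(url: str) -> str:
    """Naive registered-domain guess: the last two dot-separated labels."""
    labels = hostname(url).split(".")
    return ".".join(labels[-2:])


for link in (
    "https://www.nytimes.com/section/world",
    "https://nyt1mes.com/section/world",
    "https://nytimes.com.evil.com/section/world",
):
    print(link, "->", last_two_labels(link))
# https://www.nytimes.com/section/world -> nytimes.com
# https://nyt1mes.com/section/world -> nyt1mes.com
# https://nytimes.com.evil.com/section/world -> evil.com
```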
If you want to see what the website looks like without clicking on the link, you can use a site like https://urlscan.io. This tool will show a screenshot of the website and information about the domain, including if it’s been flagged as malicious. However, this can be unreliable – some malicious sites are able to detect these kinds of scanners, and will redirect to a different site if detected. Additionally, it is not wise to scan links that could be sensitive (like links to invoices or password resets). In these cases, a virtual machine can be used to open and analyze the link in a safe environment (more on virtual machines later).
One additional note: there are occasions where an attacker can trick a legitimate website into automatically redirecting to a malicious website. This behavior is called “arbitrary redirect” (more commonly known as an “open redirect”), and some consider it to be a security vulnerability. This is because attackers can send a link to a legitimate site, and when the user clicks on it, the link automatically takes the user to the malicious site. I’ve personally seen this technique used in phishing attacks, although it is rare. Therefore, a link to a legitimate domain can still take you to an illegitimate website.
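One rough way to spot this pattern is to look for a second URL tucked into the link’s query string. The sketch below is a heuristic under stated assumptions, not a complete detector: the function name, the `trusted.example` and `evil.example` hosts, and the `next` parameter are all hypothetical, and real redirects can also hide in paths, fragments, or encoded blobs that this code does not inspect.

```python
from urllib.parse import parse_qsl, urlsplit


def embedded_redirect_targets(url: str) -> list[str]:
    """Return query-parameter values that are http(s) URLs on a different host."""
    parts = urlsplit(url)
    outer_host = (parts.hostname or "").lower()
    targets = []
    for _name, value in parse_qsl(parts.query):
        try:
            inner = urlsplit(value)
            inner_host = (inner.hostname or "").lower()
        except ValueError:
            # Not parseable as a URL, so it cannot be a redirect target.
            continue
        if inner.scheme in ("http", "https") and inner_host and inner_host != outer_host:
            targets.append(value)
    return targets


link = "https://trusted.example/redirect?next=https%3A%2F%2Fevil.example%2Flogin&id=5"
print(embedded_redirect_targets(link))  # ['https://evil.example/login']
```

An empty result does not mean the link is safe. It only means no obvious off-site URL was found in the query string.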
Phishing emails frequently rely on malicious attachments to cause harm. These attachments commonly take the form of (but are not limited to) encrypted zip or PDF files, Microsoft Office documents (.docx, .xlsx), and HTML files (basically a web page, but as a file). These attachments typically either infect a computer with malware or display a fake webpage intended to steal credentials. Some examples I’ve run across:
- Word documents with malicious macros that download and install malware
- An HTML (webpage) file that automatically downloads an encrypted zip file with an infected ISO (disk image)
- An HTML file that appears to be a Google login, but actually sends the username/password someplace else
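Before opening anything, it can help to list the attachments and note which ones fall into the file types mentioned above. The following Python sketch does this for a parsed message using the standard `email` package. The extension list is taken from the examples in this article and is an assumption, not a complete catalog. A matching extension only means “look closer”, and a non-matching one does not mean the file is safe. The demo message and file names are invented for illustration.

```python
from email.message import EmailMessage
from pathlib import PurePosixPath

# File types called out in this article; extend as needed.
EXTENSIONS_TO_REVIEW = {".zip", ".pdf", ".docx", ".xlsx", ".html", ".htm", ".iso"}


def attachments_to_review(msg: EmailMessage) -> list[str]:
    """Return attachment filenames whose extension is on the review list."""
    flagged = []
    for part in msg.iter_attachments():
        name = part.get_filename() or ""
        if PurePosixPath(name.lower()).suffix in EXTENSIONS_TO_REVIEW:
            flagged.append(name)
    return flagged


# Build a small demo message instead of reading a real reported email.
demo = EmailMessage()
demo["Subject"] = "Overdue invoice"
demo.set_content("Please see the attached files.")
demo.add_attachment(b"PK", maintype="application", subtype="zip", filename="invoice.zip")
demo.add_attachment("<html></html>", subtype="html", filename="Login.HTML")
demo.add_attachment("just some notes", subtype="plain", filename="notes.txt")

print(attachments_to_review(demo))  # ['invoice.zip', 'Login.HTML']
```

For a downloaded `.eml` file, the same function should work on a message parsed with `email.message_from_bytes(raw, policy=email.policy.default)`.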
On the surface, it can be difficult to determine if an attachment is malicious without opening or downloading it, which is obviously very dangerous. To stay safe but also analyze the file, security professionals typically use virtual machines to analyze potential malware. A virtual machine is a computer-within-a-computer. If the virtual machine gets infected or compromised, it can be reset with no impact to the host computer. As long as the virtual machine is configured correctly, there is very little risk when playing with malware within the virtual machine.
Within the virtual machine, an attachment can be examined to see what happens when it is opened. It is common to record network traffic with a program like Wireshark to see if the attachment is trying to communicate with a third party server, which is a common indicator of malware.
Other tools are available to scan files for malicious behavior, such as virustotal.com. These websites need to be used with care because the files uploaded are typically left public – for example, if you upload a legitimate invoice, it can be made available to the entire internet. One alternative when using these sites is to search for attributes about the file, such as the file hash (essentially a fingerprint of the file) or name. This can yield helpful information in the case where the exact file has been scanned before.
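Computing the file hash locally is straightforward, and it lets you search for the hash without uploading the file itself. Here is a minimal Python sketch using SHA-256; the function name and chunk size are arbitrary choices for this example.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large attachments do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The function only reads the file’s bytes and never executes it, but it is still sensible to run it inside the analysis virtual machine, where the suspicious file should live anyway.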
Sometimes the content visible in the interface is not sufficient to determine if an email is legitimate or not. These phishing messages are more sophisticated and typically impersonate external sources (where it is not always possible to confirm the sender’s identity through a different mechanism than email).
Emails are more than what you see in the interface provided by your email provider. They contain information about where the email originated from, what servers processed the email, and other important information related to security. This information is contained in the headers of an email, which are not visible by default in your email interface. Headers can be viewed by downloading the email and opening it in a text editor, or by using options provided by your email provider (e.g. the “Show Original” button in Gmail).
Here are some examples of headers I might look at:
- From: This header, like all other headers in an email, can be set to whatever the attacker wants it to be. It is typically what shows up in the email interface as the “From:” email address.
- Return-Path: This tells the mail server where to send bounce notifications (in the case that the email address doesn’t exist).
- Reply-To: When you reply to an email, this is the address that it will be sent to (by default).
- Authentication-Results: This header indicates the results of the mail provider’s attempts to verify the legitimacy of the email. There are two primary ways an email provider does this: SPF and DKIM. This article is long enough, and these topics are complex, so I’ll spare you the pain of reading about them, but let it suffice to say that they help email providers know who is allowed to send email for a given domain.
- Received: There can be many instances of this header, but together they lay out the journey of the email through the internet. In theory, every server that helped process the email adds this header, so you can trace where the email originated from, and then look up information about the originating IP address.
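The headers listed above can be pulled out of a downloaded email with Python’s standard `email` package. The sketch below is an illustration only: the sample message, its `.example` domains, and the function names are invented, and the summary it produces is a starting point for review, not a verdict. Servers add each Received header to the top of the message, so the last one in the list is the earliest claimed hop, and an attacker can forge the ones that were added before the email reached a server you trust.

```python
from email import message_from_string
from email.utils import parseaddr

# Invented sample message for illustration.
RAW_EMAIL = """\
Received: from mail.attacker.example (mail.attacker.example [203.0.113.7])
    by mx.recipient.example; Mon, 01 Jan 2024 10:00:05 +0000
Received: from localhost (localhost [127.0.0.1])
    by mail.attacker.example; Mon, 01 Jan 2024 10:00:00 +0000
Authentication-Results: mx.recipient.example; spf=fail; dkim=none
Return-Path: <bounce@attacker.example>
From: "Billing" <billing@vendor.example>
Reply-To: payments@attacker.example
Subject: Overdue invoice

Please pay today.
"""


def domain_of(header_value) -> str:
    """Return the lowercased domain of the address in a header value, or ''."""
    address = parseaddr(header_value or "")[1]
    return address.rpartition("@")[2].lower() if "@" in address else ""


def summarize_headers(raw: str) -> dict:
    """Collect the headers discussed above into a small dictionary."""
    msg = message_from_string(raw)
    received = msg.get_all("Received", [])
    return {
        "from_domain": domain_of(msg["From"]),
        "reply_to_domain": domain_of(msg["Reply-To"]),
        "return_path_domain": domain_of(msg["Return-Path"]),
        "authentication_results": msg.get("Authentication-Results", ""),
        "hops": len(received),
        # Collapse folded whitespace so the hop prints on one line.
        "earliest_hop": " ".join(received[-1].split()) if received else "",
    }


summary = summarize_headers(RAW_EMAIL)
for key, value in summary.items():
    print(f"{key}: {value}")
```

In this sample, the From domain differs from both the Reply-To and Return-Path domains, and the SPF result is a failure. Any one of these deserves a closer look, though a mismatch alone can also occur with legitimate mailing services.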
There are many different indicators one can use to determine if an email is legitimate or malicious. Because the risk of misclassification is high, it is important that security teams perform sufficient due diligence before declaring a verdict to the user who reported the email.