[Article updated on November 8, 2022]
Have you used Google dorks today? If you have searched for a word within quotes or combined two terms with AND, then the answer is yes.
Google dorks are advanced search operators that allow you to better target your searches.
They can even help you identify vulnerabilities and strengthen your security. Let’s see how.
What are Google dorks?
The use of Google dorks, also called Google hacking or Google dorking, is a technique that relies on the Google search engine to find specific information. Google dorking consists of using search operators to better target the information you are looking for.
Contrary to what the nickname Google hacking implies, the idea is not to "hack" Google, but to exploit its crawling and indexing capabilities to retrieve public data that is not very visible.
Google is in fact a search engine that continuously indexes everything it sees on the Internet… including sensitive or strategic elements from a security point of view.
Google hacking is legal, as it simply involves using a search engine with filters. You can use these advanced queries to check your own websites or personal data without worrying.
However, exploiting the results with the aim of attacking a target is illegal.
What are the most commonly used Google dorks?
Google dorks are typed directly into Google, just like a regular search. The syntax is operator:value; for example, cache:vaadata.com will return the cached version of the website vaadata.com.
The main operators are:
site:<target-website.com> – searches will be limited to the given website
cache:<target-website.com> – gives the cached version of the website
after:<date> – displays results indexed after a specific date
filetype:<file extension> or ext:<file extension> – focuses on the type of file requested
inurl:<researched term> – URLs containing this term will be returned
allinurl:<various terms> – URLs containing all these terms will be returned
intitle:<researched term> – pages containing this term in their title will be retrieved
allintitle:<several different terms> – pages containing all these terms in their title will be retrieved
intext:<researched term> – searches for the given text anywhere on the page (useful with other operators; otherwise it’s the equivalent of a classic Google search)
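To see how these operators combine into a single query, here is a minimal Python sketch. The helper names (`build_dork`, `search_url`) are hypothetical, and it assumes a Google search can be opened through the `q` parameter of https://www.google.com/search. It only builds the query text and the link; it does not send any request.

```python
from urllib.parse import urlencode


def quote_if_needed(value: str) -> str:
    """Wrap multi-word values in double quotes so Google treats them as an exact phrase."""
    if " " in value and not value.startswith('"'):
        return f'"{value}"'
    return value


def build_dork(operators: list[tuple[str, str]], keywords: str = "") -> str:
    """Join (operator, value) pairs into 'operator:value' terms, after optional free-text keywords."""
    terms = [f"{operator}:{quote_if_needed(value)}" for operator, value in operators]
    if keywords:
        terms.insert(0, keywords)
    return " ".join(terms)


def search_url(query: str) -> str:
    """Percent-encode the query into a search link (assumes the 'q' parameter)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_dork([
    ("site", "target-website.com"),
    ("filetype", "pdf"),
    ("intitle", "income statement"),
])
print(query)  # site:target-website.com filetype:pdf intitle:"income statement"
print(search_url(query))
```

Quoting multi-word values matters: without the quotes, only the first word would be tied to the operator.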
The operators can be combined for even more precise searches. For example:
filetype:xls inurl:"email" site:target-website.com
inurl:admin site:target-website.com
intitle:"2022" ("Income statement" | "Profit & Loss Statement" | "P&L") filetype:pdf site:target-website.com
inurl:github.com filename:database intext:target-website.com
inurl:gitlab.com secret.yaml | credentials.xml intext:site
site:drive.google.com <target>
site:target-website.com ext:php
- …
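When auditing your own domain, it can help to keep combined dorks like these as templates and fill in the domain each time. The Python sketch below assumes a hypothetical `DORK_TEMPLATES` list (a subset of the examples above) and uses `example.com` as a stand-in domain.

```python
# Hypothetical checklist built from some of the combined dorks above;
# {domain} is replaced with the domain being audited.
DORK_TEMPLATES = [
    'filetype:xls inurl:"email" site:{domain}',
    "inurl:admin site:{domain}",
    'intitle:"2022" ("Income statement" | "Profit & Loss Statement" | "P&L") filetype:pdf site:{domain}',
    "site:{domain} ext:php",
]


def dorks_for(domain: str) -> list[str]:
    """Return every template with the placeholder replaced by the given domain."""
    return [template.format(domain=domain) for template in DORK_TEMPLATES]


for query in dorks_for("example.com"):
    print(query)
```

Keeping the list in one place makes it easy to rerun the same checks after each cleanup.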
The same principle of searching with operators applies to other sites. Depending on your target, you can continue your exploration on Twitter, GitHub, Pastebin…
You can rely on this great list of Google dorks for inspiration during a security audit or a bug bounty.
Examples of Google dorks that led to finding security flaws
Discovering sensitive documents on an e-commerce website with site:<target>
During a penetration test on an e-commerce website, we used site:<target> to detect some URLs that shouldn’t have been indexed. These pages allowed us to retrieve sensitive documents that an external user shouldn’t have had access to.
Interestingly, fuzzing and other discovery tools would not have found these URLs, because they contained a long random part.
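A quick back-of-the-envelope calculation shows why brute-force discovery fails against such URLs. The figures are assumptions for illustration only, not details of the audit above: a random part of 32 hexadecimal characters and a fuzzer sending 1,000 requests per second.

```python
# Illustrative assumptions (not taken from the audit described above):
ALPHABET_SIZE = 16           # hexadecimal characters 0-9 and a-f
RANDOM_PART_LENGTH = 32      # e.g. a 128-bit token written in hex
REQUESTS_PER_SECOND = 1_000  # hypothetical fuzzing rate

SECONDS_PER_YEAR = 60 * 60 * 24 * 365

# Number of possible values for the random part of the URL.
candidates = ALPHABET_SIZE ** RANDOM_PART_LENGTH
# Time needed to try every value at the assumed rate.
years = candidates / REQUESTS_PER_SECOND / SECONDS_PER_YEAR

print(f"{candidates:.2e} possible URLs")
print(f"about {years:.1e} years to try them all")
```

Under these assumptions the result is on the order of 10^28 years, which is why such URLs tend to surface only through a source that already knows them, such as a search engine index.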
Discovering contact forms with site:support.google.com inurl:/contact/
A bug hunter, Rio, explains how he took advantage of Google dorking to discover contact forms and exploit a blind XSS flaw. His write-up gives further details.
Discovering indexed PHP files with site:.target.com ext:php
A classic test when running a pentest across many subdomains is to check for indexed files with a .php extension. When exposed, .php files often contain code developed in-house without a framework. On several occasions, we found vulnerabilities (such as SQL injection) via indexed PHP files. Putting a dot in front of the domain name in the site operator allows the search to be performed on all subdomains of target.com. It’s also possible to look for other types of indexed files such as xml, txt, docx…
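Once you have noted the URLs returned by a query such as site:.target.com ext:php, a short script can sort them by subdomain so that each application can be reviewed separately. This Python sketch assumes the URLs are already in a plain list (how you collect them is up to you); the helper name and the `example.com` URLs are hypothetical.

```python
from collections import defaultdict
from posixpath import splitext
from urllib.parse import urlparse


def group_by_subdomain(urls: list[str], extensions: tuple[str, ...] = (".php",)) -> dict[str, list[str]]:
    """Keep URLs whose path ends with one of the extensions, grouped by host name."""
    groups = defaultdict(list)
    for url in urls:
        parsed = urlparse(url)
        # Only the path is inspected, so query strings such as ?id=1 are ignored.
        extension = splitext(parsed.path)[1].lower()
        if extension in extensions:
            groups[parsed.hostname].append(url)
    return dict(groups)


sample_results = [
    "https://shop.example.com/cart.php?id=1",
    "https://blog.example.com/index.html",
    "https://api.example.com/v1/users.PHP",
    "https://shop.example.com/admin/login.php",
]
for host, urls in group_by_subdomain(sample_results).items():
    print(host, len(urls))
```

Passing other extensions, for example `(".xml", ".txt")`, applies the same triage to the other file types mentioned above.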
Using Google dorks to identify and reduce your attack surface
During a pentest, Google dorking helps uncover information that lets us go further and perform more tests.
But we also use this technique during the reconnaissance phase (the initial phase of any penetration test) and especially during reconnaissance audits.
We can find files, email lists, passwords, open servers, SSH keys, etc. Most of the time, they are vulnerabilities that disclose technical or sensitive data.
Identifying this information allows you to know your attack surface and to reduce it, by cleaning or deleting the elements that should not have been public.
Attackers also rely on Google hacking when they want, for example, to exploit a known flaw (CVE) in a specific technology. Google dorks may allow them to obtain a list of sites potentially vulnerable to that CVE.
They can also use them to harvest emails in preparation for non-targeted phishing campaigns.
Using Google dorking to prevent social engineering attacks
Attackers are increasingly exploiting social engineering in their cyberattacks and particularly target people in strategic positions within a company.
It’s worth researching what can be found with Google hacking about the people most often targeted in these attacks (mainly administrative, financial, IT and managerial staff).
In fact, any information can be used to create a pretext or lend credibility to an unusual approach. For example, a fake sports club might request an interview to discuss sponsorship, based on the target’s interest in the sport. The email will contain a malicious attachment or link, which the target will be more tempted to open.
Google dorks can help you to check what data is available online, to remove it (where possible) or invalidate it.
Researching personal information
The first search is very simple but can reveal a lot of information: first and last name. Personal data such as emails, postal addresses, pseudonyms and bank details has sometimes been indexed. You can also check that specific personal information is not mentioned anywhere by searching directly for an email address or a telephone number.
You can then target files with queries like:
first name filetype:pdf intitle:resume
first name filetype:pdf intitle:invoice
- …
Relevant file types to browse are usually: pdf, doc, docx, xml, xls, ppt and rtf.
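To cover every relevant file type without retyping each query, you can generate them. In this Python sketch, the name "Jane Doe" and the helper `person_file_dorks` are hypothetical; putting the name in quotes asks Google for the exact phrase.

```python
# File types listed above as the most relevant to browse.
RELEVANT_FILE_TYPES = ["pdf", "doc", "docx", "xml", "xls", "ppt", "rtf"]


def person_file_dorks(full_name: str, title_keyword: str = "") -> list[str]:
    """Build one exact-name query per file type, optionally restricted by a title keyword."""
    queries = []
    for file_type in RELEVANT_FILE_TYPES:
        query = f'"{full_name}" filetype:{file_type}'
        if title_keyword:
            query += f" intitle:{title_keyword}"
        queries.append(query)
    return queries


for query in person_file_dorks("Jane Doe", "resume"):
    print(query)
```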
However, be careful not to enter information that is too sensitive (such as a password or bank details). Although Google doesn’t index your searches, they are still sent to its servers and may therefore be recorded. Moreover, you are never completely safe from someone intercepting your traffic.
Investigating photos and looking for fake profiles
Another angle to watch out for is picture spoofing, for example to create a fake profile. Google Images has a very interesting feature, which allows you to search for pages that use a particular image.
By clicking on the camera icon at the end of the search bar, you can upload a picture that is used on a social network or published in a news story and check the different sites where it appears.
This way, you can find pages about the people you want to protect and check whether their photos are being spoofed to create fake articles or profiles.
After a data leak, checking if information has been compromised
Finally, sometimes data is leaked during a cyberattack or a technical problem. When a website is hacked, it’s common for the stolen data to be published on the Internet, for instance on pastebin.com.
A good practice is to verify that no information about the people whose exposure you want to limit is accessible on this platform, for example with a query such as:
last name first name site:pastebin.com
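People are not always listed under the same form of their name, so it can be worth checking a few variants, along with a work email address as suggested earlier. The Python sketch below generates exact-match pastebin.com queries for such identifiers; the helper names and the sample identity are hypothetical. In line with the earlier caution, only use identifiers that are not sensitive in themselves (never a password or bank ID).

```python
def name_variants(first_name: str, last_name: str) -> list[str]:
    """Common orderings of a name as it may appear in leaked data."""
    return [
        f"{first_name} {last_name}",
        f"{last_name} {first_name}",
        f"{first_name[0]}. {last_name}",
    ]


def paste_dorks(identifiers: list[str], site: str = "pastebin.com") -> list[str]:
    """One exact-match query per identifier, restricted to the paste site."""
    return [f'"{identifier}" site:{site}' for identifier in identifiers]


identifiers = name_variants("Jane", "Doe") + ["jane.doe@example.com"]
for query in paste_dorks(identifiers):
    print(query)
```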
What to do if you find exposed data online?
If you have discovered elements that should be private, delete them as far as possible using your account settings. If the data was published by a third party (social network, blog, website, etc.), ask for it to be deleted.
If the site owner doesn’t respond, you can exercise your right to have your personal information removed from search engines. Privacy is a valid reason for requesting the removal of personal data.
If you have found a current password or banking information, take steps as quickly as possible to make it unusable.
Protecting yourself from data leakage via google dorks
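The first step is to use a properly configured robots.txt file to prevent robots from crawling certain areas of the site, certain files or file extensions. In general, indexing should be limited to public content such as main pages or articles. For this, we recommend following the official documentation. Keep in mind that robots.txt only governs crawling: a blocked URL can still be indexed if other pages link to it, and the file itself is public, so genuinely sensitive content also needs access control.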
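Before deploying a robots.txt file, you can check which URLs it actually blocks with Python’s standard `urllib.robotparser` module. The rules and URLs below are hypothetical examples. Note that this parser uses simple prefix matching and may not interpret every extension Google supports, such as `*` and `$` wildcards in paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping crawlers out of two private areas.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /exports/
"""


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt forbids this user agent to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Record a fetch time explicitly: can_fetch() refuses everything until one is set.
    parser.modified()
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


urls = [
    "https://www.example.com/admin/login",
    "https://www.example.com/exports/clients.xls",
    "https://www.example.com/blog/",
]
print(blocked_urls(ROBOTS_TXT, urls))
```

Any URL missing from the printed list is one that crawlers are allowed to fetch and may therefore index.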
Vulnerability scans can also be run on a regular basis; most scanners test the most common Google dorks.
Finally, do not hesitate to perform Google dorking yourself to check that attackers cannot retrieve sensitive information.
You now know the basics of Google hacking. With the few operators given here and a bit of imagination, you can create an infinite number of queries that can bring you more and more results.