Google Hacking 101

by Steve Mansfield-Devine

[Published in: Network Security]

If you’re looking for security weak points in a website, forget NMAP, Nessus and all those tools of the pen-tester’s trade. Your first stop should be something much more basic: Google.

The search engine’s relentless spidering of sites is remarkably adept at discovering weaknesses, all of which are meticulously indexed. This creates a valuable resource for security specialists and hackers alike through the use of specially crafted search queries — the technique of Google hacking.

Before we look at the basics of Google hacking, it’s worth considering how this technique turns the usual image of hacking on its head. By and large, hacking consists of a miscreant targeting a specific site and beavering away to discover its weak spots. And Google hacking is indeed useful as a way of footprinting and probing a specific site.

But the real danger it poses is bringing your site to a hacker’s attention. A Google hacking query can turn up a list of all sites that suffer from the vulnerability the query is designed to reveal. If your site has this vulnerability, then Google will bring hackers directly to it. It is an automated way of broadcasting your weaknesses. Hackers will not only be drawn to it, but will arrive knowing in advance what vulnerabilities they will find.

So, if you are charged with the security of a website, it is very important to ensure that your site is not among those turned up by such a search query.

 

Google dorks

These search techniques were first made famous, and documented, by Johnny Long, owner of the ‘Johnny I Hack Stuff’ website, which is now dormant but still hosts the Google Hacking Database (GHDB).1 He’s also author of a two-volume book on Google hacking.2

The GHDB has been little updated since 2006 but still carries the most complete list of so-called ‘Google Dorks’ — the search terms used to discover vulnerabilities. These make extensive use of Google’s special search operators — terms that refine or modify the search and the reason why Google is so effective at this task. For example, searching with:

allintext: admin user password restricted list

tells Google to return only pages that contain all of those words in their text. Similarly, the site: operator restricts the search to a specific site or domain, which is handy for testing your own website.
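As a rough illustration of combining a dork with the site: operator, the sketch below builds a search URL that can be pasted into a browser. The function name build_dork_url and the example domain are hypothetical, and the search URL format is an assumption based on Google's familiar q= parameter rather than anything documented in this article.

```python
from typing import Optional
from urllib.parse import urlencode


def build_dork_url(dork: str, site: Optional[str] = None) -> str:
    """Return a Google search URL for a dork, optionally limited to one site."""
    query = dork.strip()
    if site:
        # The site: operator restricts results to a single site or domain.
        query = f"site:{site} {query}"
    # urlencode escapes spaces, colons and quotes so the query survives in a URL.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Hypothetical domain; substitute the site you are responsible for.
    print(build_dork_url("allintext: admin user password restricted list",
                         site="www.example.com"))
```

Keeping the dork and the site restriction as separate arguments means the same list of dorks can be reused against each domain you look after.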

The GoogleGuide.com site has a useful quick reference sheet for these operators.3

For our purposes, the most important categories of Google dorks are: discovering vulnerable software; finding files and directories that shouldn’t be visible; and exploiting error messages or system failures.

 

Vulnerable software

Google hacking provides a number of basic footprinting methods to profile a website: server software, operating system and so on. But much of that information is more easily found through sites such as Netcraft.com. Where Google dorks really come into their own is when the software you’re running is known to have vulnerabilities.

Software often uses easily identifiable filenames that will turn up in URLs. For example, one Google dork from 2004 targeted the Comersus ASP-based e-commerce package, which had an XSS flaw in the file comersus_message.asp. This could be exploited with a specially crafted URL.

To find sites running this package all a hacker had to do was type the following into Google:

inurl:comersus_message.asp

Software identifies itself in other ways, too. There’s often a credit along the lines of “Powered by…”. Worse, some packages even include the version number. The second that version is known to contain a flaw, hackers worldwide will scour the web for vulnerable sites.

These credit lines are typically part of default installations and many publicly available templates or themes. It’s generally fairly easy to remove them, especially if you develop your own theme. Go through the code and remove everything that identifies the software, including any HTML comments or meta tags. But remember to check each time you upgrade the software that these lines haven’t been reintroduced.
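One way to make that check repeatable after each upgrade is to scan your rendered pages for the usual giveaways. The following is a minimal sketch using only Python's standard library; the class and function names, and the sample markup with its "ExampleCMS" product, are hypothetical, and the three checks (generator meta tag, HTML comments, "powered by" text) are an assumed starting list rather than a complete one.

```python
from html.parser import HTMLParser


class FingerprintFinder(HTMLParser):
    """Collect markup that may identify the software behind a page."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # <meta name="generator" content="..."> commonly names the package.
        if tag == "meta" and (attr_map.get("name") or "").lower() == "generator":
            self.findings.append(("meta generator", attr_map.get("content") or ""))

    def handle_comment(self, data):
        # Every comment is reported, since any of them may carry product or version strings.
        self.findings.append(("comment", data.strip()))

    def handle_data(self, data):
        # Visible credit lines along the lines of "Powered by ...".
        if "powered by" in data.lower():
            self.findings.append(("credit", data.strip()))


def find_fingerprints(html: str):
    """Return a list of (kind, text) pairs found in the given HTML."""
    finder = FingerprintFinder()
    finder.feed(html)
    finder.close()
    return finder.findings


if __name__ == "__main__":
    sample = ('<html><head><meta name="generator" content="ExampleCMS 1.2">'
              '<!-- theme build 7 --></head>'
              '<body><p>Powered by ExampleCMS</p></body></html>')
    for kind, text in find_fingerprints(sample):
        print(kind, "->", text)
```

Every comment is reported rather than filtered, on the assumption that it is easier to review a short list by eye than to guess in advance which comments are revealing.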

Software credits also find their way into <title> tags sometimes. And even when they don’t say “powered by…”, the page title is often enough to identify your site as running a vulnerable version of the software, using Google’s intitle: operator, because the text is so specific to that page. There is little you can do about that other than ensure that your software is always up to date.

 

Open directories

There is nothing a hacker loves more than an unprotected directory. If web server software receives a request that contains a directory name, rather than a specific file, it will look for a default ‘index’ file — called ‘index.html’ or any one of a number of other standard files depending on the server configuration. If it doesn’t find one, it will helpfully present a list of files and sub-directories in that directory, with each filename clickable.

Web servers use standard terms in the page or title when they do this, so such open directories are easily found. Here’s one way to find open directories containing .txt, .doc and .pdf files. The first section uses the negation modifier (‘-’) with the inurl: operator to tell the search to ignore .html, .htm and .php files.

-inurl:(htm|html|php) intitle:"index of" +"last modified" +"parent directory" +description +size +(.txt|.doc|.pdf)

That will find examples across the web. Add “site:www.example.com” to search a specific site.
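The same markers that the dork relies on can be used to test pages fetched from your own site. The sketch below is a simple heuristic, assuming a listing whose title contains "Index of" and whose body mentions "Parent Directory", as in the query above; the function name is hypothetical, and servers that word their listings differently would need other markers.

```python
import re

# Capture the contents of the <title> element, ignoring case and line breaks.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def looks_like_directory_listing(html: str) -> bool:
    """Heuristic: title contains 'index of' and the page mentions 'parent directory'."""
    match = TITLE_RE.search(html)
    title = match.group(1).lower() if match else ""
    return "index of" in title and "parent directory" in html.lower()


if __name__ == "__main__":
    listing = ('<html><head><title>Index of /backup</title></head>'
               '<body><a href="../">Parent Directory</a></body></html>')
    print(looks_like_directory_listing(listing))
```

Requiring both markers keeps ordinary pages that merely have "Index of" in the title from being flagged.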

It’s surprising how often such directories contain text files with configuration, even password, information. (Beware, however, if you experiment with this: the results generated by Google dorks aimed at password files often lead to honey traps.) Even if the directory contains only harmless files (images, for example), the fact that your site has an unprotected directory will draw hackers to it, because they will assume that your security is shoddy. That’s not good.

A directory listing also helps an attacker map the topography of your system. Some of those sub-directories, for example, might contain files that you don’t want people to know about (although keeping such files in a publicly accessible part of the directory tree is a bad idea to start with).

Google won’t necessarily find such directories if they are not referenced anywhere in your website, but that hardly counts as protection.

This problem is very easily fixed. Always have an index file in every directory. A simple index.html or index.htm (depending on server settings) will do it. It doesn’t even have to contain anything — so long as the server has something to grab and serve up, it won’t provide a directory listing. A sensible solution, though, is to have index.html contain a very basic web page, perhaps with a link to your home page.
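Finding the directories that lack an index file is easy to automate if you have access to the files on the server. This is a minimal sketch that walks a local copy of the web root; the function name is hypothetical and the set of index filenames is an assumption that should be adjusted to match your server configuration.

```python
import os

# Assumed default index names; the real list depends on the server settings.
INDEX_NAMES = {"index.html", "index.htm", "index.php"}


def dirs_without_index(web_root: str, index_names=INDEX_NAMES):
    """Return the directories under web_root that contain none of index_names."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        # A directory is exposed if no file in it matches an index name exactly.
        if not set(filenames) & set(index_names):
            missing.append(dirpath)
    return sorted(missing)
```

Calling dirs_without_index with the path to your document root returns every directory, including the root itself, that would fall back to a listing on a server with listings enabled. Filenames are compared exactly, on the assumption that the server treats them as case-sensitive.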

 

Sensitive scripts

Some of the files left publicly available are there by mistake. Bizarrely, others are there deliberately. Webmasters often allow scripts to output logs. For instance, the following gives interesting information about a specific piece of bulletin board software (just click through any messages that pop up):

inurl:CrazyWWWBoard.cgi intext:"detailed debugging information"

If you have administration software that reports on, perhaps, network performance, ask yourself if those reports need to be available online. If so, put the pages that contain the output of any scripts in a password-protected directory.

 

Error messages

Hackers really hit paydirt when your site goes wrong. Error messages often contain useful data for the hacker. Not long ago, The Register reported on a website that displayed a huge amount of PHP information due to an error. Google, of course, duly indexed this. It’s not unknown for some site developers to enable a debugging mode that displays the output of the phpinfo() function if there’s a problem. That function prints out a vast amount of information useful to a hacker. To see some sites with this issue, try:

intitle:phpinfo "PHP Version"

Various kinds of software display standard (and thus searchable) messages when they hit a problem. For example, try searching with:

"mySQL error with query"

These error messages may include database, table and field names — invaluable for SQL injection attacks — and even user names. PHP, ASP and other scripting systems may produce errors that reveal directory structures, names of otherwise obscure script files and other useful detail.

Even if the admin has since fixed the problem, so that the error message is no longer displayed, the fact that you’ve found this site with a search means that the problem page, including the error message, is still in Google’s cache. So, on the Google results page, you simply opt for the ‘Cached’ link.

Certain Google dorks will find default pages that might suggest a poorly installed or maintained site. Some of these will reveal interesting information, for example:

intitle:"Apache Status" "Apache Server Status for"

This can reveal data about virtual hosts, directory structure and files.

It’s therefore wise to turn off error reporting for live sites: in the database, scripting language, CMS and any other software you’re using. Also make sure that you have completed configuration for all installed software.
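To confirm that none of these messages is leaking from your own pages, you can search the page source for the same phrases the dorks use. The sketch below assumes you already have the page text in hand; the function name is hypothetical, and the signature list simply reuses the phrases quoted in this article, so it is a starting point rather than a complete catalogue.

```python
# Phrases taken from the example dorks in this article.
ERROR_SIGNATURES = [
    "mySQL error with query",
    "PHP Version",
    "detailed debugging information",
    "Apache Server Status for",
]


def find_error_signatures(page_text: str, signatures=ERROR_SIGNATURES):
    """Return the signatures that occur in page_text, ignoring case."""
    lowered = page_text.lower()
    return [sig for sig in signatures if sig.lower() in lowered]


if __name__ == "__main__":
    sample = "<b>MySQL error with query</b> SELECT * FROM users"
    print(find_error_signatures(sample))
```

Matching is case-insensitive because Google's own matching ignores case, so a page that capitalises the message differently would still turn up in a search.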

 

Security through obscurity

What Google dorks teach us is that there is no security through obscurity on the web — simply because there is no obscurity. You might think that you’re the only one that knows where the login page is for your CMS, or that a certain file is not linked from anywhere else and will not, therefore, be found by Google. But it’s a mistake to rely on this.

Even when a Google dork doesn’t reveal specific information, it can tell a hacker where to start looking. For example, there are dorks that reveal the login pages for administrators, pages that may not be linked ordinarily from the public side of the site:

inurl:/admin/login.asp

 

Tools

This is just a taste of what Google dorks can achieve. Helpfully, there are tools available to automate Google hacking. One of them is produced by Google itself: GoogleHacks.4 It’s somewhat basic, but script kiddies will love it.

The Cult of the Dead Cow group — notorious for the Back Orifice trojan — has released a Windows-only tool, Goolag.5 This is rather more sophisticated. It comes with a database of Google Dorks, supplied in XML format, so it’s easily readable and amendable. You can also specify your own searches.

By automating groups of Google Dorks, Goolag is a useful first step in penetration testing of your own sites. But there’s no real substitute for working through the Google Dorks yourself, given that you will have some idea of where weaknesses may lie.

 

Countermeasures

We’ve already outlined some of the measures you can take to protect yourself. The best approach is to Google hack your own site, identify all those flaws that can be picked up by Google and fix them. Skilled hackers may still be able to use some of these tricks to survey your site if they have already targeted it. But at least you won’t be advertising your problems.

 

References

1. http://johnny.ihackstuff.com/ghdb.php

2. Google Hacking for Penetration Testers (Syngress, 2005, 2007), available from Amazon. Volume 1, co-written with Ed Skoudis and Alrik van Eijkelenborg, seems to be out of print.

3. GoogleGuide.com — a handy reference for making the most from Google searches, including a two-page cheat sheet at: http://www.googleguide.com/print/adv_op_ref.pdf

4. GoogleHacks — Google’s own hacking tool for Windows, Linux and Mac OS X can be found here: http://code.google.com/p/googlehacks/

5. Goolag – http://www.goolag.org/download.html