The ability to find relevant results on Google has long been regarded as part science, part art. While choosing the correct search terms is undoubtedly an important success factor, there is so much more to master before you become...
...a true Google Ninja.
A simple but necessary step on your path to becoming a Ninja is understanding Boolean search commands in Google.
There exist three such commands:
- + (logical AND)
- OR, or equivalently |
- - (logical NOT)
+ is included by default in Google. If you type two (or more) words in the search bar, e.g. solar wind, it looks for pages containing both keywords, solar and wind. In fact, it is equivalent to typing solar + wind in the Google search bar. As intended, you will get webpages talking about the solar wind (the continuous flow of charged particles from the sun which permeates the solar system).
OR (|) is more of a special case. Imagine for a moment that you are not interested in the solar wind but want to look for pages containing information about renewable energies. More specifically, you are interested in both solar and wind. So instead of doing two searches (one for solar, one for wind), you can just enter solar OR wind (equivalently solar|wind) and get both types of pages in your search results.
- (NOT) is useful for excluding useless or irrelevant search results. Suppose you still want to look for solar, but by this point you have developed an outright phobia of the 'solar wind' websites out there. The query solar -wind will return results containing the keyword 'solar', but without any mention of wind.
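To make the three commands concrete, here is a minimal Python sketch that only assembles the query strings discussed above. The helper names (all_of, any_of, exclude) are my own invention for illustration; they are not part of any Google tool or library, and the script does not contact Google at all.

```python
def all_of(*terms):
    """Implicit AND: Google treats space-separated terms as 'all of these'."""
    return " ".join(terms)


def any_of(*terms):
    """OR: join the terms with the uppercase OR keyword."""
    return " OR ".join(terms)


def exclude(term):
    """NOT: a minus sign directly in front of the term, with no space."""
    return "-" + term


print(all_of("solar", "wind"))           # solar wind
print(any_of("solar", "wind"))           # solar OR wind
print(all_of("solar", exclude("wind")))  # solar -wind
```

The printed strings are exactly what you would type in the search bar.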
" " (double quotation marks)
The double quotation marks " " are a must-know in the Google universe. Let's go back to our initial example: you really want to know about the solar wind. Unfortunately, when you type in solar wind, you also get results where the word 'solar' is at the beginning of the document and the word 'wind' occurs five sentences later! If instead you enter "solar wind", you are forcing Google to match this exactly! This means that Google will only give you results with the two words occurring directly next to each other.
Before I forget, " " also works for individual words! You might not know this, but Google has a bit of a mind of its own. Words like the are routinely excluded from the search because they have little semantic value. Do you really want to include them in the search? Put them between quotes (e.g. "the" island). Also, when you enter favourite, Google will automatically search for both favourite (UK) and favorite (US). If you really only want to look for favourite, you can just put it between quotes: "favourite".
Little known by Google users, there is a middle ground between a loose search and an exact phrase. Using the AROUND(...) operator, you can specify the maximum distance between your keywords. The number between the brackets is the maximum number of words allowed between the words on the left and the words on the right of your query. This is very helpful for quotes of which you only remember part of the text:
Lastly, you can use the wildcard character * to replace any word in a search query. An example:
This will give you all the top 10 lists composed for 2016, no matter what the subject (movies, cars, technologies etc.).
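The exact-phrase, AROUND(...) and wildcard operators can be sketched the same way. This is a minimal Python illustration that only builds query strings; the helper names and the sample search terms are placeholders of my own, chosen to match the descriptions above.

```python
def phrase(text):
    """Exact match: wrap the text in double quotation marks."""
    return '"' + text + '"'


def around(left, right, max_words):
    """Proximity: at most max_words words between left and right."""
    return f"{left} AROUND({max_words}) {right}"


def with_wildcard(before, after):
    """Wildcard: * stands in for any word between the two parts."""
    return f"{before} * {after}"


print(phrase("solar wind"))             # "solar wind"
print(around("solar", "wind", 3))       # solar AROUND(3) wind
print(with_wildcard("top 10", "2016"))  # top 10 * 2016
```

The last query is one plausible way to ask for top 10 lists from 2016 regardless of subject.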
Let's get practical
Now that we have digested the appetizer, we can get started on the main course. And oh boy, it sure is a glorious dish.
In this section, I will go over some of the most powerful search operators and give a practical application for each of them. This way, I hope to illustrate their real-world value.
The filetype: operator enables you to search for any type of file indexed by Google. It is especially useful when looking for ebooks (pdf, epub) or movies (avi, mp4, mkv). You might be surprised by the number of books and movies you can find through Google.
The site: operator is one of the most underrated operators you will come across. It allows you to limit your search to a certain website or web domain. An obvious use is as a replacement for a site's own search functionality (as such it is often used by Reddit users). It can also be used successfully to exclude certain sites: you just pair it with the NOT command (I prefer to use the shorthand: -) like so: inception -site:imdb.com. It is also possible to search within sections of sites: site:example.com/section1/section2/.
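Here is a small Python sketch of how filetype: and site: combine with ordinary keywords, and how such a query could be turned into a link. The helper names are hypothetical, the "solar energy" keyword is a placeholder, and the https://www.google.com/search?q=... address format is my assumption about how Google search links are laid out; only urllib.parse.urlencode is a real standard-library call.

```python
from urllib.parse import urlencode


def filetype(extension):
    """Restrict results to files with the given extension."""
    return f"filetype:{extension}"


def site(domain_or_path):
    """Restrict results to a domain or to a section of a site."""
    return f"site:{domain_or_path}"


def exclude(term):
    """NOT shorthand: a minus sign directly in front of the term."""
    return "-" + term


def search_url(query):
    """Percent-encode the query into a search link (assumed URL layout)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


ebook_query = "solar energy " + filetype("pdf")
no_imdb_query = "inception " + exclude(site("imdb.com"))

print(ebook_query)    # solar energy filetype:pdf
print(no_imdb_query)  # inception -site:imdb.com
print(search_url(no_imdb_query))
# https://www.google.com/search?q=inception+-site%3Aimdb.com
```

Encoding matters here because characters such as the colon and the space are not passed through unchanged in a URL.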
Evil genius level
The inurl: operator specifically looks for words occurring in the URL of a webpage. Ever since the advent of the internet of things, this operator has often been used to scour the web for unsecured devices, such as webcams, printers etc. The idea behind it all is that each manufacturer of a webcam or printer composes the public URL for that device in a particular way. If you find out which elements recur in that URL, you can actively search for them using Google.
For webcams, you can try the following:
- inurl:top.htm inurl:currenttime
- ... (and so many more, for a list click here)
For printers, some quite well-known queries are the following:
- HP: inurl:hp/device/this.LCDispatcher?nav=hp.Print
- Dell: inurl:"cgi-bin/dynamic/" inurl:"html" intitle:"Printer Status"
- Samsung: intitle:"SyncThru Web Service" inurl:"sws"
- ... (and so many more, for a list click here)
An interesting website that lists all Google security exploits (unsecured devices, files, servers etc.) can be found here
It has to be said that searching for and accessing these devices is in a legally grey area (to say the least). There are also important moral concerns when watching somebody through his/her unsecured webcam or sending the entire text of Mein Kampf to be printed at an unsuspecting grandma's home.
A wonderful method for finding files (videos, images etc.) on the internet is to combine most of the search operators discussed above.
For example, to find Breaking Bad, we can enter the following query.
"breaking bad" -inurl:(htm|html|php|pls|txt) intitle:index.of last modified (mkv|mp4|avi)
This gives us the following results:
As you can see, you are able to access the directory structure of unsecured web servers. These pages are often referred to as the 'main' or 'index' page and list the files at that location with size and last modified date.
Now that we have reached an admirable level of query complexity by combining our search operators, it will be helpful to explain the query in more detail.
"breaking bad": title of file we eventually want to find. I put double quotation marks because I want the words 'breaking' and 'bad' appearing exactly after each other, and not two lines apart.
-inurl:(htm|html|php|pls|txt): this part is a combination of the - (the shorthand for the NOT operator) and the inurl: operator. It basically says: "I want to exclude pages with html, htm, php, pls or txt in the URL." The | (pipe, shorthand for OR) is used to say that as soon as at least one of the file extensions is present in the URL, the page should be excluded from the search results. This filters out most of the actual web pages.
intitle:index.of: The title of the page should contain the words Index of. This is a clever trick to find the index pages of a web server, given that these tend to have the title "Index of ....".
last modified: as is obvious from the image above, a typical index page is composed of headers named 'name', 'last modified', 'size'. It is not combined with a search operator, it is just a normal search term. This is to ensure that Google will return index pages, and not actual web pages.
(mkv|mp4|avi): we are looking for at least one of the following words (here representing file extensions) on the page: mkv, mp4, avi. This will ensure that the index page actually lists video files.
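The breakdown above can be summed up in a short Python sketch that reassembles the query from its five parts. The function and parameter names are hypothetical and exist only for this illustration; the script only builds a string and does not send anything to Google.

```python
def or_group(*options):
    """Pipe-separated alternatives in brackets, e.g. (mkv|mp4|avi)."""
    return "(" + "|".join(options) + ")"


def build_index_query(title, excluded_url_parts, file_extensions):
    """Assemble the directory-listing query part by part."""
    parts = [
        '"' + title + '"',                           # exact title match
        "-inurl:" + or_group(*excluded_url_parts),   # drop ordinary web pages
        "intitle:index.of",                          # page title contains "Index of"
        "last modified",                             # column header on index pages
        or_group(*file_extensions),                  # listing mentions these extensions
    ]
    return " ".join(parts)


query = build_index_query(
    "breaking bad",
    ["htm", "html", "php", "pls", "txt"],
    ["mkv", "mp4", "avi"],
)
print(query)
# "breaking bad" -inurl:(htm|html|php|pls|txt) intitle:index.of last modified (mkv|mp4|avi)
```

Keeping each part on its own line makes it easy to swap the title or the list of extensions without retyping the rest.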
It's perfectly reasonable that you can't memorize the syntax of the above query. Luckily, some noble strangers have created specialized search engines for this purpose: