2.a. The Language of Search Engines
It is a capital mistake to theorize before
you have all the evidence. It biases the judgment.
A Study in Scarlet
Search engines are your helpers. They are
information assistants who help you find the answers to your problems. Like any other
assistant, the degree to which they are able to help depends on the degree to which you
are able to tell them what you want. Therefore, communicating with your search engine is a
critical part of the search process.
Search engines need to know what
information you seek, and they need this desire described in a logical way, since they are
computers after all. The language that we traditionally use to talk with computer-based
searching tools is called Boolean logic, named after George Boole, an English
mathematician of the 19th century.
In Boolean logic we use keywords
to tell the search engine which words to consider when looking for information that is
relevant to our question. We also use operators to describe how those keywords relate to
one another and to the information that we seek. The basic operators are AND,
OR, and NOT.
Let's use an example to explore how we
would use Boolean Logic to search for information on the Internet. We will look for
information about Native Americans in the state of Ohio.
A keyword is a
word or term that we want the search engine to consider in looking for relevant
information. In our example one word that would likely appear in a web page about Native
Americans is Indian.
In many cases,
there may be a synonym of our keyword that might appear in the web page instead of the
keyword we have already chosen. So we will want to expand the number of pages that the
search engine sends us to include the ones using the synonym. In the case of our example,
many web pages would likely use the term Native American, which is more commonly
used today than Indian. In this case we would use the operator OR to say
that we want web pages with either the word Indian or the term Native American.
Since we are
looking for information about Native Americans in the state of Ohio, an additional
keyword will be Ohio. We want to narrow the web pages that we get to only those
about Native Americans in Ohio, so we will say that both terms must be present. Here is
where we will use AND.
Native American AND Ohio
As we think
through the information that we are likely to receive, we realize that there is a baseball
team in Cleveland, Ohio called the Indians. We will want to filter out all web pages about
the baseball team. So we will add a new keyword, baseball, and connect it to our
search expression with the operator NOT. We are saying that the acceptable web page
should NOT have the keyword baseball in it.
Native American AND Ohio NOT baseball
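One way to picture what the three operators do is to treat each keyword as the set of pages that contain it. The sketch below is only an illustration under that assumption: the four pages in PAGES and the helper pages_with are made up for this example, and real search engines work on far larger indexes with their own matching rules.

```python
# Hypothetical mini-collection: page id -> page text (not real search-engine data).
PAGES = {
    1: "history of the native american tribes of ohio",
    2: "cleveland ohio baseball schedule",
    3: "native american baseball players from ohio",
    4: "indian recipes",
}


def pages_with(term):
    """Return the set of page ids whose text contains the term."""
    return {pid for pid, text in PAGES.items() if term in text}


# OR widens the search: pages with either term (set union).
either = pages_with("indian") | pages_with("native american")

# AND narrows the search: both sides must be present (set intersection).
in_ohio = either & pages_with("ohio")

# NOT filters out unwanted pages (set difference).
result = in_ohio - pages_with("baseball")

print(sorted(either))   # [1, 3, 4]
print(sorted(in_ohio))  # [1, 3]
print(sorted(result))   # [1]
```

Notice that OR makes the set of pages larger, while AND and NOT each make it smaller.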
Just as we use
commas, question marks, and other punctuation to help communicate with people, we use
special symbols to clarify what we want from a search engine. One example is the use of
quotation marks to define phrases. In our example, Native American is going to look like
two separate words to the search engine that could each appear any place in the web page.
To communicate that these two words belong together as a distinct phrase, we use quotes.
"Native American" AND Ohio NOT baseball
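The difference between two loose words and a quoted phrase can be sketched in a few lines. This is a simplified illustration, not how any particular search engine is implemented: the function names contains_all_words and contains_phrase are hypothetical, and the page text is invented for the example.

```python
def contains_all_words(text, words):
    """True if every word appears somewhere in the text, in any order."""
    tokens = text.lower().split()
    return all(word.lower() in tokens for word in words)


def contains_phrase(text, phrase):
    """True if the words of the phrase appear side by side, in order."""
    tokens = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    # Slide a window of n words across the page and compare it to the phrase.
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


# Made-up page text that uses both words, but not as a phrase.
page = "An American tourist visits a native plant garden in Ohio"

print(contains_all_words(page, ["Native", "American"]))  # True
print(contains_phrase(page, "Native American"))          # False
```

Without the quotes, this page would be returned even though it has nothing to do with Native Americans.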
Each operator in a search expression defines a distinct keyword concept.
Keyword 1 AND Keyword 2
Keyword 3 OR Keyword 4
Keyword 5 NOT Keyword 6
A keyword concept can consist of:
- A single keyword or phrase
- Two single keywords or phrases connected
by an operator
- Keyword concepts connected by an operator
to other keyword concepts or single keywords or phrases.
Individual keyword concepts are marked by
enclosing them in parentheses. In our example, the following are distinct keyword
concepts:
(Indian OR "Native American")
((Indian OR "Native American") AND Ohio)
The final keyword concept, the one that
includes all constituent keyword concepts, is called our search expression:
((Indian OR "Native American") AND Ohio) NOT baseball
Admittedly, Boolean Logic is not the simplest thing to understand or
teach. However, it is a very effective way to communicate with search engines, refining
your request for specific information resources.
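Because keyword concepts can contain other keyword concepts, a search expression can be checked against a page from the inside out. The sketch below assumes a made-up representation in which a concept is either a keyword string or a (left, operator, right) tuple; the matches function and its plain substring test are hypothetical simplifications, not any search engine's actual method.

```python
def matches(expr, text):
    """Evaluate a keyword concept against the text of one page.

    expr is either a string (a keyword or phrase) or a tuple
    (left, operator, right) whose left and right parts are themselves
    keyword concepts.
    """
    if isinstance(expr, str):
        return expr.lower() in text.lower()  # naive substring test
    left, op, right = expr
    if op == "AND":
        return matches(left, text) and matches(right, text)
    if op == "OR":
        return matches(left, text) or matches(right, text)
    if op == "NOT":
        return matches(left, text) and not matches(right, text)
    raise ValueError(f"unknown operator: {op}")


# ((Indian OR "Native American") AND Ohio) NOT baseball
search_expression = (
    (("Indian", "OR", "Native American"), "AND", "Ohio"),
    "NOT",
    "baseball",
)

print(matches(search_expression, "Native American mounds in Ohio"))     # True
print(matches(search_expression, "Cleveland Indians baseball in Ohio")) # False
print(matches(search_expression, "Indian cooking"))                     # False
```

The nesting of the tuples mirrors the parentheses in the search expression, so the innermost concept is resolved first.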
To make things easier for casual users, Internet search engines have developed
alternatives to traditional Boolean Logic. One of the most common conventions is the use
of pluses (+) and minuses (-), to indicate which terms must (+) and must not (-) be
present in the returned documents. Each search engine has developed its own version of
these searching conventions, each trying to improve upon these standards, and this
evolution of the search language continues. None is perfect, and you will find that locating
information on the Internet is more a process than the click of a button.
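The plus and minus convention can be sketched in the same spirit. Since each search engine defines its own rules, the code below is only one possible reading of the convention: the functions parse_plus_minus and page_matches are hypothetical, and treating unmarked terms as optional is an assumption made for this example.

```python
def parse_plus_minus(query):
    """Split a query into required (+), excluded (-), and unmarked terms."""
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term.startswith("+") and len(term) > 1:
            required.append(term[1:])
        elif term.startswith("-") and len(term) > 1:
            excluded.append(term[1:])
        else:
            optional.append(term)
    return required, excluded, optional


def page_matches(query, text):
    """True if the page has every + term and none of the - terms."""
    required, excluded, _optional = parse_plus_minus(query)
    words = text.lower().split()
    has_required = all(term in words for term in required)
    has_excluded = any(term in words for term in excluded)
    return has_required and not has_excluded


print(parse_plus_minus("+ohio -baseball indian"))
# (['ohio'], ['baseball'], ['indian'])
print(page_matches("+ohio -baseball indian", "indian mounds in ohio"))  # True
print(page_matches("+ohio -baseball indian", "ohio baseball scores"))   # False
```

Here the plus sign plays the role of AND and the minus sign plays the role of NOT.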