
The ECN No Name Newsletter is no longer being published. This is an archived issue.
There are many different search engines to help manage the tangle of information we call the World Wide Web. Certain engines concentrate on special topics, other engines are collections of lists, and still other engines are encyclopedic in nature. Two specialized collections that I use regularly are CDNow, a search engine that handles most musical recordings in print, and Switchboard, a search engine that handles addresses and phone numbers of people in the United States.
Encyclopedic search engines try to catalog as much of the web as possible. The content and structure of the information provided differ based on search method and selection of retrieved data. To ease our discussion, I will divide search engines into two different types, calling them "List" and "Index."
You can call the attention of both styles of engine to a specific page. A mini list of submission pages is shown above. AltaVista is my favorite of the list style and is the search engine the ECN uses to search itself. Of the index style engines, I prefer Yahoo. The submission process differs between these two search engines. AltaVista's database is added to automatically, so the process for adding new URLs is easy. Pages are added by hand into Yahoo's hierarchy, so you have to figure out which subject area best suits your page.
There are ways you can keep your pages secret, if that is what you want, but a
more common problem is that nobody can find your pages when you want them
public. A big problem comes from the nature of spiders. Spiders follow links,
but if no page links to yours, a spider cannot locate your site.
Unfortunately, not everyone thinks
to take the time to tell the spiders to come and search.
The main undergraduate machine for PUCC, expert, has a page at
Another difficulty search engines encounter is with image maps.
In essence, what a spider sees when
searching your page is what you can see using Lynx. When a page relies on server-side image
maps, it is nearly impossible for a spider to search beyond the map.
The point here is that you should be sure,
either through a no-image version or
through client-side image maps, that the pages you want located can be found.
It also makes it much easier for people who,
for lack of fast computers, fast modems or PPP/SLIP connections,
insist on using Lynx, or other text-only browsers, to browse the Internet.
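To make the image map point concrete, here is a minimal Python sketch of what a link-following spider might extract from a page. The page fragments, file names, and the LinkCollector class are all hypothetical, invented for illustration; real spiders are certainly more elaborate than this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the link targets a text-only spider could follow."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Anchors and client-side image map areas both carry an href.
        if tag in ("a", "area"):
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def visible_links(html):
    """Return every href found in the given HTML, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical fragments: the same navigation bar built two ways.
SERVER_SIDE = '<a href="/cgi-bin/imagemap/nav"><img src="nav.gif" ismap></a>'
CLIENT_SIDE = (
    '<img src="nav.gif" usemap="#nav">'
    '<map name="nav">'
    '<area shape="rect" coords="0,0,50,20" href="courses.html">'
    '<area shape="rect" coords="50,0,100,20" href="staff.html">'
    '</map>'
)
```

Calling visible_links(SERVER_SIDE) yields only the address of the map program, since the real destinations depend on click coordinates the spider never supplies. Calling visible_links(CLIENT_SIDE) yields courses.html and staff.html, because the destinations are written into the page itself.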
Submit-It is a site created to make it easy for you
to introduce your pages to search engines.
It collects a bit of information from you and transmits this information
to many of the Internet's search engines.
Boolean Operators
Searching for a long character string, rather than single words, will also help
narrow your search. For example, if you were interested in the insulating
effect of cheese and crust, you could type +thermodynamics +pizza and find
nearly 200 pages with both words. By contrast, you could instead
type "thermodynamics of pizza" and find nine pages.
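The difference between the two requests can be sketched in Python. This is only an illustration of the idea, not how AltaVista is implemented; the function names and the two sample sentences are my own assumptions.

```python
import re


def words(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_all_words(text, required):
    """True if every required word appears somewhere in the text."""
    present = set(words(text))
    return all(w.lower() in present for w in required)


def has_phrase(text, phrase):
    """True if the words of the phrase appear consecutively, in order."""
    page = words(text)
    target = words(phrase)
    n = len(target)
    return any(page[i:i + n] == target for i in range(len(page) - n + 1))


# Hypothetical page texts, invented for illustration.
PAGE_A = "Notes on the thermodynamics of pizza: why cheese insulates the crust."
PAGE_B = "My thermodynamics homework was interrupted by a pizza delivery."
```

Both sample pages satisfy has_all_words with thermodynamics and pizza, but only PAGE_A satisfies has_phrase with "thermodynamics of pizza". That is why the quoted request returns far fewer pages.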
+Link:yourmachine.ecn.purdue.edu/~you/yourpage -Host:yourmachine.ecn.purdue.edu
<FORM method=GET action="/cgi-bin/query">
Since you are initiating the action
from your machine and not AltaVista's,
you have to specify that it is
AltaVista's cgi-bin you are calling by giving the full URL:
<FORM method=GET
action="http://www.altavista.digital.com/cgi-bin/query">
Below is code that will allow you
to search Yahoo and AltaVista from your own pages.
AltaVista (search form)
ECN Search Example
Take a "learning" look at the support code generating
the ECN FAQ search.
Only a minor "tweak" will be required to customize
this code for use on your pages.
Please send me email if you have questions.
Hopefully, the suggestions offered in this article will help you perform more
exacting searches. By implementing various combinations of the above operators
and tags, you can narrow your searches to return a manageable number of hits,
rather than the thousands resulting from a wider search.
Additionally, once
you've mastered both list and index search engines, you will have a better
notion of the information you must provide to get the word out on your page.
Locating Information
With millions of web pages available via the Internet,
locating the correct information is a fairly common problem.
Say you're looking up the subject "pizza" on
your favorite search engine, and you end up with 9,000 pages to examine.
How do you know which are the fun ones, which are the useful ones,
and which ones have just a passing reference? The most effective methods involve narrowing
the scope of your search.
There are many ways to narrow your search request
to limit or expand your results.
First, there are Boolean operators. If there's more than one word in
your request, the search engine will return pages with any one of the words.
For example, if you ask for
The search engine may return a page about
any combination of the four words.
A page about cheese
with no references to pepperoni or anchovies
may be located.
This instruction requires that the page mention
pizza; it may or may not mention any of the toppings.
This command will locate pages about pizza with or without cheese and pepperoni
but will remove from the listing any page that contains
the word anchovies.
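The three behaviors described above can be sketched in Python. This is a rough model under my own assumptions, not AltaVista's actual matching code: a leading + marks a required word, a leading - marks an excluded word, and plain words are optional. The query strings in the usage note are illustrative, built from the four words pizza, cheese, pepperoni, and anchovies.

```python
import re


def parse_query(query):
    """Split a query into required (+), excluded (-), and optional words."""
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term.startswith("+") and len(term) > 1:
            required.append(term[1:])
        elif term.startswith("-") and len(term) > 1:
            excluded.append(term[1:])
        else:
            optional.append(term)
    return required, excluded, optional


def matches(page_text, query):
    """Decide whether a page would be returned for the query."""
    required, excluded, optional = parse_query(query)
    present = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    # Any excluded word removes the page from the listing.
    if any(term in present for term in excluded):
        return False
    # Every required word must be present.
    if not all(term in present for term in required):
        return False
    # With no required words, at least one optional word must appear.
    return bool(required) or any(term in present for term in optional)
```

Under this model, a page that only mentions cheese matches "pizza cheese pepperoni anchovies" but not "+pizza cheese pepperoni anchovies", and a page about pizza with anchovies is dropped by "+pizza cheese pepperoni -anchovies".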
Tag Limiters
Specific to AltaVista simple search, there are other ways to limit your search.
The
Host:
tag can be used to specify that the search include or dismiss certain host machines.
For instance, if you know the page you are seeking is a Purdue Engineering page,
Host:ecn.purdue.edu
will ensure all returned pages are only from the Engineering network.
This can be useful, especially if you're trying to
configure SLIrP and want site-specific information for the ECN.
You could also force the return of only
government, military, commercial or educational pages.
You can also use the Host tag
to edit out servers.
Two examples of server reduction are
-Host:jp ,
which eliminates all pages from Japan, and
-Host:aol.com ,
which removes all America Online pages from consideration.
Using
Title:foo
would find all pages with "foo" in their
Title line.
Anchor:click-here
would find pages where you click
text labeled "click-here" to go on to another page.
You can also specify the URL, by typing
URL:page_URL_you_know.html ,
which will return all the pages on different servers with the URL of
page_URL_you_know.html.
Use
Applet:
and
Image:
to find (or avoid) images and Java applets.
In addition, you can search for people linking to your page,
by using
+Link:
and
-Host:
tags in tandem.
This combination (shown below)
will result in output telling you who, if anyone, is linking to
your pages, without listing your internal links to that page.
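As a rough Python sketch of what the +Link: and -Host: pairing accomplishes, the code below builds the query string and applies the same filter to a small, made-up table of pages. The function names, the sample URLs, and the assumption that Host: matches on the end of a host name are all mine, not documented AltaVista behavior.

```python
from urllib.parse import urlparse


def backlink_query(page, own_host):
    """Build the query that asks who links to a page from other hosts."""
    return f"+Link:{page} -Host:{own_host}"


def host_matches(host, suffix):
    """Assumed rule: the host equals the suffix or ends with '.' + suffix."""
    return host == suffix or host.endswith("." + suffix)


def external_backlinks(pages, target, own_host):
    """pages maps each page URL to the list of link targets found on it."""
    found = []
    for url, links in pages.items():
        host = urlparse(url).hostname or ""
        links_to_target = any(target in link for link in links)
        if links_to_target and not host_matches(host, own_host):
            found.append(url)
    return sorted(found)


# Hypothetical crawl data, invented for illustration.
TARGET = "yourmachine.ecn.purdue.edu/~you/yourpage"
PAGES = {
    "http://yourmachine.ecn.purdue.edu/~you/index.html": ["http://" + TARGET],
    "http://www.example.com/links.html": ["http://" + TARGET],
    "http://www.example.org/other.html": ["http://www.example.org/"],
}
```

With this sample data, external_backlinks(PAGES, TARGET, "yourmachine.ecn.purdue.edu") returns only the example.com page: your own index page is dropped by the host test, and the example.org page never links to the target.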
Searching From Your Page
One of the guiding principles of the World Wide Web is that since you can
see a page's code by clicking on "view source,"
you can copy code from other sites.
This is the most commonly used method of learning HTML I've seen.
Taking advantage of this "learning" method
gives you an easy way to start searches from one of your pages,
rather than having to start from Yahoo's
or AltaVista's main page.
If you copy their code,
often the only alteration you need to make to the code
is to add the full site URL to the action attribute of the
FORM
tag.
For example,
AltaVista's form has this as the first line:
<FORM method=GET action="http://www.altavista.digital.com/cgi-bin/query">
<INPUT TYPE=hidden NAME=pg VALUE=q>
<B>Search <SELECT NAME=what>
<OPTION VALUE=web SELECTED>the Web
<OPTION VALUE=news >Usenet</SELECT>
and Display the Results <SELECT NAME=fmt>
<OPTION VALUE="." SELECTED>in Standard Form
<OPTION VALUE=c >in Compact Form
<OPTION VALUE=d >in Detailed Form</SELECT>
</B><BR>
<INPUT NAME=q size=55 maxlength=200 VALUE="">
<INPUT TYPE=submit VALUE="Submit">
</FORM>
Yahoo (search form)
<form action="http://search.yahoo.com/bin/search">
<input size=25 name=p>
<input type=submit value=Search>
</form>
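Because both forms submit with GET, pressing the button simply requests a URL with the field values appended. The Python sketch below builds those URLs from the field names in the forms above. The function names are my own, the code only assembles strings and makes no network request, and the addresses are the ones given in the forms, which may no longer respond.

```python
from urllib.parse import urlencode

# Action URLs copied from the two forms.
ALTAVISTA = "http://www.altavista.digital.com/cgi-bin/query"
YAHOO = "http://search.yahoo.com/bin/search"


def altavista_url(query, what="web", fmt="."):
    """Build the URL the AltaVista form would request.

    what: "web" or "news"; fmt: ".", "c", or "d", as in the form's menus.
    """
    # Fields are listed in the order they appear in the form.
    params = [("pg", "q"), ("what", what), ("fmt", fmt), ("q", query)]
    return ALTAVISTA + "?" + urlencode(params)


def yahoo_url(query):
    """Build the URL the Yahoo form would request."""
    return YAHOO + "?" + urlencode([("p", query)])
```

For example, altavista_url("+thermodynamics +pizza") encodes each plus sign as %2B and each space as +, which is what a browser does when it submits the form. The same approach lets you write an ordinary link on your page that runs a fixed search.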
The Engineering web provides an AltaVista search interface
on many of its central pages.
These interfaces provide an easy option to
limit your search to the ECN or, in some cases,
to a specified site.
webmaster@ecn.purdue.edu
Last modified: Saturday, 02-Oct-99 12:31:38 EST