ECN No Name Newsletter: February, 1997

The ECN No Name Newsletter is no longer being published. This is an archived issue.


Search Engines: Using Their Services Effectively

Dave Jacoby
jacoby@purdue.edu

Ensuring Your Material Is Found

You've come up with a novel idea, say a proof of Fermat's Last Theorem. To share it with the world, you format it in HTML and put it on your home page. You spice it up with graphics and links to math departments around the world. But when the accolades you expect fail to roll in, you begin to suspect nobody's seeing your work. Why? Because nobody knows about your site. To raise your exposure, you need to master search engines.

There are many different search engines to help manage the tangle of information we call the World Wide Web. Some engines concentrate on special topics, others are collections of lists, and still others are encyclopedic in nature. Two specialized collections I use regularly are CDNow, a search engine that covers most musical recordings in print, and Switchboard, a search engine for the addresses and phone numbers of people in the United States.

Encyclopedic search engines try to catalog as much of the Web as possible. The content and structure of the information they return differ with each engine's search method and with how it selects the data it retrieves. To ease our discussion, I will divide search engines into two types, calling them "List" and "Index."

Search Engine   Submission Page
AltaVista       http://www.altavista.digital.com/cgi-bin/query?pg=tmpl&v=addurl.html
Excite          http://www.excite.com/info/add_url/
HotBot          http://www.hotbot.com/addurl.html
Infohiway       http://www.infohiway.com/docs/eo.html
Infoseek        http://www.infoseek.com/AddUrl?pg=DCaddurl.html
Lycos           http://www.lycos.com/addasite.html
Magellan        http://www.mckinley.com/feature.cgi?add_bd
WWW Worm        http://wwww.cs.colorado.edu/home/mcbryan/WWWWadd.html
Yahoo           http://add.yahoo.com/fast/add?
Submit It       http://www.submit-it.com/

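"List" Style Search Engines

List-style engines canvass the Web without requiring submissions from users. They use three main software components: the spider, the database, and the search element. The spider follows links from page to page, the database stores what the spider collects, and the search element matches your request against the database.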

"Index" Style Search Engines

Indexes like Yahoo and Excite rely on people to submit URLs, which are then placed into subject hierarchies. You can search them either by browsing the hierarchy or by typing a request, as with list-style engines. However, instead of returning a flat list of links to outside pages, the engine returns links to places in the hierarchy. For example, searching an index-style engine for "indiana university" returns a list of all the colleges and universities in Indiana, rather than 200,000 pages containing both words.

You can bring a specific page to the attention of both styles of engine. A short list of submission pages appears above. AltaVista is my favorite list-style engine and is the search engine the ECN uses to search itself. Of the index-style engines, I prefer Yahoo. The two handle submissions differently. AltaVista adds submitted URLs to its database automatically, so submitting a page is easy. Pages are added to Yahoo's hierarchy by hand, so you have to figure out which subject area best suits your page.

There are ways to keep your pages secret, if that is what you want, but the more common problem is that nobody can find your pages when you want them public. Much of this comes from the way spiders work: they find pages by following links, so if nothing links to your pages, a spider will never reach them. Unfortunately, not everyone thinks to tell the spiders where to look. The main undergraduate machine for PUCC, expert, has a page listing all the students on expert who have web pages. However, there is no central directory of pages on Schools of Engineering machines, and if you're running your own httpd server, you're even harder to find. The solution is to submit your pages to a search engine yourself.
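A spider reaches only pages that something links to. One low-cost safeguard is a single page that links to everything you want found. Once that page is submitted, or linked from a page a spider already knows, the spider has a path to the rest. Below is a minimal sketch. The file names and titles are hypothetical placeholders.

```html
<HTML>
<HEAD>
<TITLE>My Pages</TITLE>
</HEAD>
<BODY>
<H1>My Pages</H1>
<!-- One plain text link per page you want spiders (and readers) to reach.
     proof.html, links.html and about.html are placeholders; use your own. -->
<UL>
<LI><A HREF="proof.html">A Proof of Fermat's Last Theorem</A>
<LI><A HREF="links.html">Links to Math Departments</A>
<LI><A HREF="about.html">About Me</A>
</UL>
</BODY>
</HTML>
```

Submitting this one URL should then cover everything it links to, provided the engine's spider follows those links.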

Image maps pose another difficulty for search engines. In essence, what a spider sees when it reads your page is what you would see using Lynx. With a server-side image map, the link destinations are looked up on the server from the coordinates of a mouse click, so a spider has almost no way to follow links beyond the map. Make sure, either through a no-image version of your pages or through client-side image maps, that the pages you want located can be found. Doing so also makes browsing much easier for people who, for lack of fast computers, fast modems or PPP/SLIP connections, rely on Lynx or other text-only browsers.
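A client-side image map keeps its link targets in the page's own HTML, as ordinary HREF values on AREA elements, rather than on the server. The sketch below assumes a 300-by-40-pixel navigation image named navbar.gif. The image name and the page names are hypothetical. The row of plain text links underneath covers any spider or browser that does not read MAP elements.

```html
<!-- Client-side image map: the link targets are in this page, not on the server -->
<IMG SRC="navbar.gif" WIDTH=300 HEIGHT=40 ALT="Site navigation" USEMAP="#navbar">
<MAP NAME="navbar">
<!-- RECT coordinates are left,top,right,bottom in pixels -->
<AREA SHAPE=RECT COORDS="0,0,99,39" HREF="proof.html" ALT="The Proof">
<AREA SHAPE=RECT COORDS="100,0,199,39" HREF="links.html" ALT="Math Links">
<AREA SHAPE=RECT COORDS="200,0,299,39" HREF="about.html" ALT="About Me">
</MAP>
<!-- No-image fallback: plain links any spider or text browser can follow -->
<P>[<A HREF="proof.html">The Proof</A>]
[<A HREF="links.html">Math Links</A>]
[<A HREF="about.html">About Me</A>]</P>
```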

Submit-It is a site created to make it easy for you to introduce your pages to search engines. It collects a bit of information from you and transmits this information to many of the Internet's search engines.

Locating Information

With millions of web pages available via the Internet, locating the correct information is a fairly common problem. Say you're looking up the subject "pizza" on your favorite search engine, and you end up with 9,000 pages to examine. How do you know which are the fun ones, which are the useful ones, and which ones have just a passing reference? The most effective methods involve narrowing the scope of your search.

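Boolean Operators

There are many ways to refine a search request to limit or expand your results. First, there are Boolean operators. If there's more than one word in your request, the search engine will return pages containing any one of the words. For example, if you ask for thermodynamics pizza, you will get pages that mention either word. In AltaVista's simple search, a plus sign in front of a word (+pizza) requires that word to appear on every page returned, and a minus sign (-pizza) excludes any page that contains it.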

Searching for an exact phrase in double quotation marks, rather than for single words, will also narrow your search. For example, if you were interested in the insulating effect of cheese and crust, you could type +thermodynamics +pizza and find nearly 200 pages containing both words. Typing "thermodynamics of pizza" instead finds just nine pages.
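Operators and quoted phrases can also be built into ordinary links, so a page can offer ready-made searches. A GET form such as the AltaVista search form later in this article sends its fields as a query string after a question mark. The sketch below assumes AltaVista accepts the same pg, what, fmt and q fields typed directly into a URL, and the 1997 address no longer works. In the q value, a plus sign stands for a space, %2B for a literal + and %22 for a double quote.

```html
<!-- Canned AltaVista searches using the fields from AltaVista's search form.
     Inside an HREF, each & separating the fields is written as &amp;. -->
<UL>
<LI><A HREF="http://www.altavista.digital.com/cgi-bin/query?pg=q&amp;what=web&amp;fmt=.&amp;q=%2Bthermodynamics+%2Bpizza">+thermodynamics +pizza</A> (pages with both words)
<LI><A HREF="http://www.altavista.digital.com/cgi-bin/query?pg=q&amp;what=web&amp;fmt=.&amp;q=%22thermodynamics+of+pizza%22">"thermodynamics of pizza"</A> (the exact phrase)
</UL>
```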

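Tag Limiters

AltaVista's simple search also accepts tags, such as link: and host:, that limit your search further. The link: tag matches pages that contain a link to the URL you give, and host: matches pages on the machine you name. The following query finds pages that link to your page, leaving out the pages on your own machine: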

+Link:yourmachine.ecn.purdue.edu/~you/yourpage -Host:yourmachine.ecn.purdue.edu
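The query above can sit behind a link on your own page, so one click shows who links to you. This is a sketch only. It assumes AltaVista accepts its search form's pg, what, fmt and q fields typed directly into a URL, and the 1997 address no longer works. yourmachine, you and yourpage are placeholders, as in the query itself.

```html
<!-- Hypothetical "Who links here?" link. The q value is the query above,
     URL-encoded: %2B = "+", %3A = ":", %2F = "/", %7E = "~", and a plain
     "+" between the two terms stands for a space. -->
<A HREF="http://www.altavista.digital.com/cgi-bin/query?pg=q&amp;what=web&amp;fmt=.&amp;q=%2BLink%3Ayourmachine.ecn.purdue.edu%2F%7Eyou%2Fyourpage+-Host%3Ayourmachine.ecn.purdue.edu">Who links to this page?</A>
```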

Searching From Your Page

One of the guiding principles of the World Wide Web is that since you can see any page's code by choosing "View Source," you can copy code from other sites. This is the most common way of learning HTML I've seen. You can use the same approach to start searches from one of your own pages, rather than from Yahoo's or AltaVista's main page. If you copy a site's search form, often the only change you need to make is to put the full URL of the site's search program in the FORM tag's action attribute. For example, AltaVista's form has this as its first line:

<FORM method=GET action="/cgi-bin/query">

Because the form will sit on your machine rather than AltaVista's, a relative path like /cgi-bin/query would send the request to your own server. To call AltaVista's cgi-bin instead, give the full URL:

<FORM method=GET action="http://www.altavista.digital.com/cgi-bin/query">

Below is code that will allow you to search Yahoo and AltaVista from your own pages.

AltaVista (search form)

<FORM method=GET action="http://www.altavista.digital.com/cgi-bin/query">
<!-- Hidden field copied unchanged from AltaVista's own form -->
<INPUT TYPE=hidden NAME=pg VALUE=q>
<!-- what: search the Web or Usenet -->
<B>Search <SELECT NAME=what>
<OPTION VALUE=web SELECTED>the Web
<OPTION VALUE=news >Usenet</SELECT>
<!-- fmt: how the results are displayed -->
and Display the Results <SELECT NAME=fmt>
<OPTION VALUE="." SELECTED>in Standard Form
<OPTION VALUE=c >in Compact Form
<OPTION VALUE=d >in Detailed Form</SELECT>
</B><BR>
<!-- q: the search request, including any + or - operators and quoted phrases -->
<INPUT NAME=q size=55 maxlength=200 VALUE="">
<INPUT TYPE=submit VALUE="Submit">
</FORM>
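
Yahoo (search form)

<!-- No method attribute, so the browser submits the form with GET by default -->
<form action="http://search.yahoo.com/bin/search">
<!-- p: the search words -->
<input size=25 name=p>
<input type=submit value=Search>
</form>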

ECN Search Example

The Engineering web provides an AltaVista search interface on many of its central pages. These forms make it easy to limit your search to the ECN or, in some cases, to a specific site.

Take a "learning" look at the support code generating the ECN FAQ search.

Only a minor "tweak" will be required to customize this code for use on your pages. Please send me email if you have questions.

Hopefully, the suggestions in this article will help you perform more precise searches. By combining the operators and tags described above, you can narrow your searches to return a manageable number of hits rather than the thousands a broader search turns up. And once you've mastered both list and index search engines, you will have a better idea of the information you must provide to get the word out about your page.


webmaster@ecn.purdue.edu
Last modified: Saturday, 02-Oct-99 12:31:38 EST
