
10
You can communicate information about your site to search engines and see your site from their perspective using some free services and utilities from Yahoo! and Google.

Free Search Engine Tools and Services
Yahoo! and Google provide some free and exceptionally useful tools and services to help you communicate the structure of your site, and get a clearer understanding of how well it's being indexed. These tools also provide valuable insight about inbound links to your site, keywords generating search referrals, and various other data that can help you assess the success of your findability efforts.

Before evaluating statistics you'll need to make sure search engines are indexing all of the content on your site. You can make it easier for search engine spiders to crawl your site by drawing them a map.
Building and Submitting sitemap.xml
Historically, the communication between webmasters and search engines has been very limited. In the past, once you'd built your site you would submit the home page URL to all of the major search engines to let them know you'd like their spiders to begin indexing your content. With nothing but the home page URL, spiders can potentially overlook some pages in your site, especially those that may only be accessible via your search system. A few years ago Google recognized that this problem could adversely affect the comprehensiveness of its search index, and created a simple solution called sitemap.xml.

In June 2005 Google introduced a standardized XML sitemap protocol that allows webmasters to communicate the structure of their site to search engines for more accurate indexing (http://sitemaps.org). Today, because the sitemap.xml protocol is supported by Yahoo!, Ask, and MSN Live Search as well, the same XML file can let all major search engines know which pages they should index in your site.

The sitemap.xml protocol also lets webmasters include information about each page, including the date it was updated, the frequency of change, and how important it is in the site. This type of additional information can help search engines crawl your site more intelligently.
The structure of a sitemap.xml file is relatively simple. Here's an abbreviated example that illustrates the tags common to the protocol:
<?xml version="1.0" encoding="UTF-8"?>
<urlset
  xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
  http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>http://aarronwalter.com/</loc>
    <priority>1.0</priority>
    <changefreq>daily</changefreq>
  </url>
  <url>
    <loc>http://aarronwalter.com/about/</loc>
    <priority>0.5</priority>
    <changefreq>monthly</changefreq>
  </url>
</urlset>
Although this example provides URLs for just two pages on my website, the sitemap.xml protocol can describe sites of any scale. The file begins with the XML prologue and a definition of the schema being used. That's not the important part, though. Notice there are two open and close <url> tags, each containing different information. Each <url> tag defines a different page's location, priority, and change frequency. Because the home page is the most important page in the site it has a higher priority value than the interior page in this example.

You can optionally define the priority and change frequency of your pages from within each <url> tag. The <priority> tag contains a floating point number from 0.0 to 1.0, where 1.0 is the highest priority. It lets search engines know which pages you deem most important so spiders can prioritize as they index your site.
The <changefreq> tag provides search engines general information about how often your pages will change, but don't take this too seriously as it may not correlate to how often your page gets crawled. You'll find a list of possible values for the <changefreq> tag and further information about sitemap.xml tags at http://www.sitemaps.org/protocol.php.
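Because the structure is so regular, a sitemap.xml file is also easy to produce programmatically. Here's a minimal sketch in Python using only the standard library; the URLs, priorities, and change frequencies are placeholder values, not data from any real site:

```python
# Minimal sketch: build a sitemap.xml document with Python's standard
# library. The page data below is made up for illustration.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """pages: list of (loc, priority, changefreq) tuples."""
    # Register the sitemap namespace as the default so tags
    # serialize without a prefix, matching the protocol examples.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for loc, priority, changefreq in pages:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = loc
        ET.SubElement(url, "{%s}priority" % SITEMAP_NS).text = priority
        ET.SubElement(url, "{%s}changefreq" % SITEMAP_NS).text = changefreq
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap([
    ("http://example.com/", "1.0", "daily"),
    ("http://example.com/about/", "0.5", "monthly"),
])
```

In a real project you would write the returned string to a sitemap.xml file in your Web root, prefixed with the XML prologue.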
This file is quite short and relatively simple, so it would be easy to build by hand. But if your site had hundreds or thousands of pages it would be far too time consuming and tedious to write manually.

Luckily there are scores of desktop and Web applications that will crawl your site and create a sitemap.xml file automatically so you can spend more time in your hammock than writing repetitive XML documents. There are even scripts in a variety of languages that you can freely integrate into your sites, content management systems, or Web applications to automate the process even further.
You'll find an exhaustive list of options at http://code.google.com/sm_thirdparty.html. Some options are free while others may cost you a few bucks. If you're a Dreamweaver user you may want to try George Petrov's free Google Sitemap Generator extension available on DMXZone.com (http://www.dmxzone.com/showDetail.asp?TypeId=3&NewsId=10538). See FIGURE 10.1. Once you've downloaded and installed the extension you'll need to restart Dreamweaver and then define a site in the site manager so the extension can crawl all of the HTML pages in your site. To define a site simply choose Site, select New Site, then enter the information requested.
Once your site is defined, select Commands, choose Create Google Sitemap, and the extension will work its magic. The final sitemap.xml file will be placed in the root folder of your site.

If you are using a server-side scripting language to create a page template system, you'll find that the extension will have trouble crawling your site's files. Instead, you will need to use a sitemap generation tool that crawls the live site in order to create the sitemap.xml file.
One such option is XML-Sitemaps.com (http://www.xml-sitemaps.com/). If your site is 500 pages or less, XML-Sitemaps.com will crawl your site and generate a sitemap.xml file for free. For sites that are larger you would need to purchase their PHP script, which you could integrate into your own projects. The script also generates sitemaps in RSS and HTML formats. HTML sitemaps are especially useful to users who've lost their way on your site, or just want a little help understanding the scope of your information. Any search engine that doesn't support the sitemap.xml protocol could use the HTML sitemap to navigate your site.

When the script runs, it automatically notifies Google of the file update so it can revisit your site to index recent content. The XML-Sitemaps.com PHP
FIGURE 10.1 George Petrov's Google Sitemap Generator is a free extension for Dreamweaver that greatly simplifies the creation of sitemap.xml files.
script will also track down broken links in your site so you can repair them. If you are looking for an equally feature-rich PHP script to integrate into your projects for free, check out phpSitemapNG (http://enarion.net/google/). You can schedule a regular refresh of your sitemap.xml file with phpSitemapNG using cron, a Unix and Linux operating system utility that schedules tasks to run automatically. If you're on a Windows server, use the built-in Task Scheduler in the Control Panel to run the PHP script instead.
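For example, a crontab entry along these lines (the script path is hypothetical; substitute your own generator script) would regenerate the sitemap every Sunday at 3 a.m.:

```
# minute hour day-of-month month day-of-week command
0 3 * * 0 /usr/bin/php /var/www/tools/generate-sitemap.php
```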
There are also good desktop applications that can make short work of sitemap.xml file development. RAGE makes Google Sitemap Automator for the Mac (http://www.ragesw.com/products/googlesitemap.html) that will crawl your site, generate the sitemap.xml file, upload it to your server, and tell Google once it's posted (see FIGURE 10.2). Its intuitive interface makes the process very easy. Although you can use the demo version of Google Sitemap Automator to generate and upload your sitemap.xml file, you'll have to purchase a license to use the Google notification feature.
If you're new to cron, check out Aaron Brazell's helpful article at http://www.sitepoint.com/article/introducing-cron. Windows server users looking for an introduction to Task Scheduler can check out http://www.iopus.com/guides/winscheduler.htm.
iArchitect makes a great sitemap utility for Windows called Sitemap Generator (http://www.iarchitect.net/Products/Sitemap-Generator/). See FIGURE 10.3. In addition to creating, uploading, and notifying search engines of your sitemap, Sitemap Generator creates HTML sitemaps and provides scheduling options so you can further automate the building and updating of your sitemap.xml file.

If you're not using a sitemap generation program that will automatically upload your sitemap.xml file you'll need to do it yourself. Generally the file is placed in the Web root folder on a server, which is typically called public_html or www. Then you'll need to inform search engines of its location so they can read the sitemap and begin crawling your site.
FIGURE 10.2 RAGE's Google Sitemap Automator (http://www.ragesw.com/products/googlesitemap.html) quickly builds a sitemap.xml file, uploads it to your server, and notifies Google of its location.
Informing Search Engines About Your sitemap.xml File

There are three ways you can let search engines know about your sitemap.xml file once it's uploaded or updated:

- robots.txt
- Ping
- Manual submission

For the best results you'll want to use a combination of these three to ensure all search engines are aware of your sitemap.
robots.txt Chapter 3, "Server-Side Strategies," introduced the robots.txt protocol used to tell search engine spiders which files or directories on a server should be excluded from indexing. The file is placed in the Web root folder of your server, and is automatically read by all search engine spiders.

A robots.txt file can also be used to communicate the location of your sitemap.xml file to all search spiders that visit your site. Simply add the following to your robots.txt file, using the full URL of your sitemap, to define the location:

Sitemap: http://example.com/sitemap.xml
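A complete robots.txt file using this directive might look like the following sketch; the disallowed path is a made-up example, and per the sitemaps.org protocol the Sitemap line gives the full URL of the file:

```
User-agent: *
Disallow: /admin/

Sitemap: http://example.com/sitemap.xml
```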
The benefit of using robots.txt to communicate your sitemap's location is that any search engine that supports the protocol will automatically find the file when it visits your site. This approach does not send a message out to search
FIGURE 10.3 Sitemap Generator for Windows offers features beyond most of its competitors, including HTML sitemaps and scheduled updating of your sitemap.xml file.
engines inviting them to index your site. If you're launching a new site, it's a good idea to notify search engines directly as well using one of the next two methods.
Ping A ping is a short message from one computer to another. When you publish a new site and want to request a full indexing you can ping search engines individually to let them know where your sitemap.xml file is located. Google, Ask, and Yahoo! all offer ping notification services, which can be used by simply navigating to each in your browser. Replace the highlighted sample URL in these examples with the absolute path to your sitemap.xml file.

- Google: http://www.google.com/webmasters/sitemaps/ping?sitemap=http://example.com/sitemap.xml
- Ask: http://submissions.ask.com/ping?sitemap=http://example.com/sitemap.xml
- Yahoo!: http://search.yahooapis.com/SiteExplorerService/v1/ping?sitemap=http://example.com/sitemap.xml

MSN Live Search is conspicuously absent from this list. Although it supports the sitemap.xml protocol, at the time of writing it didn't offer a sitemap submission tool of any kind. The only way to let MSN Live Search know about your sitemap is via your robots.txt file.
These ping URLs are also very useful if you're building a content management system or Web application that needs to continually update search engines with new pages. Your application could automatically update your sitemap.xml file, then connect to these ping services for you.
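As a sketch of how that automation might look, this Python fragment builds the three ping URLs listed above, encoding the sitemap address so it survives as a query string parameter. The sitemap URL is a placeholder, and the actual HTTP request is left commented out so the example stays self-contained:

```python
# Sketch of automated sitemap pinging, as a CMS might do after
# rewriting its sitemap.xml file. The endpoints are the ping services
# listed above; the sitemap URL is a placeholder.
from urllib.parse import quote
# from urllib.request import urlopen  # uncomment to actually send pings

PING_ENDPOINTS = [
    "http://www.google.com/webmasters/sitemaps/ping?sitemap=",
    "http://submissions.ask.com/ping?sitemap=",
    "http://search.yahooapis.com/SiteExplorerService/v1/ping?sitemap=",
]

def build_ping_urls(sitemap_url):
    # URL-encode the sitemap address so it is safe as a query parameter
    return [endpoint + quote(sitemap_url, safe="") for endpoint in PING_ENDPOINTS]

for ping_url in build_ping_urls("http://example.com/sitemap.xml"):
    print(ping_url)
    # urlopen(ping_url)  # fire the notification
```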
Tip: To quickly ping all search engines with sitemap.xml ping services, create a bookmark folder in your browser, then add bookmarks for each ping service with your sitemap.xml URL trailing. Anytime you update your sitemap.xml file simply launch these bookmarks to instantly send word of your new content to search engines.
Manual Submission You can also let search engines know about your sitemap.xml file by visiting each one and manually submitting your sitemap URL. Unfortunately, only Google and Yahoo! currently provide manual sitemap.xml submission tools. For sitemap updates this is certainly the most tedious of the three options. If you're launching a new site, a manual submission is probably a good idea.
As we'll discover in the rest of this chapter, when you manually submit your sitemap to Google and Yahoo! they will parse your file and notify you of any errors they've encountered. They also provide information about the last time they read your sitemap so you'll know if your newest content is in their search index.

Note: If you need to generate a sitemap for a WordPress blog you'll want to read the section in Chapter 5 entitled "Automatically Generating an XML Sitemap."
Google Webmaster Central Services

Google provides an amazing array of free statistics, diagnostics, and management utilities in Webmaster Central (http://www.google.com/webmasters/). See FIGURE 10.4. From here you can submit your URL to Google, and check the status of your site to see if it has been indexed. But the utility of Webmaster Central goes far beyond URL submission.

The bulk of Google's free utilities in Webmaster Central can be found in Webmaster Tools (https://www.google.com/webmasters/tools/). Before you can use Google's Webmaster Tools, you'll need to create a Google account if
FIGURE 10.4 Google's Webmaster Central contains a wide variety of free and very useful utilities that provide statistics, diagnostics, and management features.
you don't already have one. You can manage sitemaps and view statistics for multiple sites from a single account, which is great if you have to keep tabs on many client websites.

Once you've logged in you'll be taken to the Dashboard, where you can add your site's URL in order to view data about it and manage some preferences. You'll be prompted to verify that you have the authority to manage the site in one of two ways. You can either add a special meta tag to your home page with a value unique to your site, or upload a special file Google provides to your server's Web root folder. When you've completed one of these tasks you simply click the verify button, and Google will visit your site immediately to confirm that the meta tag or file is present. Once you've proven you are the webmaster of the site, you can start using Google's tools.
Webmaster Tools

Google's Webmaster Tools are divided into five key sections:

- Diagnostics
- Statistics
- Links
- Sitemaps
- Tools

When you log in to Webmaster Tools and click on a URL you want to manage you'll be taken to an overview page that provides some quick information (see FIGURE 10.5).
FIGURE 10.5 The overview page in Google's Webmaster Tools provides quick access to statistics and information that is available within subsections.
Essential information like 404 errors, pages that were requested but timed out, HTTP errors, or pages blocked by your robots.txt file can be viewed at a glance on the overview page. If any errors or issues of concern were detected a link is provided to learn more about the problem.

Also on the overview page you'll find the date of the last index of your home page and an indication of whether your site's content is in Google's index. This quick view makes spotting serious problems easier. Let's examine some of the more detailed features of Google's Webmaster Tools.
Diagnostics

In the Diagnostics section you'll find further detail about trouble Google encountered while indexing your site (see FIGURE 10.6). There are six types of errors and issues that Google logs:

- HTTP errors: server configuration errors, forbidden directories, etc.
- Not found: 404 errors
- URLs not followed: any pages you may have indicated were not to be indexed
- URLs restricted by robots.txt
- URLs timed out: pages that were requested but couldn't be indexed because of a slow network connection, defective code, or some other reason
- Unreachable URLs: pages listed in your sitemap.xml file that were not reachable
FIGURE 10.6 You can pinpoint any problems Google encountered when crawling your website from the Diagnostics section of Webmaster Tools.
You'll notice in the Diagnostics section a subsection entitled Mobile Crawl. If your site is delivered in a format specific to mobile devices such as cHTML or WML, Google will still index your content. Look for issues indexing your mobile content here.
Statistics

Statistics is perhaps the most interesting of the Webmaster Tools (see FIGURE 10.7). It contains a series of subsections that show very interesting data about what keywords are sending people your way, what Google sees when it crawls your site, index stats, crawl stats, and stats on how many people are subscribing to your RSS feeds. It's a lot of information, all of which can really open your eyes to Google's perspective of your site.
FIGURE 10.7 The Statistics section of Google's Webmaster Tools provides plenty of detailed information about your site. You can learn what content Google perceives as important on your pages, what keywords or phrases are generating traffic to your site, and more.
Top Search Queries Top Search Queries is an especially interesting area to explore. Not only can you learn what keywords and phrases are generating traffic to your site, but you can also see what your ranking is for each query. Although it's interesting to discover in this data the keywords you've targeted in your site, it's even more interesting to discover those that you'd never have thought would generate any traffic. You're likely to find some pretty bizarre search phrases that are directing people to your site! Watch this area closely for cues on what content you should expand or enhance on your site.

You can isolate and view segments of the data by geographic location, or by search type including blog, image, mobile, or Web.
What Googlebot Sees In the What Googlebot Sees section you can view Google's ranking of keywords and phrases in the content of your site. If you don't see the keywords you originally targeted at the top of the list, then you'll need to revisit the keyword strategies outlined in Chapter 2, "Markup Strategies," and Chapter 4, "Creating Content That Drives Traffic," to try to improve their prominence in your pages.
Even more intriguing in this section is the listing of keywords other people have included in their links to your site. It's rumored that this is one of the most significant factors that Google weighs when attempting to understand the content of a site and assigning its proprietary PageRank. Unfortunately Google doesn't provide links here to the sites that have created inbound links to your site, but it does in the next section of the Webmaster Tools. We'll explore additional methods of obtaining information about inbound links later in this chapter.

The more correlation you can create between the top keywords in your site and the top keywords in inbound link labels, the more success you'll have achieving top rankings for these words. Controlling text in your site is easy, but controlling it on other people's sites is pretty tough. If you know the people running sites that link to yours, you can always make a friendly request that they rewrite their link label to include your target keywords.

Further down the page in this section you'll see a breakdown of the types of content that Google has crawled on your site, including HTML, XML, PDF, Flash, plain text, and other formats.
Crawl Stats The Crawl Stats subsection has less information than others, but the data is equally informative. Here you'll find the average Google PageRank of pages in your site, and your top-rated page for the past three months. Evaluate the page that ranks highest in your site to determine what qualities make it stand out over the others. Usually the quantity and quality of inbound links from reputable sites will determine which page ranks highest. As discussed in Chapter 4, valuable content will usually elicit plenty of inbound links. This is another important bit of information to monitor for ideas on what content is most valuable to your audience.
Index Stats The Index Stats section provides some links to run Google search queries using search operators that will reveal information about your site. You actually don't need to visit this section of Google's Webmaster Tools to view this data. You can simply search for your URL preceded by any of the following search operators in Google's search box to learn a little about how they've indexed your site:

- site:example.com (indexed pages in your site)
- allinurl:example.com (pages that refer to your site's URL)
- link:example.com (pages that link to your site)
- cache:example.com (the current cache of your site)
- info:example.com (information Google has about your site)
- related:example.com (pages that are similar to your site)
A word of warning: The link operator provides a pretty incomplete listing of inbound links to a site. Don't panic if you see a surprisingly short list of results when using this operator. The Links tool, which we'll take a look at shortly, provides a comprehensive list with very useful extended data. You can also use the allinurl operator to see a more comprehensive list of inbound links.
Subscriber Stats The information provided in the Subscriber Stats section is nice, but provides a very limited snapshot of the number of people who have subscribed to your site's RSS feeds. Google only tracks subscriptions within its own RSS aggregators. If users are subscribing to your feeds in Bloglines (http://bloglines.com), Netvibes (http://netvibes.com), or any other RSS aggregator besides those created by Google, you won't see this data reflected here.

Take this information with a big grain of salt. In Chapter 13, "Analyzing Your Traffic," we'll take a look at FeedBurner's (http://feedburner.com) subscription statistics, which provide a more complete snapshot of how many people are subscribing to and reading your feeds.
Links

The Links section provides really interesting data about what sites are linking to yours, and which pages in your site are garnering the most inbound links (see FIGURE 10.8).
FIGURE 10.8 Using the Links tool you can identify which pages in your site are receiving the most inbound links, and what sites are linking to them.
Pages that are receiving many inbound links are obviously providing your audience with the type of content they find interesting. Watch this information closely so you can determine which pages are popular, and worth expanding to generate even more traffic.

The Links section also lists all of the pages in your site that have internal links to other pages in your site. This is primarily useful to determine if you are cross-linking enough to help circulate traffic through the site. The more traffic circulation you can create, the longer your users will stay on your site, and perhaps complete business objectives, like making a purchase or signing up for the mailing list.
Sitemaps

Earlier in this chapter you learned how to create a sitemap.xml file and submit it to search engines. Providing search engines with a sitemap helps them more intelligently index your content.

In the Sitemaps section of Google's Webmaster Tools you can post the URL for your sitemap.xml file and monitor its status (see FIGURE 10.9). If you're launching a new site it's a good idea to post your sitemap here so you can observe any parsing problems that Google might encounter.
FIGURE 10.9 With the Sitemaps tool you can post the URL for your sitemap file and monitor its status. Google will let you know if it runs into trouble reading your file.
Besides letting you know if it encounters errors parsing your sitemap file, the Sitemaps tool also tells you when the file was last read, and how many URLs were included. You can also provide a separate sitemap.xml file if you have a mobile version of your site.
Tools

In the Tools section (see FIGURE 10.10), you can analyze your robots.txt file, manage the site verification process you selected, set the rate at which Google will crawl your site, define the preferred domain name format for your site, and remove certain URLs from Google's index.
FIGURE 10.10 The Tools section provides a host of useful preferences and utilities.
Analyze robots.txt You can identify parse errors in your robots.txt file using the Analyze robots.txt tool. If Google has been to your site and found a robots.txt file, you'll see its content on the page here. This tool also lets you enter URLs to pages on your site to test whether your robots.txt file will prevent Google from indexing them.
Manage Site Verification If for some reason you need to get another look at the meta tag or file name Google is using to verify that you are the owner of the site being managed, you'll find that information in the Manage Site Verification section. Unfortunately, there's no way to switch verification methods.
Set Crawl Rate You can keep tabs on the frequency and speed at which Google is crawling your site in the Set Crawl Rate section. Information such as the average number of pages on your site Google indexes per day, the average number of kilobytes downloaded per day, and the time it takes Google to load your pages can be found in this section.

You'll notice a dramatic drop in load times if you optimize your site's performance as outlined in Chapter 3, "Server-Side Strategies." FIGURE 10.11 shows a big change in the time it took Google to crawl my site before optimization and after.
If your site suddenly becomes extremely popular, and bandwidth is more of a concern than keeping Google's index current, you can throttle back the indexing frequency of your site in this section.
Set Preferred Domain In Chapter 3 you learned that when Google indexes sites, it sees URLs with and without the preceding www as entirely different sites. Because the URL http://www.mysite.com might have more inbound links to it than http://mysite.com, Google might assign it a higher PageRank even though these URLs go to the same site. This is called the Google canonical problem.

In the Set Preferred Domain section, you can tell Google to choose one URL format so it doesn't split your PageRank. Chapter 3 provides another solution to the problem that uses Apache's mod_rewrite module to remap all page requests to a single, consolidated format. It's not a bad idea to do both to cover your bases with Google and other search engines as well.
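The mod_rewrite approach can be sketched in an .htaccess file like this; example.com stands in for your own domain, and the 301 redirect consolidates all requests onto the www form:

```apache
RewriteEngine On
# Redirect any request for the bare domain to the www form
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```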
Enable Enhanced Image Search If you're OK with your images being discovered via Google's image search, you can choose to enable Enhanced Image Search. Images can be located via their file names or alt text, but Google has an even more ingenious approach that makes its image search more accurate.

Google Image Labeler (http://images.google.com/imagelabeler/) is an image identification project that presents volunteers with a random series of images gleaned from sites that have enabled the Enhanced Image Search option. As volunteers view each image, they provide descriptive meta data that Google's image search uses to generate exceptionally accurate results to queries. Because image search is hugely popular, enabling this option can generate a lot of traffic to your site.
Remove URLs If Google has indexed content on your site you'd rather it didn't, you can submit a request to have it removed in the Remove URLs
FIGURE 10.11 After optimizing my site, I saw an approximately 90 percent speed increase in Google's indexing of my content (shown before and after optimization).
section. Of course you can block indexing using robots.txt (see Chapter 3, "Server-Side Strategies") or the noindex meta tag (see Chapter 2, "Markup Strategies"). But neither method will immediately remove a page from Google's index. The content would only be removed from the index once Google returns to re-index your site. If you need something removed immediately, the Remove URLs section is the place to do it.
Getting Info About Your Site with Yahoo! Site Explorer

Site Explorer is a free set of tools that provide insight into how Yahoo! is indexing your site (https://siteexplorer.search.yahoo.com/), and who is linking to your pages. Site Explorer is also where you would submit a sitemap.xml file to Yahoo! when you launch a new website.
To use Site Explorer you'll need to create a Yahoo! account, if you don't already have one, then authenticate your site following a similar process as Google's Webmaster Tools. You can prove a site is yours by adding a special meta tag to your home page or by uploading a file Yahoo! provides to your server's Web root folder for authentication. Once you've completed one of these tasks you can prompt Yahoo! to verify that you are the owner of the site.

From the main control panel area called My Sites you can add URLs you'd like to explore, and keep tabs on your site's authentication status (see FIGURE 10.12). You can let Yahoo! know the location of your sitemap.xml file in the Feeds section of Site Explorer, which can be found by clicking the Manage button next to your URL.
FIGURE 10.12 You can manage and explore any number of sites in Yahoo!'s Site Explorer.
Reproduced with permission of Yahoo! Inc. © 2007 by Yahoo! Inc. YAHOO! and the YAHOO! logo are trademarks of Yahoo! Inc.
In the Feeds section, simply enter the path to your sitemap file on your server and Yahoo! will read it, then crawl your site (see FIGURE 10.13). Here you can also define a sitemap for a mobile version of your site, if you have one. The date and time Yahoo! last read and processed your sitemap file will be displayed here as well.
FIGURE 10.13 You can let Yahoo! know the location of your Web and mobile sitemap.xml files in the Feeds section of Site Explorer.
Reproduced with permission of Yahoo! Inc. © 2007 by Yahoo! Inc. YAHOO! and the YAHOO! logo are trademarks of Yahoo! Inc.
FIGURE 10.14 Site Explorer lists all pages in your site that Yahoo! has indexed. Here you can also view a complete listing of sites that link to yours.
Reproduced with permission of Yahoo! Inc. © 2007 by Yahoo! Inc. YAHOO! and the YAHOO! logo are trademarks of Yahoo! Inc.
The real heart of Site Explorer is the Explore tool, which you can access by returning to the My Sites page, then clicking the Explore button to the right of your URL. Site Explorer lists all of the pages that Yahoo! has indexed in your site (see FIGURE 10.14). This is especially useful when you first launch your site, as you can keep a close eye on what content has officially made it into the Yahoo! index. You can also remove any page from Yahoo!'s index by clicking the Delete URL/path button, which is visible when you hover over any page record.
You can view a comprehensive list of sites that link to your pages by clicking the link labeled Inlinks at the top of the page. This is important information, as the more links from reputable sources your site receives the higher your page rank will be. If you've asked friends, colleagues, and affiliates to create a link on their site to yours, you can watch this section of Site Explorer to see when Yahoo! has noticed the new inbound links.
Some sites try to dishonestly create inbound links by blogging about and linking to blog posts on high-ranking websites in order to create a trackback. As explained in Chapter 5, "Building a Findable WordPress Blog," a trackback is an automatically generated post excerpt that will be displayed under a blog post when another blog links to that post. The trackback will usually include a link to the site that generated it.
Some bloggers may write meaningless or unrelated posts with links to your site so they can build their inbound links and search rankings dishonestly. If you see inbound links to your site that look like spam when browsing Site Explorer, you can report them to Yahoo! by hovering over the record and clicking the Report Spam button.
There's a lot to be learned about your site from Site Explorer. It's a good idea to monitor the pages indexed and inbound links to your site on a regular basis to ensure your site is visible to your audience via Yahoo! search.
