Every website on theimagefile includes two files to guide search engines and other automated crawlers:
robots.txt tells search engines which files they're not allowed to index. We maintain this file. (It lists certain admin and shopping pages that should never be indexed.) If you never want any search engine to visit your site, set your website to PRIVATE (see below) to forbid bots completely.
Your XML sitemap lists all public web pages. Without a sitemap, search engines must "crawl" all the links on a website to find all pages. This takes a long time, and some content is missed, while other content is duplicated. Providing a sitemap allows search engines to instantly list everything that's available, including what's new and what's important.
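A sitemap is a plain XML file following the sitemaps.org protocol. As an illustrative sketch (the URLs below are placeholders, not entries from your actual sitemap), it looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per public page -->
  <url>
    <loc>https://www.example.com/</loc>
  </url>
  <url>
    <loc>https://www.example.com/portfolio</loc>
  </url>
</urlset>
```

Search engines read this file directly, so they learn every public page in one request instead of crawling link by link.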
At Control Panel -> My Web Site -> Naming and Access -> Sitemaps and SEO, you choose one of three settings:
BEST SEO — robots.txt allows search engines to visit, and automatically tells every bot about your XML sitemap. This is by far the easiest and best way to promote your website. It ensures that every search engine (Google, Yahoo, Bing, and all others) will know about all your pages, without you needing to set up complicated webmaster accounts with each.
LIMITED SEO — robots.txt still allows bots, but doesn't automatically tell them about your XML sitemap. Few websites will want this setting — but use it if publishing the sitemap is unacceptable for any reason.
PRIVATE — the robots.txt file forbids all bots and does not link to your sitemap. Every web page will also include a noindex meta tag, to further restrict indexing. Use this for private, members-only, or test websites that should never appear in search engines.
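To make the difference concrete, the first and last settings correspond roughly to robots.txt files like the two sketches below. These are illustrative only: the actual files we generate also block certain admin and shopping pages, and the exact contents may differ. The Sitemap: directive is the standard way a robots.txt file advertises a sitemap to every bot.

```
# BEST SEO: allow all bots and point them at the sitemap
User-agent: *
Disallow:
Sitemap: https://www.example.com/sitemap_auto.xml

# PRIVATE: forbid all bots from the entire site
User-agent: *
Disallow: /
```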
What's included in your XML sitemap
Your sitemap will include every page listed at Edit Web Pages, except pages that are soft-deleted, require a password, or have a robots = none|noindex meta tag. The pages that remain are your "public pages".
Events and Galleries collections will be included (with their password automatically embedded in the URL) if these criteria are met:
First, the collection must be linked from a "public" Client Area Page with auto-login.
Second, the collection must have "Allow Bots" ticked.
Finally, the collection's Ask For Email setting must not require an email to visit (which blocks all bots).
For any collections meeting these requirements, all sub-collections will also be included in the sitemap. Any top-level E+G collections meeting these requirements will be listed at the very top of your sitemap, so you can tell at a glance which collections are publicised.
Your sitemap will include all Stock Library collections if you tick "Stock Library?" at Control Panel -> My Web Site -> Naming and Access. (If you have multiple websites, you should only show your stock on one of them.)
From the Naming and Access page, you can click to view your sitemap. Although the XML is meant for computers to read, you'll be able to see all the links, and get a clear idea of what is included and what's not. Top-level E+G collections are listed first; then public web pages; followed by stock and sub-collections. If you see anything there that should not be public, you can remove it. For web pages, set the robots=none meta tag. For E+G collections, untick the "Allow Bots" setting.
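Setting the robots=none meta tag places a standard tag like this in the page's head (none is equivalent to noindex plus nofollow, so the page is neither indexed nor followed):

```html
<meta name="robots" content="none">
```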
The sitemap assigns a relative importance to each web page. The maximum "1.0" is used for your HOME page, and "0.8" for all other public pages. "0.3" is used for Stock Library resources, and "0.1" for Events and Galleries.
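In the sitemap XML, that importance appears as a standard &lt;priority&gt; element on each entry. For example (placeholder URLs, illustrating the values above):

```xml
<url>
  <loc>https://www.example.com/</loc>
  <priority>1.0</priority>  <!-- HOME page -->
</url>
<url>
  <loc>https://www.example.com/about</loc>
  <priority>0.8</priority>  <!-- all other public pages -->
</url>
```

Priority is a hint to search engines about which of your pages matter most relative to each other; it does not affect ranking against other websites.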
What sitemap URL to give out
If you need to provide your sitemap URL, always give out the version ending in /sitemap_auto.xml.
(If you choose "BEST SEO", we'll publish a copy of your XML sitemap at /sitemap.xml, but that copy will disappear if you change to a different setting. The XML sitemap at /sitemap_auto.xml is always present and is the one you should register.)