[Faccus] New campus web search: update

Pat Lafranier pllafranier at uwaterloo.ca
Thu Sep 27 09:36:21 EDT 2012


Good morning!

The Google Search Appliance (GSA) is now in the final testing stage, with positive feedback from campus IT staff.

The initial crawl of our sites produced over a million pages! Some of these pages were duplicates, so we have now adjusted the settings.

We are pleased with the results so far but need to continue working on one issue: refinement of the 'People' search. We expect a solution shortly.

In the meantime, please review the previous communication, especially the section "Do you have webpages you don't want searched?", because once the GSA goes 'live' the search results will be public to the world.

We look forward to replacing our current web search with the GSA on Tuesday, October 2nd. Comments/concerns can be forwarded as noted in the original announcement below.

Kris Olafson
WCMS Technical Lead
Information Systems and Technology


-- previous communication, August 2012 --

To improve the quality of web content searches on campus, the University is acquiring a Google Search Appliance<http://www.google.com/enterprise/search/products_gsa.html> (GSA).  The search appliance will be located on campus and will index the uWaterloo web space by crawling public-facing pages.  Once deployed, the search appliance will eventually replace the current search at search.uwaterloo.ca<http://search.uwaterloo.ca/>.

Deployment?
Early fall.  We will send further communication before the deployment date.

In preparation, you may have a few questions, such as...
* If our website has been migrated to the central WCMS, do we need to change anything?  No.
* We haven't migrated to the WCMS yet. Do we need to change our UW CLF Dreamweaver template and its files to use the new search?  No, the query will be redirected to the new search.

Do you have webpages you don't want searched?
Pages will not be indexed if they are password protected<https://cas.uwaterloo.ca/docs/> or have a suitable robots.txt file<http://en.wikipedia.org/wiki/Robots_exclusion_standard>.  If you have been deliberately hiding content by only allowing on-campus IP addresses to access your site, the search appliance may index your content and make it available to the outside world through page previews.  If you have such a site, it is recommended that you password protect it if possible, or add a robots.txt file that indicates it should not be crawled.  If this is not an option, please send email to request at uwaterloo.ca<mailto:request at uwaterloo.ca> so we can arrange to have your content excluded from crawling by some other means.
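
As a rough illustration of the robots.txt option, the sketch below uses Python's standard urllib.robotparser module to check what a given robots.txt would permit a well-behaved crawler to fetch. The robots.txt contents, the www.example.com host, and the helper name is_crawl_allowed are made up for this example, and the wildcard user agent is an assumption; how the search appliance itself is configured may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks all crawlers to stay out of the whole site.
BLOCK_WHOLE_SITE = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that hides only one directory.
BLOCK_ONE_DIRECTORY = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative URLs only; substitute pages from your own site.
for label, rules, url in [
    ("whole site blocked", BLOCK_WHOLE_SITE, "http://www.example.com/index.html"),
    ("directory blocked", BLOCK_ONE_DIRECTORY, "http://www.example.com/private/notes.html"),
    ("directory blocked", BLOCK_ONE_DIRECTORY, "http://www.example.com/index.html"),
]:
    print(label, url, "allowed" if is_crawl_allowed(rules, url) else "disallowed")
```

Keep in mind that robots.txt is only a request to crawlers and does not restrict access; content that must stay private is better protected by a password, as recommended above.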

Comments or concerns?
Email request at uwaterloo.ca<mailto:request at uwaterloo.ca> and someone from the WCMS team will be happy to chat with you.

Kris Olafson
WCMS Team Technical Lead
Information Systems and Technology

