As part of setting up Enterprise Search for a major IT company, I had to create content sources for a number of web sites hosted on the company’s intranet. Pretty easy, you say? Well, yes. But the problems arose when MOSS began crawling the websites.
There were more crawl errors in the crawl log than crawl successes. It seems that the web applications MOSS was crawling were not being maintained properly and there were loads of broken links. Naturally, I pointed this out to the administrators, and naturally they asked me for the list of broken links. Well, if there had been only a couple of errors I could easily have copied and pasted the offending URLs, but these sites had loads and I couldn’t find an “export log” feature. Drat!
Right, so the only option was to cook up a little C# WinForms application to export the crawl log. Easier said than done :) Other than MSDN, I couldn’t find anything online to get me started, so I had to use Reflector quite a bit to figure out the internal workings of the assemblies, but I managed to get something going.
I’ve uploaded the compiled application to ProjectDistributor and am looking to put it up on CodePlex soon.
As of now the log can only be exported to CSV format, and there is no threading, so don’t freak out when the app seems to hang. I’ll be refactoring the code and adding more export options once I get the time.
Get the application from here:
http://www.projectdistributor.net/Projects/Project.aspx?projectId=259