Our wiki guide has been around for a few years now and, of course, as time passes some of the information starts to go out of date. Today I created a script designed to help address this problem. Basically, it keeps track of all the outbound links in the guide (those pointing to other sites) and then checks them one by one to see whether each target page still exists.
We have about 36,000 links in our guide, so as you can imagine, this is not something a human would really want to do.
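As a rough illustration of that approach (collect the outbound links, then check each one), here is a minimal Python sketch. It is not the guide's actual script: the article-text input, the bare-URL regex, the `own_host` parameter and the rule "a HEAD response below HTTP 400 means the page still exists" are all assumptions. It checks each distinct URL only once, which matters at 36,000 links because many of them repeat across articles.

```python
import http.client
import re
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable

# Assumption: outbound links appear as bare http(s) URLs somewhere in the article text.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'\])|]+")


def extract_outbound_links(text: str, own_host: str) -> set[str]:
    """Return every http(s) URL in `text` whose host is not our own site."""
    links = set()
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?")  # drop sentence punctuation caught at the end
        try:
            host = urllib.parse.urlsplit(url).hostname or ""
        except ValueError:  # malformed URL, e.g. a broken IPv6 literal
            continue
        if host != own_host.lower():
            links.add(url)
    return links


def url_is_alive(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request; redirects are followed and any final status below 400 counts as alive."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # HTTPError (4xx/5xx), DNS failures and timeouts are all OSError subclasses.
        return False


def find_broken_links(
    articles: dict[str, str],
    own_host: str,
    is_alive: Callable[[str], bool] = url_is_alive,
) -> dict[str, set[str]]:
    """Check each distinct outbound URL once; map every broken URL to the articles containing it."""
    links_by_article = {
        title: extract_outbound_links(body, own_host) for title, body in articles.items()
    }
    unique_urls = set().union(*links_by_article.values())
    broken = {url for url in sorted(unique_urls) if not is_alive(url)}
    report: dict[str, set[str]] = {}
    for title, links in links_by_article.items():
        for url in links & broken:
            report.setdefault(url, set()).add(title)
    return report
```

Passing a fake `is_alive` makes the logic testable without network access. Some servers reject HEAD requests, so a real checker would probably want to retry with GET before flagging a link as broken.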
I'm logging all the broken links found and have created a page that lists them all. However, I'd still rather that these links be fixed by humans. There are several ways to do it: either by removing the link entirely (say, for an airline that no longer exists) or by replacing it with the correct link. Either way, it's not really something a script can do automatically.
So anyway, all that to say: if anyone is interested in helping with this task, we would greatly appreciate it.
The page to see all the broken links is here. You have to be logged in to access it.
Thanks in advance to any of you kind souls who fix even a handful of these!
Some way to indicate "fixed" would be greatly appreciated. The first link I wanted to fix (kinderdijk) had already been taken care of (though I can at least still redirect it to the English version of the site).
(Maybe just a column with "date of last edit" or something similar?)
[ Edit: Edited on 09-May-2013, at 04:46 by Sander ]
It might also be worth sending in requests for big batches to be changed directly in the database. I don't know how many pages link to http://www.indiapost.gov.in/Index.html, but it's a great many, and manually changing them all to http://www.indiapost.gov.in/ seems somewhat wasteful when writing the SQL query for that takes a few seconds.
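A batch rewrite like that could look roughly like the following sketch, shown with Python's built-in `sqlite3` so it runs standalone. The `articles(title, body)` table is hypothetical; the wiki's real schema and database engine aren't described here. Note that a plain substring `replace()` also rewrites longer URLs that start with the old one (for example `.../Index.html?x=1`), so it's worth previewing the affected rows first.

```python
import sqlite3


def replace_link_everywhere(conn: sqlite3.Connection, old_url: str, new_url: str) -> int:
    """Rewrite old_url to new_url in every article body; return how many articles changed."""
    # instr() restricts the UPDATE to rows that actually contain the old URL,
    # so rowcount reflects only the articles that were modified.
    cursor = conn.execute(
        "UPDATE articles SET body = replace(body, ?, ?) WHERE instr(body, ?) > 0",
        (old_url, new_url, old_url),
    )
    conn.commit()
    return cursor.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?)",
        [("India", "Post: http://www.indiapost.gov.in/Index.html"), ("Nepal", "No link here")],
    )
    changed = replace_link_everywhere(
        conn, "http://www.indiapost.gov.in/Index.html", "http://www.indiapost.gov.in/"
    )
    print(changed, "article(s) updated")
```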
OK, date of last edit is certainly the easiest, so I'll add that straight away.
I'm also seeing a strange bug where it reports links on pages that don't actually contain them (namely Washington D.C.).
There are two other big batches that it would be good to replace automatically, namely http://www.canadapost.ca/splash.asp (replace with http://www.canadapost.ca/) and http://www.postdanmark.dk/contentfull.dk?lang=en (replace with http://www.postdanmark.dk/en/). Everything else occurs in such small batches that it can just be done manually.
[ Edit: Edited on 09-May-2013, at 05:01 by Sander ]
Yeah, there are a lot of postal-related ones. It would really be nice to have some sort of "include" system for those often-repeated blocks of content, but that's a task for another day. So yes, I'm happy to take requests for direct batch updates to the database.
On the topic of seeing what's done: I've made a fix so that after you edit an article, it rescans the links on the page and removes any that are no longer there.
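The rescan-on-edit idea can be sketched like this, assuming (hypothetically) that the broken-link list is held as a mapping from URL to the set of article titles containing it. After an edit, the article is dropped from any entry whose URL no longer appears in its new text, and entries with no articles left disappear from the list entirely.

```python
import re

# Assumption: outbound links appear as bare http(s) URLs in the article text.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'\])|]+")


def rescan_after_edit(title: str, new_body: str, broken_links: dict[str, set[str]]) -> list[str]:
    """Update broken_links in place after `title` was edited; return the URLs it no longer contains."""
    present = {match.rstrip(".,;:!?") for match in URL_PATTERN.findall(new_body)}
    resolved = []
    for url in list(broken_links):  # copy the keys, since entries may be deleted below
        pages = broken_links[url]
        if title in pages and url not in present:
            pages.discard(title)
            resolved.append(url)
            if not pages:
                del broken_links[url]
    return sorted(resolved)
```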
These three major ones (India, Denmark, Canada) have all been done for now. One other change: I've also made it sort by article, so that the broken links are grouped and it's easier to fix all of them in one article at the same time.
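Grouping the report by article amounts to something like the sketch below. The input format, a list of hypothetical `(url, article_title)` rows, is an assumption about how the broken-link log might be stored; the output lists articles alphabetically, each with its broken URLs sorted.

```python
from collections import defaultdict


def group_by_article(rows: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Turn (url, article_title) rows into {article_title: [sorted broken urls]}, ordered by title."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for url, title in rows:
        grouped[title].append(url)
    # Dicts keep insertion order, so inserting titles in sorted order yields a sorted report.
    return {title: sorted(grouped[title]) for title in sorted(grouped)}
```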
[ Edit: addition ]
Some other things to make this easier:
1. There's now an "Update URL" link next to each link, where you can enter the new, correct URL. This is useful if the page has just been relocated, rather than, for example, the whole domain having disappeared. When you update a link this way, it is updated in ALL articles at once.
2. There's a count of the number of pages containing each bad link. You can click on it to see only those articles.
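The per-link page count and the "only those articles" filter could be computed roughly as follows, again assuming hypothetical `(url, article_title)` rows from the broken-link log. Duplicate rows are collapsed first so a link repeated within one article counts once, and sorting by count (highest first) puts the big batches, such as the postal links, at the top.

```python
from collections import Counter


def pages_per_link(rows: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Return (url, number_of_distinct_articles) pairs, most widespread first, ties by URL."""
    counts = Counter(url for url, _title in set(rows))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def articles_with_link(rows: list[tuple[str, str]], url: str) -> list[str]:
    """Return the sorted titles of the articles containing `url`."""
    return sorted({title for row_url, title in rows if row_url == url})
```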
I've reduced the number of broken links from over 2,000 to about 1,300 now. It's probably going to get more difficult as it progresses, since there won't be so many that appear on 30 separate pages, etc.
Oh, that's a big improvement!
(Makes it much easier to bring the total character count of my wiki edits down even further!) :D
One bug: broken links in deleted articles are also listed.
Russland is listed for this broken Aeroflot link. (I'm not going to fix it in other articles so as not to risk the bug going away, but once you've seen and fixed it, the corrected link should be http://www.aeroflot.ru/cms/en)