Misplaced Pages talk:Link rot

This is an old revision of this page, as edited by Lexein (talk | contribs) at 02:10, 17 September 2012 (→Archive.is: new section). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


This is the talk page for discussing improvements to the Link rot page.



Misplaced Pages essays Top‑impact

This page is within the scope of WikiProject Misplaced Pages essays, a collaborative effort to organize and monitor the impact of Misplaced Pages essays. If you would like to participate, please visit the project page, where you can join the discussion. For a listing of essays see the essay directory.
This page has been rated as Top-impact on the project's impact scale.
The above rating was automatically assessed using data on pageviews, watchers, and incoming links.

Misplaced Pages Help Project‑class

This page is within the scope of the Misplaced Pages Help Project, a collaborative effort to improve Misplaced Pages's help documentation for readers and contributors. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks. To browse help related resources see the Help Menu or Help Directory. Or ask for help on your talk page and a volunteer will visit you there.
This page does not require a rating on the project's quality scale.

Archives

Archive 1 (2005-2007)
Archive 2 (2008-2009)

Internet Archive

The Internet Archive doesn't seem to have archived anything since about August 2008. What does this mean for dead links that should have been archived since then? AnemoneProjectors (talk) 14:36, 23 January 2010 (UTC)

See Wayback_Machine#Growth_and_storage: "Snapshots become available 6 to 18 months after they are archived." -- Quiddity (talk) 20:55, 23 January 2010 (UTC)

removing a dead link?

If I find a dead link and I don't feel like fixing it, is it cool to remove it, especially if I think the claim it supported was kinda retarded anyway? --n-dimensional §кakkl€ 18:52, 26 January 2010 (UTC)

Uh... not with reasoning like that, no. 'Kinda retarded' does not qualify as an objective assessment of the merits of the link, since other editors can easily say 'it ain't so retarded', an equally valid statement without any further evidence. If you're just fixing linkrot, fix the link or flag it for others; if you want to get involved with content editing (to remove 'retarded' content), go ahead and do it explicitly as an edit; don't call it a linkrot fix. --Ludwigs2 20:25, 26 January 2010 (UTC)

Tag it with a {{deadlink}}.--Blargh29 (talk) 22:17, 26 January 2010 (UTC)

Dead link vs. linkrot

What's the difference, exactly? 85.76.80.10 (talk) 20:56, 30 January 2010 (UTC)

There isn't one. "Linkrot" is a term used to describe the phenomenon of good links going dead over time. --ThaddeusB (talk) 04:40, 4 February 2010 (UTC)

Linkrot and sustained notability

If an article is at first supported by a series of links to establish notability, and 100% of those links go bad, does that mean that in some cases the subject of the article can be considered not notable and the article be deleted? Sebwite (talk) 16:21, 18 February 2010 (UTC)

It shouldn't happen. Notability is forever, even if all of the links go bad. That's why it's probably a good idea to use a citation template, so that there is plenty of documentation about the former link. Also, check out WP:OFFLINE, which would apply to dead links. --Blargh29 (talk) 19:16, 18 February 2010 (UTC)

I have seen articles get put up for deletion on the basis that all the links have gone bad, and the noms use the WP:PROVEIT argument to support their cause, while those who support keeping cannot prove it. Those favoring deletion do not buy the WP:OFFLINE argument in these cases. Sebwite (talk) 15:23, 23 February 2010 (UTC)

Editors delete content all the time based on dead links. The Orwellian memory hole lives, and it lives here in the Misplaced Pages. I think it is a huge problem. --Marcwiki9 (talk) 23:18, 4 March 2010 (UTC)

Archiving British web pages

The following Wired article explores some of the problems regarding archiving British web pages: "Archiving Britain's web: The legal nightmare explored". These problems affect the strategy used here. Squideshi (talk) 19:23, 6 March 2010 (UTC)

Does it though? A web archiving service acts on the laws of its resident country, not on those of the site it is archiving (as I understand the law). So archive.org and WebCite are fine, and that is their concern anyway; only if "we" (the WMF) were to set up our own archiving server in the UK would "we" be affected (as I understand it). - Jarry1250 11:22, 7 March 2010 (UTC)

That is not how I understand it. In fact, the article itself mentions that this is a problem for organizations like the Internet Archive, which hosts the Wayback Machine. It affects us because, as part of our strategy, we specifically recommend using tools, such as the Wayback Machine, which are affected by this law. I'm not asking for a change in the article--I just wanted to make people aware that the Wayback Machine isn't a magic bullet in the effort to help stave off linkrot. Squideshi (talk) 21:28, 8 March 2010 (UTC)

External links are not references

Just to explain my recent changes:

You should (almost) never remove this:

==References==
* Long dead reference

You should cheerfully remove this:

==External links==
* Calculator that you'd think was cool, except it no longer exists

It is not possible to justify a dead "External link" under the External links guideline. WhatamIdoing (talk) 18:32, 18 March 2010 (UTC)

You are correct. However, caution should certainly be used, since inexperienced users often put stuff they've used as a reference under "External links". --ThaddeusB (talk) 20:30, 29 March 2010 (UTC)

Besides this, it seems to me that an archived copy of an external link may well be a good replacement for the original (as it is for a reference), so linking to such a copy (if available) is preferable to simply removing the link. JudahH (talk) 16:30, 3 January 2012 (UTC)

linkrot vs. stability, e.g. News Corp vs. Fairfax in Australia

I've noticed that links to many articles published by News Corp in Australia are especially susceptible to linkrot, whereas links to articles in the Fairfax papers, The Age and the SMH, are quite solid. If there were enough evidence to support my statement, would WP ever have a guideline such as "Use paper X, Y, Z, if possible, instead of P, Q, as these are less susceptible to linkrot?" cojoco (talk) 20:52, 1 April 2010 (UTC)

We do advise against using Yahoo news stories (which typically decay within weeks), so it is certainly possible. --ThaddeusB (talk) 02:01, 11 April 2010 (UTC)

Of course, nobody reads the directions, so I wouldn't get my hopes up, but you're certainly welcome to include the advice. WhatamIdoing (talk) 03:44, 11 April 2010 (UTC)

Archiving every reference?

Is it suggested that we should archive every reference used in our articles? I see there's a WebCiteBOT, but I've never seen it in action, and certainly not on any article I've worked on. I just recently lost a very important reference and I'm still trying to work on finding a fix (contacting the editors, etc.). This was a great lesson to me about link rot, but now I'm wondering if I'm supposed to archive every reference I use? – Kerαunoςcopia_galaxies 20:48, 9 June 2010 (UTC)

Quite simply put WebCite cannot handle the volume that Misplaced Pages provides, even the small run of 10-50 PDFs a night by Checklinks seems to be contributing to the problem. — Dispenser 22:15, 9 June 2010 (UTC)

That I suppose would explain the bot, but what about manual submissions to the archive? Should I just archive references as I see fit? WayBack's six-month lag seems to be a bit of a long wait considering some website pages disappear in only a few weeks. – Kerαunoςcopia_galaxies 22:18, 9 June 2010 (UTC)

Is this still the case? It could affect issue PYWP-18. — Jeff G. ツ (talk) 04:24, 29 February 2012 (UTC)

Impossible archiving

Some cited sources use various forms of presentation, including streaming audio (sometimes integrated within a written interview), streaming video, and, especially in the case of Billboard's website, flash or some similar method of loading articles. These sites can't be archived at all. Without transcripts published elsewhere, these sites seem to me to be absolutely vulnerable to link rot. – Kerαunoςcopia_galaxies 19:04, 12 June 2010 (UTC)

Dafuq? Impossible to archive a .mov file? Or an .mp4 file? Or a .swf file? This is usually no problem... (Most search engines CHOOSE not to do it, but it's entirely optional.) 88.88.102.239 (talk) 21:13, 2 May 2012 (UTC)

Link Rescue Bots

Two new bots have just been approved to find archives for dead links. User:DASHBot, the first one, is written and operated by User:Tim1357. It has gone through all the featured articles and has made a large dent in the good articles. However, due to some small technical difficulties, it is down for the moment. User:H3llBot is written and operated by User:H3llkn0wz. It does pretty much the same thing. As the two bots finish up the Featured articles and the Good articles, I think we will do articles by request. Any ideas of which articles we could let the bots run on next? (Categories are good.) Tim1357 17:12, 15 June 2010 (UTC)

I'd say A-Class articles and then all Vital articles that are B-class and below, that is if the bot is able to make that distinction. -- œ 02:25, 21 July 2010 (UTC)

blogs.nzherald.co.nz

URLs on http://blogs.nzherald.co.nz will shortly cease 301-redirecting to URLs on http://www.nzherald.co.nz. Checking my logs, I note that a few articles have references/links to articles on blogs.nzherald.co.nz (such as Gordon Ramsay). These should be updated as soon as possible. The equivalent articles should still exist but will be harder to find after the redirect is gone. Could somebody please inform a bot operator? I have no idea how many links are in place. - NZH Admin —Preceding unsigned comment added by 203.99.66.20 (talk) 03:46, 10 August 2010 (UTC)

Web Link Checking Bot

Hi, I'm currently running a bot on my server against Misplaced Pages to check the external links, using pywikipediabot and the included weblinkchecker.py script. What this bot does is scan the contents of articles for external links, then check the links for 404s or timeouts, and create a datafile of the non-working links. After about one week, the bot will recheck the links and report on each article's talk page which of its links are dead, according to the data that the bot collected. In the report submitted, the bot will automagically suggest a link to archive.org, which, if the page was captured, should be a valid archived version of the link. The reason for my post here is to request input from the community, per the suggestion of Tim1357 in this thread. I am watching both this page and the BRFA thread, so commenting at either location is OK, and your input is greatly appreciated. Thanks, Phuzion (talk) 14:34, 17 August 2010 (UTC)
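The check-and-report flow described above could look roughly like this. It is an independent Python sketch, not the actual code of weblinkchecker.py; the HEAD request, the 404/410 rule, and the web.archive.org/web/*/ URL pattern are assumptions, and `fetch_status` can be swapped out so the logic can be exercised without network access:

```python
import re
import urllib.error
import urllib.request

# External URLs in wikitext; a simplification that stops at whitespace and wiki markup.
URL_RE = re.compile(r"https?://[^\s\]|}<>\"]+")


def extract_external_links(wikitext):
    """External URLs in order of first appearance, without duplicates."""
    links = []
    for raw in URL_RE.findall(wikitext):
        url = raw.rstrip(".,;:!?")  # drop sentence punctuation after bare URLs
        if url not in links:
            links.append(url)
    return links


def http_status(url, timeout=15):
    """HTTP status of a HEAD request, or None on timeout/connection failure (needs network)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:  # URLError and socket timeouts are both OSError subclasses
        return None


def looks_dead(status):
    """Treat 404/410 and 'no response at all' as dead; anything else as alive."""
    return status is None or status in (404, 410)


def wayback_suggestion(url):
    """Wayback Machine listing of captures for `url` (it may turn out to be empty)."""
    return "https://web.archive.org/web/*/" + url


def check_page(wikitext, fetch_status=http_status):
    """Map each dead-looking URL on a page to a suggested archive link."""
    return {
        url: wayback_suggestion(url)
        for url in extract_external_links(wikitext)
        if looks_dead(fetch_status(url))
    }
```

In the bot as described, `check_page` would run once to build the datafile and again about a week later, and only URLs that look dead on both passes would be reported. Some servers reject HEAD requests, so a real checker may need a GET fallback.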

On dewiki we decided that a delay of at least 4 weeks and at least 3 tests are required, because many links come back online 2-3 weeks after a change of hosting service. But the script on repository has some bugs you should care about. You could test the script on this page:

which reports errors on all four links above. Merl issimo 16:13, 17 August 2010 (UTC)
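One reading of the dewiki rule, as a Python sketch: a link counts as dead only after at least 3 failed checks whose first and last are at least 4 weeks apart. The reset on any successful check is an assumption, motivated by links that come back online after a hosting move; the class and constant names are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

MIN_FAILED_TESTS = 3            # "3 tests" from the dewiki rule
MIN_SPAN = timedelta(weeks=4)   # "at minimum 4 weeks delay"


@dataclass
class LinkHistory:
    """Failed checks for one URL since it was last seen working."""

    failures: list = field(default_factory=list)

    def record(self, when, ok):
        """Log one check result; a success wipes earlier failures (assumption)."""
        if ok:
            self.failures.clear()
        else:
            self.failures.append(when)

    def confirmed_dead(self):
        """Enough failures, spread over a long enough period."""
        if len(self.failures) < MIN_FAILED_TESTS:
            return False
        return self.failures[-1] - self.failures[0] >= MIN_SPAN
```

Keeping the full list of failure times, rather than a counter, makes it easy to change the span or test count later without losing history.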

Thanks for the input. Do you know if there is an updated version of the script that has the bugs fixed? Phuzion (talk) 16:45, 17 August 2010 (UTC)

No, I never used this script. I only know the response from dewiki, where we have a template which can be used by users for marking failed dead link bot reports. Merl issimo 17:25, 17 August 2010 (UTC)

What bugs are meant by "the script on repository has some bugs you should care about."? — Jeff G. ツ (talk) 04:57, 17 January 2012 (UTC)

How can I help? I'm interested in helping with any automated deadlink detection/mitigation. Since archive.org stopped archiving as of late 2008, checking it is necessary, but not sufficient. Automated checking of, and pre-emptive archiving with, Webcitation is needed, IMHO (or another service, especially for pages poorly captured by Webcitation: conditionals, JavaScript, AJAX, etc. have problems). I'm in favor of an on-demand full-rendered-web-page screengrab service, or an as-rendered-HTML+CSS-only service if one exists; these seem to be the only way to simultaneously guarantee pixel accuracy and actual content presence. Of course, respecting robots.txt. --Lexein (talk) 01:35, 12 September 2010 (UTC)
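On the robots.txt point, Python's standard urllib.robotparser module can answer whether a given user agent may fetch a URL. A minimal sketch of the check an archiving or screengrab tool might run before capturing a page; the function names and the user-agent string are hypothetical, and passing `robots_lines` lets the rules be supplied directly instead of downloaded:

```python
import urllib.robotparser
from urllib.parse import urlsplit


def robots_url(page_url):
    """robots.txt location for the host serving `page_url`."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def may_archive(page_url, user_agent, robots_lines=None):
    """True if robots.txt allows `user_agent` to fetch `page_url`.

    With robots_lines=None the file is downloaded (network access needed).
    """
    parser = urllib.robotparser.RobotFileParser(robots_url(page_url))
    if robots_lines is None:
        parser.read()
    else:
        parser.parse(robots_lines)
    return parser.can_fetch(user_agent, page_url)
```

A real tool would cache one parser per host rather than re-reading robots.txt for every page.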

We mostly need people filling out references. Currently Reflinks is probably the best in filling out references, but I haven't updated it with the feedback/learning mechanisms and the WebCite interface is a bit hard to use. You can also use Checklinks to semi-automatically fix links. — Dispenser 22:37, 12 September 2010 (UTC)

I know and use those tools frequently, but I would certainly participate in revising and beta-testing semi-auto tools which help as well. --Lexein (talk) 23:18, 13 September 2010 (UTC)

I have a proposal in for such a bot, and could use some responses at m:Talk:Pywikipediabot/weblinkchecker.py#Questions_from_BRFAs_and_elsewhere_on_English_Wikipedia. — Jeff G. ツ 03:02, 23 March 2011 (UTC)

My request for responses linked above has moved here. — Jeff G. ツ (talk) 20:57, 21 January 2012 (UTC)

Solution against the broken external_links: backup the Internet

Please find the concept description on the Village Pump. JackPotte (talk) 09:53, 3 September 2010 (UTC)

Marking a dead link within a citation template

How is one to mark a dead link within a citation template, e.g.:

"Gujrat Police official website, Standard Operating Procedures" (PDF). Retrieved 2009-03-08.

I did a hack by adding |publisher={{Dead link}} into the template, but that may not be the preferred way to do this. __meco (talk) 16:33, 5 September 2010 (UTC)

It's better not to do so, but rather follow the }} with {{dead link|date=August 2010}}.

"Gujrat Police official website, Standard Operating Procedures" (PDF). Retrieved 2009-03-08.

Yes, it seems to look odd, but I believe it's best practice for "deadlink" to always appear as the last text on a citation or link line. Of course, make an attempt to repair with Checklinks, too... --Lexein (talk) 18:51, 5 September 2010 (UTC)
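The placement described above (the tag right after the closing }} of the citation template) can be automated. A rough Python sketch with a hypothetical helper name; it assumes the URL sits inside a citation template, tracks brace depth so templates nested after the url= parameter are skipped, and relies on the default C locale so that %B yields English month names:

```python
from datetime import date


def tag_dead_link(wikitext, url, when=None):
    """Insert {{dead link|date=Month YYYY}} right after the template containing `url`.

    Returns the text unchanged if `url` is absent, is not inside a template,
    or is already followed by a dead-link tag.
    """
    when = when or date.today()
    tag = "{{dead link|date=%s}}" % when.strftime("%B %Y")
    start = wikitext.find(url)
    if start == -1:
        return wikitext
    depth, i = 0, start + len(url)
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":            # a nested template opens
            depth += 1
            i += 2
        elif pair == "}}":
            if depth == 0:          # closing braces of the enclosing citation
                end = i + 2
                if wikitext[end:].lower().startswith("{{dead link"):
                    return wikitext
                return wikitext[:end] + tag + wikitext[end:]
            depth -= 1              # a nested template closes
            i += 2
        else:
            i += 1
    return wikitext
```

Because the tag is inserted directly after the template's closing braces, a citation wrapped in ref tags keeps the tag inside the footnote.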

All links eventually go bad

I think that in the fullness of time, on geologic time scales, all links will go bad. This is simply because those who sponsor such web pages will ultimately die off. Web servers will be lost in fires and floods. Misplaced Pages administration needs to recognize this reality. The future expansion of Virtual Servers with NO PERSISTENT STATES will only make this worse. Please see Amazon Virtual Private Cloud. There are many Misplaced Pages editors who delete content that has a dead link, and use WP:proveit to make a point. Most editors are too lazy to go to the library to verify older information, and just delete things. It is hard to maintain "presumption of good faith" when undereducated editors are denying a lot of history. Look at this example: Misplaced Pages:Articles_for_deletion/Event_Driven_Language. We can see that Beeblebrox, by all accounts a good wikipedian, justified a delete because the library was too far away. Misplaced Pages should not exist at the convenience of the editors, but should exist in the service of truth. Perhaps there can be some kind of "grandfathering" clause on links. I would suggest that if a link has existed for a long enough period of time, the standard of proof should shift from the creators/maintainers to those who would delete. In other words, if the link was there for a number of years and then rotted, it would be "presumed valid", instead of the present case, where it seems to be presumed a fabrication of someone's imagination. This way, the content in Misplaced Pages could age gracefully, becoming more authoritative as it got older. This feels more proper to me. This would be a good alternative to the present case, where good content is deleted willy-nilly by those who would deny history, simply because it is hard to verify. — Preceding unsigned comment added by Marcwiki9 (talk • contribs) 03:30, 20 December 2010 (UTC)

You seem to have declared everyone's opinion on a single incident. The closing administrators should be experienced enough to separate valid reasons from invalid reasons. The content was not lost, it was merged. Verifiability is a principle of Misplaced Pages, and the reader cannot verify the material if the website rotted years ago. That's why we have this page. Given you posted here, is there an actual change/removal/addition you propose to this guideline? The "more authoritative as it gets older" will in my opinion not pass. — HELLKNOWZ ▎TALK 10:55, 20 December 2010 (UTC)

I don't mean to impugn everyone. What I am proposing is not a reduction in verifiability. Misplaced Pages must remain verifiable, of course. But the system we have now is that overzealous and undereducated editors will deny history, simply because the links have rotted. They are too lazy to verify content, so they delete it. They do it because "the library is 250 miles away", and they cannot just pop over there. I am making the suggestion that this is wrong and bad. Misplaced Pages ought to do something about the very long-term problem of rotted links, because all links eventually will rot. WP:linkrot seems to show this as an accelerating problem. As links rot through distant time scales, under the present system, the whole of wikipedia will have to be slowly rewritten. I think this is revisionist history, and it is objectionable to me. It can lead to history being manipulated by those who control search engines. Of course, you all might think I'm wrong. Whatever. I intend it only as food for thought. I am not declaring everyone's opinion on a single incident. I see a pattern here of editors denying history and deleting content, simply because they see the verification as too much work. I see it all the time. It is as if the Orwellian memory hole lives. Editors will chuck all content without a valid link, even if the link was good in the past. They do this despite the wikipedia policies expressly forbidding it. --Marcwiki9 (talk) 02:51, 21 December 2010 (UTC)— Preceding unsigned comment added by Marcwiki9 (talk • contribs) 02:40, 21 December 2010 (UTC)

If their actions are against policy, then their edits should be reverted. If their good-faith edits are against policy or guidelines, then they should be educated. If they remove previously undisputed content because a link is bad, they should be informed not to do this. I don't see what solution you propose for the hyperbolic problems you are describing. Misplaced Pages has a strong bias towards electronic sourcing, because frankly websites are easy to access without driving 150 miles to the library. As far as actual record of history is concerned, there is much much written material elsewhere that doesn't "linkrot". — HELLKNOWZ ▎TALK 10:25, 21 December 2010 (UTC)

So, my thoughts are meaningless drivel? To be chucked into the ether? No, the problem is much worse than you are even able to comprehend. Your unshakable defense of the status quo keeps you from even seeing that there is a problem, much less forging a solution. You admit there is a bias, but fail to point to any solution at all. And when one is put forward as food for thought, not a serious proposal, you dismiss it as hyperbole. And then you make the astonishing claim that Misplaced Pages doesn't matter, because the "actual record of history" lies elsewhere. I guess that Misplaced Pages will overcome all of these problems someday. I was just trying to help.--Marcwiki9 (talk) 00:52, 22 July 2011 (UTC)

It seems you have misinterpreted every sentence I said to the level of personal remarks. Personally, having run a bot that tags and replaces thousands of dead links, I do not see a need to explain my stance or motivation if my replies are misinterpreted anyway. — HELLKNOWZ ▎TALK 07:32, 22 July 2011 (UTC)

Solving link rot problem

We are working to solve the link rot problem here. We would like everybody to voice their concerns here. Thanks - Hydroxonium (H₃O) 14:25, 6 February 2011 (UTC)

Conflict between guidelines

This guideline and WP:DEADREF give conflicting advice about dealing with dead links used to support article content. Please join the conversation at WT:Citing sources#Question_regarding_.22Preventing_and_repairing_dead_links.22. WhatamIdoing (talk) 22:12, 17 February 2011 (UTC)

The lengthy conversation has closed, and I have updated the advice at WP:DEADREF. If anyone wants to check over this page and improve its contents, please feel free. WhatamIdoing (talk) 19:43, 28 March 2011 (UTC)

Proposal for new WikiProject to repair dead links

Just a notice for anyone who's interested. Misplaced Pages:WikiProject Council/Proposals/Dead Link Repair. -- œ 06:39, 20 April 2011 (UTC)

A new WebCiteBOT

Hi all. I'm working in a new WebCiteBOT. I have opened a request for approval. It is free software and written in Python. I hope we can work together on this. Archiving regards. emijrp (talk) 17:15, 21 April 2011 (UTC)

RfC to add dead url parameter for citations

A relevant RfC is in progress at Misplaced Pages:Requests for comment/Dead url parameter for citations. Your comments are welcome, thanks! — HELLKNOWZ ▎TALK 10:49, 21 May 2011 (UTC)

Simple answer

Use more print references...

Obvious really. Misplaced Pages is a joke if it leans too heavily on the web alone.--MacRusgail (talk) 16:32, 10 August 2011 (UTC)

If only more people were aware of the fact that references don't have to be online... We should promote WP:Offline sources more. -- œ 15:58, 16 August 2011 (UTC)

But they're so eeeeeeasy! But seriously, in practice, there's a balance to be struck. Some editors such as Cirt have created articles which are fantastically sourced, but completely offline, leaving out all convenience links. I don't know why; it may be due to the research tools he uses, which, though deep, are not at all accessible to non-subscribers. Very annoying.

Over at WP:AN/I I finally twigged to Bare link rot harms verifiability. Seems I don't care so much if a link rots if it has been properly, verifiably expanded. --Lexein (talk) 17:23, 16 August 2011 (UTC)

Extension:ArchiveLinks

http://www.mediawiki.org/Extension:ArchiveLinks

Is it possible to ask WMF to enable (maybe also finish) this wonderful extension? Bulwersator (talk) 10:20, 10 January 2012 (UTC)

Incompatibility with Misplaced Pages:Citing sources#Preventing and repairing dead links (even if that is linked here)

This page (Misplaced Pages:Link rot) states in its lead section that "These strategies should be implemented in accordance with Misplaced Pages:Citing sources#Preventing and repairing dead links, which describes the steps to take when a link cannot be repaired."

But how can we act in accordance with Misplaced Pages:Citing sources#Preventing and repairing dead links if some sentences in this page's lead section (for example "Do not delete factual information solely because the URL to the source does not work any longer. WP:Verifiability does not require that all information be supported by a working link, nor does it require the source to be published online.

Except for URLs in the External links section that have not been used to support any article content, do not delete a URL solely because the URL does not work any longer. Recovery and repair options and tools are available.") and the whole "Keeping dead links" section are incompatible with that page?

Does the explicit instruction to "implement in accordance with Misplaced Pages:Citing sources#Preventing and repairing dead links" mean that that page takes precedence? --79.17.150.185 (talk) 22:25, 8 February 2012 (UTC)

Archive.is

I think we should go slow on advocating http://archive.is. The field is littered with defunct archive sites - just look at this article history. Archive.is looks good, very good in fact, and its performance and coverage of essentially all used sources is very encouraging. But IMHO Misplaced Pages can't afford to depend on a brand new site which so far, discloses no public information about its funding, affiliation, or future. I have communicated with the owner, and I am confident the owner is acting in good faith, but it's a solo effort. I'd like to see if the site is here in a year. In the meantime, I would like to advocate using WebCite in parallel with Archive.is, meaning at least archiving at WebCitation, if not citing in ref. I hope this is received as a sensible precaution, in the best interest of Misplaced Pages's future source verifiability. --Lexein (talk) 02:10, 17 September 2012 (UTC)
