Misplaced Pages talk:Link rot

This is an old revision of this page, as edited by Jayaguru-Shishya (talk | contribs) at 23:14, 8 December 2021 (→Keeping dead links: True. I should have clarified better that I am referring to web-only pages, not to the printed ones. However, the current help page gives an erroneous idea that even some permanently dead, low-quality web pages may be kept in an article (thus preventing further tagging for {{citation needed}}, and the eventual removal of unreferenced material) just on the grounds of being tagged by {{dead link}}.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This is the talk page for discussing improvements to the Link rot page.

Misplaced Pages essays Top‑impact

This page is within the scope of WikiProject Misplaced Pages essays, a collaborative effort to organize and monitor the impact of Misplaced Pages essays. If you would like to participate, please visit the project page, where you can join the discussion. For a listing of essays see the essay directory.
This page has been rated as Top-impact on the project's impact scale.
	The above rating was automatically assessed using data on pageviews, watchers, and incoming links.

Misplaced Pages Help B‑class Mid‑importance

This page is within the scope of the Misplaced Pages Help Project, a collaborative effort to improve Misplaced Pages's help documentation for readers and contributors. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks. To browse help related resources see the Help Menu or Help Directory. Or ask for help on your talk page and a volunteer will visit you there.
This page does not require a rating on the project's quality scale.
This page has been rated as Mid-importance on the project's importance scale.

Archives

Archive 1 (2005-2007)
Archive 2 (2008-2009)
Archive 3 (2010-2014)
Archive 4 (2015-)

Shouldn't there be a section on "Signal a dead / rotten link" and put the code in the Advanced Edit Menu?

Imagine a user finds one. Wouldn't it already be great if the user flagged it, e.g. by leaving a code next to the dead link? I saw some code left by someone, but now I can't find the code anymore. Maybe ? And wouldn't it make sense to put the code in the Advanced Edit Menu? Thanks --SvenAERTS (talk) 13:21, 31 January 2016 (UTC)

Archiving YouTube videos

How exactly can we go about archiving YouTube videos to prevent link rot? I tried putting some URLs of YouTube videos into the Internet Archive (which I saw being done elsewhere on Misplaced Pages), but it seems that when you access the archive link, the video refuses to play.

Is there any other way to archive YouTube videos? 8bitW (talk) 23:47, 8 February 2016 (UTC)

Have you tried archive.is? It tends to be able to archive difficult pages that Wayback has trouble with. For other archives to try, see Misplaced Pages:List of archives on Misplaced Pages. -- GreenC 17:01, 16 March 2017 (UTC)

Edit request

Near the bottom of the "Web archive services" section, please change the word "javascript" to "JavaScript" (and "flash" to "Flash"). Thanks! 211.100.57.47 (talk) 14:12, 19 March 2016 (UTC)

Done - Arjayay (talk) 15:31, 19 March 2016 (UTC)

Should archiving sources be mandatory?

After all, it is sometimes impossible to undo link rot and it is much easier to just archive the damn source to begin with. If a link rots then that nullifies adding it in the first place. Rovingrobert (talk) 07:47, 5 May 2016 (UTC)

Wayback automatically archives every external link added to Misplaced Pages. Adding the archive link to the page is done by IaBot as of 2016. -- GreenC 16:36, 13 December 2016 (UTC)

@GreenC: That's not true. I've run the bot to add archives before, and it has tagged some links as dead with no archive available. Samurai Kung fu Cowboy (talk) 16:34, 7 February 2021 (UTC)

Hopefully, the bot is accurately reporting that it found no archive snapshot. There could be no archive even if snapshots existed at first (e.g. if they were later taken down due to copyright claims). Somewhat similarly, I have encountered massive, editor-generated bot runs that, to the best of my recollection, do find dead links and supply appropriate archive links. That could be done without adding archive links to articles when the original links are determined to be live. To the extent that such bot runs check the archive for suitable snapshots and place links in citations when the original links die, that is good. Dhtwiki (talk) 21:04, 7 February 2021 (UTC)

archive.is?

It seems ye olde Archive.is has been added to the link blacklist. Why? User:jjdavis699 16:40, 18 May 2016 (UTC) — Preceding unsigned comment added by Jjdavis699 (talk • contribs)

No longer blacklisted. WP:Using archive.is -- GreenC 16:38, 13 December 2016 (UTC)

I have updated the info here accordingly. Only today I wasted time using WebCite, thinking that archive.is was still banned here. Zezen (talk) 11:21, 24 August 2017 (UTC)

How safe is this archive.is, really? Searching for a URL takes me to a suspicious-looking website, saying: "One more step: Please complete the security check to access archive.md", very similar to those that can be seen on the dark side of the Internet... :-o Jayaguru-Shishya (talk) 19:51, 8 December 2021 (UTC)

It's just a captcha, lots of sites use them when dealing with denial of service and other abuse that can happen. -- GreenC 20:03, 8 December 2021 (UTC)

Thanks, @GreenC:. I am just a bit cautious, because there are a lot of fake CAPTCHAs out there (and this one looks exactly like one). I might try the link once I am stationed at my Linux laptop. :-) Cheers! Jayaguru-Shishya (talk) 20:47, 8 December 2021 (UTC)

archive.today (i.e. archive.is, .md, etc.) is the second-largest archive provider on Misplaced Pages, behind the Wayback Machine. It should be OK. -- GreenC 21:12, 8 December 2021 (UTC)

New problem with Archive.org

It appears that archive.org is now implementing a beta version that will be very significant for Misplaced Pages. I'm not sure where to discuss this, so please let me know if I should bring it up somewhere besides this talk page. Or maybe someone has already brought it up.

Apparently, we can only search now at archive.org for main pages, but not sub-pages. At the same time, archive.org seems to be offering permanent links for any sub-page that we want.

It therefore might be wise for a bot to replace every external link at Misplaced Pages with an archive.org link, BEFORE the linked website goes dead. After it goes dead, there seems no way to cure the link rot. Anythingyouwant (talk) 21:12, 9 April 2017 (UTC)

"we can only search now at archive.org for main pages": do you mean the "Search" box in the upper right corner of https://web-beta.archive.org/? That is a new feature, and it only handles base URLs for now. Full URLs can still be found in a number of ways, such as the API or a URL-based API, e.g. https://web.archive.org/web/*/http://www.yahoo.com/news or https://web.archive.org/http://www.yahoo.com/news -- GreenC 22:08, 9 April 2017 (UTC)
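To make the point about full URLs concrete, here is a minimal Python sketch of looking up one full URL rather than a base domain. It assumes the Internet Archive's public availability endpoint (https://archive.org/wayback/available) and its JSON reply shape (archived_snapshots, then closest); the helper names are hypothetical, and the reply format may have changed since this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def availability_query(url, timestamp=None):
    """Build an availability-API query for one full URL (not just the base domain)."""
    params = {"url": url}
    if timestamp:
        # Optional YYYYMMDDhhmmss prefix; the API answers with the closest capture.
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)

def closest_snapshot(response_text):
    """Extract the closest snapshot URL from the JSON reply, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None

def calendar_url(url):
    """The URL-based listing form from the comment above: every capture of one URL."""
    return "https://web.archive.org/web/*/" + url

def lookup(url, timestamp=None):
    """Network call (needs internet access): return a snapshot URL or None."""
    with urlopen(availability_query(url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Keeping the query builder and the JSON parsing separate from the network call means both can be checked offline. Only lookup() talks to archive.org.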

Here's the dead link I was trying to replace with an archive.org link: http://www.rhapsody.com/#artist/elvis-costello/album/elvis-costello-the-rhapsody-interview/track/on-linda-ronstadts-rendition-of-alison. It's now impossible to do that, right? Anythingyouwant (talk) 00:09, 10 April 2017 (UTC)

In this case Wayback removes anything beyond the # because the # is just a page section. It's the same content with or without the # portion. -- GreenC 01:17, 10 April 2017 (UTC)

But the content they give me doesn't mention Elvis Costello or Linda Ronstadt. I finally found the content here: http://us.napster.com/artist/elvis-costello/album/elvis-costello-the-rhapsody-interview/track/on-linda-ronstadts-rendition-of-alison However, I think you're correct that the "#" caused the problem, thanks. Anythingyouwant (talk) 01:26, 10 April 2017 (UTC)
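GreenC's explanation of the "#" can be checked with Python's standard library. The part after "#" is a fragment that the browser keeps to itself, so any crawler requesting the Rhapsody address only receives the site's front page. The old Rhapsody address most likely relied on JavaScript routing after the "#", which would explain why the archived copy showed no track content. That part is an inference, not something confirmed in this thread.

```python
from urllib.parse import urldefrag

def split_fragment(url):
    """Split a URL into what the server receives and the '#' fragment.

    Browsers never send the fragment in the HTTP request, so a crawler
    fetching this URL only ever gets the base page.
    """
    base, fragment = urldefrag(url)
    return base, fragment

rhapsody = ("http://www.rhapsody.com/#artist/elvis-costello/album/"
            "elvis-costello-the-rhapsody-interview/track/on-linda-ronstadts-rendition-of-alison")
napster = ("http://us.napster.com/artist/elvis-costello/album/"
           "elvis-costello-the-rhapsody-interview/track/on-linda-ronstadts-rendition-of-alison")

print(split_fragment(rhapsody)[0])  # http://www.rhapsody.com/
print(split_fragment(napster)[1] == "")  # True: the path itself names the track
```

The Napster address works for archiving because the track is identified in the path, which the server does receive.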

Link Rot

I am affiliated with Symantec, an IT security company. I was hoping to address the "broken link" tag on the page List of mergers and acquisitions by Symantec. The once-FA article has more than 70 broken links to Thomson Reuters reports on alacrastore.com. I have searched the website and the internet and found no other way to access those reports. Additionally, the citation templates are already marked with "dead-url=yes," making the primary link in the citation the working archived version. Based on the instructions on this page, does this mean everything is in order and there is nothing additional to do to address the tag? I already corrected all of the other broken links on the page. CorporateM (Talk) 14:08, 14 April 2017 (UTC)

The bot WaybackMedic added 44 archive links on 23 March and that mostly cleared up the problem; I don't see why the broken link tag is needed now. -- GreenC 15:11, 14 April 2017 (UTC)

What is the right thing to do regarding 'broken links'?

I read Wikipedia often, and occasionally come across broken or 'dead' links. But what is the correct thing to do? Ignore it? Mention it on the talk page? Add "broken link" into the article itself? (I am not a proper Wikipedia editor.) I read through the FAQ and searched for 'broken links', but there were no results matching the query. Thanks — Preceding unsigned comment added by 79.76.99.144 (talk • contribs)

Generally: repair, tag, remove -- in that order. Repair if you can -- maybe the site's URL syntax changed, or there is an archived copy somewhere, like the Internet Archive. If you cannot repair it, tag it with {{dead link}} and someone else might eventually get to it; we have a few bots too. If you've looked everywhere and it cannot ever be restored -- then remove it (and hopefully replace it; many editors would not remove even a dead link without providing alternative sources). Mostly, you don't really have to do anything -- we have millions of dead links and, unless you actually fix them, individually they are not worth the effort of reporting beyond placing a dead link tag. The bot will also get to it eventually and repair or tag it. — HELLKNOWZ ▎TALK 12:00, 3 September 2017 (UTC)

If you reach step three, remove the link only, not the whole citation. Working URLs are not required. Right, Hellknowz? --50.201.195.170 (talk) 20:06, 4 May 2021 (UTC)

I meant the whole citation. If a source cannot be ever verified, it's not really a source. The whole point of Misplaced Pages is that verification can happen. The citation without a link doesn't provide a way to verify the information. Removing the citation is really the last resort, which is why I elaborated that it's for cases when it "cannot ever be restored". — HELLKNOWZ ▎TALK 20:17, 4 May 2021 (UTC)

Bull. If I cite something from the Bible, I don't need a URL to make it verifiable. FS. --50.201.195.170 (talk) 21:55, 4 May 2021 (UTC)

This is obviously talking about content that solely existed on the Internet. Why would anything on this page apply to physical books? It seems like your question isn't actually about link rot, but whether a URL is required for a citation. — HELLKNOWZ ▎TALK 22:15, 4 May 2021 (UTC)

Nope. Obvious. Nope. My question was asking you to agree that your advice to remove an entire citation isn't general, but rather applies only to content that solely existed on the Internet. Which you've more or less done. Thanks. Offline-only sources are 100% acceptable as such.--50.201.195.170 (talk) 22:26, 4 May 2021 (UTC)
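The order HELLKNOWZ gives ("repair, tag, remove"), together with the web-only qualification settled in this exchange, can be sketched as a small decision function. This is a hypothetical illustration: the names and boolean inputs are assumptions, and the function does not come from any bot or tool.

```python
from enum import Enum

class Action(Enum):
    REPAIR = "repair"                    # fixed URL or archived copy found
    TAG = "tag"                          # add {{dead link}}; a bot or editor may fix it later
    REMOVE_URL = "remove url"            # offline source: keep the citation, drop the dead link
    REMOVE_CITATION = "remove citation"  # web-only source that can never be restored
                                         # (ideally replaced with another source)

def triage(replacement_found, searched_everywhere, web_only):
    """Hypothetical helper: 'repair, tag, remove -- in that order'."""
    if replacement_found:
        return Action.REPAIR
    if not searched_everywhere:
        return Action.TAG
    # Removing the whole citation only makes sense when the content existed
    # solely on the web; a printed source stays verifiable without a URL.
    return Action.REMOVE_CITATION if web_only else Action.REMOVE_URL
```

The order of the checks follows the thread: repair always wins, and tagging is the default until someone has genuinely exhausted the search.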

Using a tool to archive live links

When archiving references in an article, should ALL the references (live and dead) be archived, or only the dead ones? I raised this question at Misplaced Pages:Bots/Noticeboard#Archiving live links - Redux, and referenced an earlier discussion at Misplaced Pages:Bots/Noticeboard/Archive 11#Archiving links not dead - good idea?. I was advised that this Linkrot talk page might be an appropriate place to discuss it. Apparently the default setting of a tool like IABot v1.5.2 is to archive only the dead links, but some people are choosing the option to archive everything. This practice came to my attention with this edit to the article Barack Obama: someone using the IABot v1.5.2 archived 392 references, adding 74,894 bytes to the article, and increasing its already huge size by about 22.7%, from 330,241 to 405,135 bytes. (The user reverted at my request.) Do people think this kind of outcome is a good thing? Should some kind of consensus be developed, as to when and whether to use the "rescue all" option? --MelanieN (talk) 15:07, 4 October 2017 (UTC) On second thought I am going to post this question at Village Pump so as to get a wider readership and more input. --MelanieN (talk) 15:28, 4 October 2017 (UTC)

The discussion is at Misplaced Pages:Village pump (miscellaneous)/Archive 56#Using a tool to archive live links. – Uanfala (talk) 13:39, 15 May 2018 (UTC)

Dead links taken over by domain grabbers & Co.

To whom it may concern: Most of the automatic dead link detection helpers recognize links such as http://www.bigshoegames.com/about-us.html as working properly, even though the original target page has been replaced with advertising by a domain grabber. Therefore, it would be really nice if those tools could detect such dead links not only by their HTTP status codes, but also by looking at the page content. A list of match patterns indicative of domain grabbers could be compiled and maintained for example on-wiki and, after manual review, synchronized to the various tools. It would probably be difficult to reliably automatically determine the last "good" snapshot in the Wayback Machine, but marking up this kind of links as needing maintenance would be a huge step forward. --Tim Landscheidt (talk) 08:35, 5 November 2017 (UTC)

It is a problem. In my experience (I've tried writing a filter for domain squatters), they are endless in variety and name. I've discovered affected domains and could forward a list to user:cyberpower678 for IABot to mark dead, though I need to write some code first to pull the domains from the logs. This is part of the bigger issue of soft 404 links, which is quite challenging. -- GreenC 15:12, 5 November 2017 (UTC)
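Tim Landscheidt's suggestion of checking page content, not just the HTTP status code, can be sketched as a list of match patterns applied to pages that return 200 OK. The phrases below are hypothetical examples. As GreenC notes, squatter pages vary endlessly, so a real list would need the on-wiki review proposed above and would still miss cases.

```python
import re

# Hypothetical patterns for illustration only; a maintained list would be
# compiled and reviewed on-wiki, then synchronized to the tools.
PARKED_PATTERNS = [
    re.compile(r"this domain (name )?(is|may be) for sale", re.IGNORECASE),
    re.compile(r"buy this domain", re.IGNORECASE),
    re.compile(r"domain parking", re.IGNORECASE),
]

def looks_parked(html, patterns=PARKED_PATTERNS):
    """Return True if a page that loads fine matches any squatter pattern."""
    return any(p.search(html) for p in patterns)
```

A match should only mark the link for human review. It is not a reason to rewrite citations automatically, because picking the last "good" archive snapshot is still a manual judgment.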

In the end, all attempts are futile :-). I have seen several companies and organizations who redirect all dead links to their homepage, probably because someone told them that a 404 might upset the reader, or they just do not have the technical skills to set up proper error pages; that's when I turn to "Cool URIs don't change" for some voice of reason. But at Misplaced Pages scale, one pattern can match a lot of pages, so this might be worthwhile. --Tim Landscheidt (talk) 19:50, 5 November 2017 (UTC)
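One cheap signal for the "redirect every dead link to the homepage" pattern is a deep link whose redirects end at a bare site root. The sketch below assumes you already have the final URL after following redirects, for example the url attribute of the response returned by urllib.request.urlopen. The function name is hypothetical, and a hit means "likely soft 404", not "certainly dead".

```python
from urllib.parse import urlsplit

def _is_root(parts):
    """True for a bare site root such as http://example.org/ (no path, no query)."""
    return parts.path.strip("/") == "" and parts.query == ""

def redirected_to_homepage(requested, final):
    """Flag a likely soft 404: a deep link whose redirects end on a site root.

    Hosts are deliberately not compared: landing on any root, even another
    domain's, is suspicious for a citation that pointed at a specific page.
    """
    return not _is_root(urlsplit(requested)) and _is_root(urlsplit(final))
```

This only catches the homepage variant. Sites that serve a generic "not found" page with status 200 would still need a content check.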

I ran a program against a large dataset and it found about 76 domains that are web squatters (or former squatters now completely dead), and checking the IABot database most of them are already marked dead. I'm fixing them via the IABot interface, but the queues are backed up at the moment (only 5 at once per user). Here's the list:

ahuero.com
anotherchance.es
bigdekalb.com
cooke.ws
curnonska.com
gabonnationalparks.org
kids.activedmonton.ca
www.activedmonton.ca
losespectaculos.tv
newsfix.ca
newwritinginternational.com
newyorknewstoday.com
oldwebsite.paralympic.org
paralympic.netempire.de
payrent.co.uk
pfcberkut.ru
usautotrails.com
verusx.net
www.airlineupdate.com
www.animacor.com
www.apacheness.com
www.artsandantique.net
www.basketpedya.com
www.bndr-mali.org
www.buddyhollyonline.com
www.chriswhitleydiscography.com
www.clannad.org.uk
www.claytoday.biz
www.comeonboro.com
www.detlefmauss.de
www.encuentroartesescenicas.com
www.fil-amboxers.com
www.flfa2010.com
www.floridaparks.com
www.foundryclimbing.com
www.giulianacesariniproart.com
www.health7800.com
www.hot-iron.co.uk
www.hwy56.com
www.indyinsiders.com
www.iraklis-fc.gr
www.kinema2cinema.com
www.lbpapalvisit.org
www.libertinesecurity.com
www.lincolnunitedfc.co.uk
www.luckyshow.org
www.marioyepes.co
www.mercatorgold.com
www.mojvikend.info
www.multiracialheritageweek.com
www.neftchifc.com
www.netspinners.co.uk
www.nfsbih.net
www.nick-kelly.com
www.oktoberfest.ca
www.olivercromwell.org
www.pgxnews.org
www.philadelphiabrassdrumcorps.org
www.pinrepair.com
www.rachelbillington.com
www.rlwc08.com
www.saintpatrickskilsyth.org.uk
www.saints-alive.co.uk
www.shipstontennis.org.uk
www.simpsonwatch.com
www.stroudsfitness.net
www.theforgottenimp.co.uk
www.thirdfridaywine.com
www.timesnews.co.ke
www.tonicbooks.com
www.usautotrails.com
www.versussleep.net
www.walbergwatch.com
www.webhosting.info
www.welsh-canoeing.org.uk
www.xblb.com

It's not a complete list but probably represents a fair portion of the total. -- GreenC 20:30, 5 November 2017 (UTC)
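Citations can be checked against a list like the one above by host name. Here is a minimal sketch, assuming the list is loaded into a Python set (only a few entries are copied here) and that subdomains of a listed domain should also be flagged. That second point is an assumption, because the list itself names some subdomains separately (for example kids.activedmonton.ca and www.activedmonton.ca).

```python
from urllib.parse import urlsplit

# A few entries from the list above; in practice the whole list would be loaded.
SQUATTED = {"bigdekalb.com", "kids.activedmonton.ca", "www.activedmonton.ca",
            "usautotrails.com", "www.usautotrails.com"}

def on_squatted_domain(url, domains=SQUATTED):
    """True if the URL's host, or any parent domain of it, is in the list."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Check 'a.b.c', then 'b.c', then 'c', so subdomains of a listed domain match.
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))
```

A bare activedmonton.ca is not flagged, because only two of its subdomains appear in the list.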

Announce: RfC: Nonbinding advisory RfC concerning financial support for The Internet Archive

Misplaced Pages:Village pump (miscellaneous)/Archive 57#RfC: Nonbinding advisory RfC concerning financial support for The Internet Archive --Guy Macon (talk) 12:13, 22 December 2017 (UTC)

Overhaul

This page has gotten somewhat long and verbose and I'm afraid most people don't read it. This has happened over the years due to changing conditions and the nature of Misplaced Pages where everyone edits. I'd like to overhaul the page and trim it down so the important stuff is clearly presented. Right now it's a mix of information points and a tutorial for newbies. It does neither very well. A tutorial can be made in a separate document while keeping this one a source of important information for editors about the various ways archiving is currently being done by automated systems and manually. -- GreenC 16:55, 12 January 2018 (UTC)

Sounds reasonable, but the devil is in the details. Might I suggest writing up a draft of the overhaul on a subpage of your user page and asking for comments/corrections before going live with it? --Guy Macon (talk) 19:11, 12 January 2018 (UTC)

Listing of site exclusions from archive.org

It would be useful, but likely unmaintainable, to have some listing of sites which are not possible to archive via archive.org due to, for instance, robots.txt or generic exclusion. For instance, it appears that eWeek is 'excluded' from the Wayback Machine, but can be archived in archive.is. Thoughts? --User:Ceyockey (talk to me) 12:50, 14 April 2018 (UTC)

It is constantly changing on a per-website basis. My bot WP:WAYBACKMEDIC is able to auto-detect when a link is excluded and look for alternatives like archive.is, but it doesn't operate on a per-domain basis. I could run the bot on all pages that contain an eWeek URL. -- GreenC 14:28, 14 April 2018 (UTC)
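For the robots.txt part of Ceyockey's question, Python's standard urllib.robotparser can report whether a site's rules disallow a given crawler. The user-agent token ia_archiver is the one traditionally associated with Wayback exclusions, but whether and how the Wayback Machine honors robots.txt has changed over time. Exclusions requested by email are not visible in robots.txt at all. So treat this only as a hint, which fits GreenC's point that exclusions change constantly.

```python
from urllib.robotparser import RobotFileParser

def blocks_crawler(robots_lines, url, agent="ia_archiver"):
    """Return True if the given robots.txt lines disallow `agent` from `url`.

    The default agent token is an assumption (the one long used for Wayback
    exclusions); it says nothing about exclusions requested by email.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return not parser.can_fetch(agent, url)
```

In practice the lines would come from fetching the site's /robots.txt. Passing them in directly keeps the check testable offline.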

External links section

Ironically, some of the external links to add-ons in the External links section are dead. I hid the dead URLs inside comment tags (<!-- -->); does anyone know of any other add-ons that could replace these? --Hmxhmx 18:34, 5 January 2019 (UTC)

User:Hmxhmx, I just added the official Wayback add-on, which I personally use and find useful. The second-biggest cache of archives is archive.today, but I don't know of an add-on for it; I have to check manually. -- GreenC 19:05, 5 January 2019 (UTC)

Thanks! I wasn't aware of this add-on. --Hmxhmx 11:42, 6 January 2019 (UTC)

Why is this article Semi-Protected?

Misplaced Pages is supposed to be The Free Encyclopedia. Why is this article so-called Semi-Protected? Whoever had this article semi-protected should be ashamed. I don't like Misplaced Pages's so-called protection policies; they bother me. I urge you to remove the semi-protection on this article right now! I quit Misplaced Pages three years ago due to similar creative differences. The so-called protection policy is a joke. It's time to have Common Sense and put the free back in The Free Encyclopedia. No More Protection Nonsense, No More Gold and Silver Padlocks on the right of the screen. SMH! Spencer H. Karter (talk) 22:18, 24 May 2019 (UTC)

a) No one can tell what article you are talking about. b) You may want to read WP:NOTCOMPULSORY. MarnetteD|Talk 23:25, 24 May 2019 (UTC)

c) If you are talking about the page associated with this talk page, you should know that it is not an article. If you have an edit that you think should be made to it, you can file an edit request on this talk page. MarnetteD|Talk 23:35, 24 May 2019 (UTC)

Unhelpful page

I find this an un-helpful page. I am looking at a page with a dead link; I found the archived version at the Wayback machine, I want to make the citation include this archived version. What specific format do I put in the citation to do that? Geoffrey.landis (talk) 15:55, 3 October 2019 (UTC)

This is sort of covered in the § Internet archives section. Basically you need to add |archive-url=archive.org/example|archive-date=3 October 2019 (or possibly some other date format - I always forget) inside the citation template. I agree that this information could be communicated more clearly and prominently, maybe with a simple before-and-after example. Colin M (talk) 16:12, 3 October 2019 (UTC)

I added more detailed instructions. I hope that helps. —AlanBarrett (talk) 16:59, 18 January 2020 (UTC)
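The change Colin M describes can be sketched as a small text transformation on a one-line citation template. Two assumptions apply: |url-status= is the current replacement for the older |dead-url= mentioned earlier on this page, and the day-month-year date format shown here is just one choice, since articles keep whatever date format they already use. The helper names are hypothetical, and nested templates are not handled.

```python
import re
from datetime import datetime

# Wayback snapshot URLs embed a 14-digit YYYYMMDDhhmmss timestamp.
WAYBACK = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/")

def archive_date_from_wayback(archive_url):
    """Read the snapshot timestamp in a Wayback URL as a day-month-year date."""
    m = WAYBACK.match(archive_url)
    if not m:
        raise ValueError("not a Wayback snapshot URL: " + archive_url)
    d = datetime.strptime(m.group(1), "%Y%m%d%H%M%S")
    return "%d %s %d" % (d.day, d.strftime("%B"), d.year)

def add_archive(cite, archive_url, url_status="dead"):
    """Append archive parameters to a one-line {{cite ...}} template."""
    cite = cite.rstrip()
    if not cite.endswith("}}"):
        raise ValueError("expected a template ending in }}")
    body = cite[:-2].rstrip()
    return (body + " |archive-url=" + archive_url
            + " |archive-date=" + archive_date_from_wayback(archive_url)
            + " |url-status=" + url_status + "}}")
```

Deriving |archive-date= from the snapshot URL avoids the common mistake of entering the date the archive was found instead of the date it was captured.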

Should the archive details of live links be added to pages?

There's a brief discussion on this above from a year or more ago, but with no resolution. Is it acceptable or not to actually add archival details of live links to article pages? This is an issue being raised here, and I had thought there wasn't an issue with at least adding these for live links (this is optional, but doable through IABot), but the discussion above and this diff suggest that it is unwanted for live links? (I do understand that archive versions of these links are made automatically; it's not that the archive has to be made, it's just a matter of connecting it up.) --Masem (t) 06:34, 18 January 2020 (UTC)

I find archive links useful, even when the original URL still works. For example, it protects against future link rot, and against future changes to the referenced page. However, it might make sense for the {{Citation}} templates to be changed to place less emphasis on the archive when url-status=live. For example, instead of rendering like this

"'Finding Dory' breaks record for opening of animated film". Associated Press. June 20, 2016. Archived from the original on June 21, 2016. Retrieved August 11, 2016.

one could de-emphasise the archive like this:

"'Finding Dory' breaks record for opening of animated film". Associated Press. June 20, 2016. Retrieved August 11, 2016. Archived on June 21, 2016.

—AlanBarrett (talk) 16:26, 18 January 2020 (UTC)
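AlanBarrett's proposal amounts to choosing the layout from the url-status value. Here is a hypothetical sketch that reproduces the two example renderings above. It is not how the real citation templates are implemented; it only illustrates the proposed ordering, under which only live links change.

```python
def render_citation(title, publisher, date, retrieved, archive_date, url_status):
    """Hypothetical renderer for the two layouts in the comment above."""
    head = '"%s". %s. %s.' % (title, publisher, date)
    if url_status == "live":
        # Proposed: the original still works, so the archive note goes last.
        tail = "Retrieved %s. Archived on %s." % (retrieved, archive_date)
    else:
        # Layout of the first example: archive note before the retrieval date.
        tail = "Archived from the original on %s. Retrieved %s." % (archive_date, retrieved)
    return head + " " + tail
```

Because the change is only in presentation, the wikitext parameters would stay the same either way, which keeps the archive available if the live link later dies.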

For the above example at Toy Story 3, I agree it is obnoxious to pre-archive every link on the page. Keep in mind archive URLs are themselves prone to link rot and problems, and need maintenance and checking. It is also a morass of complexity added to the wikitext. The feature of IABot to archive every link is and always has been controversial with many discussions started and no resolution. IMO it should be limited to admins and only done when there is some justification. -- GreenC 17:14, 18 January 2020 (UTC)

I had no idea this could be controversial. The integrity of our references in the content readers see should undoubtedly take precedence over a minor inconvenience of some added text for those using the source editor (myself included, most of the time). Of course we should proactively prevent link rot rather than waiting for a problem to emerge and sometime later trying to fix it, hoping there's an archived version available. I sympathize with the frustration of navigating lots of citation markup in the source editor, but that's a problem that could use addressing regardless of whether we're using two of the parameters. There are any number of technical interventions, from collapsing that markup in the editor by default to what the folks at meta:WikiCite are working on. All of those interventions would be useful to editors, but our priority is readers. — Rhododendrites \\ 18:09, 18 January 2020 (UTC)

IABot does archive every link it encounters (as can be seen in the IABot database), but by default it doesn't load the archive into the wikitext until the link dies. There is no consensus for that sort of thing. IABot's manual option to do so has always been highly controversial, i.e. you will never get consensus for making it the default behavior. The arguments you give above are pretty standard in those discussions, but there are also counter-arguments and opinions. -- GreenC 18:29, 18 January 2020 (UTC)

If we have established by past discussions that generally, archiving live links should be avoided, I'd recommend including this on this page (and again, reiterating that it is not necessary to use something like IABot to make sure references are properly stored at archive.org or similar since this is all done through automatic tools, and we can recover those via IABot if the live link ever dies, and that otherwise, archives of live links are adding excessive wikitext and they themselves may go dead). But I would put alongside it that, should someone decide to include them, reverting those additions should also be avoided or determined inappropriate in the same manner as how DATERET works. But again, not sure where a major consensus discussion on this exists. --Masem (t) 18:37, 18 January 2020 (UTC)

The IABot optional feature (a check-box on the control page) that mass-adds live archive-links is controversial. Prior to the IABot feature coming into existence, there was never an issue with adding live links. All the discussions have been specific to the IABot feature and its usage of blindly mass-adding links via bot. Toy Story 3 concerns the usage of this IABot feature. For example, did the person who deployed the bot actually check the archive links are accurate and working? Or, are they merely mass-adding links without checks or verification presuming that the Bot always gets it right and knows best? BTW, IABot never had consensus for this feature, it was created by the bot op without discussion or specific approval despite continual threads like this one. -- GreenC 19:31, 18 January 2020 (UTC)

Okay, that makes sense, and that should be something documented on this page. Just to make sure, if someone else manually figured out all the live-link archive links and added them - with good faith assumption they checked the archive link was the proper functioning page and with the info supporting the material it is used to sourced, then that should be fine, but probably of the nature of "busy work". --Masem (t) 20:39, 18 January 2020 (UTC)

Yes, I have never seen anyone dispute manually adding links. -- GreenC 17:51, 19 January 2020 (UTC)

I am someone who disputes massive additions but not singular ones, or even small additions (< 10k characters) if I'm in a good mood. However, the policy of encouraging setting up archive links when adding the original citation seems to be what encourages the massive additions, which need to be positively discouraged. In addition to what others have said, unnecessary archive links add to the visual clutter text editors must sort through in raw editing mode (I use wikEd for parsing heavily referenced articles, which have lots of such clutter, archive links or no, but that tool has its flaws, such as images obscuring text). Then there's the matter of download time: is there any added setup time when archive links are present? I ask because I'm aware that rendering references seems to add to page download time, especially on slow machines. Dhtwiki (talk) 20:03, 6 February 2021 (UTC)

Where do I find this mysterious "Fix dead links" option in the page history tab?

I can't see it. Does its appearance depend on language or theme settings? — Preceding unsigned comment added by Dqeswn (talk • contribs) 20:52, 21 January 2020 (UTC)

It's right above the line that starts "For any version listed below". -- GreenC 21:00, 21 January 2020 (UTC)

Yeah, it's right near the title of the page when you are looking at its history. Bottom right on this screenshot here. --Masem (t) 01:56, 22 January 2020 (UTC)

guidance on selecting an archive service

Due to the numerous archive links created by User:InternetArchiveBot, there's a presumed preference to use the Wayback Machine over other archive services, but Wayback has some disadvantages.

As reported at Archive.today, existing Wayback pages can be blocked after the fact through robots.txt. Alternatively (and with the same effect), Using the Wayback Machine documents that a block can be requested by emailing a request to the Wayback operators.

This provides a compelling reason (at least in many cases) to prefer the use of archive.today. Of course, there are specific situations where archive.today is not an option, and there may be other scenarios where archive.org or some other archiving service would be preferable to archive.today.

Considerations in selecting an archive service for Misplaced Pages pages are both complicated and non-obvious. It is not something that should be left to individual editors to figure out. A page to gather the collected knowledge needed to minimize the negative effects of link rot would seem to be desirable. Fabrickator (talk) 03:15, 18 March 2020 (UTC)

If this was decided centrally, the decision would certainly be the Internet Archive. Even if the Wikimedia Foundation wanted to invest millions of dollars to develop a sustainable competitor, would that make sense?

Because the disadvantages of the Wayback Machine are usually seen on specific URLs (for instance pages with weird JavaScript, AJAX, iframe or cookie requirements), the individual editor is best placed to notice that an archived version is not fit for purpose and needs to be replaced with something else. I do agree that sometimes it would be nice to have extra support for specific URLs to be (re)archived with a more "powerful" (and expensive) crawl; however, I'm not sure how that could be offered indiscriminately.

For specific cases, ArchiveBot by the ArchiveTeam is also available. One possible request (maybe even at the next m:Community Wishlist Survey) could be to offer a way for users to propose (a limited number of) URLs for archival via that method in order to bypass a specific set of restrictions in the normal Wayback crawls. (You'd need to specify which ones. Copyright restrictions are probably not the ones to focus on bypassing, if the archive is to last.) Nemo 07:16, 18 March 2020 (UTC)

My bot WP:WAYBACKMEDIC tests archives for availability and, if an archive is not available, switches it to a different provider. It keeps logs of how common the problem is. It does exist but is nothing to be overly concerned about: at any given moment probably less than 2% of Wayback links have a problem. Wayback has a lot of advantages. Every new URL added to Misplaced Pages is archived at Wayback within 24 hours. This is important for content-drift reasons, i.e. when content on the page changes over time. Wayback keeps archiving the page over time so users can choose newer versions if they want. Wayback has more, a lot more, than any other provider. Wayback is pretty good about soft-404s, whereas at archive.today the soft-404 rate is over 50%, making automated archiving with that service next to impossible at large scale. I have to manually check each page before it is posted (an automated soft-404 checker that I wrote gets it down to 15%, great, but still too high, thus manual checks, which are very laborious). -- GreenC 14:59, 18 March 2020 (UTC)
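GreenC's soft-404 checker isn't described in this thread, so the following is only a minimal sketch of one common heuristic, not that tool: a page that returns HTTP 200 is flagged as a soft 404 if its title contains typical "not found" wording, or if it is nearly identical to what the same site returns for a path that cannot exist. The phrase list and the 0.9 similarity threshold are assumptions, and fetching is left out so the logic can be tested offline.

```python
import difflib
import re

# Phrases often seen in "not found" page titles; an illustrative, non-exhaustive list (assumption).
NOT_FOUND_PHRASES = ("page not found", "404", "no longer available", "cannot be found")


def _title(html: str) -> str:
    """Extract the <title> text, lowercased; empty string if there is none."""
    m = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return m.group(1).strip().lower() if m else ""


def looks_like_soft_404(page_html: str, bogus_path_html: str, threshold: float = 0.9) -> bool:
    """Heuristically decide whether a page that returned HTTP 200 is really an error page.

    page_html: body returned for the cited URL (or its archived copy).
    bogus_path_html: body the same site returns for a random path that cannot exist.
    """
    if any(phrase in _title(page_html) for phrase in NOT_FOUND_PHRASES):
        return True
    # A page nearly identical to a guaranteed-missing page is treated as a soft 404.
    similarity = difflib.SequenceMatcher(None, page_html, bogus_path_html).ratio()
    return similarity >= threshold
```

In practice the "bogus path" body would be fetched from the same host (e.g. a random string appended to the site root) just before the comparison; sites that vary their error pages per request would need a looser threshold.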

I don't think it's critical to have an answer that works across all the Wikipedias.

Unless you're trying to say that only the bots should add archive URLs, there will be times when an editor may choose to add an archive URL, and given that there are times when archive.today has a good copy but Wayback doesn't, we need some guidance, not necessarily a formal policy. Yes, there is WP:Using archive.today, so that might seem like the right place, though it's missing at least some of the "nuts and bolts" of using archive.today, and I think different points of view may make that a problematic location. Is it permissible to have an "essay" which does not necessarily represent consensus? Fabrickator (talk) 08:35, 21 March 2020 (UTC)

Sure anyone can make an essay. -- GreenC 11:44, 21 March 2020 (UTC)

Fixing dead link on archive URL

I'm working on the Flashdance (soundtrack) article, and reference number 27 has an archive-url with a url-status=dead. I found the new location that's pretty much identical to the archive.org page that 27 points to. The new version is:

https://www.ifpi.fi/tutkimukset-ja-tilastot/kaikkien-aikojen-myydyimmat/ulkomaiset/albumit/

Is it OK to replace all of 27 with this? Should I just replace the archive-url with this and remove the status? I look forward to learning the best way to handle this kind of situation. Thanks so much! Danaphile (talk) 00:16, 23 March 2020 (UTC)

Here's what I suggest: set url to the live working url, leave archive-url alone, and set url-status to "live".

Since IABot will be inclined to mess with this, add {{cbignore|bot=InternetArchiveBot}} following the closing "ref" tag. (I'm crossing my fingers that other bots won't mess with it either.)

My rationale is that I want to leave a "bread crumb" to the existence of two different urls under which this content has been made available (ignoring the archive servers). If the live url stops working, the editor has both urls available from which to choose a suitable archive copy. Also, if there is some significant (but possibly non-obvious) difference in content, we have the old url available for reference. Fabrickator (talk) 03:48, 23 March 2020 (UTC)

Fabrickator, please do not hack the system. We are working on bots to clean up and fix these sorts of situations. The |url= and |archiveurl= need to match, and if they don't, it needs to be fixed. The problem is that when you put a different URL in the |archiveurl= than exists in the |url=, then when the latter dies there is no place to put its archive, because the slot is already taken by a different URL, and now we have stuck link rot. If you want multiple different URLs, use two citations.

Danaphile, in this situation delete |archiveurl=, |archivedate= and |url-status= entirely, and replace |url= with the new working URL. -- GreenC 12:11, 23 March 2020 (UTC)
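As a hedged illustration of the rule above that |url= and |archiveurl= must refer to the same page, here is a minimal sketch of how one might check a citation for Wayback Machine archive URLs. It relies on Wayback snapshot URLs embedding the original URL after the timestamp; the normalization rules (ignore scheme, a leading "www." and a trailing slash) are assumptions, not any bot's documented behavior. Archive URLs in other formats, such as archive.today short links, can't be checked this way and return None.

```python
import re
from typing import Optional

# Wayback snapshot URLs look like https://web.archive.org/web/<timestamp>[modifier_]/<original URL>.
_WAYBACK = re.compile(r"^https?://web\.archive\.org/web/(\d{1,14})(?:[a-z]{2}_)?/(.+)$")


def _normalize(url: str) -> str:
    """Ignore scheme, a leading 'www.' and a trailing slash when comparing (assumed rules)."""
    url = re.sub(r"^https?://", "", url.strip(), flags=re.IGNORECASE)
    url = re.sub(r"^www\.", "", url, flags=re.IGNORECASE)
    return url.rstrip("/")


def archive_matches(url: str, archive_url: str) -> Optional[bool]:
    """True/False if archive_url is a Wayback snapshot of url; None if the format isn't recognized."""
    m = _WAYBACK.match(archive_url)
    if m is None:
        return None  # e.g. archive.today short links do not embed the original URL
    return _normalize(m.group(2)) == _normalize(url)
```

Applied to the Flashdance case, the new ifpi.fi URL would not match the old archive snapshot, which is why the advice is to drop the archive parameters rather than mix the two URLs in one citation.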

Thanks so much! Danaphile (talk) 13:31, 23 March 2020 (UTC)

add Memento Project to 'Manual archiving'

I think that the Memento Project should be added to the list of handy web archiving sites. Namely, I would replace the sentence "Use a web archiving service such as Internet Archive or Archive.is." with "Use a web archiving service such as Internet Archive, Archive.is or Memento Project, although Memento Project has trouble finding all existing results." In my opinion, the Memento Project's ability to search so many archives at once (though not always with 100% performance) is too valuable to pass up for the price of a little extra user effort. --Entropfix (talk) 20:08, 11 April 2020 (UTC)

I wouldn't go out of the way to prominently recommend it. They have two systems. One is a user-facing "show all links at all providers" database. The other is a DIY approach where you poll each archive provider directly. The latter is pretty reliable because it reflects what actually exists at the archive provider at the time of checking, but of course it is slow, as polling each provider site directly can take a long time. The former is a cached database, maintained at the Memento website, of what existed at one time at each provider, but it is not kept current, so you end up with bad data due to changes on the archive provider's end that don't get reflected in the Memento database. Its advantage is that results return quickly compared to the DIY method, since it is a cache. Also, it is preferable to link directly to the final destination archive. -- GreenC 22:22, 11 April 2020 (UTC)
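The "DIY" approach of polling each provider is typically done with the Memento protocol (RFC 7089), in which a provider can expose a TimeMap listing its captures of a URL in link format. As far as I know the Wayback Machine serves one at web.archive.org/web/timemap/link/ followed by the URL, but treat that endpoint as an assumption to verify. This minimal sketch only parses an already-fetched TimeMap (the network step is omitted) and returns the direct memento URIs, which are what you would link to rather than an aggregator page.

```python
import re
from email.utils import parsedate_to_datetime

# One link-format entry: <URI> followed by its ;-separated attributes up to the next "<".
_ENTRY = re.compile(r"<([^>]*)>([^<]*)")
# Quoted attributes such as rel="first memento" or datetime="Sun, 20 Jan 2002 14:25:10 GMT".
_ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_timemap(link_format: str):
    """Return [(memento_uri, datetime)] sorted by capture time from a link-format TimeMap."""
    mementos = []
    for uri, attrs in _ENTRY.findall(link_format):
        params = dict(_ATTR.findall(attrs))
        rels = params.get("rel", "").split()  # rel may hold several tokens, e.g. "first memento"
        if "memento" in rels and "datetime" in params:
            mementos.append((uri, parsedate_to_datetime(params["datetime"])))
    return sorted(mementos, key=lambda m: m[1])
```

Entries with rel="original", rel="self" or rel="timegate" are skipped, so only actual captures are returned; an editor or bot can then pick the capture closest to the citation's access date.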

section rot (anchor rot)

In Comparison of free and open-source software licences#Approvals, at the row for “Unlicense”, the link for OSI approval contains an anchor to the section unlicense. The link itself does return the article, but the section has been removed, so the browser ignores the anchor. I saw this multiple times (though a low percentage). Is there some way to mark it? I guess {{deadlink}} is inappropriate, but I found no alternatives. -- Franklin Yu (talk) 22:32, 16 May 2020 (UTC)
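A page can load fine while the #fragment in the citation URL no longer points anywhere. As a minimal sketch of how an editor or bot might detect this kind of anchor rot, the code below parses the page HTML and checks for an element whose id (or an a element's name) matches the fragment. It assumes the anchor appears in the static HTML; pages that build their sections with JavaScript would give false negatives.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class _AnchorCollector(HTMLParser):
    """Collect every id= value, plus name= values on <a> tags, which browsers treat as fragment targets."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if value is None:
                continue
            if key == "id" or (key == "name" and tag == "a"):
                self.anchors.add(value)


def fragment_exists(html: str, fragment: str) -> bool:
    """True if the page's HTML contains an element the #fragment could scroll to."""
    collector = _AnchorCollector()
    collector.feed(html)
    return unquote(fragment.lstrip("#")) in collector.anchors
```

For the Unlicense row, running this against the fetched OSI page with the fragment "unlicense" would return False once the section is gone, even though the HTTP request itself succeeds.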

Archive.Today in 2010-2012?

The article claims that Archive.Today archived URLs between 2010 and 2012.

But Archive.Today has only existed since 2012. --84.147.40.124 (talk) 10:21, 17 July 2020 (UTC)

Automatic archiving question

The claim about automatic archiving has been empirically untrue for me. I have created, or significantly rewritten and improved, over 15 Misplaced Pages pages, and every time I use IABot only some, half or, if I'm lucky, most of the links are archived. Every single time I find that I have to make custom archive URLs myself, either through the Wayback Machine or Archive.is. So what is the deal? Is this statement in the lead section inaccurate, am I doing something wrong, or both? Does the Misplaced Pages automatic archiving need to be fixed? And if so, how would I go about correcting the archiving, or notifying the person in charge? Factfanatic1 (talk) 13:53, 18 August 2020 (UTC)

If it can be verified I can report it to InternetArchive, who maintain the program. It's simple, add some URLs to a page and see if the archives show up on the WaybackMachine. If they don't show up post the diff here of the URLs that were added. -- GreenC 14:19, 18 August 2020 (UTC)

@GreenC: But in the future when I rewrite and create further pages, what do I do?

You add new URLs. Check if those URLs get archived at the Wayback Machine. Give it 24 hours; it should happen automatically. If they are not at archive.org after 24 hours, post here which URLs. -- GreenC 18:19, 18 August 2020 (UTC)
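One way to do the 24-hour check without clicking through each URL is the Wayback Machine's Availability API at archive.org/wayback/available, which takes a url parameter and an optional timestamp. The sketch below only builds the request URL and reads the JSON reply; the network call is omitted, and the response shape (an archived_snapshots object with a closest entry) reflects my understanding of the API, so verify it against the current documentation before relying on it.

```python
import json
from typing import Optional
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build an Availability API request for `url`, optionally near a YYYYMMDDhhmmss timestamp."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the closest archived snapshot URL from the API's JSON reply, or None if there is none."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

A None result a day or more after the URL was added to an article is the case GreenC asks to have reported here.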

@Factfanatic1: You can run the page through the archiving bot, and it will automatically archive those links and add them to the page. In fact, I recommend you do this, as, like you, I have found dead links that couldn't be archived by doing this beforehand. I've then had to delete those links altogether, as well as any information pertaining to them on the page. Samurai Kung fu Cowboy (talk) 16:44, 7 February 2021 (UTC)

@Samurai Kung fu Cowboy: the bot (no longer) automatically creates new archives at the Wayback Machine; that is done by a separate process. IABot adds archive URLs into Misplaced Pages if they preexist at Wayback (or really in the IABot database iabot.org, which is a cache of Wayback). -- GreenC 17:09, 7 February 2021 (UTC)

Link rot through redirections

I've seen this before, but never thought to ask about it until now. While editing Crime in Venezuela I found a link to the website http://www.scotsman.com/news/venezuela-police-corruption-blamed-for-kidnapping-epidemic-1-1667444

If you follow the link, it takes you to an HTTP 301, or a redirection to https://www.scotsman.com/news/wait-justice-over-teachers-killer-gets-life-prison-1667444

The internet archive has an archive of the original article, but recent archives have also been redirected to Site #2 and that has been saved instead. What should I do in this situation? Link the old archived site and warn people never to update the link because all further links will be broken? Advice appreciated Ph03n1x77 (talk) 20:29, 18 August 2020 (UTC)

Oh soft404s, they are difficult. I just edited the article to use {{cbignore}} which tells the bots not to touch the cite. -- GreenC 00:01, 19 August 2020 (UTC)

Internet Archive backups for Template:External media

I know that, when a URL is used as a reference on Misplaced Pages, the Internet Archive makes sure to back it up soon after if it has not already done so. Is there a similar practice in place for URLs used in Template:External media? I'd think we'd want to do our best to fight link rot. {{u|Sdkb}}  05:38, 28 August 2020 (UTC)

As long as a link is registered in the mw:externallinks table, it's picked up. Nemo 07:03, 28 August 2020 (UTC)

Nemo bis, oh awesome, I guess we're already all set, then. (I'm not too familiar with MediaWiki, but I'm assuming it's registered somehow.) {{u|Sdkb}}  07:56, 28 August 2020 (UTC)

Archiving YouTube links

Hello. I'm pretty sure I saw somewhere on Misplaced Pages a YouTube video archiver that actually worked. I've been looking for this site for a while but I can't find it again. Does someone know where I can find it? Yes, I tried archive.is as pointed out in #Archiving youtube videos, but it didn't work. The Wayback Machine and all the mainstream archivers I can think of don't correctly archive YouTube videos, so a site that actually archives the videos would be extremely helpful to prevent YouTube link rot. I know it exists, and AFAIK, it was linked in a Misplaced Pages policy about YouTube links/link rot/etc., but I can't find it again. GhostP. 21:20, 9 November 2020 (UTC)

User:GhostP., the only method I know of is Conifer (formerly Webrecorder.io). Would need to place a {{cbignore}} at the end of the cite template (after the closing }}) otherwise the bots could fail to recognize it and remove the link. -- GreenC 21:59, 9 November 2020 (UTC)

The main problem with Conifer is that your account has a cap on disk space usage, so you can only archive so many links before running out. You could possibly sign up for multiple accounts, but eventually they will catch on. They also tend to frown on copyrighted content, so anything not totally free could have an uncertain future. It's really designed for academics and the arts, not mass archiving. It can work when used with discrimination. -- GreenC 22:04, 9 November 2020 (UTC)

That's the site I was looking for, thanks :) GhostP. 22:18, 9 November 2020 (UTC)

You can also use tubeup.py, a script that will download the video locally and then generate the metadata required to fit Web Archive's template and then upload there automatically. --94.105.103.120 (talk) 04:51, 2 February 2021 (UTC)

Archive links in citations with live URLs?

Hello. Couple of questions.

Are you supposed to put archive links in citations with live URLs? Or are you supposed to omit them?
At this revision of Brooks Brothers riot, for citation #2, even though the citation is set to url-status=live, it is still displaying the archive link first. Is this normal? If the URL is live, shouldn't it only display the live URL, or display both but place the live URL first?

Thanks a lot. Looking forward to your feedback. –Novem Linguae (talk) 00:08, 12 February 2021 (UTC)

Novem Linguae if you don't get an answer here in a day or so you might try asking at the WP:VPT. MarnetteD|Talk 00:55, 12 February 2021 (UTC)

@Novem Linguae: I assume that this wasn't resolved, as the page now excludes the archive link. To answer your question, though, the CS1 template will automatically assume that the original url is dead if the "archive-url" parameter is populated, effectively setting "url-status=dead" by default. This has to be overridden by manually inserting "url-status=live" into the CS1 template. I don't think this can be automatically detected because in cases where the site dies and gets replaced by another (e.g. an ad saying the domain is for sale), it would be detected as live when the content is no longer present.
From the revision you posted,
<ref name="MillerWapo">{{Cite news|url=https://www.washingtonpost.com/history/2018/11/15/its-insanity-how-brooks-brothers-riot-killed-recount-miami/|title=‘It’s insanity!’: How the ‘Brooks Brothers Riot’ killed the 2000 recount in Miami|first=Michael E.|last=Miller|work=Washington Post|date=2018-11-15|archive-url=https://web.archive.org/web/20181212221657/https://www.washingtonpost.com/history/2018/11/15/its-insanity-how-brooks-brothers-riot-killed-recount-miami/|archive-date=2018-12-12}}</ref>
the CS1 template does not include "url-status=live". Therefore, it assumes that it was dead. Inserting the additional parameter "|url-status=live" would have listed the live site first. -2pou (talk) 17:54, 28 April 2021 (UTC)
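As a hedged illustration of 2pou's explanation (and of GreenC's later note on "unfit"), here is a simplified model of which link CS1 puts first for each |url-status= value. It is not the actual Module:Citation/CS1 code: it ignores all other parameters, and treating "usurped" the same as "unfit" is based on my reading of the CS1 documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rendering:
    title_link: Optional[str]      # URL the citation title points to
    secondary_link: Optional[str]  # URL in the trailing "Archived from the original" note, if linked
    original_linked: bool          # whether the original |url= is clickable at all


def cs1_rendering(params: dict) -> Rendering:
    """Simplified model of how CS1 orders |url= and |archive-url= for each |url-status=."""
    url = params.get("url")
    archive = params.get("archive-url")
    if not archive:
        return Rendering(url, None, True)
    # With an archive-url present, a missing url-status behaves like "dead".
    status = params.get("url-status", "dead")
    if status == "live":
        return Rendering(url, archive, True)    # original first, archive in the trailing note
    if status in ("unfit", "usurped"):
        return Rendering(archive, None, False)  # original shown as plain text, not hyperlinked
    return Rendering(archive, url, True)        # dead: archive first, original in the note
```

For the Washington Post citation above, omitting |url-status= yields the archive as the title link, while adding |url-status=live swaps the order.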
2pou, thanks for the detailed answer. Much appreciated. Another quick question. Is it good practice to run IABot / add archive URLs for live links? I've seen this happen in articles a couple of times now, and I've also seen people remove them if the links are still live, so it is unclear to me which practice is best. Is our goal to pre-emptively archive all citations? Is that good practice? Or should we wait until the links die, then archive? Thanks. –Novem Linguae (talk) 23:34, 28 April 2021 (UTC)
Novem Linguae You can check out a whole discussion at the village pump for additional thoughts, but I think that archiving links is a good thing, personally. Sometimes an entire site will get migrated somewhere, and a lot of old content gets lost. I tend not to do it, knowing that it can always be retrieved later, but I think it depends on the editor. I'm surprised people would remove them, though. I believe, per that linked thread, a lot of people use the archive all links function to prepare for a GA/FA review. -2pou (talk) 17:49, 29 April 2021 (UTC)

Another automatic archive question

The description states that links are automatically archived when URLs are added to main space articles. Does anyone know if this behavior carries over to all links in an article that are moved from User space or Draft space into the Main space? Will all those links be archived when reaching Main space, or is the mechanism only triggered when a brand new URL is added to the existing Main space? Does it matter if the page has been indexed yet via a reviewed status on the new pages feed? (Courtesy ping @GreenC: who was responsive to the previous, similar question. Also not really sure the best forum to ask this...) -2pou (talk) 17:53, 28 April 2021 (UTC)

This is a good forum. Those are good questions. I do not know for certain. However, I believe it monitors mainspace only (easy to test). And only triggers if the URL was never added to the Wayback Machine before (also easy to test). The indexing via NPP shouldn't matter as it is using the EventStream API which is not aware of things on that level.-- GreenC 21:07, 28 April 2021 (UTC)

WARNING! Potential CENSORSHIP concerns. When is it appropriate to MASS delete 'working links' and working archiveURLs from a citation? Need more eyes, and policy help on this.

I'm having a discussion with a taciturn user who is deleting very large numbers of URLs from references, across wikipedia, refuses to use edit summaries and insists that it's OK to delete URLs AND archiveURLs. (e.g. all of "url=http://www.biology.leeds.ac.uk/staff/tbt/Papers/JLTetal_TREE04.pdf |archive-url=https://web.archive.org/web/20060321002708/http://www.biology.leeds.ac.uk/staff/tbt/Papers/JLTetal_TREE04.pdf").

The user at first justified it as based on copyright violation, and when that didn't fly, eventually said it was because there were doi= parameters. DOIs should be persistent, but in practice they are not. They're sometimes deleted for political reasons, e.g. papers that are politically embarrassing to the governments of host countries or that expose corruption. I gave examples, but the user isn't having it. It's important that we not allow these to be MASS deleted. If PAGs already forbid this, I'm seeking help with clarification and enforcement, and if not, I'm seeking help making it so. --50.201.195.170 (talk) 21:52, 4 May 2021 (UTC)

Is KDL example URL really representative?

The kind of dead links I'd like to scrap are much less recoverable than anything like the example URL in the KDL section. For example, in college football there is a University of Florida Health Science Center Libraries proxy link to something on EBSCO Information Services, but the URL looks like an expired session entry that no one will ever be able to access, no matter who they contact in Florida:

http://web.ebscohost.com.lp.hscl.ufl.edu/ehost/pdfviewer/pdfviewer?sid=faf8af2a-6b90-49ff-bea2-ac199cafedc5%40sessionmgr111&vid=2&hid=125

— Chris Capoccia 💬 22:18, 3 June 2021 (UTC)

YouTube unlisted video changes

YouTube is breaking links to pre-2017 unlisted videos in 30 days for security reasons unless the channel requests otherwise.

I know of one article with a citation to an unlisted video: Programadora (ref 41). Want to make you aware in case more links are broken and if anything can be done to archive videos in this situation. Sammi Brie (she/her • t • c) 05:36, 24 June 2021 (UTC)

Archive.org should be able to archive youtube videos. If not, the metadata and the archived page (just without video) should be there, so use that at the very least if you encounter an unlisted video maybe?

In your case, they have the metadata and the video: https://web.archive.org/web/20160309105430/https://www.youtube.com/watch?v=hLQkvxYPHUw. I added the link to the article you were talking about.

However, I think archive.org only archives certain videos. Hopefully a site that can archive videos on demand comes about soon. Rlink2 (talk) 15:19, 5 September 2021 (UTC)

Another question about citation url-status: what counts as dead vs. unfit?

If a citation's link points to the same website, but the article itself was removed, what would the appropriate url-status be? I'm unsure whether to mark it as unfit or just plain dead. Or, for that matter, what counts as "unfit" in the first place? I checked through the talk page and talk archives here but didn't see anyone else ask this, and I'm not entirely sure I understand the mechanics of how this affects the display and bot behavior. Could someone help? — Preceding unsigned comment added by Macks2008 (talk • contribs) 21:23, 3 July 2021 (UTC)

Setting it to unfit has only one consequence: the URL is no longer hyperlinked (blue). This would normally only be done if you don't want people clicking through to the link, which is usually the case when the domain has been hijacked by spammers or porn sites. Otherwise, if it's just a 404 or soft-404 (e.g. to the same website, but the article itself was removed), it would be set to dead. -- GreenC 00:47, 4 July 2021 (UTC)

Webcite down?

Webcitation.org has been down for about a week or two now. Does anyone have any contacts to ping them and see if anything's going on? Rlink2 (talk) 20:52, 2 September 2021 (UTC)

See talk pages of WP:WEBCITE and WebCite. -- GreenC 01:37, 3 September 2021 (UTC)

I noticed the same. Shouldn't this be reflected in the Internet archives section of our very help page? Cheers! Jayaguru-Shishya (talk) 19:42, 8 December 2021 (UTC)

Half-archived youtube links

Sometimes, when a YouTube video is already dead, archives will have a "half-archived" version of the page where the comments, description, uploader, etc. are all saved but not the actual video. In this case, should an archived link be placed, or should it just be marked as permdead with no archive? Rlink2 (talk) 14:36, 9 October 2021 (UTC)

Alternate vs alternative

Dhtwiki I know that the use of "alternate" meaning "alternative" is a common US usage, but even Merriam Webster explains why it's not good usage. See (UK) Cambridge here and here; Collins (alternate); US sources here and here; and Australian sources here and here. I could go on! "Alternative" works for everyone. Laterthanyouthink (talk) 00:00, 14 October 2021 (UTC)

Keeping dead links

Greetings!

To cut a long story short, at the Keeping dead links section of our very help page, it is said that:

Do not delete a citation just because it has been tagged with {{dead link}} for a long time.

I understand that on some occasions this is advisable (like in the example given w.r.t. the Yale PhD thesis), but isn't removal exactly what we want in the case of 1) permanently dead and 2) questionable (low-quality) sources that cannot be verified? Any material that fails to be verified may be removed anyway, so I think such an addition creates more confusion than clarity.

Therefore, I'd like to suggest to remove the very last sentence of the paragraph. Removing unrecoverable links would make things clearer, and would allow other editors to look for alternative sources after replacing the {{dead link}} tag with a more appropriate {{citation needed}} one.

After all, this is a help page, and I think we shouldn't give such strong imperatives here. Just might confuse some editors already looking for answers to their questions and doubts even more.

Cheers! Jayaguru-Shishya (talk) 20:35, 8 December 2021 (UTC)

Just because a citation no longer has a valid URL, doesn't mean it isn't a valid source: a published book, for example, where a link to an online preview has been lost, but the book itself supports article text, something that could be confirmed by consulting a print version at the library. The article might benefit from even apparently dead-linked, purely web-based information continuing to be referenced, as someone might be prompted to find an archive snapshot or link to a page at a reorganized website with the citation left in place. Dhtwiki (talk) 22:42, 8 December 2021 (UTC)

@Dhtwiki: True. I should have clarified better that I am referring to web-only pages, not to the printed ones. Of course, if we are talking about printed sources, the removal wouldn't apply to them (since they're ... well, printed ones).

However, the current help page gives an erroneous idea that even some permanently dead, low-quality web pages may be kept in an article — thus preventing further tagging for {{citation needed}}, and the eventual removal of unreferenced material — just on the grounds of being tagged by {{dead link}}. Jayaguru-Shishya (talk) 23:14, 8 December 2021 (UTC)
