Misplaced Pages:Requests for comment/Archive.is RFC

This is an old revision of this page, as edited by Hasteur (talk | contribs) at 20:56, 27 September 2013 (Undid revision 574789004 by 79.47.98.149 (talk) It's exceedingly poor form to edit others comments... Please feel free to quote, but DO NOT edit other people's comments.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Recent events related to archive.is have left Misplaced Pages's links to that service in a state that requires a community decision.

Background

"Archive.is" is a website that functions similarly to the more established Wayback Machine: both provide an archiving service that saves snapshots of web pages from across the internet in a large repository. If an archived page becomes unavailable at its original location, or its content is removed or changed, these services provide a static backup that can be linked to with presumably more assurance that the content will remain online and intact. Archive.is is a newer service that competes with the much older Wayback Machine. Misplaced Pages articles have commonly linked to the Wayback Machine's copies of web pages in their references in order to combat link rot.

A bot called RotlinkBot, created by User:Rotlink, has recently begun linking Misplaced Pages articles to the new Archive.is service. This bot was not approved, and was therefore subsequently blocked.

Following this block, the bot was used in an anonymous operation using IPs from three different Indian states, Italy, Hong Kong, Vietnam, Bulgaria, Qatar, Latvia, Hungary, Slovakia, Romania, Brazil, Argentina, Portugal, Spain, France, Mexico, Austria, and South Africa, raising strong suspicions that the IPs were not being used legally. These IPs were subsequently blocked, as was User:Rotlink, who has self-identified as the owner of archive.is. Rotlink has not commented on any of the blocks.

Over 10,000 links to archive.is remain on Misplaced Pages.

Points to consider

Archive.is is a relatively young archiving service.
No one has found any problems with the quality of archived links. So far as anyone can determine, archive.is is presenting an accurate record of all material it claims to archive.
In this discussion, User:Rotlink identifies himself as the owner of archive.is.
Rotlink wrote User:RotlinkBot, a bot which created links. It was unapproved, and blocked because of unapproved operation. Again, the bot seemed to operate reasonably well: minor defects were noted, but nothing serious. The motivation for the block was the unapproved operation.
RotlinkBot did not exclusively add links to archive.is: it added links to other archiving sites as well, and apparently in preference to archive.is in some cases.
On September 3, 2013, 94.155.181.118 began inserting links to archive.is, as well as links to other archive sites. This appears to be RotlinkBot running anonymously.
By September 17, 2013, the list of IPs inserting such links had grown. It included at least the following:
- 188.217.203.245
- 61.15.46.216
- 27.3.85.26
- 84.43.147.53
- 176.202.105.40
- 117.239.64.166
- 117.223.161.182
- 87.110.16.100
- 89.132.64.81
- 78.98.25.91
- 89.34.75.123
- 189.34.9.60
- 186.19.57.19
- 188.251.236.114
- 87.223.115.147
- 83.157.124.218
- 90.163.51.63
- 85.66.241.59
- 89.36.214.186
- 95.168.56.11
- 117.215.1.168
- 187.208.150.144
- 78.142.126.177
- 105.236.16.88
- 41.228.51.25
- 89.228.46.37
- 60.50.51.210
- 190.57.181.70
- 62.63.132.36
- 122.178.159.163
- 178.79.34.86
- 109.175.88.133
- 190.244.69.154
This list included IPs from three different Indian states, Italy, Hong Kong, Vietnam, Bulgaria, Qatar, Latvia, Hungary, Slovakia, Romania, Brazil, Argentina, Portugal, Spain, France, Mexico, Austria, and South Africa.
Based on that pattern of IPs, User:Kww concluded that not only was RotlinkBot being used anonymously in violation of its block, but that the IPs being used were likely to be anonymous proxies or a similar form of botnet. He blocked Rotlink, all of the IPs, and a few more IPs that were discovered later.
He called for edits by the IPs to be rolled back at WP:ANI: https://en.wikipedia.org/search/?title=Misplaced Pages:Administrators%27_noticeboard/Incidents&oldid=573791554#Mass_rollbacks_required
Many editors and admins reverted.
At this point, over 10,000 links to archive.is remain in Misplaced Pages.
At this point, User:Kww has no firm proof of illegal activity, although he remains of the opinion that this is likely.
User:Rotlink has made no comment in respect to his block.

The current situation is awkward. It's impractical to place the link on the spam blacklist, because the spam blacklist will interfere with editing any of the articles that contain a link to archive.is. It seems strange to have so many links, but to claim that no more links can be added. Several editors view the rollbacks themselves as destructive. We need to figure out how to go forward.

There would appear to be several options.

Options

1. Remain where we are

Misplaced Pages is notoriously inconsistent, and this is just one more case. There's no need for blacklisting, no need to remove the existing links, and no need to restore links that were removed due to the improper bot use.

Support Per my comment in the discussion section below. I put a lot of work into changing over broken links to archive.is links when pinkpaper.com went offline. I don't see why my hard work should be undone because of someone else's misbehaviour. If someone spammed links to BBC News all over Misplaced Pages, and we found it was a BBC News employee, that wouldn't change the fact that BBC News is a useful and reliable source. Behaviour issues on the part of this unapproved bot operator doesn't change the fact that archive.is remains a useful service that ensures a fair number of references are actually verifiable. —Tom Morris (talk) 12:52, 21 September 2013 (UTC)
Support I added many archive.is links to snooker-related articles, because they can't be found on any other archiving service due to nobot restrictions. Armbrust 16:45, 21 September 2013 (UTC)
Support. It's a sticky situation, but I see no need for knee-jerk reactions such as using a bot to remove all links. Don't fight fire with fire. (sorry for the cliché...) — This, that and the other (talk) 01:55, 27 September 2013 (UTC)

2. Revert the reversions

Since no one has found a problem with the existing links that were reverted, the reverted links should be restored.

I'm going to park myself here, although I do think those added after Rotlink's indef by IPs shouldn't be reinstated. Kww seems to be going around in circles with whether they want the site blacklisted or not, and are jumping to conclusions that simply have no foundation whatsoever. Lukeno94 (tell Luke off here) 12:51, 21 September 2013 (UTC)

3. Complete removal of archive.is

We should write a bot which searches for all links to archive.is, replacing them when possible, and removing them when not. When this bot is complete, archive.is should be placed on the blacklist.
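The core of such a bot could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not an approved or existing bot: the URL pattern, the function name `rewrite_archive_links`, and the `replacements` mapping (archive.is URL to a substitute archive URL, which some earlier lookup step would have to produce) are all hypothetical.

```python
import re

# Illustrative pattern for archive.is URLs in wikitext (an assumption,
# not a pattern taken from any real bot). It stops at whitespace and at
# characters that usually end a URL in wikitext: | ] } <
ARCHIVE_IS_URL = re.compile(r"https?://archive\.is/[^\s|\]}<]+")


def rewrite_archive_links(wikitext, replacements):
    """Replace archive.is URLs that have a known substitute; strip the rest.

    `replacements` is a hypothetical mapping from an archive.is URL to a
    substitute archive URL. Returns the rewritten text and the list of
    URLs that were removed because no substitute was known.
    """
    removed = []

    def substitute(match):
        url = match.group(0)
        if url in replacements:
            return replacements[url]
        removed.append(url)
        return ""  # no substitute known: drop the URL

    new_text = ARCHIVE_IS_URL.sub(substitute, wikitext)
    return new_text, removed
```

A real bot would also have to tidy up whatever surrounded a removed URL (for example an emptied citation parameter or bracketed link), which this sketch does not attempt.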

I prefer this option. It is based primarily on my belief that the IPs were not being used legally. This makes me distrust the motives of archive.is, and suspicious that we are being set up as the victim of a Trojan Horse: once the links to archive.is are established, those links can be rerouted to anywhere. If illegal means were used to create the links, why should we trust the links to remain safe?—Kww(talk) 15:57, 20 September 2013 (UTC)
Support this option as second choice.--v/r - T P 21:01, 20 September 2013 (UTC)
Support with no prejudice to human readdition - do not blacklist the link. A bot cannot determine whether the link is appropriate, but if a human editor does, he should be free to add it. ~Charmlet 21:45, 20 September 2013 (UTC)
Support. This is a new and uncertain operation, and there are serious questions about its ethics and stability given what has happened. When the operation has been around long enough to show it is trustworthy, we can reconsider using it then. But as it stands, we should remove it completely and flag it as questionable so as to save editors from working on creating links to it, which later either break down if the operation closes, or lead to adverts as the owner indicates might happen. The Misplaced Pages article on the operation itself is currently at AfD, with six delete comments and one keep: Misplaced Pages:Articles for deletion/Archive.is. The company may have been using Misplaced Pages to make themselves known, and to pave the way for the site owner to make a profit. It is not our purpose to promote or advertise any company. Alexa shows that Misplaced Pages is the website's fourth largest direct supplier, and an indirect supplier via mirrors and Google searches. The website is a start up that is relying on Misplaced Pages to build traffic. The owner has indicated that ads may appear after 2014. We should wait until the operation has proved itself before setting up thousands of links to what may become an advert site. SilkTork 09:30, 22 September 2013 (UTC)
Let me dispute your assertions point by point.

1. You used the words "uncertain operation", when in fact it has high quality archives, in HTML/CSS (mal-Javascript free) and image form for each page archived. Its acquisition of a page is quite certain and reliable because it is not a crawler (it only archives a single page for any given citation), and it, like WebCite, is not subject to the vagaries of robots.txt (web.archive.org's required Achilles' heel). It offers DMCA-based content removal for those website owners who do not wish their content archived, and since it only archives one page of a site upon request, site owners do not have an onerous task requesting thousands of removals; just (likely) one. Its uptime has been 100% as far as I have been able to determine. No, archive.is has been quite certain. You seem to have forgotten the long archive.org outage of a couple of years ago, when it apparently stopped archiving pages for no publicly disclosed reason. So archive.is is more "certain" than web.archive.org, so far, certainly in terms of memorable outages. And there have been Webcitation.org outages, and a threatened cessation of new archiving this year due to funding problems. So archive.is is more "certain" than that.

2. Linkrot stops for no one. At the time of the Wayback Machine's long dark "pause", let's call it, as I saw links in citations of RS, including web-only RS, disappear, I was alarmed. There were a few alternate archiving services, several have been listed at WP:LINKROT at various times; I tried to use all of them, but six months, or a year or two later they were gone, or went to a subscription model. Archive.is has now lasted IMHO long enough (9 full months) to merit a measure of respect and forbearance of minor transgressions (which only occurred, I think, this month). And yes, the recent events have been minor, and, I contend, in service of the Five Pillars, with no obvious commercial intent.

3. Commercial intent including advertising: Recall how the Wayback Machine works. Alexa Internet archives interesting pages, as measured by its browser plugin and other means, as well as crawling the web, for its commercial, paying clients. Alexa, several months later, releases that archived content to web.archive.org. In other words, the archive is the handmaiden to the commercial service. If you object strongly to any hint of an organization being funded, then you would logically stop supporting the use of web.archive.org links.

4. Notability: You assert that Archive.is went to AfD. Ok, so it was nominated without notifying any other interested editors (like recent editors of the article) per WP:AFD#, and was deleted due to alleged non-notability. That's not a reason not to use the service, and so is moot. WebCite itself was non-notable, and we used it. Now it's notable. So what? Observe the article WebCite - is that really notable enough for an article? Observe that aside from one NYT mention, the cited sources are all primary, or cowritten by its founder, Eysenbach. I'm not advocating AfD, but hey, it's vulnerable. So are you then going to campaign for webcitation.org link removals? I wouldn't think so.

5. "The company may have been using Misplaced Pages to make themselves known" - this is bald conjecture. "May" cannot be a valid reason for removal of all links to it, because the benefit to Misplaced Pages completely outweighs the possible benefit to archive.is. We also have no evidence that archive.is is a "company" in the sense of a commercial venture at this time. Is it a company? We do not know that. Also, only deadlinks would result in an interested reader clicking on the blue "archive" link in the citation and seeing the archive.is page. Archive.is links in {{cite web}} appropriately only assert "deadurl=yes" for dead links. If it was a pure linkspam play as you fear and infer, the bot would have asserted |deadurl=yes for all filled-in cite web templates. Your argument to delete all archive.is links does not logically follow from the occasional link traffic entrained from readers clicking on citation links to verify them. Further, we have no evidence of the size of traffic outbound to archive.is. Basing such a scorched-earth action on so little evidence is not appropriate.

6. You write "It is not our purpose to promote or advertise any company. Alexa shows that Misplaced Pages is the website's fourth largest direct supplier, and an indirect supplier via mirrors and Google searches." I suppose it's just ironic, but Hey, Look, Misplaced Pages directly advertises Alexa statistics on every website article we host. In fact Alexa shows that Misplaced Pages is the 7th largest source of referer traffic to Alexa.com. Alexa is a much larger commercial organization than archive.is. We don't have actual traffic numbers (thanks, Alexa!) so direct comparisons are floppy for now, but please don't play the holy card about directly linking to commercial sites. As for advertiser-supported sites, don't forget the genuflecting at film articles, in nearly every single review section, first citing and linking to Rottentomatoes.com and Metacritic.com, those oh-so-reliable sources (Fox News quotes them, so they must be reliable!). My point? 'Commercial' and 'ads' aren't problems; it's the value provided to Misplaced Pages that matters. The community thinks that those commercial sites add enough value to overlook the ridiculous blatant conflict-of-interest advertising on them. But there aren't ads at archive.is now, so ads, and the fear of ads, just can't be considered relevant now.

7. It's easier to ask forgiveness than permission. So, we should, lacking any proof of illegality, in fact, just go ahead and forgive the recent alleged bot transgressions without even being asked, because on balance, Misplaced Pages wins with archive.is, and loses without it as can be seen in dozens (hundreds) of dead links not held at archive.org or webcitation.org.

My point? We should simply welcome any service which can stay reliable and freely accessible (preferably ad-free), and do everything possible to help that service comply with content and behavioral community standards, and keep trying indefinitely, because Misplaced Pages needs verifiability of web content, just as much as print, TV, and radio content. Publishing is publishing, and archiving of ephemeral, but reliable sources, is important. More important to me, than to you, apparently, but I do hope to convince you. --Lexein (talk) 11:20, 25 September 2013 (UTC)
The problem is that people cannot fit the behaviour into any customary category. It does not look like a usual editor's behaviour, it does not look like a usual spammer's behaviour, etc. They do not know what to expect from it. They still cannot stop the bot fixing the dead links. They have angst.

And as it is something unusual, the usual verbs cannot be applied to it. Can ants from outer space be "forgiven"? No. They can only be wiped out.

It is an existential issue, not a technical one.

That's why your brilliant arguments won't work. 95.225.130.13 (talk) 18:36, 25 September 2013 (UTC)
Support per SilkTork (moved from option 4). And after reading the FAQ, it seems apparent that this is a one-man operation. Combined with the possibility of ads in the future, and the evidence we have on his ethics (which make me doubt this will even be a viable ad-free service for as long as promised), we should clean this up while it's still somewhat manageable and wait to see what archive.is becomes before allowing our articles to become reliant on it (as otherwise we could end up with an even tougher problem to deal with). equazcion 09:48, 22 Sep 2013 (UTC)
According to a response on the website there are two people running the operation. So it was either the owner who has been inappropriately using Misplaced Pages or the owner's partner. Either way, not a good show. SilkTork 09:58, 22 September 2013 (UTC)
Support for the time being. Although I've encouraged Rotlink to follow procedure at every step and promptly addressed his BRFA, it is clear that the likelihood of an ulterior motive is very high, as he has circumvented our processes at every step once he realized they would take time. And there is absolutely no reason to be in such a hurry to add massive numbers of links to one's website unless one really wants to drive traffic to their website. I know this is speculation, and I'd like to be proven otherwise. But as long as this is a 1-man operation, has no proof of financial security, doesn't follow robots.txt, the owner is this impatient to add links and doesn't respond, uses anonymous proxies to further add links, and there are no guarantees that the website doesn't suddenly start serving ads, I cannot endorse this archival service. Per SilkTork, the ethics and stability are too uncertain. This service first has to prove it is well-meant, reliable, and open -- two of which are already under significant doubt. For example, we have Webcite as a perfect alternative and every link rightfully archived at archive.is could have been archived at Webcite. -- HELLKNOWZ ▎TALK 13:58, 22 September 2013 (UTC)
If you are going to compare it with WebCite, there are more points.
- Supporting robots.txt, which was designed for crawlers, is not relevant to on-demand archives. It prevents them from archiving pages from many sites.
- It is WebCite that is a 1-man enterprise experiencing financial problems, which could only escalate after moving to expensive Amazon EC2 cloud hosting. 77.110.134.11 (talk) 14:53, 22 September 2013 (UTC)
Oppose. I think the arguments of the supporters are very emotional. They appeal to ethics and try to predict the future. My vision of the future is:
- The reversion will be mass-scale vandalism.
- The editors will scream like User:Lexein. Most of them do not read ANI and RFC and do not take part in this discussion. But they get notified about the changes in their articles.
- Many sources are available only on archive.is (see ANI discussion for examples). The editors will have to circumvent the ban of the domain. Do you know how they do it for currently banned domains?
  - By using Google Cache. This will result in "nice" URLs http://webcache.google.com/___&q=http://archive.is/http://webcache.google.com/___. This URL is correct against the new rules but it is fragile.
  - By using WebCite. This will result in pages with HTML hosted on WebCite and images still hosted on Archive.is. Fragile to downtime of either service.
- Assuming that the bot was seeking traffic, it can also circumvent the domain ban using Google Cache or WebCite. Both keep JavaScript on archived pages and the script can redirect traffic anywhere. I would say it is even easier to steal traffic this way. 193.86.243.17 (talk) 07:49, 23 September 2013 (UTC)
I'm confused: are you saying that users would be right (or wrong) to oppose (like me) the punitive mass reversion of archive.is links? And how is anything I wrote "screaming"? Uncool. And I do read both ANI and RFC, so, wth? --Lexein (talk) 13:07, 24 September 2013 (UTC)
It is neither right nor wrong. I think a lot of editors who think like you are not interested in reading ANI and RFC. A lot of editors (actually, admins) who think like Kww do. If #3 won the consensus, it would only be as the result of that bias. The editors who think like you will learn about the decision when their articles get touched by the reversion edits. They will argue and ask wtf. The admins will answer "There was an RFC and if you did not read it, it is your problem. Now we have a solution by consensus and only have to reify it. It is too late to discuss". 88.15.83.61 (talk) 16:02, 24 September 2013 (UTC)

Sorry if the word "screaming" offended you. I picked it up from the ANI topic without thinking how rude it sounds. Sorry again. 88.15.83.61 (talk) 16:02, 24 September 2013 (UTC)

"Vandalism" is a deliberate attempt to compromise Misplaced Pages. A mass revert in good faith is not vandalism, since the aim is to improve Misplaced Pages, regardless of whether the result does so.

Circumventing Misplaced Pages policy is pointy as well as being against policy. If some editors do that, we can deal with it as it becomes a problem, but I don't think we should make up hypothetical problems to stop us doing things that are a good idea.

—me_and 16:51, 23 September 2013 (UTC)
I meant the following case (193.86.243.17 and I are the same person; the first IP is airport wifi): I would like to edit an article, to fix a typo or something minor. But when I try to save the page it is not possible, because the page has a link to a banned domain (my edit has nothing to do with the link). Maybe you have admin rights and have never hit this case, but it is very common. The editor has the choice: not to save his edit, to remove the link, to link it via bit.ly... oops, it is banned as well... then to link it via WebCite. A lot of links to WebCite, archive.is and Google Cache are there not because the original links are dead, but because they are banned. 88.15.83.61 (talk) 20:05, 23 September 2013 (UTC)
Oppose I support verifiability and protection against link rot. I oppose the assumption of bad faith against the operator(s) of archive.is by a large number of editors and administrators here. The operator(s) of archive.is have stated that advertising is very unlikely, because its operation is cheap, and funded by income from other projects. The quality of archive.is content is high, in general better than both archive.org and webcitation.org. Up until the alleged bot operations, archive.is was only an asset to Misplaced Pages. Its archive is still an asset. Misplaced Pages's Five Pillars call for building an encyclopedia with verifiable content, based on reliable sources. IMHO archive.is contributes to that, and every link to archive.is should be maintained, as long as the archive link doesn't remove the original broken link. --Lexein (talk) 13:07, 24 September 2013 (UTC)
Support Rotlink (and subsequently Archive.is and RotlinkBot) have burned a great amount of good faith from within the community. Rotlink was caught running an unauthorized bot and made some effort to get it approved. Rotlink withdrew the request for approval on the bot task. When it was discovered that a great many IP addresses were adding archive.is links in the same way that RotlinkBot was, there was cause for blocking on the grounds of suspicion that the bot had been distributed to a wide collection of sites (possibly mirrors for Archive.is?) and started up. That no explanation has been forthcoming is indicative (in my mind) that Rotlink knows they were caught in the cookie jar, and are trying to weasel their way out of accepting responsibility. Rotlink has shown an interest in furthering their nascent archiving service over the expressed viewpoint of Misplaced Pages. Therefore it is incumbent on Misplaced Pages to divest itself of this archiving service until it becomes a standard accepted elsewhere and we receive an accounting of Rotlink's actions and how they will resolve disputes such as this in the future. Hasteur (talk) 13:29, 24 September 2013 (UTC)
Strong Oppose as a stupid overreaction that damages the hard work numerous editors, including me, will have done. Anyone who has ever bothered to look for archives should know just how hard it can be to find archives of some links. And the questions over the future of archive.is are irrelevant; we're not about to advocate the removal of WebCite links, yet that has a MUCH less clear future. Lukeno94 (tell Luke off here) 15:34, 24 September 2013 (UTC)
Support Remove all archive.is additions and links, irrespective of who or what added them. Remove all references to archive.is. Apply a scorched earth policy to make it absolutely clear to everyone that setting up an archiving service, archiving hundreds of thousands of URLs mentioned in Misplaced Pages references and then adding those archive details to the references will not be tolerated. Tens of millions of Misplaced Pages references do not link to an archive copy. Increasing that figure by a few hundred thousand makes no material difference to the overall number. Additionally, WebCite seems to be in financial trouble. They could easily add adverts to fund their site at any time. Misplaced Pages should remove all WebCite archive links long before this happens. Apply the same scorched earth policy and teach these people a lesson. Make it clear that setting up an archive service, archiving hundreds of thousands of pages and then expecting hundreds of thousands of free links from Misplaced Pages is always going to be doomed to failure because the project rejects all such offers of 'help'. Fundamentally, there's more to this. There's no need to ever link to an archive copy of anything. Most material currently being archived will be of no interest to anyone in a hundred years' time. Truly interesting things stick around. By archiving hundreds of thousands of Misplaced Pages references, the "natural selection process" is being usurped with huge quantities of trivia being preserved that should not be. - 91.84.105.112 (talk) 16:08, 24 September 2013 (UTC)
91IP Please assume good faith on the actions of others (as I assume you'd want good faith assumed on yours). We are not supposed to enable/reward blocked editors ever. Hasteur (talk) 16:43, 24 September 2013 (UTC)
(NB: 91IP is not me, my vote is above). It is questionable which one of the solutions can be called "rewarding a blocked editor". I would say it is #3, as its consequences will draw big attention to the archiving problem in general and to archive.is in particular. Ill fame is also promotional. 88.15.83.61 (talk) 17:40, 24 September 2013 (UTC)
I'm strongly concerned that the 91IP is simply here just to make a point. Lukeno94 (tell Luke off here) 18:19, 24 September 2013 (UTC)

3a. Allow dead links to remain permanently in Misplaced Pages. Change archive.is links back to dead links to the original content. Let users find archived copies by themselves if they can.

It is not traditionally the business of an encyclopaedia to help readers to obtain out-of-print references, or their modern equivalents, as far as I know. Hypothetically, various third-party apps and third-party websites could choose to shoulder the legal risks, if any, of presenting modified versions of Misplaced Pages articles with archive links added. Misplaced Pages's content licensing allows this. Editors would be free from legal risks and would have more free time to add actual content to the encyclopaedia.--greenrd (talk) 19:39, 23 September 2013 (UTC)

Support as proposer primarily on the grounds of freeing up editor time. Automation is good - even if someone else is doing it.--greenrd (talk) 19:39, 23 September 2013 (UTC)
Comment. A lot of the originals behind links to archive.is, archive.org, WebCite and Google Cache are not dead. They are from domains banned in Misplaced Pages. That's why the editors had to use an archiving service. Reverting such links means using banned domains. This will prevent the articles from further edits. 88.15.83.61 (talk) 19:48, 23 September 2013 (UTC)
When hosts which formerly hosted RS content die, frequently their domain-name-squatted replacements host malware; this is a good reason for them to become blacklisted. Also correctly blacklisted are notoriously unreliable sources which host only user-generated or copyright-violation content. But for the first, I will link to an archive of a URL, from a time when the content was valid. I'm stating this only to point out that archives are not typically used to maliciously bypass the blacklist, it's to link to an archive of an actual reliable source. When I find spot archive links to bad sources (blacklisted), I mark them as {{dubious}} or remove them entirely and tag the claim {{citation needed}}. --Lexein (talk) 13:07, 24 September 2013 (UTC)
Oppose. Dead links are anathema. Verifiability is important. Misplaced Pages exists in an internet/web world: citing a source which is reachable by a URL, then letting that URL go dead with no archive of it, is wrong at several levels. Reading WP:LINKROT will help here. --Lexein (talk) 13:07, 24 September 2013 (UTC)
So is circumventing the community consensus. Want to guess which one is more of an anathema? Hasteur (talk) 13:36, 24 September 2013 (UTC)
Strong Oppose - What on earth are you on about Greenrd? This doesn't free up editors' time at all; in fact, it wastes it by ruining hard work, and will cause multiple GAs and FAs to fail various bits of their criteria. Utterly stupid idea. Lukeno94 (tell Luke off here) 15:32, 24 September 2013 (UTC)
UK law provides for the unauthorised copying of orphaned works, but (a) an onerous search for the rightsholder is (rightly in my view) required, (b) by definition web pages are not orphaned works when they are archived by archive.is because they still exist and (c) in any case this does not help English Misplaced Pages, which must follow the laws of the United States. Therefore I believe that doing it (which is what archive.is lets people do, unlike archive.org, which just does it itself) is onerous one way or another, and should be left to someone else. At least archive.org makes its own decisions about what to copy, so an editor cannot be accused of initiating the unauthorised reproduction of a work when linking to it.--greenrd (talk) 18:51, 27 September 2013 (UTC)
I think you can compare it with any photo hosting or code hosting service like Bitbucket. The user can enter the URL of a repository anywhere on the Internet, and Bitbucket will download and publish it without guessing how the content in the repo is licensed, until the copyright holder asserts their rights.

Or should we also ban bitbucket.org, github.com, sourceforge.net, flickr.com, facebook.com, ... because it is possible that some content hosted there was reproduced from another website without authorisation? 79.47.98.149 (talk) 19:20, 27 September 2013 (UTC)

4. Replace bot-added archive.is links where possible, leave human-added links intact

We should replace links to archive.is that were added by the bot, where possible. Where no replacement is available, the links should be left in place. Links added by human editors should be left in place as well.

The circumstances surrounding these links leave me uneasy about leaving them alone (a startup trying to establish itself by automatically spreading its links across Misplaced Pages, use of proxies, unapproved bot by unresponsive entrepreneur). However I'm wary of cutting off our nose to spite our face -- if they have the only viable links to the content we need for a substantial number of references, leave the links alone in those cases. But in situations where there is a replacement available at a different, reputable service, those links should be switched over. Links added by people should also be left alone, however, as editors should be allowed to link to whichever service they want. The pervasiveness of the bot-added links establishes a possible artificial trust among editors who see them that I think warrants undoing. equazcion 16:03, 21 Sep 2013 (UTC) (moved to complete removal)

This is a sensible option; the easiest way (albeit not a foolproof way) is to simply nuke those added by IPs. Lukeno94 (tell Luke off here) 18:47, 21 September 2013 (UTC)
Support, as long as archive.is remains ad-free and there remains no evidence the archives are not faithful renditions of the original sites. NE Ent
Support: I don't see enough evidence to support reverting legitimate editors' work, but I also believe that we should stop unauthorized bots from being able to edit Misplaced Pages even when their edits are ostensibly positive. —me_and 09:03, 23 September 2013 (UTC)
Support, with the caveat that it should be limited to articles that have been edited by the unapproved bot and its IPs, and that it not be blacklisted, due to it being reasonable for editors to add archive links, including this one. The only problem is the unapproved bot and its sockpuppets. Van Isaac WS Vex 00:19, 27 September 2013 (UTC)

5. Contact Rotlink off-wiki and get them to seek approval of their bot

Contact Rotlink off-wiki (using email perhaps) and encourage them to follow the community's process for bot approval so the bot can operate within policy.

Support as proposer and first choice.--v/r - T P 21:03, 20 September 2013 (UTC)
Support in addition to number 3. I don't support a bot for what I believe should be human judgement, but regardless, I think he should be allowed to use the community processes for approval if the community so wishes. ~Charmlet 21:45, 20 September 2013 (UTC)
Support a bot which would focus on, within Misplaced Pages citations, archiving and/or finding alternate URLs for notoriously ephemeral, doomed sources such as Google cache, AP (or better, news sites hosting AP articles), and publications which regularly purge old articles for no obvious reason, such as the various Murdoch media outlets. Such a primary focus would fill a gap not served by any existing bot that I know of. --Lexein (talk) 13:19, 24 September 2013 (UTC)

6. Copy Archive.is and WebCite content to Wikimedia-controlled server before it is too late

It is only 10 TB (Archive.is) and 2 TB (WebCite). A $500 question (3 * $165 for 4 TB HDDs). 193.86.243.17 (talk) 07:49, 23 September 2013 (UTC)

x3 for redundancy. Not that that makes it exorbitant or anything.--v/r - T P 13:29, 23 September 2013 (UTC)

It's never that cheap or that easy. Setting up and maintaining such a service requires more than simply the disk space. In any case, there are discussions about doing this for WebCite at meta:WebCite. —me_and 16:59, 23 September 2013 (UTC)

Support. Do not host the archive files. Only copy and keep. If WebCite goes down, give the files to archive.is or archive.org and ask them to host them. If archive.is goes down, give the files to WebCite or archive.org. 88.15.83.61 (talk) 19:42, 23 September 2013 (UTC)
Oppose Wikimedia is not in the business of providing hosting for "worthy" projects. Let Meta handle the discussion. If Wikimedia picks up the option we'll get a great fanfare of trumpets to announce this new option for all wikis. Hasteur (talk) 13:39, 24 September 2013 (UTC)
This is effectively equivalent to this proposal on meta. As intriguing an idea as it may be, it presents legal challenges and could have a substantial price tag. Even if the community was keen, this would be more in the domain of a WMF in-office decision. --LukeSurl 15:07, 26 September 2013 (UTC)
No! On meta it was proposed to copy and publish under a wiki{p|m}edia.org domain. Here it is proposed only to copy and keep, just in case the pessimistic scenarios described here by a lot of speakers come true. It is much cheaper, easier to support and cannot have legal consequences. 93.148.194.81 (talk) 17:17, 26 September 2013 (UTC)
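
For reference, the disk-cost figures quoted in this thread can be checked with a small calculation. This is only a sketch of the raw drive arithmetic, using the sizes and price given above (10 TB, 2 TB, $165 per 4 TB drive); the function name and the `copies` parameter are illustrative assumptions, and it ignores servers, bandwidth, staff and any legal costs raised in the replies.

```python
import math

def drive_cost(total_tb, drive_tb, drive_price, copies=1):
    """Return (number of drives, total price) for storing total_tb
    on drives of drive_tb each, kept in `copies` full copies."""
    drives_per_copy = math.ceil(total_tb / drive_tb)
    drives = drives_per_copy * copies
    return drives, drives * drive_price

# Figures quoted in the thread: 10 TB (Archive.is) + 2 TB (WebCite),
# 4 TB drives at $165 each.
single = drive_cost(10 + 2, 4, 165)               # one copy
redundant = drive_cost(10 + 2, 4, 165, copies=3)  # "x3 for redundancy"

print(single)     # (3, 495)
print(redundant)  # (9, 1485)
```

So the "$500 question" covers a single copy, and the x3 redundancy suggested in the reply brings the drive cost alone to roughly $1,500.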

Discussion

I'm very concerned about the idea of completely removing all archive.is links, even those added by actual editors. I know of several editors who switched from WebCite to Archive.is when WebCite's future ability to archive came into question, myself being one of them. WebCite does say existing archives won't go away, but I find it hard to trust this in the long term. I'm unaware of any other on-demand services other than WebCite and Archive.is at this point, so if in a year WebCite goes away and Archive.is is no longer trusted, where does this leave us? To be blunt, we're probably back to the idea of either trying to take over WebCite ourselves, or providing some funding, or...something along those lines. I know copyright/non-free content is an issue for the Foundation, but a solution needs to be found for the long run, not simply what's convenient right now. (Sorry for rambling...) — Huntster (t @ c) 07:58, 21 September 2013 (UTC)

In case Kww's proposal does win out, there is also web.archive.org - a separate website. It's my preferred archiving site :) Lukeno94 (tell Luke off here) 20:05, 21 September 2013 (UTC)

I know about Archive.org, but it is not "on demand". My whole point was that without the two above-mentioned sites, we won't have access to on demand services, which are needed for archiving a specific instance of a site. — Huntster (t @ c) 20:54, 21 September 2013 (UTC)

Web.archive.org has an "archive now" function, but the archived page is only made available much much later: officially 3-6 months, anecdotally 2 weeks to a year. --Lexein (talk) 19:11, 24 September 2013 (UTC)

I think the "remove all archive.is links" option is incredibly stupid. By all means, remove all links added by IPs if you really have to do this, but really... Fuck knows why you proposed this; you may as well blacklist it if you want this solution! Also, the option I want isn't there: where we reinstate all of the archive.is links added before Rotlink's indefinite block (or those inserted before their bot was indeffed), and only remove those added by IPs after their indef. Lukeno94 (tell Luke off here) 10:29, 21 September 2013 (UTC)

Please reread my comment: I explained myself. I believe the owner of the site to have engaged in illegal activity, and therefore do not trust him, his site, or his future intentions. The bot was indeffed on August 18, so all links created by the IPs above were placed in defiance of a block.—Kww(talk) 16:47, 21 September 2013 (UTC)

Again, you have made that claim, but there's no real evidence to prove it; whatever happened to "innocent until proven guilty" anyway? You may not trust the site, but it is clear that several longstanding editors - including myself - do, and still trust it. Your proposal to nuke everything flies in the face of a LOT of work by legitimate editors, particularly as archive.is has often been the only accessible archive for a given page. Lukeno94 (tell Luke off here) 18:45, 21 September 2013 (UTC)

How do you think he had legal access to IPs in such a wide range of countries, Lukeno94?—Kww(talk) 19:09, 21 September 2013 (UTC)

Considering I know absolutely nothing about how VPNs and proxies work, I don't know what is legitimate and what isn't. But the actions of one person, regardless of who they are, shouldn't result in lots of other people having their hard work undone (as it can be VERY hard to find a working archive for a link sometimes...) Lukeno94 (tell Luke off here) 20:04, 21 September 2013 (UTC)
Kww, I see you presume it should be obvious to anyone that it was done illegally. I'm reasonably familiar with how proxies work and I'm not sure I understand your reasoning. If I wanted to set up proxies in several different countries I'm fairly certain I could do it legally. Could you explain what you believe to have occurred here that was illegal, and what leads you to think that? I'm asking honestly, not necessarily out of doubt. You may be more knowledgeable in these things than I am. equazcion 20:17, 21 Sep 2013 (UTC)

First is Occam's razor: what would prompt anyone to actually go to the expense of negotiating individual proxy hosts in places ranging from Qatar to Brazil to Vietnam? Second is the nature of the IPs: they aren't webhosts and servers. Instead, they are individual IPs on adsl networks, FTTH networks, cable modems, etc. Everything about the setup screams "botnet". If it was a legitimate proxy arrangement, I would expect to see webhosts and servers hosted in a small number of countries with good internet access.—Kww(talk) 00:27, 22 September 2013 (UTC)

So you think this is a network of compromised computers (just for those who might not know what botnet refers to). That does make sense, and thanks for explaining. equazcion 00:41, 22 Sep 2013 (UTC)
What would prompt anyone to actually go to the expense of a legal or illegal setup instead of simply googling a proxy list? 77.111.172.172 (talk) 09:10, 22 September 2013 (UTC)

I think known open and advertised proxies tend to be preemptively blocked from editing. equazcion 09:18, 22 Sep 2013 (UTC)

This can explain why there are so few webhosts and contiguous blocks of IPs: they were already blocked from editing. I can imagine another simple way to get proxies. We are talking about a site owner, right? Then he/she can see the access logs of the site. There are usually a lot of hits from malicious security scanners (looking for SQL injections, etc). Those IPs are proxies and can be connected back to and reused. Setting up one's own proxy infrastructure looks too expensive. 77.111.172.172 (talk) 09:53, 22 September 2013 (UTC)

Those proxylists frequently (not always, but frequently) contain compromised computers as well. It's a common vector for virus and malware distribution.—Kww(talk) 14:42, 22 September 2013 (UTC)

Citation needed. Although I see you took my suggestion to pose an RFC, I'm seeing a raft of pointy supposition, gossip, handwaving, and assumption of bad faith, exaggerated with purple prose like "spambot" and "botnet", deliberately spreading fear, uncertainty and doubt about a resource you've personally decided to dislike and campaign against, despite showing little knowledge of the suspect service. You still don't know if it was Rotlink who actually did any archive.is additions after the block, or if it was anyone at archive.is at all, or someone else who helped out. Same IP? Ask a Checkuser. Otherwise, this is all rather weak koolaid, which I won't be drinking. --Lexein (talk) 19:11, 24 September 2013 (UTC)

I'm very concerned about the removal of archive.is links too. A while back, I tried to fix all the references that use the now offline site pinkpaper.com, the website of the Pink Paper, one of the UK's main LGBT news sources. Between archive.org and archive.is, I managed to find replacements for some but not all of the references used. The LGBT topic area tends to be filled with a lot of poorly sourced material, especially around BLP subjects. Removing archive.is links is likely to leave a lot of those links broken. I don't really know what's going on with the IP and the non-approved bot account, but I'd rather not have all the hard work I put into fixing PinkPaper links removed just because of somebody else's behaviour. And I'm not keen on having BLP articles on sexuality-related topics potentially left without sources. This seems self-defeating. Whatever the problem is, please can you seek a calmer, less dramatic solution than removing all the links to a useful archival service. —Tom Morris (talk) 12:48, 21 September 2013 (UTC)

Thought/idea: would it be possible to wrap the archive.is links up in an external links template? (similar to {{IMDb title}} etc) That way, if the site goes hinky in the future, all the links could quickly be disabled, minimizing negative fallout. Siawase (talk) 12:03, 22 September 2013 (UTC)
- This is a good idea. Wouldn't it be better to back up its content to a Misplaced Pages server? If the site goes hinky in the future, all the links could be changed to something like archiveis.wikimedia.org instead of disabling the links and hitting the verifiability issue. 77.110.134.11 (talk) 12:15, 22 September 2013 (UTC)
- My concern with this is that a template would be seen as tacit approval of these links, which I think we're a long way from having. I know I would see the use of such a template as implying the community considers these links to be A Good Thing, particularly if Archive.is had such a template while other archiving services didn't. —me_and 09:06, 23 September 2013 (UTC)

The introduction of this RfC misses the fact that User:Rotlink (user, not bot) himself added a lot of links between having his bot blocked, withdrawing his BRFA and until this was pointed out to him. — HELLKNOWZ ▎TALK 13:35, 22 September 2013 (UTC)
Do we even have any proof that Rotlink is the owner of the website, and isn't just claiming to be? I'm still disgusted that the actions of one person could lead to the reversion of a shedload of good edits by legitimate editors; regardless of whatever position they hold. Frankly, the age of an archiving site is utterly irrelevant; if it does go under, or if it does end up with adverts, then THAT is the time to propose its removal. Seems like several people have forgotten about WP:CRYSTAL... Lukeno94 (tell Luke off here) 14:37, 22 September 2013 (UTC)
- Lexein says "I have communicated with the owner" and that Rotlink is the owner. NebY (talk) 18:21, 22 September 2013 (UTC)

At the risk of looking like a total prat, that isn't convincing. Lexein has communicated with Rotlink, who claims that they are the owner. Lexein's usage of words doesn't confirm or disprove the claim. I'd like to see something rather more solid before we jump to conclusions about whether to include archive.is links or not. Also, the presence of a Misplaced Pages article, and the reliability and/or notability of it, has precisely nothing to do with whether we use an archiving site or not; bringing that up is unnecessary and deliberately inflammatory. Lukeno94 (tell Luke off here) 19:27, 22 September 2013 (UTC)

better diff. — HELLKNOWZ ▎TALK 19:37, 22 September 2013 (UTC)
CRYSTAL applies to determining article existence and content. A modicum of speculation isn't unreasonable when it comes to technical concerns, which this can easily become if an archive site becomes widely relied upon by articles and then becomes unviable. equazcion 19:43, 22 Sep 2013 (UTC)

The possibility of WebCite going under was, and perhaps still is, very real. Did that mean we blanket removed every single link? No, it didn't. The diff Hellknowz linked shows a lot of technical knowledge; but it's fairly generic stuff that anyone who goes and looks things up could come out with. Given that it is nearly a year old though, it appears probable, if not certain in my mind, that Rotlink is an owner, or employee. It does not, however, confirm he is the only owner; nor should it matter one iota if a website is owned by one guy, two guys, or a consortium. Lukeno94 (tell Luke off here) 19:55, 22 September 2013 (UTC)

Luke, Lexein has emailed the owner and has later confirmed that the owner is Rotlink. I was not trying to "bring up" that article (you surely know about it already) and am disturbed that you think it "deliberately inflammatory" to try to help answer the question you asked, "Do we have any proof...", by referencing another editor's research. NebY (talk) 22:20, 22 September 2013 (UTC)

You've misread my comment - I was referring to the earlier mention of the AfD on an article about archive.is; that was irrelevant and deliberately inflammatory. Not any response to my comment. Lukeno94 (tell Luke off here) 22:28, 22 September 2013 (UTC)

WebCite is no longer accepting submissions like they used to. I tried it today. They rejected my target page with a false summary claiming that my email address was incorrect. Archive.is accepted my submission no problem. Poeticbent talk 21:01, 22 September 2013 (UTC)

I just archived a page fine; perhaps your e-mail address was invalid, e.g. with a stray character. — HELLKNOWZ ▎TALK 23:02, 22 September 2013 (UTC)

Must've been a temporary outage. I tried it yesterday and got the same error message. Good to know though that it's back up again for the time being. De728631 (talk) 15:23, 23 September 2013 (UTC)

Other Archive.is users. I was 2 minutes too late to add this info to Misplaced Pages:Articles_for_deletion/Archive.is, in the list of notable sources, which consisted of a single item. I just did an "archive.is" search on majesticseo.com (pro account required) and found out that, besides Misplaced Pages, archive.is is used by

88IP making the case that if others use it, we should too. Editor is encouraged to make the case for why enWP should use/trust it instead of flooding us with lists of other sites that use it
International Tropical Timber Organization, WikiLeaks, Verso Books, Lenta.ru, The Guardian, Channel Register, Reuters, RTVE, The Atlantic Wire, Time (magazine), Badische Zeitung, Blueseed, MTV Sweden, Chicago Reader, The Huffington Post, Público (Portugal) and many other less notable web sites and blogs.

88.15.83.61 (talk) 19:19, 24 September 2013 (UTC)

I did not say that if others use it, we should too. I only said the site has many notable users (no less notable than WebCite's). I want to see, from the authors of the claims above about "botnets", "seeking for traffic" and "depending on Misplaced Pages", how this can be compatible with those claims. 88.15.83.61 (talk) 20:34, 24 September 2013 (UTC)
And I see no point in adding a collapsing button except to lie about what I said. It does not make the text smaller. Your lie is larger than the list you have collapsed. 88.15.83.61 (talk) 20:38, 24 September 2013 (UTC)

88IP, you keep using that word "lie". Please back up the assertion with diffs, retract, or face summary striking/removal of your attempts to lobby for Archive.is. Per WP:WIAPA, "Accusations about personal behavior that lack evidence" are a personal attack: "Serious accusations require serious evidence. Evidence often takes the form of diffs and links presented on wiki." You are now being directly challenged to back up your accusations of lies. Hasteur (talk) 12:29, 25 September 2013 (UTC)

While we are talking here, the bot goes on replacing dead links with links to archive.org and archive.is:
- 91.235.72.49
- 42.115.42.162
- 190.129.86.203
- 190.78.158.176
- 113.163.12.129
- 179.233.95.217
- 125.132.182.157
- 83.145.183.122, etc.
1. What does the removal of the archive links have to do with stopping the unapproved bot?
2. Should the reverting of the bot's edits be a one-time action or a continuous process (for example, another approved bot's job)? 88.15.83.61 (talk) 06:19, 25 September 2013 (UTC)

So they are now using an even less descriptive edit summary so as not to be caught. Personally, this just further lowers my trust in the user and their true motives. There is no reason to be this persistent to circumvent our guidelines and policies. — HELLKNOWZ ▎TALK 10:20, 25 September 2013 (UTC)

By the way, why do you think that there was a bot?
It is not clear from the #Points to consider.
Yes, using interleaving IPs from different continents looks suspicious and indicates proxy usage.
But IP users:
1. are limited in the number of edits per unit of time. I do not know the exact limits, but they exist. Approximately the 4th or 5th save in a row will not succeed because of this limit. That is a much simpler reason to use proxy rotation than the conspiracy theories.
2. have to solve a captcha, so at least some part of the job (if not all) was done by a human, not by a bot. 95.225.130.13 (talk) 17:11, 25 September 2013 (UTC)
