Wikipedia talk:Bot policy: Difference between revisions

Article snapshot taken from Wikipedia; the text is available under the Creative Commons Attribution-ShareAlike license.

== Bot to update Dutch municipalities info ==

I'd like permission to use a bot to update the pages on Dutch municipalities. Things the bot wants to do: update population etc. to 2005; add coordinates to infobox; add articles to the proper category. After that, I may use it as well for adding infoboxes to the articles on Belgian municipalities, and perhaps those of other countries. Eugene van der Pijll 21:45, 27 Mar 2005 (UTC)

Revision as of 21:45, 27 March 2005

If you want to run a bot on the English Wikipedia, please follow the policy at Wikipedia:Bots, explain the use of your bot on this talk page, wait a week, then ask for a bot flag at m:requests for permissions if no objections were made.
This page is for people who are seeking community permission to run a bot. This page is not for requesting a bot. If you would like to request a bot, you can do that at Wikipedia:Bot requests.
Authorized bots do not show up in recent changes. If you feel a bot is controversial and that its edits need to be seen in recent changes, please ask for the bot flag to be removed at m:Requests for permissions.
Older talk and subpages are at Wikipedia talk:Bots/Archive 1, Wikipedia talk:Bots/Archive 2, Wikipedia talk:Bots/Archive 3, Wikipedia talk:Bots/Archive 4, Wikipedia talk:Bots/Archive 5, Wikipedia talk:Bots/Control proposals.

Control proposals

Moved to Wikipedia talk:Bots/Control proposals


OpenProxyBlockerBot

With the recent surge of anonymous proxy vandalism, I think the time has come to attempt to plug the hole. I was thinking of periodically grabbing the lists of open proxies from the various websites that publish them, verifying they're open proxies and blocking them. I've already done this for the current tor outproxies, but doing this manually for the (much larger) list of normal anonymous proxies would cost too much time. --fvw* 09:49, 2005 Jan 23 (UTC)

Bots with admin privileges make me nervous. Bots with admin privileges imposing permanent blocks make me very nervous. What happens when someone is clueless enough to not plug up a trojan for long enough to be listed, and then found by your bot? I agree that something needs to be done, though. And didn't there used to be a bot that did this? —Korath (Talk) 10:45, Jan 23, 2005 (UTC)
Then the bot would block that user, just as a flesh and blood admin would. This is a good thing™. If they want help getting rid of their proxy they're free to contact the admin or the mailing list, but until then blocked is the correct state for that host.
(For the last, I was thinking of User:Proxy blocker, which worked differently. —Korath (Talk) 10:57, Jan 23, 2005 (UTC))
Yes, it scanned all anonymous users, which gave complaints. This shouldn't even scan innocent users, so would be much less problematic. Incidentally, Proxy blocker would have blocked your "poor innocent" trojaned user too. --fvw* 11:09, 2005 Jan 23 (UTC)
Actually, I was making an assumption above that probably isn't justified (it's very very late here, so forgive my incoherence) - will the blocks be permanent, or (like Proxy blocker's) a week or whatever? If the latter, how are you planning to deal with addresses that are still on the list, or on multiple lists? Unblock it and immediately reblock? And how often will it be run? —Korath (Talk) 11:22, Jan 23, 2005 (UTC)
Dunno. Currently proxy blocks are permanent, which makes sense, but once scripted there'd be no harm in making them shorter. Unblock and reblock isn't hard once you're not doing it manually. I'd guesstimate once every two weeks or every month should be sufficient to get most of them, but it's one of those things that'll have to be tweaked based on performance. --fvw* 11:28, 2005 Jan 23 (UTC)
My last concerns (honest!): if not permanently blocking, I assume the bot'll be written so that it doesn't ever actually shorten a pre-existing block. Also, everyone makes mistakes; if other bots go berserk, though, any admin can immediately neutralize them with a block. This won't (as I understand it) stop a bot that only places or removes blocks itself. How do you plan to safeguard it? —Korath (Talk) 08:43, Jan 24, 2005 (UTC)
It should only run 2 or 3 hours tops per run, so I'd be watching it run (and responding to any talk or emails) the whole time. --fvw* 08:57, 2005 Jan 24 (UTC)
Then I support, FWIW. —Korath (Talk) 09:11, Jan 24, 2005 (UTC)
This sounds good to me. Let's try it. Please make sure the bot uses a strong password. -- Karada 11:38, 24 Jan 2005 (UTC)
You mean the account it runs under? Just as any admin account I should hope. For the testing phase of this bot I think it'd be best to just run as User:Fvw, unless there's a bureaucrat around willing to bless a bot account into adminhood. --fvw* 13:41, 2005 Jan 24 (UTC)
Ceterum censeo that someone should do something about bug #621. But that's neither here nor there. JRM 16:16, 2005 Jan 24 (UTC)
I trust fvw to run it properly and respond reasonably to any issues that develop. Support. —Ben Brockert (42) 01:45, Jan 25, 2005 (UTC)
I disagree that a bot running as an admin is the right way to go. Dumping hundreds of IP addresses into the Block log will clog it up and make it less useful when reviewing normal blocks. It sounds like the larger question of "should we block all open proxies" should be tossed about first, and then a back-end solution would be preferable. I am not opposed to blocking open proxies myself, just not convinced this is the right solution. -- Netoholic @ 14:23, 2005 Jan 24 (UTC)
This isn't really the place for that discussion. But if you do open it up elsewhere, please put up a link to it here. —Ben Brockert (42) 01:45, Jan 25, 2005 (UTC)
Great idea. Support. Neutrality 17:04, Jan 25, 2005 (UTC)
I second Brockert and everyone else in support. Ambi 03:02, 26 Jan 2005 (UTC)
Support. This would be very useful. Proteus (Talk) 16:30, 26 Jan 2005 (UTC)
Support. Vandalism by Open proxies needs to be slowed - I also like the alternative of using cookies and a 10 char ID for tracking edits by a user. Trödel (talk · contribs) 17:01, 26 Jan 2005 (UTC)
Definitely support as a trial run. If it causes problems then it needs to be reworked, but worth a try. - Taxman 08:44, Jan 28, 2005 (UTC)
Support OneGuy 05:27, 2 Feb 2005 (UTC)

Thanks for your support everyone, I'm currently running the first batch of hosts at User:Fvw/proxytest and User:Fvw/proxytest2. A rough estimate suggests that this run will find around 1000 open proxies. Blocking and unblocking those regularly would be too spammy in the block log, so I'm considering indefinite-blocking them and checking them regularly and unblocking when necessary. I'll put the list of blocked proxies up in my user space so that, should I disappear, someone else can do the checking or unblock them. Sound ok? --fvw* 08:33, 2005 Feb 2 (UTC)

Opposed. Something like this recently ran at the Japanese Wikipedia. One of the victims was a friend of mine from the Dutch Wikipedia who lives in Thailand. He automatically gets a proxy from his ISP, which according to him is the largest in Thailand, and which was recognised as an open proxy. To make a long story short, we could easily victimize innocent and useful users this way. - Andre Engels 20:29, 3 Feb 2005 (UTC)

That's a shame, and individual cases can be worked around; I've already agreed with Waerth that we're going to figure something out before we block the open proxy. It's not a reason not to block in the general case though. Vandals like Wik and Willy have already caused more than enough trouble. --fvw* 22:50, 2005 Feb 3 (UTC)

The blocking code currently in use on the site does a full table scan of the block table for every attempt to edit a page. That is, it does work proportional to the number of blocks in place and delays the edit page load and save until that has been done. Emergency changes to improve this behavior were requested by me and I've taken some steps myself to reduce the impact of this (I'm optimising the table almost every night). At present making the block list larger than it has to be to block those actually vandalising the site is piling on more trouble to an area which is already very troubled. Using a bot to dramatically expand the number of entries and slow down all edits while emergency steps are already being taken to try to reduce the pain in this area is contrary to the do no harm principle for bot operations. If this bot is seen adding entries, please block it immediately so that it doesn't slow down edits for all human and bot editors. If you're aware of any blocks not in place to deal with actual vandalism, please remove them until the programming changes are known to be in place on the live site. This is a typical query, taken from the slow query log of one of the database servers today:

/* Block::load */ SELECT * FROM `ipblocks` WHERE (ipb_address='81.57.248.96' OR ipb_user=3763);

Every edit page load does that. Improved code is on the way but for now the use of two terms blocks the use of any index. Updated code, when in use, will use a union instead, so each part can use an index. Work to use memcached as a preliminary check is also ongoing, because crawlers loading edit pages and causing database checks have caused the site to be unusable (and the size of the block list affects how much pain they cause as well...) Likely to be in place within a few months. Jamesday 21:51, 3 Feb 2005 (UTC)

Note that this slow query only gets hit for logged in users though, so it isn't relevant for web crawlers. Anyway, I'm backing out the proxies I've blocked until the patch to fix this I've sent to wiki-tech is applied. --fvw* 22:50, 2005 Feb 3 (UTC)
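
For anyone curious what the "verifying they're open proxies" step amounts to, here is a minimal sketch: it tries to fetch a known page through each candidate proxy and treats a recognisable response as confirmation. This is illustrative only and is not fvw's actual script; the candidate hosts and timeout are made up.

  # Minimal open-proxy check: fetch a known page through each candidate
  # proxy and treat a recognisable response as confirmation. Sketch only,
  # not the script fvw actually ran.
  import urllib.request

  CANDIDATES = ["203.0.113.5:8080", "198.51.100.7:3128"]  # example hosts only
  TEST_URL = "http://en.wikipedia.org/wiki/Main_Page"

  def is_open_proxy(hostport, timeout=15):
      handler = urllib.request.ProxyHandler({"http": "http://" + hostport})
      opener = urllib.request.build_opener(handler)
      try:
          return b"Wikipedia" in opener.open(TEST_URL, timeout=timeout).read(4096)
      except Exception:
          return False

  for host in CANDIDATES:
      print(host, "open proxy" if is_open_proxy(host) else "closed/unreachable")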

VFD Hourly Discussions

I know that Anthony_DiPierro was running a bot to update his own personal subpage to keep track of the hourly discussions... but since the format on VFD has changed, he has not fixed his hourly discussions. I would like to use a bot for the purpose of updating a subpage on my user page, keeping track of all the new discussions being added by the hour. -- AllyUnion (talk) 04:26, 2 Feb 2005 (UTC)

You mean just links to the newly created vote-pages, or also links to changes in vote pages? What exactly do you hope to achieve? I think if we're going to run a bot on VfD we might as well make it one that organises VfD in such a way that you don't need to run a personal bot to check for new VfDs. --fvw* 04:32, 2005 Feb 2 (UTC)
Each hour, it would post links to any new VFD discussions, keeping 7 days of VFD on the list. Here's a sample of what I mean: User:AllyUnion/VFD_List -- AllyUnion (talk) 07:53, 2 Feb 2005 (UTC)
Ah, right. I don't find it very elegant (and I think doing it hourly instead of on-demand may be wasting server resources unless a lot of people use it), but as long as VfD isn't reorganised into something that can provide something similar, support. --fvw* 08:20, 2005 Feb 2 (UTC)
Well, there is no elegance in utility, unless you make it that way. I'm going for functionality over pretty. -- AllyUnion (talk) 11:23, 2 Feb 2005 (UTC)
Support. The server resources used by running a script once an hour are negligible. anthony 警告 20:33, 4 Feb 2005 (UTC)
The bot is using pywikipediabot framework, and is currently running to do this task. -- AllyUnion (talk) 02:24, 7 Feb 2005 (UTC)
The bot works! And if there is no objection soon, I would like to request a bot flag for the bot. -- AllyUnion (talk) 10:47, 8 Feb 2005 (UTC)
I don't think you should set a bot flag on an account that is not used exclusively as a bot. Allyunion used to be your real account. I would rather this were renamed to a non-used account, and one that isn't so similar to your current user name, in order to avoid confusion. Angela. 10:46, Feb 19, 2005 (UTC)
Okay. I'll create a new account for the bot. See User:VFD Bot. -- AllyUnion (talk) 04:50, 24 Feb 2005 (UTC)


WML Gateway

Hi folks! I've been working on a WML gateway so that I can read Wikipedia on my mobile phone while waiting for public transport. The gateway is read only, and I've got no plans to change that. I think the bot policy is geared towards robots that edit, but I'm asking for permission here to be on the safe side. The gateway is apache/mod_perl/libwww-perl based. The gateway uses Special:Export as the backend, and then parses the wikimarkup to make a fairly basic WML page. Long term, I would like to see the functions of this gateway moved closer to the database, but I'm happy to keep it running as long as it is needed (and I'm happy to switch it off as soon as it is not needed). There is a testing version available on my DSL line (so it runs very slow when my flatmates are filesharing), however I will move it to a better hosted site once I've got permission. (I don't want to put the link anywhere too popular until testing is finished and I've got permission to run a bot, so you will need to look at the Talk page for Wikipedia:WAP_access if you have a WML browser and want to try it). Thanks. Osric 10:12, 3 Feb 2005 (UTC)

  • Sounds good, definitely support. --fvw* 17:06, 2005 Feb 3 (UTC)
  • One vital precondition: the Special:Export page has a check box to include only the current version, not the full history. It is vital that your operations always use that check box. Some pages have many tens of thousands of revisions and that special page is already in the query killer because it has caused serious load problems. At present, every revision displayed can be assumed to cause at least one disk seek. Improvements are planned for MediaWiki 1.5, after not making the cut for 1.4. In addition, if your pages are going on a site which can be crawled, it's vital that you include a robots.txt file which gives any crawlers of your or other sites the same robots.txt instructions as given by the wikimedia servers. Bots crawling third party sites not doing this have led us to block the crawler of Ask Jeeves (ask.com) on a couple of occasions and proxying sites are likely to be very quickly firewalled if we notice that happening - which may mean firewalling you. Please exercise great care in this area - it's a high risk activity at present. I also strongly recommend including contact details in your browser ID string. Jamesday 22:12, 3 Feb 2005 (UTC)
    • I think you shouldn't allow crawling at all, i.e. Disallow: * in robots.txt. There's no need for robots to crawl wikipedia both through the direct interface and through the WML gateway. --fvw* 22:53, 2005 Feb 3 (UTC)
      • Try to persuade the people here that we should modify robots.txt to block all crawlers, including bots.:) The most common crawlers are search engines, which we do not want to block. Jamesday 20:14, 4 Feb 2005 (UTC)
  • If needed, I'll write a cron job to grab the robots.txt once a week or however infrequently it is updated. However, I'd be happier with a simple Disallow: * robots.txt. How paranoid should I be about robots that don't follow robots.txt?
    • Disallow: * should do the job. If not, well, we'll find out and are happy to remove a firewall rule once the problem is gone, if we do have to put one in because something unruly ignored it. If it's practical for you you might consider rate limiting based on IP as a second line of defence. We don't currently do that but it's on our to do list. Jamesday 20:14, 4 Feb 2005 (UTC)
  • The text on Special:Export says that if I ask for a page as '.../Special:Export/page', I only get the most recent version (and that's how I'm doing it).
  • My email address is in the User-Agent header, along with a browser tag of 'WikiToWML/0.1'. Osric 00:11, 4 Feb 2005 (UTC)
    • Sounds good. Thanks. Jamesday 20:14, 4 Feb 2005 (UTC)
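
For anyone wanting to experiment with the same backend, the Special:Export fetch Osric describes (current revision only, with contact details in the User-Agent as Jamesday recommends) can be approximated in a few lines. This is a rough sketch, not the mod_perl gateway itself; the contact address and example title are placeholders.

  # Rough sketch of the Special:Export backend fetch (current revision only).
  # Not Osric's mod_perl gateway; the contact address is a placeholder.
  import urllib.parse
  import urllib.request
  import xml.etree.ElementTree as ET

  def fetch_wikitext(title):
      url = "https://en.wikipedia.org/wiki/Special:Export/" + urllib.parse.quote(title)
      req = urllib.request.Request(
          url, headers={"User-Agent": "WikiToWML-sketch/0.1 (contact: someone@example.org)"})
      with urllib.request.urlopen(req, timeout=30) as resp:
          root = ET.fromstring(resp.read())
      text_el = root.find(".//{*}text")    # the export XML wraps the wikitext in a <text> element
      if text_el is None:
          return ""
      return text_el.text or ""

  print(fetch_wikitext("Public transport")[:200])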

Security update may block some bots/tools

An emergency security release may currently be blocking access to existing bots and tools like editors. "Additional protections have been added against off-site form submissions hijacking user credentials. Authors of bot tools may need to update their code to include additional fields". To get started, you'll have to fetch the wpEditToken value from an edit form after you login, and provide that with your save form submissions. The new release is live on all Wikimedia hosted sites and is a recommended security update for all MediaWiki sites. See the release notes for more details of what you need to do to modify a bot or tool to deal with this. Please also take the opportunity to add rate limiting based on how long the last response took, if you haven't already done that based on earlier discussions of how to avoid hurting the servers. Jamesday 20:14, 4 Feb 2005 (UTC)

  • Okay, so where exactly are these alleged release notes? RedWolf 04:46, Feb 5, 2005 (UTC)
  • I opened bug 1116690 for the Python Wikipedia Robot Framework. RedWolf 21:35, Feb 5, 2005 (UTC)
Bug has been fixed. Kudos to A. Engels for his quick turnaround fixes on the problem. There's a new snapshot if one doesn't want to pull it from CVS themselves. RedWolf 06:06, Feb 8, 2005 (UTC)
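
For bot authors wondering what the change above means in practice, here is a minimal sketch rather than the actual pywikipediabot patch: while logged in (with cookies), scrape wpEditToken (and the usual wpEdittime field) out of the edit form and echo them back with the save. The hidden-field scrape is deliberately naive.

  # Sketch of the extra step bots now need: pull wpEditToken out of the edit
  # form and include it in the save POST. Not the actual pywikipediabot fix;
  # the hidden-field scrape below is deliberately naive.
  import re
  import urllib.parse
  import urllib.request
  from http.cookiejar import CookieJar

  opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))

  def hidden_field(html, name):
      # Naive scrape; a real bot should parse the form properly.
      m = re.search(r'name="%s"\s+value="([^"]*)"' % name, html) or \
          re.search(r'value="([^"]*)"\s+name="%s"' % name, html)
      return m.group(1) if m else ""

  def save_page(base_url, title, new_text, summary):
      # base_url is the wiki's index.php, e.g. "https://en.wikipedia.org/w/index.php"
      edit = opener.open("%s?title=%s&action=edit" % (base_url, urllib.parse.quote(title)))
      html = edit.read().decode("utf-8", "replace")
      fields = {
          "wpTextbox1": new_text,
          "wpSummary": summary,
          "wpEditToken": hidden_field(html, "wpEditToken"),
          "wpEdittime": hidden_field(html, "wpEdittime"),
          "wpSave": "Save page",
      }
      data = urllib.parse.urlencode(fields).encode("utf-8")
      return opener.open("%s?title=%s&action=submit" % (base_url, urllib.parse.quote(title)), data)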

Additional uses to bot (User:Allyunion)

Well, due to the security update, I will not be able to implement this bot yet, but I would like the approval to have my bot additionally do the following tasks at <s>roughly</s> exactly around every 00:00 UTC:

  1. Add each VFD subpage day. Example: {{Wikipedia:Votes for deletion/Log/2005 February 5}}
    The reason for this is that the VFD system has changed such that each day has a subpage for VFD based on the UTC clock. The transclusions must be included every day in order to properly present the VFD. -- AllyUnion (talk) 17:07, 5 Feb 2005 (UTC)
  2. Edit each new subpage day to include: == MONTHNAME DATE == (Like == February 5 ==)
    Again, the VFD system changed, and each day's section heading needs to be displayed, allowing a person direct access to that day's VFD. -- AllyUnion (talk) 17:07, 5 Feb 2005 (UTC)
  3. Take the day subpage (transclusion) most recently moved to Wikipedia:Votes for deletion/Old, and edit all its VFD subpages to include the content from Template:Vfd top and Template:Vfd bottom. (This bot will not count any of the votes, merely include content from the templates, making it easier on the maintainers)
    Each VFD subpage includes the content from both of these templates. I believe they are not applied due to the technical limitations of the number of templates that can be used on each page. -- AllyUnion (talk) 17:07, 5 Feb 2005 (UTC)
    I have decided not to do this feature, until the bot really can do something useful. -- AllyUnion (talk) 00:59, 6 Feb 2005 (UTC)
    Actually, I'm rethinking this: add the content from Template:Vfd top and Template:Vfd bottom, but as HTML comments, on all VFD subpages after they are moved to WP:VFD/Old -- AllyUnion (talk) 07:13, 6 Feb 2005 (UTC)

-- AllyUnion (talk) 17:07, 5 Feb 2005 (UTC)

Having a bot to update VfD would be wonderful. One suggestion: as step 3), have it create the subpage for tomorrow. There's no reason not to create it ahead of time, so it's ready when it's needed.
About adding {{subst:vfd top}} and {{subst:vfd bottom}}: I don't know if this is such a good idea. When I'm scanning /Old, I ignore the sections with colored backgrounds, because they've already been closed. Adding the templates ahead of time would make it harder to see which pages have been processed.
Also, if it's feasible, change "roughly around" to "exactly". This thing'll be run as a cron job anyway, so there's no reason not to run it at the right time. dbenbenn | talk 21:00, 5 Feb 2005 (UTC)
You're forgetting about the time it takes for it to get a Wikipedia page. That is why I say roughly. Because the load time to get the page is not always exact, the time when the bot posts something to Wikipedia will not always fall exactly at 00:00 UTC. -- AllyUnion (talk) 00:54, 6 Feb 2005 (UTC)
How do you think I manage to do it by hand every day at midnight (except when I flake out)? You load the page ten minutes early, get the changes ready, then hit "save page" at the right time. Edit conflicts aren't an issue, because the page is only edited once or twice a day. Of course, doing the same thing with a bot would complicate the code slightly. It's no big deal either way. dbenbenn | talk 05:10, 6 Feb 2005 (UTC)
The code won't be complicated any further than it already is. If the bot runs exactly 10 minutes before the UTC date rolls over to the next day, all it has to do is search for the current date, then replace {{Wikipedia:Votes for deletion/Log/2005 February 5}} with {{Wikipedia:Votes for deletion/Log/2005 February 5}}\n{{Wikipedia:Votes for deletion/Log/2005 February 6}}. Or in computing terms: {{Wikipedia:Votes for deletion/Log/TODAY}} with {{Wikipedia:Votes for deletion/Log/TODAY}}\n{{Wikipedia:Votes for deletion/Log/TOMORROW}}. -- AllyUnion (talk) 07:08, 6 Feb 2005 (UTC)
Additionally, if it is running 10 minutes before the next UTC day, then yes, it would create the VFD page for tomorrow. I would actually want the bot to do that one hour beforehand, just to be on the safe side. Of course, all changes presume that Wikipedia is working and is not down at the time of the updates. -- AllyUnion (talk) 07:31, 6 Feb 2005 (UTC)

Well, the additional complication would be having it wait until midnight before submitting the change. Also, there are actually 5 lines to edit on VfD. See this diff for a representative example. dbenbenn | talk 08:31, 6 Feb 2005 (UTC)

As I said, it would only be adding, not removing the discussions. Removing the discussion would still be a human job, since the bot will not know when a discussion is closed. It would be changing at line 13 and line 50, but it will not be making any changes at line 19 and line 45. -- AllyUnion (talk) 09:57, 6 Feb 2005 (UTC)
I think there's a slight misconception here about how the edits to VfD work. Every day at midnight, the new day is added, and the oldest day (now 6 days old) is removed, and added to /Old. The discussions on that old day aren't closed yet—anyone can still vote. It's just that the day doesn't appear on the main VfD page anymore. The votes get closed one by one as someone gets around to closing and tallying them. There's no reason for the bot not to do the complete extension (all 5 lines of the diff above), and it would be tremendously helpful. dbenbenn | talk 21:17, 6 Feb 2005 (UTC)
Oh, if that is the case, then it can be done. -- AllyUnion (talk) 22:52, 6 Feb 2005 (UTC)
And the submission should be right at midnight or slightly thereafter. -- AllyUnion (talk) 22:54, 6 Feb 2005 (UTC)
And of course, it would move the page appropriately to the Old page as well. -- AllyUnion (talk) 22:55, 6 Feb 2005 (UTC)
Technical detail: Search for "'''Current votes''':<br>" and find and replace that. Search for "<br>'''''Old votes''':'' <br> <!-- Old votes need both their day's link moved here from Current ones, just above, and the day's link moved to the /Old file -->" and find and replace that. -- AllyUnion (talk) 22:57, 6 Feb 2005 (UTC)

Changed to exactly. I still question whether it can post two pages at once at exactly 00:00 UTC. -- AllyUnion (talk) 23:56, 6 Feb 2005 (UTC)

Please let me know when you're going to start running your bot. Also, I'm curious to see the source code, if you're willing to share it. Thanks, dbenbenn | talk 02:26, 7 Feb 2005 (UTC)
I don't see any problem with two edits at once a day, the rate limiting is just there to avoid overloading the server and allow for verification of bot actions. --fvw* 02:30, 2005 Feb 7 (UTC)
Another minor technicality: have it submit the changes for VfD first, and the change to /Old second (like, a couple seconds later). That way, you never have votes on /Old and VfD at the same time, which would contradict the Deletion process. (Of course, it's only a matter of a few seconds anyway, but I'm a mathematician. :) dbenbenn | talk 02:55, 7 Feb 2005 (UTC)
If that's how you want it, then it will submit the new VFD page first, then the Old page after that in the same script. -- AllyUnion (talk) 06:18, 7 Feb 2005 (UTC)
Oh, and yes, I will let you look at the code. -- AllyUnion (talk) 07:08, 7 Feb 2005 (UTC)

Okay. I'm all done with tasks 1 and 2. Task 3 will be another script. If anyone cares to view the scripts made based on the pywikipediabot framework, please leave a comment at my talk page. -- AllyUnion (talk) 10:54, 7 Feb 2005 (UTC)
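
A sketch of the date arithmetic and splice the bot has to get right at the 00:00 UTC rollover, following the exchange above. The log-page title format is taken from the examples earlier in this thread; the production bot is built on the pywikipediabot framework, so this is illustrative only.

  # Sketch of the midnight rollover: compute today's and tomorrow's VfD log
  # titles (UTC) and splice the new transclusion in after the current one.
  # Illustrative only; the real bot uses the pywikipediabot framework.
  from datetime import datetime, timedelta, timezone

  def log_title(day):
      # e.g. "Wikipedia:Votes for deletion/Log/2005 February 5"
      return "Wikipedia:Votes for deletion/Log/%d %s %d" % (day.year, day.strftime("%B"), day.day)

  def add_tomorrow(vfd_wikitext, now=None):
      now = now or datetime.now(timezone.utc)
      today = "{{%s}}" % log_title(now)
      tomorrow = "{{%s}}" % log_title(now + timedelta(days=1))
      if tomorrow in vfd_wikitext:      # already rolled over
          return vfd_wikitext
      return vfd_wikitext.replace(today, today + "\n" + tomorrow, 1)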

Further additional tasks

(clarification of task 1)

  1. Post a new edited WP:VFD page
    1. Add the new UTC day section link on WP:VFD (in the Jump to specific days box)
    2. Remove the six days ago section link on WP:VFD (in the Jump to specific days box)
    3. Add a new six days ago /Old section link on WP:VFD (in the Jump to specific days box)
    4. Add the new UTC day transclusion on WP:VFD
    5. Remove the six days ago transclusion on WP:VFD
  2. Add the six days ago transclusion on WP:VFD/Old

Summary of tasks

A summary of its tasks, and whether or not it is working at this time, can be found on its user page: Allyunion -- AllyUnion (talk) 05:36, 8 Feb 2005 (UTC)

Interlanguage specialities at eo:

Additions added there: Links to eo: will "lead to the target" through a redirect. Gangleri | Th | T 08:36, 2005 Feb 9 (UTC)

Upload script request

I'm in need of a script. I recently found a large dump of classical music, all CC-by-SA. I used wget recursively to fetch all of it (650 files; 5.29 gigs). I need to upload them now. I'd like to do it with a unix command line program. I figure the syntax should be something like:

>wikiupload Raul654bot:mypassword ./song.ogg --rate=100 --location=commons "This is a public domain song I uploaded {{PD}}"
  • The 1st argument is the username and password (necessary for uploading files)
  • The 2nd argument is the file to upload. So in the case of uploading a large number of files, I can just use the *
  • The 3rd argument specifies the upload rate. I believe this is necessary because bots are supposed to be able to run at low speeds initially.
  • The 4th argument specifies where it should go: en, de, commons, wikibooks, etc.
  • The 5th argument is the upload text - i.e., the text to be put on the image page.

→Raul654 07:36, Feb 9, 2005 (UTC)

cURL might already do what you're after. It's what I use for putting automated content into Wikipedia - (I'm only adding article text, and I'm doing it using PHP's cURL library, so I don't think my code is likely to be of much use to you, otherwise I'd just give you the code). Nevertheless I think it should be able to do what you're after from the command line.

The following args look most applicable:

  • --cookie (you'll need a cookie for your uploading data under your username / password)
  • --limit-rate (so as to not upload too fast)
  • --form (With various args as per the Upload forms, which will include the description, and the path to the file)
  • The destination URL (which will be different depending on where you want it to go, but will presumably be the commons most of the time).

If you're lucky then maybe someone has already done this using cURL on the command line, and will let us know the command they used.

Docs and source code available at: http://curl.haxx.se/

Note that you'll probably have to call it multiple times (e.g. using "xargs"), if you want wildcard / multi-file-upload functionality.

All the best, Nickj (t) 08:08, 9 Feb 2005 (UTC)


I talked it over with Kate, and here's what we got:

  • curl -F wpName=Bob -F wpPassword=asdf --cookie-jar ./mycookies "http://commons.wikimedia.org/w/wiki.phtml?Special:Userlogin&action=submit" > output1.html
  • curl -F wpUpload=@./Bob_test_file.txt -F wpUploadDescription=This_is_a_test -F wpUploadAffirm=1 --limit-rate 100k --cookie-jar ./mycookies "http://commons.wikimedia.org/w/wiki.phtml?Special:Upload&action=submit" > output2.html

The first one logs in Bob (whose password is asdf) and creates a cookie jar containing the login cookie, and the second one actually does the upload (of file Bob_test_file.txt with description This_is_a_test). I tested, and the first one works *I think* but the 2nd one does not. I would appreciate someone helping debug it. →Raul654 09:38, Feb 9, 2005 (UTC)

You can try the pywikipediabot framework. They have a script called upload.py that you could use, if you made the script runnable. Then you can create a perl script or a bash script based on upload.py to loop through the contents of the directory. I am uncertain if they have the pywikipedia framework ready for the commons. -- AllyUnion (talk) 10:13, 9 Feb 2005 (UTC)
Not only is pywikipediabot extremely difficult to get working, but it's nonfunctional at the moment. →Raul654 15:28, Feb 9, 2005 (UTC)
The newest version is working. -- AllyUnion (talk) 11:03, 10 Feb 2005 (UTC)

Debugging - Raul, you did good, and you were 95% of the way there:

With the first command:

  • I added wpLoginattempt="Log in" (which is the same as clicking the "log in" button) - (may or may not be needed, but it won't hurt).
  • Added wpRemember=1 (may or may not be needed, but it won't hurt).

With the second command:

  • The URL could be wrong - I used http://commons.wikimedia.org/Special:Upload instead.
  • Need the file's path in "wpUploadFile", rather than "wpUpload".
  • Add 'wpUpload="Upload file"', which is the same as clicking the button (may or may not be needed, but it won't hurt).
  • With uploading, I think you want "--cookie", rather than "--cookie-jar", since "--cookie" is read-only, whereas --cookie-jar is for storing stuff (i.e. use store to log in, then read to upload).
  • Note that you'll also want to put in a license tag in the description, otherwise the tagging folks will hunt down and nail you to a tree ;-)

Putting all that together into two commands, we get:

  • First command :

curl -F wpName=Nickj -F wpPassword=NotMyRealPassword -F wpLoginattempt="Log in" -F wpRemember=1 --cookie-jar ./mycookies "http://commons.wikimedia.org/search/?title=Special:Userlogin&action=submit" > output1.html

  • Second command (note that I have omitted the rate limiting bit, as my installed curl is so ancient that it doesn't have that option, but you probably want to add it back):

curl -F wpUpload="Upload file" -F wpUploadFile=@./Part_of_Great_Barrier_Reef_from_Helecopter.JPG -F wpUploadDescription="Photo I took in Jan 2005 over part of the Great Barrier Reef in a helicopter {{GFDL}}" -F wpUploadAffirm=1 --cookie ./mycookies "http://commons.wikimedia.org/Special:Upload" > output2.html

And to the right is the result, as a thumbnail:

View of part of the Great Barrier Reef from helicopter

All the best, Nickj (t) 23:23, 9 Feb 2005 (UTC)

Ok, I did a lot of work on this. The problem is, the method above fails for files above 4.7 megs (5000000 bytes) because MediaWiki gives you an "Are you sure you want to upload this big file?" prompt. I tried a workaround but it doesn't work yet. You can see my script here. Run it by doing: ./wikiupload username pass file →Raul654 08:39, Feb 10, 2005 (UTC)

For the record I fixed Raul's bot so that it no longer has this limit. —Ævar Arnfjörð Bjarmason 11:37, 2005 Feb 10 (UTC)

VFD Old Bot work

On all pages moved to VFD/Old: On their talk pages, include a link to the VFD discussion, with a signed name. On all VFD subpages, include <!-- {{subst:Vfd top}} --> at the top and <!-- {{subst:Vfd bottom}} --> at the bottom, with no finalization of the count. -- AllyUnion (talk) 10:25, 9 Feb 2005 (UTC)

No, this is bad. The usual way to finalise the votes is to go to /Old, pick an article that hasn't been boxed, and resolve it. If they're all pre-boxed, it will make the process more difficult and less likely to be completed. —Ben Brockert (42) 06:13, Feb 10, 2005 (UTC)
Just to clarify: <!-- HTML Comment --> Anything between those two brackets will not show up on the page. -- AllyUnion (talk) 21:59, 10 Feb 2005 (UTC)
Ah, I completely ignored that they were in comment tags. In that case, please do that, it would help a lot. —Ben Brockert (42) 04:02, Feb 11, 2005 (UTC)
I've decided against creating this feature as there are two shortcuts now. -- AllyUnion (talk) 23:39, 3 Mar 2005 (UTC)

User:Allyunion - Changing template to subst

A few users forget to use {{vfd top}} as {{subst:vfd top}}. This bot is to correct that, as well as {{vfd bottom}} to {{subst:vfd bottom}}. -- AllyUnion (talk) 02:45, 10 Feb 2005 (UTC)

Can't really hurt. It would be good to check that the action was completed at the same time. —Ben Brockert (42) 06:14, Feb 10, 2005 (UTC)

Bot for maintaining Municipalities and Counties of Norway

Will be using pywikipediabot, especially replace.py and of course under supervision adhering to bot best practices and speed limits. The purpose is to maintain the various Municipalities_of_Norway: Many of the articles need a cleanup. A uniform use of the new infoboxes will help. Based on the standard numbering of municipalities, some statistical information can be added automagically. -- Egil 11:32, 11 Feb 2005 (UTC)

Request for flag made on meta:Requests for permissions#Requests_for_Bot_status -- Egil 19:59, 21 Feb 2005 (UTC)

Bugzilla:1512

  • Bugzilla:1512 describes specialities about interlanguage links to and from eo:. All the titles I could identify have been fixed manually both at eo: and at Wikipedias in other languages. The list of "affected" titles can be found here. At some languages changes to valid interlanguage links / to the valid titles have been reverted now manually or by bots, probably because users were not aware of this problem. Please do not hesitate to contact me if you have any questions. Regards Gangleri | Th | T 12:53, 2005 Feb 17 (UTC)
  • See also: post at sourceforge:projects/pywikipediabot 04:16, 2005 Feb 18 (UTC)

bot request

Can someone make a bot to go through the 366 days of the year and put links to the BBC's On This Day page into them? They come in the format http://news.bbc.co.uk/onthisday/hi/dates/stories/month/date/ replacing month with the month, and date with a numerical date, not including a leading zero, 9 not 09. Dunc| 22:39, 19 Feb 2005 (UTC)

I don't think you need a bot for that... really... See my comment on your talk page. -- AllyUnion (talk) 11:10, 20 Feb 2005 (UTC)

Sorry if I didn't make myself clear. The link ought to go into an external links section in the pages in the style of "February 20" rather than the little ones that appear on the front page. It should be possible to go to March 18 (say) on any date of the year and the link should link to http://news.bbc.co.uk/onthisday/hi/dates/stories/march/18. If it is coded into the article text with {{}} then that will always take the user to today's article, rather than the date in the article title. Dunc| 12:45, 20 Feb 2005 (UTC)

This has been or is being handled under User:Kakashi Bot, maintained and run by me. Since this is a one-time 'run', I don't think a bot flag is necessary. -- AllyUnion (talk) 19:09, 25 Feb 2005 (UTC)
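
For the record, generating those links is only a few lines; here is a sketch using the URL pattern Dunc gives above (lowercase month, no leading zero on the day). It may or may not match what Kakashi Bot actually did.

  # Sketch of building the BBC "On this day" external links for all 366
  # date articles, using the URL format described above. Not necessarily
  # what Kakashi Bot actually ran.
  import calendar

  def bbc_links():
      links = {}
      for month in range(1, 13):
          days = calendar.monthrange(2004, month)[1]   # 2004 is a leap year, so February 29 is included
          for day in range(1, days + 1):
              title = "%s %d" % (calendar.month_name[month], day)
              links[title] = ("http://news.bbc.co.uk/onthisday/hi/dates/stories/%s/%d/"
                              % (calendar.month_name[month].lower(), day))
      return links

  print(bbc_links()["March 18"])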

Grammarbot

Old discussion moved to User talk:Grammarbot. r3m0t 11:05, Mar 5, 2005 (UTC)

Which of these should I use next? What about after that? Which of these should I definitely not do?

  1. "Space before colon" (8066 articles, main namespace only)
  2. "Space before exclamark" (1710) (many false positives involving table syntax) (can exclude tables, however need to put a bit more work in for recognition of nested tables)
  3. "Space before fullstop" (12372) (many false positives involving ellipsis and TLDs)
  4. "Space before fullstop which is before space" (3160) (less false positives)
  5. "Space before qmark" (3424)
  6. "Space before semicolon" (912) (many false positives involving assembly language code and definition lists)
  7. "&#8211; instead of &ndash;" (1617)
  8. "&#8212; instead of &mdash;" (1177)
  9. "&ldquo; instead of a straight double quotation mark" (1450) (total of named entity, number and hex)
  10. "&rdquo; instead of a straight double quotation mark" (1464)
  11. "&lsquo; instead of a straight single quotation mark" (799)
  12. "&rsquo; instead of a straight single quotation mark" (3562)
  13. "euro character instead of HTML entity" (53298) (assuming HTML entity is preferred as that is what the insert box puts in - compatibility problems?)
  14. "double space which is not after a full stop" (100617) (old data)
  15. "ampersand before space" (i.e. invalid XHTML) (29329) (old data)
  16. "0x91 (145) (MS left single quotation mark) instead of a straight single quotation mark" (53298) (see for info on these)
  17. "0x92 (146) (MS right single quotation mark) instead of a straight single quotation mark" (53298)
  18. "0x93 (147) (MS left double quotation mark) instead of a straight double quotation mark" (53298)
  19. "0x94 (148) (MS right double quotation mark) instead of a straight double quotation mark" (53298)
    Clearly something went wrong here. Maybe I'll try to get it right some other time. r3m0t 22:54, Mar 15, 2005 (UTC)
  20. "Space before comma" (1062)

The comma thing is due to finish in a few days. Note that if you think a human is needed to do any of these, I can create some very nice reports easily. r3m0t 17:12, Mar 4, 2005 (UTC)

Well, HTML Tidy converts naked ampersands to &amp;, so there's no need to fix those (15). I would avoid 2, 3, and 6 until you can figure out a way to weed out some of the false positives. 14 seems superfluous. People put double spaces in for clarity, and it's not something that shows up in the output anyway. 1, 4, and 5 will get some programming false positives, but I don't think very many. My suggestion is to go with the ones with very few false positives first: 7–13, then move to the ones with fewer false positives. Maybe you can develop some methods of weeding out the false positives in these other items.
My other suggestion is to find more numbered entities along the lines of ndash and mdash and convert those to their named equivalents.
– flamurai (t) 01:33, Mar 5, 2005 (UTC)
Weeding out the table syntax for 2 is very easy, but I have to do it when I'm running the bot, not in the original SQL query that gives us that count. I think 7 and 8 will have fewer problems than 9-12 (as proper quotation marks may be used in the articles about quotation marks and about typesetting) so those will be first. I will do 13 after that.
What bot is running HTML Tidy? r3m0t 09:24, Mar 5, 2005 (UTC)
MediaWiki runs HTML Tidy. View source and edit the page on this ampersand: & – flamurai (t) 09:50, Mar 5, 2005 (UTC)
If that is so, then are all those & characters I found inside math tags? Oh, you mean that it keeps the source the same, but sends it properly! Good. I will look around about converting other entities. r3m0t 10:25, Mar 5, 2005 (UTC)
Numbers 9–12 have little or no obvious benefit to outweigh the risk of false positives; indeed past bot-based changes to Windows-1252 quotes (Guanabot) have stuck to entities as a bot safety issue. Susvolans (pigs can fly) 17:26, 7 Mar 2005 (UTC)
Well, they make the source text more readable. I don't understand what this "bot safety issue" is, but I will of course avoid these until your explanation. r3m0t 18:34, Mar 7, 2005 (UTC)

The full list of entities is here. We are using XHTML 1.0. r3m0t 10:33, Mar 5, 2005 (UTC)
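
For concreteness, items 7 and 8 above (the ones flamurai suggests doing first) come down to a pair of trivial substitutions. The sketch below also catches the hexadecimal forms; it is mine, not Grammarbot's actual code.

  # Sketch of the low-risk numeric-to-named entity conversions (items 7-8):
  # &#8211;/&#x2013; -> &ndash; and &#8212;/&#x2014; -> &mdash;.
  # Not Grammarbot's actual code.
  import re

  REPLACEMENTS = [
      (re.compile(r"&#(?:8211|x2013);", re.IGNORECASE), "&ndash;"),
      (re.compile(r"&#(?:8212|x2014);", re.IGNORECASE), "&mdash;"),
  ]

  def fix_dash_entities(wikitext):
      for pattern, named in REPLACEMENTS:
          wikitext = pattern.sub(named, wikitext)
      return wikitext

  print(fix_dash_entities("1914&#8211;1918 &#8212; roughly"))   # 1914&ndash;1918 &mdash; roughly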

Simple Matt Crypto bot?

User:Matt Crypto (bot) is doing a simple, valid change every midnight. I don't think the bot flag should be set (since the cryptography wikireader always needs more attention ;)) but it ought to be put on the list of bots. Not putting it in myself because I'm not sure the bot is allowed. :) r3m0t 11:59, Feb 21, 2005 (UTC)

The bot flag in my opinion is meant to avoid a situation where the bot clutters up the recent changes. If a bot makes just a single change or a few changes every day, a bot flag seems unnecessary. - Andre Engels 08:52, 24 Feb 2005 (UTC)
Yes, I think a bot flag is unnecessary (unless there's some other reason apart from filtering RC). I requested permission to run this script last December at Wikipedia_talk:Bots/Archive_4#Trivial_daily-update_bot. — Matt Crypto 11:31, 4 Mar 2005 (UTC)

Responding to Sandbot request

I am responding to a request given at Wikipedia:Bot requests for a bot that cleans out the sandbox. The request is that the bot should clean the sandbox every 6 hours.

The following sandboxes will be cleaned every 6 hours:

This bot will be running under User:Sandbot, and using either Template:Please leave this line alone (sandbox heading) or a similar template. -- AllyUnion (talk) 10:27, 25 Feb 2005 (UTC)

This also includes all talk pages, except for Wikipedia talk:Tutorial (Talk pages). Each sandbox will be cleaned separately, 15 minutes apart from each other. -- AllyUnion (talk) 10:56, 25 Feb 2005 (UTC)

There has been a change, due to a request: all tutorial-related sandboxes will be cleaned out every week. -- AllyUnion (talk) 18:39, 25 Feb 2005 (UTC)
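
The cleaning pass itself is little more than overwriting each sandbox with the header template; a hedged sketch follows. The save_page function is a placeholder for whatever edit routine Sandbot actually uses, and the page list is abbreviated.

  # Sketch of a sandbox-cleaning pass: overwrite each sandbox with the
  # standard header template, 15 minutes apart, as described above.
  # save_page() is a placeholder for the bot's actual edit routine.
  import time

  SANDBOXES = ["Wikipedia:Sandbox"]   # plus the other sandboxes and talk pages listed above
  HEADER = "{{Please leave this line alone (sandbox heading)}}\n"

  def clean_sandboxes(save_page):
      for i, title in enumerate(SANDBOXES):
          if i:
              time.sleep(15 * 60)      # clean each sandbox 15 minutes apart
          save_page(title, HEADER, "Scheduled sandbox cleaning")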


VfD newpage bot

Recently there have been some 'general' articles on VfD pages to establish consensus on inclusion (or exclusion) of certain groups of articles. It would be useful to have the link (or {include}) to those articles placed on the top of each day's VfD page. I've been doing this manually but would prefer the VfD bot to handle this; would that be ok? Radiant! 11:40, Mar 4, 2005 (UTC)

Should there be no objection in 7 days from Radiant's request, I will add the feature to the bot. -- AllyUnion (talk) 06:17, 5 Mar 2005 (UTC)

editing tools

Whilst most users just use a normal browser, others may decide to supplement their editing operation in various ways.

Let's look at the spectrum from definitely not a bot to definitely a bot:

  • using an unmodified browser (definitely not a bot)
  • using a browser plugin (like, say, a spellchecker)
  • using a tool to load up articles to edit and find the right place (say, load backlinks from a disambig and find where in the page those backlinks are and let you make the edits)
  • using a tool to load up articles to edit and allow you to select from a range of options (say, with the disambig options above, letting you change the link to something listed on the disambig page with one keypress).
  • using a tool to automatically propose edits but checking them manually before committing them to Wikipedia.
  • using a tool to automatically edit without human supervision (definitely a bot)

My question is: where in this spectrum do we draw the line and say that bot permission is needed before using the tool? Plugwash 17:07, 8 Mar 2005 (UTC)

I would say anything that makes more than about a dozen edits with a single user command should be subject to community scrutiny. I think the largest concern about bots is the amplification effect, which works as well for bugs and unpopular decisions as it does for helpfulness. If you're using a tool that makes your one-for-one or three-for-one edits more efficient, I say no need to go through all the trouble of pre-approval. Anyone can make lousy edits one at a time, with or without the help of a program. -- Beland 03:49, 9 Mar 2005 (UTC)


Plant bot

I would like to request to be able to add all known species of plant life, by scientific and common name, to Wikipedia automatically, with a taxobox on each page. I don't have an exact source yet and I don't have any code yet. -- AllyUnion (talk) 08:41, 11 Mar 2005 (UTC)

Just out of curiosity, how many articles is this going to add to Wikipedia? Am I going to need to adjust my guess for the Wikipedia:Million pool? --Carnildo 09:08, 11 Mar 2005 (UTC)

I think the only list of plants I can add are established and very well known plants whose scientific name classification hasn't changed in the past 25-50 years. -- AllyUnion (talk) 19:25, 12 Mar 2005 (UTC)

On somewhat of a sidenote, I've considered doing a similar thing for the fish in fishbase, but I have not had the time to put into that yet. I suppose the request would be quite similar, so I'll throw it out as an idea. -- RM 14:04, Mar 23, 2005 (UTC)

Taxobox modification for plant articles

In addition to this request, I wish to use a bot to automatically correct and add the taxobox to all plant articles. -- AllyUnion (talk) 09:46, 13 Mar 2005 (UTC)

VFD into a list of day links

As per discussion on Wikipedia talk:Votes for deletion, I'll be making WP:VFD into a list of day links, probably during the day (UTC) on Sunday 13 March 2005. Wikipedia:Votes for deletion (full list) (or WP:VFDF) will be the full list with all day pages transcluded, for a multi-megabyte monster page. I've left notes on Wikipedia talk:Bots and User talk:AllyUnion about the change - David Gerard 09:41, 11 Mar 2005 (UTC)

No consensus has been developed over this decision. I object until I understand what this is trying to solve / accomplish. -- AllyUnion (talk) 19:32, 12 Mar 2005 (UTC)

Royal Society request

The Royal Society have a list of all their fellows, foreign members and presidents at http://www.royalsoc.ac.uk/page.asp?id=1727 and I think this would serve as a useful indication of articles that are needed. There are 26 pdf files at http://www.royalsoc.ac.uk/page.asp?id=1727 thru http://www.royalsoc.ac.uk/downloaddoc.asp?id=796 but a script is needed to go through and somehow arrange them from "surname, firstname" into "firstname, surname", and ignore the titles. Formatting doesn't appear to be preserved by copying out the text, which would be very useful in knowing which data are where so that they can then be jiggled. Any thoughts? Dunc| 23:12, 11 Mar 2005 (UTC)

Use something like SciTE (s/(.*), (.*)/\2 \1/g) to reform the names - titles can be removed with a simple "Find and Replace" function in Notepad ;). Links can be made using either of these applications. --Oldak Quill 00:30, 16 Mar 2005 (UTC)
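
The same reordering can be done with a short script instead of an editor; here is a sketch that assumes lines of the form "Surname, Firstname" with optional leading titles. The title list is illustrative and not taken from the Royal Society files.

  # Sketch of turning "Surname, Firstname" lines into "Firstname Surname"
  # and stripping common titles. The title list is illustrative only.
  import re

  TITLES = re.compile(r"\b(Sir|Dame|Dr|Prof(?:essor)?|Lord|Rev)\.?\s+", re.IGNORECASE)

  def reorder(line):
      line = TITLES.sub("", line.strip())
      if "," in line:
          surname, firstname = line.split(",", 1)
          return "%s %s" % (firstname.strip(), surname.strip())
      return line

  print(reorder("Newton, Sir Isaac"))   # -> Isaac Newton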

VFD Bot extension

(copied from my and AllyUnion's talk pages —Korath (Talk) 01:36, Mar 14, 2005 (UTC))

Also, as long as I'm here, could I trouble you to tweak Vfdbot to add a <!-- Please do not add new vfds here. Put it on the appropriate day's page instead. --> or something similar to the previous day's page when updating the list of days currently transcluded into WP:VFD? I catch two or three vfds a week being added to the first day listed on vfd instead of the current day, and sometimes two or three a day where they're added to yesterday's page just after midnight UTC. The latter aren't a problem if they're missed (and I don't bother to correct them even when I see them), but the former certainly are. —Korath (Talk) 00:28, Mar 14, 2005 (UTC)

I'll tweak VFD Bot later. I'm kind of supposed to be studying for my finals. ^^;;; Do me a favor and post on the talk page for WP:BOTS to make sure it gets approval. --AllyUnion (talk) 00:55, 14 Mar 2005 (UTC)

(end copy —Korath (Talk) 01:36, Mar 14, 2005 (UTC))

Request done. -- AllyUnion (talk) 00:27, 25 Mar 2005 (UTC)

Footnote bot

This bot will be used to assist with and correct pages that use footnotes. No username has been proposed yet, and the code for this is still unwritten.

I am proposing that the bot performs like this: (all pages refer to subpages of the bot's user space or a Wikipedia project page)

  1. Every hour, check for articles listed by users at a dated /To Do/ subpage (named after the current UTC date)... either at its user page or a Wikipedia project page
  2. Fix template usage of the footnotes on that page and re-arrange them in order on the given page
  3. Move the processed article from the /To Do/ page to the corresponding dated /Completed/ subpage.

The initial suggestion was to actually browse all pages using the template. I think that's a bad idea, as we're at half a million articles and the number of pages the bot needs to work on is really limited. Personally, I think having a dog-like bot where you tell it, "Fix, bot, fix." is a better implementation. That way, 1) it doesn't need to bog Wikipedia down searching for articles it needs to fix, 2) articles can be fixed when they need to be. -- AllyUnion (talk) 06:53, 15 Mar 2005 (UTC)
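
A sketch of the "Fix, bot, fix" polling loop described above, with dated /To Do/ and /Completed/ subpages. The bot's username, the page-access functions and fix_footnotes() are all placeholders, since none of them exist yet.

  # Sketch of the hourly poll: read the dated /To Do/ subpage, hand each
  # listed article to a (still unwritten) footnote fixer, then log it on the
  # /Completed/ subpage. Bot name and page-access functions are placeholders.
  import re
  from datetime import datetime, timezone

  LINK = re.compile(r"\[\[([^\]|#]+)")

  def process_todo(load_page, save_page, fix_footnotes, base="User:FootnoteBot"):
      now = datetime.now(timezone.utc)
      day = "%d %s %d" % (now.year, now.strftime("%B"), now.day)
      todo = "%s/To Do/%s" % (base, day)
      done = "%s/Completed/%s" % (base, day)
      titles = LINK.findall(load_page(todo))
      for title in titles:
          fix_footnotes(title)
      save_page(todo, "", "Clearing processed requests")
      save_page(done, "\n".join("* [[%s]]" % t for t in titles), "Logging processed articles")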

Underscore correction bot

This could optionally be added to Grammar bot as an extension... anyway...

Basic idea: Change text in {{text}} and [[text]] to remove all underscores. The only exception is that it will not change anything within <nowiki></nowiki> tags. Are there any other considerations that this bot would need to make? -- AllyUnion (talk) 08:56, 15 Mar 2005 (UTC)

I can program that. The basic query I will run off the dump will have many, many false positives. (No worries, it won't make edits that are false positives.) I have the 09-03-2005 dump. r3m0t 17:20, Mar 15, 2005 (UTC)
There will be a few articles that have "_" as part of their proper name; the only one I can think of off-hand is _NSAKEY. Links to such articles shouldn't have underscores removed. — Matt Crypto 18:06, 15 Mar 2005 (UTC)
I found the following articles with _ (space) at the beginning: ] (yes, that's right, it does exist), _Hugh_Sykes_Davies, _Swimming_at_the_2004_Summer_Olympics_-_Men's_400_metre_Freestyle, and about 40 not in the main namespace. I found stuff with underscores inside the {{wrongtitle}} tag ("The correct title of this page is not specified. It appears incorrectly here due to technical restrictions."): FILE_ID.DIZ, linear_b, mod_parrot, mod_perl, mod_python, _NSAKEY, Shift_JIS, Shift_JIS art, strongbad_email.exe (here showing the ideal names, not the actual ones). Any which do not have a wrongtitle tag deserve the changes they get. Hmmph. r3m0t 20:55, Mar 15, 2005 (UTC)
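
A sketch of the link rewrite being proposed, handling the two exceptions raised in this thread in the simplest possible way: <nowiki> sections are left untouched, and a short list of titles that genuinely contain underscores is skipped. The exception list below is illustrative, not exhaustive.

  # Sketch of the underscore fix: rewrite [[link]] targets to use spaces,
  # skipping <nowiki> sections and a short exception list of titles whose
  # correct names genuinely contain underscores. Illustrative only.
  import re

  EXCEPTIONS = {"_NSAKEY", "Shift_JIS", "mod_perl", "mod_python", "FILE_ID.DIZ"}
  NOWIKI = re.compile(r"<nowiki>.*?</nowiki>", re.DOTALL | re.IGNORECASE)
  LINK = re.compile(r"\[\[([^\]|]+)(\|[^\]]*)?\]\]")

  def fix_links(wikitext):
      shelved = []
      def shelve(m):                    # hide <nowiki> blocks from the rewrite
          shelved.append(m.group(0))
          return "\x00%d\x00" % (len(shelved) - 1)
      text = NOWIKI.sub(shelve, wikitext)

      def rewrite(m):
          target, label = m.group(1), m.group(2) or ""
          if target.strip() in EXCEPTIONS:
              return m.group(0)
          return "[[%s%s]]" % (target.replace("_", " "), label)
      text = LINK.sub(rewrite, text)

      return re.sub(r"\x00(\d+)\x00", lambda m: shelved[int(m.group(1))], text)

  print(fix_links("See [[Main_Page]], <nowiki>[[Main_Page]]</nowiki> and [[_NSAKEY]]."))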

pending deletions

Can someone write a bot to move all articles in category:pending deletions out of the main namespace, e.g. to talk:foo/pending or Wikipedia:pending deletions/foo, and then delete the resultant redirect? Is that a good idea? 131.251.0.7 12:30, 16 Mar 2005 (UTC)

It's possible. I think it's a good idea. Maybe I'll program that, and it could be made by the weekend (I think). You could ask at Wikipedia:Bot requests for somebody to program it. r3m0t 13:57, Mar 16, 2005 (UTC) PS you ought to register.

Categories for deletion / category population / category depopulation bot assistant

This bot would be solely responsible for depopulating and repopulating categories. Primarily used in the assistance of categories for deletion, but I realized how it may be used for other purposes. -- AllyUnion (talk) 15:01, 17 Mar 2005 (UTC)

I thought we already had one?? --ssd 13:04, 18 Mar 2005 (UTC)
We did. But I think it was banned. -- AllyUnion (talk) 08:15, 19 Mar 2005 (UTC)
Pearle was mistakenly banned for 24 hours by someone who thought she wasn't authorized to tag articles {{cfd}}. Then there was a period of time when she was out of service because people had complained about style details and I was too busy to implement the significant reprogramming that resolving the complaint required. Then there was the recent problem with wholesale deletion of interwiki links. But everything is back to normal, and Pearle has been clearing the backlog on WP:CFD. -- Beland 04:27, 20 Mar 2005 (UTC)

Process of how the bot does it:

Depopulation process

  1. Check user subpage for category de-population request. - May run at specific times or intervals
  2. Check if the request is signed and from an administrator from the list of Special:List of administrators
  3. Check if the category has gone through the proper channels, specifically if it has been listed on CFD.
  4. Assert no unprocessed rename request has been requested, and assert no "undo" request has been made either
  5. Generates a list of pages that need to be edited, and posts it in wiki list format to a user subpage.
    The purpose of this is to allow the bot to keep track of articles so that it can undo the damage if necessary.
  6. Queries the administrator by posting a message on their talk page, confirming the requested action. Then waits for an "okay" from the administrator (This particular step may be difficult to program, so I'm uncertain if I can program it in...)
  7. Goes ahead and depopulates the category (a sketch of this edit step follows this list).
  8. Once complete, notifies the administrator on their talk page that the requested task was complete.
  9. Strikes out the request using <s>, posts underneath that the request has been filed
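
A sketch of what the depopulation edit in step 7 amounts to for a single article: strip the category tag (with any sort key) from the wikitext; a rename is the same edit with a replacement instead of a removal. Only the text transformation is shown, not the request/confirmation machinery described above.

  # Sketch of the per-article edit in step 7: remove [[Category:Foo]] (with
  # any sort key) from an article's wikitext, or retag it for a rename.
  # Only the text transformation is shown, not the approval workflow above.
  import re

  def _cat_pattern(name):
      return re.compile(r"\[\[\s*Category\s*:\s*%s\s*(\|[^\]]*)?\]\](\n?)" % re.escape(name),
                        re.IGNORECASE)

  def remove_category(wikitext, category):
      return _cat_pattern(category).sub("", wikitext)

  def rename_category(wikitext, old, new):
      repl = lambda m: "[[Category:%s%s]]%s" % (new, m.group(1) or "", m.group(2))
      return _cat_pattern(old).sub(repl, wikitext)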

Undoing the damage of a removal request

  1. Check user subpage (separate from the depopulation page) for re-population / undo request
  2. Check if request has been completed and listed at the de-population request user subpage.
  3. Check if the request is signed from an administrator
    • Check if the request is from the de-population requesting administrator.
    1. If no: Wait until a second and third administrator confirms the request. (preventing abuse)
    2. If yes: Don't wait for a second and third administrator approval request. (if the nominating administrator screwed up, the bot can fix this with no problems to the administrator.)
  4. Queries the administrator(s) by posting a message on their talk page(s), confirming the requested action.
  5. Waits for approval from all administrator(s) who have confirmed or requested the undo.
  6. Goes ahead and repopulate the category
  7. Strikes out the request using <s>, posts underneath that the request has been filed

Rename process request

  1. Check user subpage for category renaming request. - May run at specific times or intervals
  2. Check if the request is signed and from an administrator from the list of Special:List of administrators
  3. Check if the category has gone through the proper channels, specifically if it has been listed on CFD.
  4. Assert no depopulation request has already been made (it follows that if a depopulation request has already been made, then renaming becomes redundant)
  5. Generates a list of pages that need to be edited, and posts it in wiki list format to a user subpage.
    The purpose of this is to allow the bot to keep track of articles so that it can undo the damage if necessary.
  6. Queries the administrator by posting a message on their talk page, confirming the requested action. Then waits for an "okay" from the administrator (This particular step may be difficult to program, so I'm uncertain if I can program it in...)
  7. Goes ahead and renames the category (retags its member pages with the new name).
  8. Once complete, notifies the administrator on their talk page that the requested task was complete.
  9. Strikes out the request using <s>, posts underneath that the request has been filed

Undoing the damage of a rename request

  1. Check user subpage for rename undo request
  2. Check if the request has been completed and listed at the rename request user subpage.
  3. Check if the request is signed from an administrator
    • Check if the request is from the renaming requesting administrator.
    1. If no: Wait until a second and third administrator confirms the request. (preventing abuse)
    2. If yes: Don't wait for a second and third administrator approval request. (if the nominating administrator screwed up, the bot can fix this with no problems to the administrator.)
  4. Queries the administrator(s) by posting a message on their talk page(s), confirming the requested action.
  5. Waits for approval from all administrator(s) who have confirmed or requested the undo.
  6. Goes ahead and renames the category back to the original state
  7. Strikes out the request using <s>, posts underneath that the request has been filed

Population process

User process:

  1. A user creates a list of pages on the bot's user subpage
  2. The user then makes a population request

The bot follows up with:

  1. Checks the user subpage for a category population request.
  2. Checks that the request is signed by a logged-in user.
  3. Verifies that the list has been approved and reviewed by an administrator, and that the last edit to it was made by an administrator.
  4. Verifies that another administrator (even if the logged-in user IS an administrator) has approved the population request.
  5. Asserts that no unprocessed requests (above) have been made for the category to be populated.
  6. Edits the subpage list of pages, marking it as being or having been processed. The purpose of this is for the bot to "tag" the state of the articles, so that if it ever needs to undo the damage it has a point of reference.
  7. Queries the administrator by posting a message on their talk page confirming the requested action, then waits for an "okay" from the administrator. (This particular step may be difficult to program, so I'm uncertain whether I can program it in...)
  8. Goes ahead and populates the list of articles.
  9. Once complete, notifies the approving administrator and the requesting user on their talk pages that the requested task has been completed.
  10. Strikes out the request using <s> and posts underneath that the request has been filed.
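For illustration, the list parsing (user process step 1) and the tagging done in step 8 might look like this; the "* [[Title]]" list format is an assumption, not a requirement:

  import re

  LIST_ITEM = re.compile(r"^\*\s*\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", re.MULTILINE)

  def parse_page_list(wikitext):
      """Pulls page titles out of a simple '* [[Title]]' wiki list."""
      return LIST_ITEM.findall(wikitext)

  def add_category(text, category):
      """Appends a category tag unless the page already carries it.
      Returns the new text and whether an edit is actually needed."""
      tag = "[[Category:" + category + "]]"
      if tag.lower() in text.lower():
          return text, False
      return text.rstrip() + "\n\n" + tag + "\n", True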

Population undo request

  1. Checks the user subpage for a population undo request.
  2. Checks that the undo has not previously been requested, and asserts that none of the affected articles appear in an open population request.
  3. Asserts that the undo request is valid and does not conflict with the CFD process: if the category is listed on CFD, the bot will refuse to undo the population request.
  4. Checks that the request is signed by an administrator.
  5. Queries the administrator(s) by posting a message on their talk page(s), confirming the requested action.
  6. Waits for approval from the administrator(s) who confirmed the undo.
  7. Goes ahead and reverses all the edits it made under the population list.
  8. Strikes out the request using <s> and posts underneath that the request has been filed.

Automatic page archival

The bot would automatically archive its request subpages after every 50 requests.

General considerations for security and abuse

These are a few considerations that need to be made to prevent abuse and vandalism.

  1. Assert that the signed name really came from that user.
  2. Assert that no alterations have been made to the list of pages that the bot needs to undo.
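Both checks can be done against the request page's history; a sketch, assuming a hypothetical history fetcher that returns (user, revision id) pairs, newest first:

  def signature_is_genuine(signed_user, history):
      """The signed name is trusted only if that user actually appears
      among the editors of the request page."""
      return any(user == signed_user for user, _ in history)

  def list_unaltered(recorded_revision_id, history):
      """The stored page list is trusted only if its newest revision is
      still the one the bot recorded when the request was filed."""
      return bool(history) and history[0][1] == recorded_revision_id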

-- AllyUnion (talk) 15:01, 17 Mar 2005 (UTC)

Discussion

Why not simply protect the user subpage so that only administrators can edit it, and make the Talk: page or a subpage available for requests? r3m0t 18:10, Mar 17, 2005 (UTC)
m:Protected pages considered harmful. Also, it isn't completely necessary; technically, the bot could simply post a static page somewhere on the WWW. -- AllyUnion (talk) 22:08, 17 Mar 2005 (UTC)
You are going to require administrator signatures for all changes anyway. Protecting the page simplifies the job of the bot and makes more obvious the procedure for the user. r3m0t 22:15, Mar 17, 2005 (UTC)
So what do you do for requested lists for population? Protect them when they have been reviewed by an administrator? -- AllyUnion (talk) 06:29, 18 Mar 2005 (UTC)

Pearle perspectives

Well, this definitely overlaps a lot with what Pearle does now. I have working code to:

  • Add an article to a category.
  • Remove an article from a category.
  • Change an article from one category to another in a single edit.
  • Depopulate an entire category.
  • Move an entire category, including intro text.
  • Add the {{cfd}} tag to a page.
  • Remove the {{cfd}} tag from a page.
  • Maintain proper ordering for category, interwiki, stub, and other tags. (This was a non-trivial parsing problem, and this code is used by all of the other editing commands.)
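The tag-ordering problem could be sketched like this (the particular order enforced here, stub templates then categories then interwiki links, is only an example; Pearle's actual convention isn't spelled out above):

  import re

  STUB      = re.compile(r"\{\{[^{}]*stub[^{}]*\}\}", re.IGNORECASE)
  CATEGORY  = re.compile(r"\[\[\s*Category\s*:[^\]]+\]\]", re.IGNORECASE)
  INTERWIKI = re.compile(r"\[\[[a-z]{2,3}(?:-[a-z]+)?:[^\]]+\]\]")  # naive: also catches inline interlanguage links

  def normalise_tail(text):
      """Moves stub, category and interwiki tags out of the article body and
      re-appends them in one fixed order, so every editing command can share
      a single ordering routine."""
      stubs  = STUB.findall(text)
      cats   = CATEGORY.findall(text)
      inters = INTERWIKI.findall(text)
      body = text
      for pattern in (STUB, CATEGORY, INTERWIKI):
          body = pattern.sub("", body)
      return body.rstrip() + "\n\n" + "\n".join(stubs + cats + inters) + "\n"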

I have found that there are certain complications that do crop up.

  • Many times there are mistakes in specifying the desired command, or weirdnesses in the names or contents of pages (including special characters) that cause execution to fail with an error message. Many errors are intentionally fatal, because bad input on the first line is often a good indication that subsequent input lines are also bad, and the early warning prevents undesirable edits.
  • Some edits require human followup. Instead of capturing terminal output and examining it, I have taken to tagging articles and then checking Category:Pearle edits needing manual cleanup and the like.
  • Category-frobbing operations and batch operations in general generate a considerable amount of server load. Some batches take a long time (on the order of hours) to run, because there is a lot of artificial delay, to allow the servers plenty of time to service human editors.
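The load point is worth emphasising: a batch runner can simply sleep between edits, which is why long batches take hours by design. A sketch (the 30-second figure is illustrative, not Pearle's actual setting):

  import time

  def run_batch(edits, apply_edit, delay_seconds=30):
      """Applies each prepared edit, pausing in between so the servers have
      plenty of time to service human editors."""
      for edit in edits:
          apply_edit(edit)
          time.sleep(delay_seconds)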

My general advice:

  • A big, red "emergency stop" button would be a good idea.
  • I like the idea of posting errors and completion notices to user talk pages.
  • The "undo" feature is an excellent idea. It may not be necessary to maintain verbose logs on a wiki page. As long as there's a way to select which command should be undone, most of the rest of the information that humans need to see is accessible from the "User contributions" page. The bot could simply store "undo" information on local disk instead of a "really hope no one changed it" wiki page.
  • Design the input interface so it's hard to make massive errors, and avoid command-line-like syntax. An HTML form with a "from" field and a "to" field might be better than a special Wiki page for this reason. That way, you don't have to worry about whitespace, and you can provide immediate validation. On the other hand, it makes authentication against a Wiki user account harder. (And you could squawk about input errors to their talk page, anyway.) A CGI-powered HTML interface might also reduce complexity in dealing with race conditions, edit conflicts, unexpected changes, etc. Certainly some sort of status reporting mechanism is necessary.
  • The ability to queue multiple batches would be nice. That way, if there is an error with one batch, it can move on to the next. This would also enable multiple users to queue sequential requests. You're pretty much going to have to have a queue of some kind, because a lot of "umbrella" requests that are basically long lists of changes are punted to the CFD bot.
  • You will have to be careful not to allow cleverly crafted user input to compromise the machine that the bot is running on, circumvent the authentication mechanism, or cause undesirable behavior that could affect Misplaced Pages operations.
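The "big, red emergency stop button" suggested above can be as simple as a page the bot polls before every edit; a sketch, with a made-up page name and token:

  def should_stop(fetch_page, stop_page="User:ExampleBot/Stop", run_token="RUNNING"):
      """Returns True unless the stop page still reads exactly RUNNING.
      Any editor can halt the bot by blanking or changing that page."""
      try:
          return fetch_page(stop_page).strip() != run_token
      except Exception:
          return True   # if the page cannot be read, err on the side of stopping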

With regard to user authentication and security restrictions...

Current policy requires that all categories being moved or deleted be listed on WP:CFD. However, because of the complexities of how nominations are made, especially the large batch moves that require bot assistance, it is not possible to automatically verify that a category has been listed there. It is possible to automatically require that a page have been previously tagged {{cfd}} or {{cfr}} or whatever, and to print a fatal error message explaining that this is a prerequisite.
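A sketch of that prerequisite check (the template names accepted here are just the two mentioned above):

  import re

  CFD_TAG = re.compile(r"\{\{\s*(?:cfd|cfr)\s*[|}]", re.IGNORECASE)

  def require_cfd_tag(category_wikitext):
      """Fatal error unless the category page carries a {{cfd}} or {{cfr}} tag."""
      if not CFD_TAG.search(category_wikitext):
          raise SystemExit("Fatal: category is not tagged {{cfd}}/{{cfr}}; "
                           "it must be listed on WP:CFD first.")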

The upshot of this is that at some point a human is going to have to look at WP:CFD and decide what input the bot should get to implement any given decision. On the one hand, it's good to keep this as open as possible, to prevent a backlog of requests from piling up. On the other hand, it's important to keep the bot out of the hands of vandals and people who don't cooperate with the CFD process. For similar reasons, people who want to operate bots have to get approval from this page. So I would agree that unrestricted access would be a bad idea.

My advice would be to start with a relatively simple authentication model, and add complexity only if problems occur. I can think of three different mechanisms:

  • Allow access by authenticated Misplaced Pages administrators.
  • Allow access by authenticated Misplaced Pages users on a special list.
  • Allow access by anyone, but run commands on a 24-hour delay, to allow others the chance to veto execution.

Perhaps some combination of these would be optimal. Personally, I'm not an administrator, and I do clear out a great deal of old and complicated WP:CFD requests, so I'm hoping some kind of mediated access for non-admins will be allowed. I do actually like the idea of a 24-hour delay, because I have certainly made spelling mistakes and misinterpreted people's suggestions before, so a little pre-publication peer review might be a good thing. But on the other hand, it's not like there isn't plenty of peer review after the fact, and an "undo" feature would make the difference somewhat smaller.
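A 24-hour veto window is cheap to implement if commands are queued with a timestamp; a rough sketch:

  import time

  VETO_WINDOW = 24 * 60 * 60   # seconds

  def ready_commands(queue, now=None):
      """queue: list of dicts like {"command": ..., "queued_at": <epoch seconds>,
      "vetoed": False}.  A command becomes runnable only after it has sat in
      the queue, unvetoed, for a full 24 hours."""
      now = time.time() if now is None else now
      return [entry for entry in queue
              if not entry["vetoed"] and now - entry["queued_at"] >= VETO_WINDOW]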

I would be happy to share my existing code base to speed up this project, or to give advice about a re-implementation, including how to avoid complaints and behavior deemed undesirable by the community. In fact, it would be nice to pass this rather mundane janitorial position on completely to another party or a community project, and move on to more interesting things. -- Beland 07:31, 21 Mar 2005 (UTC)

Authentication of requests

How about we use a combo? An offsite request is made using a CGI form somewhere the bot can access, and the request is static. Then a Misplaced Pages subpage somewhere is confirmed by the requesting user signing it. The file name of the request should match the subpage name. By checking the page history, the bot can confirm who made the request and so on. It can also be combined with the authorization methods I suggested above. -- AllyUnion (talk) 07:07, 23 Mar 2005 (UTC)
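A rough sketch of that confirmation check, assuming a hypothetical history helper and a made-up subpage naming scheme:

  def request_confirmed(request_id, fetch_history, admins):
      """An off-site request named <request_id> counts as confirmed once the
      wiki subpage of the same name has been signed, and the page history
      shows the signature really came from an administrator account.
      fetch_history returns (user, wikitext) pairs, newest first."""
      history = fetch_history("User:ExampleBot/Requests/" + request_id)
      if not history:
          return None
      last_editor, text = history[0]
      if last_editor in admins and "[[User:" + last_editor in text:
          return last_editor          # confirmed by this administrator
      return None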

Hmm... I wonder if embedding an HTML form on a wiki page would make things easier. In any case, I'm merely trying to make suggestions to simplify your job and make using the thing as easy as possible. Whatever sort of interface you see fit to code is fine with me. It's certainly better to have a simple thing that works than a complex thing that's halfway finished. -- Beland 02:42, 24 Mar 2005 (UTC)

Numbers and commas - possible additional use for Grammar bot?

Just an idea... maybe convert all those large numbers like 100000000 to something with commas like 100,000,000. Some false positives to consider are article links, years, stuff in mathematical articles, and stuff in formatting. -- AllyUnion (talk) 08:18, 19 Mar 2005 (UTC)

Years are easy: ignore everything under 10000. I will consider it. r3m0t 10:10, Mar 19, 2005 (UTC)
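Combining the two suggestions, a first cut might only touch runs of five or more digits, so everything under 10000 is ignored; skipping links, maths and other formatting would still need extra guards. A sketch:

  import re

  NUMBER = re.compile(r"\b\d{5,}\b")

  def commafy(text):
      """Rewrites 100000000 as 100,000,000.  Numbers of four digits or fewer
      (years and the like) are left alone; this naive version does not yet
      skip wiki links, maths markup or similar false positives."""
      return NUMBER.sub(lambda m: "{:,}".format(int(m.group(0))), text)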

Minor Sandbot change

I am modifying Sandbot to reinforce the existence of the header every 30 minutes (all sandboxes). The main sandbox will be cleaned every 12 hours instead of every 6 hours. -- AllyUnion (talk) 22:36, 24 Mar 2005 (UTC)

Bot to update Dutch municipalities info

I'd like permission to use a bot to update the pages on Dutch municipalities. Things the bot wants to do: update population etc. to 2005; add coordinates to infobox; add articles to the proper category. After that, I may use it as well for adding infoboxes to the articles on Belgian municipalities, and perhaps those of other countries. Eugene van der Pijll 21:45, 27 Mar 2005 (UTC)
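For what it's worth, the infobox updates could follow the usual parameter-rewriting pattern; a sketch with a guessed parameter name, since the actual fields of the Dutch municipality infobox aren't given here:

  import re

  def set_infobox_param(wikitext, name, value):
      """Replaces the value of one infobox parameter, e.g. '| population = ...'.
      The parameter name is a guess; adjust to the actual infobox template."""
      pattern = re.compile(r"(\|\s*" + re.escape(name) + r"\s*=\s*)[^\n|}]*")
      return pattern.sub(lambda m: m.group(1) + value, wikitext, count=1)

  # e.g. text = set_infobox_param(text, "population", "16,300 (2005)")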