This is an old revision of this page, as edited by Jmax- (talk | contribs) at 07:33, 7 January 2007 (→Abandoned Article bot). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
This is a page for requesting work to be done by a bot. This is an appropriate place to simply put ideas for bots. If you need a piece of software written for a specific article you may get a faster response time at the computer help desk. You might also check Misplaced Pages:Bots to see if the bot you are looking for already exists. Please add your bot requests to the bottom of this page.
If you are a bot operator and you complete a request, note what you did, and archive it. Requests that are no longer relevant should also be archived in a timely fashion.
This talk page is automatically archived by Werdnabot. Any sections older than 14 days are automatically archived to Misplaced Pages:Bot requests/Archive 8. Sections without timestamps are not archived.
deadlink removal
I'm using the Weblinkchecker.py bot and have a whole load of bad links. Is there a way to have a bot remove them from the articles? (I realize that this could be hard, since we have refs and links.) One output looks like:
- http://www.zbi.ee/fungal-genomesize/index.php
- In Animal Genome Size Database on Tue Sep 12 01:04:24 2006, Socket Error: (10054, 'Connection reset by peer')
- In C-value on Wed Nov 29 14:41:33 2006, Socket Error: (10060, 'Operation timed out')
ST47Talk 22:13, 5 December 2006 (UTC)
- I was looking at replace.py for this, and, well, it's ugly. Every link would need me to run the bot another time, with different parameters. I'm thinking a .BAT file with each replacement.
- RegExes I will place here for development purposes
- ''opening tag''*''additional text, like {{cite''%link]*''More text before the end''|</ref>]''end tag''
- To
- Tested in AWB, didn't work. Any other ideas? ST47Talk 19:22, 6 December 2006 (UTC)
- Eagle_101, king of regular expressions, says:
- (<ref>.*?url=\s*|\)
- I don't think simply removing dead links is a good idea at all. The link was presumably added for a reason - because it held good content. Because of web caching services such as Google, Alexa and WebCite, this information may still be available even though the original link is dead. I would not want a bot simply removing the dead link without giving people a chance to manually update it or find a cached copy of the linked page. - PocklingtonDan 14:53, 18 December 2006 (UTC)
- Definitely NOT a good idea, though an understandable desire: Misplaced Pages:Dead external links specifically says that dead links are NOT to be removed.
- On the other hand, TAGGING such dead links or otherwise marking them could be a GREAT idea - then other editors would know that a problem existed when they read the article with the bad link in it. A similar concept is being discussed at Misplaced Pages talk:Disambiguation pages with links; it has been suggested that a template be put immediately after the bad link; the template would display the problem (as does, for example, {{fact}}), and could contain a link ("more"; "help", whatever) that a user could click on to get to an instruction page that would discuss possible ways to fix the bad external link. John Broughton | Talk 00:54, 28 December 2006 (UTC)
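A minimal sketch of the tagging idea suggested above: leave the dead link in place and put a marker template immediately after it. The template name `{{dead link}}` and the function name are assumptions for illustration; the thread only proposes "a template put immediately after the bad link". The pattern handles bracketed links and bare URLs, and is not a full wikitext parser.

```python
import re

# Hypothetical marker; the discussion does not name the template.
DEAD_TAG = "{{dead link}}"

# A bracketed external link [url label], or a bare URL.
LINK_RE = re.compile(r"\[(https?://[^\s\]]+)[^\]]*\]|(https?://[^\s\]<|}]+)")


def tag_dead_links(wikitext, dead_urls, tag=DEAD_TAG):
    """Append `tag` after every link whose URL is in `dead_urls`.

    The link itself is never removed, and links that already carry
    the tag are left alone, so the function can be run repeatedly.
    """
    dead = set(dead_urls)
    out = []
    pos = 0
    for match in LINK_RE.finditer(wikitext):
        url = match.group(1) or match.group(2)
        out.append(wikitext[pos:match.end()])
        pos = match.end()
        already_tagged = wikitext.startswith(tag, pos)
        if url in dead and not already_tagged:
            out.append(tag)
    out.append(wikitext[pos:])
    return "".join(out)
```

A bare URL followed directly by sentence punctuation will not match an entry in `dead_urls`, so a real bot would need more careful URL extraction than this.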
Dead link bot?
Is it possible to make a bot that checks to see if links are dead? The Placebo Effect 01:38, 14 December 2006 (UTC)
- Yes, but it would be quite complicated. The bot would have to ensure that it wasn't a problem with your connection, and that it wasn't just a temporary server outage. —Mets501 (talk) 01:46, 14 December 2006 (UTC)
- It's possible, if distributed or tested on multiple hosts. --Jmax- 06:35, 14 December 2006 (UTC)
- This would actually be quite a good idea, I think. As above, it is probably best to have clone code running on two servers and to check each link twice, 48-72 hours apart, to rule out temporary network problems. Such a bot would be made redundant if the proposed WebCite scheme was ever implemented, but until then I think it would be useful if, after finding a dead link, it posted a warning notice to the article's talk page perhaps? Sounds like a good idea. - PocklingtonDan 14:49, 18 December 2006 (UTC)
- I could code something like this in Perl, using POE, if needed, but I wouldn't be able to run it long-term. --Jmax- 14:52, 18 December 2006 (UTC)
- The best way to do this is probably to check the links and then post the bot's summary of the links on the talk page. I can program in Java but I have no clue how to make a bot that checks websites and evaluates them. The Placebo Effect 02:36, 21 December 2006 (UTC)
- The pywikipedia framework contains a script for doing just this, which I'll happily tweak for this purpose and leave running on my server. The only "problem" I can see is how it would grab the links to be checked, as grabbing every external link, even from an API or database dump, would take a fair while (I don't even want to imagine how many external links there are in total on Misplaced Pages). I'd still be willing to do this, but it's going to be a long project, not just an overnight AWB run! ShakingSpirit 03:19, 21 December 2006 (UTC)
- I was assuming it would check all the external links in an article, then post on the article's talkpage. The Placebo Effect 03:25, 21 December 2006 (UTC)
- Yup, should be easy to do that, my point was that going through every single article from A-Z checking links will take a fair amount of time, and isn't too 'friendly' to the server ^_^ ShakingSpirit 03:31, 21 December 2006 (UTC)
- I realize my request is probably unreasonable, but I just had the thought that perhaps after finding a deadlink the bot could find a link to a cached version (on google or the wayback machine or somewhere) and link to that instead. Vicarious 15:25, 26 December 2006 (UTC)
- Finding a cached dead link on an Internet archive such as WebCite is easy - the syntax is http://www.webcitation.org/query.php?url=deadlink (or http://www.webcitation.org/query.php?url=deadlink&date=date for a certain cached date). However, the bot would never know which version the author meant to cite - in the case of dynamically changing websites that's a problem. That's why I made a proposal some time ago to prospectively archive (cache) all cited URLs on Misplaced Pages, which is ridiculously easy using WebCite. Writing a bot which prospectively adds "cached version" links to all cited links in new articles (thereby eliminating the problem of broken links in the first place) would make much more sense than just detecting broken links. I also proposed a policy change on citing sources suggesting that authors should add links to cached versions to their links as much as possible - but a bot would help to make this a quasi-standard. --Eysen 18:08, 26 December 2006 (UTC)
- Couldn't the bot check the page history for when the link was added and assume that is the version to use? Vicarious 23:02, 26 December 2006 (UTC)
- To the best of my knowledge, there's no easy way to check when the link was added short of going through each edit in the page history and scraping it; a solution which is ugly, and wastes both the bot user's bandwidth and the server's. I have, however, come up with another idea ^_^ ShakingSpirit 00:38, 27 December 2006 (UTC)
- EDIT: I was wrong; you can grab the page history in a bandwidth- and parsing-friendly manner. Still, personally I don't think every dead link should be automatically replaced with an archived version, as sometimes the information the link contained is out of date - sometimes links go dead for a reason! I'd like to hear others' opinions ^_^ ShakingSpirit 00:44, 27 December 2006 (UTC)
I would happily code something for this. However, I have concerns regarding WMF policy on using WebCite and other proprietary methods of caching web sites. -- Jmax- 09:22, 27 December 2006 (UTC)
Please look at Misplaced Pages:Dead external links — Iamunknown 01:50, 29 December 2006 (UTC)
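Two of the ideas in this thread can be sketched briefly: only treat a link as dead after repeated failures spread over at least 48 hours, and build the WebCite lookup address from the syntax quoted above. This is a sketch under stated assumptions, not a working bot: the function names are hypothetical, the percent-encoding of the `url` parameter and the format of the `date` value are assumptions, and the actual HTTP checks are left out so that only the decision logic is shown.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

# Base address taken from the syntax quoted in the thread.
WEBCITE_QUERY = "http://www.webcitation.org/query.php"


def webcite_url(dead_link, date=None):
    """Build a WebCite query address; `date` is passed through unchanged."""
    params = {"url": dead_link}
    if date is not None:
        params["date"] = date
    return WEBCITE_QUERY + "?" + urlencode(params)


def confirmed_dead(checks, min_gap=timedelta(hours=48)):
    """Decide whether a link is dead from a list of (datetime, ok) checks.

    The checks may come from several hosts. A success resets the streak,
    so the link counts as dead only if the most recent checks all failed
    and the first and last of those failures are at least `min_gap` apart.
    """
    failures = []
    for when, ok in sorted(checks):
        if ok:
            failures = []
        else:
            failures.append(when)
    return len(failures) >= 2 and failures[-1] - failures[0] >= min_gap
```

With the two failures from the Weblinkchecker output quoted in the previous section (12 September and 29 November 2006), `confirmed_dead` returns `True`; a single timeout, or two failures a few minutes apart, would not be enough.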
Punctuation Bot
Hey, how about a bot that will put all the commas and periods (all punctuation except semi-colons, in fact) inside quotation marks; it looks quite unprofessional to see articles written with punctuation outside quotations. - Unsigned comment added by User:165.82.156.110
- Not quite sure what you are proposing. Perhaps you could provide a sample of correct and incorrect punctuation within the context of quotations? - PocklingtonDan 21:17, 20 December 2006 (UTC)
- I think he's referring to punctuation within quotations, which is not really an English "rule", more of a matter of style. See Misplaced Pages:Manual of Style#Quotation marks. There's no real way to automate this, and there's no real reason to, in my opinion -- Jmax- 21:22, 20 December 2006 (UTC)
- Does he mean just the trailing full stop/period? If so, then it should always go outside the closing quotation mark, but he seems to suggest you should never use punctuation within quotation marks, which I don't understand - quotation marks are used to quote somebody. If the words you are quoting would reasonably be punctuated when written in prose, then that punctuation is included, regardless of whether it is in quotation marks. The following is perfectly valid punctuation, regardless of style:
- My friend said to me "The cat sat on the mat, then it bit me. I don't think it likes me".
- And what about a single word in quotation marks, followed by a comma - the comma obviously shouldn't go inside the quotation marks. John Broughton | Talk 01:17, 28 December 2006 (UTC)
- Wow. I was taught in school that the trailing punctuation should go inside the quotations. For example, 'That bastard at the movie said "shhh!" Can you believe it?' (even with the assumption that "shhh" wasn't being exclaimed.) Then again, maybe I wasn't paying good enough attention in class. ---J.S 00:12, 29 December 2006 (UTC)
- Having the punctuation always inside quotation marks is the American style; British English uses punctuation inside if it belongs to the quotation, and otherwise outside. (http://en.wikipedia.org/British_and_american_english_differences#Punctuation) Therefore I don't think this bot would be a good idea, since it only represents the American usage. —The preceding unsigned comment was added by CJHung (talk • contribs) 03:16, 1 January 2007 (UTC).
WikiProject Strategy Games bot
Like the MMORPG project, we need a bot to add our banner to articles. The banner is Template:SGames or {{SGames}}. We have a list of categories whose articles should get it, and they are
- Category:Real-time strategy computer games
- Category:Turn-based strategy computer games
- Category:Age of Discovery computer and video games
- Category:Free, open source strategy games
- Category:Panhistorical computer and video games
- Category:Abstract strategy games
- Category:Chess variants
- Category:Tic-tac-toe
- Category:Strategy computer games
- Category:Real-time tactical computer games
- Category:Economic simulation games
- Category:Strategy game stubs
- Category:City building games
- Category:God games
If someone could make a bot or teach us how, that would be great. Thanks, Clyde (talk) 02:00, 21 December 2006 (UTC)
- I can do that for you Saturday - to confirm, you'd like {{SGames}} added to all articles in those categories? If there are subcategories, should I include them? ST47Talk 02:07, 22 December 2006 (UTC)
- Um, yes, the template is correct, and I found some unneeded subcategories, though I'm still going through them. I found that Category:Virtual toys doesn't fit. I also removed Category:Chess and Category:Strategy (they have too much irrelevant info) so if you find those as subcategories, don't add them. Actually, are you more experienced with this? Would it be better just to not add to any subcategories, and I'll personally add them later? What's your call?--Clyde (talk) 05:30, 23 December 2006 (UTC)
- Sorry for the delay, comcast decided to stab my internet. I'll start that now, without subcategories just in case. ST47Talk 15:35, 27 December 2006 (UTC)
- Okay thanks.--Clyde (talk) 15:53, 28 December 2006 (UTC)
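For anyone wanting to see how the tagging step itself could work, here is a minimal sketch. It only covers the text edit: given a page's wikitext, add the banner unless it is already there. The function name is hypothetical, and fetching and saving pages (for example through a bot framework) is deliberately left out.

```python
import re


def add_banner(page_text, banner="SGames"):
    """Prepend {{banner}} to the text unless the page already uses it."""
    already_there = re.search(
        r"\{\{\s*" + re.escape(banner) + r"\s*[|}]", page_text, re.IGNORECASE
    )
    if already_there:
        return page_text
    return "{{" + banner + "}}\n" + page_text
```

Checking for an existing banner first keeps the run safe to repeat, which matters when the same article appears in several of the listed categories.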
Category counts
I'm trying to find out how many articles and categories ultimately descend from Category:Dungeons & Dragons and how many of these are stubs (both by categorization and by byte/word count). Lists would be good if possible. For comparison's sake, I'm also seeking similar numbers for Category:Chess. This is for the following purposes:
- See how much of Misplaced Pages as a whole is given over to D&D. (It seems to me D&D exposure is much higher among Wikipedians than the general population.)
- Determine whether sweeping mergers are called for, into titles one level less specific (e.g. Dwarven deities in the World of Greyhawk rather than the individual deities).
NeonMerlin 23:56, 24 December 2006 (UTC)
- There are 1041 articles in categories branching from Category:Chess. In order to collect the statistics you desire, I would have to request each of those pages and perform a character count. I'll speak to someone from the Bot Approvals Group and see if they'll let me perform this, or what I must do in order to. -- Jmax- 21:30, 25 December 2006 (UTC)
- PockBot would give you a list of all articles in category, as well as their article class (eg stub) - PocklingtonDan 16:28, 27 December 2006 (UTC)
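The counting itself can be sketched as a walk over the category tree. The sketch below assumes the category data has already been fetched into two plain dictionaries (how to obtain them is not covered here), and the 1500-byte stub cutoff is a hypothetical figure, not a Misplaced Pages rule. Categories can contain each other, so visited categories are remembered to avoid looping.

```python
def descendants(root, subcats, members):
    """Return (categories, articles) that ultimately descend from `root`.

    subcats: dict mapping a category to a list of its subcategories
    members: dict mapping a category to a list of its article titles
    """
    seen_categories = set()
    articles = set()
    stack = [root]
    while stack:
        category = stack.pop()
        if category in seen_categories:
            continue  # reached twice, or part of a category loop
        seen_categories.add(category)
        articles.update(members.get(category, ()))
        stack.extend(subcats.get(category, ()))
    return seen_categories, articles


def small_articles(articles, sizes, stub_bytes=1500):
    """List articles whose byte count falls below a (hypothetical) cutoff."""
    return sorted(a for a in articles if sizes.get(a, 0) < stub_bytes)
```

Using sets means an article filed in several subcategories is counted once, which is what the "how much of Misplaced Pages" comparison needs.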
Image bot idea
Hello. I have had a bot idea. Would it be possible for a bot to scan through every image that comes under Category:Non-free image copyright tags - of which there are tens of thousands - and flag all those that are over a certain file size / set of dimensions / resolution? This is because fair use only allows for a low resolution image, and I have spotted a veritable crapload that are nowhere near being low resolution.
A very clever bot could then automatically downscale and overwrite the image with a lower res version, and automatically leave a note on the talk page of the original uploader.
Is this even technically feasible? Proto::► 13:19, 27 December 2006 (UTC)
- It's possible, but I don't think it's feasible. Low resolution is quite subjective (relative to the original image, that is). Then again, I guess a 2 meg image is likely not "low resolution." ---J.S 04:49, 29 December 2006 (UTC)
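The flagging half of the idea is simple once a cutoff is chosen; the hard part, as noted above, is that the cutoff is subjective. The sketch below uses hypothetical limits (300 x 300 pixels, 200,000 bytes) and works on image metadata only; it does not read or rewrite any image file.

```python
import math

# Hypothetical limits; "low resolution" has no agreed numeric definition here.
MAX_PIXELS = 300 * 300
MAX_BYTES = 200_000


def oversized(images, max_pixels=MAX_PIXELS, max_bytes=MAX_BYTES):
    """Return names of images over either limit.

    images: iterable of (name, width, height, size_in_bytes) tuples
    """
    return [
        name
        for name, width, height, size in images
        if width * height > max_pixels or size > max_bytes
    ]


def downscaled_size(width, height, max_pixels=MAX_PIXELS):
    """Return (width, height) shrunk proportionally to fit max_pixels."""
    pixels = width * height
    if pixels <= max_pixels:
        return (width, height)
    scale = math.sqrt(max_pixels / pixels)
    return (max(1, int(width * scale)), max(1, int(height * scale)))
```

A flag-only run that produces a list for humans to review would sidestep the question of whether a given image really is too large for its use.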
Anyone willing to analyse some bot pseudocode?
I'm building a research (i.e., no edit) bot in C++... since I'm not really that experienced in programming I was wondering if someone would be willing to check my pseudocode?
The basic concept behind the bot is to identify when a particular string of text was added to an article using a binary search method. In theory it could search through the history of a page with 10,000 edits with fewer than 15 page requests.
A research program like this will be a helpful tool in tracking down subtle vandals and spammers. So.. I've kinda drifted. Anyone more experienced with OOP languages want to audit my pseudocode? ---J.S 23:59, 28 December 2006 (UTC)
- Here's the link... User:J.smith/pseudocode. ---J.S 00:06, 29 December 2006 (UTC)
- I've written a perl interpretation of your pseudocode, but am having trouble understanding precisely the context of that block. How will it be used? Is that the 'main' function? Where is the return value used? -- Jmax- 08:41, 29 December 2006 (UTC)
- I'm not certain, but I believe the idea is the user provides the wikipedia page and a string that's in the current version of the article. The function returns the diff of when that string was added. So, I would say that no this wouldn't be main, this would probably be 'search'. Vicarious 09:26, 29 December 2006 (UTC)
- Does it recurse? Where should it recurse, if it does? -- Jmax- 09:31, 29 December 2006 (UTC)
- No I don't think so, Main would take the user's input, run the search function then either link to or redirect the user to diff page. Vicarious 09:34, 29 December 2006 (UTC)
Here is a perl implementation, less the essential bits (which could easily be added). I'm not entirely sure if the algorithm will even work properly, actually. Something seems off about it. -- Jmax- 10:09, 29 December 2006 (UTC)
- Well, it does have limitations. If the text was added in and taken out multiple times it won't necessarily find the -first- time the string was added, but it will find one of the times the string was added. There are a number of elements I haven't designed yet so the code is incomplete. ---J.S 17:38, 29 December 2006 (UTC)
- The basic idea here is that the user would input the name of the article to search and the string of text they were looking for, and then the program would output a link to the first version of the page with that particular string. ---J.S 17:43, 29 December 2006 (UTC)
- As was hinted at above, a binary search skips over many alterations. A binary search will find one alteration where the string appeared, but it might not be the first time the string appeared. The bot might look at versions 128, 64, 32, 48, 56, 60, 58, 59 and identify version 59 as having the string while 58 does not. But the string might have been inserted in version 34 and deleted in version 35, as well as several other times. (SEWilco 05:45, 30 December 2006 (UTC))
- Yes, but even that can be useful information when tracking stuff down...
- It occurs to me, this might be useful for tracking down an unsigned post on a talk-page when the date is completely unknown. Hmmm... ---J.S 05:48, 30 December 2006 (UTC)
- It at least can help in many situations. You wanted comments on the method, and now you know some of the limitations. If you really want to find the first insertion of a string you could examine the article-with-history format which is used in data dumps. (SEWilco 16:00, 30 December 2006 (UTC))
- That could be done, but a db dump is quite huge :( Maybe I should chat with the toolserver people on that when they get replication up and running? ---J.S 09:35, 31 December 2006 (UTC)
- Is the full-with-history available through Export? (SEWilco 15:03, 31 December 2006 (UTC))
- Help:Export says the full history for a page is available, but at bottom of page is a note that it has been disabled for performance reasons. If the history was available you'd have a single file where you'd just have to recognize the version header (and a few others such as Talk page) and by remembering the earliest version with the desired text be able to find the version in a single read of one file. At present that's only relevant if you search a mirror with export history enabled. (SEWilco 06:39, 3 January 2007 (UTC))
- Although I don't think this is as big an issue as you guys do, I have a relatively elegant solution to the problem of finding the first insertion. Run the exact same search again on only the preceding versions. Have it include a case where, if it never finds the string, it lets the first search know it found the right one. This method won't work if the string has been absent from most of the versions, but by far the most common reason the original search won't work is that it finds page blankings and attributes the sentence to the person who reverts them; this solution solves that problem. Vicarious 01:08, 1 January 2007 (UTC)
- That's a brilliant solution! I'll certainly include a function for this. ---J.S 19:06, 2 January 2007 (UTC)
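A minimal sketch of the binary search discussed in this section, including the re-run on the preceding versions. This is not the pseudocode from User:J.smith/pseudocode or the Perl version mentioned above, neither of which is reproduced on this page; the names `search`, `first_insertion` and `has_text` are assumptions. `has_text(i)` stands for "fetch revision i (0 = oldest) and test whether it contains the string", so each call represents one page request.

```python
def search(lo, hi, has_text, hi_has_text):
    """Binary search over revisions lo..hi (0 = oldest).

    Returns an index whose revision contains the text while the one
    before it (if any) does not, or None if no probe found the text.
    `hi_has_text` states whether revision `hi` is already known to
    contain the text; pass False only when it is known not to.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if has_text(mid):  # one page request per probe
            hi, hi_has_text = mid, True
        else:
            lo = mid + 1
    return hi if hi_has_text else None


def first_insertion(n_revisions, has_text):
    """Repeat the search on the preceding versions until nothing is found.

    Assumes the newest revision contains the text.
    """
    found = search(0, n_revisions - 1, has_text, True)
    while found > 0:
        earlier = search(0, found - 1, has_text, False)
        if earlier is None:
            break  # the previous answer stands
        found = earlier
    return found
```

Each pass halves the range, so a single search over 10,000 revisions needs at most 14 probes. The re-run handles the page-blanking case: if the text was added at revision 10 and the page was blanked at revisions 63 and 64, the first pass reports the revert at 65 and the second pass moves the answer back to 10. The limitation raised above still applies: a brief insertion that no probe happens to land on (such as one added in version 34 and removed in version 35) is not found.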
Ad-stopping bot
In theory, the bot will look through new articles to try and find key phrases like "our products" and "we are a". It then places a template on the page like this:
AdBot suspects this page of being blatant advertising, otherwise known as spam.
Please check this page conforms to the neutral point-of-view policy before nominating for speedy deletion, deleting or removing this template.
And places it in a relevant category. A human (or other intelligent individual) would then look through the list and nominate any articles that are blatant ads for WP:SPEEDY.
What do you think? --///Jrothwell /// 13:15, 29 December 2006 (UTC)
- Sure, why not? But "suspects that this page contains" rather than "suspects this page of being" might be a little more neutral, as well as more grammatically correct. And it would be an ad-flagging bot, not an ad-stopping bot. (I'm quibbling, I know.) John Broughton | Talk 15:13, 29 December 2006 (UTC)
- Sounds good. There might be some changes in implementation (e.g. that flag might cause concern), but I've found phrases like those to be dead giveaways to both commercial intent and notorious copyright infringements.
- It's a good idea. Other phrases you could search for include "our company", "visit our website/site/home page", and "we provide". You might also want to add "fixing the article" to the list of suggested options. Proto::► 01:51, 31 December 2006 (UTC)
- I've altered the template slightly to fit in with everyone's suggestions. Here's the revised template:
AdBot suspects this article contains blatant advertising, otherwise known as spam.
If the subject of the article complies with the Misplaced Pages notability guidelines, please fix the article if it doesn't conform to the neutral point-of-view policy. If the subject is not notable, please nominate the article for speedy deletion.
- I'm also making a template for user pages of people whose pages have been flagged. Any other thoughts? --///Jrothwell /// 16:06, 31 December 2006 (UTC)
- The user-talk template is at User:Jrothwell/Templates/Adbot-note. Is there anyone who'd be willing to code the bot? --///Jrothwell /// 17:13, 31 December 2006 (UTC)
- Sounds like a great idea for a bot. If you haven't found anyone yet, I'd be willing to code it. Best, Hagerman 19:10, 31 December 2006 (UTC)
- Shouldn't it be "If the article does not assert the notability of the subject, please nominate the article for speedy deletion." OR "If the subject is not notable, please nominate the article for deletion." Please see WP:CSD#A7. --WikiSlasher 04:23, 1 January 2007 (UTC)
- I don't know if this is a good idea. Googling "we are a" gives mostly legitimate pages where the phrase appears in a quotation. I think at the least there should be a human signing off on each flagging. --24.193.107.205 06:10, 2 January 2007 (UTC)
(undent) The issue of false positives is important. Certainly if a large majority of flaggings related to a particular phrase are in fact in error, that phrase shouldn't be used by the bot. But keep in mind that this flagging will only be used for new articles, which are much more likely to be spam than existing ones, so drawing conclusions from your search of existing articles isn't necessarily a good idea.
In any case, the bot should be tested by seeing what happens using a given phrase for (say) the first ten articles it finds. For example, our products looks like a good phrase to use. A google search on that found Enterprise Engineering Center (user who created article has done nothing else), plus several others (in the top 10 results) that were tagged as appearing to be advertisements.
Finally, the bot is only doing flagging. A human has to actually nominate an article for deletion (and it's easy to remove a template). But your comment does raise a point about there being a link to click on to complain about the bot. John Broughton | Talk 15:40, 2 January 2007 (UTC)
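The phrase check described so far could look roughly like the sketch below. The phrase list is the one suggested in this thread; skipping text inside double quotation marks is an added assumption aimed at the false-positive concern about quotations, and the function names are hypothetical. It only decides whether to flag; placing the template is a separate step.

```python
import re

# Phrases proposed in the discussion above.
AD_PATTERN = re.compile(
    r"\b(our products|we are a|our company|we provide"
    r"|visit our (?:website|site|home page))\b",
    re.IGNORECASE,
)

# Assumption: anything inside straight double quotes is a quotation.
QUOTED = re.compile(r'"[^"]*"')


def ad_phrases(text):
    """Return the distinct trigger phrases found outside quotations."""
    unquoted = QUOTED.sub(" ", text)
    return sorted({m.group(0).lower() for m in AD_PATTERN.finditer(unquoted)})


def should_flag(text, min_hits=1):
    """Flag a new article once it contains enough distinct phrases."""
    return len(ad_phrases(text)) >= min_hits
```

Raising `min_hits` to 2 would trade some missed adverts for fewer false flags, which could be tried on the first few articles found, as suggested above.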
- It strikes me that the Bayesian approach commonly used to detect e-mail spam could work here as well. All we'd need (besides a simple matter of programming) is a way to train the bot. I suppose, if the bot is watching the RC feed anyway, that deletion of a tagged article could be seen as a confirmation that it was spam, while removal of the tag could be taken as a sign that it was not (until and unless the article is deleted after all). But there would still need to be a manual training interface, if only for the initial training before the bot is started. —Ilmari Karonen (talk) 03:23, 3 January 2007 (UTC)
- I like the idea of a Bayesian approach because of the simplicity. However, the bot training would always have to be manual in my opinion. Having the bot treat the deletion of a tagged article as spam will likely result in it learning some behaviors outside of its design scope. For instance, if it tags an article with patent nonsense that happens to trip the filter and that article is removed while the template is still intact, it will start gobbling up patent nonsense like there is no tomorrow. While that's not a bad thing, the template we'd be leaving on the page wouldn't accurately describe what's wrong with the page.
- So... either a manual interface would be necessary to make sure that the bot stays on target or we'd need to change the scope of the bot to encompass every kind of problem there is with a new page (spam, attack pages, patent nonsense, etc.) I think either approach would be good, but would anyone care to offer their feedback? Best, Hagerman 03:31, 7 January 2007 (UTC)
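To make the Bayesian suggestion concrete, here is a small naive Bayes sketch with manual training only, in line with the concern above about automatic training drifting off target. It is a generic textbook classifier with add-one smoothing, not the design of any existing bot; the class name, labels and tokenizer are assumptions.

```python
import math
import re
from collections import Counter


class SpamFilter:
    """Word-count naive Bayes over two manually trained classes."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        """Record one human-labelled example ('spam' or 'ham')."""
        self.words[label].update(self.tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | text) with add-one smoothing."""
        vocabulary = set(self.words["spam"]) | set(self.words["ham"])
        total_docs = sum(self.docs.values())
        scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.words[label].values())
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            for word in self.tokens(text):
                count = self.words[label][word]
                score += math.log((count + 1) / (total_words + len(vocabulary) + 1))
            scores[label] = score
        # Turn the two log scores into a probability, avoiding overflow.
        diff = max(-700.0, min(700.0, scores["ham"] - scores["spam"]))
        return 1.0 / (1.0 + math.exp(diff))
```

Because `train` is only ever called with a label chosen by a person, an article deleted as patent nonsense would not be learned as spam unless someone labels it that way.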
I suggest this template:
AdBot suspects this article (or parts of this article) are blatant advertising, otherwise known as spam.
If the subject of the article complies with the Misplaced Pages notability guidelines, please fix the article if it doesn't conform to the neutral point-of-view policy. If the subject is not notable, please nominate the article for speedy deletion.
Cocoaguycontribs 03:42, 3 January 2007 (UTC)
forced autoarchive
I know there are already autoarchive bots running, such as the one archiving this page, but I think a bot that operates a little differently could be effectively used to archive all article talk pages. First off, it would only archive talk pages that are very long, so 3-year-old comments on a tiny talk page would be left untouched. When the bot runs across a very long talk page it will archive similarly to current bots, but with a high threshold, for example all sections older than 28 days (rather than the typical 7 days). Also, unlike current bots, I'd suggest we make this opt-out rather than opt-in, although very busy talk pages or talk pages that are manually archived wouldn't be touched anyway because they'd either be short enough or would have no inactive sections. Vicarious 03:56, 1 January 2007 (UTC)
- If there is interest in this and someone can code it up and get it approved, I'll volunteer to host it and run it under the EssjayBot banner. Essjay (Talk) 03:58, 1 January 2007 (UTC)
- Werdnabot is customizable as to how many days of no replies in a section before it archives - If the only feature you want is to only archive after a certain page length is reached, wouldn't it just be easier to put in a feature request to Werdna, rather than re-inventing the wheel? ShakingSpirit 04:04, 1 January 2007 (UTC)
- Unless I've missed something, Werdnabot is like the EssjayBots, it's an opt-in. Archiving all article talk pages on Misplaced Pages would require a bit more than just a new feature; it's going to have several hundred thousand talk pages to parse, it's going to need a lot more efficient code than the current opt-in code. Essjay (Talk) 04:08, 1 January 2007 (UTC)
- My apologies, I missed that part. Though that does bring up a new point - wouldn't this cause very unnecessary stress on the servers? Crawling through every single article talk page on wikipedia must be very bandwidth-intensive, even using Special:Export or an API, or some such. It would also create a huge number of new pages - again, putting strain on the database server. Does the small convenience of having a shorter talk page to look though justify this? Maybe I'm playing devil's advocate, but I'm sure this has been debated before and wasn't found to be such a good idea ^_^ ShakingSpirit
- Well, it'll be a strain on the server that hosts it, parsing all those pages. However, if it's done right, it will only archive pages of a certain length, which should avoid most of the one or two line talk pages that are out there, thus reducing any server load. At this point, with 6,930,642 articles, 62,143,382 pages, and tens of thousands of edits a minute, one little bot archiving pages (and set on a delay, to avoid any problems) is hardly likely to bring the site down. As long as it is given a reasonable delay time on its editing, it should be fine. The real problem will be getting the community signed on to the idea. Essjay (Talk) 04:28, 1 January 2007 (UTC)
- As for bandwidth, I don't think it would be an issue. It could run once on a database dump to get the ball rolling, then it could patrol recent changes looking only at "talk:" changes. If it still seems like it could hog bandwidth I can think of many more ways to cut down the number of pages it checks. First, ignore any pages that just had characters removed instead of added. Second, only check every third page (or so); this operates under the premise that big talk pages get big because they're edited often, so a page will pop up again soon if it's going to need archiving. Third, the bot could store a local hash table of page lengths, so rather than loading the page each time it would add (or subtract) the number of characters listed on Special:Recentchanges and would only load the page if it needs archiving. This wouldn't be as hard on a bot as it sounds; the storage space would only be a few megs because all it needs is the page's hash and size. The computation would also be easy, because it would hash the page rather than search for it, so the lookup time is O(1) and the calculations are all really simple. Vicarious 04:34, 1 January 2007 (UTC)
- Ok, the archive bots are great and seem to work really well... but forced archiving of one particular style on a project-wide scale? I'd so rather we keep the opt-in system and some active "recruiting" of large talk pages. ---J.S 19:04, 2 January 2007 (UTC)
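The size-tracking idea raised in this thread (keep a local table of talk page lengths, update it from the byte deltas shown in recent changes, and only fetch a page once it crosses a threshold) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the threshold value, and the way deltas are delivered are all hypothetical, and no real bot framework or MediaWiki API is used.

```python
class TalkPageSizeTracker:
    """Hypothetical helper: tracks talk page sizes from recent-changes deltas."""

    def __init__(self, threshold_bytes):
        # Pages at or above this size are considered candidates for archiving.
        self.threshold_bytes = threshold_bytes
        # Python dicts are hash tables, so lookups are O(1) on average.
        self.sizes = {}

    def seed(self, title, size):
        """Record an initial size, e.g. taken from a database dump."""
        self.sizes[title] = size

    def record_change(self, title, delta):
        """Apply a size delta; return True if the page is worth fetching."""
        new_size = max(0, self.sizes.get(title, 0) + delta)
        self.sizes[title] = new_size
        if delta <= 0:
            # Edits that only removed characters are ignored, as suggested above.
            return False
        return new_size >= self.threshold_bytes


if __name__ == "__main__":
    tracker = TalkPageSizeTracker(threshold_bytes=100)
    tracker.seed("Talk:Example", 90)
    for delta in (-20, 20, 15):
        print(delta, tracker.record_change("Talk:Example", delta))
```

The "check every third page" suggestion is left out of the sketch; it would simply be an extra sampling step before `record_change` returns True.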
congratulations bot
I suspect this idea isn't even remotely feasible, but I thought I'd suggest it in case I was wrong. A bot that posts a note on a user's talk page when they reach a milestone edit count (1k, 5k, whatever). It'd say congrats and maybe have a time and link for the thousandth edit. Vicarious 05:26, 1 January 2007 (UTC)
- Probably not realistically possible. It would be too much of a strain looking through all users' contributions and counting them. We currently have 3,140,639 registered users, and loading Special:Contributions 3,140,639 times would just be a killer, and continuing to do that over time would just be even worse. —Mets501 (talk) 06:37, 1 January 2007 (UTC)
- Users could opt-in by posting their count somewhere, or on irc, and the bot could watch the irc RC channel and just count edits. While it would work, is there enough of a point? ST47Talk 19:48, 2 January 2007 (UTC)
- It sounds a bit counter productive actually. Except for making the distinction between a new editor and a regular editor there isn't much value in an edit count... and focusing on edit count has negative impact. ---J.S 00:04, 3 January 2007 (UTC)
- I think you've missed the point a little. This isn't about telling editors their worth, it's a tiny pat on the back. I enjoy seeing the odometer on my car roll over to an even 10,000 even though it has no significance; this was supposed to be similarly cute and lighthearted. Accordingly, because it would be difficult, it's not worth it. Vicarious 05:33, 3 January 2007 (UTC)
- I think that by having a bot to do this, we'd be giving legitimacy to making edit count matter, which many people feel it does not. ^demon 01:24, 5 January 2007 (UTC)
MessageBot
I suspect this idea has been thought of before, but I don't see its fruit, so here goes. When I first discovered the talk page here I couldn't for the life of me understand why Wikipedia couldn't have a normal message box interface, even if it need be public. This simply means showing the thread of an exchange on different talk pages. It would save us having to keep an eye out for a reply on the page where we left a message the day before, etc. A bot can simply thread a talk exchange; of course, this would require tagging our talk as we reply to a message. This is more a navigational issue, but since it's not been integrated into the main wiki OS it seems to be left for a bot. I don't know how it would run though. Suggestions? frummer 17:57, 1 January 2007 (UTC)
- Why not a TalkBot? User A posts a message on User B's talk page. User B responds on his/her own talk page, and this posting invokes (somehow) the TalkBot (maybe a template in the section, like {{TalkBot}}?). The TalkBot determines that the original posting from A isn't on A's talk page, and so (a) copies the section heading on B's talk page to A's talk page as a new section; (b) adds You wrote: and the text of A's posting, to that page, and (c) copies B's response on B's page to A's talk page (all with proper indentation). John Broughton
- It gets trickier if A responds on A's talk page and isn't a subscriber to the TalkBot service, but perhaps the bot could insert a hidden comment in the heading of the new section on A's talk page, such as <--- Section serviced by TalkBot --->, and then watch for that textstring in the data stream?
- That's a good clarification. frummer 14:15, 2 January 2007 (UTC)
- Here is an idea... Why not have a bot that can automatically move the conversation to a (new) sub-page and then include the conversation into the talk page. Anyone else who wants the conversation on their talk page can include the conversation as well. A new template can be made to "trigger" the bot. Hmmm ---J.S 19:21, 2 January 2007 (UTC)
The Oregon Trail (computer game) anti-vandal bot
I was wondering if it would be possible to create a bot that would serve solely to revert the addition of a link to the Oregon Trail article. Once every other week or so a user adds a link for a free game download that we delete from the article. The bot would just have to monitor the External links section, removing the link: http://www.spesw.com/pc/educational_games/games_n_r/oregon_trail_deluxe.html Oregon Trail Deluxe download whenever it appears. Thanks, please let me know on my talk page if this is a possibility that anyone could take up. Thanks again, b_cubed 17:00, 2 January 2007 (UTC)
- You might want to see WP:SPAM. Blacklisting the link might be an option...
- If it's not, you might want to contact the user who runs User:AntiVandalBot to have that added to the list of things it watches for. ---J.S 19:09, 2 January 2007 (UTC)
- I've added the link to Shadowbot's spam blacklist. Thanks for the link! Shadow1 (talk) 21:48, 2 January 2007 (UTC)
- No, thank you :) b_cubed 21:58, 2 January 2007 (UTC)
DarknessBot
May someone please operate this for me? It's already been userpaged, accounted, and flagged. D•a•r•k•n•e•s•s•L•o•r•d•i•a•n•••CCD••• 22:12, 2 January 2007 (UTC)
- Operate it for you? As in execute? -- Jmax- 14:14, 3 January 2007 (UTC)
- Yes, you can change the name even, but give me a little credit for creating it before my bot malfunctioned. :( D•a•r•k•n•e•s•s•L•o•r•d•i•a•n•••CCD••• 00:44, 4 January 2007 (UTC)
- Why can't you operate it? -- Jmax- 02:46, 4 January 2007 (UTC)
Children Page Protection Bot
I would like to suggest the creation of a bot to defend articles for children's shows. For some reason these pages appeal to vandals, and I think something needs to help protect them. I'll use an example: before the Dora the Explorer page was put back under protection it was vandalized a lot. The time that sticks in my mind the most was by a user named Oddanimals, who stated Dora was 47 and had a sex change, along with a few other sex-related comments, and replaced the word Banana in Boots' article with the S curse word. This is not proper, to say the least, and one of the users I talked to said that the Backyardigans article is also vandalized a lot. Parents, kids, and people like me who just enjoy those shows look them up, and this kind of thing should NOT be allowed. Thank you. Superx 23:18, 2 January 2007 (UTC)
- Bots are already watching those pages... but bots are dumb and can't catch all types of vandalism. ---J.S 00:00, 3 January 2007 (UTC)
True, but those bots are checking other pages as well. That vandalism stuck out like a sore thumb and none of those bots caught it, except for one after I fixed it myself, and I think that just one bot whose job it is to check those pages would be better than several others that are checking a bunch of other pages as well. Superx 01:10, 3 January 2007 (UTC)
Yes, but that would only apply here if the stuff I mentioned ACTUALLY HAD SOMETHING TO DO WITH THE SHOW! Curse words and other such stuff are only allowed if relevant to the article, and none of that is, so the point you mentioned doesn't apply in this situation. Superx 12:00, 5 January 2007 (UTC)
Finishing a template migration
Need to migrate all the existing transclusions of {{CopyrightedFreeUse}} to {{PD-release}} per discussion here. BetacommandBot started on this a few weeks ago and then mysteriously quit about 7/8ths of the way through, and I haven't been able to get a response from Betacommand since then. Could someone else finish this so that we can finally delete that template? Thanks. Kaldari 01:27, 3 January 2007 (UTC)
- Alphachimpbot is on it. alphachimp. 01:35, 3 January 2007 (UTC)
- All done. alphachimp. 08:25, 3 January 2007 (UTC)
American television series by decade cleanup
Cleaning up from this category move: Misplaced Pages:Categories for deletion/Log/2006 December 19#American television series by decade, where the meaning of the category was changed. There should be no overlap with Category:Anime by date of first release, because by the English definition no US-originated series that we know of is anime.
I'd like a bot to re-categorize with the following rule: If article in Category:Anime series and in Category:XXXXs American television series then remove from Category:XXXXs American television series and add to Category:Anime of the XXXXs instead. (The latter category includes both films and series.) --GunnarRene 05:37, 3 January 2007 (UTC)
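The rule above could be sketched as a text transformation on a page's wikitext. This is a minimal sketch, assuming both categories appear directly in the page source as ordinary `[[Category:...]]` links (not added through a template or inherited from a subcategory); the function name is hypothetical and no bot framework is involved.

```python
import re

# Matches e.g. [[Category:1990s American television series]] with an optional sort key.
DECADE_CAT = re.compile(
    r"\[\[Category:(\d{4}s) American television series(\|[^\]]*)?\]\]"
)
ANIME_SERIES_CAT = re.compile(r"\[\[Category:Anime series(\|[^\]]*)?\]\]")


def recategorize(wikitext):
    """Swap the decade category for 'Anime of the XXXXs' on anime series pages."""
    if not ANIME_SERIES_CAT.search(wikitext):
        # Rule only applies to pages that are also in Category:Anime series.
        return wikitext

    def replacement(match):
        decade, sort_key = match.group(1), match.group(2) or ""
        return "[[Category:Anime of the %s%s]]" % (decade, sort_key)

    return DECADE_CAT.sub(replacement, wikitext)


if __name__ == "__main__":
    sample = (
        "Some article text.\n"
        "[[Category:Anime series]]\n"
        "[[Category:1990s American television series]]\n"
    )
    print(recategorize(sample))
```

Pages that are not in Category:Anime series are returned unchanged, so the function is safe to apply to every member of the decade categories.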
Popes interwiki
Please add ro interwiki to all popes pages. Just created, Romihaitza 12:31, 3 January 2007 (UTC)
WikiProject France Bot
We need to add the {{WikiProject France}} template to all the articles belonging to Category:France and its subcategories. It would be nice if someone could do it for us or tell me how to do it. STTW (talk) 09:45, 4 January 2007 (UTC)
- I can do this, please put a list here of the categories and indicate whether subcategories should be included. ST47Talk 11:15, 4 January 2007 (UTC)
- Category:France and all its subcategories, thanks in advance. STTW (talk) 15:21, 4 January 2007 (UTC)
- 4 levels deep, categories with France or French in the name only, 23313 hits, converted to talk, prepending template, skipping if it contains {{WikiProject France ST47Talk 20:22, 5 January 2007 (UTC)
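The run described above (walk the category tree to a limited depth, keep only categories with France or French in the name, then prepend the template to each talk page unless it is already there) might look something like the sketch below. The category tree is passed in as a plain dictionary, which is an assumption for illustration; both function names are hypothetical, and the exact meaning of "4 levels deep" is a guess.

```python
from collections import deque


def collect_categories(subcats, root, max_depth=4, keywords=("France", "French")):
    """Breadth-first walk of a category tree, limited by depth and by name."""
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not descend below the depth limit
        for child in subcats.get(category, []):
            if child in seen:
                continue  # category graphs can contain loops
            if not any(word in child for word in keywords):
                continue  # skip categories without France/French in the name
            seen.add(child)
            queue.append((child, depth + 1))
    return seen


def tag_talk_page(talk_text, template="{{WikiProject France}}"):
    """Prepend the project template unless the page already contains it."""
    marker = template.rstrip("}")  # "{{WikiProject France", also matches parameters
    if marker.lower() in talk_text.lower():
        return talk_text
    return template + "\n" + talk_text


if __name__ == "__main__":
    tree = {"France": ["French cuisine", "Paris"], "French cuisine": ["French cheeses"]}
    print(sorted(collect_categories(tree, "France")))
    print(tag_talk_page("Existing discussion."))
```

Matching on the opening `{{WikiProject France` rather than the full template means a talk page that already carries the banner with parameters (for example a class rating) is left alone.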
hot bot action for test wiki
Can someone please go over to http://test.wikipedia.org and with a bot populate Category:Really big category with anything, it doesn't matter what. Just dump every page and every image into the category please to test how the category system works when it is pushed to its limit. Testing man 22:53, 4 January 2007 (UTC)
- I started, but then noticed that even if I categorized every single page on the wiki, that only comes to around 600, and we already have categories far larger than that. I can't see that you'd get a very useful stress-test when the wiki is so small ^_^ ShakingSpirit 06:30, 5 January 2007 (UTC)
Page-protecting syso-bot
People usually do a good job of protecting the templates on the Main Page, but some have slipped through the cracks and the results can be disastrous. I propose a bot that would be given sysop status. I know this is controversial, and there was a big discussion about a similar request at the AFD page a while back. As such, anyone allowed to know the password must have already been approved for adminship through conventional means, and it should be open-source. It will protect the next day's templates in advance of them being on the Main Page (say, 24 hours) and then unprotect them afterwards. Preferably, it would make sure the pages stay protected until off the Main Page, and even be able to work with the pictures for POTD, but they'd have to be specified in advance, whereas the templates would run on the {{CURRENTDAY}} magic word system. This would be a big help in reducing the possibility of Main Page vandalism (believe me, it happens).--HereToHelp 03:52, 5 January 2007 (UTC)
- see Misplaced Pages:Bots/Requests for approval/ProtectionBot Betacommand 05:54, 5 January 2007 (UTC)
- Oh. I feel stupid now.--HereToHelp 03:30, 6 January 2007 (UTC)
deletion bot
I have the feeling I'm gonna get yelled at for this one, but how about a bot that deletes articles that have a clear consensus on Misplaced Pages:Articles for deletion. For example, it's quite obvious that Misplaced Pages:Articles for deletion/Myspacephobia is going to get deleted, but it's currently waiting for an admin to do the work. Yes, I know this would mean an admin bot, but that's not without precedent. Also, this bot would ONLY work on articles with a very obvious consensus. As for vandals abusing the bot, I don't think it would be an issue. First off it'd ignore IPs, secondly it'd have a minimum amount of time for voting, and there are too many legitimate voters to contest a bad faith deletion for the bot to touch it. Btw, this bot would also close candidates that are clearly keep as well. Vicarious 07:39, 5 January 2007 (UTC)
- Absolutely not. AFD is not a vote, it's a discussion to achieve consensus. A bot will never be in a position to properly determine whether or not consensus is achieved. I'd strongly oppose both the creation and sysop status of such an account. (Coincidentally, from a purely technical angle, such a bot would probably not be difficult to create...) alphachimp 07:45, 5 January 2007 (UTC)
- Although I understand your position, I'm not sure I agree with your argument. I agree that a computer couldn't tell who was winning a debate, but it could if both people were arguing the same side. Similarly this bot couldn't decide what consensus was concluded in an opposed discussion, but I don't see why it couldn't take advantage of the fact that everyone is on the same side of the discussion and that the consensus has already been reached. Vicarious 07:58, 5 January 2007 (UTC)
- So you're proposing that we break deletion debates down into purely mechanical decisions? There's a clear difference between achieving consensus and simply "counting the votes". Administrators use discretion to evaluate the weight and strength of the arguments presented, making a decision based not only on those facts that they have surmised, but also on the strength of those arguments. It's quite possible to achieve "no consensus" even with an overwhelming "vote" for deletion. alphachimp 08:08, 5 January 2007 (UTC)
- But is it possible to achieve no consensus with 10 votes to delete and 0 to keep? Vicarious 08:14, 5 January 2007 (UTC)
- Absolutely, because AFD is not a vote. It's possible that the arguments could be entirely baseless, and all of the "votes" placed afterwards could be founded on those arguments. alphachimp 08:16, 5 January 2007 (UTC)
- I understand that it's a discussion not a vote, but I would be astonished if that scenario had happened ever, let alone with any frequency. I confess I don't spend a lot of time on WP:AFD but I've spent a little and I find your argument specious. In fact, I think if that were to happen then even the admin that came along to close the debate would likely miss the same fallacy that the other 10 editors had. Vicarious 08:26, 5 January 2007 (UTC)
- I'd certainly hope not, but that's a possibility. It's still a lot more comforting to leave such important decisions up to human judgment. alphachimp 08:34, 5 January 2007 (UTC)
- What would be nice is an auto AfD relisting bot that relists articles w/ less than say 5 comments on it -- 64.180.84.87 09:41, 5 January 2007 (UTC)
musical artist template
{{Infobox musical artist 2
->
{{Infobox musical artist
86.201.106.176 13:23, 5 January 2007 (UTC)
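The rename requested above could be done with a single regular expression over each page's wikitext. A minimal sketch follows, assuming the template name is written exactly as shown (same capitalization, spaces rather than underscores); the function name is hypothetical.

```python
import re

# Match the old template name only when it is the whole name, i.e. followed
# (after optional whitespace) by a parameter separator or the closing braces.
OLD_INFOBOX = re.compile(r"(\{\{\s*)Infobox musical artist 2(?=\s*[|}])")


def rename_infobox(wikitext):
    """Replace {{Infobox musical artist 2 with {{Infobox musical artist."""
    return OLD_INFOBOX.sub(r"\g<1>Infobox musical artist", wikitext)


if __name__ == "__main__":
    page = "{{Infobox musical artist 2\n| Name = Example\n}}\nArticle text."
    print(rename_infobox(page))
```

The lookahead keeps whitespace and line breaks after the name untouched, so the layout of the infobox parameters is preserved.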
Images on commons with different name
I do not have any programming skills for writing bots. I can operate a bot if someone writes the code to replace image links in articles with the existing image on Commons that has a different name. I suppose this type of bot could be useful on wikis other than the English Wikipedia as well. Many examples of such images can be found in this category. Shyam 19:52, 5 January 2007 (UTC)
Tagging closed FAC nominations
I proposed this to Raul654 on his talk page, but he'd rather not add it to his workload, though he supported using a bot instead.
I'd like a bot to watch the Featured log (for successful noms) and Featured archive (for failed noms) and automatically tag each one with a line that indicates when they were closed (i.e. added to the archive) and the result. That way, it'll be possible to determine from the page itself what happened.
I'm thinking it should add
Promoted ~~~~~
or
Not Promoted ~~~~~
at the bottom of each, in line with WP:FPC. Night Gyr (talk/Oy) 20:59, 5 January 2007 (UTC)
- Nice to see someone working on this idea. Why not have it archive the page like in the XFDs? That way, just looking at it tells you if it is done or not, and the summary is at the top. The Placebo Effect 21:01, 5 January 2007 (UTC)
- I figured we should have a more consistent FxC style independent of xFD style. Those big boxes and colored backgrounds make sense if the pages are still going to be transcluded along side live debates, like xFD, but it's a lot of excess formatting to add when people are less likely to be confused. Night Gyr (talk/Oy) 21:07, 5 January 2007 (UTC)
- Personally, I think we should add a template at the top that says what day the article passed or failed and mention that the debate is closed. The Placebo Effect 21:10, 5 January 2007 (UTC)
Yeah, top or bottom isn't really a big issue for me, and top (immediately below the section head) is probably better for quick reference. FPC uses {{FPCresult}}, so it needn't be a complicated template. Night Gyr (talk/Oy) 21:15, 5 January 2007 (UTC)
- I've thought about this before. A bot could run about once a day and do a number of tasks:
- Check the promotion and non-promotion logs for any updates from Raul654
- Add a note to the candidate sub page indicating promotion or not
- Remove the page from WP:FAC if it is still there (this might simplify one of Raul654's mundane tasks)
- Update the {{fac}} on the article talk page for non-promotions
- This could even be done in a way that would make future fac submissions easy, eliminating quite a bit of needless work the other FA clerks currently handle
- Possibly verify/update wikiproject assessments on the article talk page for promoted FAs
- Has anyone else set up an account to develop a bot along this line yet? Gimmetrow 04:10, 6 January 2007 (UTC)
Would like to have a similar bot do the same (in reverse) for Featured article review; rather than Promoted or Not Promoted, the bot would return Kept or Removed Featured status, based on the Featured article review archive. SandyGeorgia (Talk) 05:53, 7 January 2007 (UTC)
Misplaced Pages:Translation
Hello, we are finishing putting in place a new translation project.
There are two things I would greatly appreciate having done by a bot.
First, we had to make a small modification to the format of the translation pages which are used for every translation request. The task is: for every page in Category:Translation sub-pages version 1, this kind of change needs to be done.
Second, there are a lot of categories to initialize with a very simple wikicode, 7 for each language, and there are 50 languages. All red links of the array on Misplaced Pages:Translation/*/Lang (except the first column of the array, which has a different syntax) should be initialized with the syntax explained on this page.
Let me know if you need any further info.
Jmfayard 18:46, 6 January 2007 (UTC)
Abandoned Article bot
There is now a project dealing with articles which have not been modified or viewed recently at Misplaced Pages:WikiProject Abandoned Articles. Would there be any way to generate a bot which might list only articles which haven't been modified since, say, 2005 (or some other really long time, maybe by year), for the use of this project to help find the most overlooked articles? Badbilltucker 20:35, 6 January 2007 (UTC)
- See Special:Ancientpages. —Mets501 (talk) 22:23, 6 January 2007 (UTC)
- Ancientpages has two problems. First, it only lists the oldest 1000 articles (that is, 1000 articles with the oldest "most recent edit"). That, of course, is plenty to work on for any project. But the second problem is that at least 90% of the 1000 articles are disambiguation pages - not what members of the project are really interested in. An ideal bot would be able to screen out disambiguation pages from its results.
- Alternatively, I guess, a special page (database listing) similar to ancientpages, but excluding disambiguation pages, would suffice. John Broughton | Talk 02:17, 7 January 2007 (UTC)
I'm in the process of importing a database dump and I'll gather these statistics for you. To be clear, you want a list of pages with the oldest most recent edit that are in the main namespace and are not disambiguation pages; correct? -- Jmax- 07:33, 7 January 2007 (UTC)
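The selection criteria discussed above (main namespace only, no disambiguation pages, sorted by oldest most recent edit) can be sketched over records already extracted from a dump. The record layout used here (`title`, `namespace`, `is_disambiguation`, `last_edit`) is an assumption made for illustration, not the actual dump schema, and how a page is recognized as a disambiguation page is left to whatever produces the records.

```python
from datetime import datetime


def oldest_untouched(pages, limit=1000, before=None):
    """Return titles of main-namespace, non-disambiguation pages, oldest edit first."""
    candidates = [
        page for page in pages
        if page["namespace"] == 0                 # 0 is the main (article) namespace
        and not page["is_disambiguation"]
        and (before is None or page["last_edit"] < before)
    ]
    candidates.sort(key=lambda page: page["last_edit"])
    return [page["title"] for page in candidates[:limit]]


if __name__ == "__main__":
    sample = [
        {"title": "A", "namespace": 0, "is_disambiguation": False,
         "last_edit": datetime(2005, 3, 1)},
        {"title": "B (disambiguation)", "namespace": 0, "is_disambiguation": True,
         "last_edit": datetime(2004, 1, 1)},
        {"title": "Talk:C", "namespace": 1, "is_disambiguation": False,
         "last_edit": datetime(2003, 1, 1)},
        {"title": "D", "namespace": 0, "is_disambiguation": False,
         "last_edit": datetime(2004, 6, 1)},
    ]
    print(oldest_untouched(sample, before=datetime(2005, 1, 1)))
```

The optional `before` argument covers the "not modified since, say, 2005" variant of the request, while `limit` mirrors the 1000-entry cap of Special:Ancientpages.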