This is an old revision of this page, as edited by Jmax- (talk | contribs) at 02:46, 4 January 2007 (→[]). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
This is a page for requesting work to be done by a bot. This is an appropriate place to simply put ideas for bots. If you need a piece of software written for a specific article you may get a faster response time at the computer help desk. You might also check Wikipedia:Bots to see if the bot you are looking for already exists. Please add your bot requests to the bottom of this page.
If you are a bot operator and you complete a request, note what you did, and archive it. Requests that are no longer relevant should also be archived in a timely fashion.
This talk page is automatically archived by Werdnabot. Any sections older than 14 days are automatically archived to Wikipedia:Bot requests/Archive 8. Sections without timestamps are not archived.
AfD alert bot
Given that my proposal for an additional step to the AfD process (found here) is meeting both opposition and the suggestion that the job could be better done by a bot, I've brought that proposal here. The suggestion is a reasonably simple one:
- Once or twice a day (preferably the latter), the bot would scan through the list of AfD nominations at WP:AFD/T.
- For each nomination, it would check the history to find the creator and creation date of the article;
- If the article is older than four months, it is ignored and the bot continues to scan the other nominations.
- If it is younger than four months, the process continues.
- The bot then moves to the User Talk page of the article's creator, and checks that it does not already contain an instance of {{AFDWarning}} or {{AFDNote}} for that article.
- If there is none, it places {{subst:AFDNote|ArticleName}} -- ~~~~ at the bottom of the page. No new section is required, since the template creates its own.
- Finally, it returns to WP:AFD/T and continues.
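The checks above could be sketched roughly like this. This is a hedged sketch only: the function names are invented here, and the literal template-text check is a simplification, since a real bot would have to match the text that {{subst:AFDNote}} expands into on save rather than the raw template call.

```python
# Sketch of the decision step described in the list above.
AGE_LIMIT_DAYS = 4 * 30  # "younger than four months"

def should_notify(article_age_days, creator_talk_text, article_name):
    """Return True if the bot should leave a note on the creator's talk
    page, per the rules proposed above."""
    if article_age_days >= AGE_LIMIT_DAYS:
        return False  # old article: ignore and continue scanning
    # Skip if the talk page already carries either template for this article.
    # (Simplified: substituted templates won't appear literally like this.)
    for template in ("AFDWarning", "AFDNote"):
        if "{{%s|%s}}" % (template, article_name) in creator_talk_text:
            return False
    return True

def notice_wikitext(article_name):
    # Appended to the bottom of the talk page; the template supplies
    # its own section heading, so no new section is needed.
    return "{{subst:AFDNote|%s}} -- ~~~~" % article_name
```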
This would avoid the bureaucracy that is the major criticism of my original proposal, and (hopefully) significantly reduce the problems of biting that I raised there. Thanks! Daveydweeb (/review!) 01:11, 27 November 2006 (UTC)
- I should note that I would be happy to manually run this bot once or twice daily, if it were not fully automated. Daveydweeb (/review!) 01:34, 27 November 2006 (UTC)
- If there is currently no system in place to notify users that articles they started are nominated for deletion then I think this is a great idea. I am unable to offer you any help right now due to other commitments but if you need help coding this in a week or so I could help. You can start the approvals request process without actually having any code written, which will tell you if it is a good idea or whether to give it up. - PocklingtonDan 19:19, 12 December 2006 (UTC)
- Support - this sounds like a really good idea, and I was wondering if any progress has been made. I can write this bot from scratch if necessary, just let me know. Jayden54 18:18, 22 December 2006 (UTC)
- I have listed this request on Requests for approval so if have any comments or suggestions, please list them there. Cheers, Jayden54 20:01, 26 December 2006 (UTC)
deadlink removal
I'm using the Weblinkchecker.py bot and have a whole load of bad links. Is there a way to have a bot remove them from the articles? (I realize that this could be hard, since we have refs and links.) One output looks like:
- http://www.zbi.ee/fungal-genomesize/index.php
- In Animal Genome Size Database on Tue Sep 12 01:04:24 2006, Socket Error: (10054, 'Connection reset by peer')
- In C-value on Wed Nov 29 14:41:33 2006, Socket Error: (10060, 'Operation timed out')
ST47Talk 22:13, 5 December 2006 (UTC)
- I was looking at replace.py for this, and, well, it's ugly. Every link would need me to run the bot another time, with different parameters. I'm thinking a .BAT file with each replacement.
- RegExes I will place here for development purposes
- ''opening tag''*''additional text, like {{cite''%link]*''More text before the end''|</ref>]''end tag''
- To
- Tested in AWB, didn't work. Any other ideas? ST47Talk 19:22, 6 December 2006 (UTC)
- Eagle_101, king of regular expressions, says:
- (<ref>.*?url=\s*|\)
- I don't think simply removing dead links is a good idea at all. The link was presumably added for a reason - because it held good content. Because of web caching services such as Google, Alexa, WebCite etc., this information may still be available even though the original link is dead. I would not want a bot simply removing the dead link without giving people a chance to manually update it or find a cached copy of the linked page - PocklingtonDan 14:53, 18 December 2006 (UTC)
- Definitely NOT a good idea, though an understandable desire: Wikipedia:Dead external links specifically says that dead links are NOT to be removed.
- On the other hand, TAGGING such dead links or otherwise marking them could be a GREAT idea - then other editors would know that a problem existed when they read the article with the bad link in it. A similar concept is being discussed at Wikipedia talk:Disambiguation pages with links; it has been suggested that a template be put immediately after the bad link; the template would display the problem (as does, for example, {{fact}}), and could contain a link ("more"; "help", whatever) that a user could click on to get to an instruction page that would discuss possible ways to fix the bad external link. John Broughton | Talk 00:54, 28 December 2006 (UTC)
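The tagging approach could be sketched like this. Note the template name "deadlink" is only a placeholder here (no specific template was agreed on in this thread), and the regex is illustrative rather than production-ready:

```python
import re

def tag_dead_link(wikitext, dead_url, marker="{{deadlink}}"):
    """Insert a marker template immediately after each occurrence of a
    known-dead URL in the wikitext, skipping already-tagged links."""
    escaped = re.escape(dead_url)
    # Avoid double-tagging if the marker already follows the URL
    if re.search(escaped + re.escape(marker), wikitext):
        return wikitext
    return re.sub(escaped, dead_url + marker, wikitext)
```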
Dead link bot?
Is it possible to make a bot that checks to see if links are dead? The Placebo Effect 01:38, 14 December 2006 (UTC)
- Yes, but it would be quite complicated. The bot would have to ensure that it wasn't a problem with your connection, and that it wasn't just a temporary server outage. —Mets501 (talk) 01:46, 14 December 2006 (UTC)
- It's possible, if distributed or tested on multiple hosts. --Jmax- 06:35, 14 December 2006 (UTC)
- This would actually be quite a good idea, I think. As above, probably best to have clone code running on two servers, and also to check each link twice, 48-72 hours apart, to ignore temporary network problems. Such a bot would be made redundant if the proposed WebCite was ever implemented, but until then I think it would be useful if, after finding a dead link, it posted a warning notice to the article's talk page perhaps? Sounds like a good idea. - PocklingtonDan 14:49, 18 December 2006 (UTC)
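The "check twice, 48-72 hours apart" rule could be encoded as something like the following sketch. The data shape (a list of timestamped probe results) and the function name are assumptions made for illustration:

```python
def link_is_dead(check_results, min_gap_hours=48):
    """check_results: list of (timestamp_hours, http_ok) tuples from
    earlier probes of the same URL. A link counts as dead only if it
    failed at least twice, far enough apart to rule out a brief outage."""
    failures = [t for (t, ok) in check_results if not ok]
    if len(failures) < 2:
        return False  # a single failure could be a transient problem
    return max(failures) - min(failures) >= min_gap_hours
```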
- I could code something like this in Perl, using POE, if needed, but I wouldn't be able to run it long-term. --Jmax- 14:52, 18 December 2006 (UTC)
- The best way to do this is probably check the links and then post what the bots summary of the links are on the talkpage. I can program in java but i have no clue how to make a bot that checks websites and evaluates them. The Placebo Effect 02:36, 21 December 2006 (UTC)
- The pywikipedia framework contains a script for doing just this, which I'll happily tweak for this purpose and leave running on my server. The only "problem" I can see is how it would grab the links to be checked, as grabbing every external link, even from an API or database dump, would take a fair while (I don't even want to imagine how many external links there are in total on Wikipedia). I'd still be willing to do this, but it's going to be a long project, not just an overnight AWB run! ShakingSpirit 03:19, 21 December 2006 (UTC)
- I was assuming it would check all the external links in an article, then post on the article's talkpage. The Placebo Effect 03:25, 21 December 2006 (UTC)
- Yup, should be easy to do that, my point was that going through every single article from A-Z checking links will take a fair amount of time, and isn't too 'friendly' to the server ^_^ ShakingSpirit 03:31, 21 December 2006 (UTC)
- I realize my request is probably unreasonable, but I just had the thought that perhaps after finding a deadlink the bot could find a link to a cached version (on google or the wayback machine or somewhere) and link to that instead. Vicarious 15:25, 26 December 2006 (UTC)
- Finding a cached dead link on an Internet archive such as WebCite is easy - the syntax is http://www.webcitation.org/query.php?url=deadlink (or http://www.webcitation.org/query.php?url=deadlink&date=date for a certain cached date). However, the bot would never know which version the author meant to cite - in the case of dynamically changing websites that's a problem. That's why I made a proposal some time ago to prospectively archive (cache) all cited URLs on Wikipedia, which is ridiculously easy using WebCite. Writing a bot which prospectively adds a "cached version" link to all cited links in new articles (thereby eliminating the problem of broken links in the first place) would make much more sense than just detecting broken links. I also proposed a policy change on citing sources suggesting that authors should add links to cached versions to their links as much as possible - but a bot would help to make this a quasi-standard. --Eysen 18:08, 26 December 2006 (UTC)
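Using the query syntax given above, building the lookup URL is a one-liner; percent-encoding the dead link is assumed here so that any `?` or `&` inside it doesn't break the query string:

```python
from urllib.parse import quote

def webcite_query(dead_url, date=None):
    """Build a WebCite lookup URL for a dead link, per the syntax above;
    date (optional) selects a particular cached snapshot."""
    base = "http://www.webcitation.org/query.php?url=" + quote(dead_url, safe="")
    if date:
        base += "&date=" + date
    return base
```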
- Couldn't the bot check the page history for when the link was added and assume that is the version to use? Vicarious 23:02, 26 December 2006 (UTC)
- To the best of my knowledge, there's no easy way to check when the link was added short of going though each edit in the page history and scraping it; a solution which is ugly, and wastes both the bot user's bandwidth and the server. I have, however, come up with another idea ^_^ ShakingSpirit 00:38, 27 December 2006 (UTC)
- EDIT: I was wrong; you can grab the page history in a bandwidth- and parsing-friendly manner. Still, personally I don't think every dead link should be automatically replaced with an archived version, as sometimes the information the link contained is out of date - sometimes links go dead for a reason! I'd like to hear others' opinions ^_^ ShakingSpirit 00:44, 27 December 2006 (UTC)
I would happily code something for this. However, I have concerns regarding WMF policy on using WebCite and other proprietary methods of caching web sites. -- Jmax- 09:22, 27 December 2006 (UTC)
Please look at Wikipedia:Dead external links — Iamunknown 01:50, 29 December 2006 (UTC)
Wikipedia:WikiProject Massively multiplayer online games Banner Bot
I was wondering if someone with a bot would be able to place banners for Wikipedia:WikiProject Massively multiplayer online games on all talk pages (including non-existent ones) in the category Category:Massively multiplayer online games, including its subcategories. This need not be a recurring event, but it is necessary to get this WikiProject up and running. Any help is appreciated! Greeves 04:31, 19 December 2006 (UTC)
- I should be able to start within the next 24 hours —The preceding unsigned comment was added by Betacommand (talk • contribs) 05:39, 19 December 2006 (UTC).
- Not to be rude, but when will you be starting? It has almost been a week. If you cannot do it, would there be any other bot owners willing to help? By the way, the tag to place on the talk pages is {{WP_MMOG}}. Thanks in advance and have a merry Christmas! Greeves 00:00, 25 December 2006 (UTC)
- He started today (my watchlist is now full of "Tagging for {{WP_MMOG}}" ^_^) ShakingSpirit 00:41, 25 December 2006 (UTC)
- Great! Thanks Betacommand! Greeves 17:10, 25 December 2006 (UTC)
- Not a problem just spread the word about my bot and the availability of it :) Betacommand 01:14, 26 December 2006 (UTC)
Punctuation Bot
Hey, how about a bot that will put all the commas and periods (all punctuation except semicolons, in fact) inside quotation marks; it looks quite unprofessional to see articles written with punctuation outside quotations. - Unsigned comment added by User:165.82.156.110
- Not quite sure what you are proposing. Perhaps you could provide a sample of correct and incorrect punctuation within the context of quotations? - PocklingtonDan 21:17, 20 December 2006 (UTC)
- I think he's referring to punctuation within quotations, which is not really an English "rule", more a matter of style. See Wikipedia:Manual of Style#Quotation marks. There's no real way to automate this, and there's no real reason to, in my opinion -- Jmax- 21:22, 20 December 2006 (UTC)
- Does he mean just the trailing full stop/period? If so, then it should always go outside the closing quotation mark, but he seems to suggest you should never use punctuation within quotation marks, which I don't understand - quotation marks are used to quote somebody. If the words you are quoting would reasonably be punctuated when written in prose, then that punctuation is included, regardless of whether it is inside quotation marks. The following is perfectly valid punctuation, regardless of style:
- My friend said to me "The cat sat on the mat, then it bit me. I don't think it likes me".
- And what about a single word in quotation marks, followed by a comma - the comma obviously shouldn't go inside the quotation marks. John Broughton | Talk 01:17, 28 December 2006 (UTC)
- Wow. I was taught in school that the trailing punctuation should go inside the quotations. For example, 'That bastard at the movie said "shhh!" Can you believe it?' (even with the assumption that "shhh" wasn't being exclaimed.) Then again, maybe I wasn't paying good enough attention in class. ---J.S 00:12, 29 December 2006 (UTC)
- Having the punctuation always inside quotation marks is the American style; British English puts punctuation inside if it belongs to the quotation, and otherwise outside. (http://en.wikipedia.org/British_and_american_english_differences#Punctuation) Therefore I don't think this bot would be a good idea, since it only represents the American usage. —The preceding unsigned comment was added by CJHung (talk • contribs) 03:16, 1 January 2007 (UTC).
WikiProject Strategy Games bot
Like MMORPG project, we need a bot to add some tags for our banner. The banner is Template:SGames or {{SGames}} and is located here. We have a list of categories to put them in, and they are
- Category:Real-time strategy computer games
- Category:Turn-based strategy computer games
- Category:Age of Discovery computer and video games
- Category:Free, open source strategy games
- Category:Panhistorical computer and video games
- Category:Abstract strategy games
- Category:Chess variants
- Category:Tic-tac-toe
- Category:Strategy computer games
- Category:Real-time tactical computer games
- Category:Economic simulation games
- Category:Strategy game stubs
- Category:City building games
- Category:God games
If someone could make a bot or teach us how, that would be great. Thanks, Clyde (talk) 02:00, 21 December 2006 (UTC)
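The tagging step itself is simple; this sketch shows the per-page edit (fetching and saving pages is left out, and placing the banner at the very top of the talk page is an assumption about the convention):

```python
BANNER = "{{SGames}}"  # the banner template named in the request

def add_banner(talk_wikitext):
    """Prepend the project banner to a talk page's wikitext unless it
    is already present. Non-existent talk pages start as ""."""
    if BANNER in talk_wikitext:
        return talk_wikitext
    return BANNER + "\n" + talk_wikitext if talk_wikitext else BANNER + "\n"
```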
- I can do that for you Saturday - to confirm, you'd like {{SGames}} added to all articles in those categories? If there are subcategories, should I include them? ST47Talk 02:07, 22 December 2006 (UTC)
- Um, yes, the template is correct, and I found some unneeded subcategories, though I'm still going through them. I found that Category:Virtual toys doesn't fit. I also removed Category:Chess and Category:Strategy (they have too much non-relevant info), so if you find those as subcategories, don't add them. Actually, are you more experienced with this? Would it be better just not to add to any subcategories, and I'll personally add them later? What's your call?--Clyde (talk) 05:30, 23 December 2006 (UTC)
- Sorry for the delay, comcast decided to stab my internet. I'll start that now, without subcategories just in case. ST47Talk 15:35, 27 December 2006 (UTC)
- Okay thanks.--Clyde (talk) 15:53, 28 December 2006 (UTC)
{{Permprot}} application
A bot would be needed to carry out this task following a modification to {{Permprot}}:
- Look at linked pages in the Template: namespace.
- If the page uses {{/doc}}, add doc=yes as a parameter to {{Permprot}}.
Circeus 21:14, 23 December 2006 (UTC)
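The requested edit could look something like this sketch. It takes the template page's text and the corresponding talk page's text separately, since the {{/doc}} check and the {{Permprot}} edit happen on different pages; the regex is a simplification of real template parsing:

```python
import re

def add_doc_param(template_text, talk_text):
    """If the template page transcludes {{/doc}}, add doc=yes to the
    {{Permprot}} header on its talk page; otherwise leave it alone."""
    if "{{/doc}}" not in template_text:
        return talk_text
    if re.search(r"\{\{\s*[Pp]ermprot[^}]*doc\s*=", talk_text):
        return talk_text  # parameter already present
    return re.sub(r"\{\{\s*([Pp]ermprot)\s*\}\}", r"{{\1|doc=yes}}", talk_text)
```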
- Basically no bots have admin rights, so this has to be done by hand. —Mets501 (talk) 02:25, 26 December 2006 (UTC)
- {{Permprot}} is a talk page header. Circeus 02:31, 26 December 2006 (UTC)
- can I get a few links to what you are talking about (examples)? Betacommand 02:47, 26 December 2006 (UTC)
- I think he means pages like Template:POV. —Mets501 (talk) 03:03, 26 December 2006 (UTC)
- No. Stuff like template talk:Cite news. However, the problem is actually that template:permprot is not too widely used, and maybe I should make similar edits to Template:Protected template instead, which will then make the bot request far more useful, by making it clear that edits to the documentation (quite frequent) can be made directly. Circeus 03:16, 26 December 2006 (UTC)
- I think he means pages like Template:POV. —Mets501 (talk) 03:03, 26 December 2006 (UTC)
Category counts
I'm trying to find out how many articles and categories ultimately descend from Category:Dungeons & Dragons and how many of these are stubs (both by categorization and by byte/word count). Lists would be good if possible. For comparison's sake, I'm also seeking similar numbers for Category:Chess. This is for the following purposes:
- See how much of Wikipedia as a whole is given over to D&D. (It seems to me D&D exposure is much higher among Wikipedians than the general population.)
- Determine whether sweeping mergers are called for, into titles one level less specific (e.g. Dwarven deities in the World of Greyhawk rather than the individual deities).
NeonMerlin 23:56, 24 December 2006 (UTC)
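The counting part of this request amounts to a walk over the category graph. Here is a sketch with the category data passed in as plain dicts (in reality these would come from category-membership queries); the cycle check matters because category trees are not strictly trees:

```python
def count_tree(root, subcats, articles, stubs):
    """subcats: cat -> list of child categories; articles: cat -> list
    of article titles; stubs: set of titles categorized as stubs.
    Returns (category count, unique article count, stub count)."""
    seen_cats, seen_articles = set(), set()
    stack = [root]
    while stack:
        cat = stack.pop()
        if cat in seen_cats:
            continue  # category graphs can contain cycles
        seen_cats.add(cat)
        seen_articles.update(articles.get(cat, []))
        stack.extend(subcats.get(cat, []))
    return len(seen_cats), len(seen_articles), len(seen_articles & stubs)
```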
- There are 1041 articles in categories branching from Category:Chess. In order to collect the statistics you desire, I would have to request each of those pages and perform a character count. I'll speak to someone from the Bot Approvals Group and see if they'll let me perform this, or what I must do in order to. -- Jmax- 21:30, 25 December 2006 (UTC)
- PockBot would give you a list of all articles in category, as well as their article class (eg stub) - PocklingtonDan 16:28, 27 December 2006 (UTC)
Neutrality template categories
Happy Holidays and Happy New Year to everyone. I'm curious if anyone knows of any bots working the neutrality template categories. I would like to know what percentage of articles have neutrality-related tags by WikiProject and have a report generated, with a template updated on the project page (Pearle produced a similar report listing articles needing cleanup). After the report is generated, the template on the project page could be updated with a percentage linking to the category of WikiProject-related neutrality issues. Something like, "12% of articles require attention for neutrality-related issues." WikiProject departments would deal with this. The bot would only need to be run once a week. Thanks. —Viriditas | Talk 03:33, 25 December 2006 (UTC)
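The report line itself would be trivial once the counts are in hand; this sketch uses the exact wording from the example above, with rounding as an assumption:

```python
def neutrality_report(tagged, total):
    """Format the per-WikiProject summary line, e.g. '12% of articles
    require attention for neutrality-related issues.'"""
    pct = round(100 * tagged / total) if total else 0
    return "%d%% of articles require attention for neutrality-related issues." % pct
```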
template replacement
Hello
Wikipedia:WikiProject Business and Economics has a new template, and the old one needs to be replaced with the new one.
Remove Old:
This article is part of WikiProject Business and Economics. This means that the WikiProject has identified it as related to Business and Economics. WikiProject Business and Economics is an attempt to improve, grow and organize Wikipedia's articles related to Business and Economics. We need all your help, so join in today!
Replace with new : Template:Bus&Econ
Here is a sample page with the old template : Talk:Brian Gelber
Here is a sample page with the new template: Talk:David Tepper
Thank you
Trade2tradewell 13:56, 26 December 2006 (UTC)
- I could only find 7 with the subst:ed template as above, I have fixed those. Rich Farmbrough, 00:01 28 December 2006 (GMT).
Image bot idea
Hello. I have had a bot idea. Would it be possible for a bot to scan through every image that comes under Category:Non-free image copyright tags - of which there are tens of thousands - and flag all those that are over a certain file size / set of dimensions / resolution? This is because fair use only allows for a low-resolution image, and I have spotted a veritable crapload that are nowhere near being low resolution.
A very clever bot could then automatically downscale and overwrite the image with a lower res version, and automatically leave a note on the talk page of the original uploader.
Is this even technically feasible? Proto::► 13:19, 27 December 2006 (UTC)
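The flagging test itself would be a simple threshold check along these lines. The cutoffs here are placeholders, since "low resolution" has no agreed-on definition:

```python
MAX_PIXELS = 0.1e6       # placeholder: roughly a 316x316 image
MAX_BYTES = 200 * 1024   # placeholder file-size cutoff

def needs_flag(width, height, size_bytes):
    """Flag an image as too large for a fair-use claim if it exceeds
    either the pixel-count or file-size threshold."""
    return width * height > MAX_PIXELS or size_bytes > MAX_BYTES
```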
- It's possible, but I don't think it's feasible. Low resolution is quite subjective (relative to the original image, that is). Then again, I guess a 2 meg image is likely not "low resolution." ---J.S 04:49, 29 December 2006 (UTC)
Anyone willing to analyse some bot pseudocode?
I'm building a research (ie, no edit) bot in C++... since I'm not really that experienced in programming I was wondering if someone would be willing to check my pseudocode?
The basic concept behind the bot is to identify when a particular string of text was added to an article using a binary search method. In theory it could search through the history of a page with 10,000 edits with fewer than 15 page requests.
A research program like this will be a helpful tool in tracking down subtle vandals and spammers. So.. I've kinda drifted. Anyone more experienced with OOP languages want to audit my pseudocode? ---J.S 23:59, 28 December 2006 (UTC)
- Here's the link... User:J.smith/pseudocode. ---J.S 00:06, 29 December 2006 (UTC)
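One interpretation of the binary-search idea (a sketch, not the linked pseudocode itself): treat the revisions as an ordered list, assume the string is present in the newest one, and bisect. Note this finds *a* revision where the string first appears within the bracketed range, not necessarily the first time it was ever added; fetching real revisions is left out.

```python
def find_insertion(revisions, target):
    """revisions: list of page texts, oldest to newest; target must be
    present in revisions[-1]. Returns the index of a revision where the
    string appears but is absent from the one just before it. Roughly
    log2(n) lookups, so ~14 for a 10,000-edit history."""
    lo, hi = 0, len(revisions) - 1  # invariant: target in revisions[hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if target in revisions[mid]:
            hi = mid
        else:
            lo = mid + 1
    return hi
```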
- I've written a perl interpretation of your pseudocode, but am having trouble understanding precisely the context of that block. How will it be used? Is that the 'main' function? Where is the return value used? -- Jmax- 08:41, 29 December 2006 (UTC)
- I'm not certain, but I believe the idea is that the user provides the wikipedia page and a string that's in the current version of the article. The function returns the diff of when that string was added. So I would say that no, this wouldn't be main; this would probably be 'search'. Vicarious 09:26, 29 December 2006 (UTC)
- Does it recurse? Where should it recurse, if it does? -- Jmax- 09:31, 29 December 2006 (UTC)
- No I don't think so, Main would take the user's input, run the search function then either link to or redirect the user to diff page. Vicarious 09:34, 29 December 2006 (UTC)
Here is a perl implementation, less the essential bits (which could easily be added). I'm not entirely sure if the algorithm will even work properly, actually. Something seems off about it. -- Jmax- 10:09, 29 December 2006 (UTC)
- Well, it does have limitations. If the text was added in and taken out multiple times it won't necessarily find the -first- time the string was added, but it will find one of the times the string was added. There are a number of elements I haven't designed yet so the code is incomplete. ---J.S 17:38, 29 December 2006 (UTC)
- The basic idea here is that the user would input the name of the article to search and the string of text they were looking for, and then the program would output a link to the first version of the page with that particular string. ---J.S 17:43, 29 December 2006 (UTC)
- As was hinted at above, a binary search skips over many alterations. A binary search will find one alteration where the string appeared, but it might not be the first time the string appeared. The bot might look at versions 128, 64, 32, 48, 56, 60, 58, 59 and identify version 59 as having the string while 58 does not. But the string might have been inserted in version 34 and deleted in version 35, as well as several other times. (SEWilco 05:45, 30 December 2006 (UTC))
- Yes, but even that can be useful information when tracking stuff down...
- It occurs to me, this might be useful for tracking down an unsigned post on a talk-page when the date is completely unknown. Hmmm... ---J.S 05:48, 30 December 2006 (UTC)
- It at least can help in many situations. You wanted comments on the method, and now you know some of the limitations. If you really want to find the first insertion of a string you could examine the article-with-history format which is used in data dumps. (SEWilco 16:00, 30 December 2006 (UTC))
- That could be done, but a db dump is quite huge :( Maybe I should chat with the toolserver people on that when they get replication up and running? ---J.S 09:35, 31 December 2006 (UTC)
- Is the full-with-history available through Export? (SEWilco 15:03, 31 December 2006 (UTC))
- Help:Export says the full history for a page is available, but at bottom of page is a note that it has been disabled for performance reasons. If the history was available you'd have a single file where you'd just have to recognize the version header (and a few others such as Talk page) and by remembering the earliest version with the desired text be able to find the version in a single read of one file. At present that's only relevant if you search a mirror with export history enabled. (SEWilco 06:39, 3 January 2007 (UTC))
- That could be done, but a db dumb is quite huge:( Maybe I should chat with the toolserver people on that when they get replication up and running? ---J.S 09:35, 31 December 2006 (UTC)
- It at least can help in many situations. You wanted comments on the method, and now you know some of the limitations. If you really want to find the first insertion of a string you could examine the article-with-history format which is used in data dumps. (SEWilco 16:00, 30 December 2006 (UTC))
- Although I don't think this is as big an issue as you guys do, I have a relatively elegant solution to the finding-the-first-insertion problem. Run the exact same search again on only the preceding versions. Have it include a case where if it never finds the string, it'll let the first search know it found the right one. This method won't work if the string has been absent from most of the versions, but by far the most common reason the original search won't work is that it'll find page blankings and attribute the sentence to the person who reverts it; this solution solves that problem. Vicarious 01:08, 1 January 2007 (UTC)
- That's a brilliant solution! I'll certainly include a function for this. ---J.S 19:06, 2 January 2007 (UTC)
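The refinement could look like this sketch: after one binary-search pass finds an insertion point, re-run the search on only the earlier revisions, stopping when the string is absent from that whole prefix. One caveat: the absence check here is written as a linear scan for clarity, which in a real bot would cost one fetch per revision, so the saving over a naive scan depends on how it's implemented.

```python
def _binary_pass(revs, target):
    # One binary-search pass; assumes target is in revs[-1].
    lo, hi = 0, len(revs) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if target in revs[mid]:
            hi = mid
        else:
            lo = mid + 1
    return hi

def find_first_insertion(revisions, target):
    """Repeat the binary search on earlier revisions until no revision
    before the current hit contains the string (handles the
    blanking-and-revert case described above)."""
    hit = _binary_pass(revisions, target)
    while True:
        earlier = [i for i in range(hit) if target in revisions[i]]
        if not earlier:
            return hit  # nothing before this hit: it's the first insertion
        hit = _binary_pass(revisions[:earlier[-1] + 1], target)
```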
Ad-stopping bot
In theory, the bot will look through new articles to try and find key phrases like "our products" and "we are a". It then places a template on the page like this:
AdBot suspects this page of being blatant advertising, otherwise known as spam.
Please check this page conforms to the neutral point-of-view policy before nominating for speedy deletion, deleting or removing this template. |
And places it in a relevant category. A human (or other intelligent individual) would then look through the list and nominate any articles that are blatant ads for WP:SPEEDY.
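The phrase-matching step might be sketched as follows, using phrases suggested in this thread; the exact list and the one-match threshold are illustrative choices, not a settled design:

```python
AD_PHRASES = ["our products", "we are a", "our company",
              "visit our website", "we provide"]

def looks_like_ad(article_text, threshold=1):
    """Return (flag, matched phrases) for a new article's text; a human
    would still review everything the bot flags."""
    text = article_text.lower()
    hits = [p for p in AD_PHRASES if p in text]
    return len(hits) >= threshold, hits
```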
What do you think? --///Jrothwell /// 13:15, 29 December 2006 (UTC)
- Sure, why not? But "suspects that this page contains" rather than "suspects this page of being" might be a little more neutral, as well as more grammatically correct. And it would be an ad-flagging bot, not an ad-stopping bot. (I'm quibbling, I know.) John Broughton | Talk 15:13, 29 December 2006 (UTC)
- Sounds good. There might be some changes in implementation (EG. That flag might cause concern), but I've found phrases like those to be dead giveaways to both commercial intent and notorious copyright infringements.
- It's a good idea. Other phrases you could search for include "our company", "visit our website/site/home page", and "we provide". You might also want to add "fixing the article" to the list of suggested options. Proto::► 01:51, 31 December 2006 (UTC)
- I've altered the template slightly to fit in with everyone's suggestions. Here's the revised template:
AdBot suspects this article contains blatant advertising, otherwise known as spam.
If the subject of the article complies with the Misplaced Pages notability guidelines, please fix the article if it doesn't conform to the neutral point-of-view policy. If the subject is not notable, please nominate the article for speedy deletion. |
- I'm also making a template for user pages of people whose pages have been flagged. Any other thoughts? --///Jrothwell /// 16:06, 31 December 2006 (UTC)
- The user-talk template is at User:Jrothwell/Templates/Adbot-note. Is there anyone who'd be willing to code the bot? --///Jrothwell /// 17:13, 31 December 2006 (UTC)
- Sounds like a great idea for a bot. If you haven't found anyone yet, I'd be willing to code it. Best, Hagerman 19:10, 31 December 2006 (UTC)
- Shouldn't it be "If the article does not assert the notability of the subject, please nominate the article for speedy deletion." or "If the subject is not notable, please nominate the article for deletion."? Please see WP:CSD#A7. --WikiSlasher 04:23, 1 January 2007 (UTC)
- I don't know if this is a good idea. Googling "we are a" gives mostly legitimate pages where the phrase appears in a quotation. I think at the least there should be a human signing off on each flagging. --24.193.107.205 06:10, 2 January 2007 (UTC)
(undent) The issue of false positives is important. Certainly if a large majority of flaggings related to a particular phrase are in fact in error, that phrase shouldn't be used by the bot. But keep in mind that this flagging will only be used for new articles, which are much more likely to be spam than existing ones, so drawing conclusions from your search of existing articles isn't necessarily a good idea.
In any case, the bot should be tested by seeing what happens using a given phrase for (say) the first ten articles it finds. For example, our products looks like a good phrase to use. A google search on that found Enterprise Engineering Center (user who created article has done nothing else), plus several others (in the top 10 results) that were tagged as appearing to be advertisements.
Finally, the bot is only doing flagging. A human has to actually nominate an article for deletion (and it's easy to remove a template). But your comment does raise a point about there being a link to click on to complain about the bot. John Broughton | Talk 15:40, 2 January 2007 (UTC)
- It strikes me that the Bayesian approach commonly used to detect e-mail spam could work here as well. All we'd need (besides a simple matter of programming) is a way to train the bot. I suppose, if the bot is watching the RC feed anyway, that deletion of a tagged article could be seen as a confirmation that it was spam, while removal of the tag could be taken as a sign that it was not (until and unless the article is deleted after all). But there would still need to be a manual training interface, if only for the initial training before the bot is started. —Ilmari Karonen (talk) 03:23, 3 January 2007 (UTC)
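A minimal version of that Bayesian approach, with Laplace smoothing and log probabilities, could look like the sketch below. The training labels are the ones proposed above (deletion of a tagged article confirms "spam", tag removal suggests "ham"); everything else is an illustrative assumption.

```python
import math
from collections import Counter

class SpamFilter:
    """Minimal Bayesian text classifier sketching the training idea in
    the thread: deleted tagged articles count as spam, untag-and-keep
    counts as ham. Uses Laplace smoothing and log probabilities."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, label, text):
        self.counts[label].update(text.lower().split())
        self.totals[label] += 1

    def score(self, label, words):
        """Log P(label) + sum of log P(word | label), smoothed."""
        total_words = sum(self.counts[label].values())
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        logp = math.log(self.totals[label] / sum(self.totals.values()))
        for w in words:
            logp += math.log((self.counts[label][w] + 1) /
                             (total_words + vocab))
        return logp

    def is_spam(self, text):
        words = text.lower().split()
        return self.score("spam", words) > self.score("ham", words)
```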
I suggest this template:
AdBot suspects that this article (or parts of it) contains blatant advertising, otherwise known as spam.
If the subject of the article complies with the Misplaced Pages notability guidelines, please fix the article if it doesn't conform to the neutral point-of-view policy. If the subject is not notable, please nominate the article for speedy deletion. |
Cocoaguycontribs 03:42, 3 January 2007 (UTC)
forced autoarchive
I know there's already autoarchive bots running such as the one archiving this page, but I think a bot that operates a little differently could be effectively used to archive all article talkpages. First off, it would only archive talk pages that are very long, so 3 year old comments on a tiny talk page would be left untouched. When the bot runs across a very long talk page it will archive similarly to current bots, but with a high threshold, for example all sections older than 28 days (rather than the typical 7 days). Also, unlike current bots I'd suggest we make this opt out rather than opt in, although very busy talk pages or talk pages that are manually archived wouldn't be touched anyway because they'd either be short enough or would have no inactive sections. Vicarious 03:56, 1 January 2007 (UTC)
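As a sketch of the proposed rule: only pages over a size cutoff are touched, and only sections whose newest signature timestamp is older than 28 days are archived. The 50,000-byte "very long" cutoff is an assumed value, and timestamps are parsed from standard UTC signatures.

```python
import re
from datetime import datetime, timedelta

THRESHOLD_BYTES = 50_000          # assumed "very long" cutoff
MAX_AGE = timedelta(days=28)
# Matches signature timestamps like "01:08, 1 January 2007 (UTC)"
STAMP = re.compile(r"(\d{2}:\d{2}), (\d{1,2}) (\w+) (\d{4}) \(UTC\)")

def sections_to_archive(page_text, now):
    """Return headings of == sections == whose newest timestamp is older
    than MAX_AGE, but only when the page itself is long enough."""
    if len(page_text.encode("utf-8")) < THRESHOLD_BYTES:
        return []                 # short pages are never touched
    stale = []
    parts = re.split(r"^(== .+? ==)\s*$", page_text, flags=re.M)
    # parts = [lead, heading1, body1, heading2, body2, ...]
    for heading, body in zip(parts[1::2], parts[2::2]):
        stamps = [datetime.strptime(" ".join(m.groups()), "%H:%M %d %B %Y")
                  for m in STAMP.finditer(body)]
        if stamps and now - max(stamps) > MAX_AGE:
            stale.append(heading.strip("= "))
    return stale
```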
- If there is interest in this and someone can code it up and get it approved, I'll volunteer to host it and run it under the EssjayBot banner. Essjay (Talk) 03:58, 1 January 2007 (UTC)
- Werdnabot is customizable as to how many days of no replies in a section before it archives - If the only feature you want is to only archive after a certain page length is reached, wouldn't it just be easier to put in a feature request to Werdna, rather than re-inventing the wheel? ShakingSpirit 04:04, 1 January 2007 (UTC)
- Unless I've missed something, Werdnabot is like the EssjayBots, it's an opt-in. Archiving all article talk pages on Misplaced Pages would require a bit more than just a new feature; it's going to have several hundred thousand talk pages to parse, it's going to need a lot more efficient code than the current opt-in code. Essjay (Talk) 04:08, 1 January 2007 (UTC)
- My apologies, I missed that part. Though that does bring up a new point - wouldn't this cause very unnecessary stress on the servers? Crawling through every single article talk page on Wikipedia must be very bandwidth-intensive, even using Special:Export or an API, or some such. It would also create a huge number of new pages - again, putting strain on the database server. Does the small convenience of having a shorter talk page to look through justify this? Maybe I'm playing devil's advocate, but I'm sure this has been debated before and wasn't found to be such a good idea ^_^ ShakingSpirit
- Well, it'll be a strain on the server that hosts it, parsing all those pages. However, if it's done right, it will only archive pages of a certain length, which should avoid most of the one or two line talk pages that are out there, thus reducing any server load. At this point, with 6,930,644 articles, 62,143,410 pages, and tens of thousands of edits a minute, one little bot archiving pages (and set on a delay, to avoid any problems) is hardly likely to bring the site down. As long as it is given a reasonable delay time on its editing, it should be fine. The real problem will be getting the community signed on to the idea. Essjay (Talk) 04:28, 1 January 2007 (UTC)
- As for bandwidth, I don't think it would be an issue. First off, it could run once on a database dump to get the ball rolling, then patrol recent changes looking only at "talk:" changes. If it still seems like it could hog bandwidth, I can think of many more ways to cut down the number of pages it checks. First, ignore any pages that just had characters removed instead of added. Second, only check every third page (or so); this operates on the premise that big talk pages get big because they're edited often, so a page will pop up again soon if it's going to need archiving. Third, the bot could store a local hash table of page lengths: rather than loading the page each time, it would add (or subtract) the number of characters listed on Special:Recentchanges and would only load the page if it needs archiving. This wouldn't be as hard on a bot as it sounds; the storage space would only be a few megs, because all it needs is the page's hash and size. The computation would be easy too, because it would hash the page name rather than search for it, so the lookup time is O(1) and the calculations are all quite simple. Vicarious 04:34, 1 January 2007 (UTC)
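Vicarious's third idea, keeping a local table of page sizes updated from Special:Recentchanges deltas, might look like this sketch. The 50,000-byte threshold and the `fetch_page`/`archive` callbacks are hypothetical; in Python a dict plays the role of the hash table, giving the same O(1) lookup.

```python
page_sizes = {}   # Talk: page title -> approximate byte length

def on_recent_change(title, delta, fetch_page, archive):
    """Handle one Special:Recentchanges entry for a talk page.

    `delta` is the byte change shown by RC. The full page text is
    fetched only when the running total crosses the threshold, so
    almost all changes cost nothing beyond a dict update.
    """
    if delta <= 0:
        return                               # removals never trigger archiving
    page_sizes[title] = page_sizes.get(title, 0) + delta
    if page_sizes[title] >= 50_000:          # assumed "very long" threshold
        text = fetch_page(title)             # the one expensive page load
        page_sizes[title] = len(text)        # correct any accumulated drift
        if page_sizes[title] >= 50_000:
            archive(title)
```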
- Ok, the archive bots are great and seem to work really well... but forced archiving of one particular style on a project-wide scale? I'd so rather we keep the opt-in system and some active "recruiting" of large talk pages. ---J.S 19:04, 2 January 2007 (UTC)
congratulations bot
I suspect this idea isn't even remotely feasible, but I thought I'd suggest it in case I was wrong. A bot that posts a note on a user's talk page when they reach a milestone edit count (1k, 5k, whatever). It'd say congrats and maybe have a time and link for the thousandth edit. Vicarious 05:26, 1 January 2007 (UTC)
- Probably not realistically possible. It would be too much of a strain looking through all users' contributions and counting them. We currently have 3,140,639 registered users, and loading Special:Contributions 3,140,639 times would just be a killer, and continuing to do that over time would just be even worse. —Mets501 (talk) 06:37, 1 January 2007 (UTC)
- Users could opt-in by posting their count somewhere, or on irc, and the bot could watch the irc RC channel and just count edits. While it would work, is there enough of a point? ST47Talk 19:48, 2 January 2007 (UTC)
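ST47's opt-in counter could be sketched like so. The milestone values and the `post_congrats` callback are illustrative assumptions; counts would be seeded from whatever figure each user posts when opting in.

```python
MILESTONES = {1_000, 5_000, 10_000, 50_000, 100_000}
edit_counts = {}   # username -> running count, seeded when a user opts in

def on_edit(username, post_congrats):
    """Count one edit seen on the IRC recent-changes feed.

    Users who never opted in are ignored entirely, so the bot never has
    to load Special:Contributions. When an opted-in user crosses a
    milestone, post_congrats leaves the note on their talk page.
    """
    if username not in edit_counts:
        return                      # not opted in
    edit_counts[username] += 1
    if edit_counts[username] in MILESTONES:
        post_congrats(username, edit_counts[username])
```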
- It sounds a bit counter productive actually. Except for making the distinction between a new editor and a regular editor there isn't much value in an edit count... and focusing on edit count has negative impact. ---J.S 00:04, 3 January 2007 (UTC)
- I think you've missed the point a little. This isn't about telling editors their worth, it's a tiny pat on the back. I enjoy seeing the odometer on my car roll over to an even 10,000 even though it has no significance; this was supposed to be similarly cute and lighthearted. Accordingly, since it would be difficult, it's not worth it. Vicarious 05:33, 3 January 2007 (UTC)
MessageBot
I suspect this idea has been thought of before, but I don't see its fruit, so here goes. When I first discovered the talk page here, I couldn't for the life of me understand why Wikipedia couldn't have a normal message-box interface, even if it had to be public. This simply means showing the thread of an exchange across different talk pages. It would save us having to keep an eye out for a reply on the page where we left a message the day before, etc. A bot could simply thread a talk exchange; of course, this would require tagging our talk as we reply to a message. This is more a navigational issue, but since it's not been integrated into the main wiki software, it seems to be left to a bot. I don't know how it would run, though. Suggestions? frummer 17:57, 1 January 2007 (UTC)
- Why not a TalkBot? User A posts a message on User B's talk page. User B responds on his/her own talk page, and this posting invokes (somehow) the TalkBot (maybe a template in the section, like {{TalkBot}}?). The TalkBot determines that the original posting from A isn't on A's talk page, and so (a) copies the section heading on B's talk page to A's talk page as a new section; (b) adds You wrote: and the text of A's posting to that page; and (c) copies B's response on B's page to A's talk page (all with proper indentation). John Broughton
- It gets trickier if A responds on A's talk page and isn't a subscriber to the TalkBot service, but perhaps the bot could insert a hidden comment in the heading of the new section on A's talk page, such as <!-- Section serviced by TalkBot -->, and then watch for that text string in the data stream?
- That's a good clarification. frummer 14:15, 2 January 2007 (UTC)
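John Broughton's TalkBot mechanism, building the mirrored section for user A's talk page, might be sketched as follows. The function and its arguments are hypothetical; only the marker comment and the "You wrote:" layout come from the thread.

```python
def indent(text, level):
    """Prefix every line with wiki-style ':' indentation."""
    prefix = ":" * level
    return "\n".join(prefix + line for line in text.splitlines())

def mirror_reply(section_heading, original_post, reply, replier):
    """Build the text the (hypothetical) TalkBot would append to user
    A's talk page after user B replies on B's own page. The hidden HTML
    comment marks the section so the bot can find it again later."""
    marker = "<!-- Section serviced by TalkBot -->"
    return (f"== {section_heading} == {marker}\n"
            f"You wrote:\n{indent(original_post, 1)}\n"
            f"{indent(reply, 2)} --{replier}\n")
```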
- Here is an idea... Why not have a bot that can automatically move the conversation to a (new) sub-page and then include the conversation into the talk page? Anyone else who wants the conversation on their talk page can include the conversation as well. A new template can be made to "trigger" the bot. Hmmm ---J.S 19:21, 2 January 2007 (UTC)
The Oregon Trail (computer game) anti-vandal bot
I was wondering if it would be possible to create a bot that would serve solely to revert the addition of a link to the Oregon Trail article. Once every other week or so, a user adds a link to a free game download, which we then delete from the article. The bot would just have to monitor the External links section, removing the link http://www.spesw.com/pc/educational_games/games_n_r/oregon_trail_deluxe.html (Oregon Trail Deluxe download) whenever it appears. Thanks, please let me know on my talk page if this is a possibility that anyone could take up. Thanks again, b_cubed 17:00, 2 January 2007 (UTC)
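For what it's worth, the removal itself is a one-line regular-expression substitution. This sketch (hypothetical, not any existing bot's code) strips the reported link along with its list bullet and caption:

```python
import re

# The specific external link reported in this request
BAD_LINK = re.compile(
    r"\*?\s*\[?https?://www\.spesw\.com/pc/educational_games/"
    r"games_n_r/oregon_trail_deluxe\.html[^\]\n]*\]?[ \t]*\n?"
)

def strip_bad_link(wikitext):
    """Remove any occurrence of the blacklisted download link,
    including its bullet and link caption, leaving other links alone."""
    return BAD_LINK.sub("", wikitext)
```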
- You might want to see WP:SPAM. Blacklisting the link might be an option...
- If it's not, you might want to contact the user who runs User:AntiVandalBot to have that added to the list of things it watches for. ---J.S 19:09, 2 January 2007 (UTC)
- I've added the link to Shadowbot's spam blacklist. Thanks for the link! Shadow1 (talk) 21:48, 2 January 2007 (UTC)
- No, thank you :) b_cubed 21:58, 2 January 2007 (UTC)
DarknessBot
May someone please operate this for me? It's already been userpaged, accounted, and flagged. D•a•r•k•n•e•s•s•L•o•r•d•i•a•n•••CCD••• 22:12, 2 January 2007 (UTC)
- Operate it for you? As in execute? -- Jmax- 14:14, 3 January 2007 (UTC)
- Yes, you can change the name even, but give me a little credit for creating it before my bot malfunctioned. :( D•a•r•k•n•e•s•s•L•o•r•d•i•a•n•••CCD••• 00:44, 4 January 2007 (UTC)
- Why can't you operate it? -- Jmax- 02:46, 4 January 2007 (UTC)
Children Page Protection Bot
I would like to suggest the creation of a bot to defend articles about children's shows. For some reason these pages appeal to vandals, and I think something needs to help protect them. I'll use an example: before the Dora the Explorer page was put back under protection, it was vandalized a lot. One time sticks in my mind the most: a user named Oddanimals stated that Dora was 47 and had had a sex change, along with a few other sex-related comments, and replaced the word "banana" in Boots's article with the S curse word. This is not proper, to say the least, and one of the users I talked to said that the Backyardigans article is also vandalized a lot. Parents, kids, and people, like me, who just enjoy those shows look them up, and this kind of thing should NOT be allowed. Thank You Superx 23:18, 2 January 2007 (UTC)
- Bots are already watching those pages... but bots are dumb and can't catch all types of vandalism. ---J.S 00:00, 3 January 2007 (UTC)
True, but those bots are checking other pages as well. That vandalism stuck out like a sore thumb, and none of those bots caught it except for one, after I had fixed it myself. I think that one bot whose job is to check just those pages would be better than several others that are checking a bunch of other pages as well. Superx 01:10, 3 January 2007 (UTC)
Finishing a template migration
Need to migrate all the existing transclusions of {{CopyrightedFreeUse}} to {{PD-release}} per discussion here. BetacommandBot started on this a few weeks ago and then mysteriously quit about 7/8ths of the way through and I haven't been able to get a response from Betacommand since then. Could someone else finish this so that we can finally delete that template. Thanks. Kaldari 01:27, 3 January 2007 (UTC)
- Alphachimpbot is on it. alphachimp. 01:35, 3 January 2007 (UTC)
- All done. alphachimp. 08:25, 3 January 2007 (UTC)
American television series by decade cleanup
Cleaning up from this category move: Misplaced Pages:Categories for deletion/Log/2006 December 19#American television series by decade, where the meaning of the category was changed. There should be no overlap with Category:Anime by date of first release, because by the English definition no US-originated series that we know of is anime.
I'd like a bot to re-categorize with the following rule: If article in Category:Anime series and in Category:XXXXs American television series then remove from Category:XXXXs American television series and add to Category:Anime of the XXXXs instead. (The latter category includes both films and series.) --GunnarRene 05:37, 3 January 2007 (UTC)
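The requested rule could be sketched as a simple text transformation over an article's wikitext. The decade range and the handling of category sort keys are assumptions; a real run would go through the usual bot-approval process.

```python
import re

def fix_anime_decade_cats(text, decades=range(1900, 2010, 10)):
    """Apply the CfD rule: if an article is in Category:Anime series and
    in Category:XXXXs American television series, replace the latter
    with Category:Anime of the XXXXs (which covers films and series)."""
    if "[[Category:Anime series" not in text:
        return text                  # rule only applies to anime articles
    for d in decades:
        old = re.compile(
            r"\[\[Category:%ds American television series(\|[^\]]*)?\]\]\n?" % d)
        if old.search(text):
            text = old.sub("", text)
            text = text.rstrip("\n") + "\n[[Category:Anime of the %ds]]\n" % d
    return text
```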
Popes interwiki
Please add the ro interwiki to all popes' pages. Just created, Romihaitza 12:31, 3 January 2007 (UTC)