Wikipedia policy discourages the use of bots. Bots are automatic processes interacting with Wikipedia over the World Wide Web. Before designing and implementing any bot on Wikipedia, please read the guidelines below.
We almost always prefer to rely on human input for editing, and only carefully designed bots are allowed. While bots are capable of doing a lot of work, they strain the system's ability to keep up, both technically and intellectually. Some bots can be used to add to or generate articles, while others can edit or even destroy them: see types of bots and history of Wikipedia bots. Well-designed bots can provide concrete benefits to the Wikipedia project, but even good bots have some drawbacks.
Current policy on running bots
Before running a bot, you must get approval on Wikipedia talk:Bots. State there precisely what the bot will do. Get a rough consensus on the talk page that it is a good idea. Wait a week to see if there are any objections, and if there aren't, go ahead and run it for a short period so it can be monitored. After this period, you should ask that the user be marked as a bot at m:requests for permissions.
- Sysops should block bots without hesitation if they are unapproved, doing something the operator didn't say they would do, messing up articles, or editing too rapidly.
- New bots should run without a bot flag so people can check what they are doing.
- Bots should wait 30-60 seconds between edits until it is accepted that the bot is OK; even after a steward has marked them as a bot, they should wait at least 10 seconds between edits (a rough sketch of this throttling follows this list).
- There should be no (unattended) spelling-fixing bots; it is quite simply not technically possible to create such a bot that will not make incorrect changes. If the big office-software companies can't make a perfect automated spellchecker, you most likely can't either.
- During the debugging phase, the operator should be at, or logged into, the machine the bot is running on so the bot can be terminated if necessary; otherwise it is liable to be blocked without notice.
- If you are planning to use a "spider", recursive wget, or similar software to get a local copy of Wikipedia, please download the database dumps instead.
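As an illustration of the throttling rule above, here is a minimal sketch in Python (the language used by most of the bots listed below). The edit_page callable and the delay constants are hypothetical placeholders, not part of any actual bot framework.

    import time

    # Illustrative edit loop with the delays required by this policy:
    # 30-60 seconds between edits for a new, unflagged bot, and at least
    # 10 seconds between edits once a steward has granted the bot flag.
    TRIAL_DELAY = 60       # conservative end of the 30-60 second range
    FLAGGED_DELAY = 10     # minimum delay after the bot flag is granted

    def run_bot(pages, edit_page, has_bot_flag=False):
        """Edit each page in turn, sleeping between edits per the policy."""
        delay = FLAGGED_DELAY if has_bot_flag else TRIAL_DELAY
        for title in pages:
            edit_page(title)   # perform one edit (implementation not shown)
            time.sleep(delay)  # rate-limit so the servers are not strained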
The burden of proof is on the bot-maker to demonstrate the following:
- The bot is harmless
- The bot is useful
- The bot is not a server hog
- The bot has been approved
Note the reservation stated at Categorization of people against assigning categories by bot.
Benefits and drawbacks
Benefits bots can offer
- Provides a good template of pre-formatted data for contributors (see how the Newton, Massachusetts entry has been expanded; imagine if the Periodic Table were used to start the 100+ articles for the elements. Note: this has already been done)
- Potentially provides a unique resource not directly available elsewhere on the web (the small-town bot is a good example of a well-designed bot—see Ram-Man's description of the data acquisition process—uck!)
- Provides full coverage in cases where it cannot be determined in advance which subset of the data is likely to be (or become) interesting, even though any randomly chosen entry has a low probability of being interesting or useful.
- Can perform chores that might become tedious for a human, such as uploading a large series of images. The Anomebot is the first bot with this capability.
Inherent drawbacks of using bots in current system
- Adds tens of thousands of entries to Wikipedia that are unlikely to see a human edit any time soon (in fact, we could probably extrapolate the nearly exact rate at which they will get edited by seeing how many have been edited so far)
- Artificially inflates the perceived activity of Wikipedia
- Can be perceived as tilting (and possibly could tilt) the purpose of Wikipedia away from being an encyclopedia and towards being a gazetteer / Sports Trivia Reference / etc.
- Danger of abuse by "vandal-bots", or just "clueless-bots". A bot running out of control could potentially cause heavy server load or even a denial of service attack.
These pros and cons apply to bot additions in the aggregate—individual bot entries raise issues similar to those of stub entries. In fact, they're often one and the same.
Any graceful solution would provide the automatic functionality of the pros without the negative consequences of the cons.
Bots and recent changes
There have been general complaints about bots interfering with normal contributor operations, especially Special:Recentchanges.
In response to popular demand, a feature has been added to hide edits by registered bots from display in Recentchanges; see the list below for active bots. To include bot edits in Recentchanges, manually add hidebots=0 to your query string, or click "show bot edits" at the top of Recentchanges.
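For example, a Recentchanges URL that includes bot edits can be constructed as shown below; the base URL is an assumption about the wiki's script path, so adjust it as needed.

    from urllib.parse import urlencode

    # Build a Recentchanges URL with bot edits included (hidebots=0).
    base = "http://en.wikipedia.org/w/index.php"
    params = {"title": "Special:Recentchanges", "hidebots": 0}
    print(base + "?" + urlencode(params))
    # -> http://en.wikipedia.org/w/index.php?title=Special%3ARecentchanges&hidebots=0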
Currently running bots
- AngBot is a bot designed by André Engels and operated by Angela to delete articles in situations of mass page-creation vandalism. It was designed in response to the sustained vandal attacks on the Chinese Wikipedia.
- IsraBot is a bot created by AdamRaizen (in Python) for uploading stubs on Israeli cities.
- Robbot, a bot written by Rob Hooft and others (in Python), is used by Andre Engels to do disambiguation. The robot does not run independently, but requires a human to actually choose the disambiguation; it can nevertheless run a lot faster than purely human disambiguation (a rough sketch of this workflow appears after this list). Edits show up as minor edits with the link being disambiguated noted in the edit summary. It may also be used for adding or correcting interwiki links.
- Sethbot, a bot for creating appropriate redirects for American place names. Designed by André Engels, this bot is operated by Seth Illys.
- snobot is used by snoyes to upload pictures (if there are a lot to upload) and to avoid redirects (see: snoyes/fredirect for more information).
- UgenBot is ugen64's use of the pywikipediabot. It will probably be used sporadically.
- Timwi is using a bot to fix double redirects.
- D6 is using pywikipediabot for disambiguation, categories, and possibly repairing the ] references in Rambot data. This bot is operated by Docu.
- Guanabot is Guanaco running the pywikipediabot, sometimes with modifications.
- KevinBot is a custom bot written and run by Kevin Rector. It can do lots of different things. Its primary use thus far is to change the race and ethnicity links in pages entered by Rambot.
- NohatBot is various Perl scripts written by Nohat to do various things that he was too lazy to do by hand.
- CanisRufus is a bot, run by RedWolf, mainly used for disambiguation by running the solve_disambiguation.py script under pywikipediabot.
- Janna is run by User:Anthony DiPierro.
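The human-assisted disambiguation workflow described for Robbot above can be sketched roughly as follows; get_links_here and replace_link are hypothetical helpers, not the actual Robbot or pywikipediabot interfaces.

    # Sketch of human-assisted disambiguation: the bot lists pages that link
    # to an ambiguous title, a human picks the intended target, and the bot
    # rewrites the link. The two helper callables are hypothetical stand-ins.
    def disambiguate(ambiguous_title, candidates, get_links_here, replace_link):
        for page in get_links_here(ambiguous_title):
            print(page, "links to the ambiguous title", ambiguous_title)
            for number, target in enumerate(candidates):
                print(" ", number, target)
            choice = input("Choose a target number (blank to skip): ").strip()
            if choice:
                replace_link(page, ambiguous_title, candidates[int(choice)])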
Other registered bots
- The rambot was scanning and modifying all existing county articles to implement some discussed racial link changes and to automatically add links to the pictures uploaded by the Anomebot. In addition, the bot also functions as a SpellBot with human interaction. It has not run for over a year.
- User:The Anomebot has been created for automated submissions by User:The Anome. The initial intent is to upload approximately 5000 map diagrams created by User:Wapcaplet. This has now been done, and new uses are now being thought of for the Anomebot...
Software which may be useful for making bots
- Python Wikipedia Robot Framework
- WikiGateway (perhaps sometime in the future; it doesn't support MediaWiki yet)