
talk:Bot policy - Misplaced Pages

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license.

This is an old revision of this page, as edited by Ram-Man at 02:51, 26 October 2002 (move talk here).


What is the Misplaced Pages policy on automated page creation? I notice that Ram-Man is currently entering statistics for every town in the US using some sort of script (unless he's a very fast typist). I'm not sure that this is a particularly great idea. More generally, I think script-generated content could get us into a lot of trouble (how do you revert vandalism when it's spread across five thousand pages?). Is there a page that would be more suitable for this discussion? Dachshund

I think it is fine. There have been a couple of bumps in the road, but his bot's entries now appear to be correctly named, wikified and NPOVed, and they contain good, factual information. The only real policy on automated page creation is that you need to be very careful when you do it. For example, somebody started importing hundred-year-old Easton's Bible Dictionary entries via a bot, and that caused an uproar: the entries were highly POV, incorrectly named, written in pedantic Victorian prose, and incorrectly wikified (self-links, multiple links, incorrectly named edit links...). The bot's IP was temporarily blocked and we worked everything out with the bot's creator on the Misplaced Pages mailing list. The city entries don't have these problems, and they also have the bare essentials needed for any city article: population and geography. On top of this there is also demographic information. When complete, this will be a unique resource on the net. Better still, whenever somebody in the US looks up their town they will find an entry in Misplaced Pages (and hopefully they will add some historical info to the article after finding it). If an actual vandal uses a bot, then we will block that bot's IP. --mav

Three thoughts on batch page creation:
1) Special:Recentchanges is presently useless due to the town & county bot. This is the source of the irritation which led me to notice that:
2) The main page's count of Misplaced Pages articles is increasingly inflated -- we've gone from 60k to 70k awfully quickly, but:
3) These thousands of town and county pages are not encyclopedia articles, nor are the bulk of them ever likely to become same. They are atlas or gazetteer entries that have been converted to useless paragraphs rather than useful tables. The data are potentially valuable as such -- perhaps there should be a WikiAtlas? -- but they are no more encyclopedic than would be batch-added dictionary entries. --FOo
They are a bit telephone-directory-ish. I hope in future people will add colour and detail to them. It would be good, though, if bots like this went a little slower -- that was discussed before with the Easton's bot: only 100 every hour or less, please! Otherwise, as said above, RC is unusable, even with the number of edits set to 1,000. We may have added 10k articles, but we haven't really added any value. Hundreds of core topics are still uncovered or amateurishly written, and here we have a page for every one-horse town across the US. It won't project a terribly good image of wikipedia; that concerns me. -- Tarquin 20:20 Oct 21, 2002 (UTC)
I disagree that these entries are harmful. I just came across Auburn, California, which is a small city near where I live. I've been meaning to write an article about Auburn ever since I started the project in January, but never did so because finding boring but vital up-to-date population and geographic information isn't fun at all -- this is a perfect job for a bot. Since all the boring-to-find info was already there, I simply added a few external links, a history section and a short line in the intro on why this city is interesting. Granted, many small towns won't ever be updated with more than what is there now, but most towns don't have much historical significance outside their own counties. So what if they exist in our database? They have correct info and are correctly wikified and named. Having every town, city and village in our database ensures that anybody in the US who looks up their hometown via an external search engine will find that info here, which makes these entries an important reader/contributor recruitment tool. Many of the same people will then update the articles with historical and other information. Yes, the US Census has this info, but it isn't very readable or accessible, and it can't be added to or have its presentation improved. Can you think of another resource like this on the net (with 2000 data)? That said, I also agree that Recent Changes is useless while Ram-Man's bot is at work. I wish there were a back-end way to import the 20,000 remaining cities/towns/villages/places. --mav

They are not bad pages -- but it's the Mithril argument again: newcomers clicking Random Page and finding pages and pages of Middle-earth may think "encyclopedia! tolkienopedia, more like!"; finding hundreds of Jargon File pages, they may think it's just a ton of hacker slang; finding these thousands of pages, they may think it's largely an encyclopedia of US towns. I am probably overreacting a bit, but we seem to be leaning every which way but toward serious core encyclopedic subjects: arts, literature, science. There are plenty of minor novelists of past centuries we say nothing about who are more important than these towns. I'm not against these town pages, but we must balance them! -- Tarquin
Obviously I agree with having the articles, since I am the one making them. One thing I could do is mark all the changes as minor, so that those who set the option in their preferences could filter them out. They are not minor, but maybe no one cares.
Since starting to add the information I have gotten comments from a number of people. One common idea is that without the articles in some form, people don't bother to add one-line descriptions of a town because they want to avoid stub articles. A number of people have told me that now they can add some information because the articles exist. In fact, Recent Changes shows that people have been modifying their own town articles and adding miscellaneous information. Unlike Maverick, I think that with an influx of users, if many of them update their own home cities, we can add quite a bit of new information. There is also the possibility of adding other information automatically, such as latitude and longitude, county seat information, etc.
I would vote to modify the "random" option to give city, state articles a lower priority. -- Ram-Man
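Giving city/state articles "a lower priority" under Random Page amounts to weighted random sampling. The Python sketch below is illustrative only: it assumes the software can tell which titles are bot-created place articles (the hypothetical `city_pages` set) and assumes a weight of 0.1 for them. The function name and parameters are invented for this example, not part of the wiki software.

```python
import random
from typing import List, Optional, Set


def random_page(pages: List[str], city_pages: Set[str],
                city_weight: float = 0.1,
                rng: Optional[random.Random] = None) -> str:
    """Pick one title for "Random page", down-weighting city/state articles.

    Ordinary articles get weight 1.0; titles in city_pages get city_weight,
    so with the default each one is a tenth as likely to be picked as an
    ordinary article.
    """
    rng = rng or random.Random()
    weights = [city_weight if title in city_pages else 1.0 for title in pages]
    # random.choices draws k items with probability proportional to weights.
    return rng.choices(pages, weights=weights, k=1)[0]


if __name__ == "__main__":
    titles = ["Auburn, California", "Newton, Massachusetts", "Mithril"]
    towns = {"Auburn, California", "Newton, Massachusetts"}
    print(random_page(titles, towns, rng=random.Random(1)))
```

The weighting is per page: if town articles vastly outnumber everything else, they can still dominate Random Page in aggregate unless the weight is lowered further.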
You mean "like maveric" right? I was arguing for keeping the entries and allowing you to finish. --mav
Well, you think that most of these entries will never be filled out with data. You once thought that these entries would never even be created; I think I did this just because you said it couldn't be done. So while I agree with you on everything else, I don't believe that wikipedia can't grow to the point where these entries become much more complete. -- Ram-Man
There's nothing precisely wrong with the articles. In the future, perhaps automated pages could be saved on some other website as a static page, and only a link added from the Wiki page? Dachshund
I certainly don't think we should wipe them. -- Tarquin

Although I have been (and continue to be) a vocal opponent of automatic content creation and editing processes on Misplaced Pages, I think creating these articles is on balance a good thing. As Ram-Man suggests, they make good "seed" articles for people to add a sentence or two about their own town, and as long as they don't interfere with existing articles, that's good. However, I'd like to see the bot slowed down, for two reasons. First, there is a strong presumption against bots here in general; the burden of proof is on the bot-maker to demonstrate that the bot is (1) useful, (2) harmless, and (3) not a server hog. If there's any doubt about any of these, the bot should be slow enough that humans have time to find problems, report them, and get them fixed. Secondly, the "Recent Changes" page is an important part of the Misplaced Pages user experience, and the fact that it is essentially useless while the bot is running is very annoying. Slowing down to, say, a page a minute would greatly improve the usability of the system.

At any rate, I think it meets the "useful" test, and as far as I can tell from server logs, the bot isn't a major factor in server load, so that's good, but I think "harmless" should include not hogging the recent changes list, so let's keep it running, but at a leisurely pace. --LDC

It should be noted that I made a mistake that put invalid data into some 2,000 articles, and the bot repaired all of them. That is to say, if I make a stupid mistake, I will do my best to fix it. However, going slowly has an important disadvantage, as pointed out by Maverick. The orphans page, which a lot of people apparently use, is full of cities and townships. To fix these, I have to use the bot to update all the entries. At 1 modification per minute, the orphans page would be unusable for weeks or possibly months. As I have suggested, I can mark my changes as minor so that people can filter some of them out (a partial solution). Going slowly severely limits the progress I can make at fixing the various quirks introduced in all these entries, because I simply have to wait for it to finish. This should be noted!

Let's say, for instance, that I make one modification every 30 seconds. That would be about 3,000 modifications per day. Essentially, it would take me about 2 weeks to apply any change I decide to make to all the entries, such as adding latitude and longitude or fixing mistakes. -- Ram-Man
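The rates proposed in this thread can be compared with a little arithmetic. This Python sketch assumes a constant delay between edits and no downtime; the 20,000-page figure is the count of remaining places mav gives earlier, used only as an illustrative workload, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def edits_per_day(seconds_per_edit: float) -> float:
    """Edits a bot completes in one day at a fixed delay between edits."""
    return SECONDS_PER_DAY / seconds_per_edit


def days_to_touch(n_pages: int, seconds_per_edit: float) -> float:
    """Days needed to make one edit to each of n_pages at the given rate."""
    return n_pages / edits_per_day(seconds_per_edit)


# Rates mentioned in the discussion, expressed as seconds per edit.
RATES = {
    "one edit every 30 s (Ram-Man)": 30.0,
    "100 edits per hour (Tarquin)": 3600 / 100,
    "one edit per minute (LDC)": 60.0,
}

if __name__ == "__main__":
    n_pages = 20_000  # illustrative workload: the "20,000 remaining" places
    for label, delay in RATES.items():
        print(f"{label}: {edits_per_day(delay):,.0f} edits/day, "
              f"{days_to_touch(n_pages, delay):.1f} days for {n_pages:,} pages")
```

At 30 seconds per edit the bot makes 2,880 edits a day, consistent with the "about 3,000" estimate above; at one edit per minute the same job takes exactly twice as long.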

Theoretically speaking, we could set something up where the bot's modification times are fudged back a bit, so they wouldn't cover up the actual most recent changes. I don't know if that's a good idea; it's just a thought. Bring it up on the mailing list. --Brion
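One way to read the "fudged back" idea is to leave each edit's real save time alone and shift only the time used to order Recent Changes. The Python sketch below assumes bots are identified by a set of user names or IPs and assumes a one-hour offset; `Edit`, `listed_time` and `recent_changes_order` are hypothetical names, not the wiki software's code.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Set


class Edit(NamedTuple):
    page: str
    author: str          # user name or IP address
    saved_at: datetime   # when the edit was actually saved


def listed_time(edit: Edit, bots: Set[str], backdate: timedelta) -> datetime:
    """Timestamp used only for sorting: bot edits are shifted back by `backdate`."""
    return edit.saved_at - backdate if edit.author in bots else edit.saved_at


def recent_changes_order(edits: List[Edit], bots: Set[str],
                         backdate: timedelta = timedelta(hours=1)) -> List[Edit]:
    """Newest first, so human edits are not buried under a burst of bot edits."""
    return sorted(edits, key=lambda e: listed_time(e, bots, backdate), reverse=True)


if __name__ == "__main__":
    now = datetime(2002, 10, 26, 2, 51)
    edits = [
        Edit("Auburn, California", "rambot", now),
        Edit("Mithril", "Tarquin", now - timedelta(minutes=5)),
    ]
    for e in recent_changes_order(edits, {"rambot"}):
        print(e.saved_at, e.page, e.author)
```

The bot's edits still appear in the list, just lower down; a bot burst lasting longer than the offset would eventually push human edits off the page again.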
Hm. Perhaps there should be another option in our prefs where we can turn off anything submitted by a registered bot? Just give the bot's IP to the developers, and then perhaps they could mark each entry you submit with a B for bot. Displaying bot edits would be turned off by default in user preferences. But it is important that bots get registered somehow before this is allowed. --mav
I like the idea of registering bots, but when could such a feature be implemented? When might *any* solution be accomplished? -- Ram-Man
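mav's registered-bot proposal can be sketched as a filter over the Recent Changes list. This is an assumption-laden Python illustration, not the software's code: registered bots are kept as a set of user names or IPs, the "rambot" name and the " . . " line format are illustrative, and `show_bots` stands in for the proposed preference, off by default.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Change:
    page: str
    author: str  # user name or IP address of whoever saved the edit


def render(change: Change, registered_bots: Set[str]) -> str:
    """One Recent Changes line; registered-bot edits are marked with 'B'."""
    flag = "B" if change.author in registered_bots else " "
    return f"{flag} {change.page} . . {change.author}"


def recent_changes(changes: Iterable[Change], registered_bots: Set[str],
                   show_bots: bool = False) -> List[str]:
    """Render the list, hiding registered bots unless the reader opts in.

    show_bots defaults to False, mirroring the proposal that bot edits be
    hidden by default in user preferences.
    """
    return [render(c, registered_bots) for c in changes
            if show_bots or c.author not in registered_bots]


if __name__ == "__main__":
    bots = {"rambot"}
    feed = [Change("Auburn, California", "rambot"), Change("Mithril", "Tarquin")]
    print(recent_changes(feed, bots))
    print(recent_changes(feed, bots, show_bots=True))
```

Because filtering keys on the registered set, an unregistered bot is shown like any human editor, which is why registration has to come first.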
Please keep tables as tables, instead of converting them to prose.

Anyway, it would seem that Misplaced Pages was never designed to handle bots.

And while you're at it, why limit it to the USA? Why not do England, Canada, Australia... why limit it to English-speaking countries? Why not do the whole world?? Clearly there is something absurd about this!

Besides, if you want to know about a town, do you really want just a bunch of numbers? Or do you want to know what is actually IN the town, such as malls, arcades, parks, etc.? Juuitchan

I would assume that Ram-Man is doing the US because that's what he's got census data for and that's what he's interested in. As far as additional data, yes, we want all that. But we can't have everything at once, now can we? --Brion 12:03 Oct 22, 2002 (UTC)
If I were to post every baseball score of every Major League baseball game ever played, with all the statistics and all that, it would be roughly analogous to what Ram-Man is doing. --Juuitchan
Not at all. Any encyclopedia would have the census information about the population, etc. The "problem" with the rambot is that it doesn't distinguish between the major leagues and the low minor leagues (and it doesn't fill in county seats), but just as I have beefed up his form letter for Newton, Massachusetts, you can do the same for wherever you live and eventually we'll have them all, and, if you don't, we'll still have the basic information about your town.
County seats are on the agenda, along with latitude and longitude. And I agree, if I were the only one to ever work on these articles, *maybe* it would be a terrible idea. But I am naive and hope that other people beef up articles! -- Ram-Man
Speaking of baseball statistics, what is so terrible about adding them? I could understand an objection if the only ones you added were the so-called unimportant ones, but this is supposed to be an all-encompassing (read: never-ending process) encyclopedia. -- Ram-Man