This is an old revision of this page, as edited by Wavelength (talk | contribs) at 03:40, 31 August 2012 (→"full-time" and "part-time" false positives: commenting). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
georaphical => geographical
example. Regards, SunCreator 13:26, 11 June 2012 (UTC)
- According to the search box, you just fixed the only example of "georaphical" in the whole of Misplaced Pages. So it's not worth adjusting the typo rules to fix it. -- John of Reading (talk) 13:30, 11 June 2012 (UTC)
- How many occurrences before you are interested in the typo? Regards, SunCreator 13:52, 11 June 2012 (UTC)
- I've had a look through the archives and can't find any guidance on this. Since the list is already so large, I wouldn't like to see it expanded to cover rare typos. 25, maybe? Opinions, anyone? -- John of Reading (talk) 16:20, 11 June 2012 (UTC)
- I had just been thinking about this, and had decided on 2 dozen. We can split the difference—24.5 seems about right. Chris the speller 17:24, 11 June 2012 (UTC)
- I think such a high number is inappropriate. It seems to be saying only fix common typos and leave the others. Is there such a downside to adding more rules? Regards, SunCreator 15:32, 15 June 2012 (UTC)
- Yes, because each new rule slows down the processing of each page. If a typo does not appear on many pages, it is probably simpler just to fix them. To do this with AWB, use "Wiki search (text)" and a "Find & Replace" rule - or just fix it by hand in the edit box. -- John of Reading (talk) 15:47, 15 June 2012 (UTC)
- Is there a list of such words that are typos but rejected from AWB rules that we can go through to correct in the way you describe? Regards, SunCreator 20:52, 15 June 2012 (UTC)
- Common misspellings can be listed at WP:LCM; some of those are covered by AWB rules and some are not. I'm not aware of any place to list "uncommon misspellings". I just keep a list of any I find in a file on my computer, and every few weeks I take a break from my other projects and fix those typos instead. -- John of Reading (talk) 08:08, 16 June 2012 (UTC)
- Not sure if this discussion is stale... and I'm undecided about if we need to cut out a lot of the typo fixes. I do find the searching to be slower than is ideal, but then again I'm running it on a slow computer. There's a danger in declaring which rules are rare because a lot of rules are getting fixed because they're in the list. Only if we had detailed stats about which rules hit the most could we really know which ones are rare.
- And even if they are rare, they are often ones that people don't notice and correct on their own. So at the very minimum, we should put the "deleted" rules into another list, such as a secondary AWB list. That way someone could run through a database dump in batch every so often and correct these orphan typos. Shadowjams (talk) 22:55, 15 July 2012 (UTC)
- This thread was about adding a new rule, not deleting an existing rule. But I agree 100% that we shouldn't delete any existing rules without collecting proper statistics on how many times each rule is used. That could be a feature request, perhaps? -- John of Reading (talk) 14:57, 16 July 2012 (UTC)
Number of article pages with an AWB typo
To give a rough estimate of the number of article pages with an AWB typo, I took a sample of 1000 mainspace articles (using random in AWB). AWB reported that 17 had typos after a pre-parse mode scan. After checking manually, two contained false positives and were dismissed; the remaining 15 were saved (although 8 were cosmetic issues). 15 in 1000 scaled up for the 3,975,490 articles on Misplaced Pages is 59,632 typo pages to go. Regards, SunCreator 14:20, 17 June 2012 (UTC)
- I sampled another thousand with 20 found, of which 2 were false positives. So 18 in 1000 is 71559. Will try and check again in a month's time. Regards, SunCreator 16:44, 17 June 2012 (UTC)
- TypoScan lists at least 80,000 left to go (depending on my own activities), but it should be clear that the first pass has a user error ratio 3x higher than what it should be on the 'skips' so I believe the actual number is 135,000 to 150,000 that WILL be hit upon by the rules contained herein. Also since we are not running 100% detection of typos with the rules, the actual number of typos on articles could be much higher. ChrisGualtieri (talk) 15:10, 24 June 2012 (UTC)
- Sampled another two thousand. 42 were typos, 7 were false positives, leaving 35 (btw 12 were white space typos). So a typo rate of 35 in 2000 for 4,011,244 articles works out as 70187. Regards, SunCreator 23:21, 26 July 2012 (UTC)
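The extrapolations in this thread are plain proportions: confirmed typo pages in the sample, scaled up by the article count. A minimal Python sketch for re-checking them follows; the function name `estimate_typo_pages` is made up for illustration and the inputs are the figures quoted in the thread. With these inputs the third sample comes out at roughly 70,197 rather than the 70187 quoted, so one of the inputs may have differed slightly when the figure was posted.

```python
def estimate_typo_pages(flagged, false_positives, sample_size, total_articles):
    """Scale the confirmed typo pages found in a random sample up to the whole wiki."""
    confirmed = flagged - false_positives
    return round(confirmed * total_articles / sample_size)


# (pages flagged by AWB, false positives, sample size, article count at the time)
samples = [
    (17, 2, 1000, 3_975_490),
    (20, 2, 1000, 3_975_490),
    (42, 7, 2000, 4_011_244),
]

for flagged, false_positives, sample_size, total in samples:
    print(estimate_typo_pages(flagged, false_positives, sample_size, total))
```

Samples of this size give only a rough figure; a few pages more or fewer in a sample of 1000 shifts the estimate by several thousand pages.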
False positive 2
humourous => humorous. Humourous is the British English spelling. Regards, SunCreator 15:44, 2 July 2012 (UTC)
- What's your source? The OED disagrees with you. Rjwilmsi 17:02, 2 July 2012 (UTC)
- Interesting. Will check some sources nearer the end of the week. Regards, SunCreator 14:29, 3 July 2012 (UTC)
- I went to a bookshop and checked out some dictionaries but erroneously somehow thought the word to check was anonymous. Darn! Regards, SunCreator 15:19, 1 August 2012 (UTC)
April Fools Day
- <Typo word="April Fool('s/s') Day" find="\b[Aa]pril\s+[Ff]ool('s|s')\s+day\b" replace="April Fool$1 Day" />
- <Typo word="April Fools' Day" find="\b[Aa]pril\s+[Ff]ools\s+[Dd]ay\b" replace="April Fools' Day" />
Seems considerable duplication here. Why not one rule? Regards, SunCreator 22:58, 19 July 2012 (UTC)
- While the second rule adds the apostrophe, the first rule doesn't change "Fool's" to "Fools'" since both are used. GoingBatty (talk) 01:13, 23 July 2012 (UTC)
- Got it thanks, both are used, that's what I missed. Regards, SunCreator 23:43, 26 July 2012 (UTC)
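The division of labour between the two rules can be sketched in Python. This is an illustration only: it assumes the character classes stripped from the rules as displayed were `[Aa]`, `[Ff]` and `[Dd]`, and it uses Python's `re` module (with `\1` in place of `$1`) rather than the .NET regex engine that AWB runs on. The names `RULES` and `apply_rules` are hypothetical.

```python
import re

# Assumed reconstructions of the two rules discussed above.
RULES = [
    # Keeps whichever apostrophe form the editor used; only fixes capitalisation.
    (re.compile(r"\b[Aa]pril\s+[Ff]ool('s|s')\s+day\b"), r"April Fool\1 Day"),
    # No apostrophe at all: supplies the plural possessive.
    (re.compile(r"\b[Aa]pril\s+[Ff]ools\s+[Dd]ay\b"), "April Fools' Day"),
]


def apply_rules(text):
    """Apply each rule in turn, the way a typo list is run over a page."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


for sample in ["april fool's day", "april fools' day", "April fools day"]:
    print(sample, "=>", apply_rules(sample))
```

Merging the two into one rule would force a choice between "Fool's" and "Fools'", which is exactly what the first rule avoids.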
Working/upper/middle/lower-class
Should this rule also cover combinations like upper-middle-class and lower-middle-class? Regards, SunCreator 13:15, 21 July 2012 (UTC)
- Upper middle class contains "upper middle class individuals" in the lead. What's the proper hyphenization? Thanks! GoingBatty (talk) 00:08, 22 July 2012 (UTC)
- I fixed it (and "white-collar professionals"). "Upper-middle-class individuals", because "upper" modifies "middle-class", not "individuals". Chris the speller 00:38, 22 July 2012 (UTC)
- I don't think any change has been made to the rule. Was that an oversight? Regards, SunCreator 22:50, 22 July 2012 (UTC)
- I think it's waiting for someone to decide that such a change is reasonable, doable and worthwhile; the rule is already somewhat clunky. Maybe some other editor will comment; it's only been a day since the issue came up. Chris the speller 00:29, 23 July 2012 (UTC)
- FWIW I agree on the correction, but lack the programming chutzpah to make the change myself. Khazar2 (talk) 00:32, 23 July 2012 (UTC)
- Added new rule for "(Upper/lower)-middle-class". GoingBatty (talk) 01:22, 23 July 2012 (UTC)
- Also ending with home(s) i.e upper middle class homes on Huntley's far west side => upper-middle-class homes on Huntley's far west side. Regards, SunCreator 12:14, 29 July 2012 (UTC)
- Have added homes to this Working/upper/middle/lower-class rule. Not sure about "home" so left it for now. Perhaps Chris the speller could comment on that. Regards, SunCreator 22:05, 1 August 2012 (UTC)
friend
Could the friend rule be adjusted so it does NOT change Frindall to Friendall. There are many occurrences of the individual Bill Frindall. Regards, SunCreator 19:19, 21 July 2012 (UTC)
- Done GoingBatty (talk) 23:59, 21 July 2012 (UTC)
- Thanks. This was causing a lot of false positives. Regards, SunCreator 22:47, 22 July 2012 (UTC)
Fun with sports and hyphens
A few suggested sports fixes:
- game winning goal => game-winning goal (at least 300-400 occurrences)
- walkoff => walk-off (at least 50; this appears to fix "walk-off" in the sense of a 9th inning baseball win as well as its occasional use for striking workers)
- game winning home => game-winning home (40-50)
- game winning hit => game-winning hit (30)
I've given each of these substitutions a test run and didn't see any significant false positives. Thanks as always for your efforts Khazar2 (talk) 21:46, 21 July 2012 (UTC)
- Added "game-winning" and "walk-off" rules. GoingBatty (talk) 01:34, 23 July 2012 (UTC)
"Under-development"
I reverted the "Overdevelopment" rule to its former state that does not treat "under-development". There are a number of articles that use "under-development" attributively, such as Grupo Alexander Bain. It's an ugly construct, and I would rather see "an under-development campus" changed to "a campus that is under development", but we can't change "an under-development campus" to "an underdevelopment campus", which has a different, and pejorative, meaning. Chris the speller 12:53, 22 July 2012 (UTC)
Qur'an rule
I'm a bit uneasy with this rule given the subject's article is called Quran. Regards, SunCreator 13:08, 22 July 2012 (UTC)
- Quran states: "The Quran...also transliterated Qur'an, Koran, Al-Coran, Coran, Kuran, and Al-Qur'an, is the central religious text of Islam". Since there are apparently several acceptable spellings, I wouldn't want the rule to change "Quran" to "Qur'an" or vice versa. Which replacements that the typo rule makes are you concerned about? Thanks! GoingBatty (talk) 01:29, 23 July 2012 (UTC)
- This. Just me being unfamiliar it seems. Regards, SunCreator 02:09, 23 July 2012 (UTC)
- It does seem that there are many ways to write it. That specific one seems okay from some websites; the Qu'ran has more than half a dozen 'okay' ways to write it. Koran, Coran, Quran, Qur'an, Qur’ān and al-Qur’ān are some of the most popular ones. Though it seems to be due to a shift in political correctness and accuracy of the religious text for transcription. The evolution of it is still ongoing. ChrisGualtieri (talk) 14:58, 23 July 2012 (UTC)
paraguayan => Paraguayan
People of Paraguay are Paraguayan. Regards, SunCreator 13:29, 22 July 2012 (UTC)
- Done. Chris the speller 14:24, 22 July 2012 (UTC)
- Thanks. Regards, SunCreator 22:46, 22 July 2012 (UTC)
Morissette rule
There are three 'Bill Morrisette' articles. I think the rule should be refined to avoid them. Regards, SunCreator 22:45, 22 July 2012 (UTC)
- Bill isn't the only Morrisette with a Misplaced Pages article. I've taken the conservative approach and changed the rule so it will only fix misspellings of Alanis Morissette. As always, other ideas are appreciated. GoingBatty (talk) 01:45, 23 July 2012 (UTC)
- Thanks. Regards, SunCreator 22:08, 26 July 2012 (UTC)
New rule "On board"
I have had many complaints and questions in the past about the difference between "on board" and "onboard", so I will lay it out here and reference this discussion in a comment attached to the rule.
- The adjective "onboard" (or "on-board", according to a few dictionaries) is attributive, and is always followed by a noun (or another adjective and noun):
- "They brought their own sandwiches, as the onboard food was usually tasteless".
- "He hoped there was enough power for the on-board electrical devices."
- The prepositional phrase or idiom "on board" indicates that something is located or installed in a train, airplane or vessel:
- "Everyone was on board, so he shut the door."
- "She was glad to see that there was a toaster on board the lifeboat."
The Typo rule fixes many cases where "onboard" is followed by something other than a noun or adjective (such as punctuation, an article or an adverb), indicating that it is not used attributively, so it knows that "on board" should be substituted. The rule certainly misses many misuses of "onboard", but after much testing it has produced next to zero false positives, and there are a ton of these to be fixed. Chris the speller 01:15, 23 July 2012 (UTC)
- I haven't seen any false positives yet and I've corrected quite a few so far. ChrisGualtieri (talk) 15:02, 23 July 2012 (UTC)
Sorry, you are making a basic grammatical error here. Atributive Attributive adjectives are not always immediately followed by the noun, although they are usually. The important fact when considering the adjective when it occurs after the noun is whether or not there is a linking verb between the noun and the adjective. If there is no such verb then the adjective is still attributive. - Nick Thorne 15:01, 31 July 2012 (UTC)
- I was trying to keep things simple enough that most AWB editors (and AWB critics) can get a handle on what the rule is trying to accomplish without spending a whole afternoon on a grammar refresher course. The point is that the rule does a good job of avoiding changes where "onboard" is an attributive (that's spelled correctly, BTW) adjective. If you have seen cases where the rule has changed an actual attributive case of "onboard" to "on board", please let us know. I think you'll have a hard time finding even one or two cases of the attributive use of "onboard" that is not followed immediately by a noun or another adjective. If you can't find such cases, what is the point of making this discussion more complicated? Our purpose here is to improve and maintain Misplaced Pages, not to display our knowledge of the fine points of grammar. Creating AWB Typo rules is largely a game of controlling the odds, and this rule seems to be ahead of the game at this point. Chris the speller 19:30, 31 July 2012 (UTC)
- Sorry about the spelling mistake in my first use of the word, now corrected. (I always try to keep my spelling correct.) I take your point about trying to keep things simple, but I question whether that is always a good thing when dealing with subtle points of grammar. As for an example, the reason I raise this whole issue was this edit of an article on my watch list. I think that bots are not best suited to making grammatical changes on the less well understood points of grammar, not least because there are always exceptions, usually contextual in nature, that make it hard or impossible to codify every possible situation. - Nick Thorne 22:45, 31 July 2012 (UTC)
- I'm skeptical that the edit you flag here is a false positive. Googling NYT and BBC (to make sure ENGVAR isn't an issue), "people on board" outnumbers "people onboard" by about 150:1. "Personnel onboard" has a smaller sample size but equivalent results. Clearly the former is the preferred usage. Khazar2 (talk) 23:17, 31 July 2012 (UTC)
- Khazar2 is right: in the example provided by Nick Thorne, "onboard" was not used attributively, and should be two words; it is a prepositional phrase, the equivalent of "aboard". While writing that last sentence, I suddenly realized that there is a simple test to help decide whether "onboard" or "on board" should be used: if "aboard" could be substituted, then "on board" is correct; otherwise, "onboard" should be used. Using the above example, "Everyone was aboard, so he shut the door." makes as much sense as "Everyone was on board". Another point: AWB is not a bot; editors are looking at each change to verify its correctness. Chris the speller 03:10, 1 August 2012 (UTC)
- Your example fails because there is a linking verb between the noun and the adjective, an important point. In the Nias article it said the aircraft had 11 people onboard. This could have been written the aircraft had 11 onboard people with no change in meaning, it just seems a little unnatural which is why the adjective follows the nouns in this case. The word onboard in both cases is being used attributively - it is attributing the property of location to the people. As a former Fleet Air Arm officer, I watch many pages related to naval aviation and nautical matters. It was because of this that the subject came to my attention. The word onboard is perhaps not very common in everyday speech, but in aviation and nautical discussions it has a particular meaning which is not quite the same as on board. One of the things that disappoints me about Misplaced Pages is that sometimes well intentioned people make changes to articles that indicate an incomplete understanding of the particular subject. It is a form of unintentional dumbing down of the encyclopedia. I would have thought that one of the purposes of the encyclopedia is to educate people. If part of that is making sure that obscure points of grammar are attended to then IMO that is no bad thing. This is not a criticism of your work, on the contrary, fixing up spelling and grammar mistakes in the encyclopedia is a great service to the community. In this case however, I think you're missing a subtle shade of meaning. In any case I don't plan to keep on about this. If you decide to change the article back I will of course be happy about that. If not, well let's face it, it's not the most pressing issue on Misplaced Pages is it? - Nick Thorne 23:20, 1 August 2012 (UTC)
- I know perfectly well what a prepositional phrase is. The most pressing issue on WP is accuracy, but spelling, grammar and punctuation are important. I will continue to correct those aspects as well. Chris the speller 02:56, 2 August 2012 (UTC)
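The heuristic described at the top of this thread (change "onboard" only when what follows is clearly not a noun or adjective) can be sketched as a lookahead. The pattern below is a hypothetical, much-reduced illustration in Python, not the actual Typo rule: it only treats punctuation and the articles "the", "a" and "an" as safe followers, and the names `ONBOARD_RULE` and `fix_onboard` are invented here.

```python
import re

# Hypothetical simplification: "onboard" followed by punctuation or an article
# is taken to be the prepositional phrase and is split into two words.
ONBOARD_RULE = re.compile(r"\b([Oo])nboard(?=[.,;:]|\s+(?:the|an?)\b)")


def fix_onboard(text):
    """Split 'onboard' into 'on board' where the lookahead says it is safe."""
    return ONBOARD_RULE.sub(r"\1n board", text)


for sample in [
    "Everyone was onboard, so he shut the door.",
    "There was a toaster onboard the lifeboat.",
    "The onboard food was usually tasteless.",
]:
    print(fix_onboard(sample))
```

Attributive uses such as "onboard food" are left alone because a noun follows. The lookahead trades recall for safety, which matches the remark that the rule misses many misuses but produces next to no false positives.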
Guerilla -> Guerrilla
Several dictionaries list "Guerilla" as an alternate spelling. Macmillan and oxforddictionaries.com are a couple. I don't like the single-"r" spelling at all, but there it is. Sorry, I'm going to remove the rule. Chris the speller 14:52, 25 July 2012 (UTC)
- BTW, you might feel better after seeing that someone once tried to go the other way with this (see talk Archive 1). I also commented in talk Archive 3 that research on each article is needed before choosing "r" or "rr". Chris the speller 16:48, 25 July 2012 (UTC)
Umayyad entry
Not sure why this is happening, but the entry seems to be going for any loose 'd' and attempting to change it to Umayyad. Even with the shortening for 'd.' for 'died' in articles like Abdullah Al-Refai. I am not disabling it yet, but I've had 6+ false positives in the last 10 minutes. ChrisGualtieri (talk) 12:28, 26 July 2012 (UTC)
- I fixed it; it had one too many vertical bars. Chris the speller 15:01, 26 July 2012 (UTC)
- Thanks for fixing this - I'd just noticed the same problem myself. Colonies Chris (talk) 15:26, 26 July 2012 (UTC)
- Thank you! I was wondering why it was doing that, I've only corrected 2000 typos and had it come up so many times. I don't fully understand the rules and how they operate. I hate to say it, but it had a good detection on contractions like 'they'd' and 'she'd' which bug me. ChrisGualtieri (talk) 16:13, 26 July 2012 (UTC)
- What a Chris team! One Chris to create it, one Chris to test it, one Chris to fix it! I'm sure we'll be swapping roles in the future. Chris the speller 17:05, 26 July 2012 (UTC)
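Why one extra vertical bar has this effect: a trailing `|` inside a group adds an empty alternative, so the group can match nothing at all and the rest of the pattern matches on its own. The patterns below are hypothetical stand-ins written for Python's `re` module, not the real Umayyad rule, which is not quoted in this thread.

```python
import re

# Hypothetical rule for misspellings such as "Ummayad" or "Omayyad".
FIXED = re.compile(r"\b(?:Ummayy?a|Umaya|Omayy?a)d\b")
# Same rule with one stray "|": the empty branch lets a bare "d" match.
BROKEN = re.compile(r"\b(?:Ummayy?a|Umaya|Omayy?a|)d\b")

for sample in ["the Ummayad dynasty", "d. 1999", "they'd"]:
    print(repr(sample), "fixed:", FIXED.sub("Umayyad", sample),
          "| broken:", BROKEN.sub("Umayyad", sample))
```

The broken variant fires on the abbreviation "d." and on the "d" of contractions, which is consistent with the false positives reported above.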
Capitalisation of egyptian => Egyptian(s)
Seems like a good idea. No obvious false positives at Egyptian. Regards, SunCreator 21:43, 28 July 2012 (UTC)
- Doesn't the existing rule in the Geographical proper names section cover this? GoingBatty (talk) 03:03, 30 July 2012 (UTC)
- Good question. I wasn't aware of that rule and had to manually correct this. So I guess the answer is the existing rule doesn't cover it. But I'm not sure why. Regards, SunCreator 03:57, 30 July 2012 (UTC)
- My guess was there were unbalanced quotation marks in the article causing AWB to skip that section of the article, but I didn't see that. GoingBatty (talk) 04:14, 30 July 2012 (UTC)
- No, as it corrected the word after; allready => already. See the previous edit. It appears the problem is with the rule. I will test it later. Regards, SunCreator 04:45, 30 July 2012 (UTC)
- The text is in User:John of Reading/Sandbox. A typo rule is disabled if it matches any wikilink in the article. By experiment, I find that this test is fooled by "links" to the File namespace. So, because the article contains [[File:…]], the "Egypt" rule is turned off. I'll log a bug. -- John of Reading (talk) 07:04, 30 July 2012 (UTC)
- What a strange error! Thank you John. Regards, SunCreator 12:19, 30 July 2012 (UTC)
- Rjwilmsi (talk · contribs) is happy to make the change if we can agree that it will do more good than harm. But, on reflection, I think it will be very difficult to work out if this change would be an improvement. Using the current code, some typos are not getting fixed - but it took a sharp-eyed AWB user to notice one of them and raise it here. Using the proposed new code, these typos would be fixed - but there would probably be some new false positives. I have no idea whether the extra fixes would outnumber the extra false positives, and it would take a serious amount of work to find out. -- John of Reading (talk) 05:43, 1 August 2012 (UTC)
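The behaviour described in this thread (a rule is switched off for the whole page once it matches any wikilink target) can be sketched as follows. This is a guess at the logic in Python, not AWB's actual code: the function `rule_is_disabled`, the `skip_namespaces` option, the sample rule and the sample file name are all assumptions made for illustration.

```python
import re

# Captures the target of each [[...]] link, up to a pipe or closing bracket.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)")


def rule_is_disabled(rule, wikitext, skip_namespaces=()):
    """Return True if the rule matches any wikilink target on the page.

    skip_namespaces is a hypothetical option: link targets in these
    namespaces (lowercase, e.g. "file") are ignored by the check.
    """
    for target in WIKILINK.findall(wikitext):
        namespace = target.split(":", 1)[0].strip().lower() if ":" in target else ""
        if namespace in skip_namespaces:
            continue
        if rule.search(target):
            return True
    return False


egypt_rule = re.compile(r"\begypt(ians?)?\b")  # stand-in for the real rule
page = "[[File:egyptian vase.jpg|thumb]] An egyptian vase."

print(rule_is_disabled(egypt_rule, page))
print(rule_is_disabled(egypt_rule, page, skip_namespaces=("file", "image")))
```

Ignoring File links lets the rule fire on the prose again, at the cost of any false positives the broader check used to suppress, which is the trade-off weighed above.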
Servey -> Survey
I constructed the following rule after encountering misspellings of "survey" (and other forms of the word):
<Typo word="Survey" find="\b()rvey(*)\b" replace="$1urvey$2" />
However, in testing, I found fewer than 20 pages in a wikitext search for "servey", "serveyed", and "serveying". Several of those turned out to be false positives, matching on people whose names were actually "Servey".
Might this rule be too risky to add to the RETF list? Maybe it would be better to have it match only forms with a suffix?
<Typo word="Survey" find="\b()rvey(+)\b" replace="$1urvey$2" />
Input is appreciated. Thanks, Tuvok 08:40, 29 July 2012 (UTC)
- If the false positives are people called "Servey" then amending the rule to avoid those starting with an uppercase 'S' would likely improve the rule considerably. Regards, SunCreator 09:27, 29 July 2012 (UTC)
- This seems to be a case where there are too few hits to justify a new Typo rule. There should be at least a few dozen errors, with very few false positives, before adding a new rule. Chris the speller 12:55, 29 July 2012 (UTC)
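For comparison, the three variants raised in this thread (match every form, match only lowercase forms, match only forms with a suffix) can be tried side by side. The patterns are hypothetical reconstructions in Python, since the character classes in the rules as displayed above were lost; the suffix list `s|ed|ing` is likewise an assumption.

```python
import re

# Any form, either case: also rewrites the surname "Servey".
ANY_FORM = (re.compile(r"\b([Ss])ervey((?:s|ed|ing)?)\b"), r"\1urvey\2")
# Lowercase only: skips capitalised surnames, but also sentence-initial typos.
LOWERCASE_ONLY = (re.compile(r"\bservey((?:s|ed|ing)?)\b"), r"survey\1")
# Suffix required: skips the bare word, so the bare surname is left alone.
SUFFIX_ONLY = (re.compile(r"\b([Ss])ervey(s|ed|ing)\b"), r"\1urvey\2")


def apply_rule(rule, text):
    """Run one (pattern, replacement) pair over the text."""
    pattern, replacement = rule
    return pattern.sub(replacement, text)


sample = "John Servey serveyed the land"
for name, rule in [("any", ANY_FORM), ("lowercase", LOWERCASE_ONLY), ("suffix", SUFFIX_ONLY)]:
    print(name, "=>", apply_rule(rule, sample))
```

Both narrower variants leave the surname alone in this sample, but each gives up some genuine typos, so the hit count that was already low gets lower still.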
Extra spaces left by one or more rules that move punctuation to before <ref> tags
Sometimes, a period is moved from after a closing </ref> to before the starting <ref>, but instead of being simply moved its old location is filled with a space, resulting in two spaces between sentences. Not the end of the world, but certainly unnecessary.
This also happens at the end of a line sometimes, which seems to bypass the rules that trim trailing spaces. (Obviously that's related to running the rules in a particular order.) Tuvok 08:44, 29 July 2012 (UTC)
- This isn't part of the typo rules, this is one of AWB's general rules. Even though this wouldn't change how the article is presented to the reader, you may wish to create a bug report. GoingBatty (talk) 03:11, 30 July 2012 (UTC)
- Thanks, GoingBatty. Apparently I've taken your username to heart in advance of meeting you, thanks to the complexity of AWB. I'll take this elsewhere, and thanks again for correcting my heading. Cheers, Tuvok 04:37, 30 July 2012 (UTC)
Departement => French?
Current rule: <Typo word="Département(al)" find="\b([Dd])epartement(ale?)?\b" replace="$1épartement$2" />
I don't understand this rule, which changes Departement => Département (the French word for department). Why not go with Departement => Department, the English spelling? On the English Misplaced Pages even Departments of France has the spelling departments. Regards, SunCreator 13:10, 29 July 2012 (UTC)
- See false positive here. Maybe the rule can be made more specific, i.e. to change to the French spelling if preceded by le or des or followed by au or des. Regards, SunCreator 13:22, 29 July 2012 (UTC)
- I've already done the ones which specifically mention the French variant and ignore all others for a great many pages; the false positives vastly outnumber the real ones. ChrisGualtieri (talk) 14:40, 29 July 2012 (UTC)
- So I take it you're all for disabling the rule then. BTW, a false positive from earlier today. Regards, SunCreator 19:24, 29 July 2012 (UTC)
- I disagree with that. The correct term for a French department is département. Department and département are not the same. So when referring to the specific département, as in the French département of Côtes-d'Armor I would expect to use the correct term. Seems to be a matter that was previously dealt with back in 2006 and never again. Why use an English word when the French term is there? ChrisGualtieri (talk) 03:34, 1 August 2012 (UTC)
- In one way that is correct, because it depends on context. But the rule currently has no context and thus blindly recommends changing every departement typo to the French when the English may be correct. It's the same as the 'distict' typo that could be either 'distinct' or 'district'. Regards, Sun Creator 10:38, 9 August 2012 (UTC)
Etc. => etc.
Only in the exception article Etc. could this start a sentence, so couldn't it be made into lowercase? Like the "i.e." rule it would seem appropriate to use only lowercase "etc." Regards, SunCreator 21:39, 29 July 2012 (UTC)
- There are several uppercase examples at the disambiguation page ETC, such as Etc... (a Czech rock band), Etc. (the b-sides and rarities album of the influential punk band Jawbreaker), and Etc. (a bonus disc accompanying the Pet Shop Boys' 2009 release Yes.) GoingBatty (talk) 03:22, 30 July 2012 (UTC)
- Thanks. Those possibilities seem to cover a small number of topics that could be individually resolved with {{Not a typo}}. I'm encouraged to think this could be a workable rule. Regards, SunCreator 16:02, 30 July 2012 (UTC)
.i.e. rule for Irish websites
<Typo word="i.e." find="\bi(?:\.?e|e\.)()(?<!\.ie.|'ie')" replace="i.e.$1" /><!--don't generalize to capital Ie; avoid matching website.ie; avoid matching 'ie' used as syllable -->
This rule was changed to avoid Irish .ie domains. I just noticed it's not working, and it still changes .ie. to .i.e., for example on Irish poetry. Can someone with Regex wizardry take a look at correcting the issue? Regards, SunCreator 22:36, 29 July 2012 (UTC)
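For anyone experimenting with this, here is a minimal sketch in Python of one way to keep an "i.e." fix away from .ie domains. It is not the AWB rule and does not use the .NET regex engine AWB relies on; the pattern `IE_TYPO` and the function `fix_ie` are hypothetical names, and the pattern is a simplified stand-in that guards the start of the match rather than the end.

```python
import re

# Hypothetical, simplified stand-in for the "i.e." rule (not the AWB rule).
# (?<![\w.'/])  not preceded by a word character, dot, apostrophe or slash,
#               which skips "website.ie", "'ie'" and URL paths
# i\.?e         "ie" or "i.e"
# (?![\w'])     not followed by a word character or apostrophe
# \.?           swallow an existing trailing dot so "ie." does not become "i.e.."
IE_TYPO = re.compile(r"(?<![\w.'/])i\.?e(?![\w'])\.?")


def fix_ie(text):
    """Replace bare "ie", "ie." and "i.e" with "i.e."; leave .ie domains alone."""
    return IE_TYPO.sub("i.e.", text)


if __name__ == "__main__":
    print(fix_ie("fruit, ie apples"))
    print(fix_ie("see www.poetry.ie for details"))
```

Note that this sketch also matches the already correct "i.e." and rewrites it to itself, which a real typo rule should avoid.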
More French loanwords
I see there are typo rules for some French loanwords. Should we also add rules for bête noire, bourrée, château(x?), passé, and séance? (Potential rules for château and séance should not include capital letters - see their disambiguation pages.) Thanks! GoingBatty (talk) 03:17, 31 July 2012 (UTC)
- Not for "chateau" or "seance". "Chateau" is the English word, which allows "château" as an alternate spelling (in some dictionaries). Same goes for "seance/séance". The Château page has been roughly handled by a group of editors suffering from fairly bad cases of hyperforeignism. This is the English Misplaced Pages, and the standard for spellings is a good English dictionary. Chris the speller 04:26, 31 July 2012 (UTC)
- OK, I've updated the château and séance articles to indicate the unaccented versions are acceptable (just like fête). Thanks! GoingBatty (talk) 04:37, 31 July 2012 (UTC)
Text with a lot of typos
here is some text with a lot of typos corrected. Could any of these be good for typo fixing? Regards, SunCreator 23:58, 31 July 2012 (UTC)
- I would say at least three: Conservative, Successor and student but possibly others also. Kumioko (talk) 00:42, 1 August 2012 (UTC)
- There are fewer than a dozen articles with "stuent", and even fewer cases of the other misspellings. I would say that this is so rare as to be slightly below the threshold for adding a Typo rule. Please read the section above, "georaphical => geographical", for other ways to deal with rare misspellings. Chris the speller 03:23, 1 August 2012 (UTC)
- I suspect only the capitalisation of 'panjab' would meet the previously discussed 24 or 25 occurrence level. It just goes to highlight that the majority of typos are low volume and therefore the current AWB typo strategy misses them. Regards, SunCreator 14:14, 1 August 2012 (UTC)
Edit summary
Please check Wikipedia_talk:AutoWikiBrowser/Feature_requests#Improve_edit_summary_for_.22typos_fixed.22. Do you think we should implement this for enwiki? -- Magioladitis (talk) 19:04, 1 August 2012 (UTC)
- Good idea, so that this part of the edit summary is self-contained. -- John of Reading (talk) 19:43, 1 August 2012 (UTC)
- Sounds like a good idea to me! :-) GoingBatty (talk) 03:08, 2 August 2012 (UTC)
- (Aside) And if the general fixes added "]" if and only if the general fixes did anything, I wouldn't have to pick one of my two edits summaries before saving each edit. -- John of Reading (talk) 20:19, 1 August 2012 (UTC)
rev 8255 done for en-wiki Wikimedia projects. Rjwilmsi 08:25, 13 August 2012 (UTC)
Distinct rule
Distinct rule converts 'Distict' => 'Distinct' but many times the correct word is 'District'. Regards, SunCreator 12:35, 3 August 2012 (UTC)
- There is a District rule that converts 'Distict' => 'District' also, but it seems in practice the Distinct rule gets it first. Regards, SunCreator 14:04, 3 August 2012 (UTC)
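The "gets it first" behaviour can be illustrated with a small Python sketch. It assumes rules are applied one after another in list order, which is an assumption about the mechanism and not something confirmed here; `RULES` and `apply_rules` are hypothetical names, and the patterns are simplified stand-ins for the real Distinct and District rules.

```python
import re

# Hypothetical rule list: both rules target the same misspelling.
RULES = [
    ("Distinct", re.compile(r"\b([Dd])istict\b"), r"\1istinct"),
    ("District", re.compile(r"\b([Dd])istict\b"), r"\1istrict"),
]


def apply_rules(text, rules=RULES):
    """Apply each rule in order and report which ones changed the text."""
    fired = []
    for name, pattern, replacement in rules:
        text, count = pattern.subn(replacement, text)
        if count:
            fired.append(name)
    return text, fired


if __name__ == "__main__":
    # The first rule consumes the typo, so the second never sees it.
    print(apply_rules("the Distict of Columbia"))
    # Reversing the order gives the other correction.
    print(apply_rules("the Distict of Columbia", list(reversed(RULES))))
```

Under that assumption the outcome depends only on rule order, never on context, which is why one of the two words will always be wrong some of the time.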
Lifelong false positive
She sacrificed her life long ago. => she sacrificed her lifelong ago. Regards, SunCreator 00:05, 4 August 2012 (UTC)
- Plus "and a way of life long gone", "ended her life long before they reached her", "a mode of life long since defunct" and "of a life long-lived on one side". Regards, SunCreator 02:30, 4 August 2012 (UTC)
- Adjusted the rule to handle those situations. Regards, SunCreator 02:37, 4 August 2012 (UTC)
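As an illustration only, the kind of adjustment described above could look like the following Python sketch. The actual adjusted rule is not shown in this thread, so the pattern `LIFELONG`, the function `fix_lifelong` and the list of excluded following words are all assumptions drawn from the false positives reported here.

```python
import re

# Hypothetical rule: join "life long" unless the next word (or a hyphen)
# shows that "long" belongs to what follows, as in "life long ago".
LIFELONG = re.compile(r"\blife long\b(?!-|\s+(?:ago|gone|before|since)\b)")


def fix_lifelong(text):
    """Turn "life long" into "lifelong" except in the reported false positives."""
    return LIFELONG.sub("lifelong", text)


if __name__ == "__main__":
    print(fix_lifelong("a life long friend"))
    print(fix_lifelong("She sacrificed her life long ago."))
```

A negative lookahead keeps the exclusions in one place, so further words can be added as new false positives are reported.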
née
It seems that there are up to seven ways that people spell their own name when it contains a variation of "nee", and Regex wants to change every single one of them to née. It accounts for maybe 1/5 of the "typos" that Regex picks up in my filtered searches. Is there any way we could change, or even better, eliminate this rule? hajatvrc @ 20:02, 4 August 2012 (UTC)
- In fact I've never seen it make a correct change with this rule. hajatvrc @ 20:04, 4 August 2012 (UTC)
- Examples please, I'll look into it. Regards, SunCreator 20:19, 4 August 2012 (UTC)
- This edit is an example of a correct change. GoingBatty (talk) 20:40, 4 August 2012 (UTC)
- Was looking for examples of false positives. Here are some correct changes. Regards, SunCreator 20:49, 4 August 2012 (UTC)
- I was responding to Hajatvrc's statement saying "I've never seen it make a correct change with this rule.". This edit and this edit are two more correct changes. I hope Hajatvrc can provide examples of false positives, per your request. GoingBatty (talk) 20:53, 4 August 2012 (UTC)
- Here are some correct ones that hajat did. Regards, SunCreator 20:57, 4 August 2012 (UTC)
Generally, it is when it is not used to say "born as" but it is their actual name. I am searching for examples I've come across. But in the meantime I'm curious whether the uses are correct in: Petra Taylor, Annabelle Collins (Brookside), Jackie Corkhill, etc.. hajatvrc @ 21:03, 4 August 2012 (UTC)
- Forgive me for saying "never" it was an inappropriate hyperbole. hajatvrc @ 21:10, 4 August 2012 (UTC)
- No problem. Regards, SunCreator 21:16, 4 August 2012 (UTC)
- They look okay to me. Are you saying the née change is questionable as it may not be her maiden family name? I'm somewhat confused at what the issue is. Regards, SunCreator 21:16, 4 August 2012 (UTC)
- I feel like there was one category of people from a certain ethnicity where nearly every woman had that as their actual name, but I'm trying to remember which one it was! hajatvrc @ 21:19, 4 August 2012 (UTC)
- The rule is case sensitive so would successfully avoid Watchman Nee, John Nee, Lim Nee Soon and similar names. Regards, SunCreator 22:02, 4 August 2012 (UTC)
- And the point with the three that I linked is they spelled it with a grave accent on the second e. So that is correct also? hajatvrc @ 21:23, 4 August 2012 (UTC)
- neè seems incorrect so changing neè => nèe would appear to be good. Regards, SunCreator 22:02, 4 August 2012 (UTC)
- Based on exact-phrase Google searches, there appear to be countless women who spell it "neè" and countless women who spell it "née". I had never encountered the former until I started using TypoScan a few days ago. The problem is, I can't find a reputable source that says née is or is not the only way to spell it. Do you know of one? hajatvrc @ 22:07, 4 August 2012 (UTC)
- Google News shows no English language result for "neè". "neè" is not in my Collins dictionary or online on the Oxford dictionary. Tell me what you are looking at in Google? All I see is social media and Facebook typos. Regards, SunCreator 22:31, 4 August 2012 (UTC)
- "neè" -facebook -twitter -youtube hajatvrc @ 22:34, 4 August 2012 (UTC)
Then on the other hand, "neè" site:en.wikipedia.org only produces four articles. hajatvrc @ 22:44, 4 August 2012 (UTC)
- I suppose I could change those and see if anyone gets angry... hajatvrc @ 22:46, 4 August 2012 (UTC)
- I found one that neither the wiki nor Google search found. Regards, SunCreator 23:48, 7 August 2012 (UTC)
AWB avoids too many areas that contain typos
I'm fairly new to typo correction with AWB. In my testing of regex additions/changes, I find that AWB skips a substantial portion of the typos that would match because they're 1) in references, 2) in text indented with a colon, 3) in seemingly many other areas. None of this is well documented. I don't quite understand this: we're expected to review changes anyway, so why have so many areas ignored? Here's an example: my target "origional" did not hit here, but an unrelated typo hit (I had edit summary trouble here, ignore that). So I manually and temporarily removed the indentation ":" that I presumed was blocking the typo fix, within AWB in that edit. Then, parsing the article again, AWB corrected the typo I wanted, so it was the colon causing the problem (and I manually replaced the colon). It kneecaps the project to have so many textual areas excluded from correction. I wouldn't mention it if I hadn't had about 40% or more of target typos ignored by AWB so far. Riggr Mortis (talk) 02:48, 5 August 2012 (UTC)
- Misplaced Pages:AutoWikiBrowser/Typos#Usage states "When used on AWB, typo-fixing is automatically prevented on image names, templates, wikilink targets and quotes (including indented paragraphs). If a typo rule matches a wikilink target, this rule will be ignored on the whole page." GoingBatty (talk) 03:42, 5 August 2012 (UTC)
- I've seen that; I said well documented. "Indented paragraphs": there are many ways to do that. So "Joe's Journal of Psychaitry" doesn't get corrected because it has an asterisk in front of it: pointless. Templates: the template name itself (obviously), or its parameters too? In any case, the substantive point remains. Riggr Mortis (talk) 03:54, 5 August 2012 (UTC)
- It's the entire template. What article contains/contained "Joe's Journal of Psychaitry" with an asterisk? GoingBatty (talk) 04:32, 5 August 2012 (UTC)
- I think you're taking me rather literally; but in fact, AWB is ignoring no less than seven instances of "psychatric", which relates to a regex I added the other day. Try it. An article with a bullet point and "psychatric" is List of oldest buildings and structures in Toronto (it's also contained with a link, but not the URL, so who cares—all regular text is susceptible to typos, regardless of what wikicode it's wrapped in.) Riggr Mortis (talk) 05:07, 5 August 2012 (UTC)
- I think it's good that AWB does not fix "psychatric" in the reference in Manpreet Singh, since the source actually uses "Psychatric". That's an example why AWB is conservative in its corrections, and does not make changes to the other six articles where "psychatric" is in a reference or external link. GoingBatty (talk) 22:45, 5 August 2012 (UTC)
- I don't agree to be honest unless people are using AWB sloppily. I won't make such a change unless I could validate it somehow. Either way, Chris's solution of setting level of exclusion is the way to go. Regards, SunCreator 22:53, 5 August 2012 (UTC)
- (edit conflict) I agree with Riggr. It would seem appropriate to work towards having less content ignored. It's not ignored in wikEd and I imagine in the future that will be the common editing method. It might take some reading to find out the reason behind these in the past, but I'm open to having more content checked even if that means more cleaning up in terms of image renaming, marking more {{not a typo}} etc. Regards, SunCreator 04:37, 5 August 2012 (UTC)
- I often put a Typo rule into my Find & Replace rules and then search for that error and run them all to ground. But it might be a better move to add options in AWB to allow Typo fixes in indented paragraphs, Wikilink targets, etc. This would let each AWB user choose his or her own comfort level with how many hits will need to be skipped, how much extra examination will be needed, and how much risk they want to take. Chris the speller 12:50, 5 August 2012 (UTC)
- Some options to set level of exclusion seems a great first step. Regards, SunCreator 13:01, 5 August 2012 (UTC)
- The number of typos Regex can get is also very limited. So it won't get them all, or even half of all typos on a page in which they may hide. Aside from loading every page with a built in checker (instead of AWB) we will continue to miss many simply by using AWB loaded with Regex. ChrisGualtieri (talk) 05:39, 6 August 2012 (UTC)
- So you'd prefer that we not maximize the value of all the work that's been done here over years, because perfection can't be obtained? Not a strong argument in any context, really. Riggr Mortis (talk) 23:43, 6 August 2012 (UTC)
- I'd agree with you if it wasn't for the fact that I've corrected more than 40,000 articles worth of typos with Typoscan. I'm in the boat of 'Regex is good', but I cannot bypass the sheer force of a modern spellchecker that offers options but retains a 97-99% detection rate or higher. Regex is limited for many reasons, but its limitations cover important typos. ChrisGualtieri (talk) 00:34, 7 August 2012 (UTC)
I have submitted a feature request to add an option(s) to allow Typo fixing in more of these areas. Chris the speller 15:27, 6 August 2012 (UTC)
- Thank you Chris! Riggr Mortis (talk) 23:43, 6 August 2012 (UTC)
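To make the behaviour discussed in this section concrete, here is a rough Python sketch of typo fixing that skips protected areas. It is not how AWB implements it; the list of protected areas in `PROTECTED` is a simplified guess based on the Usage text quoted above (templates, wikilink targets, indented or bulleted lines), and `fix_outside_protected` is a hypothetical name.

```python
import re

# Assumed, simplified list of areas a typo fixer leaves alone.
PROTECTED = re.compile(
    r"\{\{[^{}]*\}\}"      # templates (non-nested only)
    r"|\[\[[^\]|]*"        # wikilink target, up to the pipe or closing brackets
    r"|^[:*].*$",          # lines starting with ':' or '*'
    re.MULTILINE,
)


def fix_outside_protected(text, find, replace):
    """Apply one find/replace rule only to text outside the protected areas."""
    rule = re.compile(find)
    pieces, pos = [], 0
    for match in PROTECTED.finditer(text):
        pieces.append(rule.sub(replace, text[pos:match.start()]))  # fixable text
        pieces.append(match.group(0))                               # copied untouched
        pos = match.end()
    pieces.append(rule.sub(replace, text[pos:]))
    return "".join(pieces)


if __name__ == "__main__":
    sample = "An origional idea.\n: An origional quote.\n{{cite|title=origional}}"
    print(fix_outside_protected(sample, r"\borigional\b", "original"))
```

An option to set the level of exclusion, as proposed above, would amount to letting the user choose which alternatives go into the protected pattern.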
Off- and oficial
<Typo word="Off-" find="\b()f(?:|ff)(er(?:ed|ings?)|ice(?:r?|holder)s?|icia(l(?:s?|ly|dom|ism)|te?|ting))\b" replace="$1ff$2" />
Many rules try to avoid 'oficial' because of common foreign language usage.
The above rule does change it, although the comment implies otherwise. Can we amend this so oficial is left unchanged? Regards, SunCreator 14:29, 5 August 2012 (UTC)
- Please do so! That and differencia or whatever it is. Same with whatever changes Enpippi to Empippi, anything which sets En to Em. These rules constantly hit upon articles with foreign languages, the chances of finding an actual correction seems very low. ChrisGualtieri (talk) 05:42, 6 August 2012 (UTC)
- -Emp now disabled. I just hit it with a false positive "Enpl." minutes after reading the above post. I may re-enable and tune it at another time. Regards, SunCreator 06:20, 6 August 2012 (UTC)
- Also, foreign language texts should be flagged with appropriate {{lang}} templates. Then all of the typo rules will ignore the text. -- JHunterJ (talk) 12:52, 6 August 2012 (UTC)
- What if you don't know what language it is? Regards, SunCreator 21:26, 6 August 2012 (UTC)
- See Misplaced Pages:Language recognition chart and its list of external links.
- —Wavelength (talk) 21:44, 6 August 2012 (UTC)
- So what language is "Interlingue" or "Sillaba votz es literals" or "La Diferencia"? Sometimes you only get a word, and Wiki articles deal with everything including the most unusual ancient languages. Labelling text is not only time consuming to research but, if incorrect, misleading to those that later edit the article. So unless it is obvious I use {{Not a typo}}. Regards, SunCreator 22:11, 6 August 2012 (UTC)
- Just discovered another solution: just leave the language empty, i.e. {{lang||foreign words}}. Regards, SunCreator 02:51, 8 August 2012 (UTC)
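One way to leave oficial unchanged is a negative lookahead, sketched below in Python. This is a hypothetical, cut-down version of the Off- rule written for illustration; the names `OFF_RULE` and `fix_off` and the shortened list of word endings are assumptions, and the real rule covers more forms.

```python
import re

# Hypothetical simplified Off- rule: single "f" where "ff" is expected,
# but skip exactly "oficial"/"Oficial" (common in Spanish and Portuguese).
OFF_RULE = re.compile(
    r"\b([Oo])f(?!icial\b)(er(?:ed|ings?)|ice(?:rs?|s)?|icial(?:s|ly)?)\b"
)


def fix_off(text):
    """Double the "f" in ofice, ofered, oficially etc., leaving oficial alone."""
    return OFF_RULE.sub(r"\1ff\2", text)


if __name__ == "__main__":
    print(fix_off("the ofice was ofered"))
    print(fix_off("Diario Oficial"))
```

The lookahead only excludes the bare word, so longer English forms such as "oficially" are still corrected.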
Sports vocabulary
I propose that the following misspellings be corrected.
- athalet(e,ic) —> athlet(e,ic)
- (bi,tri,pent,hept,dec)athalon —> (bi,tri,pent,hept,dec)athlon
- cycle(ing,ist) —> cycl(ing,ist)
- parapaleg(ia,ic) —> parapleg(ia,ic)
- quadrupaleg(ia,ic) —> quadrupleg(ia,ic)
- Ukarainian —> Ukrainian
I have been seeing some of those errors on external pages.
—Wavelength (talk) 17:35, 5 August 2012 (UTC)
- I'm all for correcting these, but there are very few, too few to merit the creation of Typo rules. Chris the speller 18:24, 5 August 2012 (UTC)
- Thank you for considering my proposal.
- —Wavelength (talk) 18:51, 5 August 2012 (UTC)
- No problem; if you find a misspelling that occurs in a couple of dozen articles or more, let us know. With that many to chew on, we'll try to give AWB a rip at them. Chris the speller 19:13, 5 August 2012 (UTC)
- There is already an -athalon rule for handling (bi,tri,pent,hept,dec)athalon. Regards, SunCreator 05:15, 6 August 2012 (UTC)
homonomy => homonymy
I think this change could be a false positive per http://dictionary.reference.com/browse/homonomy, http://www.thefreedictionary.com/Homonomy but they could be mistakes. Oxford English online doesn't have the word homonomy. Regards, SunCreator 06:38, 6 August 2012 (UTC)
Avoid having a rule detect a correct spelling
Writing typo rules says : Avoid having a rule detect a correct spelling
- Is the above a rule or a guide? If it's a rule then both the lifetime break it. There may be others.
- It seems however that it's better to create a single rule that detects correct spelling than multiple ones, but perhaps I'm missing something. Regards, SunCreator 08:58, 6 August 2012 (UTC)
- Fixed with this edit. It's a rule, otherwise you end up with edit summaries like "Typos fixed: lifetime ban -> lifetime ban". -- JHunterJ (talk) 12:50, 6 August 2012 (UTC)
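A quick way to spot rules that detect a correct spelling is to look for matches whose replacement leaves the text unchanged. The Python sketch below shows the idea; `noop_matches` is a hypothetical helper and the two "lifetime" patterns are invented for illustration, not copied from the typo list.

```python
import re


def noop_matches(find, replace, samples):
    """Return the samples where the rule matches but changes nothing."""
    rule = re.compile(find)
    return [s for s in samples if rule.search(s) and rule.sub(replace, s) == s]


if __name__ == "__main__":
    samples = ["life time", "life-time", "lifetime ban"]
    # A rule with an optional separator also matches the correct spelling.
    print(noop_matches(r"\blife[ -]?time\b", "lifetime", samples))
    # Requiring the separator avoids the no-op match.
    print(noop_matches(r"\blife[ -]time\b", "lifetime", samples))
```

Any sample returned by the helper is a case that would show up in an edit summary as a typo "fixed" to itself.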
Invoke the RETF option
- "AWB loads directly from this list whenever someone invokes the RETF option."
How does one invoke the RETF option other than closing AWB and restarting it? Regards, SunCreator 20:51, 6 August 2012 (UTC)
- Do you mean refreshing the typo list without restarting? If so, "File->Refresh status/Typos". Riggr Mortis (talk) 23:34, 6 August 2012 (UTC)
- Yes, that is what I meant. Thank you, it was not obvious. Regards, SunCreator 23:56, 6 August 2012 (UTC)
- I always thought that meant when someone checks the Enable RegexTypoFix box. GoingBatty (talk) 00:58, 7 August 2012 (UTC)
Redundant units of currency
How common are errors involving redundant units of currency, such as "$10 dollars" and "£10 pounds"? Additional units and their symbols are mentioned in the article "Currency sign".
—Wavelength (talk) 22:07, 6 August 2012 (UTC)
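To gauge how common these are, one could count matches with something like the Python sketch below. The pattern `REDUNDANT` and the function `find_redundant_currency` are hypothetical, cover only dollars and pounds, and would need review for legitimate uses before being considered as a typo rule.

```python
import re

# Hypothetical pattern: a currency symbol and amount followed by the same
# unit spelled out, e.g. "$10 dollars" or "£1,500 pounds".
REDUNDANT = re.compile(
    r"(?:\$\s?\d[\d,]*(?:\.\d+)?\s+dollars?"
    r"|£\s?\d[\d,]*(?:\.\d+)?\s+pounds?)\b",
    re.IGNORECASE,
)


def find_redundant_currency(text):
    """List every "symbol + amount + unit word" occurrence in the text."""
    return [m.group(0) for m in REDUNDANT.finditer(text)]


if __name__ == "__main__":
    print(find_redundant_currency("It cost $10 dollars and £1,500 pounds, not $20."))
```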
Regex testing
You can use the AWB find and replace to test new typo rules. I just found this out and feeling like a n00bie, so sharing in case others might not know of this excellent way of testing new rules. Regards, SunCreator 12:55, 8 August 2012 (UTC)
Qaran → Qur'an
I'm concerned about AWB changing Qaran → Qur'an in edits like and . Qaran clearly is used in these cases as a placename, and searches indicate that such a place exists (see, for example, here). People are using AWB to turn such usage into nonsense. — Hebrides (talk) 12:55, 8 August 2012 (UTC)
- Sorry for that edit, I should not have saved it. I'll adjust the rule not to change the place of Qaran. Regards, Sun Creator 13:04, 8 August 2012 (UTC)
- Thanks. Also, how do I search for all instances where AWB has changed Qaran → Qur'an so that I can decide whether to change them back? This is vital. — Hebrides (talk) 13:06, 8 August 2012 (UTC)
- Not sure, that's difficult. Maybe get a database dump (or someone who has one) from before the rule was added (Feb 28, 2010), find articles with the 'Qaran' spelling and check they are still okay? Regards, Sun Creator 13:28, 8 August 2012 (UTC)
- I wish there was a way, I'll ask around about searching edit summaries. Because an edit summary search tool would bring this one up with the way AWB works, it won't catch 100% if the typo changes are numerous, but I bet it would grab a majority. ChrisGualtieri (talk) 13:43, 8 August 2012 (UTC)
- Thanks. We really do need a way of undoing the trail of damage that a rogue regex can leave in its wake. — Hebrides (talk) 09:30, 9 August 2012 (UTC)
New space after a full stop
For the new full stop rule, please report any false positives here. I've run it through several thousand of the most difficult articles, domain stuff mainly, but it's conceivable that it has a blind spot and I don't know where to look. So any reports of false positives would be useful; even one would be great. Regards, Sun Creator 13:25, 8 August 2012 (UTC)
- Preparing for 2000 article check with TypoScan. Will respond after I run the test. ChrisGualtieri (talk) 13:41, 8 August 2012 (UTC)
- Question. I assume it is meant to fix errors such as this, "public.Among" -> "public. Among" in Kairos Future, right? It is not adding the space to this and other articles. I haven't taken it on a test drive in the 'India section' of Misplaced Pages, where such sentences have higher than normal errors and lack of spacing. ChrisGualtieri (talk) 14:59, 8 August 2012 (UTC)
- I think a mistimed rule edit, invoke the RETF. Try again. Let me know the article if the problem persists. Regards, Sun Creator 15:08, 8 August 2012 (UTC)
- Still continues. The only reason it hits the page with Regex is because of an actual typo from before, but it is not catching the spacing matter. ChrisGualtieri (talk) 15:14, 8 August 2012 (UTC)
- What is the article name? Regards, Sun Creator 15:16, 8 August 2012 (UTC)
- Kairos Future as noted above. :) ChrisGualtieri (talk) 15:20, 8 August 2012 (UTC)
- The rule worked. Not sure why it doesn't for you. Regards, Sun Creator 15:24, 8 August 2012 (UTC)
- Huh. That is unusual, I'll try it again later on and report back. ChrisGualtieri (talk) 15:56, 8 August 2012 (UTC)
- Works now. Odd why it didn't take on the refresh before. ChrisGualtieri (talk) 03:46, 9 August 2012 (UTC)
I've disabled this. It's a great rule but many computer articles have valid 'Somevariable.Somefunction' or 'Somesoftware.Someproduct' used in them. I don't feel that adding {{not a typo}} to many articles is productive at this point. Regards, Sun Creator 16:58, 9 August 2012 (UTC)
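The trade-off can be seen in a few lines of Python. The pattern below is a hypothetical stand-in for the disabled rule (the real rule is not shown in this thread); `MISSING_SPACE` and `add_space_after_full_stop` are assumed names.

```python
import re

# Hypothetical rule: a full stop squeezed between a lowercase letter
# and a capitalised word, as in "public.Among".
MISSING_SPACE = re.compile(r"(?<=[a-z])\.(?=[A-Z][a-z])")


def add_space_after_full_stop(text):
    """Insert the missing space after a sentence-ending full stop."""
    return MISSING_SPACE.sub(". ", text)


if __name__ == "__main__":
    # Intended fix.
    print(add_space_after_full_stop("to the public.Among them"))
    # False positive: dotted names in computing articles look identical.
    print(add_space_after_full_stop("call Somesoftware.Someproduct"))
```

Both inputs have the same shape, so the pattern alone cannot tell a missing space from a dotted identifier.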
Rule tuning
Before fine tuning the existing rules I'd like to establish the purpose clearly and ideally get consensus on the general intent of the rules.
Degree of precision
At one end you can have blunt rules with many false positives; at the other you can have precise rules which deal with specific variations of a word that have yet to occur. Some options on this spectrum may be:
1. Basic word, anything goes, no consideration of variants
2. Check the most common related forms
3. Check variants in several dictionaries including related forms
4. Check variants in several dictionaries including related forms, ignoring stuff not in the wild
5. Check variants in several dictionaries including related forms and related forms of related forms etc
6. Check variants in several dictionaries including related forms and related forms of related forms etc, ignoring stuff not in the wild
7. No false positive is acceptable, disable any rule that produces any false positives
- Most rules today appear to be a 2, occasionally some are 4. It's also to be noted that precision is related to length of root letters. I'd like to see rules become more precise, ideally a 6. Regards, Sun Creator 15:07, 9 August 2012 (UTC)
Exceptions
How much should a rule deal with exceptions? A rule should:
- Ignore exceptions
- Handle the most obvious exceptions
- Handle common exceptions found or reported
- Handle common exceptions occurring in Misplaced Pages
- Handle common exceptions occurring on the internet
- Handle reoccurring exceptions in Misplaced Pages
- Handle reoccurring exceptions on the internet
- Handle all exceptions in the wild (probably technically impossible)
URL options
Regardless of the rule, you could add a begin and end part to avoid website URLs and domain names, but it would result in a longer rule and an occasional miss of a typo. Is this a desired option?
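As a rough idea of what such begin and end parts could look like, here is a Python sketch. The two guard fragments and the helper `guarded` are hypothetical, and "recieve" is used only as a convenient example typo.

```python
import re

# Hypothetical begin/end parts that can be wrapped around any rule.
GUARD_BEGIN = r"(?<![\w./:-])"     # not directly after a letter, dot, slash, colon or hyphen
GUARD_END = r"(?![\w-]*[./]\w)"    # not directly before ".something" or "/something"


def guarded(find):
    """Wrap a rule's find pattern so it skips URL and domain-like contexts."""
    return re.compile(GUARD_BEGIN + find + GUARD_END)


if __name__ == "__main__":
    rule = guarded(r"recieve")
    print(rule.sub("receive", "they recieve mail"))
    print(rule.sub("receive", "see http://example.com/recieve/page"))
    print(rule.sub("receive", "visit recieve.com today"))
```

The cost mentioned above is visible here: a genuine typo directly followed by a full stop and a letter, with no space, would be skipped as if it were a domain.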
Splitting up existing rules
In some cases splitting a rule into two would result in more precision, especially if a rule doesn't deal with a single typo. If precision is the aim, is it okay to split a rule?
Multiple possibilities
Many typos have multiple possibilities. 'distict' could be corrected to 'district' or 'distinct' or simply ignored. Maybe in the future a disambiguation option like a spell checker could be available but for now we have a more limited choice. Many of our current false positives are as a result of a rule picking the incorrect choice out of multiple possibilities.
Should the purpose be to correct with multiple rules, correct to the most likely word with only one rule or leave it alone entirely?
Documentation
In order to tune a rule you first have to work out what you want it to correct and what to avoid, and, once a rule is created, to know its pitfalls. It would seem appropriate to leave separate documentation showing the typos fixed along with false-positive information, so that others could check or adjust a rule at a later time. Would individual /Typos/Rulename pages for each rule be welcomed?
Feedback appreciated. Regards, Sun Creator 14:52, 9 August 2012 (UTC)
C# code → C#code ???
Why is AWB changing "C# code" to "C#code"? I haven't tried any tests, but several other programming languages also end with # and might be caught by the same unfortunate rule. – Hebrides (talk) 10:26, 10 August 2012 (UTC)
- I think that's because AWB has logic to remove the space after # for the external links sections. Maybe the code needs to be refined a little. Kumioko (talk) 11:08, 10 August 2012 (UTC)
- What article? I tested on List of numerical libraries, and it was fine. Regards, Sun Creator 15:31, 10 August 2012 (UTC)
- Sorry, I was just AWBing through 500 new articles and when I spotted it wanted to change ] to ] I just clicked Skip for that article. So I'm sorry I have no idea which of the 500 it was. A few articles later I decided I'd better flag up this problem here. I don't have AWB on the computer I'm using this evening, or I'd test it out by putting ] into a sandbox. — Hebrides (talk) 21:12, 10 August 2012 (UTC)
- The next time AWB tries to make a questionable change, the first thing to do is hit the "Typos" tab, and it will show you what Typo rule fired on that article. Chris the speller 21:16, 10 August 2012 (UTC)
rev 8253 Exception for C# code etc. in genfixes function FixLinkWhitespace. Rjwilmsi 06:36, 13 August 2012 (UTC)
- Thanks, Rjwilmsi, but you seem to have included only C# and F# in your exception. Probably worth catering for A# and J# too. Cheers — Hebrides (talk) 11:57, 13 August 2012 (UTC)
- rev 8256. Rjwilmsi 13:03, 13 August 2012 (UTC)
- Good. Thanks. — Hebrides (talk) 13:12, 13 August 2012 (UTC)
- So this is genfixes related and not directly about typos. Regards, Sun Creator 13:33, 13 August 2012 (UTC)
- I just built and tested rev 8258 and confirmed this problem is now fixed. Thanks Rjwilmsi. — Hebrides (talk) 06:17, 14 August 2012 (UTC)
Womens and Mens
Why is Womens always converted to Women's with the "-men's" rule but Mens is not? I don't understand the rule or maybe the exceptions. Regards, Sun Creator 12:06, 10 August 2012 (UTC)
- 'Womens' does not yield as many false positives as 'Mens', which will hit phrases such as "mens rea" and "Mens sana in corpore sano". Chris the speller 13:23, 10 August 2012 (UTC)
- Oh yes. It's obvious when you point that out! Regards, Sun Creator 15:45, 10 August 2012 (UTC)
- According to Apostrophe#Possessives in names of organizations (version of 20:17, 31 July 2012), "sometimes the apostrophe is omitted in the names of clubs, societies, and other organizations, even though the standard principles seem to require it".
- —Wavelength (talk) 14:43, 10 August 2012 (UTC)
- Not compelling. The only 'womens' exception is an organisation without any mention on Misplaced Pages except the Apostrophe page. Regards, Sun Creator 15:45, 10 August 2012 (UTC)
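The point about "Mens" can be illustrated with a small Python sketch. The patterns below are hypothetical simplifications, not the actual "-men's" rule: they show why "Womens" can be corrected unconditionally while "Mens" would need explicit exceptions for the Latin phrases mentioned above.

```python
import re

# Hypothetical simplification; the real AWB "-men's" rule is not reproduced here.
WOMENS = re.compile(r"\b([Ww])omens\b")

# "Mens" would need exceptions for Latin phrases such as "mens rea" and
# "Mens sana in corpore sano", handled here with a negative lookahead.
MENS = re.compile(r"\b([Mm])ens\b(?!\s+(?:rea|sana)\b)")


def fix_possessives(text):
    """Insert the apostrophe, skipping the listed Latin phrases."""
    text = WOMENS.sub(r"\1omen's", text)
    return MENS.sub(r"\1en's", text)
```

For example, `fix_possessives("the womens team and the mens team")` gives "the women's team and the men's team", while "mens rea is required" is left unchanged. Any Latin phrase not listed in the lookahead would still be a false positive, which is the risk Chris the speller describes.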
"long time" hyphenation
An uncertain suggestion for discussion: is it possible or wise to establish a rule hyphenating "long time" before (and only before) a noun? I've been manually cleaning up some by searching for phrasing like "his long time" or "her long time", but this won't catch phrases like "Jane Jones, a long-time opponent of birth control," etc. On the other hand, a rule of "long time " to "long-time " would create some false positives from "a long time period" or "a long time capsule". Khazar2 (talk) 20:41, 11 August 2012 (UTC)
- As an update to this, I've now corrected several hundred instances of "long time friend" to "long-time friend" with AWB. If it's not possible to make a more general rule about this, perhaps one could be crafted simply by looking for common phrases like "long time friend", "rival", "boyfriend", etc. Khazar2 (talk) 23:59, 12 August 2012 (UTC)
- What should not be overlooked is that most dictionaries indicate that "longtime" should be closed, not hyphenated. If you prefer the hyphenated form (allowed in some dictionaries), the most proper way to fix these is to make two passes: 1) Skipping pages that contain "longtime", changing "long time" to "long-time"; 2) Skipping pages that do not contain "longtime", changing "long time" to "longtime". This way the changes will conform to the style of each article. My preference is "longtime", but Macmillan (usually the best reference on hyphenation) and Cambridge specify "long-time", so I won't change that to the closed form. Chris the speller 14:57, 13 August 2012 (UTC)
- Thanks, Chris. I'll follow your suggestion. Khazar2 (talk) 15:03, 13 August 2012 (UTC)
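The two-pass approach suggested above, combined with the idea of matching only common following nouns, might be sketched as follows. This is an illustration in Python, not an AWB configuration: the function name is made up, and the noun list contains only the three examples mentioned in this thread.

```python
import re

# Match "long time" only before the nouns named in this discussion.
# The noun list is illustrative; a real pass would need a longer one.
LONG_TIME_BEFORE_NOUN = re.compile(r"\blong time(?= (?:friend|rival|boyfriend)\b)")


def fix_long_time(page_text):
    """Follow the page's existing style: closed if "longtime" already occurs,
    hyphenated otherwise (equivalent to the two passes with skip conditions)."""
    replacement = "longtime" if "longtime" in page_text else "long-time"
    return LONG_TIME_BEFORE_NOUN.sub(replacement, page_text)
```

With this sketch, "his long time friend" becomes "his long-time friend", a page that already uses "longtime" gets the closed form, and "a long time period" is not touched because "period" is not in the noun list. Capitalised "Long time" at the start of a sentence is not handled.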
-ound- rule
The "-ound-" rule now no longer matches further endings yet still has a $2, what is the rule now supposed to be doing? Rjwilmsi 06:20, 13 August 2012 (UTC)
- Oops, I've removed the $2; it's not needed and the rule was tested without it. The word's ending (if there is one) is left the same, as this rule deals with the earlier "uond" part, so now both "Gruond"=>"Ground" and "Suondproof"=>"Soundproof" work. Regards, Sun Creator 09:40, 13 August 2012 (UTC)
- Though now it won't meet the convention that typo rules match at least a whole word, so that the edit summary shows entire words? Rjwilmsi 21:39, 13 August 2012 (UTC)
- Wasn't aware of any such convention. Don't see that written anywhere, but I'll go adjust it to give a pretty edit summary. Regards, Sun Creator 21:58, 13 August 2012 (UTC)
- The edit summary now shows the middle and end of the word. Is it convention to show the word in the edit summary in full? This rule doesn't look like it has ever shown the word in full. It's possible to do that of course, but it's a few more cycles to do it that way. Regards, Sun Creator 22:19, 13 August 2012 (UTC)
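The difference between a fragment match and a whole-word match can be shown with a Python sketch. Both patterns and the `summarize` helper are hypothetical; the real "-ound-" rule is more restrictive about which letters may precede "uond", and AWB builds its edit summaries internally.

```python
import re

# Fragment match: only the transposed letters are matched.
PARTIAL = re.compile(r"uond")

# Whole-word match: the surrounding letters are captured and put back,
# so the matched text (and hence the summary) is the entire word.
WHOLE = re.compile(r"\b(\w*)uond(\w*)\b")


def summarize(rule, replacement, text):
    """List 'matched text → replacement' pairs, as an edit summary might."""
    return [f"{m.group(0)} → {m.expand(replacement)}" for m in rule.finditer(text)]
```

`summarize(PARTIAL, "ound", "Gruond")` yields `["uond → ound"]`, whereas `summarize(WHOLE, r"\1ound\2", "Gruond and Suondproof")` yields `["Gruond → Ground", "Suondproof → Soundproof"]`. The extra capture groups are the "few more cycles" mentioned above.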
Extra rules with false positives
What do we do with rules that naturally have lots of false positives but are still useful when used with care? I have some in my find and replace. Do we want to throw them in the standard rules? Probably not, but shall we have a separate list for anyone who wants additional find and replaces? Regards, Sun Creator 14:00, 13 August 2012 (UTC)
- Good idea. The separate list should allow plenty of room for a description of what to watch out for. Chris the speller 14:42, 13 August 2012 (UTC)
- I agree--good idea. Khazar2 (talk) 14:51, 13 August 2012 (UTC)
- Have made a start at Misplaced Pages:AutoWikiBrowser/Typos/Extra. Feel free to change the formatting; I have no real idea what the best layout is for this. Regards, Sun Creator 16:33, 13 August 2012 (UTC)
Superbowl -> Super Bowl
The American Super Bowl may well always be spelt this way but il Superbowl (the Italian equivalent) is not and this has now twice been corrected on this page and perhaps on others. Please could this error be corrected? mgSH 12:12, 18 August 2012 (UTC)
- I found three pages where this has happened, and corrected them all and wrapped a "Not a typo" template around them. This should prevent both AWB users and manual editors from changing them. This is the best way to handle such a rare occurrence, rather than monkeying with AWB. Chris the speller 15:09, 18 August 2012 (UTC)
- Ah, thanks; I didn't know this was possible. mgSH 18:18, 18 August 2012 (UTC)
Edit request on 23 August 2012
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
Request to add fix of spelling error jewellary into regex. Despite the article redirect, the error seems to persist: http://en.wikipedia.org/search/?search=jewellary&fulltext=Search Chrishelenius (talk) 18:14, 23 August 2012 (UTC)
- Done. I don't see many occurrences, but expanding the "Jewellery" rule slightly should incur very little additional cost. Chris the speller 20:38, 23 August 2012 (UTC)
Suggestion regarding "New" additions
Some of the "New" additions have been there for a very long time and some even duplicate typo fixes found further below. What is the procedure if any for moving them down? How long do we leave them there before they are no longer new?
Also, some, such as some names seem unnecessary and relatively low impact. Some such as Sam Elliot would probably be better IMO if we just took a few at a time and ran them as tasks, removed them from the list and add them to a subpage showing they were there and what we did about them. Kumioko (talk) 01:13, 28 August 2012 (UTC)
- One thing that might help editors to answer those questions is a mechanism for recording, for each listed item, the date and time of its addition to the list, the date and time of its removal from the list, and the number of true-positive corrections made because of its presence on the list.
- —Wavelength (talk) 01:22, 28 August 2012 (UTC)
- Each edit is recorded in the wiki, so you can find out when something is added or deleted. But how would you suggest capturing the number of "true-positive corrections"? GoingBatty (talk) 03:57, 28 August 2012 (UTC)
- The revision history does show many of the details that I mentioned, but some searching is required if one wishes to find the date and time of the addition or removal of a particular item. I had in mind a separate list for compiling additions and removals, which now I suggest can be a sortable wikitable with columns for "item", "date and time of addition", and "date and time of removal".
- The AutoWikiBrowser might record the number of revisions (supposed "corrections") that it makes for each item listed at Misplaced Pages:AutoWikiBrowser/Typos. Those numbers might be compiled in one place, possibly in a fourth column in the previously mentioned sortable wikitable. Human editors who revert "false-positive" corrections might record corresponding numbers in a fifth column there. Human editors might also record, in a sixth column, the difference between the numbers in columns 4 and 5. Human editors might also record, in a seventh column, the value of each number in column 5 as a percentage of the corresponding value in column 4. Spreadsheets might help with the calculations.
- —Wavelength (talk) 20:47, 28 August 2012 (UTC)
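The arithmetic proposed for columns 6 and 7 is simple enough to sketch in Python instead of a spreadsheet. The function and parameter names are made up for illustration; the counts would come from columns 4 (revisions made by AWB) and 5 (revisions reverted as false positives).

```python
def rule_stats(corrections, reverted):
    """Return (column 6, column 7) for one listed item.

    corrections: number of revisions AWB made for the item (column 4)
    reverted:    number of those reverted as false positives (column 5)
    """
    difference = corrections - reverted  # column 6
    # Column 7: column 5 as a percentage of column 4 (0.0 if nothing was changed).
    percentage = 100.0 * reverted / corrections if corrections else 0.0
    return difference, percentage
```

For instance, an item with 200 recorded corrections of which 5 were reverted gives `rule_stats(200, 5) == (195, 2.5)`.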
What is the scope of a typo rule
What is the default scope of a typo rule in AWB? I mean, does it search in interlanguage links, inside <!-- commented out text -->, inside <source>code here</source>, <ref>references</ref> and "quoted text"? Some rules don't apply in some cases; for example, some consider that grammar fixes should not be made in quotes but spelling typos can be. Perhaps an option can be added to each rule to define its scope. Regards, Sun Creator 13:46, 29 August 2012 (UTC)
- I believe that the typo rules in general skip the following things: Comments, templates, and the area next to sic templates. I'm not sure about Source code or other HTML tags. Kumioko (talk) 14:46, 29 August 2012 (UTC)
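One way such skipping could work is sketched below in Python. This is an assumption about the general technique, not a description of AWB's actual implementation: the `PROTECTED` pattern covers only HTML comments and simple non-nested templates, and the "teh" rule is a made-up example.

```python
import re

# Spans a typo rule should not touch (illustrative, not AWB's real list):
# HTML comments and simple, non-nested {{templates}}.
PROTECTED = re.compile(r"<!--.*?-->|\{\{[^{}]*\}\}", re.DOTALL)


def apply_outside_protected(rule, replacement, text):
    """Apply a typo rule only to the text between protected spans."""
    pieces = []
    pos = 0
    for m in PROTECTED.finditer(text):
        pieces.append(rule.sub(replacement, text[pos:m.start()]))  # editable part
        pieces.append(m.group(0))                                  # kept verbatim
        pos = m.end()
    pieces.append(rule.sub(replacement, text[pos:]))
    return "".join(pieces)


teh_rule = re.compile(r"\bteh\b")  # hypothetical typo rule
result = apply_outside_protected(
    teh_rule, "the", "teh cat <!-- teh note --> {{quote|teh dog}} teh end"
)
```

Here `result` is "the cat <!-- teh note --> {{quote|teh dog}} the end". A per-rule scope option, as suggested above, would amount to choosing a different `PROTECTED` pattern for each rule.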
"full-time" and "part-time" false positives
I've been playing with searches for "full-time" and "part-time", and these rules seem to generate an unfortunate number of false positives--or perhaps a better way to put it would be unnecessary positives. My understanding is that the phrase "full-time work" must always be hyphenated, but "work full time" may or may not be. Quick searches of the LA Times and NYT show that their style guides allow both usages, so the hyphenated/non-hyphenated distinction appears to be a null issue. Would it be possible to restrict this rule to only cases where the words "full time" or "part time" precede the noun? Khazar2 (talk) 14:51, 30 August 2012 (UTC)
- Macmillan Dictionary (which I have found to be very specific and very dependable on hyphenation issues) lists the adjective "full-time" with the notation "usually before noun" – "It is hard to combine study with a full-time job." And it lists the adverb "full-time" – "Her youngest child is in daycare full-time." Is there a case where a sentence is better because "full time" is unhyphenated? I can't think of a case where the hyphen could confuse a reader, and it sure is going to make the fixing of the adjective more difficult if the Typo rule has to list all possible nouns that could possibly follow "full-time", or adjective-noun phrases, such as "a full-time, permanent job". WP:HYPHEN says "Consult a good dictionary", but not "Consult a big newspaper". The punctuation in most Misplaced Pages articles stinks; how is it ever going to get better if more obstacles are placed in front of editors and tools are taken away? Chris the speller 02:34, 31 August 2012 (UTC)
- I share your concern for Misplaced Pages spelling and punctuation, of course. But I'm also wary of setting AWB to auto-correct things that appear to be legitimate variation, and this rule generates a tremendous number of neutral edits. An equal case could be made that having tens of thousands of valid sentences like "he worked full time" flagged for review and correction is itself an obstacle, due to the slowdown it creates in other work. (And it does seem to me that newspaper style guides can be considered at least a legitimate variant here; at the very least, if the New York Times is also employing it, this is not a usage that's begging for correction.)
- I'm a big fan of your work generally, though, so having said my piece, I'm happy to yield to your judgement if no one else objects. Cheers, and thanks for all your work, Khazar2 (talk) 03:11, 31 August 2012 (UTC)
- I prefer the exclusive use of the hyphenated form for the technical reasons explained by Chris the speller, and technical reasons have been invoked at WT:MOS and WP:MOS. To forestall complaints by subsequent editors, the edit summary can mention "technical reasons". Also, I recommend that this be discussed at WT:MOS, but please wait until User:Noetica is again available.
- —Wavelength (talk) 03:40, 31 August 2012 (UTC)