Misplaced Pages

talk:AutoWikiBrowser/Typos - Misplaced Pages

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license.

This is an old revision of this page, as edited by Wavelength (talk | contribs) at 20:53, 13 April 2011 (Removing hyphens after -ly adverbs: commenting with many links). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Archives

/Archive 1
/Archive 2
/Archive 3



This page has archives. Sections older than 40 days may be automatically archived by Lowercase sigmabot III.

hda/had Error

Hello! Not sure if the regex can be adjusted to actually catch this, but "hda" should not correct to "had" if it's in a path, e.g. "/dev/hda1/foo/" :) ennasis @ 03:47, 18 Adar I 5771 / 22 February 2011 (UTC)

Thanks. Which rule triggered that change (found in the Typos tab on AWB)? There might be a way to identify the path in the Wikimarkup as not-English. (Also, additions to Talk pages are not minor edits. WP:MINOR. Editors who are ignoring minor edits won't see your new question.) -- JHunterJ (talk) 11:47, 22 February 2011 (UTC)
All such uses should probably go in code tags. Rjwilmsi 13:55, 23 February 2011 (UTC)
Ah. Must've forgot to uncheck the minor box last time. :)
How about this: "english" to "English" unless in a link, e.g., http://www.israelradio.org/english.html ? I've encountered this as well. ennasis @ 03:07, 30 Adar I 5771 / 6 March 2011 (UTC)
I did a search for "http://www.israelradio.org/english.html" and found 5 articles on Misplaced Pages. I processed them all through AWB, and AWB didn't try to change "english" to "English" on any of the articles. On what article did you encounter this issue? GoingBatty (talk) 03:21, 6 March 2011 (UTC)
That was just an example - I forget the actual link. I encountered this on Wikibooks the other day. Since they don't have any AWB pages, the RegEx is loaded from here. I'll try to rescan and find out exactly what the link was. ennasis @ 04:35, 30 Adar I 5771 / 6 March 2011 (UTC)
Found an example on Wikibooks, the page is here, and the "typo" it finds is
  • www.stimulus.virginia.gov > www.stimulus.Virginia.gov
Not sure if that helps at all, or if it can be avoided. But there's an example of what I mean. ennasis @ 06:41, 30 Adar I 5771 / 6 March 2011 (UTC)
 Done - updated the "Virginia" rule with this edit. GoingBatty (talk) 14:49, 6 March 2011 (UTC)
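As a rough illustration of how a capitalization rule can be kept away from host names and paths, here is a minimal Python sketch. The pattern is a hypothetical stand-in, not the actual "Virginia" rule or the edit made above, and AWB itself runs on the .NET regex engine rather than Python's `re`; the lookaround syntax used here is assumed to behave the same way in both.

```python
import re

# Hypothetical capitalization rule, for illustration only; the real AWB
# "Virginia" rule is not shown in this thread.
# (?<![./])   : not immediately preceded by a dot or slash (host name, path)
# (?![./]\w)  : not followed by a dot or slash plus a word character, so
#               "virginia.gov" is skipped but a sentence-ending "virginia."
#               is still fixed
VIRGINIA = re.compile(r"(?<![./])\bvirginia\b(?![./]\w)")


def fix_virginia(text: str) -> str:
    """Capitalize 'virginia' unless it looks like part of a URL or path."""
    return VIRGINIA.sub("Virginia", text)


if __name__ == "__main__":
    print(fix_virginia("See www.stimulus.virginia.gov for jobs in virginia."))
```

The same two lookarounds would also leave a path segment such as "/dev/hda1/foo/" alone if attached to a "hda" rule, though wrapping such text in code tags, as suggested above, avoids the problem without touching the rule.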

billionaire

AWB did not catch "bilionar" as misspelling of "billionaire" here. Please add a rule. --bender235 (talk) 18:06, 25 February 2011 (UTC)

It does not appear to be a common typo. -- JHunterJ (talk) 20:14, 25 February 2011 (UTC)
Still we could add the rule in case the typo occurs again. Misplaced Pages is an evolving resource. --bender235 (talk) 19:07, 26 February 2011 (UTC)
"To do: Remove rare words. Note that no matches today does not mean a rule is rare, since another user may have used the rule to fix many articles yesterday." Each rule consumes some resources, and the goal is not to have 100% of possible typos included at the expense of being unable to run the tool to fix any of them. -- JHunterJ (talk) 21:43, 26 February 2011 (UTC)
If a word is rare why not put it into a rolling set for each day of the month? That way you can still run the tool, but rare fixes will still be picked up. ϢereSpielChequers 09:57, 2 March 2011 (UTC)
You can do that yourself. The AWB ruleset I run is the basic typo rules listed here, in addition to my own set of more nuanced rules (some of which require more human discretion than is appropriate for AWB). You could do the same. Shadowjams (talk) 07:23, 17 March 2011 (UTC)

Spurious endash?

Hello!

This change from "-" to "–" is obviously incorrect, see WP:HYPHEN. I am not sure that it is a bug of AWB, not some hand-made rule of a particular user. I know little about AWB and therefore ask here to help fix the problem, via technical changes or maybe social interaction (I have some negative bias towards automated editors and experience some troubles communicating with them). Incnis Mrsi (talk) 16:49, 8 March 2011 (UTC)

The word "replaced" in the edit summary shows that this change is a "find and replace" rule set up by the user, not something built into AWB. But I think the change is correct. The edit summary refers to WP:ENDASH, and the change seems a correct example of point 5 there, since "World War" contains a space. -- John of Reading (talk) 18:10, 8 March 2011 (UTC)
Point 5? It looks bizarre to me and apparently contradicts WP:HYPHEN, but it exists. Sorry for the false alarm. Incnis Mrsi (talk) 19:17, 8 March 2011 (UTC)

Is "du Pré" rule necessary

Is the "du Pré" rule really necessary? I stumbled across the rule when it wanted to make an incorrect change to blogger "Jacqueline Dupree" in Media in Washington, D.C. Thanks! GoingBatty (talk) 03:25, 9 March 2011 (UTC)

 Fixed with this edit. Kept the rule, but allowed the "Dupree" spelling. -- JHunterJ (talk) 12:19, 9 March 2011 (UTC)

Colege -> College

Alex (talk) 23:44, 12 March 2011 (UTC)

 Done, along with Colegiate → Collegiate - GoingBatty (talk) 00:22, 13 March 2011 (UTC)

"an unusually long period" → "a unusually long period" ?

Resolved

RegexTypoFix wants to change "an unusually long period" → "a unusually long period" on Fisher Hall and Marcum Center (Miami University), based on the "A ..." rule. This doesn't seem like a correct change to me. Comments? GoingBatty (talk) 03:34, 13 March 2011 (UTC)

It wants to change "an usually" to "a usually". That seems a correct change, although it seems the text may have intended unusually there. -- JHunterJ (talk) 12:09, 13 March 2011 (UTC)
Ah, the text is "an usually"! Once I changed the text to "an unusually long period", then RegexTypoFix doesn't want to change it. Thanks! GoingBatty (talk) 15:17, 13 March 2011 (UTC)

"niger" matching even when part of scientific name

Although the "Niger(ia)" rule is set up to not match scientific names, it tries to change Chlidonias niger to Chlidonias Niger on articles such as List of birds of Oregon. Could someone please see if the rule can be updated? Thanks! GoingBatty (talk) 04:36, 13 March 2011 (UTC)

It did not attempt the change when I just tried it. -- JHunterJ (talk) 12:06, 13 March 2011 (UTC)
It still does for me - I'm using AWB SVN 7634. GoingBatty (talk) 15:10, 13 March 2011 (UTC)
I'm a few builds behind, SVN 7471. -- JHunterJ (talk) 17:07, 13 March 2011 (UTC)

"Maintenance" rule does not catch "maintanance"

I fixed about 2 dozen of these by supplying my own Find/Replace. I would like to fix the "Maintenance" rule, but I get dizzy when I look at that one for more than a few seconds. Chris the speller (talk) 17:09, 15 March 2011 (UTC)

 Done with this edit. -- JHunterJ (talk) 17:16, 15 March 2011 (UTC)

I tweaked a rule -- sorry about any confusion that may have followed

I tweaked the "(In)Significant" rule, then reverted it, not because I saw anything wrong with it, but because when I reloaded AWB to retest the whole thing live, I found that no RegEx fixes were working for me. It was as if "Enable RegEx TypoFix" was unchecked. So in a near panic, I reverted the rule change. Well, after the reversion, RegEx fixes are still not working for me. I had saved my AWB settings before shutting it down, and reloaded settings after launching it again, so it's not some setting that I forgot. My own Find/Replace rules work fine, as do General Fixes, so I can work, but it's like sweeping with a smaller broom. Any ideas would be appreciated, as would a report that other editors are successfully using RegEx Typo after reloading AWB. As for the rule tweak, maybe an experienced tweaker can look it over and give an opinion. Chris the speller (talk) 22:05, 17 March 2011 (UTC)

It's working for me. I restarted and tried processing A Rocha, "typos fixed: accomodation → accommodation" -- John of Reading (talk) 22:11, 17 March 2011 (UTC)
Thanks, that's very reassuring. I feel better that it's just me. Of course, it also makes me feel a little paranoid. Chris the speller (talk) 22:21, 17 March 2011 (UTC)
Well, now *some* of the RegEx rules seem to be working for me. I'm going to reboot the whole machine (mine, not WP!), what the heck. After that, I'll retest and evaluate whether the existing two rules for "significant" are actually working better than I thought. Chris the speller (talk) 22:58, 17 March 2011 (UTC)

Only a handful of RegEx Typos work for me now, and only intermittently. I know most of you have more fun things to do, but for a change of pace, if an editor wants to help, try the following:

  1. enable Find and Replace
  2. add a Find and Replace for 'reponsible' to 'responsible'
  3. enable RegEx Typos
  4. put 'User:Chris the speller/Sandbox2' in the page list
  5. Start.

It should fix 'reponsible' if Find and Replace is working. It should also fix about 11 other misspellings on 11 other lines if Regex Typos is working. Don't bother to save it, to allow retesting. When I do it, it doesn't catch the other 11 lines. I'd love to hear how other editors fare with this. Chris the speller (talk) 01:44, 18 March 2011 (UTC)

When I try your test, Find and Replace works, but AWB SVN 7634 doesn't find any typos. However, it does find typos on articles (see my contributions) and User:GoingBatty/Sandbox2. GoingBatty (talk) 03:17, 18 March 2011 (UTC)
I am using AWB SVN 7471, just downloaded last week. Am I already 163 versions behind? How often do I need to update it? Since yours also missed all the typos on my sandbox page, it seems that mine is not the only one misfiring. I have found a few typos, but they seem to come in spurts; it finds typos in 2 or 3 pages in a row, then misses them in dozens and dozens. Thanks for giving it a try. You don't sound too worried, but I have a bad feeling about this. Chris the speller (talk) 04:04, 18 March 2011 (UTC)
I download the latest AWB snapshot as soon as it's available, usually because it fixes a bug or includes a feature request I've submitted. Maybe you should report a bug? Good luck! GoingBatty (talk) 04:16, 18 March 2011 (UTC)
The typos in User:Chris the speller/Sandbox2 are in indented paragraphs. RegExpTypoFix skips those in case the indent is marking a quotation. Take out the colons, and the typos get fixed. (Using SVN 7471). -- John of Reading (talk) 08:08, 18 March 2011 (UTC)
Thanks, John, for providing the clear and simple answer to my problem. And thanks again, Batty, for taking the time to look into it. A humbling experience; I feel like the sorcerer's apprentice. Maybe I should change the name of this talk section to "The wrong way to set up a regression test for changes to AWB Typos" ;-)       Chris the speller (talk) 14:33, 18 March 2011 (UTC)
I added a note on the project page to indicate that typos are not checked in indented paragraphs. Thanks for the info, John! GoingBatty (talk) 01:58, 19 March 2011 (UTC)
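The behaviour John describes can be sketched in a few lines of Python. This is not AWB's implementation; it assumes that "indented" simply means a line starting with a colon, and the single rule in `TYPO_RULES` is a hypothetical example.

```python
import re

# Hypothetical one-rule typo list, for illustration only.
TYPO_RULES = [(re.compile(r"\breponsible\b"), "responsible")]


def fix_typos_skipping_indents(wikitext: str) -> str:
    """Apply typo rules line by line, leaving colon-indented lines alone."""
    fixed_lines = []
    for line in wikitext.split("\n"):
        if line.startswith(":"):
            # A leading colon may mark a quotation, so the line is kept as-is.
            fixed_lines.append(line)
            continue
        for pattern, replacement in TYPO_RULES:
            line = pattern.sub(replacement, line)
        fixed_lines.append(line)
    return "\n".join(fixed_lines)


if __name__ == "__main__":
    sample = "He is reponsible.\n:He is reponsible."
    print(fix_typos_skipping_indents(sample))
```

A regression test page for typo rules therefore needs its sample misspellings on unindented lines, which is the trap the sandbox page above fell into.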

-hsi

This seems a fairly common part of a name. Be nice not to try to change it to "-his". Rich Farmbrough, 00:14, 25 March 2011 (UTC).

 Done with this edit. GoingBatty (talk) 00:50, 25 March 2011 (UTC)

bilbliography > bibliography

Fairly common typo: .

By the way, why have both Misplaced Pages:Lists_of_common_misspellings and Misplaced Pages:AutoWikiBrowser/Typos?

Thanks. 128.138.43.231 (talk) 04:10, 29 March 2011 (UTC)

Yes, there were 10; I fixed 8, was headed off at the pass on 2 of them. I guess these may pop up at a rate of about 4 or 5 a month. I'm not sure what is a good cutoff number to qualify for adding a new typo rule.
"Lists of common misspellings" is for people, who are expected to show good judgment, brush off false positives and decide on changing "guerrila" to either "guerilla" or "guerrilla" based on the predominant spelling in the article, while "Typos" is for AWB and high speed. Even a fairly low rate of false positives makes for a pretty bumpy ride while using AWB. And "Typos" is tuned so that one rule can fix various suffixes and prefixes. Listing every possible variant in a separate rule would really bog it down, or maybe jam it good. Chris the speller  05:39, 29 March 2011 (UTC)
Thanks for your explanation. Also, I was confused by the "View (previous 20 | next 20) (20 | 50 | 100 | 250 | 500)" at the bottom of the search results. I assumed there were as many as 500 occurrences of that typo. Sorry. =) 128.138.43.231 (talk) 05:56, 29 March 2011 (UTC)

Capitalization of "earth"

An editor using the AWB has twice made edits to the article "Drummond Matthews" by which the text "the earth" has been capitalized to "the Earth". My contention is as follows: that "earth" should be capitalized only when it is used specifically as a name (e.g. Earth has only one moon, or Earth, Venus and Mercury are the three innermost planets). When used after "the" the word becomes a common noun analogous with "the world" or "the globe" and should not be capitalized. That's my understanding, anyway. Can the software be tweaked to enable it to make this distinction? Godingo (talk) 22:45, 31 March 2011 (UTC)

"the Earth's mantle", etc., are correct. See also the occurrences of "the Earth" on The Earth. -- JHunterJ (talk) 23:26, 31 March 2011 (UTC)
I've disabled the rule for now. -- John of Reading (talk) 06:06, 1 April 2011 (UTC)
Why? -- JHunterJ (talk) 10:52, 1 April 2011 (UTC)
Rightly or wrongly, that's my instinctive reaction when someone raises an objection to a "New addition" and not much discussion has happened. Please re-instate the rule if you are sure that it is correct. You might want to fix note 5 in The Earth as well. Reference 80 is someone else's title, so of course that should stay as a lowercase "e". -- John of Reading (talk) 11:06, 1 April 2011 (UTC)
I fixed Ref 80 as well; someone else's automated case fixing of someone else's title was incorrect. I believe the rule is correct, but I'll see if I'm alone in this. Thanks; I didn't realize it was in the "new additions" section. -- JHunterJ (talk) 12:58, 1 April 2011 (UTC)
I've now found the right section in the manual of style, and looked at the first twenty or so potential corrections found by a database scan for \bthe\s+earth's\b. I'm happy with this rule. -- John of Reading (talk) 13:29, 1 April 2011 (UTC)
Capitalizing "Earth" on Drummond Matthews looks good to me (and I capitalized one more instance inside a wikilink which AWB won't change), as the article specifically refers to our planet. I would have guessed that the only distinction was "Earth" meant the planet and "earth" meant dirt (which seems to coincide with the MOS), but dictionary.com has other examples of where lowercase "earth" is acceptable. GoingBatty (talk) 18:13, 1 April 2011 (UTC)

Cataloger or cataloguer

AWB corrects cataloger to cataloguer. Cataloger is common US spelling used by the Library of Congress among others. Please adjust your list. Thanks. Dankarl (talk) 13:17, 1 April 2011 (UTC)

 Fixed with this edit. -- JHunterJ (talk) 13:45, 1 April 2011 (UTC)

"Improv(e/ise)" rule goofs up "imprevious"

It changes "imprevious" to "improvious", which is a further step away from the correct "impervious". Anyone want to tweak it to prevent this strange twist? It's probably not worth getting it to actually fix this misspelling, as only Ramalinga Swamigal had an example of it, and it is now extinct in the wild. Chris the speller  05:15, 10 April 2011 (UTC)

Performance question

Which runs faster, "(M|m)" or "([Mm])", on the RegEx engine used by AWB? The former is shorter by one character, but if the other construction runs faster, that might be the way to go. The Typos list has the first format at the front of most rules, such as "\b(M|m)imic(ing|ed)\b". There was a discussion in October 2010 at Misplaced Pages talk:AutoWikiBrowser/Typos/Archive 3#Profiling heads up for you guys that said explicit character classes ran faster than shorthand character classes (an explicit bracketed class running faster than \w), but did not cover explicit character classes versus alternation. A web search of "regex performance character classes alternation" led me to High Performance JavaScript by Nicholas C. Zakas on Google Books, which advises against starting a RegEx with alternation. Would anyone be interested in testing this? Here's the real fun part: if it's worth changing, AWB would be a great way to change the Typo rules! Chris the speller  19:27, 11 April 2011 (UTC)
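For anyone who wants to try such a test, here is a minimal timing sketch in Python. It assumes the second construction is a character class such as "([Mm])", and it exercises Python's `re` engine, not the .NET engine AWB uses, so the numbers only show the shape of the experiment and say nothing definite about AWB. The sample text and the helper names are made up for the example.

```python
import re
import timeit

# The two spellings of the same rule: leading alternation vs. character class.
ALTERNATION = re.compile(r"\b(M|m)imic(ing|ed)\b")
CHAR_CLASS = re.compile(r"\b([Mm])imic(ing|ed)\b")

# Made-up sample text, repeated so that each timing run does measurable work.
SAMPLE = "The bird mimiced the call, and Mimicing it took practice. " * 200


def matches(pattern, text):
    """Return every full match, to confirm both patterns agree."""
    return [m.group(0) for m in pattern.finditer(text)]


def time_pattern(pattern, text, repeats=50):
    """Total seconds taken to scan the text `repeats` times."""
    return timeit.timeit(lambda: pattern.findall(text), number=repeats)


if __name__ == "__main__":
    # Only compare timings once the two rules are known to be equivalent.
    assert matches(ALTERNATION, SAMPLE) == matches(CHAR_CLASS, SAMPLE)
    print("alternation:", time_pattern(ALTERNATION, SAMPLE))
    print("char class: ", time_pattern(CHAR_CLASS, SAMPLE))
```

A meaningful answer for AWB would need the same comparison run on the .NET engine against real article text, ideally across the whole rule list rather than a single rule.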

Removing hyphens after -ly adverbs

Today on WT:MOS, an editor requested a script for automatically removing hyphens after -ly adverbs in compound modifiers. I have put about 140 RegEx rules on User:Chris the speller/Adverbs, along with instructions on using an XML editor to splice them into an AWB settings file. This method misses very few standard -ly adverbs, but completely avoids problems with fly, July, Italy, family and the like. Chris the speller  23:35, 12 April 2011 (UTC)

To clarify: I'm not pushing this method for addition to the WP:AWB/Typos list, but this seemed to be the place to let a few daring editors know how they can load Find & Replace rules if they want to take this on. Since the list contains only known standard -ly adverbs, there are very few false-positive hits. The differences still need to be examined, and a hyphen removal does not exactly jump off the difference window. The main things to watch out for are changes to quotations, links (I don't usually have the "Ignore ..." boxes checked), and longer compounds (a slowly-but-surely strategy). If this is the wrong forum to bring this up, please move this discussion or ask me to move it. Chris the speller  14:31, 13 April 2011 (UTC)
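The whitelist idea can be sketched as follows. This is a simplified Python illustration, not the rules on User:Chris the speller/Adverbs: the four adverbs in `KNOWN_ADVERBS` are hypothetical samples, and the sketch folds them into one pattern where the page above uses separate Find & Replace rules.

```python
import re

# Tiny illustrative whitelist; the real list of about 140 adverbs is not
# reproduced here.
KNOWN_ADVERBS = ["newly", "highly", "poorly", "recently"]

# Only whitelisted adverbs can match, so "fly-fishing", "July-August" and
# "family-owned" are never touched. The lookahead requires a letter after
# the hyphen.
LY_HYPHEN = re.compile(
    r"\b(" + "|".join(KNOWN_ADVERBS) + r")-(?=[a-z])", re.IGNORECASE
)


def remove_ly_hyphens(text: str) -> str:
    """Replace the hyphen after a whitelisted -ly adverb with a space."""
    return LY_HYPHEN.sub(r"\1 ", text)


if __name__ == "__main__":
    print(remove_ly_hyphens("a newly-built, family-owned shop"))
```

Quotations, links and longer compounds still need a human look at the diff, as noted above; the sketch makes no attempt to detect them.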
The relevant guideline is at Misplaced Pages:Manual of Style#Hyphens, subsection 3, point 4 (shortcut WP:HYPHEN, permanent link here).
The current WT:MOS discussion is in the specific section Misplaced Pages talk:Manual of Style#"A hyphen is not used after a standard -ly adverb" and a requested exception for articles on New Zealand (permanent link here).
Some archived discussions about the guideline are the following.
For any software designed for automatically removing hyphens after ly adverbs in compound modifiers, I recommend that the software discriminate at least four different categories of instances.
  • instances where the hyphen is automatically removed
(including those with quotations from websites which omit the hyphen)
  • instances where the hyphen is automatically retained
(after fly, July, Italy, and family; in surnames such as Beverly-Smith; in French-language place names such as Romilly-sur-Seine; in English-language place names such as Ashly-on-Avon; in web addresses; in quotations from websites which also use a hyphen; in only-begotten)
  • instances where the hyphen is automatically retained but the suffix ly (or y in adverbs such as fully) is automatically removed
(http://www.onelook.com/?w=full-*&ls=a, http://www.onelook.com/?w=close-*&ls=a, http://www.onelook.com/?w=loose-*&ls=a, http://www.onelook.com/?w=tight-*&ls=a, http://www.onelook.com/?w=high-*&ls=a, http://www.onelook.com/?w=low-*&ls=a, http://www.onelook.com/?w=deep-*&ls=a, http://www.onelook.com/?w=new-*&ls=a, http://www.onelook.com/?w=old-*&ls=a, http://www.onelook.com/?w=narrow-*&ls=a, http://www.onelook.com/?w=wide-*&ls=a, http://www.onelook.com/?w=open-*&ls=a)
  • instances which the software flags for human inspection
(see Misplaced Pages talk:Manual of Style/Archive 106#Another kind of exception)
half-hourly, hourly, daily, nightly, weekly, fortnightly, semi-monthly, monthly, quarterly, yearly
early, kindly, likely, only
easterly, northerly, southerly, westerly
After my discussions with User:Noetica at User:Noetica/Archive4#Complications with ly and User:Noetica/Archive4#Specific cases with ly, I have been compiling a list of examples at User:Wavelength/About English (permanent link here), which I reproduce here as follows.
or with: minded, named, shaped, forked, sized, colo(u)red, tinted, rooted, charged, dated, banked, spaced
Is it possible to design software which can do all those things?
Wavelength (talk) 20:53, 13 April 2011 (UTC)
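As a starting point for that question, the four categories can at least be expressed as a lookup. The Python sketch below is only an illustration under stated assumptions: every word list is a small hypothetical sample drawn from the examples above, the category names are invented, and anything the lists do not recognise is flagged for a human instead of being decided automatically (so a surname such as Beverly-Smith is flagged here, not retained).

```python
import re

# All word lists are small hypothetical samples, not a complete rule set.
RETAIN = {"fly", "july", "italy", "family"}              # not -ly adverbs
STRIP_SUFFIX = {"fully", "closely", "tightly", "deeply"}  # fully-grown -> full-grown
FLAG = {"daily", "weekly", "monthly", "yearly",
        "early", "kindly", "likely", "only", "northerly"}  # adjective or adverb
REMOVE = {"newly", "poorly", "recently"}                  # standard -ly adverbs

# Exactly two hyphen-joined words, the first ending in "ly".
COMPOUND = re.compile(r"([A-Za-z]+ly)-([A-Za-z]+)")


def classify(compound: str) -> str:
    """Return the category name for a two-part hyphenated compound."""
    match = COMPOUND.fullmatch(compound)
    if match is None:
        return "not_applicable"   # e.g. longer compounds, no -ly first word
    first = match.group(1).lower()
    if first in RETAIN:
        return "retain"
    if first in STRIP_SUFFIX:
        return "strip_suffix"
    if first in REMOVE:
        return "remove"
    # Known ambiguous words and anything unrecognised go to a human.
    return "flag"


if __name__ == "__main__":
    for word in ("newly-built", "family-owned", "fully-grown", "weekly-updated"):
        print(word, "->", classify(word))
```

Quotations, web addresses and place names would need context that a word-level lookup like this cannot see, so they are outside what the sketch attempts.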