User:Mboverload/RegExTypoFix

This is an old revision of this page, as edited by John (talk | contribs) at 23:03, 6 August 2006 (think consensus was reached). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Developer(s): mboverload
Stable release: 0.1.20 (in AWB, no update) / 2006-07-26
Type: Typofix built into AutoWikiBrowser
License: GPL
Website: sourceforge.net/.../regextypofix

RegExTypoFix (Regular Expression Typographical error Fixer, or RETF) is a set of over 1600 regular expressions used to automatically fix common typos and misspellings. It is built into AutoWikiBrowser. Anyone who can use AutoWikiBrowser can use RegExTypoFix. It is also easily ported into any application that supports regular expression strings.

The lofty goal of RETF is to be completely automatic, that is, to reach 100% accuracy. Some day it may be built into programs that want basic spellchecking without user input.

RETF is compiled entirely by hand: I spellcheck articles and talk pages with Microsoft Word, see which typos are the most common, and add those. I can add about 35 new words an hour. Example of how it works:
find="\b(D|d)issapoin(t|ts|ted|ting|tment|tments)\b" replacewith="$1isappoin$2" />
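To illustrate how a rule like the one above behaves, here is a minimal Python sketch. It is not AWB's own code: the pattern is copied from the example, the replacement is translated from the `$1`/`$2` group syntax into Python's `\g<1>`/`\g<2>`, and the function name `fix_typos` is made up for this illustration.

```python
import re

# Pattern copied from the example rule above.
PATTERN = re.compile(r"\b(D|d)issapoin(t|ts|ted|ting|tment|tments)\b")

# "$1isappoin$2" rewritten in Python's group-reference syntax.
# Group 1 keeps the original capitalisation, group 2 keeps the word ending.
REPLACEMENT = r"\g<1>isappoin\g<2>"


def fix_typos(text: str) -> str:
    """Apply the single illustrative rule to a piece of text (hypothetical helper)."""
    return PATTERN.sub(REPLACEMENT, text)


if __name__ == "__main__":
    print(fix_typos("Dissapointed fans felt dissapointment."))
```

The word boundaries (`\b`) keep the rule from firing inside longer words, and capturing the first letter means one rule covers both the capitalised and lower-case forms.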

The only interaction needed is to review the change and hit save. You can do other stuff while it loads the page in the background.

Using RETF

RegExTypoFix is updated often, sometimes very often. AWB can't be updated every day, so I supply supplemental settings files with my latest typos. Every time AWB is updated you just throw that XML settings file away until I release my next update. Using the latest version means the most fixed words and the fewest errors (that is, zero).

(1) Download supplemental updates

Once you have AutoWikiBrowser (requires permission to use) you can download the latest version of RegExTypoFix here

Extract Typos.xml from the zip file using either the zip extraction tool provided with XP or 7zip.

Under the new system that Martin (User:Bluemoose) developed, Typos.xml is separate from the settings file. This way it can be easily swapped out and doesn't interfere with your own personal find and replace settings. Put it in the AWB directory. It is loaded automatically when you start AWB.

In addition, each Typos.xml file comes coded with a version number. Perhaps in some future release AWB will be able to check to see if there's an update for you!

(2) Monitor for new versions between AWB releases (highly recommended)

To ensure you have the latest version please choose whichever option you deem appropriate.

  1. Put ---NEW REGEXTYPOFIX VERSION--- on your watchlist
  2. Get an email alert every time a new official release comes out (requires a free Sourceforge.net account)
  3. BEST: Get the version I'm using that second with TortoiseSVN.
    1. Install TortoiseSVN
    2. Right click in an empty folder > SVN Checkout
    3. Type in https://svn.sourceforge.net/svnroot/regextypofix/, and hit ok.
    4. Right click in the folder > SVN Update whenever you want the latest version and change notes.

(3) Get started

  1. Start AutoWikiBrowser
  2. More Options tab > Enable RegExTypoFix
  3. More Options tab > Skip article when no typo fixed
  4. Start tab > Summary box dropdown list > Select ]
  5. Find a misspelling you want to fix
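The "skip article when no typo fixed" behaviour in the steps above can be sketched roughly as follows. This is an assumption about the general idea, not AWB's implementation: the two rules are hypothetical stand-ins written for this example (they are not taken from Typos.xml), and `apply_rules` and `should_skip` are invented names.

```python
import re

# Hypothetical rules for illustration only; the real list lives in Typos.xml.
RULES = [
    (re.compile(r"\b(N|n)oteabl(e|y)\b"), r"\g<1>otabl\g<2>"),
    (re.compile(r"\b(F|f)lourescen(t|ce)\b"), r"\g<1>luorescen\g<2>"),
]


def apply_rules(text):
    """Run every rule over the text and return (new_text, number_of_fixes)."""
    total = 0
    for pattern, replacement in RULES:
        # subn returns the rewritten text and how many substitutions were made.
        text, count = pattern.subn(replacement, text)
        total += count
    return text, total


def should_skip(text):
    """An article with no fixes has nothing to review, so it can be skipped."""
    _, fixes = apply_rules(text)
    return fixes == 0


if __name__ == "__main__":
    print(apply_rules("A noteable flourescent lamp."))
    print(should_skip("Nothing wrong here."))
```

Counting substitutions rather than comparing strings also gives a number that could be shown to the reviewer before they hit save.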

Communicate

If you have fewer than 50 articles in your AWB queue you're not aiming high enough.
--mboverload@

Sign up for the spam list

A weekly newsletter describing that week's changes and other comments

Talk with other users on IRC

We share an IRC channel with AutoWikiBrowser at chat.freenode.net - #AutoWikiBrowser. If you don't have an IRC client I suggest mIRC.

Have problems/suggestions/a word you want to be included?

Lines that need to be removed

Things that need a tweaking/fixing

  • Under review - Humurou(s|sly|sness) -> Humourou(s|sly|sness); correct spelling is "Humorous..." --Guinnog 10:50, 5 August 2006 (UTC)
  • I took it that we reached a consensus there; at least I hope so, as I made the change in my local copy of the coding and enacted the edits! --Guinnog 23:03, 6 August 2006 (UTC)
  • What the heck is going on in this edit? See how between -> between (2)? It's been happening to me a bunch. alphaChimp 20:38, 6 August 2006 (UTC)
  • Yeah, here's another one --Guinnog 20:46, 6 August 2006 (UTC)
    It seems to me like an AWB error that is fixed in the new release (today). alphaChimp 21:16, 6 August 2006 (UTC)

Misspellings to be added

Please see RegExTypoFix/rejectedwords and the full list of fixed misspellings before you suggest a word. Thanks!

  • Holding - strech / streching / streched -> stretch- --Guinnog 14:08, 31 July 2006 (UTC)
  • weakday --> weekday (if you already have this I'm sorry) alphaChimp 06:25, 3 August 2006 (UTC)
  • pronounciation -> pronunciation. Former has lots of g-hits. Outriggr 23:43, 4 August 2006 (UTC)
  • Calvanism -> Calvinism. Do you do proper nouns? Outriggr 23:43, 4 August 2006 (UTC)
  • Publically -> publicly --Guinnog 10:54, 5 August 2006 (UTC)

Misspellings added because of user input

  • Added in 0.1.20 - repond and variants -> respond --Guinnog 14:23, 31 July 2006 (UTC)
  • Added in 0.1.10 - flourescent -> fluorescent --Guinnog 10:56, 31 July 2006 (UTC)
  • Added in 0.1.10 - milennium -> millennium --Guinnog 01:29, 31 July 2006 (UTC)
  • Added in 0.1.05 - flourine -> fluorine --Guinnog
  • Added in 0.1 - "noteable" to "notable" —Mets501 (talk) 21:13, 26 July 2006 (UTC)