User:ClueBot NG/Documentation


This is an old revision of this page, as edited by 98.222.57.24 (talk) at 10:21, 14 November 2010 (Nobody is looking at the config, so people can ask individually for it. Also, dataset review interface instructions have been moved to the interface itself.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


Team

Christopher Breneman — Crispy1989 (talk · contribs) — wrote and maintains the core engine and core configuration.
Cobi Carter — Cobi (talk · contribs) — wrote and maintains the Misplaced Pages interface code and dataset review interface.
Tim — Tim1357 (talk · contribs) — wrote some of the dataset generation code and maintains the training dataset.

Questions, comments, contributions, and suggestions regarding:

the core engine, algorithms, and configuration should be directed to Crispy1989 (talk · contribs).
the bot's operation, whitelists, and interface to Misplaced Pages should be directed to Cobi (talk · contribs).
the bot's dataset should be directed to Tim1357 (talk · contribs).

Languages

C / C++ — The core is written in C/C++ from scratch.
PHP — The bot shell (Misplaced Pages interface) is written in PHP, and shares some code with the original ClueBot.
Python — Some of the dataset management tools are written in Python.
Bash — A few scripts to make it easier to train and maintain the bot are Bash scripts.
Java — The dataset review interface is written in Java using the Google App framework.

Statistics

Because ClueBot NG requires a dataset to function, that dataset can also be used to produce fairly accurate statistics on the bot's accuracy and operation. Separate parts of the dataset are used for training and for trials, so these statistics are not biased by edits the bot has already seen.
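
The separation between training and trial data can be sketched roughly as below. This is a minimal illustration and not the bot's actual dataset tooling: the function name `split_dataset`, the (edit_id, is_vandalism) record shape, and the 30% trial fraction are all assumptions made for the example.

```python
import random

def split_dataset(edits, trial_fraction=0.3, seed=0):
    """Split labelled edits into disjoint training and trial sets.

    `edits` is a list of (edit_id, is_vandalism) pairs. The record shape
    and the default 30% trial fraction are hypothetical.
    """
    shuffled = list(edits)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is repeatable
    n_trial = int(len(shuffled) * trial_fraction)
    trial = shuffled[:n_trial]
    training = shuffled[n_trial:]
    return training, trial
```

Because no edit appears in both sets, accuracy measured on the trial set reflects how the classifier handles edits it was never trained on.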

The exact statistics change and improve frequently as we update the bot. Currently:

With a threshold selected to optimize total accuracy, the bot correctly classifies over 90% of edits.
With a threshold selected to hold the false positive rate at or below 0.25%, the bot catches approximately 63% of all vandalism.
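
The two threshold choices above can be sketched as follows. This is an illustration under stated assumptions, not the core engine's code: it assumes each trial edit receives a numeric vandalism score, that an edit is flagged when its score is at or above the threshold, and that the false positive rate is the fraction of non-vandalism edits that get flagged. The function names are hypothetical.

```python
def rates_at(scored, threshold):
    """Return (accuracy, false_positive_rate, catch_rate) at a threshold.

    `scored` is a list of (score, is_vandalism) pairs from the trial set.
    An edit is flagged as vandalism when score >= threshold (assumed rule).
    """
    tp = fp = tn = fn = 0
    for score, is_vandalism in scored:
        flagged = score >= threshold
        if flagged and is_vandalism:
            tp += 1
        elif flagged:
            fp += 1
        elif is_vandalism:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(scored)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    catch_rate = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, fpr, catch_rate

def candidate_thresholds(scored):
    # Every observed score, plus infinity for "flag nothing".
    return sorted({score for score, _ in scored}) + [float("inf")]

def best_accuracy_threshold(scored):
    """Threshold that maximizes total accuracy on the trial set."""
    return max(candidate_thresholds(scored),
               key=lambda t: rates_at(scored, t)[0])

def threshold_for_max_fpr(scored, max_fpr):
    """Lowest threshold whose false positive rate is at or below max_fpr.

    The false positive rate never rises as the threshold rises, so the
    lowest qualifying threshold also catches the most vandalism.
    """
    for t in candidate_thresholds(scored):
        if rates_at(scored, t)[1] <= max_fpr:
            return t
```

Calling `threshold_for_max_fpr(scored, 0.0025)` on a scored trial set and then reading the third value of `rates_at` at that threshold would give the kind of catch rate quoted above for a 0.25% false positive limit.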