Revision as of 23:48, 21 November 2010
Team
- Christopher Breneman — Crispy1989 (talk · contribs) — wrote and maintains the core engine and core configuration.
- Cobi Carter — Cobi (talk · contribs) — wrote and maintains the Misplaced Pages interface code and dataset review interface.
- Tim — Tim1357 (talk · contribs) — wrote the original dataset downloader code and scripts to generate portions of the original dataset.
Questions, comments, contributions, and suggestions regarding:
- the core engine, algorithms, and configuration should be directed to Crispy1989 (talk · contribs).
- the bot's interface to Misplaced Pages and dataset review interface should be directed to Cobi (talk · contribs).
- the bot's original dataset should be directed to Tim1357 (talk · contribs).
Dataset Review Interface
For the bot to be effective, the dataset needs to be expanded. Our current dataset has some degree of bias, as well as some inaccuracies. We need volunteers to help review edits and classify them as either vandalism or constructive. We hope to eventually completely replace our current dataset with a random sampling of edits, reviewed and classified by volunteers. A list of current contributors, more thorough instructions on how to use the interface, and the interface itself, are at the dataset review interface.
Statistics
Because Cluebot-NG requires a dataset to function, that same dataset can also be used to produce fairly accurate statistics on its accuracy and operation. Different parts of the dataset are used for training and for trialing, so these statistics are not biased.
The exact statistics change and improve frequently as we update the bot. Currently:
- Selecting a threshold to optimize total accuracy, the bot correctly classifies over 90% of edits.
- Selecting a threshold to hold false positives at a maximal rate of 0.25%, the bot catches approximately 55% of all vandalism.
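The two figures above can be computed directly from a held-out trial set of scored edits. The sketch below shows how, assuming edits are represented as (score, is-vandalism) pairs; the data and function names are illustrative, not ClueBot-NG's actual code.

```python
# Sketch: computing total accuracy and vandalism catch rate on a trial
# set at a given threshold. Toy data only - not ClueBot-NG's real scores.

def accuracy(scores, labels, threshold):
    """Fraction of all edits classified correctly at the given threshold."""
    correct = sum((s >= threshold) == is_vandalism
                  for s, is_vandalism in zip(scores, labels))
    return correct / len(scores)

def catch_rate(scores, labels, threshold):
    """Fraction of vandalism edits that score at or above the threshold."""
    vandal = [s for s, v in zip(scores, labels) if v]
    return sum(s >= threshold for s in vandal) / len(vandal)

# Toy trial set: (classifier score, actually vandalism?)
trial = [(0.95, True), (0.80, True), (0.40, True), (0.10, False),
         (0.05, False), (0.30, False), (0.85, True), (0.20, False)]
scores = [s for s, _ in trial]
labels = [v for _, v in trial]

print(accuracy(scores, labels, 0.5))    # 0.875 on this toy data
print(catch_rate(scores, labels, 0.5))  # 0.75 on this toy data
```

Raising the threshold trades catch rate for fewer false positives, which is why the two bullet points above quote different numbers for different threshold choices.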
Development News/Status
Core Engine
- Current version is working well.
- Currently writing a dedicated wiki markup parser for more accurate markup-context-specific metrics (no existing parser is both complete and fast enough).
Dataset Review Interface
- Code to import edits into database is finished.
- Currently changing logic that determines the end result for an edit.
Dataset Status
- We found that the Python dataset downloader we used to generate the training dataset does not generate data that is identical to the live downloader. It's possible that this is greatly reducing the effectiveness of the live bot. We're working on writing shared code for live downloading and dataset generation so we can regenerate the dataset.
- This has been fixed and the bot retrained. It's now working much better.
- Currently getting more data from the review interface.
Languages
- C / C++ — The core is written in C/C++ from scratch.
- PHP — The bot shell (Misplaced Pages interface) is written in PHP, and shares some code with the original ClueBot.
- Java — The dataset review interface is written in Java using the Google App framework.
- Bash — A few scripts to make it easier to train and maintain the bot are Bash scripts.
- Python — Some of the original dataset management and downloader tools were written in Python.
Information About False Positives
False positives with Cluebot-NG are essentially inevitable. For the bot to be effective at catching a great deal of vandalism, a few constructive (or at least well-intentioned) edits are inevitably caught as well. False positives are rare, but they do happen: about two out of every thousand edits reviewed are misclassified as vandalism. If one of your edits is incorrectly identified as vandalism, simply redo your edit, remove the warning from your talk page, and, if you wish, report the false positive. Cluebot-NG is not sentient - it is an automated robot, and if it incorrectly reverts your edit, that does not mean your edit is bad, or even substandard - it is just a random error in the bot's classification, much as email spam filters sometimes incorrectly classify legitimate messages as spam.
False positives are necessary because of how the bot works. It uses an internal algorithm called an artificial neural network, which generates a probability that a given edit is vandalism. This probability is usually close to correct, but can sometimes differ significantly from what it should be. Whether an edit is classified as vandalism is determined by applying a threshold to this probability. The higher the threshold, the fewer the false positives, but also the less vandalism caught. A threshold is selected by fixing a false positive rate (the percentage of constructive edits incorrectly classified as vandalism), currently set at 0.25%, and maximizing the amount of vandalism caught at that rate. This means there will always be some false positives, at around 0.25% of constructive edits.
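The threshold-selection step described above can be sketched as follows. This is a minimal illustration under assumed toy data, not the bot's actual implementation: among candidate thresholds, it keeps those whose false-positive rate on a trial set stays at or below the target (0.25% in the live bot) and picks the one that catches the most vandalism.

```python
# Hedged sketch of threshold selection: find the lowest threshold whose
# false-positive rate on a trial set stays within the target, since a
# lower threshold catches more vandalism. Names and data are illustrative.

def pick_threshold(scores, labels, max_fp_rate):
    """Return the lowest candidate threshold whose FP rate <= max_fp_rate,
    or None if no candidate qualifies."""
    constructive = [s for s, v in zip(scores, labels) if not v]
    n = len(constructive)
    # Candidate thresholds: every observed score, ascending.
    for t in sorted(set(scores)):
        fp = sum(s >= t for s in constructive) / n
        if fp <= max_fp_rate:
            # First (lowest) qualifying threshold maximizes vandalism caught.
            return t
    return None

scores = [0.95, 0.80, 0.40, 0.10, 0.05, 0.30, 0.85, 0.20]
labels = [True, True, True, False, False, False, True, False]
print(pick_threshold(scores, labels, max_fp_rate=0.0))
# 0.4 on this toy data: all vandalism caught with zero false positives
```

In practice the trade-off is rarely this clean - on real data, holding the false-positive rate at 0.25% forces the threshold high enough that some vandalism slips through, which is exactly the roughly 55% catch rate quoted in the statistics above.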
When false positives occur, they are not necessarily poor-quality edits, and there may not even be an apparent reason for the error. If you report the false positive, the bot maintainers will examine it, try to determine why the error occurred, and, if possible, improve the bot's accuracy on similar edits in the future. While this will not eliminate false positives, it may reduce the number of good-quality edits among them. Also, if the bot's accuracy improves enough that the false positive rate can be lowered without a significant drop in the vandalism catch rate, we may be able to reduce the overall number of false positives.
If you want to help significantly improve the bot's accuracy, you can make a difference by contributing to the review interface. This should help us more accurately determine a threshold, catch more vandalism, and eventually, reduce false positives.