Revision as of 03:54, 5 November 2021
This user talk page is watched by friendly talk page watchers which means that someone other than me might reply to your query. Their input is welcome and their help with messages that I cannot reply to quickly, or to facilitate communication when it’s faltering, is appreciated.
This page is archived by ClueBot III.
Tech News: 2021-43
Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.
Recent changes
- The Coolest Tool Award 2021 is looking for nominations. You can recommend tools until 27 October.
Changes later this week
- The new version of MediaWiki will be on test wikis and MediaWiki.org from 26 October. It will be on non-Wikipedia wikis and some Wikipedias from 27 October. It will be on all wikis from 28 October (calendar).
Future changes
- Diff pages will have an improved copy and pasting experience. The changes will allow the text in the diff for before and after to be treated as separate columns and will remove any unwanted syntax.
- The version of the Liberation fonts used in SVG files will be upgraded. Only new thumbnails will be affected. Liberation Sans Narrow will not change.
Meetings
- You can join a meeting about the Community Wishlist Survey. News about the disambiguation and the real-time preview wishes will be shown. The event will take place on Wednesday, 27 October at 14:30 UTC. See how to join.
Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.
20:07, 25 October 2021 (UTC)
GWB.
Hi Rich! On the GWB page I do not see the changes in the box on the right side of the page. It is the box that gives a brief description of the bridge. It still says October 24th. How do I make Those changes? Thank you! — Preceding unsigned comment added by Maincable (talk • contribs) 23:49, 25 October 2021 (UTC)
- @Maincable: That would be as part of the 'Infobox bridge' template, the bits you are looking for are begin= and open=. Also, remember to sign your messages on talk pages using 4 ~ signs - Rich 14:34, 26 October 2021 (UTC)
Articles you might like to edit, from SuggestBot
Note: All columns in this table are sortable, allowing you to rearrange the table so the articles most interesting to you are shown at the top. All images have mouse-over popups with more information. For more information about the columns and categories, please consult the documentation and please get in touch on SuggestBot's talk page with any questions you might have.
SuggestBot picks articles in a number of ways based on other articles you've edited, including straight text similarity, following wikilinks, and matching your editing patterns against those of other Wikipedians. It tries to recommend only articles that other Wikipedians have marked as needing work. We appreciate that you have signed up to receive suggestions regularly; your contributions make Wikipedia better. Thanks for helping!
If you have feedback on how to make SuggestBot better, please let us know on SuggestBot's talk page. -- SuggestBot (talk) 12:53, 28 October 2021 (UTC)
The Signpost: 31 October 2021
- From the editor: Different stories, same place
- News and notes: The sockpuppet who ran for adminship and almost succeeded
- Discussion report: Editors brainstorm and propose changes to the Requests for adminship process
- Recent research: Welcome messages fail to improve newbie retention
- Community view: Reflections on the Chinese Wikipedia
- Traffic report: James Bond and the Giant Squid Game
- Technology report: Wikimedia Toolhub, winners of the Coolest Tool Award, and more
- Serendipity: How Wikipedia helped create a Serbian stamp
- Book review: Wikipedia and the Representation of Reality
- WikiProject report: Redirection
- Humour: A very Wiki crossword
ClueBot NG on SqWiki
Hey Rich!
I'm a crat from SqWiki. Recently a user showed me ClueBot NG when I asked him for advice on fighting vandalism. Would it be possible to make ClueBot NG work on wikis other than EnWiki? We (and I believe a lot of other wikis as well) would be really grateful to benefit from it if that were possible. - Klein Muçi (talk) 00:49, 1 November 2021 (UTC)
- @Klein Muçi: It can; however, it needs a lot of training data. Pinging @DamianZaremba: to see if he can provide more input on what is required - Rich 07:06, 1 November 2021 (UTC)
- Yeah, I understand that because I read how it worked. I was thinking of maybe keeping it in a kind of "simulation" mode while it learned (maybe just don't give it the bot flag yet?) and later unleashing it at full power. - Klein Muçi (talk) 11:32, 1 November 2021 (UTC)
- I don't think it quite works like that, the bot flag is irrelevant. @Cobi: could maybe assist as well? - Rich 11:33, 1 November 2021 (UTC)
- At the very least, the bot needs several tens of thousands of randomly sampled main-space edits categorized as good or bad to even have a chance of being reasonably accurate, and ideally more. I also do not speak Albanian, so I couldn't reasonably offer support for false positives or anything like that. The bot itself is open source, and most of the tooling should be in the repo.
- It seems that DamianZaremba's been reworking some of the training tooling, but the original training tooling is mostly here. It's a bit of a mess since it is mostly a snapshot of some of our working directories back when we were originally training the bot. The basic idea was there was a MySQL database called EditDB, and it had a table called editset.
- Tools like editClassificationToEditDB.php took data in on stdin in the form of "123456 V" or "234567 C" to mark revid 123456 as vandalism and revid 234567 as constructive. Tools like generateXML.php would then emit XML suitable for training the bot's core from the edits in the EditDB. Tools like autodatasetgen.go were built to find other ways of generating classifications, such as checking whether real-world edits were later reverted. This was not as effective as the smaller (but still large) hand-curated datasets.
- Finally, after using generateXML.php to generate train.xml, trial.xml, and bayestrain.xml in the editsets directory (we used limit clauses to split the files, with 0-16000 in bayestrain.xml, 16000-60000 in train.xml, and the rest in trial.xml), we ran trainandtrial.sh to train the bot and then get metrics on its efficacy. There are also tools like autotraintrial.php, which attempts to explore reasonable ANN parameters. Those parameters are stored in localtoolconfig, and the values there are what we believe to be reasonable for training datasets of between 50,000 and 100,000 edits.
- If any of that made some sort of sense, you may wish to give it a go. If not, maybe find a bot dev on SqWiki that has time and desire to curate and run a SqWiki version? -- Cobi 03:54, 5 November 2021 (UTC)
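For anyone trying the workflow described in this thread, the two data-handling steps (reading "revid V/C" classification lines, then splitting the labelled edits into the three files) can be sketched in Python. This is only an illustration, not the actual ClueBot NG tooling, which is PHP, Go and shell in the repo. The function names `parse_classifications` and `split_editset` are hypothetical, and reading "0-16000" and "16000-60000" as row ranges over an ordered list of edits is an assumption about how the limit clauses were applied.

```python
from typing import Iterable

# Label codes as described in the thread: V = vandalism, C = constructive.
LABELS = {"V": "vandalism", "C": "constructive"}


def parse_classifications(lines: Iterable[str]) -> dict[int, str]:
    """Parse lines such as "123456 V" into a {revid: label} mapping.

    Hypothetical helper; it only mirrors the input format quoted above.
    """
    result: dict[int, str] = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdecimal() or parts[1] not in LABELS:
            raise ValueError(f"line {number}: expected '<revid> V|C', got {line!r}")
        result[int(parts[0])] = LABELS[parts[1]]
    return result


def split_editset(edits: list, bayes_end: int = 16000,
                  train_end: int = 60000) -> dict[str, list]:
    """Split an ordered list of edits into the three sets named in the thread.

    Assumption: rows [0, 16000) go to bayestrain, [16000, 60000) to train,
    and everything after that to trial.
    """
    return {
        "bayestrain": edits[:bayes_end],
        "train": edits[bayes_end:train_end],
        "trial": edits[train_end:],
    }


if __name__ == "__main__":
    labels = parse_classifications(["123456 V", "234567 C"])
    print(labels)  # {123456: 'vandalism', 234567: 'constructive'}
```

With fewer than 60,000 labelled edits the trial set comes out empty under these offsets, so a smaller wiki would probably need to scale the boundaries down rather than reuse them as-is.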
Tech News: 2021-44
Recent changes
- There is a limit on the number of emails a user can send each day. This limit is now global instead of per-wiki. This change is to prevent abuse.
Changes later this week
- The new version of MediaWiki will be on test wikis and MediaWiki.org from 2 November. It will be on non-Wikipedia wikis and some Wikipedias from 3 November. It will be on all wikis from 4 November (calendar).
20:27, 1 November 2021 (UTC)