User talk:Rich Smith: Difference between revisions

Revision as of 12:52, 4 November 2021 by Rich Smith (talk | contribs; Account creators, Extended confirmed users, Page movers, IP block exemptions, New page reviewers, Pending changes reviewers, Rollbackers; 32,373 edits), minor edit: Reverted edits by Officialanshumantiwari (talk) to last version by MediaWiki message delivery (Tag: Rollback)
Revision as of 03:54, 5 November 2021 by NaomiAmethyst (talk | contribs; Edit filter managers, Extended confirmed users, Rollbackers, Template editors; 6,269 edits): ClueBot NG on SqWiki: Reply.

Revision as of 03:54, 5 November 2021


This user talk page is watched by friendly talk page watchers which means that someone other than me might reply to your query. Their input is welcome and their help with messages that I cannot reply to quickly, or to facilitate communication when it’s faltering, is appreciated.
Stop: Have I Declined Your Draft? If I have declined your draft, I do not normally review the same draft twice. If you think you have fixed the issues outlined in the decline message, then please resubmit the draft for another reviewer to look at. Do NOT leave me a message here.
Some cookies to welcome you to my Talk Page!!
It is around 6:25 AM where this user lives in Poole. (1735449927 in Unix Time)






This page is archived by ClueBot III.

Tech News: 2021-43

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Recent changes

Changes later this week

  • Recurrent item: The new version of MediaWiki will be on test wikis and MediaWiki.org from 26 October. It will be on non-Wikipedia wikis and some Wikipedias from 27 October. It will be on all wikis from 28 October (calendar).

Future changes

  • Diff pages will have an improved copy and pasting experience. The changes will allow the text in the diff for before and after to be treated as separate columns and will remove any unwanted syntax.
  • The version of the Liberation fonts used in SVG files will be upgraded. Only new thumbnails will be affected. Liberation Sans Narrow will not change.

Meetings

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

20:07, 25 October 2021 (UTC)

GWB.

Hi Rich! On the GWB page I do not see the changes in the box on the right side of the page. It is the box that gives a brief description of the bridge. It still says October 24th. How do I make those changes? Thank you! - Preceding unsigned comment added by Maincable (talk • contribs) 23:49, 25 October 2021 (UTC)

@Maincable: That would be as part of the 'Infobox bridge' template, the bits you are looking for are begin= and open=. Also, remember to sign your messages on talk pages using 4 ~ signs - Rich 14:34, 26 October 2021 (UTC)

Articles you might like to edit, from SuggestBot

Note: All columns in this table are sortable, allowing you to rearrange the table so the articles most interesting to you are shown at the top. All images have mouse-over popups with more information. For more information about the columns and categories, please consult the documentation and please get in touch on SuggestBot's talk page with any questions you might have.

Views/Day Quality Title Tagged with…
6 Quality: High, Assessed class: NA, Predicted class: GA SlamTV! (talk) Add sources
64 Quality: High, Assessed class: C, Predicted class: GA Mad Man Pondo (talk) Add sources
18 Quality: Medium, Assessed class: Stub, Predicted class: C Sainik School, Gopalganj (talk) Add sources
6 Quality: Medium, Assessed class: NA, Predicted class: C Full Time Hobby (talk) Add sources
3,252 Quality: Medium, Assessed class: B, Predicted class: B Partition of India (talk) Add sources
20 Quality: Medium, Assessed class: Start, Predicted class: B Immaculate Heart Academy (talk) Add sources
525 Quality: Medium, Assessed class: Start, Predicted class: C Chapati (talk) Cleanup
21 Quality: High, Assessed class: Start, Predicted class: GA Charles E. Smith Jewish Day School (talk) Cleanup
35 Quality: Medium, Assessed class: Stub, Predicted class: C Auchi Polytechnic (talk) Cleanup
261 Quality: Medium, Assessed class: C, Predicted class: B Interoception (talk) Expand
42 Quality: Medium, Assessed class: Start, Predicted class: C Joint Polar Satellite System (talk) Expand
1,124 Quality: High, Assessed class: Start, Predicted class: GA Keith Lee (wrestler) (talk) Expand
4 Quality: Medium, Assessed class: Start, Predicted class: C Adam Neylon (talk) Unencyclopaedic
21 Quality: Low, Assessed class: Start, Predicted class: Start Garrison Forest School (talk) Unencyclopaedic
191 Quality: Medium, Assessed class: NA, Predicted class: C Parexel (talk) Unencyclopaedic
83 Quality: High, Assessed class: C, Predicted class: FA Explorers Program (talk) Merge
33 Quality: Low, Assessed class: NA, Predicted class: Start Tuwana (talk) Merge
24 Quality: Medium, Assessed class: C, Predicted class: B History and use of the single transferable vote (talk) Merge
73 Quality: Medium, Assessed class: Stub, Predicted class: C Zubaz (talk) Wikify
10 Quality: Medium, Assessed class: C, Predicted class: C Chiapas Highlands (talk) Wikify
150 Quality: High, Assessed class: C, Predicted class: GA Education in Ghana (talk) Wikify
2 Quality: Low, Assessed class: NA, Predicted class: Start Special Sensor Ultraviolet Limb Imager (talk) Orphan
33 Quality: Medium, Assessed class: B, Predicted class: B Eating disorders and memory (talk) Orphan
19 Quality: Medium, Assessed class: Start, Predicted class: C President T (talk) Orphan
2 Quality: Low, Assessed class: Stub, Predicted class: Start Kosmos 37 (talk) Stub
2 Quality: Low, Assessed class: Stub, Predicted class: Start Kosmos 104 (talk) Stub
4 Quality: Low, Assessed class: Stub, Predicted class: Start Assyriska Föreningen i Norrköping (talk) Stub
6 Quality: Low, Assessed class: Stub, Predicted class: Stub The Bride of Glomdal (talk) Stub
2 Quality: Low, Assessed class: Stub, Predicted class: Start Kosmos 15 (talk) Stub
2 Quality: Low, Assessed class: Stub, Predicted class: Start Kosmos 197 (talk) Stub

SuggestBot picks articles in a number of ways based on other articles you've edited, including straight text similarity, following wikilinks, and matching your editing patterns against those of other Wikipedians. It tries to recommend only articles that other Wikipedians have marked as needing work. We appreciate that you have signed up to receive suggestions regularly; your contributions make Wikipedia better. Thanks for helping!

If you have feedback on how to make SuggestBot better, please let us know on SuggestBot's talk page. -- SuggestBot (talk) 12:53, 28 October 2021 (UTC)

The Signpost: 31 October 2021

* Read this Signpost in full * Single-page * Unsubscribe * MediaWiki message delivery (talk) 20:15, 31 October 2021 (UTC)

ClueBot NG on SqWiki

Hey Rich!

I'm a crat from SqWiki. Recently a user showed me ClueBot NG when I asked him for advice on fighting vandalism. Would it be possible to make ClueBot NG work on wikis other than EnWiki? We (and I believe a lot of other wikis as well) would be really grateful to benefit from it if that were possible. - Klein Muçi (talk) 00:49, 1 November 2021 (UTC)

@Klein Muçi: it can, however it needs a lot of training data. Pinging @DamianZaremba: to see if he can provide more input on what is required - Rich 07:06, 1 November 2021 (UTC)
Yeah, I understand that because I read how it worked. I was thinking to maybe keep it on a kind of a "simulation" mode while it learned (maybe just don't give it the bot flag yet?) and later unleash it in full power. - Klein Muçi (talk) 11:32, 1 November 2021 (UTC)
I don't think it quite works like that, the bot flag is irrelevant. @Cobi: could maybe assist as well? - Rich 11:33, 1 November 2021 (UTC)
At the very least, the bot needs several tens of thousands of randomly sampled main-space edits categorized as good or bad to even have a chance of being reasonably accurate, but ideally more. I also do not speak Albanian, so I couldn't reasonably offer support for false positives or anything like that. The bot itself is open source, and most of the tooling should be in the repo.
It seems that DamianZaremba's been reworking some of the training tooling, but the original training tooling is mostly here. It's a bit of a mess since it is mostly a snapshot of some of our working directories back when we were originally training the bot. The basic idea was there was a MySQL database called EditDB, and it had a table called editset.
Tools like editClassificationToEditDB.php took data in on stdin in the form of "123456 V" or "234567 C" to mark revid 123456 as vandalism and revid 234567 as constructive. Tools like generateXML.php would then emit XML suitable for training the bot's core from the edits in the EditDB. Tools like autodatasetgen.go were built to find other ways of generating classifications like by checking if someone reverted real-world edits. This was not as effective as the smaller (but still large) hand-curated datasets.
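(Editorial illustration, not part of the original comment.) The stdin format described above can be sketched in a few lines. This is a minimal Python sketch of reading "revid V" / "revid C" lines only; it is not the code of editClassificationToEditDB.php, it does not touch the EditDB, and the function name `parse_classifications` and the `LABELS` mapping are hypothetical names chosen for this example.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical mapping from the one-letter codes quoted in the thread
# to readable labels: V = vandalism, C = constructive.
LABELS = {"V": "vandalism", "C": "constructive"}


def parse_classifications(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (revid, label) pairs from lines such as '123456 V'."""
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdecimal() or parts[1] not in LABELS:
            raise ValueError(f"line {lineno}: expected '<revid> V|C', got {raw!r}")
        yield int(parts[0]), LABELS[parts[1]]


if __name__ == "__main__":
    import sys

    # Read classifications from stdin, one per line, and echo them back.
    for revid, label in parse_classifications(sys.stdin):
        print(revid, label)
```

Feeding it the two example lines from the comment yields (123456, "vandalism") and (234567, "constructive"); storing those pairs in a database table would be a separate step.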
Finally, after using generateXML.php to generate train.xml, trial.xml, and bayestrain.xml in the editsets directory (we used limit clauses to split the files, with 0-16000 in bayestrain.xml, 16000-60000 in train.xml, and the rest in trial.xml), we then ran trainandtrial.sh to train the bot and then get metrics on the efficacy of the bot. There are also tools like autotraintrial.php, which attempts to explore reasonable ANN parameters. Those parameters are stored in localtoolconfig and are what we believe to be reasonable values for training datasets between 50,000 and 100,000 edits.
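(Editorial illustration, not part of the original comment.) The three-way split quoted above can be sketched as plain slicing. This is a Python sketch under stated assumptions: the original used SQL limit clauses against the EditDB rather than in-memory lists, the row ordering behind those limit clauses is not given in the thread, and `split_editset` is a hypothetical helper name. Only the boundaries 16000 and 60000 and the three file names come from the comment.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")

# Boundaries quoted in the thread: rows 0-16000 go to bayestrain.xml,
# rows 16000-60000 to train.xml, and everything after that to trial.xml.
BAYES_END = 16_000
TRAIN_END = 60_000


def split_editset(
    edits: Sequence[T],
    bayes_end: int = BAYES_END,
    train_end: int = TRAIN_END,
) -> Dict[str, List[T]]:
    """Split an ordered sequence of edits into three disjoint groups."""
    if not 0 <= bayes_end <= train_end:
        raise ValueError("expected 0 <= bayes_end <= train_end")
    return {
        "bayestrain.xml": list(edits[:bayes_end]),
        "train.xml": list(edits[bayes_end:train_end]),
        "trial.xml": list(edits[train_end:]),
    }


if __name__ == "__main__":
    # Stand-in data: 70,000 fake revids instead of real classified edits.
    parts = split_editset(range(70_000))
    for name, rows in parts.items():
        print(name, len(rows))
```

With 70,000 rows this gives 16,000 edits for the Bayesian training file, 44,000 for the main training file, and 10,000 held out for the trial run. If the edits are not already in random order, shuffling them before the split would be a sensible extra step, since the thread stresses randomly sampled edits.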
If any of that made some sort of sense, you may wish to give it a go. If not, maybe find a bot dev on SqWiki that has time and desire to curate and run a SqWiki version? -- Cobi 03:54, 5 November 2021 (UTC)

Tech News: 2021-44

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Recent changes

  • There is a limit on the amount of emails a user can send each day. This limit is now global instead of per-wiki. This change is to prevent abuse.

Changes later this week

  • Recurrent item: The new version of MediaWiki will be on test wikis and MediaWiki.org from 2 November. It will be on non-Wikipedia wikis and some Wikipedias from 3 November. It will be on all wikis from 4 November (calendar).

Tech news prepared by Tech News writers and posted by bot • Contribute • Translate • Get help • Give feedback • Subscribe or unsubscribe.

20:27, 1 November 2021 (UTC)