Revision as of 21:56, 8 June 2015 by Doc James (→No updates to /rc lately) | Revision as of 17:41, 21 June 2015 by Mann jess (→Eranbot, positive found in ref tag: fix)
(3 intermediate revisions by the same user not shown)
Line 256:
Okay, I have split the nearly one-million-byte page into 7 subpages. Eran, how do we get the "javascript" to work on those 7 subpages? Doc James (talk · contribs · email) 21:55, 8 June 2015 (UTC)
::By the way, how often does the bot die, and therefore how often does it need restarting? It might be good to have a few of us able to restart the bot. Doc James (talk · contribs · email) 21:56, 8 June 2015 (UTC)
== Eranbot, positive found in ref tag ==
Hi. was recently tagged by Eranbot. I'm not too familiar with the bot, but it appears that it is supposed to skip checking citations, yet it caught this addition of a quote inside a ref tag. If the bot is ''not'' intended to skip citations, it might be worth ignoring content added to the "quote" parameter in citation templates (for example: {{tq|<nowiki>{{cite web|url=...|quote=Ignore this}}</nowiki>}}). Such content is always going to be picked up by the bot, but probably should not be. I skimmed the source, and it seems to run a few regexes before submitting to Turnitin. It should be possible to remove the quote with something simple like /quote=*/i; let me know if I can be of any help. Thanks. — Mann jess 17:39, 21 June 2015 (UTC)
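Read literally, /quote=*/i matches the letters "quote" followed by zero or more "=" signs, so it would remove only the parameter name and leave the quoted text behind. A minimal sketch of a pattern that drops the whole |quote= parameter instead, assuming the value contains no nested templates or pipe characters (the function name is hypothetical, not part of the bot):

```javascript
// Removes "|quote=..." (any spacing around "quote" and "=") up to the next
// "|" or "}". Assumes the quote value has no nested templates or pipes.
const QUOTE_PARAM = /\|\s*quote\s*=[^|}]*/gi;

function stripQuoteParams(wikitext) {
  return wikitext.replace(QUOTE_PARAM, "");
}

const sample = "Text.<ref>{{cite web|url=http://example.org|quote=Copied sentence here.}}</ref>";
console.log(stripQuoteParams(sample));
// prints: Text.<ref>{{cite web|url=http://example.org}}</ref>
```

Parameters such as |trans-quote= or |quote-page= are left alone, because the pattern requires the pipe to be followed directly by "quote" and then "=".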
Autocompletion
Hi. I'm wondering how I would be able to test the script you suggested at the link above. I recently stumbled upon the gadget proposal page and saw your suggestion. I tried to import it into my personal js, but was unable to get it to work. Any ideas? Killiondude (talk) 08:31, 20 January 2012 (UTC)
- Hi, you should try to replace it with the following code:
- (see example in User:ערן/common.js).
- By the way, importScript is deprecated and mw.loader is the new convention. The old way should still work, but since this is an external script, use importScriptURI instead of importScript. Eran (talk) 10:33, 20 January 2012 (UTC)
- Oh, thanks! I'll try it now. I've used importScriptURI before, I didn't think to use it on this occasion. As you can see, I'm not particularly apt at coding. :-) Killiondude (talk) 17:51, 20 January 2012 (UTC)
- Works great! Definitely useful. Thanks again! Killiondude (talk) 17:56, 20 January 2012 (UTC)
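For reference, a minimal sketch of the mw.loader approach mentioned above, for a user's common.js. The URL is a hypothetical placeholder (the code Eran actually suggested is not preserved here), and the loader is passed in as a parameter only so the call can be exercised outside MediaWiki:

```javascript
// Hypothetical placeholder URL; substitute the real protocol-relative
// address of the external script.
var SCRIPT_URL = "//example.org/w/index.php?title=User:Example/autocomplete.js&action=raw&ctype=text/javascript";

// The loader is a parameter so the call can be tested outside MediaWiki;
// on-wiki it is simply mw.loader.
function loadExternalScript(loader, url) {
  // mw.loader.load() accepts a full or protocol-relative URL and adds it
  // to the page as a script, replacing the older importScriptURI(url).
  loader.load(url);
}

// Only runs where the MediaWiki "mw" global exists (i.e. on-wiki).
if (typeof mw !== "undefined" && mw.loader) {
  loadExternalScript(mw.loader, SCRIPT_URL);
}
```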
Invitation to events in June and July: bot, script, template, and Gadget makers wanted
I invite you to the yearly Berlin hackathon. It's 1-3 June and registration is now open. If you need financial assistance or help with visa or hotel, just mention it in the registration form.
This is the premier event for the MediaWiki and Wikimedia technical community. We'll be hacking, designing, and socialising, primarily talking about ResourceLoader and Gadgets (extending functionality with JavaScript), the switch to Lua for templates, Wikidata, and Wikimedia Labs.
Our goals for the event are to bring 100-150 people together, including lots of people who have not attended such events before. User scripts, gadgets, API use, Toolserver, Wikimedia Labs, mobile, structured data, templates -- if you are into any of these things, we want you to come!
I also thought you might want to know about other upcoming events where you can learn more about MediaWiki customization and development, how to best use the web API for bots, and various upcoming features and changes. We'd love to have power users, bot maintainers and writers, and template makers at these events so we can all learn from each other and chat about what needs doing.
Check out the developers' days preceding Wikimania in July in Washington, DC and our other events.
Best wishes! - Sumana Harihareswara, Wikimedia Foundation's Volunteer Development Coordinator. Please reply on my talk page, here or at mediawiki.org. Sumana Harihareswara, Wikimedia Foundation Volunteer Development Coordinator 14:51, 2 April 2012 (UTC)
autocomplete.js needs protocol-relative URL
- Copied from User talk:ערן/autocomplete.js by me, the original poster, as recommended by Rjd0060 (talk · contribs)
The URL should be made protocol-relative, that is: mw.loader.load('//bits.wikimedia.org...
in order to avoid mixed content in the event that someone is using the secure server without HTTPS Everywhere. Your common.js appears already to contain the proper protocol-relative usage. --SoledadKabocha (talk) 21:17, 11 October 2012 (UTC)
- Done Eran (talk) 07:03, 12 October 2012 (UTC)
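A small sketch of the protocol-relative rewrite requested above; the helper name and the example URL are hypothetical:

```javascript
// Strips a leading "http:" or "https:" so the browser uses the page's own
// scheme; URLs that are already protocol-relative are returned unchanged.
function toProtocolRelative(url) {
  return url.replace(/^https?:(?=\/\/)/i, "");
}

console.log(toProtocolRelative("http://example.org/autocomplete.js"));
// prints: //example.org/autocomplete.js
```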
How would I configure autocomplete to work for other namespaces?
I use your Autocomplete script as a gadget on the All The Tropes wiki on the Orain wikifarm service and am quite pleased with it, but I was wondering how it could be configured to work in namespaces other than the main namespace, like our "Forum:" namespace?
Any help in fixing this would be appreciated. GethN7 (talk) 03:21, 20 May 2014 (UTC)
- Small update, it seems that it works fine in the namespace itself, but does not work on LQT when in the edit window. If you can provide a patch for this, it would be appreciated. GethN7 (talk) 04:44, 20 May 2014 (UTC)
- Hi GethN7, the script is designed mainly for editing articles, and less for "discussion" pages, though it is very similar. I am not familiar with LQT, so I can't say why it doesn't work there (maybe the opensearch API isn't configured to search in this namespace?) Eran (talk) 21:23, 23 May 2014 (UTC)
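If the cause is that opensearch only searches the main namespace, one thing to try is passing the namespace parameter explicitly. A sketch that builds such a request, assuming a hypothetical api.php location and a hypothetical namespace ID of 110 for "Forum:" (a wiki's real IDs can be listed with action=query&meta=siteinfo&siprop=namespaces):

```javascript
// Builds an action=opensearch request limited to the given namespace IDs.
// apiBase and the namespace IDs used below are hypothetical examples.
function buildOpenSearchUrl(apiBase, term, namespaceIds) {
  const params = new URLSearchParams({
    action: "opensearch",
    search: term,
    namespace: namespaceIds.join("|"), // multiple namespaces are pipe-separated
    limit: "10",
    format: "json",
  });
  return apiBase + "?" + params.toString();
}

// Main namespace (0) plus a hypothetical "Forum:" namespace with ID 110.
const url = buildOpenSearchUrl("https://example.org/w/api.php", "Troll", [0, 110]);
console.log(url);
```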
User:Eran/refToolbarVe.js
Hi, I've been trying to get refToolbarVe working as an example of making a dialog. (I realize it's been supplanted by in-house methods). I thought maybe it had name conflicts with in-house cite dialog but couldn't get it working on localhost after a refactor either; can you confirm that it is still working? Also any tips on how to debug "dialog just doesn't show up" problems would be helpful as I've had the same problem with ones I've tried to write from scratch as well... :). Thanks for all your work on the VE gadget tutorials! Mvolz (talk) 09:10, 29 June 2014 (UTC)
- Mvolz, thank you and I'm glad tutorials are helpful.
- I think there was a change in the VE API, and we didn't fix it for that specific gadget since there is already built-in support. To fix it:
- change dialogFactory=>windowFactory
- change 'dialog' => 'window'.
- You can also take a look in User:ערן/veReplace.js which should work.
- Another possible cause of dialogs not opening correctly is the wrong order of the inheritance declarations (it is important to put the inheritClass call before the actual implementation). Eran (talk) 18:11, 29 June 2014 (UTC)
- Hi Eran,
- I removed that script from my common.js file just now, and a lot of problems with opening VisualEditor just cleared up, too. WhatamIdoing (talk) 22:52, 30 June 2014 (UTC)
- There was an API break in VE (method names changed) which we weren't aware of, and it seems to break VE when it loads plugins that weren't updated for these changes. I have changed it now so it won't cause exceptions. Eran (talk) 04:29, 1 July 2014 (UTC)
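To illustrate why the inheritClass call has to come first: OO.inheritClass replaces the subclass's prototype with a new object derived from the parent's prototype, so anything assigned to the old prototype beforehand does not survive. The sketch uses a simplified, hypothetical stand-in for OO.inheritClass so it runs without OOjs; the dialog classes are made up for the example:

```javascript
// Simplified, hypothetical stand-in for OO.inheritClass. Like the real one,
// it replaces targetFn.prototype, so anything assigned to the old
// prototype before the call is lost.
function inheritClass(targetFn, originFn) {
  targetFn.super = originFn;
  targetFn.prototype = Object.create(originFn.prototype, {
    constructor: { value: targetFn, enumerable: false, writable: true, configurable: true },
  });
}

function BaseDialog() {}
BaseDialog.prototype.getTitle = function () {
  return "base";
};

// Correct order: inherit first, then add or override methods.
function GoodDialog() {
  GoodDialog.super.call(this);
}
inheritClass(GoodDialog, BaseDialog);
GoodDialog.prototype.getTitle = function () {
  return "good";
};

// Wrong order: this override is discarded when inheritClass runs.
function BadDialog() {
  BadDialog.super.call(this);
}
BadDialog.prototype.getTitle = function () {
  return "bad";
};
inheritClass(BadDialog, BaseDialog);

console.log(new GoodDialog().getTitle()); // prints: good
console.log(new BadDialog().getTitle()); // prints: base
```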
Core ve changes that will affect veReplace.js
Just a heads up in case you didn't know, but there were some VE core changes that will affect veReplace (notably, the class ve.ui.Dialog has been replaced, so try FragmentDialog, and the dialog constructor will need to take an additional argument, manager). The changes are being deployed this Thursday at 11am PST on en-wiki; I don't know when on he-wiki! Mvolz (talk) 17:07, 21 July 2014 (UTC)
- Mvolz, thanks for the notice :) Eran (talk) 17:25, 21 July 2014 (UTC)
Wikipedia:Manual of Style/Words to watch/Config
Can I help develop it? --Gabrielchihonglee (talk) 10:52, 7 August 2014 (UTC)
- Hi Gabrielchihonglee, I would like to have more people involved. Are you at the hackathon? I'm sitting in the corner (near the coffee table ;) ). Eran (talk) 14:32, 7 August 2014 (UTC)
- I am sorry that I am in Hong Kong, I am just an online volunteer of Wikimedia2014. --Gabrielchihonglee (talk) 00:45, 8 August 2014 (UTC)
- Hi Gabrielchihonglee, I would really love to have the list of common words to avoid in languages other than English (such as Chinese), so it would be possible to use this tool in other languages. Another important improvement would be to expand the list with more words based on the Manual of Style. I already created a similar list for Hebrew (I don't edit English Wikipedia much, so I didn't add many words in English).
- Once the list is created in another language, it can easily be adopted there with the following code:
/* Load clippy for VE editing */
mw.config.set('WEASLE_WORD_PAGE', 'PAGENAME'); // replace PAGENAME with the page name in the local language
$.getScript('//en.wikipedia.org/w/index.php?title=User:%D7%A2%D7%A8%D7%9F/WeaselWords.js&action=raw&ctype=text/javascript');
- I have a similar but different tool for wikitext editing that warns users about badly styled sentences. Once such a list is available in other languages, it will be possible to create more tools to improve user editing, not only the clippy ;) Thanks, Eran (talk) 19:29, 8 August 2014 (UTC)
- 1) I think I can help with translating the list of common words to avoid to zh-hant, zh-hans and zh-yue.
- 2) Can you talk a little bit more about creating other tools? Thanks!--Gabrielchihonglee (talk) 00:59, 9 August 2014 (UTC)
- Great.
- In hewiki there is a tool that offers various options to improve an article and works in wikitext editing: it warns about the use of words to avoid and also helps to locate them (plus other suggestions, such as replacing fair-use images with free ones and replacing disambiguation links with the correct link). This tool, and similar tools in other languages, use a hard-coded "dictionary"; extracting that dictionary from them should make them work in other languages. I'll give a link to these other tools here later on. Eran (talk) 06:32, 9 August 2014 (UTC)
- Where is the list? And where should I place my translate?
- I will study it after I get the link.--Gabrielchihonglee (talk) 07:59, 9 August 2014 (UTC)
- The list can be created in the Wikipedia namespace, with some intro, then "-----", and then the list itself in the format "*WORD TO WATCH//DESCRIPTION", similar to the English list.
- Other similar tools are he:MediaWiki:Gadget-Checkty and ru:MediaWiki:Gadget-wfTypos; both work in wikitext. Both the Russian and the Hebrew gadgets automatically replace "safe" common typos, and the Hebrew one also shows warnings/suggestions for the "words to watch", which aren't safe or possible to fix automatically. Eran (talk) 09:24, 9 August 2014 (UTC)
- Just did the translation into Chinese. See Wikipedia:避免使用的字詞/配置.
- Are you planning to develop other tools? If yes, just tell me and I may help. :)--14.198.2.193 (talk) 13:06, 9 August 2014 (UTC)
- Awesome!! Did you post about this at the Village pump so users know it exists and can use it?
- Sure. I'm going to refactor the hewiki tool to be more generic and use this external list instead of hard coded list, and then users would be able to use this list of "words to watch" also in the classical editor.
- Many thanks, Eran (talk) 13:17, 9 August 2014 (UTC)
- I'm sorry, I don't know Hebrew. Are there any new tools in en? --Gabrielchihonglee (talk) 14:18, 10 August 2014 (UTC)
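A minimal sketch of reading the list format Eran describes (intro, a "-----" line, then "*WORD TO WATCH//DESCRIPTION" entries) and flagging entries that occur in a piece of text. The function names are hypothetical, and plain substring matching is a simplification, chosen because word boundaries do not work for languages such as Chinese:

```javascript
// Parse "intro / ----- / *WORD//DESCRIPTION" config text into entries.
// Treating the first line starting with "-----" as the separator is an assumption.
function parseWordsToWatch(pageText) {
  const lines = pageText.split(/\r?\n/);
  const sep = lines.findIndex((line) => line.trim().startsWith("-----"));
  if (sep === -1) return [];
  const entries = [];
  for (const line of lines.slice(sep + 1)) {
    // "*word//description"; spaces around "//" are tolerated
    const m = line.match(/^\*\s*(.+?)\s*\/\/\s*(.*)$/);
    if (m) entries.push({ word: m[1], description: m[2].trim() });
  }
  return entries;
}

// Return the entries whose word occurs in the text (case-insensitive
// substring match, which also works for languages written without spaces).
function findWordsToWatch(text, entries) {
  const lower = text.toLowerCase();
  return entries.filter((e) => lower.includes(e.word.toLowerCase()));
}

const config = "Intro text\n-----\n*legendary//Puffery\n* world-class // Peacock term";
const hits = findWordsToWatch("A legendary band.", parseWordsToWatch(config));
console.log(hits.map((e) => e.word)); // prints: [ 'legendary' ]
```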
A barnstar for you!
The Original Barnstar
For the great work you have done on the copy and paste detection bot. Doc James (talk · contribs · email) (if I write on your page reply on mine) 01:39, 22 August 2014 (UTC)
BAGBot: Your bot request EranBot
Someone has marked Wikipedia:Bots/Requests for approval/EranBot as needing your input. Please visit that page to reply to the requests. Thanks! AnomieBOT⚡ 03:59, 22 August 2014 (UTC) To opt out of these notifications, place {{bots|optout=operatorassistanceneeded}} anywhere on this page.
EranBot
- moved from User talk:EranBot 12:02, 22 August 2014 (UTC)
Jmh649 is this your bot account? If not, who is the operator? Do you plan to seek bot approval? — xaosflux 01:07, 22 August 2014 (UTC)
- Yes, I plan to seek bot approval, and no, this is not my bot (I have no idea how to write bots). Just got back from Wikimania and still catching up. Doc James (talk · contribs · email) (if I write on your page reply on mine) 01:13, 22 August 2014 (UTC)
- This account says "There is NO plan for this bot to made edits to mainspace.", I've removed autopatrolled as there should be no impact to NPP. — xaosflux 02:00, 22 August 2014 (UTC)
- Hi Xaosflux, I requested the autopatrolled right to avoid CAPTCHAs in edits, which prevent the bot from running automatically. According to Special:UserGroupRights, autopatrolled is the weakest right that includes skipcaptcha for a new account. In any case, once the bot finishes its trial period and is added to the bot group, there will be no need for it. Thanks, Eran (talk) 07:56, 22 August 2014 (UTC)
- Have given it back. Doc James (talk · contribs · email) (if I write on your page reply on mine) 09:00, 22 August 2014 (UTC)
- OK, of course the 'bot' flag should take care of this long term. — xaosflux 12:33, 22 August 2014 (UTC)
Attention Bot Operator @ערן:: Please seek your bot trial approval at WP:RFBOT. — xaosflux 02:07, 22 August 2014 (UTC)
- Have added it here Wikipedia:Bots/Requests_for_approval#Current_requests_for_approval. Doc James (talk · contribs · email) (if I write on your page reply on mine) 02:47, 22 August 2014 (UTC)
- See bot request, trial in userspace is good. — xaosflux 12:37, 22 August 2014 (UTC)
- Please see the thread at User_talk:Jmh649#User:EranBot_account_flags, if you have discovered a bug I am interested. — xaosflux 04:27, 24 August 2014 (UTC)
Discussion of improvements
Hey Eran. Have started going over the diffs here User:EranBot/Copyright. Wondering about a few adjustments as explained. The bot did pick up one positive and have had a chance to educate a user. :-) Doc James (talk · contribs · email) (if I write on your page reply on mine) 04:45, 23 August 2014 (UTC)
- Also wrote some comments here Wikipedia:MED/Copyright Doc James (talk · contribs · email) (if I write on your page reply on mine) 06:30, 23 August 2014 (UTC)
- Hi Doc James, I started to go over the first few edits (from the bottom). It did find a possible copyvio in Temporomandibular joint dysfunction; I reverted that edit and notified the user. In any case, I added a new column to the table, "status", so editors who go over the list can record the status of each edit (TP/FP), and then we can fine-tune the bot to improve its precision. cheers, Eran (talk) 07:39, 23 August 2014 (UTC)
- Excellent. It should be fairly easy to exclude edits that are reverts, correct, as they are tagged as such in the edit summary? Doc James (talk · contribs · email) (if I write on your page reply on mine) 08:02, 23 August 2014 (UTC)
- Looks like you have already fixed this :-) Doc James (talk · contribs · email) (if I write on your page reply on mine) 08:02, 23 August 2014 (UTC)
- Could it give the user talk link for the editors, to save time when contacting them? Wiki CRUK John (talk) 19:47, 23 August 2014 (UTC)
- Done (will be added in next update of the page). Eran (talk) 20:54, 23 August 2014 (UTC)
- Could it also link the article history, not just one edit? Revert wars keep cropping up, especially when mirrored. LeadSongDog come howl! 17:27, 8 September 2014 (UTC)
- Edits that just add reference citations should not trigger a report. Ignoring anything in ref tags or in citation templates would help, for instance. LeadSongDog come howl! 17:30, 8 September 2014 (UTC)
- The bot already removes citations (and templates), ref tags and so on, but indeed it still catches small edits that just add a reference. I guess the reference "breaks" the paragraph, and then the bot "thinks" the editor removed one paragraph and replaced it with two (with similar content). I will take a further look into it in the near future. Thank you for your helpful comments, Eran (talk) 21:17, 8 September 2014 (UTC)
- By removing ref tags, does that mean it just ignores the ref between those tags? LeadSongDog come howl! 02:15, 11 September 2014 (UTC)
- Yes (e.g. <ref>citation....</ref>and not<ref>citation....</ref>). Eran (talk) 16:30, 11 September 2014 (UTC)
- Great, so if it does hit on the remainder, we can expect those to be naked citations, which require action of a different sort anyhow, perhaps tagging with {{Citation cleanup|section}} or just {{full}}. I guess that can be treated as a bonus value for the bot! LeadSongDog come howl! 18:18, 11 September 2014 (UTC)
- Also, are named refs caught too, such as <ref name=George>citingprofessorblather</ref>? LeadSongDog come howl! 18:23, 12 September 2014 (UTC)
- Are various forms of quote markup all caught, such as "blahblah", 'blahblah', {{quotation|blahblah*}}, <blockquote>blahblah</blockquote> and others? LeadSongDog come howl! 18:21, 12 September 2014 (UTC)
- HTML tags are removed, but quotes aren't. I will update the bot so it posts the text it sends somewhere on wmflabs, which will be useful for debugging and for suggestions. Thanks, Eran (talk) 17:46, 15 September 2014 (UTC)
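A rough sketch of the cleanup discussed in this thread: removing ref contents (including named and self-closing refs), templates, and HTML tags, while keeping the text inside tags such as <blockquote>, which matches Eran's note that quotes are not removed. It is regex-based and approximate, and it is not the bot's actual code:

```javascript
// Approximate, regex-based cleanup (not the bot's actual code).
function stripMarkup(wikitext) {
  let text = wikitext;
  // <ref>...</ref> and named refs such as <ref name=George>...</ref>
  text = text.replace(/<ref\b[^>\/]*>[\s\S]*?<\/ref\s*>/gi, "");
  // Self-closing reuse of a named ref, e.g. <ref name="XX"/>
  text = text.replace(/<ref\b[^>]*\/>/gi, "");
  // Templates: remove innermost {{...}} repeatedly so nested ones go too
  let previous;
  do {
    previous = text;
    text = text.replace(/\{\{[^{}]*\}\}/g, "");
  } while (text !== previous);
  // Remaining HTML tags (e.g. <blockquote>); the text inside them is kept
  return text.replace(/<[^>]+>/g, "");
}

console.log(stripMarkup("Fact.<ref name=George>citingprofessorblather</ref> More."));
// prints: Fact. More.
```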
Leave out links labeled "connection error"
These all appear to be exceedingly poor-quality sources. Can we leave them out? Doc James (talk · contribs · email) (if I write on your page reply on mine) 22:04, 23 August 2014 (UTC)
- For listing sites to skip, does this format work: User:EranBot/Copyright/Blacklist? Doc James (talk · contribs · email) (if I write on your page reply on mine) 22:15, 23 August 2014 (UTC)
- For now I added it manually. Eran (talk) 04:12, 24 August 2014 (UTC)
Hidden text
Wondering if we could add text to the line that goes with the status, such as I did here
Once again many thanks for your excellent work. Doc James (talk · contribs · email) (if I write on your page reply on mine) 01:22, 24 August 2014 (UTC)
- Doc James, yes, that is OK. Using <ref name="XX"/> to link to a follow-up comment (with <references><ref name="XX">COMMENT</ref></references> placed later) should be OK too. But it should be consistent, in the sense that the first word should be TP or FP. Eran (talk) 04:18, 24 August 2014 (UTC)
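A sketch of how the status column could be summarised once reviewers fill it in, using the rule that only the first word (TP or FP) counts, and computing precision as TP / (TP + FP). The function names are hypothetical:

```javascript
// Only the first word of a status cell counts: "TP" or "FP" (case-insensitive).
// Anything else, including an empty cell, means "not reviewed yet".
function classifyStatus(cell) {
  const m = cell.trim().match(/^(TP|FP)\b/i);
  return m ? m[1].toUpperCase() : null;
}

// Precision = TP / (TP + FP) over reviewed rows; null if nothing is reviewed.
function precision(statusCells) {
  let tp = 0;
  let fp = 0;
  for (const cell of statusCells) {
    const status = classifyStatus(cell);
    if (status === "TP") tp++;
    else if (status === "FP") fp++;
  }
  return tp + fp === 0 ? null : tp / (tp + fp);
}

console.log(precision(["TP", "FP mirror", "", "TP <ref name=\"XX\"/>"])); // prints: 0.6666666666666666
```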
Bot runs
How many times is the bot run per day? Doc James (talk · contribs · email) (if I write on your page reply on mine) 01:06, 25 August 2014 (UTC)
- Doc James, every 3 hours. Eran (talk) 04:13, 25 August 2014 (UTC)
- Thanks. Doc James (talk · contribs · email) (if I write on your page reply on mine) 05:29, 26 August 2014 (UTC)
"connection error"
The sources with the label "connection error" here are useless. Can we leave them out? Doc James (talk · contribs · email) (if I write on your page reply on mine) 05:29, 26 August 2014 (UTC)
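A minimal sketch of dropping such entries before the report is written. The entry shape ({ url, label }) is a hypothetical stand-in; the bot's real data structure is not shown on this page:

```javascript
// Hypothetical entry shape: { url, label }. Drops entries whose label
// mentions "connection error" (case-insensitive); unlabelled entries are kept.
function dropConnectionErrors(entries) {
  return entries.filter((entry) => !/connection error/i.test(entry.label || ""));
}

const report = [
  { url: "http://a.example", label: "connection error" },
  { url: "http://b.example", label: "85% match" },
];
console.log(dropConnectionErrors(report).map((e) => e.url)); // prints: [ 'http://b.example' ]
```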
Wikipedia forks and mirrors
Can the bot check suspected positives against listings in Wikipedia:Mirrors_and_forks subpages? That might improve the FP rate. Many of the forks there have page names that systematically derive from the WP article title. Might as well tilt the Whack-A-Mole game in our favour. LeadSongDog come howl! 17:10, 26 August 2014 (UTC)
- It also seems to be ignoring entries in the blacklist, e.g.
- http://*.isfoundhere.com/ did not prevent a false positive from http://malaria.isfoundhere.com/
- Some guidance on how those entries should be formatted would help. LeadSongDog come howl! 15:23, 28 August 2014 (UTC)
- Hi LeadSongDog (I'm also responding to @Jmh649), it seems that Wikipedia:Mirrors_and_forks is for humans, not for machines ;) There is no consistency in the format of the mirrors & forks list: sometimes it is with nowiki and sometimes without, sometimes the link is to the main page () or another page (), and sometimes it includes the prefix of the site (). It isn't possible to cut the suffix, as sometimes only part of the site is a mirror.
- Regarding the blacklist, I haven't yet written a parser for it (I just added some entries manually). I guess we would like to have something similar to User:CorenSearchBot/exclude so we can reuse it.
- In any case, I made some fixes to the bot so it can rank low-quality sites and hint at mirrors based on their content. If we find these hints reliable enough, we can remove those sites automatically. Eran (talk) 08:02, 30 August 2014 (UTC)
- Great. If you want us to start creating a list like CorenSearchBot/exclude, we can. Would the bot then follow it automatically? Doc James (talk · contribs · email) (if I write on your page reply on mine) 08:35, 30 August 2014 (UTC)
- We haven't yet written a parser for the blacklist, but once we do... :) (tagging @Ladsgroup). Eran (talk) 08:52, 30 August 2014 (UTC)
- A few ideas to consider, though I'm not sure how to implement them: the parser should ignore the http/https/FTP distinction, as many sites serve the same paths over multiple protocols. It could also ignore the top-level domains, for that matter. It should derive or share entries from User:CorenSearchBot/exclude and Wikipedia:Mirrors_and_forks/All. It should be language-independent (via Wikidata?) LeadSongDog come howl! 20:40, 30 August 2014 (UTC)
- OK, now the blacklist should work.
- The blacklist is based on regular expressions, so there are no problems with the http/ftp protocol (unless it is explicitly part of the regex in the blacklist).
- User:CorenSearchBot/exclude: the blacklist has almost the same format, so we can fork it. Unfortunately, Wikipedia:Mirrors_and_forks/All isn't machine-readable in its current state, as there is no consistency in the way URLs and alternative URLs are written.
- Eran (talk) 21:21, 30 August 2014 (UTC)
- Well the vast majority of the urls are preceded by either "URL" or "website". Most of the other info on that list is of limited use for this bot. LeadSongDog come howl! 22:47, 30 August 2014 (UTC)
- I forked User:CorenSearchBot/exclude into the blacklist and reformatted the entries. One way or the other, so far we seem to be catching the bulk (if not all) of the mirrors. Would it be possible for the report entries to include a two- or three-word string that matched? That would make it much quicker for reviewing humans to localize the troublesome part of the text. LeadSongDog come howl! 17:22, 3 September 2014 (UTC)
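One possible explanation for the isfoundhere.com miss, assuming each blacklist entry is tested as a regular expression against the source URL: in a regex, "*" repeats the preceding character, so "http://*." means "http:/", any number of further slashes, then one arbitrary character, which cannot match a subdomain. The sketch shows this and a hypothetical helper that turns wildcard entries into protocol-agnostic regexes:

```javascript
// Used as a regex, this entry means: "http:/", any number of extra "/",
// one arbitrary character, then "isfoundhere". It cannot match a subdomain.
const raw = new RegExp("http://*.isfoundhere.com/");
console.log(raw.test("http://malaria.isfoundhere.com/")); // prints: false

// Hypothetical helper: convert a wildcard entry into a regex that ignores the
// protocol and treats "*" as "any run of characters except '/'".
function wildcardToRegex(entry) {
  const withoutProtocol = entry.replace(/^[a-z]+:\/\//i, "");
  const pattern = withoutProtocol
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\\/]/g, "\\$&")) // escape regex metacharacters
    .join("[^/]*");
  // Optional scheme, then "//", then the converted host/path prefix.
  return new RegExp("^(?:[a-z]+:)?//" + pattern, "i");
}

const re = wildcardToRegex("http://*.isfoundhere.com/");
console.log(re.test("http://malaria.isfoundhere.com/")); // prints: true
console.log(re.test("https://malaria.isfoundhere.com/x")); // prints: true
```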
Still picking up reverts
Such as this edit Doc James (talk · contribs · email) (if I write on your page reply on mine) 01:47, 27 August 2014 (UTC)
- Similarly, editors who reorganize text by cut-save-paste-save (in separate edits) are getting picked up. Perhaps use a diff from the latest version by a different editor vice just the latest version? LeadSongDog come howl! 13:22, 10 September 2014 (UTC)
- That would be more difficult technically. We need to be able to splice out blocks of text. Moving text does not count as a revert. Doc James (talk · contribs · email) (if I write on your page reply on mine) 13:25, 10 September 2014 (UTC)
- Isn't it just a question of which version to diff from? Even just comparing against yesterday's version would seem better than arbitrarily using the "last" one. LeadSongDog come howl! 02:10, 11 September 2014 (UTC)
- We just sorted out an issue where a copyvio by an indef'd copypaster that had been removed was restored by a good-faith edit by Formerly98. The existence of mirrors could make this kind of thing **very** confusing. I'm thinking that it would be useful to (eventually) have the bot apply something like wikiblame to help determine the first edition of the suspect text in the article. This might, of course, be too resource-intensive to be practical. LeadSongDog come howl! 17:10, 18 September 2014 (UTC)
- In some sense it is feasible - the bot can add a link to search in wikiblame (but only a small fraction of the added text) :) Eran (talk) 17:27, 18 September 2014 (UTC)
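A sketch of LeadSongDog's suggestion (diffing against the latest revision by a different user), together with a rough edit-summary check for reverts. It assumes revisions are listed newest first, as the API's prop=revisions returns them by default; the function names are hypothetical, and revert summaries vary by wiki and tool, so the pattern is only an approximation:

```javascript
// Revisions are assumed to be newest first: [{ id, user }, ...].
// Returns the most recent older revision by someone other than the latest
// editor, or null if there is none.
function pickBaseRevision(revisions) {
  const [latest, ...older] = revisions;
  if (!latest) return null;
  return older.find((rev) => rev.user !== latest.user) || null;
}

// Rough revert check on the edit summary. English-language rollback/undo
// summaries typically begin "Reverted ..." or "Undid revision ...", and
// editors often write "rv"; other wikis and tools use other wording.
function looksLikeRevert(summary) {
  return /^(reverted|undid revision|rvv?)\b/i.test((summary || "").trim());
}

const history = [
  { id: 4, user: "A" }, // paste
  { id: 3, user: "A" }, // cut
  { id: 2, user: "B" },
];
console.log(pickBaseRevision(history).id); // prints: 2
```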
EranBot suggestion
Also pinging User:Jmh649. Would it be useful to have a bot (perhaps written by someone else) that builds a list of Wikipedia mirrors, to remove them from EranBot's false positives? I was thinking this might be worth requesting at WP:BOTREQ, though it's not really an internal bot function at all. I think it could be done by taking some characteristic text from multiple pages and looking for domains that come up with positive matches on a high percentage of them. Mike Christie (talk - contribs - library) 01:58, 12 September 2014 (UTC)
- We have a list of 2000 mirrors. They just need to be converted into machine readable format. User:Ocaasi has details. Doc James (talk · contribs · email) (if I write on your page reply on mine) 07:10, 12 September 2014 (UTC)
- The bot looks for characteristic text of mirrors, or standard attribution to Wikipedia, and if it finds it, it adds a "Mirror" suggestion for the blacklist (this is just a suggestion and isn't added to the blacklist automatically). However, there are many sites which copy Wikipedia content without attribution (which, by the way, is a copyright violation on their side). Eran (talk) 07:15, 12 September 2014 (UTC)
- I understand. I was just thinking that the process of identifying both full-fledged mirrors and sites with substantial copyvios is something that might usefully be subcontracted, so to speak, to another bot. It could generate lists of mirrors in machine readable format, and keep the list up to date as new ones are found. Mike Christie (talk - contribs - library) 14:24, 12 September 2014 (UTC)
I compiled a list of mirrors from the WMF plagiarism study last year, which may have some additional sites that aren't on the bigger list. These are the ones that were coming up frequently in Grammarly's plagiarism matching system (already in the form of regex, too): User:Sage (Wiki Ed)/mirrors.--Sage (Wiki Ed) (talk) 19:55, 12 September 2014 (UTC)
- Great, we can fork it to the blacklist and avoid checking them. I think some mirrors are actually "grey list" mirrors: sometimes they have their own content and don't always mirror Wikipedia. I'm not sure how we should handle them. Eran (talk) 17:59, 15 September 2014 (UTC)
- I went ahead and incorporated them; if the FPs don't come down, we're nowhere. Changed the regex form to match. Thanks, Sage! LeadSongDog come howl! 18:58, 16 September 2014 (UTC)
- I've added the Wikipedia:Mirrors_and_forks entries too, for a total of 1563. Let's hope the grey list mirrors are not too much of an issue. I think it's better than rejecting good faith edits for the wrong reason. LeadSongDog come howl! 18:57, 17 September 2014 (UTC)
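A sketch of the "characteristic text" idea described above: flag a fetched source as a suggested mirror when it carries a typical Wikipedia attribution. The phrases are illustrative, not the bot's actual list, and a hit is only a suggestion for a human, since grey-list sites mix copied and original content:

```javascript
// Illustrative attribution phrases; not the bot's actual list.
const ATTRIBUTION_PATTERNS = [
  /from wikipedia, the free encyclopedia/i,
  /this article uses material from (?:the )?wikipedia/i,
];

// True if the page text carries a typical Wikipedia attribution, i.e. the
// site is a candidate for the blacklist (to be confirmed by a human).
function suggestMirror(pageText) {
  return ATTRIBUTION_PATTERNS.some((re) => re.test(pageText));
}

console.log(suggestMirror("Malaria - From Wikipedia, the free encyclopedia")); // prints: true
console.log(suggestMirror("Malaria is a mosquito-borne infectious disease.")); // prints: false
```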
Still trapping on added refs
This edit only added a ref. Why was that caught, if the ref tagged content is supposedly removed? LeadSongDog come howl! 15:51, 18 September 2014 (UTC)
- A little tougher case at , where the refs are not wrapped in tags. LeadSongDog come howl! 03:06, 22 September 2014 (UTC)
- Antimicrobial - the change itself isn't a copyvio, but since it changes a word and adds a reference, it is treated as a paragraph change, and the whole paragraph was sent for checking. Maybe it would be possible to check the diff against the content again to keep such changes from appearing here, as the actual change is just one word.
- In general, copyright also protects collections of facts, so copying a full (long) list of bibliographic references may be considered a copyright violation, though of course that is not the case here. I'm not sure how we should handle such edits. Maybe the diff size threshold should be higher? Eran (talk) 17:48, 22 September 2014 (UTC)
- Not sure I get your point. The diff should be the only part going for checking, at least for now. The balance of the text has been in the article for years. Wasting human effort on chasing mirrors is rather pointless, as there will always be more of them. It is especially wasteful since at least half of the mirrors are readily identifiable by the string "Wikipedia", which they contain. LeadSongDog come howl! 22:30, 22 September 2014 (UTC)
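A minimal sketch of removing ref-tagged content before text is sent for checking, assuming plain <ref>...</ref> and self-closing <ref name=... /> markup. The function name is hypothetical, this is not EranBot's code, and it would not catch the case above where references are not wrapped in tags.

```python
import re

# Self-closing reuse of a named reference: <ref name="x" />
SELF_CLOSING_REF = re.compile(r"<ref\b[^>]*/>", re.IGNORECASE)
# Paired reference; non-greedy so adjacent refs are removed one at a time.
PAIRED_REF = re.compile(r"<ref\b[^>]*>.*?</ref\s*>", re.IGNORECASE | re.DOTALL)
# Closing tags left behind by malformed nesting such as <ref><ref>...</ref></ref>
STRAY_CLOSE = re.compile(r"</ref\s*>", re.IGNORECASE)


def strip_refs(wikitext: str) -> str:
    """Remove <ref> content so only article prose is sent for checking."""
    text = SELF_CLOSING_REF.sub("", wikitext)  # must run before PAIRED_REF
    text = PAIRED_REF.sub("", text)
    return STRAY_CLOSE.sub("", text)
```

Self-closing refs are removed first; otherwise the paired pattern could start at a `<ref ... />` and swallow prose up to the next `</ref>`.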
LeadSongDog, Doc James: Notice the 2 following changes in the bot:
- report link - added links to iThenticate for comparing the diffs (already appearing in the last runs)
- diff is now in sequence resolution instead of line resolution (for the next bot runs): e.g. instead of comparing line by line, the diff is compared char by char. For example, the diff in is the addition of a single sentence ("Voluntary counseling... test positive"), instead of the replacement of the whole paragraph "Programs encouraging..." by a new paragraph with similar text plus the sentence.
This is definitely the case in the example, and I think in general it should be better, but please let me know if you see weird diffs or wrong behavior. This change also affects the number of diffs being sent to the iThenticate service: the example diff above will not be sent to iThenticate, since it is too small a diff (only 163 characters). Thanks, Eran (talk) 13:10, 25 September 2014 (UTC)
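The switch from line to character resolution can be illustrated with Python's difflib: compute character-level opcodes, keep only the inserted or replaced runs, and skip edits whose added text falls below a size threshold. This is a sketch, not the bot's implementation; MIN_ADDED_CHARS is a made-up value, since only the fact that 163 characters is "too small" is stated above.

```python
from difflib import SequenceMatcher

# Hypothetical cut-off: the real threshold isn't stated, only that a
# 163-character addition fell below it.
MIN_ADDED_CHARS = 200


def added_runs(old: str, new: str) -> list[str]:
    """Character-level diff: the runs of text in `new` that extend or replace `old`."""
    sm = SequenceMatcher(None, old, new, autojunk=False)
    return [new[j1:j2] for tag, _i1, _i2, j1, j2 in sm.get_opcodes()
            if tag in ("insert", "replace")]


def worth_checking(old: str, new: str) -> bool:
    """Only send the edit for checking if enough new text was added."""
    return sum(len(run) for run in added_runs(old, new)) >= MIN_ADDED_CHARS
```

`autojunk=False` matters here: with the default heuristic, frequent characters in long paragraphs can be treated as junk and the diff becomes coarser.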
- The use of javascript in the reports is a bit of an issue, as some firewalls block it. LeadSongDog come howl! 16:56, 25 September 2014 (UTC)
- OK, so I'll also add a simple link for that case. Eran (talk) 17:03, 25 September 2014 (UTC)
- Thank you. It's probably more wp:ACCESSIBLE (e.g. for screen readers) that way too. LeadSongDog come howl! 17:18, 25 September 2014 (UTC)
Simpler idea
Given that many edits actually do provide a URL or citation to the source used (imagine that!!!), it would seem prudent to first check the edit diff against such cited sources before going further afield to look at everything else. Is that an available option? LeadSongDog come howl! 16:56, 25 September 2014 (UTC)
- It may be possible. Can you please provide 1-3 example diffs for which it should say the content is similar to the referenced URL? Eran (talk) 17:27, 25 September 2014 (UTC)
- Recently, cited , cited "<ref><ref>Stanford University microbiologist Nathan Wolfe quoted in National Geographic article on Microbes - Jan 2013 pg 141</ref></ref>" and cited "<ref>{{cite web | url=http://www.womenshealth.gov/publications/our-publications/fact-sheet/hashimoto-disease.html | title=Hashimoto's disease fact sheet | publisher=Office on Women's Health, U.S. Department of Health and Human Services, womenshealth.gov (or girlshealth.gov) | date=July 16, 2012 | accessdate=23 November 2014 | reviewed by=Cooper MD, DS |}}</ref>". Each of these NCP violations was caught, but they were just that, not failures to cite. LeadSongDog come howl! 18:19, 25 September 2014 (UTC)
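A sketch of the "check the cited source first" idea: pull the URLs out of the added wikitext and test whether any of them shares a long verbatim run with the added text. `fetch_text` stands in for whatever would download a page and extract its text (hypothetical, not shown), and MIN_SHARED_RUN is an arbitrary assumption, not a tuned value.

```python
import re
from difflib import SequenceMatcher
from typing import Callable

# Bare or template-parameter URLs, stopping at whitespace or wiki/HTML delimiters.
URL_RE = re.compile(r"https?://[^\s|\]}<]+")

# Hypothetical: length of a verbatim run that counts as copied from the source.
MIN_SHARED_RUN = 80


def cited_urls(added_wikitext: str) -> list[str]:
    """URLs cited in the added text, in order, without duplicates."""
    return list(dict.fromkeys(URL_RE.findall(added_wikitext)))


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest verbatim substring shared by a and b."""
    sm = SequenceMatcher(None, a, b, autojunk=False)
    return sm.find_longest_match(0, len(a), 0, len(b)).size


def matches_cited_source(added_wikitext: str,
                         fetch_text: Callable[[str], str]) -> list[str]:
    """Cited URLs whose text shares a long verbatim run with the added text."""
    return [url for url in cited_urls(added_wikitext)
            if longest_shared_run(added_wikitext, fetch_text(url)) >= MIN_SHARED_RUN]
```

If a cited source already explains the match, the edit could be reported as "copied from its own citation" (still a close-paraphrasing problem, as noted above) without querying the wider index.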
Same text added and removed in diff
The quality of results is really coming along. Nice work so far! Checking out the recent false positives, it seems like aside from mirrors, the edits that cause problems are mainly ones like this one (report). Here, the block of text with a match was modified, but the actual matching text was not the part that was added. It seems like in a case like this where there's an <ins class="diffchange diffchange-inline"> tag inside the ins tag, only the content inside the tags should be checked.--Sage (Wiki Ed) (talk) 20:28, 20 October 2014 (UTC)
- Thank you. It already removes parts of the text that already exist in the previous edit, but this works at the paragraph level. I just added a small tweak to do so for smaller chunks of the text as well. Eran (talk) 22:03, 20 October 2014 (UTC)
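Sage's suggestion of checking only the highlighted inline changes can be sketched with Python's standard HTML parser, assuming diff HTML that marks them with <ins class="diffchange diffchange-inline"> as quoted above. The class and function names are hypothetical; this is not how EranBot is known to work.

```python
from html.parser import HTMLParser


class InlineInsExtractor(HTMLParser):
    """Collect the text of <ins class="... diffchange-inline ..."> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0                # > 0 while inside a matching <ins>
        self._buf: list[str] = []
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "ins":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or "diffchange-inline" in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "ins" and self.depth:
            self.depth -= 1
            if self.depth == 0:       # closed the outermost matching <ins>
                self.chunks.append("".join(self._buf))
                self._buf = []

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


def inline_insertions(diff_html: str) -> list[str]:
    """Return only the inline-changed text from one diff table."""
    parser = InlineInsExtractor()
    parser.feed(diff_html)
    parser.close()
    return parser.chunks
```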
VE gadget
I've left some comments at User talk:ערן/veReplace.js about an upcoming breaking change. ESanders (WMF) (talk) 13:38, 28 November 2014 (UTC)
- Have these changes been made? Doc James (talk · contribs · email) 05:22, 11 December 2014 (UTC)
A barnstar for you!
The Defender of the Wiki Barnstar
For creating Wikipedia's most important bot. One that has kept hundreds if not thousands of pieces of copyright violations out of Wikipedia. Doc James (talk · contribs · email) 01:55, 11 December 2014 (UTC)
BAGBot: Your bot request EranBot 2
Someone has marked Misplaced Pages:Bots/Requests for approval/EranBot 2 as needing your input. Please visit that page to reply to the requests. Thanks! AnomieBOT⚡ 23:38, 18 December 2014 (UTC) To opt out of these notifications, place {{bots|optout=operatorassistanceneeded}} anywhere on this page.
Getting duplicate reports
The bot reported here and then on the next run again here on the same two infractions. LeadSongDog come howl! 16:27, 16 January 2015 (UTC)
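One way to avoid re-reporting the same edit on a later run is to remember which revisions have already been reported. A minimal sketch, assuming each candidate is identified by a (page title, revision id) pair; the set of seen pairs would have to be saved and reloaded between runs (not shown), and none of this is taken from EranBot's code.

```python
def new_reports(candidates: list[tuple[str, int]],
                already_reported: set[tuple[str, int]]) -> list[tuple[str, int]]:
    """Keep only (page, revid) pairs not reported before.

    `already_reported` is updated in place, so duplicates within the same
    batch are also dropped.
    """
    fresh = []
    for page, revid in candidates:
        key = (page, revid)
        if key not in already_reported:
            already_reported.add(key)
            fresh.append(key)
    return fresh
```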
New buttons
I love these new buttons here. Can we begin using the new setup for the medical articles?
Can we switch the buttons such that "green" is no copyvio/FP and "red" is copyvio/TP?
Doc James (talk · contribs · email) 04:20, 1 March 2015 (UTC)
Problem getting reports
I can't get reports any more. It looks like something's gone off with the tool server? Please have a look. LeadSongDog come howl! 13:04, 15 April 2015 (UTC)
- LeadSongDog, it should be fixed. Thanks! Eran (talk) 19:14, 15 April 2015 (UTC)
- That's got it, thank you. LeadSongDog come howl! 21:33, 15 April 2015 (UTC)
No updates to /rc lately
Hi! I noticed that EranBot hasn't updated the global recent changes copyvio page in a while. Is that on hold for the time being?--Sage (Wiki Ed) (talk) 16:50, 8 June 2015 (UTC)
- Hi Sage (Wiki Ed), I just restarted it (and moved all the old reports to User:EranBot/Copyright/rc/archive) Eran (talk) 19:22, 8 June 2015 (UTC)
- Can we have the bot break down the edits by week maybe? When the lists get too large they are hard to load. Doc James (talk · contribs · email) 21:28, 8 June 2015 (UTC)
- Ideally we could set it up such that more load the farther one scrolls down like the "new page patrol". How hard is that to do? Doc James (talk · contribs · email) 21:29, 8 June 2015 (UTC)
Okay, I have split the nearly one-million-byte page into 7 subpages. Eran, how do we get the "javascript" to work on those 7 subpages? Doc James (talk · contribs · email) 21:55, 8 June 2015 (UTC)
- By the way how often does the bot die and therefore how often does it need restarting? Might be good to have a few of us able to restart the bot. Doc James (talk · contribs · email) 21:56, 8 June 2015 (UTC)
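A minimal sketch of the weekly split requested above: bucket reports by ISO week and write each bucket to its own subpage. The base page name comes from the archive link above; the /YYYY-Www suffix and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Base page from the discussion; the "/YYYY-Www" suffix is a hypothetical naming scheme.
BASE = "User:EranBot/Copyright/rc"


def weekly_subpage(day: date) -> str:
    """Subpage for the ISO week containing `day`, e.g. .../2015-W24."""
    year, week, _weekday = day.isocalendar()
    return f"{BASE}/{year}-W{week:02d}"


def group_by_week(reports: list[tuple[date, str]]) -> dict[str, list[str]]:
    """Bucket (date, report_text) pairs by weekly subpage, keeping input order."""
    pages: dict[str, list[str]] = defaultdict(list)
    for day, text in reports:
        pages[weekly_subpage(day)].append(text)
    return dict(pages)
```

ISO weeks run Monday to Sunday, so each subpage stays bounded at one week of reports regardless of month boundaries.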
Eranbot, positive found in ref tag
Hi. This edit was recently tagged by Eranbot. I'm not too familiar with the bot, but it appears that it is supposed to skip checking citations, but it caught this addition of a quote inside a ref tag. If the bot is not intended to skip citations, it might be worth ignoring content added to the "quote" parameter in citation templates (for example: {{cite web|url=...|quote=Ignore this}}). Such content is always going to be picked up by the bot, but probably should not be. I skimmed the source, and it seems to run a few regexes before submitting to Turnitin. It should be possible to remove the quote with something simple like... /quote=*/i Let me know if I can be of any help. Thanks. — Jess· Δ♥ 17:39, 21 June 2015 (UTC)
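As literally written, /quote=*/i matches "quote" followed by zero or more "=" characters, so it would remove only the parameter name and leave the quoted text behind. A pattern that also consumes the value up to the next "|" or "}" might look like this Python sketch. It assumes the quote contains no nested templates, {{!}} or stray "}", and the function name is hypothetical.

```python
import re

# "|quote=" plus its value up to the next "|" or "}".
# Assumption: the value contains no nested templates, {{!}} or stray "}".
QUOTE_PARAM = re.compile(r"\|\s*quote\s*=[^|}]*", re.IGNORECASE)


def strip_quote_params(wikitext: str) -> str:
    """Drop the quote= parameter (name and value) from citation templates."""
    return QUOTE_PARAM.sub("", wikitext)
```

Requiring the leading "|" keeps parameters such as trans-quote from being matched.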