Misplaced Pages talk:Wikidata/2017 State of affairs


This is an old revision of this page, as edited by Francis Schonken (talk | contribs) at 07:08, 16 January 2018 (instruct bot to archive all). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.




Follow-up at Misplaced Pages talk:Wikidata/2018 State of affairs

Wikidata vandalism again affecting enwiki articles

Image from enwiki article showing Wikidata vandalism

For nearly 7 hours today, Nepal no longer existed, and the Nepalese on Wikidata (and enwiki wherever the Wikidata label was used) lived in "Nepalpeneflacido" instead (with "flacido" meaning "flaccid", and, well, you can guess the rest). Like I said before, changing the label on Wikidata is the equivalent of a page move on enwiki. Wikidata has no means at the moment to prevent such moves (or they need to protect all of the page, they can't protect only the label), and not enough editors to patrol this (despite claims about the much larger base of editors they have and so on). And on enwiki (or on other wikis), not enough people (hardly any) have Wikidata changes enabled in their watchlist as that produces loads of unreadable garbage and changes which don't affect enwiki at all. So these changes time and again remain unnoticed for hours (or longer), affecting an unknown number of pages. It happened with Romania recently, now with Nepal, probably others I didn't notice at the time as well.

The effect of this on enwiki is limited now. But if e.g. many more biographies would have the Wikidata version of the infobox, or other types of infoboxes would be converted to pure Wikidata versions, this would become much more problematic. The strength of Wikidata (one change affecting many pages at once) is a serious weakness if you can't be reasonably sure that such changes are either beneficial or very quickly reverted. Fram (talk) 15:38, 5 December 2017 (UTC)

Did you first notice the vandalism when you initially edited the Lumbini article? Richard Nevell (talk) 19:07, 5 December 2017 (UTC)
Yes. Fram (talk) 21:45, 5 December 2017 (UTC)
So you just left it rather than fixed the vandalism on Wikidata? Richard Nevell (talk) 22:34, 5 December 2017 (UTC)
Richard, why do you think Fram has a moral obligation to edit Wikidata? The rules, the knowledge base, the expectations, and the community seem to me to be different from those on Misplaced Pages (though I have to admit a lot of ignorance on this subject). Shouldn't a volunteer be able to choose which communities they want to invest their time in? - Dank (push to talk) 23:17, 5 December 2017 (UTC)
I don't think it's a matter of being obligated to do something, but I do find it curious that someone would notice a mistake and then leave it. For me it does change the initial framing of this post. Richard Nevell (talk) 23:32, 5 December 2017 (UTC)
I want to be clear that I'm not anti-Wikidata; I don't know enough about it to be against it. I see a large volunteer community putting effort into it, and that's probably a good thing, and worth doing. But when Wikipedians try to explain the problems that arise ... and Fram was bringing up a relevant point here, I thought ... we're confronted by people who seem to want to shame us for not being sufficiently pro-Wikidata. Maybe that wins points on some kind of scorecard, but it doesn't seem like a strategy that's likely to produce an end result of successful integration and cooperation. - Dank (push to talk) 23:34, 5 December 2017 (UTC)
Misplaced Pages and Wikidata have a lot more in common than they do separating them. Using that common ground as the basis for collaboration between the two communities would be beneficial for both sites. Richard Nevell (talk) 23:45, 5 December 2017 (UTC)
When I notice a template using Wikidata (World Heritage Site) creating problems on hundreds of enwiki articles and propose a solution, you oppose that solution and then proceed to do nothing about the problems, meaning that the problems (actual wrong information on enwiki articles) persist for many more months. This is apparently not a problem for you. But when I see a problem on enwiki caused by Wikidata vandalism, fix it on enwiki where I notice it, and then do nothing further, you go all moral outrage on me? Even though I have tried rather hard to fix the root cause (using Wikidata on enwiki), which would make the symptom (vandalism of an English label on Wikidata) a rather futile form of vandalism instead of the effective one it was now. You are the one promoting the use of Wikidata on enwiki, and claiming that this is so beneficial for both; then you have the obligation to organize things so that such problems happen less and less often, and to search for solutions. But all I see is someone who rejects solutions but then is surprised when others don't want to edit their pet project which causes the problems. I had seen this coming (that people would expect us to edit Wikidata whether we want to or not), but it's not going to work. Fram (talk) 05:37, 6 December 2017 (UTC)
Whatever Misplaced Pages(s) and Wikidata may have in common is more likely to be lost than strengthened by attempts to coerce Wikipedians into maintaining Wikidata, whether they are technical effects like imposing Wikidata descriptions on Misplaced Pages displays in Mobile view, or rhetoric claiming that Wikipedians have any ethical obligation to work on another project. From the Wikipedian point of view Wikidata is becoming more trouble than it is worth. Beware the backlash. · · · Peter (Southwood) : 06:27, 6 December 2017 (UTC)

Similar vandalism (changing the English label) from the last few days, which all lasted for hours:

  • Oceania was named "africa" for more than 5 hours
  • Canada was "Culo" for nearly five hours
  • Guinea was only "Negrazo" for nearly two hours, so that's relatively quick
  • Faggot (food) is now "Meatballs" (not reverted yet, after more than 24 hours)
  • Astronomy has had the English label "천문학" for 36 hours (not reverted yet)
  • Henry VIII: for more than two hours was completely vandalized, resulting e.g. in everyone getting "obey hitler" as the English description on their apps or elsewhere (same for French and German readers, by the way). The IP could vandalize as much as it wanted for 23 consecutive edits spread over more than 30 minutes, so it's not one subtle edit which slipped through the cracks

As a bonus, for people comparing relative short-term vandalism on wikidata with long-term vandalism on enwiki: Diego Simeone, a page seen by some 900 people a day on enwiki alone: Since 9 May until yesterday (i.e. for nearly seven months), he had a completely wrong name "Roberto Fernández" and the not-so-flattering and not-so-English description "futbolista medio".

The above is only a selection of some of the most obvious and high profile vandalism, and doesn't include some very high-profile unreverted vandalism examples (yeah, sue me). It only focuses on one aspect of vandalism (English labels), and doesn't include things like J. K. Rowling being a Reptilian with 43 children for hours... Fram (talk) 10:07, 6 December 2017 (UTC)

One of the unreverted high profile examples has meanwhile been found: Muhammad Ali was known for more than 1 day as "Muhammad L'kahba"... And the two unreverted ones I linked to above were reverted four minutes later! Fram (talk) 19:51, 6 December 2017 (UTC)

On the case of kpop artist Suga, though I do not feel the least obligation to fix things in wikidata, I *did* try to fix the vandalism there, but it was so intermixed with good edits that it was a complete mess. It's pretty obvious that vandalism control is not working adequately at Wikidata, and there is well-founded doubt that it ever will, so one first, important step towards the usability of that resource could be closing it to anonymous/newbie editing. And even so, we'll still have to deal with edit warring migrating from the wikipedia articles into Wikidata, something there's no solution for at the moment other than completely removing the Wikidata gadget (infobox or whatever) from the already protected article on the Misplaced Pages side. At this point, each time I use information from wikidata live on Misplaced Pages articles, I feel like I'm doing the wrong thing. It's blatantly not reliable.-- Darwin 01:20, 7 December 2017 (UTC)

And for 21 hours, Wikidata didn't have an entry for John F. Kennedy (like I said, a rather high profile page), but they had one for Putita loca ("crazy bitch") instead. Fram (talk) 08:19, 7 December 2017 (UTC)

@Fram and DarwIn: I don't think you'll find anyone that *likes* that this vandalism happens. However, there is a reason why {{sofixit}} was created. Remember that this type of vandalism used to be (and still sometimes is!) a common argument against Misplaced Pages vs. traditional encyclopaedias - and we have tackled that on enwp/ptwp with a mixed success rate. Just complaining about this issue really doesn't help, it's much better to be pro-active either individually (by reverting the vandalism, or pointing it out to someone that can revert it for you) or systematically (by passing on the lessons learnt here / figuring out better ways of catching Wikidata vandalism - e.g., see m:2017 Community Wishlist Survey/Wikidata/Better countervandalism tools). Personally, I don't draw a line between Misplaced Pages/Commons/Wikidata/Wikisource/etc. - they are all different ways to share knowledge, and I try to use the best tool for the job. However, I acknowledge that some don't like to cross project borders (even if they can use the same login on either side of the border), and although I can't understand it, I still want to help fix the problems, both individually and systematically. This conversation so far hasn't done that, though (except for @Ymblanter's reverts). Thanks. Mike Peel (talk) 22:31, 7 December 2017 (UTC)
@Mike Peel: Unfortunately, like it or not, there are borders, and I learned about them the hard way when I was blocked on a foreign Misplaced Pages for trying to remove blatant original research from there, because on that project, as I found out later, original research produced by local Wikipedians was perfectly OK. It is simply not acceptable that Misplaced Pages editors can't control information and vandalism on their own project, and have to go on errands to foreign projects begging for something to be done. It's simply not acceptable, from whatever side we look at it. It is not anyone's obligation, every time vandalism spreads or an edit war happens, to find their way to the Wikidata admin board - wherever that is - and explain the whole situation to a foreign community nobody knows anything about, with their own rules, and request that some action be taken. In the case of edit wars, it would basically mean extending the conflict to yet another project. This is not acceptable in the least. Either Wikidata solves their problems, or as sad as it can be, we would be much better off using Wikidata only for interwiki purposes - (and, hopefully, without the Wikidata community messing with that, and interfering in the way other projects handle those issues, as they have been doing). You asked for suggestions, I already made one: Completely blocking Wikidata to IP/newbie editing, at least until some functional vandalism control is put in place, would be a good start to make it usable.-- Darwin 00:21, 8 December 2017 (UTC)
There are many ways to fix things of course. Vandalism reversion on Wikidata is dealing with individual cases. Improving or getting rid of Wikidata infoboxes and the like is dealing with the root cause on the enwiki side. Getting better anti-vandal tools is a possible solution on the Wikidata side. "Just complaining about this issue really doesn't help" but that's hardly the only thing I have done of course. That you don't like many of my actions doesn't mean that they haven't happened of course. That vandalism happens on enwiki as well is hardly an argument to use a site which is even worse at catching vandalism instead. I agree that we should use the best tools available, but I don't see how you can pretend that Wikidata is that tool at the moment (or perhaps ever). Wikidata (like Wikisource, Wikiquote, ...) is a tool with a different purpose. We wouldn't transclude info from Wikivoyage or Wikinews into enwiki either. "I still want to help fix the problems, both individually and systematically." As long as it means using Wikidata though, like at the WHS infobox? You are causing many of the problems we have here with Wikidata, so I don't think you are the best person to lecture about "sofixit" and about willingness to fix problems. You are willing to fix minor issues as long as no one questions the major one, which is "is Wikidata really the best tool for this or that job". This conversation, like many others here, is a way to increase understanding and awareness of the actual scale of the problems, which aren't anecdotal but chronic and serious. Reverts of individual cases or protection of individual cases minutes after I have pointed them out here is not fixing the actual problems, it's addressing a forest fire with a water gun and berating someone else who is on the phone (to the fire brigade) instead of picking up a second water gun. Fram (talk) 08:02, 8 December 2017 (UTC)

Status of descriptions

@DannyH (WMF): what is the status of the Wikidata descriptions on enwiki? Are you waiting for us to do anything, or are you progressing with the development and implementation of the magic word? Is there a Phab ticket for this where we can see what is done and what needs to be done, and perhaps some timing? Fram (talk) 08:21, 7 December 2017 (UTC)

Earlier today, the semi-protected Tel Aviv article (45,000 pageviews yesterday!) for more than three hours had the description "The Capital Of Israel", after an IP changed it on Wikidata (they wouldn't have been able to make that edit on enwiki!). That this kind of politically charged editing is being shown to enwiki readers is completely unacceptable, and should have been fixed a long time ago by the WMF. Can you please finally do something about this? This discussion has been going on for months. Fram (talk) 13:45, 7 December 2017 (UTC)

Yannow I was thinking it was phab:T152743 or one of its daughter tasks but apparently not... Jo-Jo Eumerus (talk, contributions) 13:59, 7 December 2017 (UTC)
Thanks. I did a search for this on Phab and couldn't find a ticket for this either. Fram (talk) 14:30, 7 December 2017 (UTC)
Hi Fram, the thread where we were talking about this got archived: Misplaced Pages talk:Wikidata/2017 State of affairs/Archive 12#November Magic Word proposal from WMF. The team that's going to work on the magic word is planning to start making the changes in January, with estimated finish by the end of February. I don't actually know the tickets right now, but I'll find out, and report back. -- DannyH (WMF) (talk) 16:03, 7 December 2017 (UTC)
Fram, we reached agreement to collaborate on a multi-option RFC. I meant to get a draft started, but haven't been able to focus on it yet. I can't get to it today, my brain is fried on lack of sleep. Alsee (talk) 17:40, 7 December 2017 (UTC)
@DannyH (WMF) and Fram: As with @Alsee, I understood that this was heading to an RfC, *not* immediate implementation of a magic word. Thanks. Mike Peel (talk) 22:01, 7 December 2017 (UTC)
Yeah, I'm happy to collaborate on an RfC whenever it gets started. But we offered to make the magic word in Jan/Feb, and I don't want to go back on that offer just because the conversation quieted down. -- DannyH (WMF) (talk) 00:49, 8 December 2017 (UTC)
Oh, and to answer Fram's question above: there isn't a ticket yet. We're talking about requirements. -- DannyH (WMF) (talk) 00:51, 8 December 2017 (UTC)
Thanks for the answer, but that's terrible. This has been decided by RfC long, long ago, this has been discussed to death for months, and you don't even have a first ticket for this and are talking about requirements (with whom? Not with us, clearly). What you (WMF) need to do is simple: develop a magic word, develop an option to have a blank description (wherever we want it, not on some pre-defined list of article types: this is preferably as simple as "no magic word is no description"), and make it available. The only thing that needs discussion between WMF and enwiki is "Hi enwiki, the magic word is ready, please start using it and tell us when to disable the showing of the Wikidata description on Enwiki completely and everywhere". Nothing else still needs to be decided or discussed by the WMF, or else you should come here and ask it. Fram (talk) 05:38, 8 December 2017 (UTC)

Wouldn't it be more logical to proceed now with the RfC, before anyone starts developing the so-called magicword? Depending on the result, it couldn't be needed at all, right? And, in any case, the RfC is completely independent of that development.-- Darwin 01:06, 8 December 2017 (UTC)

We need to start an RfC to discuss which description we want to have, and a bot request to populate the magic word then. Fram (talk) 05:38, 8 December 2017 (UTC)

RfC started at Misplaced Pages:Village pump (proposals)#RfC: Populating article descriptions magic word. I will post this at CENT as well. Fram (talk) 09:59, 8 December 2017 (UTC)

@Fram: That seems to be missing the key question of 'do we want to use a magic word for descriptions?'. Thanks. Mike Peel (talk) 11:55, 8 December 2017 (UTC)
I thought that was the compromise agreed upon with the WMF? What else would you use, a template? Fine by me, but that seems to be harder to implement in the apps and so on for the WMF, so I didn't see the benefit of arguing about that aspect, and neither did apparently anyone else here since the magic word was proposed. Fram (talk) 11:59, 8 December 2017 (UTC)
Please show me the consensus for that. As far as I can see, there are a few options - no descriptions, Wikidata descriptions (with/without extra visibility/editing here), magic word, local template, etc. There was some discussion here about it, but not a wider RfC. And now this has jumped to 'how do we do this in practice' without saying 'do we want to do this?'... Thanks. Mike Peel (talk) 12:13, 8 December 2017 (UTC)
I believe Mike Peel is right. It should first be asked/decided if descriptions are useful/needed at all, and then ask/decide about their implementation. (Personally I believe it will be an uncontroversial win for the "yes, they are useful" side, but it's important that it gets put to a vote and recorded, instead of sounding like yet another WMF imposition). Another important thing is what to do with newly created articles, as they will be missing the magic word by default (bot adding?). Frankly, this "magic word" thing sounds *a lot* like a bad patch, something that will inevitably have to be resolved/replaced in the medium term. Descriptions are obviously useful, and should be somehow part of the mediawiki environment where we edit, not those artisanal, primitive, and obviously insufficient (already by design) "magic words". IMO the "magic word" path should only be followed if it can be easily transformed in the (near) future into some Mediawiki embedded feature, so that the immense work that is going to be done now can be entirely used to populate that feature. If this is uncontroversial, then it's OK to proceed with the "magic word" option.-- Darwin 12:30, 8 December 2017 (UTC)
If there truly are some (or many) people advocating for "no, we shouldn't have descriptions, period", then an RfC on that question is useful. But an RfC on a question that isn't really disputed in the first place is a waste of time. I have no idea what kind of "part of the mediawiki environment" you envision, so I can't really comment on that. Fram (talk) 12:49, 8 December 2017 (UTC)
@Fram: As descriptions seem to be something that will become kind of mandatory, and linked to app features, lists, and much more, I was thinking of something like a description box, somewhere around the article, kind of what you already have in Wikidata, but local to the Wikipedias. A "magic word" looks a lot like a temporary patch, not something for the long run, or even the medium run. At least it has to be designed so that it allows for a future export to such a feature, in case it gets implemented in the future (something very probable, IMO, it should be embedded by design, not used as a template, or even a magic word).-- Darwin 15:33, 8 December 2017 (UTC)
(ec)"Some" discussion? Two months, 7 archives. The WMF have made it clear that a magic word is the solution, not an option. You are free to start an RfC on that of course, but it seems a lot more useful to try to find a workable compromise to finally get this resolved, than to start a fruitless RfC on whether the WMF should use a template instead of a magic word and so on, and then have the current RfC anyway. And "Wikidata descriptions" is the one option we already had an RfC about and which was then clearly rejected. The options now are no descriptions, or descriptions on enwiki through a template or a magic word. On the latter, the WMF clearly prefers a magic word for technical reasons, and I see no good reason to force the use of a template instead in that case. Which leaves us with "no descriptions" vs. "magic word", and just simply disallowing descriptions on enwiki seems counterproductive (and not as far as I know supported by anyone).
So we all want to provide descriptions in some cases, an RfC has decided that we don't want this to be Wikidata-hosted descriptions, and the WMF has made it clear that for them a magic word is then the best solution. Since there was no progress since then and no RfC was started (but is needed), I started this one. If you now want to start a counter-RfC, then be my guest, but first think about how this would benefit both enwiki and the WMF. Fram (talk) 12:39, 8 December 2017 (UTC)
As Alsee had said, we'd agreed to collaborate on a multi-option RfC -- I was just waiting for someone to start writing it, so I could participate. This RfC didn't include me, so I added my piece in the discussion under the "What to do with blanks" section. -- DannyH (WMF) (talk) 15:02, 8 December 2017 (UTC)
Well, everyone seems to disagree on what was actually agreed upon, if anything, so now we at least get a discussion where everyone can participate. Although I do hope that the quality of contributions will in general be better than "I don't know of any examples where a blank description would be better" right below a blatant example of such a case, as that gives the impression of someone whose idea of a "collaborative RfC" is "a place where I can repeat my thoughts without having to read what others have said". Not really the best start you could take there... Fram (talk) 15:57, 8 December 2017 (UTC)
As I wrote there, I agree that the vandalism response rate on Wikidata is too slow -- I cited your example, and said it's disappointing and frustrating. I think the solution to that is to make that response rate better, by making it easier for Misplaced Pages editors to monitor and fix vandalism of the descriptions. I disagree that the best solution is to pre-emptively blank descriptions because we know that there's a possibility that they'll be vandalized. I'm asking for specific examples where editors would make the choice to not show a description on the article page, because a blank description is better than the majority of good-to-adequate descriptions already on Wikidata. -- DannyH (WMF) (talk) 16:07, 8 December 2017 (UTC)
Since none of the other solutions exist yet, the best solution by far now is to have a blank description if there is no onwiki description. Making it necessary for enwiki editors to edit Wikidata to solve problems visible on enwiki is the big dream of the WMF and some ardent Wikidata proponents, but I (and, as far as I can judge from many reactions over the months, many others) have no intention to start editing two sites to have one good article. The reason we use Commons is that we usually can trust Commons to deal with vandalism swiftly on their own. If Commons files were a continuous source of long-lasting vandalism on major articles, we would soon decide to host most files locally instead. Anyway, this was given in the past as well, but do you really think Spirit (Depeche Mode album) needs the description "Depeche Mode album"? Seems somewhat redundant... Such articles don't need a description on enwiki, but need one like the current one on Wikidata. Fram (talk) 16:16, 8 December 2017 (UTC)
Fram, I just fixed World War II (Q362) after you pointed out the vandalism. The process was exactly the same as fixing it on Wikidata: click history, click undo, write "revert vandalism." Everything was in English and I used my same username. If you aren't willing to do that, I think you're being obtuse, because you'd rather point out a flaw and wage a boundary dispute with Wikidata, than WP:FIXTHEPROBLEM.--Carwil (talk) 16:53, 8 December 2017 (UTC)
Good for you. Why would I revert vandalism on a site which I believe should not be used on enwiki anywhere but where the use of descriptions has been forced upon us by the WMF without adequate tools to deal with them? Just like they did with e.g. Gather, which they were very, very reluctant to abolish even when it became clear that we didn't want it? I am not going to clean up the mess they created, I want to stop the source of the mess, that's all. Fram (talk) 09:04, 9 December 2017 (UTC)
Fram, I agree that there needs to be a real solution to helping Misplaced Pages editors monitor and edit Wikidata descriptions; I'm anxious to hear Lydia's update on the progress on getting the descriptions into WP watchlists, which is a necessary (but not a final) step. I think the desire for a fix now is absolutely understandable. At the same time, I don't want to build a feature now that could potentially mass-blank thousands of good-to-adequate descriptions, when we could be working on getting the moderation to work properly.
I really believe that the short descriptions are valuable to the readers and editors who see them, and building a magic word that defaults to no description means taking the existing descriptions away by default. It would take away a feature that a lot of people use for reasons that they don't know or care about, and I don't see how that could be a positive step. I'm advocating for the magic word override, here and internally, because it will result in higher-quality descriptions for the users. Blanking the descriptions by default will result in lower-quality descriptions (i.e., none at all). I know this situation is frustrating right now, but the most important thing is that people get the most value out of reading Misplaced Pages.
Thanks for bringing up the Depeche Mode page -- you're right, I'd forgotten about that example. I think in that case, the best description would be "2017 Depeche Mode album". I think mild redundancy around disambiguators is pretty much inevitable. -- DannyH (WMF) (talk) 01:24, 9 December 2017 (UTC)
No, such mild redundancy is not inevitable at all, unless you are completely unwilling to allow blank descriptions and look for excuses all the time. You want examples where blank would be better, but vandalism doesn't count, redundancy doesn't count, and everything else could probably also be changed or would be acceptable to you. Perhaps you should care more about BLP violations than about keeping your precious descriptions at all costs? Can you tell me what progress has been made in the last 9 months by the WMF? They can't even get the enwiki watchlist to show English labels instead of P-numbers and Q-numbers for Wikidata changes, which means that it is completely unclear what is being changed. This has been asked for years... "At the same time, I don't want to build a feature now that could potentially mass-blank thousands of good-to-adequate descriptions, when we could be working on getting the moderation to work properly." Tough luck, you (WMF) have stalled this for long enough, didn't keep your initial promise, and show very little interest in cooperating now or listening to concerns. As usual (Flow, Gather, MV, ACT, ...) Give us the magic word, and we will populate it and we will decide where we need descriptions (and which ones) and where a blank might be better. These are content decisions, and the WMF should stay the hell out of content decisions. Fram (talk) 09:04, 9 December 2017 (UTC)

The case of Misplaced Pages titles with parenthetical disambiguators is one of the few places where we might actually want blank descriptors to ease reading, so it's a decent example where a magic word helps both projects. Even there, maybe we should consider ways of incorporating slightly more informative descriptions on both projects. I recently shepherded ORFN, a page about a graffiti artist named Aaron Curry, through AfC. Now there was already an Aaron Curry (artist), so this involved generating some hatnotes and tweaking the Wikidata descriptions. What I learned is that the text between the two parentheses is often an inadequate description. So we can't just shut off short descriptions because of a parenthetical disambiguator; someone needs to make an editorial decision for each. This is even more true at disambiguators that state a domain, like Aaron Curry (American football).--Carwil (talk) 12:55, 9 December 2017 (UTC)

Statistically, what fraction of Wikidata descriptions are bad?

Given that various claims and counter-claims have been made about how bad or good Wikidata descriptions are, but I don't think anyone's yet looked at a reasonable sample, I've written some code that fetches N random articles (using RandomPageGenerator, the pywikibot equivalent of Special:Random) and their Wikidata descriptions, and I've put the result for 1,000 articles at User:Mike Peel/Wikidata descriptions. 385 descriptions were blank, and 5 articles didn't have Wikidata entries. Looking through the rest, I can only spot one (0.1%) that's actually bad - "Tiger Mangweni - Rugbpy player" (typo now fixed) - although there are many that could be improved. Anyone else want to have a look through and see what they think? I can refresh/enlarge the sample easily if needed. Thanks. Mike Peel (talk) 23:53, 11 December 2017 (UTC)
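For readers who want to check the arithmetic behind the figures quoted in the post above, here is a minimal Python sketch. It only re-computes shares from the counts given there (1,000 sampled articles, 385 blank descriptions, 5 articles without a Wikidata item, 1 description judged bad). The function name `summarize` and the dictionary keys are hypothetical names chosen for this illustration; they are not taken from the script mentioned in the post, and the "bad" count is that poster's own judgment, which later replies dispute.

```python
def summarize(total, blank, no_item, bad):
    """Turn raw sample counts into percentages.

    total   -- number of randomly sampled articles
    blank   -- articles whose Wikidata item has an empty English description
    no_item -- articles with no Wikidata item at all
    bad     -- descriptions a reviewer judged to be actually bad
    """
    # Articles that actually carry a non-empty description.
    described = total - blank - no_item
    if total <= 0 or described <= 0:
        raise ValueError("sample must contain at least one described article")
    return {
        "described": described,
        "blank_pct": 100 * blank / total,
        # Share of bad descriptions relative to the whole sample.
        "bad_pct_of_all": 100 * bad / total,
        # Share of bad descriptions relative to non-empty descriptions only.
        "bad_pct_of_described": 100 * bad / described,
    }


# Counts as quoted in the post above.
stats = summarize(total=1000, blank=385, no_item=5, bad=1)

if __name__ == "__main__":
    for key, value in stats.items():
        print(f"{key}: {value:.2f}")
```

The choice of denominator matters: the 0.1% figure divides by all 1,000 sampled articles, while dividing by only the 610 articles that have a description gives a slightly higher share. Either way the result depends entirely on how many descriptions a reviewer counts as bad.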

Thanks, that's useful. A quick scan of the list confirms my (limited) experience, namely that the descriptions are useful for mobile readers. Some descriptions should be improved, but that's true of a lot of things. Johnuniq (talk) 01:47, 12 December 2017 (UTC)
Yeah, thanks for pulling this data, it's really helpful to see a random set. There are a lot that should be improved. I think any of them that are one word long -- book, ship, academic -- need some kind of qualifier. Some of them are too long, and occasionally there's a full sentence. But I don't see any that are actively harmful or wrong, the worst you can say is that the short ones aren't descriptive enough.
I made a change to the examples where you got an error -- that happened when an article topic didn't have an associated item on Wikidata. Those are just displaying as not having a description; there isn't an error message on the page. (I checked.) So I changed the label from ERROR to (no Wikidata item), to express what's actually seen on the page. -- DannyH (WMF) (talk) 02:05, 12 December 2017 (UTC)
I gave my impression of that sample on the RfC page. There are some clear errors in the list (including a Dutch description), many that are superfluous or confusing (the "Wikimedia" ones, or ones that repeat the disambiguation) and many, many that should be improved. The 0.1% error rate is not really correct in any case... But basically, yes, we need descriptions in many cases, there is little dispute of that. Whether this set is better than a set created automatically from the first lines of the articles isn't clear, though; other tests on this page (archives) indicate that using information from the first line in general produces better results (though obviously not in all cases). Fram (talk) 08:29, 12 December 2017 (UTC)
Mike Peel, This is a useful list. There are a few cases where a description is not needed and does not exist, and a few cases where a description is not needed but does exist, and Misplaced Pages would be better served by a blank. There are also cases where the description is good enough as a start, and even some where it would be difficult to improve the description. It probably demonstrates the full range of quality available. Most of the missing descriptions appear to be in places where descriptions are needed. Some of the descriptions that exist but do not look useful for Misplaced Pages look useful for Wikidata. I did not spot any that are obviously wrong, but then I did not compare most of them to the article content, so if they looked plausible, I would not pick up a problem.
DannyH (WMF), If the worst you can say is that the short ones are not descriptive enough, you probably haven't looked very closely at the quality, and did not see the ones which are obviously redundant. · · · Peter (Southwood) : 16:03, 12 December 2017 (UTC)
Peter, I have looked closely at the list; I'm making a slightly different judgment about the impact of redundancy. I think mild redundancy is acceptable, especially with disambiguation phrases. For example, Shine On (Ralph Stanley album) has "album by Ralph Stanley" -- I think a better description would be "2005 album by Ralph Stanley"; if I was involved in determining the format for record albums that's the one I'd support. But either way, the title is redundant with the first sentence of the article: "Shine On is a 2005 album by American bluegrass artist Ralph Stanley." Seeing this sequence on an article page -- "Shine On (Ralph Stanley album)", "2005 album by Ralph Stanley", "Shine On is a 2005 album by American bluegrass artist Ralph Stanley" -- has some redundant information, but the title and first sentence have redundant information anyway. I don't think that redundancy confuses or annoys anybody; you'd just skim past it and move on with the article. I don't see the harm.
If I was involved in determining the format for short descriptions on album pages, I'd want to have a uniform format that everyone could follow, so people wouldn't have to make case-by-case judgment calls every time there's a disambiguation phrase. Shine On (Ralph Stanley album) would have "2005 album by Ralph Stanley", Making Love (album) would have "1999 album by Atom and His Package", and The Conquest of You would have "1997 album by Kid Creole and the Coconuts". Is that the kind of redundancy that you're concerned about?
Also, I'm curious about the items where you say that a description isn't needed, and Misplaced Pages would be better served by a blank. Can you share those examples? -- DannyH (WMF) (talk) 17:23, 12 December 2017 (UTC)
And here I was thinking you didn't involve yourself with content decisions. If most editors on enwiki would decide that in cases like these, it is better to have no description than a redundant or too long and detailed one, then who is the WMF to interfere with that? Stick to your role please. Getting the same information twice in a row (title and first sentence) is enough, getting it three times in a row is ridiculous, and not needed for these use cases where you get the title but not the first line. Will anyone think "Oh, it's the 2005 album, I was looking for the 2004 album"? Just as many as will think "Oh, I was looking for the album with song X" or "I was looking for the album where Y featured on a song" or whatever. A description should give enough information to make the general subject clear, especially for those cases where the subject isn't clear from the title. That's it. If we want longer descriptions following some rule, we can easily decide this here and program a bot to fill or change them if wanted. We don't need the WMF holding our hands to guide us to the best description. They have better things to do. Fram (talk) 17:45, 12 December 2017 (UTC)
Yeah, I was just using that as an example. Like I said, once there's a magic word, Misplaced Pages editors make decisions on how to use it. The question that people are asking in this discussion is whether the existing descriptions are good or bad; that's a judgment call, and I'm expressing my opinion about it. -- DannyH (WMF) (talk) 18:05, 12 December 2017 (UTC)
Some input on the application of the short description from WMF is useful, as they will be using it to distribute our content. What we need from them is how they will use it, and what constraints there may be on text length and that kind of thing, so we can try to produce useful, as well as accurate, descriptions. Is there an optimum length range? Is there a constraint on maximum number of characters for display reasons? Should we try for more information, or just enough to distinguish between likely targets? · · · Peter (Southwood) : 05:59, 13 December 2017 (UTC)
The disambiguation and list article pages (there are 68 listed in the sample) are clearly neither errors nor vandalism. The problem of how to use these labels is worth discussing. I find "Misplaced Pages disambiguation page" to be a useful subtitle, but others' perspectives may differ. Lists are typically adequately described by their title, and the subtitle should be suppressed. However, in both cases, there are good reasons for this text to be visible in the VisualEditor. If an editor is inserting a link into an article, it's very helpful to know whether "James Montgomery" is an article about a person, or a disambiguation page.--Carwil (talk) 22:55, 12 December 2017 (UTC)
Yes, and that applies even more to someone browsing Misplaced Pages on a mobile device where the typical approach is to search for a topic or name, then select a page to view from the presented list. Such a reader would quickly learn what "Misplaced Pages disambiguation page" means and would regard that description as very useful. Johnuniq (talk) 02:18, 13 December 2017 (UTC)
"Wikipedia disambiguation page" is also better than "Wikimedia disambiguation page", as it is more specific and more readers will know what Wikipedia means. It also allows disambiguation between the projects if applicable. If the search only targets Wikipedia, then "Disambiguation page" without the qualifier would be better. This would be easily done on Wikipedia, but would reduce the value of the Wikidata description, where the project name is useful, as Wikidata links to other projects besides Wikipedia where the subject matches, and Wikivoyage, for example, also uses disambiguation pages. This is not so much a harmful thing as a lack of elegance, so it is not urgent, just preferable. · · · Peter (Southwood) : 05:59, 13 December 2017 (UTC)
DannyH (WMF), In any case where the article title describes the subject as fully as a reasonable length short description would, the short description is redundant. It is always possible to write a longer description with more information, but I understand the function of the short description is to give just enough for the reader to recognise which article from a group is most likely to be the one they are looking for. Perhaps you need to define the purpose of the short description more precisely.
A case in point here is those album article titles you mentioned. The existing short descriptions are redundant (when they contain no information that is not already in the title) and look unprofessional, but when you add information such as the date, they become more useful. As they are very short to start with, and not much longer when expanded, they would be acceptable as providing useful information, and not exceeding the limits of a short description. Much depends on whether the added information actually helps the reader find the right article. As explained by Fram above, some added information may not be useful for this purpose. · · · Peter (Southwood) : 06:07, 13 December 2017 (UTC)
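The redundancy criterion used here (a description is redundant when it contains no information that is not already in the title) can be sketched as a simple word comparison. This is only a crude heuristic for illustration: the function names and the small stop-word list are assumptions made for this example, not anything Wikidata or the WMF is known to use, and it cannot judge whether the added words actually help a reader find the right article.

```python
import re

# Hypothetical list of filler words that carry no information on their own.
STOP_WORDS = {"a", "an", "and", "by", "in", "of", "the"}


def new_words(title, description):
    """Return the words in the description that do not occur in the title."""
    title_words = set(re.findall(r"\w+", title.lower()))
    description_words = set(re.findall(r"\w+", description.lower()))
    return description_words - title_words - STOP_WORDS


def is_redundant(title, description):
    """A description is treated as redundant if it adds no new words."""
    return not new_words(title, description)


title = "Shine On (Ralph Stanley album)"
print(is_redundant(title, "album by Ralph Stanley"))       # existing description
print(is_redundant(title, "2005 album by Ralph Stanley"))  # description with date
print(sorted(new_words(title, "2005 album by Ralph Stanley")))
```

With the album example from this thread, the existing description adds nothing beyond the title, while the dated version adds the year.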