Article snapshot taken from Wikipedia with creative commons attribution-sharealike license.
Line 136:
.*Hewgyr.* <moveonly>
.*Hewgyor.* <moveonly>
.*Everett.* # Used for harassment username and page creation - remove end Dec 2008
# DISALLOW CREATION OF USER OR USER TALK PAGES FOR A SPECIFIC IP RANGE BY NON-AUTOCONFIRMED USERS
Revision as of 15:46, 7 December 2008
# This is a title blacklist; any title that matches a regex here is forbidden from being created.
# Options exist to stop editing, account creation, and moves as well. See mw:Extension:Title Blacklist for documentation.
# See the talk page for more information.
# Please comment any additions made to the blacklist.
# Note: Internally, the pattern delimiter is '/', so be sure to escape all '/'s.
# UTF-8 mode is enabled. Do not use literal non-breaking spaces in regexes as some browsers cannot handle them.
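The header above describes the file format: one regex per line, optional flags in `<...>` separated by `|` (some of them `key=value`, like `errmsg=`), and an optional trailing `#` comment. Below is a minimal Python sketch of reading such a line and applying it. It assumes, as the `.*...*` wrapping of most entries suggests, that a pattern must match the whole title, and that matching is case-insensitive unless `casesensitive` is given. It models only `moveonly` and `newaccountonly`, and it assumes a trailing comment always starts with " #". `Entry`, `parse_line` and `blocks` are hypothetical names. Note that Python's `re` is not PCRE, so entries using `\p{...}` or `\x{...}` would not compile with it.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    pattern: str
    options: set = field(default_factory=set)   # bare flags, e.g. "moveonly"
    params: dict = field(default_factory=dict)  # key=value flags, e.g. errmsg
    comment: str = ""


def parse_line(line):
    """Split one blacklist line into pattern, <options> and # comment."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank or comment-only line
    # Assumption: a trailing comment is introduced by " #".
    body, _, comment = line.partition(" #")
    body = body.strip()
    options, params = set(), {}
    m = re.search(r"<([^<>]*)>\s*$", body)  # trailing <opt | key=value ...>
    if m:
        body = body[:m.start()].rstrip()
        for opt in m.group(1).split("|"):
            key, eq, value = opt.strip().partition("=")
            if eq:
                params[key] = value
            else:
                options.add(key)
    return Entry(body, options, params, comment.strip())


def blocks(entry, title, action):
    """True if `entry` forbids `action` ("create", "move", "newaccount") on `title`."""
    if "moveonly" in entry.options and action != "move":
        return False
    if "newaccountonly" in entry.options and action != "newaccount":
        return False
    flags = 0 if "casesensitive" in entry.options else re.IGNORECASE
    # Assumption: the whole title must match the pattern.
    return re.fullmatch(entry.pattern, title, flags) is not None
```

For example, `blocks(parse_line(".*Hewgyr.* <moveonly>"), "User:hewgyr2", "move")` is true, while the same call with `"create"` is false because the entry is move-only.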
# OBSCURE ASCII CHARACTER LOOKALIKES
.*.* <casesensitive> # Select Unicode Letterlike Symbols (excluding Kelvin, Angstrom and Ohm signs, see talk)
.*.* <casesensitive> # Circled and parenthesized Latin letters
.*.* <casesensitive | errmsg=titleblacklist-custom-fullwidth> # Fullwidth Latin letters
.*.* <casesensitive | moveonly> # Question mark lookalikes, used for page move vandalism
.*.* <casesensitive> # Phonetic extensions, almost never used in valid titles
.*.* <casesensitive | moveonly> # IPA extensions, somewhat more common, so blocking only moves for now
.*.* <casesensitive | moveonly> # Select mathematical operators (excluding "−", "∞" and some other common ones)
.*.* <casesensitive | moveonly> # Misc./supplemental mathematical symbols
.*.* <casesensitive | moveonly> # Letter lookalikes; none of these are currently used in any mainspace title
# OTHER UNDESIRABLE CHARACTERS
.*.* <casesensitive | errmsg=titleblacklist-custom-nbsp> # Non-breaking and other unusual spaces, with custom error message
.*.* <casesensitive> # BiDi overrides
.*.* <casesensitive> # "Other punctuation", with some exceptions (may need more, this is a huge character class); note that single-character titles are permitted by the title whitelist
.*\p{Cc}.* <casesensitive> # Control characters
.*\x{FEFF}.* <casesensitive> # Byte order mark
.*.* <casesensitive> # Swastikas, hammer-and-sickle
.*\x{00AD}.* <casesensitive> # Soft-hyphen
.*.* <casesensitive> # Very few characters outside the Basic Multilingual Plane are useful in titles
.*.* <casesensitive> # Graphic pictures for control codes
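Several of the character entries above are written with explicit code points (`\p{Cc}`, `\x{FEFF}`, `\x{00AD}`). The Python sketch below reproduces those checks one character at a time using `unicodedata`. It also flags the standard fullwidth Latin letter ranges, U+FF21 to U+FF3A and U+FF41 to U+FF5A. That part is an assumption: it supposes the fullwidth entry covers exactly those blocks. `undesirable_chars` is a hypothetical helper, not part of the extension.

```python
import unicodedata


def undesirable_chars(title):
    """Return (index, code point, reason) for each flagged character in `title`."""
    hits = []
    for i, ch in enumerate(title):
        cp = ord(ch)
        if unicodedata.category(ch) == "Cc":
            reason = "control character"            # like \p{Cc}
        elif cp == 0xFEFF:
            reason = "byte order mark"              # like \x{FEFF}
        elif cp == 0x00AD:
            reason = "soft hyphen"                  # like \x{00AD}
        elif 0xFF21 <= cp <= 0xFF3A or 0xFF41 <= cp <= 0xFF5A:
            reason = "fullwidth Latin letter"       # assumed range
        else:
            continue
        hits.append((i, f"U+{cp:04X}", reason))
    return hits
```

A plain title such as `"Paris"` yields an empty list. The same word with a soft hyphen hidden inside it is reported at the hyphen's position.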
# EXCESSIVE PUNCTUATION OR REPETITION
.*{3}(?<!!!!).*
.*{2}(?<!!!!).* <moveonly>
.*\s+.*
.*‽‽.* <moveonly>
.*¿¿.* <moveonly>
.*{2}.* # Disallows two adjacent "separator" characters (mostly funky spaces)
.*{5}.* # Disallows five consecutive characters that are not letters (in any script), numbers, or spaces
.*()\1{4}.* <moveonly> # Disallows five or more of the same character in page-move targets
.*(.)\1{10}.* <newaccountonly> # Disallows eleven or more of the same character repeated in usernames
.*\p{Lu}(\P{L}*\p{Lu}){9}.* <casesensitive | moveonly> # Disallows moves with more than nine consecutive capital letters
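The repetition entries rely on back-reference counting. In `(.)\1{n}`, one character is captured and then repeated n more times, so the pattern matches runs of n + 1 identical characters. That is why `\1{10}` means "eleven or more". The Python sketch below builds that pattern for any n and checks it against a direct count of the longest run. `run_pattern` and `longest_run` are hypothetical names.

```python
import re


def run_pattern(n):
    """Build `.*(.)\\1{n}.*`: one captured character plus n back-references."""
    return re.compile(r".*(.)\1{%d}.*" % n, re.DOTALL)


def longest_run(text):
    """Length of the longest run of one repeated character (0 for empty text)."""
    best = cur = 0
    prev = None
    for ch in text:
        cur = cur + 1 if ch == prev else 1
        prev = ch
        best = max(best, cur)
    return best
```

`run_pattern(n).fullmatch(s)` succeeds exactly when `longest_run(s) >= n + 1`. For example, `run_pattern(4)` accepts `"Nooooo"` (five o's) but not `"Noooo"` (four).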
# INVERTED QUESTION MARK WITH NON-LATIN TEXT
.*¿.*.*
.*.*¿.*
# DISALLOW CREATION OF USER OR USER TALK PAGES FOR A SPECIFIC IP RANGE BY NON-AUTOCONFIRMED USERS
User( talk)?:71\.107\.(1(2|\d)|2(\d|5))\.(?\d\d?|2(5|\d)) <autoconfirmed>
User( talk)?:75\.47\.(1(2|\d)|2(\d|5))\.(?\d\d?|2(5|\d)) <autoconfirmed>
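The two entries above encode an IPv4 range digit by digit inside the title regex. They apply only to users who are not autoconfirmed, which is what the `autoconfirmed` flag and the section heading indicate. The same check can be expressed more readably with Python's `ipaddress` module, as in the sketch below. The networks used here are hypothetical: they are the /16 prefixes spelled out literally in the patterns (`71.107.` and `75.47.`), which are wider than the third- and fourth-octet bounds the entries impose. `creation_blocked` is also a hypothetical name.

```python
import ipaddress
import re

# "User:<IPv4>" or "User talk:<IPv4>", mirroring the User( talk)?: prefix above.
IP_PAGE = re.compile(r"User(?: talk)?:(\d{1,3}(?:\.\d{1,3}){3})")

# Hypothetical ranges: only the /16 prefixes written out in the entries.
RANGES = [ipaddress.ip_network("71.107.0.0/16"), ipaddress.ip_network("75.47.0.0/16")]


def creation_blocked(title, autoconfirmed, ranges=RANGES):
    """True if a non-autoconfirmed user may not create this IP user (talk) page."""
    if autoconfirmed:
        return False  # the entries exempt autoconfirmed users
    m = IP_PAGE.fullmatch(title)
    if m is None:
        return False
    try:
        addr = ipaddress.ip_address(m.group(1))
    except ValueError:  # e.g. an octet above 255
        return False
    return any(addr in net for net in ranges)
```

Parsing the address and testing `addr in net` keeps the range in one place instead of spreading it across alternations for each octet.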
# PAGE MOVE TARGETS
(.*\W)?(|\\W\)+(\W|\W.*\W)?((\W|\W.*\W)?)*((\W|\W.*\W)?)+((\W|\W.*\W)?)++(\W.*)? <moveonly> # HERMY
(.*\W)?+(\W|\W.*\W)?((\W|\W.*\W)?)+((\W|\W.*\W)?)+((\W|\W.*\W)?)*(|\\W\)+(\W.*)? <moveonly> # YMREH
.*((\W|\W.*\W)?(\W|\W.*\W)?)+((\W|\W.*\W)?)+((\W|\W.*\W)?)+.* <moveonly>
.*I\W*B\W*H\W*H\W*F\W*S.* <moveonly>
.*I\W*F\W*S\W*N\W*Z.* <moveonly>
Wikipedia( talk)?:(*(?-i:).*|(.*\W)?+(\W|\W.*\W)?(((\W|\W.*\W)?)+((\W|\W.*\W)?)+((\W|\W.*\W)?)++|((\W|\W.*\W)?)+((\W|\W.*\W)?)+((\W|\W.*\W)?)+Y+)(\W.*)?) <moveonly> # No haggery in project space, please. (Only ASCII/Latin1 characters needed in this regexp.)
(Help|Portal)( talk)?:(.*(?-i:).*|(.*\W)?+(\W|\W.*\W)?(((\W|\W.*\W)?)+((\W|\W.*\W)?)+((\W|\W.*\W)?)++|((\W|\W.*\W)?)+((\W|\W.*\W)?)+((\W|\W.*\W)?)+Y+)(\W.*)?) <moveonly> # ..nor in help or portal spaces either. (Only ASCII/Latin1 characters needed in this regexp.)
# DISALLOW PAGE MOVES TO MIXED-SCRIPT TITLES
# Intentionally move-only due to false positives
(?!(User|Wikipedia)( talk)?:|Talk:)\P{L}*\p{Latin}.*.* <moveonly> # Latin + non-Latin
(?!(User|Wikipedia)( talk)?:|Talk:)\P{L}*.*\p{Latin}.* <moveonly> # Latin + non-Latin
(?!(User|Wikipedia)( talk)?:|Talk:)\P{L}*\p{Greek}.*.* <moveonly> # Greek + non-Greek
(?!(User|Wikipedia)( talk)?:|Talk:)\P{L}*.*\p{Greek}.* <moveonly> # Greek + non-Greek
(?!(User|Wikipedia)( talk)?:|Talk:)\P{L}*\p{Cyrillic}.*.* <moveonly> # Cyrillic + non-Cyrillic
(?!(User|Wikipedia)( talk)?:|Talk:)\P{L}*.*\p{Cyrillic}.* <moveonly> # Cyrillic + non-Cyrillic
# Slightly different regexp for user/project/talk pages, to allow e.g. Latin subpages of Cyrillic usernames:
((User|Wikipedia)( talk)?:|Talk:)(.*\/)?\P{L}*\p{Latin}*.* <moveonly> # Latin + non-Latin
((User|Wikipedia)( talk)?:|Talk:)(.*\/)?\P{L}**\p{Latin}.* <moveonly> # Latin + non-Latin
((User|Wikipedia)( talk)?:|Talk:)(.*\/)?\P{L}*\p{Greek}*.* <moveonly> # Greek + non-Greek
((User|Wikipedia)( talk)?:|Talk:)(.*\/)?\P{L}**\p{Greek}.* <moveonly> # Greek + non-Greek
((User|Wikipedia)( talk)?:|Talk:)(.*\/)?\P{L}*\p{Cyrillic}*.* <moveonly> # Cyrillic + non-Cyrillic
((User|Wikipedia)( talk)?:|Talk:)(.*\/)?\P{L}**\p{Cyrillic}.* <moveonly> # Cyrillic + non-Cyrillic
.*(\P{L}*){4}.* <casesensitive | moveonly> # Non-Latin all caps
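The mixed-script entries above flag move targets that combine Latin, Greek or Cyrillic letters with letters from another script. For the user, project and talk namespaces they look at subpages separately, so a Latin subpage of a Cyrillic username is allowed. Python's `re` has no `\p{Script}` support, so the rough approximation below takes each letter's script from the first word of its `unicodedata` name ("LATIN", "GREEK", "CYRILLIC", ...). For the exempted namespaces it checks each "/"-separated segment on its own. This is a hypothetical sketch, not a translation of the exact character classes. Like the patterns, it runs on the full title, so a Latin namespace prefix such as `Help:` counts toward mixing.

```python
import unicodedata

WATCHED = {"LATIN", "GREEK", "CYRILLIC"}
# Prefixes exempted by the negative lookahead in the first six entries.
EXEMPT_PREFIXES = ("User:", "User talk:", "Wikipedia:", "Wikipedia talk:", "Talk:")


def scripts_of(text):
    """Approximate the set of scripts used by the letters in `text`."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            scripts.add(name.split(" ", 1)[0] if name else "UNKNOWN")
    return scripts


def is_mixed(text):
    s = scripts_of(text)
    return len(s) > 1 and bool(s & WATCHED)


def mixed_script(title):
    """True if `title` would look like a mixed-script move target."""
    for prefix in EXEMPT_PREFIXES:
        if title.startswith(prefix):
            # Check each subpage segment separately, as the second group does.
            return any(is_mixed(seg) for seg in title[len(prefix):].split("/"))
    return is_mixed(title)
```

A title that looks Latin but hides a Cyrillic "а" (U+0430) is caught, while a purely Greek title is not.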
# Block a particular bot
AOL user message bot .* <newaccountonly>
# GENERIC IMAGE FILE NAMES (with custom error message)
# at most three letters of potentially meaningful text:
(Image|File):\P{L}*((Ima?ge?|Pict?(ure)?|Media|Photo)\P{L}+)?(\p{L}\P{L}*){0,3}((orig|copy|thumb|small)\P{L}*)?\.+ <reupload | errmsg=titleblacklist-custom-imagename>
# no more than two contiguous letters (raising to three would be tempting, but needs more testing):
(Image|File):\P{L}*((Ima?ge?|Pict?(ure)?|Media|Photo)\P{L}+)?(\p{L}{1,2}\P{L}+)*((\p{L}{1,2}|orig|copy|thumb|small)\P{L}*)?\.+ <reupload | errmsg=titleblacklist-custom-imagename>
# month name followed by no more than two contiguous letters, JPEG suffix (be careful if you edit this, easy to trigger false positives):
(Image|File):\P{L}*(January|Jan|February|Febr?|March|Mar|April|Apr|May|June?|July?|August|Aug|September|Sept?|October|Oct|November|Nov|December|Dec)(\P{L}+\p{L}{1,2})*\P{L}*\.JPE?G <reupload | errmsg=titleblacklist-custom-imagename>
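The first generic-name rule accepts at most three letters, each optionally followed by non-letters. Those letters may be preceded by a word like "Image" or "Photo" and followed by "orig", "copy", "thumb" or "small". The Python sketch below restates that rule. It rests on two assumptions. First, Python's `re` lacks `\p{L}`, so `[^\W\d_]` and `[\W\d_]` stand in for `\p{L}` and `\P{L}`. Second, the file extension is assumed to be a dot followed by letters. `GENERIC_NAME` and `is_generic` are hypothetical names.

```python
import re

L = r"[^\W\d_]"   # approximates \p{L}
NL = r"[\W\d_]"   # approximates \P{L}

# At most three letters of "meaningful" text, as in the first rule above.
GENERIC_NAME = re.compile(
    rf"(?:Image|File):{NL}*(?:(?:Ima?ge?|Pict?(?:ure)?|Media|Photo){NL}+)?"
    rf"(?:{L}{NL}*){{0,3}}(?:(?:orig|copy|thumb|small){NL}*)?\.[a-z]+",
    re.IGNORECASE,
)


def is_generic(title):
    return GENERIC_NAME.fullmatch(title) is not None
```

A camera default like `File:DSC_0042.JPG` (three letters) is caught, while a descriptive name like `File:Eiffel Tower at night.jpg` passes.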
# Common digital camera file names, based on list at http://diddly.com/random/about.html
# See also MediaWiki:Filename-prefix-blacklist, used to generate a warning on the upload form
(Image|File):DCP\d+\.JPG <reupload | errmsg=titleblacklist-custom-imagename> # Kodak
(Image|File):DSC.\d+\.JPG <reupload | errmsg=titleblacklist-custom-imagename> # Design rule for Camera File system (Nikon, Fuji, Polaroid)
(Image|File):MVC-?\d+\.JPG <reupload | errmsg=titleblacklist-custom-imagename> # Sony Mavica
(Image|File):P\d+\.JPG <reupload | errmsg=titleblacklist-custom-imagename> # Olympus, Kodak
(Image|File):I?MG?\d+\.JPG <reupload | errmsg=titleblacklist-custom-imagename> # Canon, Pentax
(Image|File):1\d+-\d+(_IMG)?\.JPG <reupload | errmsg=titleblacklist-custom-imagename> # Canon
(Image|File):(IM|EX)\d+\.JPG <reupload | errmsg=titleblacklist-custom-imagename> # HP Photosmart
(Image|File):DC\d+\.JPG <reupload | errmsg=titleblacklist-custom-imagename> # Kodak
(Image|File):PIC?\d+\.JPG <reupload | errmsg=titleblacklist-custom-imagename> # Minolta
(Image|File):PANA\d+\.JPG <reupload | errmsg=titleblacklist-custom-imagename> # Panasonic
(Image|File):DUW\d+\.JPG <reupload | errmsg=titleblacklist-custom-imagename> # some mobile phones
(Image|File):CIMG\d+\.JPG <reupload | errmsg=titleblacklist-custom-imagename> # Casio
(Image|File):JD\d+\.JPG <reupload | errmsg=titleblacklist-custom-imagename> # Jenoptik
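Each camera entry above pairs a filename pattern with the manufacturer named in its comment. The sketch below copies a few of those patterns verbatim into a lookup table, so that a file title can be traced back to the camera that produced its default name. It assumes whole-title, case-insensitive matching. `CAMERA_PATTERNS` and `camera_source` are hypothetical names.

```python
import re

# Patterns copied from the entries above; keys are the comments on those lines.
CAMERA_PATTERNS = {
    "Kodak": r"DCP\d+\.JPG",
    "Design rule for Camera File system (Nikon, Fuji, Polaroid)": r"DSC.\d+\.JPG",
    "Sony Mavica": r"MVC-?\d+\.JPG",
    "Panasonic": r"PANA\d+\.JPG",
    "Casio": r"CIMG\d+\.JPG",
}


def camera_source(title):
    """Return the source named for the first pattern matching `title`, else None."""
    for source, pattern in CAMERA_PATTERNS.items():
        if re.fullmatch(rf"(?:Image|File):{pattern}", title, re.IGNORECASE):
            return source
    return None
```

Keeping each pattern next to its manufacturer, as the blacklist comments do, makes it easy to see which entry fired.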
# Other common patterns
(Image|File):\d{9}{6}_{2}\P{L}*\.\w+ <reupload | errmsg=titleblacklist-custom-imagename> # some image hosting site?
(Image|File):\d{8,}_{10}(_)?\P{L}*\.\w+ <reupload | errmsg=titleblacklist-custom-imagename> # another image hosting site?
# (Image|File):(\d{9,10})+?\.\w+ <reupload | errmsg=titleblacklist-custom-imagename> # yet another image hosting site? (redundant to "no more than two contiguous letters")
(Image|File):({8}-)?{4}-{4}-{4}-?{12}.* <reupload | errmsg=titleblacklist-custom-imagename> # UUID (with some variations included)
(Image|File):(|\d+)_{10,}(-\d+-|_?(\w\w?|full))?\.+ <reupload | errmsg=titleblacklist-custom-imagename> # L_9173c67eae58edc35ba7f2df08a7d5c6.jpg, 2421601587_abaf4e3e81.jpg, 1_bf38bcd9c5512a5ab99ca2219a4b1e2f_full.gif, etc.
(Image|File):\P{L}*No\P{L}*name\P{L}*\.+ <reupload | errmsg=titleblacklist-custom-imagename> # Noname2.jpg
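The hash-style rule near the end of the list targets names like the examples in its comment: a short prefix, an underscore, a long run of hexadecimal-looking characters, and an optional suffix such as `_full`. The sketch below is a hypothetical reconstruction. `[A-Z]` and `[0-9a-f]` are assumed stand-ins for the entry's character classes, and the extension is assumed to be a dot followed by letters. Only the overall shape is taken from the entry: prefix or digits, `_`, ten or more characters, optional `-digits-` or `_xx`/`_full`. The example names come from the entry's comment.

```python
import re

# Hypothetical: [A-Z], [0-9a-f] and \.[a-z]+ are assumed stand-ins.
HASH_NAME = re.compile(
    r"(?:Image|File):(?:[A-Z]|\d+)_[0-9a-f]{10,}(?:-\d+-|_?(?:\w\w?|full))?\.[a-z]+",
    re.IGNORECASE,
)


def looks_like_hash_name(title):
    return HASH_NAME.fullmatch(title) is not None
```

All three names quoted in the entry's comment match, while an ordinary descriptive name such as `File:Mount_Everest_2006.jpg` does not.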