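Module:Data consistency check
This module checks the validity and internal consistency of the language, language family and script data modules used on Wiktionary, including Category:語言資料模块 and Module:scripts/data.
Output
Discrepancies detected: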
- 赫爾尼基語, the canonical name for the code xum-her, is wrong; it should be 赫爾尼基語.
- 赫爾尼基語, the canonical name for the code xum-her, is wrong; it should be 赫爾尼基語.
- The data key preprocess_links for ??? (th-new) is invalid.
- 赫爾尼基語 (xum-her) has a canonical name that is not unique; it is also used by the code xhr.
- Phla–Pherá, the canonical name for alv-pph, is repeated in the table of aliases.
- 古印度-雅利安語支 (inc-old) has no child families or languages.
- 書面挪威語 (nb) has 中古挪威語 (gmq-mno) set as an ancestor, but is not in the 西斯堪地那維亞語支 (gmq-wes).
- 書面挪威語 (nb) has 丹麥語 (da) set as an ancestor, but is not in the 東斯堪地那維亞語支 (gmq-eas).
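- 巴格里語 (bfy) has a canonical name that is not unique; it is also used by the code bgq.
- 巴特里語 (bgw) has a canonical name that is not unique; it is also used by the code btv.
- 比里語 (bhb) has a canonical name that is not unique; it is also used by the code bzr.
- 波拉語 (boa) has a canonical name that is not unique; it is also used by the code bxd.
- Dakaka, the canonical name for bpa, is repeated in the table of otherNames.
- 巴薩語 (bsq) has a canonical name that is not unique; it is also used by the code bas.
- 巴薩語 (bzw) has a canonical name that is not unique; it is also used by the code bas.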
- Shuba, the canonical name for cbq, is repeated in the table of otherNames.
- Tiri, the canonical name for cir, is repeated in the table of otherNames.
- Cakchiquel-Quiché Mixed Language, the canonical name for ckz, is repeated in the table of aliases.
- Maa, the canonical name for cma, is repeated in the table of aliases.
- Island Carib, the canonical name for crb, is repeated in the table of otherNames.
- 達伊語 (dax) has a canonical name that is not unique; it is also used by the code dij.
- Ngen, the canonical name for gnj, is repeated in the table of otherNames.
- 加勒比印度斯坦語 (hns) has 博杰普爾語 (bho) set as an ancestor, but is not in the 比哈爾語 (inc-bih).
- 加勒比印度斯坦語 (hns) has 阿瓦德語 (awa) set as an ancestor, but is not in the 東印地語支 (inc-hie).
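- Numee, the canonical name for kdk, is repeated in the table of aliases.
- 凱克語 (keh) has a canonical name that is not unique; it is also used by the code kzq.
- 科龍語 (klm) has a canonical name that is not unique; it is also used by the code kyo.
- 凱特語 (kmg) has a canonical name that is not unique; it is also used by the code ket.
- 馬卡揚語 (kmx) has a canonical name that is not unique; it is also used by the code aup.
- 科拉克語 (koz) has a canonical name that is not unique; it is also used by the code hhr.
- 科維語 (kvc) has a canonical name that is not unique; it is also used by the code kqb.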
- Looma, the canonical name for lom, is repeated in the table of otherNames.
- Larevat, the canonical name for lrv, is repeated in the table of aliases.
- Mafea, the canonical name for mkv, is repeated in the table of aliases.
- Mae, the canonical name for mme, is repeated in the table of aliases.
- 艾西語 (mmq) has a canonical name that is not unique; it is also used by the code ahs.
- 西部湘西苗語 (mmr) has a canonical name that is not unique; it is also used by the code muq.
- 曼丁哥語 (mnk) has a canonical name that is not unique; it is also used by the code man.
- Marino, the canonical name for mrb, is repeated in the table of aliases.
- Merlav, the canonical name for mrm, is repeated in the table of aliases.
- 穆西語 (mui) has a canonical name that is not unique; it is also used by the code mse.
- Central Maewo, the canonical name for mwo, is repeated in the table of aliases.
- 洪語 (nev) has a canonical name that is not unique; it is also used by the code hnu.
- Yuaga, the canonical name for nua, is repeated in the table of aliases.
- 奧米語 (omi) has a canonical name that is not unique; it is also used by the code aom.
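- 布那語 (pbv) has a canonical name that is not unique; it is also used by the code bvn.
- 普泰語 (pht) has a canonical name that is not unique; it is also used by the code mfl.
- Pwaamei, the canonical name for pme, is repeated in the table of aliases.
- 波爾語 (pmm) has a canonical name that is not unique; it is also used by the code blf.
- Paunaca, the canonical name for pnk, is repeated in the table of otherNames.
- 帕語 (ppt) has a canonical name that is not unique; it is also used by the code pai.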
- 拉瓦語 (rwo) has a canonical name that is not unique; it is also used by the code luf.
- Zire, the canonical name for sih, is repeated in the table of aliases.
- 薩姆語 (snx) has a canonical name that is not unique; it is also used by the code raq.
- 泰雷諾語 (tiv) has a canonical name that is not unique; it is also used by the code ter.
- Uripiv-Wala-Rano-Atchin, the canonical name for upv, is repeated in the table of aliases.
- Ura (New Guinea), the canonical name for uro, is repeated in the table of otherNames.
- Lehalurup, the canonical name for urr, is repeated in the table of aliases.
- 馬庫瓦語 (vmw) has a canonical name that is not unique; it is also used by the code lva.
- Banam Bay, the canonical name for vrt, is repeated in the table of aliases.
- Yanomámi, the canonical name for wca, is repeated in the table of aliases.
- 莫語 (wkd) has a canonical name that is not unique; it is also used by the code mkg.
- Wè Northern, the canonical name for wob, is repeated in the table of otherNames.
- 卡里亞語 (xcr) has a canonical name that is not unique; it is also used by the code khr.
- Indus Valley Language, the canonical name for xiv, is repeated in the table of otherNames.
- 卡納西語 (xns) has a canonical name that is not unique; it is also used by the code soq.
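- Nyâlayu, the canonical name for yly, is repeated in the table of aliases.
- 帕拉語 (ypa) has a canonical name that is not unique; it is also used by the code plq.
- Yaroamë, the canonical name for yro, is repeated in the table of otherNames.
- 貝納語 (yun) has a canonical name that is not unique; it is also used by the code bez.
- 尤奇語 (yuq) has a canonical name that is not unique; it is also used by the code yuc.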
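- 原始布那-卡西-林甘語 (aav-pkl-pro) is the proto-language of 布那-卡西-林甘語語支 (aav-pkl), but it does not use the expected name 「原始布那-卡西-林甘語語」.
- The name Kelantan Peranakan Hokkien is found twice or more in the list of aliases for 吉蘭丹峇峇 (mis-hkl).
- Proto-Cangin (alv-cng-pro) is the proto-language of Cangin (alv-cng), but it does not use the expected name 「原始Cangin」.
- Proto-Edekiri (alv-edk-pro) is the proto-language of 埃德基里語支 (alv-edk), but it does not use the expected name 「原始埃德基里語」.
- Proto-Fali (alv-fli-pro) is the proto-language of Fali (alv-fli), but it does not use the expected name 「原始Fali」.
- Proto-Guang (alv-gng-pro) is the proto-language of Guang (alv-gng), but it does not use the expected name 「原始Guang」.
- 原始中多哥語 (alv-gtm-pro) is the proto-language of 加納-多哥山區語支 (alv-gtm), but it does not use the expected name 「原始加納-多哥山區語」.
- Proto-Heiban (alv-hei-pro) is the proto-language of Heiban (alv-hei), but it does not use the expected name 「原始Heiban」.
- Proto-Idomoid (alv-ido-pro) is the proto-language of Idomoid (alv-ido), but it does not use the expected name 「原始Idomoid」.
- Proto-Igboid (alv-igb-pro) is the proto-language of Igboid (alv-igb), but it does not use the expected name 「原始Igboid」.
- Proto-Kwa (alv-kwa-pro) is the proto-language of 庫阿語支 (alv-kwa), but it does not use the expected name 「原始庫阿語」.
- Proto-Mumuye (alv-mum-pro) is the proto-language of Mumuye (alv-mum), but it does not use the expected name 「原始Mumuye」.
- Proto-Arnhem (aus-arn-pro) is the proto-language of Arnhem (aus-arn), but it does not use the expected name 「原始Arnhem」.
- Proto-Daly (aus-dal-pro) is the proto-language of Daly (aus-dal), but it does not use the expected name 「原始Daly」.
- 原始伊瓦伊賈語 (aus-wdj-pro) is the proto-language of Iwaidjan (aus-wdj), but it does not use the expected name 「原始Iwaidjan」.
- Proto-Amuesha-Chamicuro (awd-amc-pro) has a proto-language code associated with the invalid code awd-amc.
- Proto-Kampa (awd-kmp-pro) has a proto-language code associated with the invalid code awd-kmp.
- Maypure, the canonical name for awd-mpr, is repeated in the table of aliases.
- Proto-Nawiki (awd-nwk-pro) is the proto-language of Nawiki (awd-nwk), but it does not use the expected name 「原始Nawiki」.
- Passé, the canonical name for awd-pas, is repeated in the table of aliases.
- Proto-Paresi-Waura (awd-prw-pro) has a proto-language code associated with the invalid code awd-prw.
- Proto-Cupan (azc-cup-pro) is the proto-language of Cupan (azc-cup), but it does not use the expected name 「原始Cupan」.
- Proto-Takic (azc-tak-pro) is the proto-language of Takic (azc-tak), but it does not use the expected name 「原始Takic」.
- 原始阿布哈茲-阿巴扎語 (cau-abz-pro) is the proto-language of 阿布哈茲-阿巴札語族 (cau-abz), but it does not use the expected name 「原始阿布哈茲-阿巴札語」.
- Proto-Andian (cau-and-pro) is the proto-language of Andian (cau-and), but it does not use the expected name 「原始Andian」.
- Proto-Masa (cdc-mas-pro) is the proto-language of Masa (cdc-mas), but it does not use the expected name 「原始Masa」.
- Proto-Caddoan (cdd-pro) is the proto-language of Caddoan (cdd), but it does not use the expected name 「原始Caddoan」.
- 原始布立吞語 (cel-bry-pro) is the proto-language of 布立吞亞支 (cel-bry), but it does not use the expected name 「原始布立吞亞支」.
- Proto-Chimakuan (chi-pro) is the proto-language of Chimakuan (chi), but it does not use the expected name 「原始Chimakuan」.
- Proto-Bongo-Bagirmi (csu-bba-pro) is the proto-language of Bongo-Bagirmi (csu-bba), but it does not use the expected name 「原始Bongo-Bagirmi」.
- Proto-Mangbetu (csu-maa-pro) is the proto-language of Mangbetu (csu-maa), but it does not use the expected name 「原始Mangbetu」.
- Proto-Sara (csu-sar-pro) is the proto-language of Sara (csu-sar), but it does not use the expected name 「原始Sara」.
- 原始魯凱語 (dru-pro) has a proto-language code associated with 魯凱語 (dru), which is not a family.
- Proto-Gbaya (gba-pro) is the proto-language of Gbaya (gba), but it does not use the expected name 「原始Gbaya」.
- 原始諾爾斯語 (gmq-pro) is the proto-language of 北日耳曼語支 (gmq), but it does not use the expected name 「原始北日耳曼語」.
- 原始卡姆塔語 (inc-krn-pro) is the proto-language of 卡姆塔土話 (inc-krn), but it does not use the expected name 「原始卡姆塔土話」.
- 原始安納托利亞語 (ine-ana-pro) is the proto-language of 安那托利亞語族 (ine-ana), but it does not use the expected name 「原始安那托利亞語」.
- 原始桑格萊奇伊什卡什米語 (ira-sgi-pro) is the proto-language of Sanglechi-Ishkashimi (ira-sgi), but it does not use the expected name 「原始Sanglechi-Ishkashimi」.
- 原始舒格南羅尚語 (ira-shr-pro) is the proto-language of Shughni-Roshani (ira-shr), but it does not use the expected name 「原始Shughni-Roshani」.
- 原始舒格南雅茲古拉米語 (ira-shy-pro) is the proto-language of Shughni-Yazghulami (ira-shy), but it does not use the expected name 「原始Shughni-Yazghulami」.
- 原始舒格南雅茲古拉米蒙賈尼語 (ira-sym-pro) is the proto-language of Shughni-Yazghulami-Munji (ira-sym), but it does not use the expected name 「原始Shughni-Yazghulami-Munji」.
- 原始扎扎其古拉尼語 (ira-zgr-pro) is the proto-language of 扎扎-古拉尼語支 (ira-zgr), but it does not use the expected name 「原始扎扎-古拉尼語」.
- 原始日語 (jpx-pro) is the proto-language of 日本-琉球語系 (jpx), but it does not use the expected name 「原始日本-琉球語」.
- 原始漢特語 (kca-pro) is the proto-language of 漢特語組 (kca), but it does not use the expected name 「原始漢特語組」.
- 原始克木語 (mkh-khm-pro) is the proto-language of Khmuic (mkh-khm), but it does not use the expected name 「原始Khmuic」.
- 原始莽語 (mkh-pkn-pro) is the proto-language of Pakanic (mkh-pkn), but it does not use the expected name 「原始Pakanic」.
- 原始曼西語 (mns-pro) is the proto-language of 曼西語組 (mns), but it does not use the expected name 「原始曼西語組」.
- 原始楚馬什語 (nai-chu-pro) is the proto-language of 丘馬什語系 (nai-chu), but it does not use the expected name 「原始丘馬什語」.
- 原始契努克語 (nai-ckn-pro) is the proto-language of 奇努克語系 (nai-ckn), but it does not use the expected name 「原始奇努克語」.
- 原始卡拉普亞語 (nai-klp-pro) is the proto-language of Kalapuyan (nai-klp), but it does not use the expected name 「原始Kalapuyan」.
- Proto-Maidun (nai-mdu-pro) is the proto-language of Maiduan (nai-mdu), but it does not use the expected name 「原始Maiduan」.
- Proto-Plateau Penutian (nai-plp-pro) is the proto-language of Plateau Penutian (nai-plp), but it does not use the expected name 「原始Plateau Penutian」.
- 原始托托索克語 (nai-tot-pro) is the proto-language of Totozoquean (nai-tot), but it does not use the expected name 「原始Totozoquean」.
- Proto-Tsimshianic (nai-tsi-pro) is the proto-language of Tsimshianic (nai-tsi), but it does not use the expected name 「原始Tsimshianic」.
- Proto-Utian (nai-utn-pro) is the proto-language of 烏蒂語族 (nai-utn), but it does not use the expected name 「原始烏蒂語」.
- Proto-Eastern Oti-Volta (nic-eov-pro) is the proto-language of Eastern Oti-Volta (nic-eov), but it does not use the expected name 「原始Eastern Oti-Volta」.
- Proto-Gurunsi (nic-gns-pro) is the proto-language of Gurunsi (nic-gns), but it does not use the expected name 「原始Gurunsi」.
- Proto-Grassfields (nic-grf-pro) is the proto-language of Grassfields (nic-grf), but it does not use the expected name 「原始Grassfields」.
- Proto-Gur (nic-gur-pro) is the proto-language of Gur (nic-gur), but it does not use the expected name 「原始Gur」.
- Proto-Jukunoid (nic-jkn-pro) is the proto-language of Jukunoid (nic-jkn), but it does not use the expected name 「原始Jukunoid」.
- Proto-Lower Cross River (nic-lcr-pro) is the proto-language of 下克羅斯河語支 (nic-lcr), but it does not use the expected name 「原始下克羅斯河語」.
- Proto-Ogoni (nic-ogo-pro) is the proto-language of Ogoni (nic-ogo), but it does not use the expected name 「原始Ogoni」.
- Proto-Oti-Volta (nic-ovo-pro) is the proto-language of Oti-Volta (nic-ovo), but it does not use the expected name 「原始Oti-Volta」.
- Proto-Plateau (nic-plt-pro) is the proto-language of Plateau (nic-plt), but it does not use the expected name 「原始Plateau」.
- Proto-Ubangian (nic-ubg-pro) is the proto-language of 烏班吉語支 (nic-ubg), but it does not use the expected name 「原始烏班吉語」.
- Proto-Dizoid (omv-diz-pro) is the proto-language of Dizoid (omv-diz), but it does not use the expected name 「原始Dizoid」.
- 原始奧塞提亞語 (os-pro) has a proto-language code associated with 奧塞梯語 (os), which is not a family.
- Proto-Kalamian (phi-kal-pro) is the proto-language of Kalamian (phi-kal), but it does not use the expected name 「原始Kalamian」.
- 原始哈馬黑拉-鳥頭灣語 (poz-hce-pro) is the proto-language of 馬黑拉-鳥頭灣語支 (poz-hce), but it does not use the expected name 「原始馬黑拉-鳥頭灣語」.
- 原始東部波利尼西亞語 (poz-pep-pro) is the proto-language of 東波利尼西亞語 (poz-pep), but it does not use the expected name 「原始東波利尼西亞語」.
- Proto-Kadu (qfa-kad-pro) is the proto-language of Kadu (qfa-kad), but it does not use the expected name 「原始Kadu」.
- 原始仡佬語 (qfa-kra-pro) is the proto-language of 仡央語群 (qfa-kra), but it does not use the expected name 「原始仡央語」.
- 原始侗台語 (qfa-tak-pro) is the proto-language of 壯侗語系 (qfa-tak), but it does not use the expected name 「原始壯侗語」.
- Proto-Quechuan (qwe-pro) is the proto-language of 克丘亞語系 (qwe), but it does not use the expected name 「原始克丘亞語」.
- Proto-Boran (sai-bor-pro) is the proto-language of Boran (sai-bor), but it does not use the expected name 「原始Boran」.
- Proto-Taranoan (sai-tar-pro) is the proto-language of Taranoan (sai-tar), but it does not use the expected name 「原始Taranoan」.
- Wayumará, the canonical name for sai-way, is repeated in the table of aliases.
- Proto-Witotoan (sai-wit-pro) is the proto-language of Witotoan (sai-wit), but it does not use the expected name 「原始Witotoan」.
- Proto-Daju (sdv-daj-pro) is the proto-language of Daju (sdv-daj), but it does not use the expected name 「原始Daju」.
- Proto-Eastern Jebel (sdv-eje-pro) is the proto-language of Eastern Jebel (sdv-eje), but it does not use the expected name 「原始Eastern Jebel」.
- Proto-Nilotic (sdv-nil-pro) is the proto-language of 尼羅語支 (sdv-nil), but it does not use the expected name 「原始尼羅語」.
- Proto-Nyima (sdv-nyi-pro) is the proto-language of Nyima (sdv-nyi), but it does not use the expected name 「原始Nyima」.
- Proto-Taman (sdv-tmn-pro) is the proto-language of Taman (sdv-tmn), but it does not use the expected name 「原始Taman」.
- 原始白語 (sit-bai-pro) is the proto-language of 白語組 (sit-bai), but it does not use the expected name 「原始白語組」.
- Proto-Koman (ssa-kom-pro) is the proto-language of Koman (ssa-kom), but it does not use the expected name 「原始Koman」.
- 原始台語 (tai-pro) is the proto-language of 壯傣語支 (tai), but it does not use the expected name 「原始壯傣語」.
- 原始西南台語 (tai-swe-pro) is the proto-language of 西南壯傣語支 (tai-swe), but it does not use the expected name 「原始西南壯傣語」.
- 原始庫基-欽語 (tbq-kuk-pro) is the proto-language of 庫基語支 (tbq-kuk), but it does not use the expected name 「原始庫基語」.
- 巴拉語 (tuw-bal) has a canonical name that is not unique; it is also used by the code bao.
- 原始通古斯語 (tuw-pro) is the proto-language of 滿-通古斯語系 (tuw), but it does not use the expected name 「原始滿-通古斯語」.
- 原始薩馬提亞語 (xsc-sar-pro) has a proto-language code associated with the invalid code xsc-sar.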
- 噶哈巫語 (map-kxv) has data in Module:languages/data/exceptional, but does not have corresponding data in Module:languages/data/exceptional/extra.
- 太魯閣語 (map-trv) has data in Module:languages/data/exceptional, but does not have corresponding data in Module:languages/data/exceptional/extra.
- apc is set as an ISO 639-3 code on multiple items: Q56593 and Q22809485.
- kjv is set as an ISO 639-3 code on multiple items: Q838165 and Q31199873.
- msn is set as an ISO 639-3 code on multiple items: Q3331111 and Q3563857.
- ttt is set as an ISO 639-3 code on multiple items: Q56489 and Q123964178.
- 摩斯電碼, the canonical name for the code Morse, is wrong; it should be 摩爾斯電碼.
- Sunuwar, the code for the canonical name 蘇努瓦爾文, is wrong; it should be Sunu.
- 摩斯電碼, the canonical name for the code Morse, is wrong; it should be 摩爾斯電碼.
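- 布列斯符號 (Blis) is not used by any language and has no characters listed for auto-detection.
- 塞普勒斯-米諾斯文字 (Cpmn) is not used by any language.
- 平假名 (Hira) is not used by any language.
- 假名 (Hrkt) is not used by any language.
- 圖像渲染 (Imag) is not used by any language and has no characters listed for auto-detection.
- 國際音標 (Ipach) is not used by any language and has no characters listed for auto-detection.
- Moon (Moon) is not used by any language and has no characters listed for auto-detection.
- 摩爾斯電碼 (Morse) is not used by any language and has no characters listed for auto-detection.
- 音樂記號 (Music) is not used by any language.
- 未指定文字 (None) is not used by any language and has no characters listed for auto-detection.
- 朗格朗格 (Roro) is not used by any language and has no characters listed for auto-detection.
- 盧米文數字 (Rumin) is not used by any language.
- 旗語 (Semap) is not used by any language and has no characters listed for auto-detection.
- Visible Speech (Visp) is not used by any language and has no characters listed for auto-detection.
- 數學記號 (Zmth) is not used by any language.
- 符號 (Zsym) is not used by any language.
- 未定文字 (Zyyy) is not used by any language and has no characters listed for auto-detection.
- 未編碼文字 (Zzzz) is not used by any language and has no characters listed for auto-detection.
- The codes fa-Arab, ug-Arab, ks-Arab, ps-Arab, ur-Arab, tt-Arab, ota-Arab, ku-Arab, mzn-Arab and sd-Arab are currently alias codes. Only one code should be used in the data.
- The codes ms-Arab and kk-Arab are currently alias codes. Only one code should be used in the data.
- The data key sort_by_scraping for 日文 (Jpan) is invalid.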
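Checks performed
For multiple data modules:
- Codes for languages, families and etymology languages must be unique and must not conflict with one another.
- The canonical name of a language, family or etymology language must not appear in its list of other names.
- Each name in the list of other names must appear only once.
- otherNames, if present, must be an array.
- A Wikidata item ID must be a positive integer, or a string that starts with Q and ends with decimal digits.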
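The shared rules above can be sketched in plain Lua. This is a minimal standalone illustration, not the module's own code: the helper names `is_valid_wikidata_id` and `check_names` are hypothetical, and the sketch treats a missing ID as invalid, whereas the module simply skips the check when no ID is given.

```lua
-- Hypothetical helper: accepts a positive integer, or a string made of
-- "Q" followed only by decimal digits.
local function is_valid_wikidata_id(id)
	if type(id) == "number" then
		return id > 0 and id % 1 == 0
	elseif type(id) == "string" then
		return id:find("^Q%d+$") ~= nil
	end
	return false
end

-- Hypothetical helper: returns a list of problems found in a list of
-- other names (the canonical name repeated, or a name listed twice).
local function check_names(canonical_name, names)
	local problems, seen = {}, {}
	for _, name in ipairs(names) do
		if name == canonical_name then
			problems[#problems + 1] = name .. " repeats the canonical name"
		end
		if seen[name] then
			problems[#problems + 1] = name .. " appears more than once"
		end
		seen[name] = true
	end
	return problems
end
```

With these definitions, `check_names("Maa", {"Maa", "Mã", "Mã"})` would report two problems: one for the repeated canonical name and one for the duplicated other name.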
The data used by Module:languages must meet the following conditions:
- Each code must be defined in the correct submodule according to whether it is two-letter, three-letter or exceptional.
- The canonical name (field 1) must be present and must not be the same as the canonical name of another language.
- If field 2 is not nil, it must be a valid Wikidata item ID.
- If field 3 or family is given and not nil, it must be a valid family code.
- If field 4 or scripts is given and not nil, it must be an array, and each string in the array must be a valid script code.
- If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code.
- If family is given, it must be a valid family code.
- If type is given, it must be one of the recognised values (regular, reconstructed, appendix-constructed).
- If entry_name is given, it must be a table that contains either two arrays (from and to) or a string (remove_diacritics) or both.
- If sort_key is given, it may either be a string, or a table that in turn contains either two arrays (from and to) or a string (remove_diacritics).
- If entry_name or sort_key is given, the from array must be longer than or equal in length to the to array.
- If standardChars is given, it must form a valid Lua string pattern when placed between square brackets with ^ before it ("[^...]"). (It should match all characters regularly used in the language, but that cannot be tested.)
- If override_translit is set, translit must also be set, because there must be a transliteration module that can override manual transliteration.
- If link_tr is present, it must be true.
- Have no data keys besides these: 1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases", "varieties", "type", "scripts", "ancestors", "wikimedia_codes", "wikipedia_article", "standardChars", "translit", "override_translit", "link_tr".
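Two of the conditions above, the key whitelist and the from/to length rule, can be illustrated with a small standalone sketch. The entry below is invented for illustration (the name "Examplish" and the key `colour` are hypothetical), and the helpers `invalid_keys` and `from_covers_to` are not part of the module.

```lua
-- Keys accepted in a language entry, copied from the list above.
local valid_keys = {}
for _, key in ipairs {
	1, 2, 3, "entry_name", "sort_key", "display", "otherNames", "aliases",
	"varieties", "type", "scripts", "ancestors", "wikimedia_codes",
	"wikipedia_article", "standardChars", "translit", "override_translit",
	"link_tr",
} do
	valid_keys[key] = true
end

-- Hypothetical entry, invented for illustration only.
local entry = {
	"Examplish", "Q1", "qfa-iso",
	type = "regular",
	entry_name = {from = {"á", "é"}, to = {"a"}},
	colour = "blue", -- deliberately invalid key
}

-- Collects every key of the entry that is not in the whitelist.
local function invalid_keys(data)
	local found = {}
	for key in pairs(data) do
		if not valid_keys[key] then
			found[#found + 1] = tostring(key)
		end
	end
	table.sort(found)
	return found
end

-- The from array must be at least as long as the to array.
local function from_covers_to(replacements)
	return #replacements.from >= #replacements.to
end
```

Here `invalid_keys(entry)` would return only `colour`, and `from_covers_to(entry.entry_name)` holds because two from items map onto one to item.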
Checks not performed:
- If translit is present, it should be the name of a module, and this module should contain a tr function that takes a pagename (and optionally a language code and script code) as arguments.
- If sort_key is a string, it should be the name of a module, and this module should contain a makeSortKey function that takes a pagename (and optionally a language code and script code) as arguments.
- If entry_name or sort_key is a table and contains a field remove_diacritics, the value of the field should be a string that forms a valid Lua pattern when it is placed inside negated set notation ([^...]).
This module does not check these items, because if the above conditions are not met, module errors will quickly appear in entries (for example, when Module:utilities tries to generate a sort key for a category related to the language, or when full_link tries to use the transliteration module).
Module:languages/code to canonical name and Module:languages/canonical names must contain all of the codes and canonical names found in the data submodules of Module:languages, and nothing else.
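A two-way comparison of this kind can be sketched as follows. The tables are miniature stand-ins rather than the real Wiktionary data, and the function name `compare` is hypothetical; the module itself performs the comparison with its own helpers.

```lua
-- Miniature stand-in for the language data (code -> data table).
local data = {
	aa = {"Afar"},
	ab = {"Abkhaz"},
}

-- Miniature stand-in for a code-to-name lookup that has drifted out of sync.
local code_to_name = {
	aa = "Afar",
	ab = "Abkhazian", -- stale canonical name
	zz = "Leftover",  -- code that is no longer in the data
}

-- Returns a table mapping each problematic code to a short description.
local function compare(data, code_to_name)
	local problems = {}
	for code, name in pairs(code_to_name) do
		local entry = data[code]
		if not entry then
			problems[code] = "should be removed"
		elseif entry[1] ~= name then
			problems[code] = "should be " .. entry[1]
		end
	end
	for code in pairs(data) do
		if code_to_name[code] == nil then
			problems[code] = "is missing"
		end
	end
	return problems
end
```

Checking in both directions is what enforces "all of the codes, and nothing else": the first loop catches stale or extra lookup entries, the second catches data entries the lookup lacks.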
The data used by Module:etymology languages must meet the following conditions:
- canonicalName must be given.
- parent must be given, and must be a valid language, family or etymology language code.
- If ancestors is given, it must be an array, and each string in the array must be a valid language or etymology language code. The etymology language should also be listed as the ancestor of a regular language.
- Have no data keys besides these: "canonicalName", "otherNames", "parent", "ancestors", "wikipedia_article", "wikidata_item".
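The parent rule can be illustrated with a short standalone sketch. The three lookup tables are tiny invented stand-ins for the real data modules, and `parent_is_valid` is a hypothetical helper, not a function of the module.

```lua
-- Invented stand-ins for the three kinds of valid codes.
local languages = {la = true}
local families = {itc = true}
local etym_languages = {
	["la-lat"] = {canonicalName = "Late Latin", parent = "la"},
	["xx-bad"] = {canonicalName = "Orphan", parent = "nope"}, -- hypothetical bad entry
	["xx-nil"] = {canonicalName = "Parentless"},              -- hypothetical bad entry
}

-- A parent is valid if it is given and names a language, a family
-- or another etymology language.
local function parent_is_valid(entry)
	local parent = entry.parent
	if parent == nil then
		return false
	end
	if languages[parent] or families[parent] or etym_languages[parent] then
		return true
	end
	return false
end
```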
Code data in Module:families must:
- Have canonicalName, which must not be the same as the canonical name of another family.
- If family is given, it must be a valid family code.
- Have at least one language or subfamily belonging to it.
- Have no data keys besides these: "canonicalName", "otherNames", "family", "protoLanguage", "wikidata_item".
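The "at least one language or subfamily" rule amounts to marking every family that something points to and reporting the rest. A minimal sketch, assuming miniature data tables (the family `xyz` is invented) and a hypothetical helper `empty_families`:

```lua
-- Miniature stand-ins for the family and language data.
local families = {
	gem = {canonicalName = "Germanic"},
	gmw = {canonicalName = "West Germanic", family = "gem"},
	xyz = {canonicalName = "Unused"}, -- hypothetical family with no members
}
local languages = {
	en = {"English", "Q1860", "gmw"},
}

-- Returns the sorted list of family codes that have neither a language
-- nor a subfamily belonging to them.
local function empty_families(families, languages)
	local nonempty = {}
	for _, data in pairs(languages) do
		if data[3] then
			nonempty[data[3]] = true
		end
	end
	for _, data in pairs(families) do
		if data.family then
			nonempty[data.family] = true
		end
	end
	local empty = {}
	for code in pairs(families) do
		if not nonempty[code] then
			empty[#empty + 1] = code
		end
	end
	table.sort(empty)
	return empty
end
```

In this sketch `gmw` is non-empty because English belongs to it, and `gem` is non-empty because `gmw` names it as its parent family, so only `xyz` is reported.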
Code data in Module:scripts must:
- Have canonicalName.
- Have at least one language that lists it as one of its scripts.
- Have a characters pattern for script autodetection, and this must form a valid Lua string pattern when placed between square brackets ("[...]"). (It should match all characters in the script, but that cannot be tested.)
- Have no data keys besides these: "canonicalName", "otherNames", "parent", "systems", "wikipedia_article", "characters", "direction".
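One way to test whether a characters value forms a valid set is to wrap it in square brackets and run it through a pattern function inside pcall, so that a malformed pattern is reported rather than raised. The sketch below uses the standard string.find as a stand-in for the Unicode-aware matcher and a hypothetical helper name `is_valid_set`.

```lua
-- Returns true if the value is a string that forms a valid Lua
-- character set once placed between square brackets.
local function is_valid_set(characters)
	if type(characters) ~= "string" then
		return false
	end
	-- pcall catches the "malformed pattern" error raised for a bad set;
	-- the extra parentheses keep only its boolean status.
	return (pcall(string.find, "", "[" .. characters .. "]"))
end
```

For instance, `"a-z"` is accepted, while `"a%"` is rejected because the trailing `%` escapes the closing bracket and leaves the set unterminated.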
-- TODO:
-- ietf_subtag field used with a 2/3-letter language/family code except qaa-qtz, or a 4-letter script code.
-- Check against files containing up-to-date ISO data, to cross-check validity.
local m_languages = require("Module:languages")
local m_language_data = require("Module:languages/data/all")
local m_language_codes = require("Module:languages/code to canonical name")
local m_language_canonical_names = require("Module:languages/canonical names")
local m_etym_language_data = require("Module:etymology languages/data")
local m_etym_language_codes = require("Module:etymology languages/code to canonical name")
local m_etym_language_canonical_names = require("Module:etymology languages/canonical names")
local m_family_data = require("Module:families/data")
local m_family_codes = require("Module:families/code to canonical name")
local m_family_canonical_names = require("Module:families/canonical names")
local m_scripts = require("Module:scripts")
local m_script_data = require("Module:scripts/data")
local m_links = require("Module:links")
local m_script_utils = require("Module:script utilities")
local m_str_utils = require("Module:string utilities")
local m_table = require("Module:table")
local Array = require("Module:array")
local codepoint = m_str_utils.codepoint
local concat = table.concat
local dump = mw.dumpObject
local gcodepoint = m_str_utils.gcodepoint
local get_lang = m_languages.getByCode
local insert = table.insert
local list_to_text = mw.text.listToText
local new_title = mw.title.new
local split = m_str_utils.split
local ugmatch = m_str_utils.gmatch
local umatch = m_str_utils.match
local export = {}
local messages
local function discrepancy(modname, ...)
local ok, result = pcall(function(...) messages[modname]:insert(string.format(...)) end, ...)
if not ok then
mw.log(result, ...)
end
end
local all_codes = {}
local language_names = {}
local etym_language_names = {}
local family_names = {}
local script_names = {}
local nonempty_families = {}
local allowed_empty_families = {tbq = true}
local nonempty_scripts = {}
do
local function link_lang(name)
if name:find("[Ll]anguage$") then
return "[[:Category:" .. name .. "|" .. name .. "]]"
else
return "[[:Category:" .. name .. "|" .. name .. "]]"
end
end
local function link_etym_lang(name)
if name:find("[Ll]anguage$") then
return name
else
return name
end
end
local function link_family(name)
if name:match("[Ll]anguages$") or name:match("[Ll]ects$") then
return "[[:Category:" .. name .. "|" .. name .. "]]"
else
return "[[:Category:" .. name .. "|" .. name .. "]]"
end
end
function export.link(data)
if not data[1] then
return "???"
end
local type = data.type
return type:match("etymology%-only") and link_etym_lang(data[1]) or
type:match("family") and link_family(data[1]) or
link_lang(data[1])
end
end
local link = export.link
local function link_script(name)
if not name then
return "???"
elseif name:find("[Cc]ode$") or name:find("[Ss]emaphore$") then
return "[[:Category:" .. name:gsub("^%l", string.upper) .. "|" .. name .. "]]"
else
return "[[:Category:" .. name .. "|" .. name .. "]]"
end
end
local function invalid_keys_message(modname, code, data, invalid_keys, is_script)
	local plural = #invalid_keys ~= 1
	discrepancy(modname, "The data key%s %s for %s (<code>%s</code>) %s invalid.",
		plural and "s" or "",
		invalid_keys
			:map(
				function(key)
					return "<code>" .. key .. "</code>"
				end)
			:concat(", "),
		-- link_script expects a canonical name, whereas link expects the whole
		-- data table; passing data[1] to link made it fall back to "???".
		is_script and link_script(data[1]) or link(data),
		code,
		plural and "are" or "is")
end
local function check_data_keys(valid_keys, is_script)
valid_keys = Array(valid_keys):to_set()
return function (modname, code, data)
local invalid_keys
for k in pairs(data) do
if not valid_keys[k] then
invalid_keys = invalid_keys or Array()
invalid_keys:insert(k)
end
end
if invalid_keys then
invalid_keys_message(modname, code, data, invalid_keys, is_script)
end
end
end
-- Modification of isArray in [[Module:table]].
-- This assumes all keys are either integers or non-numbers.
-- If there are fractional numbers, the results might be incorrect.
-- For instance, find_gap{"a", "b", [0.5] = true} evaluates to 3, but there
-- isn't a gap at 3 in the sense of there being an integer key greater than 3.
local function find_gap(t, can_contain_non_number_keys)
local i = 0
for k in pairs(t) do
if not (can_contain_non_number_keys and type(k) ~= "number") then
i = i + 1
if t[i] == nil then
return i
end
end
end
end
local function check_true_or_string_or_nil(modname, code, data, field_name)
local field = data[field_name]
if not (field == nil or field == true or type(field) == "string") then
discrepancy(modname,
"%s (<code>%s</code>) has an <code>%s</code> value that is not <code>nil</code>, <code>true</code> or a string: <code>%s</code>",
link(data), code, field_name,
dump(data[field_name])
)
end
end
local function check_array(modname, code, canonical_name, data, array_name, subarray_name, can_contain_non_number_keys)
local subtable = data
if subarray_name then
subtable = assert(data[subarray_name], subarray_name)
end
local array_type = type(subtable[array_name])
if array_type == "table" then
local gap = find_gap(subtable[array_name], can_contain_non_number_keys)
if gap then
discrepancy(modname, "The %s array in %sthe data table for %s (<code>%s</code>) has a gap at index %d.",
array_name,
subarray_name and "the " .. subarray_name .. " field in " or "",
canonical_name,
code, gap)
else
return true
end
else
discrepancy(modname, "The %s field in %sthe data table for %s (<code>%s</code>) should be an array (table) but is %s.",
array_name,
subarray_name and "the " .. subarray_name .. " field in " or "",
canonical_name,
code,
array_type == "nil" and "nil" or "a " .. array_type)
end
end
local function check_no_alias_codes(modname, mod_data)
local lookup, discrepancies = {}, {}
for k, v in pairs(mod_data) do
local check = lookup[v]
if check then
discrepancies[check] = discrepancies[check] or {"<code>" .. check .. "</code>"}
insert(discrepancies[check], "<code>" .. k .. "</code>")
else
lookup[v] = k
end
end
for _, v in pairs(discrepancies) do
discrepancy(modname, "The codes " .. list_to_text(v, ", ", " and ") .. " are currently alias codes. Only one code should be used in the data.")
end
end
local function check_wikidata_item(modname, code, data, key)
local data_item = data[key]
if data_item == nil then
return
elseif type(data_item) == "number" then
if not require "Module:table".isPositiveInteger(data_item) then
discrepancy(modname, "%g, the Wikidata item id for %s (<code>%s</code>), is not a positive integer or a string in the correct format.",
data_item, data[1], code)
end
elseif type(data_item) == "string" then
if not data_item:find "^Q%d+$" then
discrepancy(modname, "%s, the Wikidata item id for %s (<code>%s</code>), is not a string in the correct format or a positive integer.",
data_item, data[1], code)
end
end
end
local function check_other_names_or_aliases(modname, code, canonical_name, data, data_key, allow_nested)
local array = data[data_key]
if not array then
return
end
check_array(modname, code, canonical_name, data, data_key, nil, true)
local names = {}
local function check_other_name(other_name)
if other_name == canonical_name then
discrepancy(modname,
"%s, the canonical name for <code>%s</code>, is repeated in the table of <code>%s</code>.",
canonical_name, code, data_key)
end
if names[other_name] then
discrepancy(modname,
"The name %s is found twice or more in the list of <code>%s</code> for %s (<code>%s</code>).",
other_name, data_key, canonical_name, code)
end
names[other_name] = true
end
for _, other_name in ipairs(array) do
if type(other_name) == "table" then
if not allow_nested then
discrepancy(modname,
"A nested table is found in the list of <code>%s</code> for %s (<code>%s</code>), but isn't allowed.",
data_key, canonical_name, code)
else
for _, on in ipairs(other_name) do
check_other_name(on)
end
end
else
check_other_name(other_name)
end
end
end
local function check_other_names_aliases_varieties(modname, code, canonical_name, data)
if data.otherNames then
check_other_names_or_aliases(modname, code, canonical_name, data, "otherNames")
end
if data.aliases then
check_other_names_or_aliases(modname, code, canonical_name, data, "aliases")
end
if data.varieties then
check_other_names_or_aliases(modname, code, canonical_name, data, "varieties", true)
end
end
local function validate_pattern(pattern, modname, code, data, standardChars)
	if type(pattern) ~= "string" then
		discrepancy(modname, "\"%s\", the %spattern for %s (<code>%s</code>), is not a string.",
			tostring(pattern), standardChars and "standard character " or "", data[1], code)
		-- The remaining checks need a string, so stop here.
		return
	end
	-- Collect ranges such as "z-a" whose first codepoint is not below the second.
	local ranges
	for lower, higher in ugmatch(pattern, "(.)%-%%?(.)") do
		if codepoint(lower) >= codepoint(higher) then
			ranges = ranges or Array()
			insert(ranges, { lower, higher })
		end
	end
	if ranges and ranges[1] then
		local plural = #ranges ~= 1 and "s" or ""
		discrepancy(modname, "%s (<code>%s</code>) specifies an invalid pattern " ..
			"for %scharacter detection: <code>\"%s\"</code>. The first codepoint%s " ..
			"in the range%s %s must be less than the second.",
			link(data), code, standardChars and "standard " or "", pattern, plural, plural,
			ranges
				:map(
					function(range)
						return range[1] .. "-" .. range[2] .. (" (U+%X, U+%X)")
							:format(codepoint(range[1]), codepoint(range[2]))
					end)
				:concat(", "))
	end
	-- A malformed set raises an error inside umatch, which pcall reports as false.
	if not pcall(umatch, "", "[" .. pattern .. "]") then
		discrepancy(modname, "%s (<code>%s</code>) specifies an invalid pattern for " ..
			(standardChars and "standard " or "") .. "character detection: <code>\"%s\"</code>",
			link(data), code, pattern)
	end
end
local remove_exceptions_addition = 0xF0000
local maximum_code_point = 0x10FFFF
local remove_exceptions_maximum_code_point = maximum_code_point - remove_exceptions_addition
local function check_entry_name_or_sortkey(modname, code, data, replacements_name)
	local canonical_name = data[1]
	local replacements = data[replacements_name]
	if type(replacements) == "string" then
		if not (replacements_name == "sort_key" or replacements_name == "entry_name") then
			discrepancy(modname, "The %s field in the data table for %s (<code>%s</code>) must be a table.",
				replacements_name, canonical_name, code)
		end
		return
	end
	if (replacements.from ~= nil) ~= (replacements.to ~= nil) then
		discrepancy(modname,
			"The <code>from</code> and <code>to</code> arrays in the <code>%s</code> table for %s (<code>%s</code>) are not both defined or both undefined.",
			replacements_name, canonical_name, code)
	elseif replacements.from then
		for _, key in ipairs { "from", "to" } do
			check_array(modname, code, canonical_name, data, key, replacements_name)
		end
	end
	if replacements.remove_diacritics and type(replacements.remove_diacritics) ~= "string" then
		discrepancy(modname,
			"The <code>remove_diacritics</code> field in the <code>%s</code> table for %s (<code>%s</code>) must be a string.",
			replacements_name, canonical_name, code)
	end
	if replacements.remove_exceptions then
		if check_array(modname, code, canonical_name, data, "remove_exceptions", replacements_name) then
			for sequence_i, sequence in ipairs(replacements.remove_exceptions) do
				local code_point_i = 0
				for code_point in gcodepoint(sequence) do
					code_point_i = code_point_i + 1
					if code_point > remove_exceptions_maximum_code_point then
						discrepancy(modname,
							"Code point #%d (0x%04X) in field #%d of the <code>remove_exceptions</code> array for %s (<code>%s</code>) is over U+%04X.",
							code_point_i, code_point, sequence_i, canonical_name, code, remove_exceptions_maximum_code_point)
					end
				end
			end
		end
	end
	-- The to array may not be longer than the from array.
	if replacements.from and replacements.to
		and m_table.length(replacements.to) > m_table.length(replacements.from) then
		discrepancy(modname,
			"The <code>from</code> array in the <code>%s</code> table for %s (<code>%s</code>) must be longer than or the same length as the <code>to</code> array.",
			replacements_name, canonical_name, code)
	end
end
do
local function has_ancestor(lang, code)
for _, anc in ipairs(lang:getAncestors()) do
if code == anc:getCode() or has_ancestor(anc, code) then
return true
end
end
end
local function get_default_ancestors(lang)
if lang:hasType("etymology-only") then
local parent = lang:getParent()
if not has_ancestor(parent, lang:getCode()) then
return parent:getAncestorCodes()
end
end
local fam_code, def_anc = lang:getFamilyCode()
while fam_code and fam_code ~= "qfa-not" do
local fam = m_family_data[fam_code]
def_anc = fam.protoLanguage or
m_language_data[fam_code .. "-pro"] and fam_code .. "-pro" or
m_etym_language_data[fam_code .. "-pro"] and fam_code .. "-pro"
if def_anc and def_anc ~= lang:getCode() then
return {def_anc}
end
fam_code = fam[3]
end
end
local function iterate_ancestor(code, data, modname, anc_code, lang)
local anc = get_lang(anc_code, nil, true)
if not anc then
discrepancy(modname,
"%s (<code>%s</code>) lists the invalid language code <code>%s</code> as its ancestor.",
link(data), code, anc_code)
return
end
local anc_fam = anc:getFamily()
local anc_fam_code = anc_fam:getCode()
local def_ancs = get_default_ancestors(lang)
if def_ancs then
for _, def_anc in ipairs(def_ancs) do
def_anc = get_lang(def_anc, nil, true)
if def_anc and (
anc_code == def_anc:getCode() or
has_ancestor(def_anc, anc_code) or
def_anc:hasParent(anc_code) and not has_ancestor(anc, def_anc:getCode())
) then
discrepancy(modname,
"%s (<code>%s</code>) has the %s (<code>%s</code>) listed in its ancestor field, which is redundant, since it is determined to be ancestral automatically.",
link(data), code,
link(anc:getRawData()), anc_code)
end
end
end
if not lang:inFamily(anc_fam_code) then
discrepancy(modname,
"%s (<code>%s</code>) has %s (<code>%s</code>) set as an ancestor, but is not in the %s (<code>%s</code>).",
link(data), code,
link(anc:getRawData()), anc_code,
link(anc_fam:getRawData()), anc_fam_code)
end
local fam, proto = lang
repeat
fam = fam:getFamily()
proto = fam and fam:getProtoLanguage()
until proto or not fam or fam:getCode() == "qfa-not"
if proto and not (
proto:getCode() == anc:getCode() or
proto:hasAncestor(anc:getCode()) or
anc:hasAncestor(proto:getCode())
) then
local fam = lang:getFamily()
discrepancy(modname,
"%s (<code>%s</code>) is in the %s (<code>%s</code>) and has %s (<code>%s</code>) set as an ancestor, but it is not possible to form an ancestral chain between them.",
link(data), code,
link(fam:getRawData()), fam:getCode(),
link(anc:getRawData()), anc_code)
end
end
function export.check_ancestors(code, data, modname)
local ancestors = data.ancestors
if not ancestors then
return
elseif type(ancestors) == "string" then
ancestors = split(ancestors, "%s*,%s*", true)
end
local lang = get_lang(code, nil, true)
for _, anc in ipairs(ancestors) do
iterate_ancestor(code, data, modname, anc, lang)
end
end
end
local function check_code_to_name_and_name_to_code_maps(
source_module_type,
source_module_description,
code_to_module_map, name_to_code_map,
code_to_name_modname, code_to_name_module,
name_to_code_modname, name_to_code_module)
local aliases = require("Module:languages/data").aliases
local function check_code_and_name(modname, code, canonical_name)
-- Check the code is in code_to_module_map and that it didn't originate from the wrong data module.
local check_mod = code_to_module_map[code] or code_to_module_map[aliases[code]]
if not (check_mod and check_mod:match("^" .. source_module_type .. "/data")) then
if not name_to_code_map[canonical_name] then
discrepancy(modname,
"The code <code>%s</code> and the canonical name %s should be removed; they are not found in %s.",
code, canonical_name, source_module_description)
else
discrepancy(modname,
"<code>%s</code>, the code for the canonical name %s, is wrong; it should be <code>%s</code>.",
code, canonical_name, name_to_code_map[canonical_name])
end
elseif not name_to_code_map[canonical_name] then
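-- Use the module and code that were actually resolved above, so that alias codes do not index a nil entry.
local data_table = require("Module:" .. check_mod)[code_to_module_map[code] and code or aliases[code]]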
discrepancy(modname,
"%s, the canonical name for the code <code>%s</code>, is wrong; it should be %s.",
canonical_name, code, data_table[1])
end
end
for code, canonical_name in pairs(code_to_name_module) do
check_code_and_name(code_to_name_modname, code, canonical_name)
end
for canonical_name, code in pairs(name_to_code_module) do
check_code_and_name(name_to_code_modname, code, canonical_name)
end
end
local function check_extraneous_extra_data(
data_modname, data_module, extra_data_modname, extra_data_module)
for code, _ in pairs(extra_data_module) do
if not data_module[code] then
discrepancy(extra_data_modname,
"Language code <code>%s</code> is not found in [[Module:%s]], and should be removed from [[Module:%s]].",
code, data_modname, extra_data_modname
)
end
end
end
-- Just trying to not have a module error when someone puts a script code
-- in the position of a language code.
local function show_family_code(code)
if type(code) == "string" then
return "<code>" .. code .. "</code>"
else
return require("Module:debug").highlight_dump(code)
end
end
local function check_languages()
local check_language_data_keys = check_data_keys{
1, 2, 3, 4, -- canonical name, wikidata item, family, scripts
"display_text", "generate_forms", "entry_name", "sort_key",
"otherNames", "aliases", "varieties", "ietf_subtag",
"type", "ancestors",
"wikimedia_codes", "wikipedia_article", "standardChars",
"translit", "override_translit", "link_tr",
"dotted_dotless_i"
}
local function check_language(modname, code, data, mainData, extraData)
local canonical_name, lang_type = data[1], data.type
check_language_data_keys(modname, code, data)
if all_codes[code] then
discrepancy(modname, "The code <code>%s</code> is not unique; it is also defined in [[Module:%s]].", code, all_codes[code])
else
if not m_language_codes[code] then
discrepancy("languages/code to canonical name", "The code <code>%s</code> (%s) is missing.", code, canonical_name)
end
all_codes[code] = modname
end
if code:sub(-4) == "-pro" then
local fam_code = code:sub(1, -5)
local fam = get_lang(fam_code, nil, true, true)
if not fam then
discrepancy(modname,
"%s (<code>%s</code>) has a proto-language code associated with the invalid code <code>%s</code>.",
link(data), code, fam_code)
elseif not fam:hasType("family") then
discrepancy(modname,
"%s (<code>%s</code>) has a proto-language code associated with %s (<code>%s</code>), which is not a family.",
link(data), code, fam:getCanonicalName(), fam_code)
else
local expected_name = "原始" .. fam:getCanonicalName()
expected_name = mw.ustring.gsub(expected_name, "語[門系族支群]", "語") -- L10N
expected_name = mw.ustring.gsub(expected_name, "諸語言", "語")
if canonical_name ~= expected_name then
discrepancy(modname,
"Although %s (<code>%s</code>) is the proto-language of %s (<code>%s</code>), it does not use the expected name \"%s\".",
link(data), code, fam:getCategoryName(), fam_code, expected_name)
end
end
end
if not canonical_name then
discrepancy(modname, "Code <code>%s</code> has no canonical name specified.", code)
elseif language_names[canonical_name] then
discrepancy(modname,
"%s (<code>%s</code>) has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
link(data), code, language_names[canonical_name])
else
if not m_language_canonical_names[canonical_name] then
discrepancy("languages/canonical names", "The canonical name %s (<code>%s</code>) is missing.", canonical_name, code)
end
language_names[canonical_name] = code
end
check_wikidata_item(modname, code, data, 2)
if extraData then
check_other_names_aliases_varieties(modname, code, canonical_name, extraData)
end
if lang_type and not (lang_type == "regular" or lang_type == "reconstructed" or lang_type == "appendix-constructed") then
discrepancy(modname, "%s (<code>%s</code>) is of an invalid type <code>%s</code>.", link(data), code, data.type)
end
if mainData.aliases then
discrepancy(modname, "%s (<code>%s</code>) has the <code>aliases</code> key. This must be moved to [[Module:" .. modname .. "/extra]].", link(data), code)
end
if mainData.varieties then
discrepancy(modname, "%s (<code>%s</code>) has the <code>varieties</code> key. This must be moved to [[Module:" .. modname .. "/extra]].", link(data), code)
end
if mainData.otherNames then
discrepancy(modname, "%s (<code>%s</code>) has the <code>otherNames</code> key. This must be moved to [[Module:" .. modname .. "/extra]].", link(data), code)
end
if not extraData then
discrepancy(modname .. "/extra", "%s (<code>%s</code>) has data in [[Module:" .. modname .. "]], but does not have corresponding data in [[Module:" .. modname .. "/extra]].", link(data), code)
--elseif extraData.otherNames then
-- discrepancy(modname .. "/extra", "%s (<code>%s</code>) has <code>otherNames</code> key, but these should be changed to either <code>aliases</code> or <code>varieties</code>.", link(data), code)
end
local sc = data[4]
if sc then
if type(sc) == "string" then
sc = split(sc, "%s*,%s*", true)
end
if type(sc) == "table" then
if not sc[1] then
discrepancy(modname, "%s (<code>%s</code>) has no scripts listed.", link(data), code)
else
for _, sccode in ipairs(sc) do
local cur_sc = m_script_data[sccode]
if not (cur_sc or sccode == "All" or sccode == "Hants") then
discrepancy(modname,
"%s (<code>%s</code>) lists the invalid script code <code>%s</code>.",
link(data), code, sccode)
-- elseif not cur_sc.characters then
-- discrepancy(modname,
-- "%s (<code>%s</code>) lists a script without characters <code>%s</code> (%s).",
-- link(data), code, sccode, cur_sc[1])
end
nonempty_scripts[sccode] = true
end
end
else
discrepancy(modname,
"The %s field for %s (<code>%s</code>) must be a table or string.",
4, link(data), code)
end
end
if data.ancestors then
export.check_ancestors(code, data, modname)
end
if data[3] then
local family = data[3]
if not m_family_data[family] then
discrepancy(modname,
"%s (<code>%s</code>) has the invalid family code %s.",
link(data), code, show_family_code(family))
end
nonempty_families[family] = true
end
if data.sort_key then
check_entry_name_or_sortkey(modname, code, data, "sort_key")
end
if data.entry_name then
check_entry_name_or_sortkey(modname, code, data, "entry_name")
end
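-- The key accepted by check_language_data_keys is "display_text"; "display" is never valid here.
if data.display_text then
check_entry_name_or_sortkey(modname, code, data, "display_text")
end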
if data.standardChars then
if type(data.standardChars) == "table" then
local sccodes = {}
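-- sc is nil when no scripts are listed (and may be a non-table when field 4 is malformed).
for _, sccode in ipairs(type(sc) == "table" and sc or {}) do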
sccodes[sccode] = true
end
for sccode in pairs(data.standardChars) do
if not (sccodes[sccode] or sccode == 1) then
discrepancy(modname, "The field %s in the standardChars table for %s (<code>%s</code>) does not match any script for that language.",
sccode, link(data), code)
end
end
elseif type(data.standardChars) ~= "string" then
discrepancy(modname, "The standardChars field in the data table for %s (<code>%s</code>) must be a string or table.",
link(data), code)
end
end
check_true_or_string_or_nil(modname, code, data, "override_translit")
check_true_or_string_or_nil(modname, code, data, "link_tr")
if data.override_translit and not data.translit then
discrepancy(modname,
"%s (<code>%s</code>) has <code>override_translit</code> set, but no transliteration module",
link(data), code)
end
end
local function check_module(modname, test)
local mod_data = mw.loadData("Module:" .. modname)
local extra_modname = modname .. "/extra"
local extra_mod_data = mw.loadData("Module:" .. extra_modname)
for code, data in pairs(mod_data) do
test(modname, code, data)
check_language(modname, code, data, mod_data[code], extra_mod_data[code])
end
check_no_alias_codes(modname, mod_data)
check_no_alias_codes(extra_modname, extra_mod_data)
check_extraneous_extra_data(modname, mod_data, extra_modname, extra_mod_data)
end
-- Check two-letter codes
check_module(
"languages/data/2",
function(modname, code, data)
if not code:find("^[a-z][a-z]$") then
discrepancy(modname, "%s (<code>%s</code>) does not have a two-letter code.", link(data), code)
end
end
)
-- Check three-letter codes
for i = 0x61, 0x7A do -- a to z
local letter = string.char(i)
check_module(
"languages/data/3/" .. letter,
function(modname, code, data)
if not code:find("^" .. letter .. "[a-z][a-z]$") then
discrepancy(modname,
"%s (<code>%s</code>) does not have a three-letter code starting with \"<code>%s</code>\".",
link(data), code, letter)
end
end
)
end
-- Check exceptional codes
check_module(
"languages/data/exceptional",
function(modname, code, data)
if code:find("^[a-z][a-z][a-z]?$") then
discrepancy(modname, "%s (<code>%s</code>) has a two- or three-letter code.", link(data), code)
end
end
)
-- These checks must be done while all_codes only contains language codes:
-- that is, after language data modules have been processed, but before
-- etymology languages, families, and scripts have.
check_code_to_name_and_name_to_code_maps(
"languages",
"a submodule of [[Module:languages]]",
all_codes, language_names,
"languages/code to canonical name", m_language_codes,
"languages/canonical names", m_language_canonical_names
)
-- Check [[Template:langname-lite]]
local frame = mw.getCurrentFrame()
local content = new_title("Template:langname-lite"):getContent()
content = content:gsub("%<%!%-%-.-%-%-%>", "") -- remove comments
local match = ugmatch(content, "\n\t*|#*([^\n]+)=([^\n]*)")
while true do
local code, name = match()
if not code then return "OK" end
if code:len() > 1 and code ~= "default" then
for _, code in pairs(split(code, "|", true)) do
local lang = get_lang(code, nil, true, true)
if name:match("etymcode") then
local nonEtym_name = frame:preprocess(name)
-- lang is nil for an unknown code; that case is reported further down.
local nonEtym_real_name = lang and lang:getFullName()
if lang and nonEtym_name ~= nonEtym_real_name then
discrepancy("Template:langname-lite", "Code: <code>" .. code .. "</code>. Saw name: " .. nonEtym_name .. ". Expected name: " .. nonEtym_real_name .. ".")
end
-- Parenthesized so that only the first return value of gsub is passed on.
name = frame:preprocess((name:gsub("{{{allow etym|}}}", "1")))
elseif name:match("familycode") then
name = name:match("familycode|(.-)|") or name
end
if not lang then
discrepancy("Template:langname-lite", "Code: <code>" .. code .. "</code>. Saw name: " .. name .. ". Language not present in data.")
else
local real_name = lang:getCanonicalName()
if name ~= real_name then
discrepancy("Template:langname-lite", "Code: <code>" .. code .. "</code>. Saw name: " .. name .. ". Expected name: " .. real_name .. ".")
end
end
end
end
end
end
local function check_etym_languages()
local modname = "etymology languages/data"
local check_etymology_language_data_keys = check_data_keys{
1, 2, 3, 4, 5, -- canonical name, wikidata item, family, scripts, parent
"display_text", "generate_forms", "entry_name", "sort_key",
"otherNames", "aliases", "varieties", "ietf_subtag",
"type", "main_code", "ancestors",
"wikimedia_codes", "wikipedia_article", "standardChars",
"translit", "override_translit", "link_tr",
"dotted_dotless_i"
}
for code, data in pairs(m_etym_language_data) do
local canonical_name, parent =
data[1], data[5]
check_etymology_language_data_keys(modname, code, data)
if all_codes[code] then
discrepancy(modname, "The code <code>%s</code> is not unique; it is also defined in [[Module:%s]].", code, all_codes[code])
else
if not m_etym_language_codes[code] then
discrepancy("etymology languages/code to canonical name", "The code <code>%s</code> (%s) is missing.", code, canonical_name)
end
all_codes[code] = modname
end
if not canonical_name then
discrepancy(modname, "Code <code>%s</code> has no canonical name specified.", code)
elseif language_names[canonical_name] then
local m_canonical_lang = m_languages.getByCanonicalName(canonical_name, nil, true)
if not m_canonical_lang then
discrepancy(modname, "%s (<code>%s</code>) has a canonical name that cannot be looked up.",
link(data), code)
elseif data.main_code ~= m_canonical_lang:getCode() then
discrepancy(modname,
"%s (<code>%s</code>) has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
link(data), code, language_names[canonical_name])
end
else
if not m_etym_language_canonical_names[canonical_name] then
discrepancy("etymology languages/canonical names", "The canonical name %s (<code>%s</code>) is missing.", canonical_name, code)
end
etym_language_names[canonical_name] = code
end
check_other_names_aliases_varieties(modname, code, canonical_name, data)
if parent then
if type(parent) ~= "string" then
discrepancy(modname,
"Etymology-only %s (<code>%s</code>) has a parent language or family code that is %s rather than a string.",
link(data), code, parent == nil and "nil" or "a " .. type(parent))
elseif not (m_language_data[parent] or m_family_data[parent] or m_etym_language_data[parent]) then
discrepancy(modname,
"Etymology-only %s (<code>%s</code>) has invalid parent language or family code <code>%s</code>.",
link(data), code, parent)
end
nonempty_families[parent] = true
else
discrepancy(modname,
"Etymology-only %s (<code>%s</code>) has no parent language or family code.",
link(data), code)
end
if data.ancestors then
export.check_ancestors(code, data, modname)
end
if data[3] then
local family = data[3]
if not m_family_data[family] then
discrepancy(modname,
"%s (<code>%s</code>) has the invalid family code %s.",
link(data), code, show_family_code(family))
end
nonempty_families[family] = true
end
check_wikidata_item(modname, code, data, 2)
end
local checked = {}
for code, data in pairs(m_etym_language_data) do
local stack = {}
while data do
if checked[data] then
break
end
if stack[data] then
discrepancy(modname, "%s (<code>%s</code>) has a cyclic parental relationship to %s (<code>%s</code>)",
link(data), code,
link(m_etym_language_data[data[5]]), data.parent or data[5]
)
break
end
stack[data] = true
code, data = data[5], data[5] and m_etym_language_data[data[5]]
end
for data in pairs(stack) do
checked[data] = true
end
end
check_no_alias_codes(modname, m_etym_language_data)
check_code_to_name_and_name_to_code_maps(
"etymology languages",
"[[Module:etymology languages/data]]",
all_codes, etym_language_names,
"etymology languages/code to canonical name", m_etym_language_codes,
"etymology languages/canonical names", m_etym_language_canonical_names)
end
local function check_families()
local modname = "families/data"
local check_family_data_keys = check_data_keys{
1, 2, 3, -- canonical name, wikidata item, (parent) family
"type", "ietf_subtag",
"protoLanguage", "otherNames", "aliases", "varieties",
}
for code, data in pairs(m_family_data) do
check_family_data_keys(modname, code, data)
local canonical_name, family, protolang = data[1], data[3], data.protoLanguage
if all_codes[code] then
discrepancy(modname, "The code <code>%s</code> is not unique; it is also defined in [[Module:%s]].", code, all_codes[code])
else
if not m_family_codes[code] then
discrepancy("families/code to canonical name", "The code <code>%s</code> (%s) is missing.", code, canonical_name)
end
all_codes[code] = modname
end
if not canonical_name then
discrepancy(modname, "Code <code>%s</code> has no canonical name specified.", code)
elseif family_names[canonical_name] then
discrepancy(modname,
"%s (<code>%s</code>) has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
link(data), code, family_names[canonical_name])
else
if not m_family_canonical_names[canonical_name] then
discrepancy("families/canonical names", "The canonical name %s (<code>%s</code>) is missing.", canonical_name, code)
end
family_names[canonical_name] = code
end
if data[2] and type(data[2]) ~= "number" then
discrepancy(modname, "%s (<code>%s</code>) has a wikidata item value that is not a number or <code>nil</code>: %s", link(data), code, dump(data[2]))
end
check_other_names_aliases_varieties(modname, code, canonical_name, data)
if family then
if family == code and code ~= "qfa-not" then
discrepancy(modname,
"%s (<code>%s</code>) has itself as its family.",
link(data), code)
elseif not m_family_data[family] then
discrepancy(modname,
"%s (<code>%s</code>) has the invalid parent family code %s.",
link(data), code, show_family_code(family))
end
nonempty_families[family] = true
end
if protolang then
local protolang_obj = get_lang(protolang, nil, true)
if not protolang_obj then
discrepancy(modname,
"%s (<code>%s</code>) has the invalid proto-language code <code>%s</code>.",
canonical_name, code, protolang)
elseif protolang == code .. "-pro" then
discrepancy(modname,
"%s (<code>%s</code>) has %s (<code>%s</code>) listed as its proto-language, which is redundant, since it is determined to be the proto-language automatically.",
canonical_name, code,
protolang_obj:getCanonicalName(), protolang)
elseif protolang:sub(-4) == "-pro" then
discrepancy(modname,
"%s (<code>%s</code>) has %s (<code>%s</code>) listed as its proto-language, which is supposed to be the proto-language for the family <code>%s</code>.",
canonical_name, code,
protolang_obj:getCanonicalName(), protolang, protolang:sub(1, -5))
end
end
check_wikidata_item(modname, code, data, 2)
end
for code, data in pairs(m_family_data) do
if not (nonempty_families[code] or allowed_empty_families[code]) then
discrepancy(modname, "%s (<code>%s</code>) has no child families or languages.", link(data), code)
end
end
local checked = { ["qfa-not"] = true }
for code, data in pairs(m_family_data) do
local stack = {}
while data do
if checked[code] then
break
end
if stack[code] then
discrepancy(modname, "%s (<code>%s</code>) has a cyclic parental relationship to %s (<code>%s</code>)",
link(data), code,
link(m_family_data[data[3]]), data[3]
)
break
end
stack[code] = true
code, data = data[3], m_family_data[data[3]]
end
for code in pairs(stack) do
checked[code] = true
end
end
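--[==[
A minimal, self-contained sketch of the parent-chain walk above, run on a toy
table instead of m_family_data. It is a simplification for illustration, not
the code this module runs; the names `toy_parents` and `find_cycles` are
hypothetical.

	local toy_parents = { a = "b", b = "c", c = "a", d = "a", e = false }

	local function find_cycles(parents)
		local checked, cycles = {}, {}
		for start in pairs(parents) do
			local stack, code = {}, start
			while code and not checked[code] do
				if stack[code] then
					-- This code was already visited on the current walk: a cycle.
					cycles[#cycles + 1] = code
					break
				end
				stack[code] = true
				code = parents[code]
			end
			-- Everything on this walk is settled; later walks stop when they reach it.
			for seen in pairs(stack) do
				checked[seen] = true
			end
		end
		return cycles
	end

	-- The loop a -> b -> c -> a is reported once, whichever code the walk starts from.
	assert(#find_cycles(toy_parents) == 1)
]==]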
check_no_alias_codes(modname, m_family_data)
check_code_to_name_and_name_to_code_maps(
"families",
"[[Module:families/data]]",
all_codes, family_names,
"families/code to canonical name", m_family_codes,
"families/canonical names", m_family_canonical_names)
end
local function check_scripts()
local modname = "scripts/data"
local check_script_data_keys = check_data_keys({
1, 2, -- canonical name, writing systems
"canonicalName", "otherNames", "aliases", "varieties", "parent", "ietf_subtag",
"wikipedia_article", "ranges", "characters", "spaces", "capitalized", "translit", "direction",
"character_category", "normalizationFixes"
}, true)
local m_script_codes = require("Module:scripts/code to canonical name")
local m_script_canonical_names = require("Module:scripts/by name")
-- Just to satisfy requirements of check_code_to_name_and_name_to_code_maps.
local script_code_to_module_map = {}
for code, data in pairs(m_script_data) do
local canonical_name = data[1]
if not m_script_codes[code] and #code == 4 then
discrepancy("scripts/code to canonical name", "<code>%s</code> (%s) is missing.", code, canonical_name)
end
check_script_data_keys(modname, code, data)
if not canonical_name then
discrepancy(modname, "Code <code>%s</code> has no canonical name specified.", code)
elseif script_names[canonical_name] then
--[=[
discrepancy(modname,
"%s (<code>%s</code>) has a canonical name that is not unique; it is also used by the code <code>%s</code>.",
link_script(data.names[1]), code, script_names[data.names[1]])
--]=]
else
if not m_script_canonical_names[canonical_name] and #code == 4 then
discrepancy("scripts/by name", "%s (<code>%s</code>) is missing.", canonical_name, code)
end
script_names[canonical_name] = code
end
check_other_names_aliases_varieties(modname, code, canonical_name, data)
if not nonempty_scripts[code] then
discrepancy(modname,
"%s (<code>%s</code>) is not used by any language%s.",
link_script(canonical_name), code, data.characters and ""
or " and has no characters listed for auto-detection")
--[[
elseif not data.characters then
discrepancy(modname, "%s (<code>%s</code>) has no characters listed for auto-detection.", link_script(canonical_name), code)
--]]
end
if data.characters then
validate_pattern(data.characters, modname, code, data, false)
end
script_code_to_module_map[code] = modname
end
check_no_alias_codes(modname, m_script_data)
check_code_to_name_and_name_to_code_maps(
"scripts",
"a submodule of [[Module:scripts]]",
script_code_to_module_map, script_names,
"scripts/code to canonical name", m_script_codes,
"scripts/by name", m_script_canonical_names)
end
-- FIXME: this is quite messy.
local function check_wikidata_languages()
local data = mw.text.jsonDecode(new_title("Module:languages/data/wikidata.json"):getContent())
local seen = {{}, {}, {}, [5] = {}}
for _, item in ipairs(data) do
local id = item.id
for k, v in pairs(item) do
if k ~= "id" then
local _seen = seen[k]
for i, code in ipairs(v) do
local _code = code[1]
local _type = type(_seen[_code])
if _type == "table" then
insert(_seen[_code], id)
elseif _type == "string" then
_seen[_code] = {_seen[_code], id}
else
_seen[_code] = id
end
end
end
end
end
for k, v in pairs(seen) do
for code, ids in pairs(v) do
if type(ids) == "table" then
local t = {}
for i, id in ipairs(ids) do
t[i] = ("<code>[[d:%s|%s]]</code>"):format(id, id)
end
discrepancy("languages/data/wikidata.json", "<code>%s</code> is set as an ISO 639-%d code on multiple items: %s.",
code, k, list_to_text(t))
end
end
end
end
local function check_labels()
local check_label_data_keys = check_data_keys{
"display", "Wikipedia", "glossary",
"plain_categories", "topical_categories", "pos_categories", "regional_categories", "sense_categories",
"omit_preComma", "omit_postComma", "omit_preSpace",
"deprecated", "track"
}
local function check_label(modname, code, data)
local _type = type(data)
if _type == "table" then
check_label_data_keys(modname, code, data)
elseif _type ~= "string" then
discrepancy(modname,
"The data for label <code>%s</code> is a %s; only tables and strings are allowed.",
code, _type)
end
end
for _, module in ipairs{"", "/regional", "/topical"} do
local modname = "Module:labels/data" .. module
module = require(modname)
for label, data in pairs(module) do
check_label(modname, label, data)
end
end
for code in pairs(m_language_codes) do
local modname = "Module:labels/data/lang/" .. code
local ok, module = pcall(require, modname)
if ok then
for label, data in pairs(module) do
check_label(modname, label, data)
end
end
end
end
local function check_zh_trad_simp()
local m_ts = require("Module:zh/data/ts")
local m_st = require("Module:zh/data/st")
local ruby = require("Module:ja-ruby").ruby_auto
local lang = get_lang("zh")
local Hant = m_scripts.getByCode("Hant")
local Hans = m_scripts.getByCode("Hans")
local data = {[0] = m_st, m_ts}
local mod = {[0] = "st", "ts"}
local var = {[0] = "Simp.", "Trad."}
local sc = {[0] = Hans, Hant}
local function find_stable_loop(chars, other, j)
local display = ruby({["markup"] = "[" .. other .. "](" .. var[(j+1)%2] .. ")"})
display = m_links.language_link{term = other, alt = display, lang = lang, sc = sc[(j+1)%2], tr = "-"}
insert(chars, display)
if data[(j+1)%2][other] == other then
insert(chars, other)
return chars, 1
elseif not data[(j+1)%2][other] then
insert(chars, "not found")
return chars, 2
elseif data[j%2][data[(j+1)%2][other]] ~= other then
return find_stable_loop(chars, data[(j+1)%2][other], j + 1)
else
local display = ruby({["markup"] = "[" .. data[(j+1)%2][other] .. "](" .. var[j%2] .. ")"})
display = m_links.language_link{term = data[(j+1)%2][other], alt = display, lang = lang, sc = sc[j%2], tr = "-"}
insert(chars, display .. " (")
display = ruby({["markup"] = "[" .. data[j%2][data[(j+1)%2][other]] .. "](" .. var[(j+1)%2] .. ")"})
display = m_links.language_link{term = data[j%2][data[(j+1)%2][other]], alt = display, lang = lang, sc = sc[(j+1)%2], tr = "-"}
insert(chars, display .. " etc.)")
return chars, 3
end
return chars
end
for i = 0, 1, 1 do
for char, other in pairs(data[i]) do
if data[(i+1)%2][other] ~= char then
local chars, issue = {}
local display = ruby({["markup"] = "[" .. char .. "](" .. var[i] .. ")"})
display = m_links.language_link{term = char, alt = display, lang = lang, sc = sc[i], tr = "-"}
insert(chars, display)
chars, issue = find_stable_loop(chars, other, i)
if issue == 1 or issue == 2 then
local sc_this, mod_this, j = {}
if chars[#chars-1]:match(var[(i+1)%2]) then
j = 1
else
j = 0
end
mod_this = mod[(i+j)%2]
sc_this = {[0] = sc[(i+j)%2], sc[(i+j+1)%2]}
for k, char in ipairs(chars) do
chars[k] = m_script_utils.tag_text(char, lang, sc_this[k%2], "term")
end
if issue == 1 then
discrepancy("zh/data/" .. mod_this, "character references itself: " .. concat(chars, " → "))
elseif issue == 2 then
discrepancy("zh/data/" .. mod_this, "missing character: " .. concat(chars, " → "))
end
elseif issue == 3 then
for j, char in ipairs(chars) do
chars[j] = m_script_utils.tag_text(char, lang, sc[(i+j)%2], "term")
end
discrepancy("zh/data/" .. mod[i], "possible mismatched character: " .. concat(chars, " → "))
end
end
end
end
end
local function check_serialization(modname)
local serializers = {
["Hani-sortkey/data/serialized"] = "Hani-sortkey/serializer",
}
if not serializers[modname] then
return nil
end
local serializer = serializers[modname]
local current_data = require("Module:" .. serializer).main(true)
local stored_data = require("Module:" .. modname)
if current_data ~= stored_data then
discrepancy(modname, "<strong><u>Important!</u> Serialized data is out of sync. Use [[Module:" .. serializer .. "]] to update it. If you have made any changes to the underlying data, the serialized data <u>must</u> be updated before these changes will take effect.</strong>")
end
end
-- Warning: cannot be called twice in the same module invocation because
-- some module-global variables are not reset between calls.
function export.do_checks(modules)
messages = setmetatable({}, {
__index = function (self, k)
local val = Array()
self[k] = val
return val
end
})
if modules["zh/data/ts"] or modules["zh/data/st"] then
check_zh_trad_simp()
end
check_languages()
check_etym_languages()
-- families and scripts must be checked AFTER languages; languages checks fill out
-- the nonempty_families and nonempty_scripts tables, used for testing if a family/script
-- is ever used in the data
check_families()
check_scripts()
check_wikidata_languages()
if modules["labels/data"] then
check_labels()
end
for module in pairs(modules) do
check_serialization(module)
end
setmetatable(messages, nil)
local function find_code(message)
return string.match(message, "<code>([^<]+)</code>")
end
find_code = require("Module:fun").memoize(find_code)
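local function comp(message1, message2)
local code1, code2 = find_code(message1), find_code(message2)
-- Sort by the first code mentioned, with codeless messages last and the full
-- text as tie-breaker, so that the comparison is a consistent ordering.
if code1 ~= code2 then
if code1 and code2 then
return code1 < code2
end
return code1 ~= nil
end
return message1 < message2
end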
for _, msglist in pairs(messages) do
msglist:sort(comp)
end
local ret = messages
messages = nil
return ret
end
function export.format_message(modname, msglist)
local header; if modname:match("^Module:") or modname:match("^Template:") then
header = "===[[" .. modname .. "]]==="
else
header = "===[[Module:" .. modname .. "]]==="
end
return header
.. msglist
:map(
function(msg)
return "\n* " .. msg
end)
:concat()
end
function export.check_modules(args)
local modules = {}
for _, arg in ipairs(args) do
modules[arg] = true
end
local ret = Array()
local messages = export.do_checks(modules)
for _, module in ipairs(args) do
local msglist = messages[module]
if msglist then
ret:insert(export.format_message(module, msglist))
end
end
return ret:concat("\n")
end
function export.check_modules_t(frame)
local args = m_table.shallowcopy(frame.args)
return export.check_modules(args)
end
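--[==[
A usage sketch for the two entry points above. It assumes this module is saved
as [[Module:Data consistency check]] (the title shown on its documentation
page); adjust the title if it differs. The module names passed in are examples
only: each must match the name under which its discrepancies were filed.

From wikitext:
	{{#invoke:Data consistency check|check_modules_t|languages/data/2|families/data}}

From the Scribunto debug console or another module:
	local checker = require("Module:Data consistency check")
	mw.log(checker.check_modules{"languages/data/2", "families/data"})

As the warning on do_checks notes, only one such call should be made per module
invocation, because module-level state is not reset between calls.
]==]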
function export.perform(frame)
local messages = export.do_checks({})
-- Format the messages
local ret = Array()
for modname, msglist in m_table.sortedPairs(messages) do
ret:insert(export.format_message(modname, msglist))
end
-- Are there any messages?
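-- `i` was never defined here, so the success branch could not be reached; test the list itself.
if #ret == 0 then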
return "<b class=\"success\">Glory to Arstotzka.</b>"
else
ret:insert(1, "<b class=\"warning\">Discrepancies detected:</b>")
return ret:concat("\n")
end
end
return export