Wiktionary:Beer parlour/2018/March
March LexiSession: mathematics
This month, we suggest you focus on the words used to talk about mathematics. Yes, it's because of Pi Day, the 14th of March. As a starting point, you can have a look at Thesaurus:mathematics, and there are still plenty of domains to explore and to structure. Let's figure it out.
By the way, for those who do not know LexiSession yet, it is a collaborative transwiktionary experiment. You're invited to participate however you like and to suggest next month's topic. The idea is to look at other communities' improvements on the selected topic to improve our own pages. It has already brought new collaborators to contribute for the first time on a suggested topic. If you participate, please let us know here or on Meta, to keep track of the evolution of LexiSession. I hope there will be some people interested this month, and if you can spread it to another Wiktionary, you are welcome to do so. Ideally, LexiSession should be a booster for every Wiktionary on the same agenda, to give us more insight into the ways our colleagues work in the other projects.
By the way, it is the twentieth edition of LexiSession! Noé 10:56, 1 March 2018 (UTC)
- I did some Spanish entries and translations for some math terms. --Otra cuenta105 (talk) 11:04, 1 March 2018 (UTC)
- Time to crack open that Mongolian stats intro book. Crom daba (talk) 11:17, 1 March 2018 (UTC)
- There are LOTS of mathematical terms listed in Requests for definitions in English entries. Maybe this is a good excuse for someone to have a crack at them! Kiwima (talk) 02:34, 8 March 2018 (UTC)
- Great job here! In French Wiktionary we only made a new thesaurus but we haven't created so many entries! Noé 10:08, 5 April 2018 (UTC)
Unifying the display of romanisations in links and headwords: italicise romanisations by default
This follows from last month's topic Wiktionary:Beer parlour/2018/February#Inconsistent and confusing romanisation formats given by various templates and modules. I would like to propose that we make the display of romanisations in {{l}}, {{m}} and {{head}} consistent by italicising romanisations in an entry by default.
- Rationale: consistency, clarity, and professionality.
- Examples: Russian русский (russkij), Hindi युद्ध (yuddh), where italicised and unitalicised romanisations appear alternately in the entries.
Wyang (talk) 12:18, 1 March 2018 (UTC)
- Support --Per utramque cavernam (talk) 12:29, 1 March 2018 (UTC)
- Support. I really have to clean up that Hindi entry. —AryamanA (मुझसे बात करें • योगदान) 13:42, 1 March 2018 (UTC)
- Support. I've never understood why these display differently. ‑‑ Eiríkr Útlendi │Tala við mig 18:02, 1 March 2018 (UTC)
- Abstain – I understand the reasoning behind the current format, but don't mind it being changed. — Eru·tuon 23:05, 1 March 2018 (UTC)
- Abstain – Per Erutuon – Crom daba (talk) 00:06, 2 March 2018 (UTC)
- Caveated support: In line with {{lang}} over on Wikipedia. The implementation of a transcription param would be prerequisite, because I wouldn't want to see 𐫁𐫏𐫇𐫡 (bywr /bēwar/) --Victar (talk) 03:29, 2 March 2018 (UTC)
- Support, but keep romanizations in inflection tables that are on separate lines (as at e.g. алъдьи (alŭdĭi)) unitalicized. — Vorziblix (talk · contribs) 08:56, 4 March 2018 (UTC)
Speaking of transliterations, has anyone ever entertained the idea of switching them on and off at will (with a button or sumthin')? --Per utramque cavernam (talk) 22:23, 2 March 2018 (UTC)
- @Per utramque cavernam:, you can, just use |tr=-. --Victar (talk) 22:26, 2 March 2018 (UTC)
- @Victar: I'm thinking of a gadget that would let the user choose which scripts he wants to see transliterated, and which ones he doesn't.
- While I personally need translits for Devanagari (everywhere), I don't need them for Greek (anywhere); but others will be in the opposite situation.
- Thus, it would be convenient to be able to hide them on the entire website, without actually hardcoding |tr=-. --Per utramque cavernam (talk) 22:44, 2 March 2018 (UTC)
- @Per utramque cavernam: That can be easily done with some custom JS in your preferences. Or for a quick fix, just add something like
span[lang="el-Latn"] { display: none }
to your custom CSS file. --Victar (talk) 23:12, 2 March 2018 (UTC)
- @Victar That leaves empty parentheses and commas everywhere, nice idea though. Crom daba (talk) 23:59, 2 March 2018 (UTC)
- @Crom daba, that's why I said it should be done in JS. I thought you knew programming. --Victar (talk) 00:08, 3 March 2018 (UTC)
- You can use the below. It's going to remove |pos= too, which could be fixed, that's all from me.
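// Strip every parenthesised group from the parent of each Greek romanisation.
var elements = document.querySelectorAll('[lang="el-Latn"]');
[].forEach.call(elements, function (element) {
  var parent = element.parentElement;
  if (!parent) return; // detached when an earlier pass rewrote the same parent
  parent.innerHTML = parent.innerHTML.replace(/ *\([^)]*\) */g, "");
});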
- --Victar (talk) 02:38, 3 March 2018 (UTC)
- That's cool, although maybe we should rework our modules so that css can do it. Crom daba (talk) 15:19, 3 March 2018 (UTC)
- @Crom daba, no better time to learn some actual coding yourself. --Victar (talk) 17:56, 3 March 2018 (UTC)
- @Victar I have prior Javascript experience (although I was never fluent, and I wouldn't be able to write the above code without some heavy SO consultation), I just figured it would be more elegant to solve it with .css Crom daba (talk) 18:31, 3 March 2018 (UTC)
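The fix alluded to above (dropping the romanisation without also dropping |pos= and other parenthesised text) could be sketched roughly as follows. This is only an illustration: the helper name stripTransliterations is hypothetical, it works on a string of HTML rather than on the live page, and it assumes the romanisation sits alone inside its parentheses in a span carrying a lang attribute such as "el-Latn", which may not match the markup the modules actually emit.

```javascript
// Hypothetical helper: removes parenthesised groups whose only content is
// a span tagged with the given romanisation language code (e.g. "el-Latn").
// Other parenthesised text, such as a part-of-speech note, is left alone.
function stripTransliterations(html, langCode) {
  // Escape the language code so it is safe inside a regular expression.
  const code = langCode.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Optional leading whitespace, "(", one matching span, ")".
  const pattern = new RegExp(
    "\\s*\\(<span[^>]*\\blang=\"" + code + "\"[^>]*>[^<]*</span>\\)",
    "g"
  );
  return html.replace(pattern, "");
}

// Assumed sample markup, for illustration only.
const sample =
  '<span lang="el">λόγος</span> (<span lang="el-Latn" class="tr">lógos</span>) (noun)';
const cleaned = stripTransliterations(sample, "el-Latn");
```

Calling the helper once per language code would approximate the per-script choice discussed above. Groups that mix a romanisation with a gloss, such as a romanisation followed by a comma and a translation, are deliberately not matched by this sketch and would need separate handling.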
- Oppose Let's unify on non-italics. I don't see why the romanization should be in italics; the romanization is no more a mention than the term romanized. Better go for a simple typography. --Dan Polansky (talk) 21:53, 16 March 2018 (UTC)
- @Dan, I'm confused -- what do mentions have to do with it? ‑‑ Eiríkr Útlendi │Tala við mig 23:36, 16 March 2018 (UTC)
- In the referenced Beer parlour discussion, someone said "I believe the notion is that for scripts which we don't italicize in mentions (Russian, Greek, etc.), we italicize the romanization to show the distinction between the mention and non-mention formats."
- That's the only argument in support of italics that I could find at the time of my post.
- Now, below, someone said "Italicising romanisations of non-Latin-script foreign terms is a standard and internationally used practice in reference works and in academia": That is for romanizations in the middle of the sentence, right? Like below, we have an example: "The Arabic tāʾ marbūṭa is rendered a not ah." There, "tāʾ marbūṭa" is a mention. There, you might italicize a Czech term as well, where Czech uses roman letters by default.
- In "Hindi युद्ध (yuddh)", I don't see a need to italicize "yuddh". I admit that the Jaschke Dictionary for Tibetan does italicize romanizations, but there, they are followed by English text, whereas in the uses of {{m}} and {{l}}, they are not followed by English text.
- I looked in русский for visual inspection. The headword line in русский currently says "ру́сский • (rússkij)" without italics, and it looks just fine; italics would not help it in any way.
- What would make sense to me is italicizing romanizations in {{m}}, but not in {{head}}, {{l}} and {{t}}; this is because {{m}} can be used in the middle of the text and it italicizes roman script in general, whereas {{l}} does not italicize roman script in general.
- As for the rationale "consistency, clarity, and professionality" and as for legibility, italicizing in {{m}} but not in {{head}}, {{l}} and {{t}} would be consistent with what we do for roman script; as for clarity, I do not see how italics is clearer; as for professionality, I might admit that italics could be more in keeping with what other publications do, but doing something different when well justified is not necessarily unprofessional in any bad sense; as for legibility, there is no doubt in my mind that italics is less legible, especially on computer screens. --Dan Polansky (talk) 07:50, 17 March 2018 (UTC)
- On a procedural note, I created Wiktionary:Votes/2018-03/Showing romanizations in italics by default to ensure maximum audience. --Dan Polansky (talk) 08:05, 17 March 2018 (UTC)
- Oppose: I'm changing my vote to Dan's side. I don't think italics brings anything but less legibility. --Victar (talk) 22:40, 16 March 2018 (UTC)
- @Victar: Quite the contrary, actually.
- Italicising romanisations of non-Latin-script foreign terms is a standard and internationally used practice in reference works and in academia. Take a look at Wehr and Steingass for Arabic, Oxford Dictionary for Hindi, Steingass Dictionary for Persian, Monier-Williams Sanskrit Dictionary, Jaschke Dictionary for Tibetan, The Chicago Manual of Style, The Oxford Style Manual, ... and even Wikipedia (Iran).
- The International Journal of Middle East Studies guidelines become like this without italics:
- The Arabic tāʾ marbūṭa is rendered a not ah. In Persian it is ih. In Arabic iḍāfa constructions, it is rendered at: for example, thawrat 14 tammūz. The Persian izafat is rendered -i: for example, vilāyat-i faqīh. […]
- and this with italics:
- The Arabic tāʾ marbūṭa is rendered a not ah. In Persian it is ih. In Arabic iḍāfa constructions, it is rendered at: for example, thawrat 14 tammūz. The Persian izafat is rendered -i: for example, vilāyat-i faqīh. […]
- The basic meaning of italics is that “this Latin-script word is not English”, and legibility is precisely its advantage. Wyang (talk) 00:17, 17 March 2018 (UTC)
- @Wyang:, I don't think that's a comparable usage. In those examples, the foreign terms are only distinguished by being in italic. We however are enclosing them already in parentheses. And yes, I do think italics are less legible than normal text, in that it's harder to read, especially with diacritics and special characters. --Victar (talk) 01:00, 17 March 2018 (UTC)
- @Victar: Parentheses, quotation marks or not, the academic practice is that romanisations are italicised by default in running text, whenever they assume an auxiliary function to the script forms. We can find plenty of people writing 여름 yelum, 여름 (yelum), or yelum 여름, but hardly anyone writing 여름 (yelum) in proper works. Also see Citation Guidelines for Chinese-language Materials, § 2.3 In Parentheses. Such practice makes it easier for readers to parse the text and identify romanisations - and simply ignore them if the readers already know the script or language. 서울 (Seoul, “Seoul”) is much easier to parse than 서울 (Seoul, “Seoul”). To me, the current links on Reconstruction:Proto-Iranian/θanǰáyati are impossible to read. It's hard to make out which is which, and eyes become strained after reading a few lines, while the italic version makes the romanisations stand out aesthetically.
- There isn't really a reduced legibility of italic Latin text. We routinely italicise natively Latin-script terms in {{m}}, and some Latin-script languages are no less diacritics-heavy. For example, Vietnamese:
- ủ trái cây bằng đất đèn cho mau chín. Thường mấy trái cây non, bọn buôn nó hay giú khí đá cho mau chín, nên ăn mấy trái cây ấy có ra gì đâu.
- There is not really a legibility difference between the non-italic and italic sentences above. Our readers seem to read these italic letters with diacritics just fine too. Wyang (talk) 02:41, 17 March 2018 (UTC)
- @Wyang, all that you just expressed are preferences and opinions. I simply disagree with them. --Victar (talk) 02:45, 17 March 2018 (UTC)
- @Victar It's not preferences and opinions. It's what typically happens in academia and lexicography and the rationales behind it. You can surely disagree, but it's just unfortunate that votes can happen with complete disregard for what other way more established dictionaries adopt as standard practice, dismissed as professional preferences. Wyang (talk) 02:55, 17 March 2018 (UTC)
- @Wyang, again, that's simply not true; you're citing a standard for foreign terms within running English text. We don't have a need for it as we already use parentheses, so your argument is one of stylistic preference, not of functionality. --Victar (talk) 03:07, 17 March 2018 (UTC)
- @Victar Please have a closer look at the dictionary links I have given above. You are missing the point: this isn't a discussion on whether there is a functional need to italicise the romanisations. The point is: is there a stylistic need to do so? The answer is yes, on the grounds that:
- This is the standard practice in academia and lexicography. Note that the “standard practice” is not that dictionaries italicise romanisations following headwords; the practice is:
Parentheses, quotation marks or not, romanisations are italicised by default in text, whenever they assume an auxiliary function to the script forms.
When we flip through Wehr, Steingass, Oxford, Monier-Williams, Jaschke, etc., they are full of examples of romanisations after headwords, in parentheses, in quotation marks, on their own, whatever, but all romanisations are italic, regardless of the environment the romanisations are found in after the headwords. On a quick glance, there were 10 romanisations in parentheses on the Wehr page I gave before alone. This simultaneous consistency of formatting romanisations as italic in reference works is what we have failed to appreciate so far in our infrastructure. And, this standard practice in reference works is supported by good rationales which are very relevant to us too. Wyang (talk) 03:50, 17 March 2018 (UTC)
- @Wyang, don't be patronizing. It doesn't help your argument. I saw your links and understand the issue. I disagree and my points are above. I don't want to get into it with you any further. --Victar (talk) 04:10, 17 March 2018 (UTC)
- @Victar: Sorry if you found it patronizing. The discussion is directed towards the arguments, and none towards the people. Your points were: (1) that reference works adopt italicity out of functional necessity and that such necessity is nonexistent when romanisations are enclosed in parentheses, which is incorrect as the italicity was a universal stylistic preference in lexicography as shown above, independent of text environment; (2) that italicity brings about a reduced legibility, also unsupported by our routine italicisation of Latin-script terms, many of which are no less diacritics-heavy. This is not a personal preference vote; it is a site-wide style format change which must be carefully deliberated on, and I unfortunately did not find the arguments above sufficiently reasoning-robust to balance the overwhelming evidence of a standard practice of italicisation in reference works. Wyang (talk) 04:56, 17 March 2018 (UTC)
- Support --Anatoli T. (обсудить/вклад) 02:57, 17 March 2018 (UTC)
Passed and discussion closed. This and the February discussion have been around for sufficiently long- and as Korn said in the Japanese discussion below, those with interest have already expressed their opinions. No substantial opposing argumentation was put forth, compared to what we have as the literature and lexicographic evidence for unifying it as italicised. This affects and interests some people much more than others who may barely view and manage non-Latin-script entries. This isn't a popular polling station or a place for musings, but rather a “think tank” where arguments for and against should be proposed and evaluated in the presence of each other. Wyang (talk) 09:08, 17 March 2018 (UTC)
- FWIW I support this. I think efforts (by the same one stick in the mud as usual) to insist on further bureaucracy can be dismissed. - -sche (discuss) 15:32, 17 March 2018 (UTC)
- This has not been brought to a wide audience, the reasoning presented initially was nearly absent ("consistency, clarity, and professionality") and free of substantiation. This is why Wiktionary:Votes/2018-03/Showing romanizations in italics by default is the proper venue. The only way the vote can fail to pass is if there actually is not a consensus. --Dan Polansky (talk) 15:36, 17 March 2018 (UTC)
- And on votes sometimes being evil and such, let the reader read the top of this Beer parlour "discussion". There is no discussion; there are blank and nearly blank votes on the supporting side. A real discussion started only when I posted my oppose, and Victar changed his vote. --Dan Polansky (talk) 15:41, 17 March 2018 (UTC)
- To no surprise, I agree with Dan. I can't speak for each personally, but the support votes look more like "whatever" votes. I was also "whatever", but I thought more into it and ended up disagreeing. I think with a vote, people will put more thought into both sides of the argument, especially since only one side was initially given. Also, @-sche, no need for personal attacks --Victar (talk) 16:03, 17 March 2018 (UTC)
- As for: "arguments for and against should be proposed and evaluated in the presence of each other": Yes, let's. If I evaluate the arguments, I find in favor of my arguments. If I did not, I would change my vote above, right? Now what? That does not work. There is no mechanism of evaluation of arguments. The best we have is our venerable votes-cum-discussions. --Dan Polansky (talk) 16:22, 17 March 2018 (UTC)
I think the changes made towards italicized romanizations should be reverted immediately pending the completion of the vote. I find it completely inappropriate that @Wyang moved forward with this change. @Dan Polansky --Victar (talk) 03:59, 18 March 2018 (UTC)
- I agree. This is a longstanding format that most of us have become very used to and should not be changed without a vote. --WikiTiki89 15:01, 19 March 2018 (UTC)
- Not just gotten used to, but it nullifies important distinctions in Hittite and merges {{l}} and {{m}} for many languages. You were already desysopped for making changes without consensus and wheel warring. Please undo this change immediately before it escalates any further. --Victar (talk) 15:36, 19 March 2018 (UTC)
- Now that I've found where the change was made, I've reverted it. --WikiTiki89 15:45, 19 March 2018 (UTC)
- Thanks, @Wikitiki89. --Victar (talk) 15:52, 19 March 2018 (UTC)
Where tf were you before, when the discussions were ongoing? Wyang (talk) 19:15, 19 March 2018 (UTC)
Quoting what I've said elsewhere:
- Never are votes fair, not in any population, under any circumstance. All voting systems are inherently flawed, as shown by Arrow's impossibility theorem (proposed by Nobel prize winner Kenneth Arrow);
- The Wikimedia page Polls are evil summarises this perfectly;
- The preference on Wiktionary to determine and resolve everything by voting is a fucked-up mentality that only has negative consequences when applied to non-admin/bureaucrat decisions;
- The proposal is 'by default'. Languages can certainly have their own format when appropriate: an example is Japanese, where headword templates are specifically formatted to look aesthetic, with the various fonts of kana, romaji, etc. which is fair enough. The Chinese template {{zh-pron}} also conventionally does not italicise the romanisations.
- These things should be addressed in discussions, not in votes. They are helpful comments and opinions, and considerations. When forced to compress a comment into a camp, that's how the unfairness of polling manifests.
- Polls work because they are a cheap means of coercive consensus- this doesn't imply they are wise, and they are exactly the opposite of wisdom. If people are unable to formulate their thoughts in words in discussions and can only assign their 'thoughts' to support/oppose camps, that is just so wrong. Thoughts are very rarely bipolar, or oligopolar, like the options of a poll. In psychiatry, this is referred to as 'splitting' - black-and-white thinking, a pathological mode of psychology that we are forcing every participant of the poll to adopt.
- The increased audience of polls is a direct consequence of the above, as it additionally attracts people who simplemindedly classify their thoughts as 'support' or 'oppose', based on their certain preformed, and often biased, impressions on the issue without engaging in critical thinking. Past votes abound with these, and the typical vote is cringeworthy enough to only attract those enthusiastic in black-and-whitely assigning their opinions. The Chinese merger vote - look at the comments by those opposing and abstaining. The fact these voices are counted equally to others is just embarrassing.
- Regarding the supposed advantages of votes:
- Point #1 (“votes attract a larger audience”) is a drawback of polls, not a strength. The way polls attract more people is undesirable- we do not want a maximal audience for black-and-white thinking. Habitually creating polls to decide on everything in the community exactly helps subconsciously enforce the opinion that black-and-white thinking is perfectly nonpathological, and this is detrimental to the health of the community decision-making. It is the discussions that warrant a maximal audience, and there are alternative ways of achieving that.
- Point #2 (“many people aren't comfortable in discussions because they aren't good at expressing themselves”) is also a drawback of polls. The ability to formulate one's thoughts on an issue is an essential skill before one can thoroughly understand the issue. It is not an excessive requirement. There is no need for rhetoric and complex, ornate writing, but one has to show that one has attempted to consider the various aspects of the issue. It is not a comment such as "it conflates {{l}} and {{m}}, thus I oppose". Specifically, where is the reasoning that the harms of conflating {{l}} and {{m}} for non-Latin-script languages outweigh the benefits?
- And point #3 (“it reduces closer's bias”) is the evilest of the drawbacks of polls. The characteristic that all votes are weighted equally in a poll, regardless of the scrupulousness in the argument (if any), appears superficially just, but it is the biggest encouragement to simplistic, uninformed, and black-and-white thinking and causes this undesirable participation to iterate, vote after vote, and continuously contribute negatively to good decision-making. The fact the vote-closer feels powerless in discounting votes supported by outright uninformed reasoning, results in unwise decisions being made and a more daring voter next time, propagating and exacerbating the vicious cycle, to the point that everyone becomes indoctrinated about the acceptability of boldly casting uninformed votes, for the sake of "we can".
Wyang (talk) 02:41, 25 March 2018 (UTC)
- Wow. The abuse of Arrow's Theorem above is just fascinating. Sure, multiple-choice votes are vulnerable to speculative voting (=voting in which the voter casts a preference different from their real one), but the proposal under discussion is a two-choice vote, and therefore, Arrow's Theorem does not apply to it. (I learned the theorem and its proof in game theory classes at a university long time ago.) The argument that, since votes as a mechanism suffer from some anomalies, let's resort to autocracy or oligarchy as a mechanism, is a fallacy; the defects of autocracy and oligarchy are much graver. The strength of the argument is not a mechanism, since someone has to decide which arguments are strong and which are not. The strength of the argument principle all too often leads to interminable fights such as the one between Wyang and CodeCat some time ago; both combatants were convinced they had the strength of the argument on their side. The English Wiktionary is fortunate enough to have a long tradition of votes-cum-discussions. --Dan Polansky (talk) 09:53, 25 March 2018 (UTC)
- What the hell are you even talking about in your reply? The issue at hand is not a two-choice vote at all- read the title of the discussion "Unifying the display of romanisations in links and headwords: italicise romanisations by default". It is a proposal on the Beer Parlour with the aim of garnering opinions and comments, not a vote where you either support or oppose, or don't know, never in between. Some people choose to support, oppose, but there is the freedom of only voicing your opinion, comment, suggestion, etc. without having to attach a big support/oppose label in front of it. You converted it into a vote, with only three options of support, oppose and abstain. Now this is how the unfairness of voting manifests and how Arrow's Theorem applies. Your reasoning for oppose was "Let's unify on non-italics", which is actually half support, half oppose: you support the unification of display of romanisations, but oppose the italicisation. Yet you chose to stick a big red oppose even before you commented, not even caring about how you were actually half-support and half-oppose, but only that you opposed it (more important than the comment perhaps). Same as Victar's reasoning, that "italics brings anything but less legibility", an opposition to the italicisation, but not the unification.
- Accusing this of autocracy or oligarchy is even more ridiculous. This is exactly the opposite of that, where arguments are weighed more favourably than the individuals, or a "meritocracy of arguments". The merits of the arguments themselves are exactly what's evaluated, compared to polls which value every participant therein, regardless of whether the voter had thoroughly thought about the issue and the pros and cons of the proposal. The cold truth is that people don't think as clearly about an issue in votes, compared to discussions. Just observe the few recent votes and the level of expertise or deliberation required before someone casts a vote. It is precisely the source of the unfairness of votes, in that the same group of active participants, who habitually vote in one vote after another, deliberate inadequately while casting their votes over and over again. The decisions from such votes are of course going to be unwise. There are too many examples of votes like this already. Sure there will be few instances where relative argument strengths are comparable, but that will be when an arbitrator or bureaucrat should come in. Asking the commoner what they understand about an issue where the arguments are barely understandable surely gives the perfect platform for a personality contest. Wyang (talk) 10:13, 25 March 2018 (UTC)
- One thing: Arrow's Theorem does not apply to Wiktionary:Votes/2018-03/Showing romanizations in italics by default since it is a support-oppose vote, not a multi-option one, and therefore, speculative voting does not make sense in it. --Dan Polansky (talk) 10:41, 25 March 2018 (UTC)
- Huh? There are four options on your vote: "Support", "Oppose", "Abstain" and "Oppose having this vote". Wyang (talk) 11:23, 25 March 2018 (UTC)
- For the purpose of Arrow's Theorem, there are only two options/proposals: accept or reject a particular change. Abstain is not a separate proposal, and "Oppose having this vote" is a meta-option. The reader acquainted with Arrow's Theorem will realize the theorem does not apply to this vote, and it does not apply to a large majority of Wiktionary votes. Especially the notion that the presence of "Abstain" somehow lets Arrow's Theorem apply is as bizarre as it can get. Throwing in Arrow's Theorem for good measure because it seems anti-vote really is not the way to build a robust and correct argument. --Dan Polansky (talk) 11:35, 25 March 2018 (UTC)
- Wow. You really have no idea about the Theorem- the aim of the Theorem is not to accept or reject a particular change; it is to convert ranking of preferences by individuals into a community-wide ranking. As bizarre as it seems to you, the Theorem is not exclusive of the preference of abstention, which is just as reasonable as a preference for the individual. When the vote presents the voter with a set of four alternatives like the "Support", "Oppose", "Abstain" and "Oppose having this vote" in your vote, the alternative of "abstention" is a perfectly valid preference for voters, since the Theorem is not concerned with rejecting a proposal, but rather with arriving from the personal preferences of alternatives presented, at a "social" or collective ordering of those alternatives, which the Theorem proves is never possible. And remember that you are replying to me, not the reader. Wyang (talk) 11:55, 25 March 2018 (UTC)
- The preference ranking is over outcomes. There are two outcomes in the vote: change made, change not made. "Abstain" refrains from stating one's preferential ranking of outcomes. Arrow's Theorem does not apply; "In its strongest and simplest form, Arrow's impossibility theorem states that whenever the set A of possible alternatives has more than 2 elements, then the following three conditions become incompatible [...]" --WP. The abstain option does not increase the set of alternatives for the purpose of Arrow's Theorem. That is obvious; I do not know where to go from there. Someone better equipped to argue the obvious might wish to continue this conversation. --Dan Polansky (talk) 12:15, 25 March 2018 (UTC)
- Wow. You really have no idea about the Theorem- the aim of the Theorem is not to accept or reject a particular change; it is to convert ranking of preferences by individuals into a community-wide ranking. As bizarre as it seems to you, the Theorem is not exclusive of the preference of abstention, which is just as reasonable as a preference for the individual. When the vote presents the voter with a set of four alternatives like the "Support", "Oppose", "Abstain" and "Oppose having this vote" in your vote, the alternative of "abstention" is a perfectly valid preference for voters, since the Theorem is not concerned with rejecting a proposal, but rather with arriving from the personal preferences of alternatives presented, at a "social" or collective ordering of those alternatives, which the Theorem proves is never impossible. And remember that you are replying to me, not the reader. Wyang (talk) 11:55, 25 March 2018 (UTC)
- For the purpose of Arrow's Theorem, there are only two options/proposals: accept or reject a particular change. Abstain is not a separate proposal, and "Oppose having this vote" is a meta-option. The reader acquainted with Arrow's Theorem will realize the theorem does not apply to this vote, and it does not apply to a large majority of Wiktionary votes. Especially the notion that the presence of "Abstain" somehow lets Arrow's Theorem apply is as bizarre as it can get. Throwing in Arrow's Theorem for good measure because it seems anti-vote really is not a way to build a robust and correct argument. --Dan Polansky (talk) 11:35, 25 March 2018 (UTC)
- Huh? There are four options on your vote: "Support", "Oppose", "Abstain" and "Oppose having this vote". Wyang (talk) 11:23, 25 March 2018 (UTC)
- One thing: Arrow's Theorem does not apply to Wiktionary:Votes/2018-03/Showing romanizations in italics by default since it is a support-oppose vote, not a multi-option one, and therefore, speculative voting does not make sense in it. --Dan Polansky (talk) 10:41, 25 March 2018 (UTC)
- "Never are [things] fair, not in any population, under any circumstance. All [things] are inherently flawed." Arrow's Theorem is irrelevant; even if it applies, even if it comes into play (Arrow's Theorem is a worst-case theorem, not saying anything about any one vote, no matter how many outcomes), it's saying that there's no coherent overall preference, which is a fact not avoidable by any other method.--Prosfilaes (talk) 07:47, 26 March 2018 (UTC)
- Excellent point. --Dan Polansky (talk) 10:25, 30 March 2018 (UTC)
Could we please avoid further references to the "Wyang vs CodeCat 2016 match", which don't serve the discussion at all? --Per utramque cavernam (talk) 10:18, 25 March 2018 (UTC)
- Well, it is empirical evidence supporting my thesis. The reader should realize I am not making things up when I say that the strength-of-the-argument people really are often going for "my way, I have assessed the strength of the argument, and I am right." No fallacy of irrelevance here, as far as I can tell. --Dan Polansky (talk) 10:41, 25 March 2018 (UTC)
- @Per utramque cavernam: I find that an absurd request to make. All past behaviors are relevant to present and future dealings and recourse. Going forward with his proposed changes after the vote was created strikes me as a clear abuse of admin rights, and is reminiscent of previous transgressions. --Victar (talk) 13:28, 25 March 2018 (UTC)
- As devil's advocate, as it were, I'd like to point out that we have no guarantee that Wyang was aware that the vote had been created before he implemented his changes. Dan's comment stating that he'd created the vote is sandwiched in the middle of a long thread, and was only created roughly one hour before Wyang's next comment, which is conceivably close enough for Wyang to have missed that change. Note that I am merely seeking to point out that we have no definite evidence of abuse, or abusive intent. ‑‑ Eiríkr Útlendi │Tala við mig 20:00, 27 March 2018 (UTC)
- Some people just like using strong words to promote their cause. A significant edit after a seeming consensus (majority, not all) and a substantial discussion is definitely not an abuse of admin rights. --Anatoli T. (обсудить/вклад) 01:00, 28 March 2018 (UTC)
- <nod> However, since Wyang did bring up a few of my favourite Meta articles, I would also like to mention meta:Neutral votes count. - Amgine/ t·e 02:41, 28 March 2018 (UTC)
- Meta:Neutral votes count makes no sense; it in effect argues that abstaining votes should be counted as opposing votes. Note that the sole author of the meta page is the above Amgine. --Dan Polansky (talk) 10:25, 30 March 2018 (UTC)
- It says they are votes, and part of any total. (It also notes a decision is not made by the outcome of a poll, but for the community. A nuance I have noted as distinctly missing from your history.) - Amgine/ t·e 14:52, 30 March 2018 (UTC)
- @Victar: "All past behaviors are relevant to present and future dealings and recourse." I actually agree with you. It's just that on this particular occasion, it seemed a bit brusque of you to bring that back on the table, when Wyang was apparently willing to pursue the discussion further. --Per utramque cavernam (talk) 11:32, 4 April 2018 (UTC)
Middle Assamese
I have added a code inc-mas for Middle Assamese per Talk:ভাল. If anyone has any objections, please put them here. There are lots of cites on Google Books. —AryamanA (मुझसे बात करें • योगदान) 18:40, 1 March 2018 (UTC)
- @AryamanA, I think the usual format for that would be inc-asm (Assamese Middle), cf. frm (French Middle), goh (German Old High). --Victar (talk) 03:21, 2 March 2018 (UTC)
- Pinging @-sche. --Victar (talk) 03:31, 2 March 2018 (UTC)
- @Victar: But what about inc-ohi (Old Hindi), inc-ogu (Old Gujarati)? —AryamanA (मुझसे बात करें • योगदान) 04:10, 2 March 2018 (UTC)
- roa-opt (Old Portuguese) —AryamanA (मुझसे बात करें • योगदान) 04:11, 2 March 2018 (UTC)
- @AryamanA: Right, all not ISO codes, but yes, it seems wiki sub-codes are reversed, so inc-mas is appropriate. +1 --Victar (talk) 04:26, 2 March 2018 (UTC)
- Yes, in my experience when we make our own codes, we tend to have the code approximate the name with the words in the same order (so, "inc-mas"); the ISO's "backwards" order may be a product of internally preferring names like "German, Old High" for sorting reasons and/or preferring codes that sort "nearby" ("fr", "frm"), or just a result of their not always approximating language names as well as they could (they couldn't use "mfr" for "Middle French" because they already use it for "Marrithiyel", despite that word not having an "f" in it). - -sche (discuss) 04:56, 2 March 2018 (UTC)
- ISO codes are often quite inconsistent, e.g. owl for Old Welsh but wlm for Welsh, Middle; neither of which uses the native name Cymraeg the way cy for Modern Welsh does. —Mahāgaja (formerly Angr) · talk 15:41, 3 March 2018 (UTC)
Proposed change to Japanese entry format - using kana as the main entry form
Continuing the discussion from last month at Wiktionary:Beer parlour/2018/February#Related: Status of hiragana entries.
In the process of cleaning up after some anon edits, I reworked the hiragana entry at うまい (umai) to show an example of what it might look like if we were to use the kana entries to store the main content, rather than the current practice of using kana entries only as soft redirects to the kanji spellings. The うまい (umai) entry is a bit of a simpler example, as this term only has one etymology. I think it can still help to illustrate how we might lay things out, and how we might show how different kanji spellings are applied to different senses of the same lemma.
For those interested in Japanese entries here, please read the linked thread above, have a look at the うまい (umai) entry, compare to 上手い (umai) and 旨い (umai) as (currently less comprehensive) examples of the conventional kanji-focused format, and discuss here as appropriate.
TIA, ‑‑ Eiríkr Útlendi │Tala við mig 21:21, 1 March 2018 (UTC)
- Support. I think this is a step long overdue, but it also means there will be a lot of work... Suggestions: the kanji forms on def lines need to be made more conspicuous, cf. the 【】 notation in JA dictionaries. Maybe an additional template is indicated. Also I think the kanji forms can take even less information, provided we incorporate content into the kana entries, including the conj table, which can be extended to display multiple kanji forms in one cell. Wyang (talk) 22:01, 1 March 2018 (UTC)
- I should have included this earlier -- @Wyang, please have a look at 巧い (umai) as an example of a kanji spelling entry as a soft-redirect to the fuller kana entry.
- I agree that some formatting, and probably different (new?) templates, may well be called for. ‑‑ Eiríkr Útlendi │Tala við mig 00:12, 2 March 2018 (UTC)
- I'd like to see more complex examples, and what sort of templates could improve them. As Wyang says, this is going to be a very big job — Chinese unification was also a big job, so I know it can be done, but more planning is necessary first. —Μετάknowledgediscuss/deeds 02:59, 2 March 2018 (UTC)
My thoughts (mostly related to practical usability):
- ja.wt has definitions on the kanji entry for Sinoxenic words; for example: ja:意味 (imi). Is this something we should consider doing?
- Paper dictionaries use kana as the main entry because they are sorted alphabetically and have multiple words on one page.
- Other online dictionaries don't have these problems because they are more database-like.
- How do we indicate the rarity of a kanji spelling?
Personally I think that the most common spelling should be used, solely because of usability (which admittedly can be unsightly), but kana entries make a lot of sense for native words. —suzukaze (t・c) 03:48, 2 March 2018 (UTC)
- I thought this proposal only affects native Japanese words, and that Sinoxenic words would be kept at their kanji spellings (?) Wyang (talk) 00:31, 4 March 2018 (UTC)
- As initially conceived, I hadn't fully considered Sinoxenic terms. In light of the discussion above, I agree with the coalescing consensus that Japanese Sinoxenic terms (those deriving originally from Chinese borrowings) should use the kanji spellings for the lemmata, with the kana spellings serving as soft redirects -- much as the current status quo. Meanwhile, native Japanese terms and fully-nativized borrowings (such as たばこ (tabako), which is old enough that it has multiple broadly accepted kanji spellings) would have lemmata content moved to the kana spellings, with the kanji spellings serving as soft redirects -- the opposite of the current status quo. ‑‑ Eiríkr Útlendi │Tala við mig 09:08, 4 March 2018 (UTC)
- What about words like 故郷 (kokyō) and 故郷 (furusato)? —suzukaze (t・c) 01:43, 5 March 2018 (UTC)
- You bring up a good point, that Japanese is sometimes variable enough in the kanji spellings, but consistent in the kana, that it might make sense to use kana for lemma entries even for Sinoxenic terms.
- As a counterargument, Daijirin lists at least 26 different kanji spellings for the kana sequence かんせい (kansei). If we were to use かんせい as the lemma, the entry would be quite horrifically huge. This is not true for every Sinoxenic reading, but it's common enough that we need to consider the ramifications. ‑‑ Eiríkr Útlendi │Tala við mig 20:48, 13 March 2018 (UTC)
- Well, the entry looks useful, but the information beneath the definitions needs more collapsing. After all, definitions are the primary function of this site, everything else is secondary data, so ease-of-access to the defs should be our primary concern. Korn [kʰũːɘ̃n] (talk) 09:16, 4 March 2018 (UTC)
- @Korn -- do you mean うまい (umai)? There was a thread somewhere about making usexes auto-collapsing, similar to the current behavior of quotes. I think it was Wiktionary:Beer_parlour/2018/March#Hiding_usexes. ‑‑ Eiríkr Útlendi │Tala við mig 18:55, 14 March 2018 (UTC)
- Support for now. I think we'll be able to iron out problems. —suzukaze (t・c) 01:43, 5 March 2018 (UTC)
- Neither way is perfect, in my opinion.
- A kana or mixed spelling may be more common for Sino-Japanese spellings as well, especially for complex or rare characters.
- If we choose kana forms for lemmas, then it would make sense to do this for Sino-Japanese terms as well. By Sino-Japanese, I mean all terms using on'yomi readings, not necessarily just terms borrowed from any form of Chinese.
- I prefer the status quo, but we need to take care about duplication of content. Perhaps native verbs and adjectives should only have inflections in the kana entries. --Anatoli T. (обсудить/вклад) 02:36, 5 March 2018 (UTC)
- As another example, I recently reworked the あばく (abaku) entry. This spelling has three different etymologies by current research, all of which seem to be at least loosely related. The terms have three different kanji spellings, and one etymology for which no kanji spelling is (yet?) attested. ‑‑ Eiríkr Útlendi │Tala við mig 20:48, 13 March 2018 (UTC)
- I hope this discussion doesn't die down. I think this layout is logical, and will prove to be much superior in the long run. The kanas were designed specifically for this reason (i.e. to record wago more accurately), and this makes the hiragana forms the most suitable lemma forms, out of all the possible variant forms of a word. Wyang (talk) 10:39, 14 March 2018 (UTC)
- If the discussion dies down, it likely means everything was said by everyone who cares. Then it's time to be bold and just do the stuff that was agreed on. Korn [kʰũːɘ̃n] (talk) 10:59, 14 March 2018 (UTC)
- Follow-on:
- In response to an RFE from Korn [kʰũːɘ̃n], I updated the 貴方 and あなた entries.
- However, I wasn't sure how to handle this -- I expanded the entry first at あなた, and then got stuck with how best to proceed at 貴方 (anata). Since the kanji spelling also has the reading kihō, I felt uncertain about using the first etymology solely as a soft redirect, and opted (for now, anyway) to copy-paste the content from あなた so at least it's available.
- Does anyone have advice on how best to format one etym section on a kanji-spelling entry as a soft-redirect to the wago native-Japanese entry at the kana spelling, when that kanji spelling has other kango Chinese-derived readings?
- If we have a good idea for that approach, we should also apply it to the anata reading under the 彼方 spelling. ‑‑ Eiríkr Útlendi │Tala við mig 03:45, 2 April 2018 (UTC)
- Support In agreement with Suzukaze. As for etymologies, if we're going to put the actual entry at the kana form, isn't it customary on Wiktionary not to include etymologies on alternative spellings at all? We could just have two or three (if we split ánata from anáta~ánata) entries separated by a pronunciation section or we could even have one big entry under a new Header called 'Kanji spelling' or something along those lines of thought. And speaking of pronunciation sections, I'd actually expect a dictionary to split up e.g. はし into three entries. (And we do that for all languages I've seen so far.) They're not pronounced the same way, not always spelled the same way, and do not mean the same thing, they're clearly three words and thus three entries. Korn [kʰũːɘ̃n] (talk) 11:30, 2 April 2018 (UTC)
- @Korn [kʰũːɘ̃n]: The はし (hashi) entry is definitely in need of rework; looking at my sources to hand, I see a quick total of 10 spellings and 9 senses, with around 6 or 7 etymologies (depending on how separatist one wants to be), and 4 derivational groups (of related etyms). If we adopt a similar approach as at あばく (abaku), we would break each etym out into its own section, and indicate on each sense line which kanji are used for that sense (if any; あばく has ).
- I'd like to avoid coming up with new headers if possible. Would the format at あばく (abaku) work for you? ‑‑ Eiríkr Útlendi │Tala við mig 18:00, 2 April 2018 (UTC)
- The formatting of あばく is what I had in mind, yes. Although I'm a general proponent of nesting etymologies under pronunciations, which is only of secondary importance to the topic at hand. Korn [kʰũːɘ̃n] (talk) 20:52, 2 April 2018 (UTC)
- @Korn [kʰũːɘ̃n] -- I drastically pruned the first etymology at 貴方 to just redirect entirely to あなた. I think I may have gone too far, but anything else would amount to duplicating information at both entries. Do you have any thoughts on this? ‑‑ Eiríkr Útlendi │Tala við mig 17:35, 4 April 2018 (UTC)
- That's probably an approach which needs discussing with a greater part of Wiktionary. The traditional approach on Wiktionary is to still give alternative spellings an actual header and line for the entry, see coalmine. While I don't think that content-wise this adds anything, it's so unusual and short without those extra lines, it might be somewhat easy to miss. Our standard format isn't really suited to alternative spellings since we always require etymology or pronunciation header even though it's agreed that the alternative spelling should have as little information duplicated from the actual entry as possible. We could consider putting the alternative spelling-template into the etymology section or something... That would at least provide some visual emphasis on the single line. Then again, if we want to add a declension template, do we just put it right beneath the etymology? Korn [kʰũːɘ̃n] (talk) 19:09, 4 April 2018 (UTC)
- Oppose At least any sort of simplistic "Japanese headwords are kana". I think that obviously Chinese compounds belong with their corresponding Chinese (etc) counterparts; meanwhile, Daijirin has three entries for うまい. WK can't do that (unless I misunderstand; but how could it?) so in order to restrict the number of unrelated homonyms within a single entry, kanji headwords are better. I think that in any event such a change needs to provide a very carefully thought out plan for how the various cases would be dealt with. Imaginatorium (talk) 12:07, 2 April 2018 (UTC)
- @Imaginatorium: The general consensus emerging above, as I understand it, is that Sinoxenic words -- kango and other on'yomi -- would have lemma information under the kanji spelling, while native Japanese words -- wago, i.e. kun'yomi -- would have lemma information under the kana spelling. The former avoids ugliness like the (at least) 26 different potential kanji spellings for the かんせい (kansei) reading, each with separate etyms and other details, and the latter avoids ugliness like what to do with the multiple overlaps for the verb つく (tsuku), which has around 15 different potential kanji spellings, broadly grouped into three conjugation patterns and two pitch accent groups, and with etymological overlap between most of them. The kanji spellings for wago like つく (tsuku) are incidental to etymology, but integral to etymology for kango like 完成 (kansei).
- Our うまい (umai) entry is presently incomplete, as you note. I expanded that earlier as an example of what a specific wago entry might look like, focusing on the most-common term under that kana representation. Re: うまい (umai) and Daijirin, one of Daijirin's entries is wago as the same term now at our うまい (umai) entry. Another one is also wago with varied kanji spellings of 味寝, 熟寝, and 熟睡, ultimately a compound of stem uma- from adjective うまい (umai) (our current entry) + い (i), an OJP noun meaning sleep. The last is a later compound of Sinoxenic 右 (u, “right, righthand side”) + native 舞 (mai, “dance, dancing”). KDJ includes another wago compound of 馬 (uma, “horse”) + 居 (i, “being, being in a place”), apparently an obsolete term in reference to being on horseback.
- You mentioned, “WK can't do that (unless I misunderstand; but how could it?)”. All of these can be treated on the うまい (umai) page under separate etymology sections, as terms with the same kana representation and different derivations -- no different than what we have done so far at entries such as 柄, where we have multiple terms with the same kanji representation and different derivations. I will further expand the Japanese section at うまい (umai) as time allows to include these other terms, providing a fuller example of what this approach might look like.
- You also mentioned, “...so in order to restrict the number of unrelated homonyms within a single entry, kanji headwords are better.” For Sinoxenic terms, I quite agree: the kanji spellings are integral to the etymologies, and different spellings indicate unrelated terms. For native-derived wago terms, however, focusing on kanji can very much muddle the picture: the term came first (basically, the kana spelling), and the kanji came later, sometimes resulting in one native term with multiple possible kanji spellings. This situation is not uncommon for wago: native terms with wide-ranging utility have collected multiple kanji spellings over the years, with the spellings used to clarify intended sub-senses. It's a bit like if the English term set were spelled differently for each shade of meaning -- 置 for to put something down, 付 for to attach or affix to something else, 定 for to determine or settle, 整 for to adjust, 配 for to arrange with dishes and cutlery, etc., but where all the spellings are ultimately pronounced as set, and all ultimately equating to the same verb set.
- So, “in order to restrict the number of unrelated homonyms within a single entry”, we can also argue that kana headwords are better -- for native-derived wago.
- I think あばく (abaku) might be a better example for this approach than うまい (umai) in its current state -- あばく (abaku) has three etym sections, but all are likely related; there are also two kanji spellings commonly used with the first etym's senses, one more kanji spelling of the second etym's senses, and no kanji spelling on record for the last etym. If we focused on kanji spellings as the headwords even for native-derived wago, the content at あばく (abaku) would be scattered in a way that would make it harder to see the interrelations.
- → Would you support an approach where Sinoxenic lemma entries are under the kanji spellings, and native-Japanese-derived lemma entries are under the kana spellings? ‑‑ Eiríkr Útlendi │Tala við mig 18:00, 2 April 2018 (UTC)
- Rather Oppose: I always find it better to use the commonest spelling, which may or may not be written with kanji, depending on each word. That is the way to reflect reality. — TAKASUGI Shinji (talk) 05:39, 22 April 2018 (UTC)
- But reality could have some other indicator in the newly developed standard entry format rather than forcing editors to find out the frequency of spellings of every entry (which might or might not be trivial) and then refactor an entry in question, and then force users to be exposed to varying formats of Japanese entries (which likely is not a grave factor of confusion, but an eliminable one). Korn [kʰũːɘ̃n] (talk) 08:59, 22 April 2018 (UTC)
- You can just create an entry. It is someone else’s task to move it to the most frequent spelling. It is very important for a dictionary to have frequency information, lack of which can result in uselessness. — TAKASUGI Shinji (talk) 12:10, 22 April 2018 (UTC)
- I don't really agree with this view. Printed dictionaries always have consistent headwords and formats, and Wiktionary should be no exception. It doesn't look professional at all if our entry headwords for Japanese have a hotchpotch and rather unpredictable set of Kanji and kana forms. This is why we have opted for the policy of centralising lemma information on traditional Chinese forms, rather than a frequency- or creation-time-based mix. Additionally, there is no way to reliably determine usage frequency (#ghits is not a reliable measurement). Wyang (talk) 12:20, 22 April 2018 (UTC)
- I agree with Shinji but I think it is a bit difficult to compare with paper dictionaries. Japanese dictionaries and references I use, e.g. Kodansha, are sorted by kana but listed as separate entries, showing the common kanji spelling or nothing if only kana is used. E.g. 引く (hiku) and 弾く (hiku) or 易しい (yasashii) and 優しい (yasashii) are separate entries with examples; rare kanji are marked with [ ]. Alt forms can be separated with commas but this dictionary doesn't display etymologies or shinjitai/kyūjitai info. IMHO, it will get messy if we choose kana for lemmas. I prefer the status quo, but we do need centralisation of Japanese entries under a main entry, and I also think we should choose the most standard, common and current variant. Other dictionaries can help with that. One of the equal variants can become an alt form, just like we do with other languages. --Anatoli T. (обсудить/вклад) 13:28, 22 April 2018 (UTC)
- As a counterargument, have a look at Daijirin's entry for ひく at Weblio, clearly showing four etymological groupings, with kanji spellings grouped by etym. Senses usually spelled with a specific kanji show that on each sense line, as 《引・牽・曳》 or 《引・退》. ‑‑ Eiríkr Útlendi │Tala við mig 18:24, 22 April 2018 (UTC)
- We are not making a printed dictionary. Centralization is surely necessary for maintenance reasons, but we don’t have a page limit for alternative spellings. Here is an example: the Japanese Wiktionary has an established rule, according to which they write native Japanese words with hiragana and Sino-Japanese words with kanji. First, check their entry for うつ, and you will find a verb that basically means “to hit”. Then check Google images for うつ, and you will find a totally different thing: 鬱, which should be moved to うつ. — TAKASUGI Shinji (talk) 14:18, 22 April 2018 (UTC)
- @Shinji, I'm afraid I'm a little confused as to your specific preference. The emerging majority view above (as I understand it) is a slight modification of the JA WT approach. Our うつ entry would mostly be for the wago term, similar to the current entry at ja:うつ, with an additional etym section at the bottom, similar to the current ===Etymology 1=== section now at よう listed as "Kana spelling of various words", which provides soft redirects to the kanji headwords. So the main content for 鬱 (utsu, on'yomi) would stay at 鬱, and the うつ entry would include a soft redirect to 鬱. Would that work for you? ‑‑ Eiríkr Útlendi │Tala við mig 19:33, 22 April 2018 (UTC)
- I am rather against the majority here. I mean we should have a main entry for the “depression” sense in うつ (hiragana) and a soft redirect in 鬱 (kanji) even though it is a Sino-Japanese word, because that is how most Japanese speakers write it. — TAKASUGI Shinji (talk) 03:39, 23 April 2018 (UTC)
- (I believe this is the same reasoning I used when I made the entry 抗うつ薬 using うつ.) —Suzukaze-c◆◆ 04:49, 24 April 2018 (UTC)
- I'm not so familiar with Japanese, but I think I agree with Shinji that entries should be centralized at the most common form rather than the kana form by default. The proposed format would require usage notes describing the orthographic situation that would not be needed if the most common form is chosen as the main entry. This is different from the situation with Chinese: the traditional and simplified forms are generally both common in their contexts and would not need a usage note in almost all cases. When more than one traditional form exists, we pick the most common or most representative traditional form. — justin(r)leung { (t...) | c=› } 05:08, 24 April 2018 (UTC)
- I don't really agree with this view. Printed dictionaries always have consistent headwords and formats, and Wiktionary should be no exception. It doesn't look professional at all if our entry headwords for Japanese have a hotchpotch and rather unpredictable set of Kanji and kana forms. This is why we have opted for the policy of centralising lemma information on traditional Chinese forms, rather than a frequency- or creation-time-based mix. Additionally, there is no way to reliably determine usage frequency (#ghits is not a reliable measurement). Wyang (talk) 12:20, 22 April 2018 (UTC)
- You can just create an entry. It is someone else’s task to move it to the most frequent spelling. It is very important for a dictionary to have frequency information, lack of which can result in uselessness. — TAKASUGI Shinji (talk) 12:10, 22 April 2018 (UTC)
- But reality could have some other indicator in the newly developed standard entry format rather than forcing editors to find out the frequency of spellings of every entry (which might or might not be trivial) and then refactor an entry in question, and then force users to be exposed to varying formats of Japanese entries (which likely is not a grave factor of confusion, but an eliminable one). Korn [kʰũːɘ̃n] (talk) 08:59, 22 April 2018 (UTC)
- @Eirikr The problem of not being able to categorize 避く(さく・よく) twice with different hiragana keys also exists with kango like 人間(にんげん・じんかん) and 強力(きょうりょく・ごうりき). Maybe we can have a different approach, such as to have Category:Japanese shimo nidan verbs include the kana forms さく and よく under the headers さ and よ, and the shared kanji form 避く under the radical 辵(辶)? (My main motivation for supporting a hiragana-as-lemma approach is better synchronization between kanji and kana entries, in a similar way to {{zh-forms}} fetching the definitions and categories automatically from the main entry so one doesn't have to take care of both. Another motivation is to avoid repeating |sort=ひらがな in every categorizing template.) --Dine2016 (talk) 07:26, 16 May 2018 (UTC)
- The (frankly, horribly) inadequate sort mechanics are a huge problem for Japanese. I've posted a couple times over the years over at Meta, in 2012 and again in 2016, only to be deafened by the silence. I also tried posting to Phabricator in 2016, but met the same response. I'm pretty sure I brought this issue up even earlier than 2012, but I cannot find it.
- I'm open to what you describe, sorting kana under kana and then changing the categorization of kanji entries to sort under radical for the first kanji character. I believe the first part already happens automatically as part of the creation of the kana entries, even when just creating these as soft redirects. The second part would require further coding. ‑‑ Eiríkr Útlendi │Tala við mig 16:38, 16 May 2018 (UTC)
- @Justinrleung Maybe the problem of the most common form can be solved by making {{ja-def}} support extra labels such as {{ja-def|儘-a|侭-a,exts}} for まま to show the archaicness of the kanji spellings, eliminating the need of usage notes? We can also have tags like "more common than the hiragana form" and "as common as the hiragana form" but make them less eye-catching. --Dine2016 (talk) 07:26, 16 May 2018 (UTC)
- @Dine2016 -- I like the general idea. I have some concerns about the syntax:
- All other JA templates use positional or named parameters for additional metadata, rather than cramming the term and param values all in one string, which then requires additional parsing.
- Using extremely short values like "a" increases the difficulty for editors to remember, and increases the room for collisions. Just in the existing JA template infrastructure, excessive shortening of parameter values has led to the case where yomi=ko in one template means "kan'on", and in another it means "yutōyomi" (apparently from "kun-on"). Keeping things spelled out makes things immediately clearer to the humans using these tools. Even just including more letters would help. As a case-in-point of excessive value shortening, I have no idea what the exts in your example is supposed to mean...
- Considering the above, would you be open to a template format more like the following?
{{ja-def|儘|lb1=archaic|侭|lb2=archaic,???}}
- ‑‑ Eiríkr Útlendi │Tala við mig 16:38, 16 May 2018 (UTC)
- @Eirikr Of course. The format given above is only an example and I expect such issues to be further discussed if and once the hiragana-as-lemma approach is agreed upon. IMO using the hiragana spelling as lemma accelerates the creation of kanji forms as well as making them more database-like (such as frequency information more compact). exts stands for extended shinjitai. --Dine2016 (talk) 02:04, 17 May 2018 (UTC)
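To make the syntax concern above concrete, here is a minimal Python sketch (not the actual template code, which would live in a Lua module) contrasting the two proposed styles. Everything in it is an assumption for illustration: the function names are hypothetical, "a" is assumed to abbreviate "archaic" (inferred from the まま example), and "exts" is taken as "extended shinjitai" per Dine2016's explanation.

```python
# Hypothetical abbreviation table. "a" = "archaic" is an assumption inferred from
# the example; "exts" = "extended shinjitai" follows the explanation in the thread.
ABBREVIATIONS = {"a": "archaic", "exts": "extended shinjitai"}


def parse_compact(arg: str) -> tuple[str, list[str]]:
    """Compact style: split an argument such as '侭-a,exts' into (term, labels)."""
    term, _, codes = arg.partition("-")
    labels = [ABBREVIATIONS[code] for code in codes.split(",") if code]
    return term, labels


def parse_named(params: dict[str, str]) -> list[tuple[str, list[str]]]:
    """Named style: terms in '1', '2', ... with spelled-out labels in 'lb1', 'lb2', ..."""
    result = []
    n = 1
    while str(n) in params:
        labels = [label for label in params.get(f"lb{n}", "").split(",") if label]
        result.append((params[str(n)], labels))
        n += 1
    return result
```

The compact style needs an extra parsing step plus a lookup table that editors must memorize, while the named style reads the labels directly. That is the trade-off being discussed, nothing more.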
@Dine2016 How would entries like 帰りの会, 脈打つ, 虎穴に入らずんば虎子を得ず (mixing of native + Sinitic terms), フッ素 (kanji not used), etc. be handled? —Suzukaze-c◇◇ 08:48, 30 December 2018 (UTC)
- @Suzukaze-c: You've brought up a good point. I support lemmatizing at the most common spelling as the default rule, and making the core wago vocabulary an exception. The first reason I can offer is stability in writing. The entries you listed all have stabilized spellings, so they should obviously be lemmatized at the usual kanji spelling. On the other hand, the core wago vocabulary have a greater degree of independence from and variety in combination with kanji. For example, EDICT includes four kanji spellings of kaeri and about ten of ut-, though only one of ir-. So it's better to present the first two in a kanji-neutral way. The second reason is etymology. The etymology of compounds and expressions is simply the sum of their constituent parts with a focus on meaning, so kanji spellings are no hindrance. The etymology of single, indecomposable wago words, however, must be studied through the sound shape of the word, and kana (みずうみ = みず + うみ) exhibits facts better than kanji (湖 = 水 + 海). --Dine2016 (talk) 13:51, 30 December 2018 (UTC)
- That is what I was thinking too. 👍 —Suzukaze-c◇◇ 05:10, 31 December 2018 (UTC)
Middle Japanese again
We still have 6 Middle Japanese entries without any code for that language- can we decide what should be done? DTLHS (talk) 02:43, 2 March 2018 (UTC)
- Forgive me, I can't remember how to find those. Are they in a category? ‑‑ Eiríkr Útlendi │Tala við mig 01:02, 3 March 2018 (UTC)
- I'm still not sure what to do about these. @Chuck, @-sche, for some reason you two stick out in my mind as infrastructure folks. Can you provide concrete guidance? Or suggest who could? ‑‑ Eiríkr Útlendi │Tala við mig 06:52, 29 March 2018 (UTC)
- It's easy enough to create a code for Middle Japanese and add it to Module:languages/datax. (I'd recommend jpx-mja.) Do we want it to be a full-fledged language or an etymology-only language? —Mahāgaja (formerly Angr) · talk 12:21, 29 March 2018 (UTC)
- Yes, as Mahagaja says, it's simple enough to create a code, but what kind of code? The previous discussion at Wiktionary:Beer parlour/2018/February#Middle_Japanese was not terribly conclusive. How different is written Middle Japanese from written Old Japanese? If it's not that different, the conservative approach would be to create an etymology-only code and handle the entries as Old Japanese with a temporal label or defdate and with the pronunciation information given in the pronunciation section as is done for Chinese. - -sche (discuss) 19:30, 29 March 2018 (UTC)
- How different is Middle Japanese from Classical Japanese? Korn [kʰũːɘ̃n] (talk) 19:44, 29 March 2018 (UTC)
- How different is written Middle Japanese from written Old Japanese? -- Quite. There are verb endings and grammatical constructions in Old Japanese that are largely missing from Middle / Classical. I've been told that most folks educated in Japan can read Classical Japanese with varying degrees of fluency, whereas Old Japanese requires transcription first into more-modern spellings, and even after that it is harder to understand beyond a few snippets here and there.
- How different is Middle Japanese from Classical Japanese? -- These two are broadly the same thing.
- Re: etym-only or full, I'm leaning towards full. Some verbs are only attested as having 四段活用 (yodan katsuyō, “quadrigrade conjugation”), where the verb's conjugable stems only end in -a, -i, -u, or -e, meaning these verbs fell out of use before the sound shifts (or at least orthographical shifts) to the modern 五段活用 (godan katsuyō, “quintigrade conjugation”), where the stems end in all five possible Japanese vowels (including -o). These would require different conjugation templates, for instance.
- @Shinji, @suzukaze, @Wyang, @Nibiko, @Anatoli, @POKéTalker, @Eryk Kij, @馬太阿房, anyone I'm missing, what are your thoughts on this? ‑‑ Eiríkr Útlendi │Tala við mig 21:23, 29 March 2018 (UTC)
- I do not know a lot about non-modern Japanese. —suzukaze (t・c) 22:21, 29 March 2018 (UTC)
- Q: name - Classical or Middle Japanese? Are there a lot of specifically MJ resources (dictionaries etc.) for us to work with? Wyang (talk) 00:20, 30 March 2018 (UTC)
- Regardless of how much the lang code covers the Classical Japanese literary norm, Middle Japanese is the less ambiguous and also more self-explanatory term. Korn [kʰũːɘ̃n] (talk) 11:44, 11 April 2018 (UTC)
- @Korn [kʰũːɘ̃n]: Agreed on this point. ‑‑ Eiríkr Útlendi │Tala við mig 18:01, 11 April 2018 (UTC)
- @Wyang: Modern monolingual JA dictionaries include entries for verbs listed as 四段活用 (yodan katsuyō, “quadrigrade conjugation”), which are regarded as Classical / Middle. Any other term listed as 文語 (bungo, literally “literary word”) would also be Classical / Middle.
- Given the presence of such terms even in modern dictionaries, I'm starting to wonder if labeling would be more appropriate than a full-on language header... ‑‑ Eiríkr Útlendi │Tala við mig 18:01, 11 April 2018 (UTC)
- Well, one death we have to die, to use a German saying. Either we have Middle Japanese terms and conjugations entered as 'Japanese' or we have 'Middle Japanese' entries doubled with the same entry in 'Japanese' (dogmatically the most proper way if you ask me and this is how we handle it in other languages) or we have Modern Japanese terms labeled as 'still in use' entered as 'Middle Japanese'. Korn [kʰũːɘ̃n] (talk) 19:46, 11 April 2018 (UTC)
- I know nothing about Middle Japanese (and as good as nothing about modern Japanese either), but if there are words written the same in Middle Japanese as they are in modern Japanese, then we can just create two entries for them. We already do that for Middle vs. modern English, e.g. cause, dragon, face (though to be fair most of our Middle English entries do seem either to be spelled differently or to have different meanings from their modern English equivalents). —Mahāgaja (formerly Angr) · talk 20:02, 11 April 2018 (UTC)
Book Pahlavi in Unicode
Man, am I the only one that's pissed that Book Pahlavi hasn't been added to Unicode yet? Why the heck hasn't this proposal gone forward?! --Victar (talk) 03:15, 2 March 2018 (UTC)
- It seems like Unicode blogged about working on it two days ago: [1] —suzukaze (t・c) 03:17, 2 March 2018 (UTC)
- Woot! Thanks for sharing, @Suzukaze-c. I hope they move quickly on it. --Victar (talk) 03:24, 2 March 2018 (UTC)
- lol, that's great! I am annoyed at having Latin script Middle Persian Entries too. —AryamanA (मुझसे बात करें • योगदान) 04:14, 2 March 2018 (UTC)
- More frustrating for me is that we already have Manichean Unicode but no good fonts to support it. Crom daba (talk) 10:39, 2 March 2018 (UTC)
- I'm still waiting for Tocharian. —Mahāgaja (formerly Angr) · talk 11:24, 2 March 2018 (UTC)
- @Crom daba:, I've been using this one, which I ripped from a Unicode proposal PDF. --Victar (talk) 16:08, 2 March 2018 (UTC)
- Wow, thanks! I've been periodically checking this page, but they don't list whatever font this is. Crom daba (talk) 16:15, 2 March 2018 (UTC)
- @Crom daba: Yeah, it's not publicly available yet. I ripped it from https://unicode.org/charts/PDF/U10AC0.pdf. FYI: @AryamanA, Vahagn Petrosyan --Victar (talk) 16:39, 2 March 2018 (UTC)
Why is this under English lemmas? ---> Tooironic (talk) 15:55, 2 March 2018 (UTC)
- Fixed. Equinox ◑ 16:00, 2 March 2018 (UTC)
Middle Persian language codes
Now that we have Unicode Manichaean, and are soon getting Unicode Book Pahlavi, I think it imperative that we rehash the conversation on whether the two should still be split into separate language codes, one for Pahlavi, pal, and the other for Manichean, xmn. My arguments for unifying them under one code are as follows:
- Book Pahlavi and Manichean are scripts (and a religion), not languages.
- The general pronunciation of Pahlavi and Manichaean is mostly identical, far less distinct than Old Avestan and Younger Avestan or Vedic Sanskrit and Classical Sanskrit.
- Unnecessary category division, i.e. Category:Ancient_Greek_terms_borrowed_from_Manichaean_Middle_Persian.
Pinging @AryamanA, माधवपंडित, Vahagn_Petrosyan, -sche, ZxxZxxZ. --Victar (talk) 17:52, 3 March 2018 (UTC)
- Support Crom daba (talk) 18:00, 3 March 2018 (UTC)
- Support, but we should tag the variety inside the page using {{lb}} or something in the headword line. --Vahag (talk) 20:02, 3 March 2018 (UTC)
- Agreed. --Victar (talk) 20:11, 3 March 2018 (UTC)
- Support -- माधवपंडित (talk) 03:07, 4 March 2018 (UTC)
- Support, but as Vahag said. —AryamanA (मुझसे बात करें • योगदान) 04:10, 4 March 2018 (UTC)
@-sche, do you have any thoughts before moving forward on this? --Victar (talk) 14:52, 14 March 2018 (UTC)
- Support; based on previous discussion it does seem like we're dealing with dialects and not separate languages, especially because it sounds like the two varieties have differences within themselves (temporally, regionally or otherwise), not just between each other. ISO/SIL also split some other languages by script, e.g. Luwian. In that case, we just picked one of the codes to use for both script varieties; we could do that here; it would have the advantage that we'd be using a shorter code and a recognized (ISO) one, but the disadvantage that it might be confusing for people to see content from the second lect under the first lect's code. - -sche (discuss) 18:00, 14 March 2018 (UTC)
- Support —*i̯óh₁n̥C[5] 19:28, 14 March 2018 (UTC)
- @-sche, why not run a bot to replace {{(.+)|xmn|(.*)}} and lang=xmn with pal? --Victar (talk) 21:33, 15 March 2018 (UTC)
- That (or, the general concept of replacing the language code "xmn" with "pal", and changing the L2 headers at the same time) would work. In fact, it looks like we're dealing with so few entries that it would be feasible for me to do it with AutoWikiBrowser. Unless anyone has objections, I should have time to do that later. - -sche (discuss) 21:52, 15 March 2018 (UTC)
- Does anything more complex need to be done to maintain the functionality of Module:Mani-translit beyond adding it to the data for "pal"? - -sche (discuss) 22:06, 15 March 2018 (UTC)
- @-sche, nope, same functionality. If you're going to run some more bot conversions, adding |sc=Mani to those xmn entries would be awesome. --Victar (talk) 01:40, 16 March 2018 (UTC)
A fair number of entries have module errors because they still use xmn. — Eru·tuon 19:23, 17 March 2018 (UTC)
- I've fixed all the ones I could, but take a look at Reconstruction:Proto-Indo-European/n̥- and Reconstruction:Proto-Indo-European/speḱ-. - -sche (discuss) 22:58, 17 March 2018 (UTC)
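For anyone wanting to script the code swap discussed above, a minimal Python sketch follows. It is an illustration under stated assumptions, not what was actually run (the thread mentions AutoWikiBrowser): the function name retag is hypothetical, and the patterns are deliberately stricter than the informal {{(.+)|xmn|(.*)}} quoted above, which would need its braces and pipes escaped before it could be used as a regular expression. Changing the L2 headers and adding |sc=Mani are not covered here.

```python
import re

# Old code as the first positional parameter of a template, e.g. {{m|xmn|...}}.
# The lookahead ensures "xmn" is the whole parameter, not a prefix of a longer value.
FIRST_PARAM = re.compile(r"(\{\{[^{}|]+\|)xmn(?=[|}])")

# Old code passed through a named parameter, e.g. |lang=xmn.
NAMED_LANG = re.compile(r"(\|\s*lang\s*=\s*)xmn(?=\s*[|}])")


def retag(wikitext: str) -> str:
    """Replace the language code xmn with pal in template calls (illustrative only)."""
    text = FIRST_PARAM.sub(r"\1pal", wikitext)
    return NAMED_LANG.sub(r"\1pal", text)
```

Pages where xmn appears in other positions would be left untouched by this sketch and would still need checking by hand.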
Spelling pronunciation
I've created a simple etymology template for marking (historical) spelling pronunciations. Is this okay with everyone? Crom daba (talk) 20:09, 3 March 2018 (UTC)
P.S. Once again I can't remember where I'm supposed to put category information other than Module:category tree/poscatboiler/data/terms by etymology to make {{autocat}} work.
- Wiktionary:Beer_parlour/2014/January#spelling_pronunciations --Per utramque cavernam (talk) 20:11, 3 March 2018 (UTC)
- I suspected there was a hidden Chesterton fence here, but I figured this was the best way to find it.
- This is basically my use case: хязгаар#Mongolian, is this valid or not? Crom daba (talk) 20:36, 3 March 2018 (UTC)
- Is it not a pronunciation spelling? From what I (think I) understand, the letter г was added to reflect the pronunciation more accurately. A spelling pronunciation is the reverse process: altering the pronunciation and matching it to the spelling (pronouncing salmon /ˈsælmən/, for example). --Per utramque cavernam (talk) 21:30, 3 March 2018 (UTC)
- Yes, a spelling pronunciation is a pronunciation that's been altered because of how the word is spelled, like /ˈsælmən/ for salmon and a whole lot of other examples. A pronunciation spelling, on the other hand, is a spelling that's been altered because of how the word is pronounced, like Enya for Eithne or (presumably) show for shew. —Mahāgaja (formerly Angr) · talk 21:59, 3 March 2018 (UTC)
- I guess I should try to write more clearly. Classical Mongolian ᠭ (g) stands for two different Proto-Mongolic phonemes (it's generally full of homography), and г (g) was added (to the [Khalkha] pronunciation, which is more faithfully reflected in Cyrillic orthography) as a misreading of ᠭ (g) as *g instead of *x (actually as a mixture of both). Crom daba (talk) 12:20, 4 March 2018 (UTC)
- I'm still not sure what's going on. Is the word pronounced the way it is etymologically expected to be pronounced, but "misspelled" (from an etymological point of view)? Or is it spelled the way it's etymologically expected to be spelled, but "mispronounced" (from an etymological point of view)? Or both, or neither? —Mahāgaja (formerly Angr) · talk 15:24, 4 March 2018 (UTC)
- It is mispronounced. ᠬᠢᠵᠠᠭᠠᠷ (kiǰaɣar) renders Proto-Mongolic *kïjaxar, which regularly goes to *kïjaar -> *xïjaar -> xyajaar -> hyadzaar and then (or in some intermediate steps) g was inserted because the spelling is ambiguous between *kïjaxar and **kïjagar.
- When I asked whether this use was valid, I meant whether everyone is fine with there being a template that would do the thing I did here (link to appendix + categorize), not whether this is an instance of a spelling pronunciation (I already know it is). Crom daba (talk) 00:00, 5 March 2018 (UTC)
5,500,000
Hi! From my own counting, it seems that unquestioningness is the 5,500,000th page to be created here. Congratulations Pamputt (talk) 09:26, 4 March 2018 (UTC)
- For what it's worth as confirmation, I arrive at the same conclusion. (I counted back through the recent changes list of new entries as of when there were 5,500,059 entries per Special:Statistics, and again when there were 5,500,062 (double checking), to find the 5,500,000th new entry.) Congratulations to Equinox! - -sche (discuss) 10:00, 4 March 2018 (UTC)
- Wiktionary:Milestones has been updated accordingly. SemperBlotto (talk) 10:09, 5 March 2018 (UTC)
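The counting-back method described above comes down to one subtraction. A small Python sketch, assuming (hypothetically) that the list of new entries is ordered newest first and that nothing counted was deleted in between; the function name is made up for illustration:

```python
def milestone_index(total_entries: int, milestone: int) -> int:
    """0-based position of the milestone entry in a newest-first list of new entries."""
    if total_entries < milestone:
        raise ValueError("milestone not reached yet")
    return total_entries - milestone
```

With 5,500,059 entries the milestone sits at index 59 (the 60th newest entry), and with 5,500,062 at index 62. Both counts should point at the same page, which is the double check mentioned above.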
Use of † in taxonomic entries
In biology, † is sometimes placed before a taxonomic name to indicate that it is extinct. Some of our Translingual taxonomic entries, although only a small portion, use this notation, e.g. at Smilodon in the headword line as well as elsewhere in the entry. Whether or not a taxon is extinct is not lexical information; indeed, any species could go extinct without its definition, etymology, gender, hyponyms, or other lexical metadata changing. Extinction status is purely encyclopaedic, and belongs at Wikipedia. It is also unexplained in entries, which may confuse readers. As a result, it would probably be best to remove it from our entries. —Μετάknowledgediscuss/deeds 20:33, 4 March 2018 (UTC)
- I'm not opposed to noting that a taxon is extinct, maybe in the definition ("an extinct species of..."), or it arguably is the sort of semi-lexical information we record in other cases by using {{lb|en|historical}}. (It does seem like a lot of effort to maintain.) But given that the cross/dagger also sometimes means the word or sense is obsolete — I think Chinese entries use it this way — and given that we have the space to spell things out, it's probably best to avoid using just the symbol. - -sche (discuss) 20:54, 4 March 2018 (UTC)
- I don't work with these entries, but I agree with Metaknowledge. --Per utramque cavernam (talk) 14:03, 6 March 2018 (UTC)
- I don't understand how the word encyclopedic applies to the dagger. Even less can I understand how the fact of something being extinct or extant or endangered isn't information of considerable interest to someone looking at a taxonomic entry, as much as whether something is "endangered" or "red" or "large", or a bird. Have we really lost touch with normal dictionary users to that extent? As to the changeability of the status of something sub specie aeternitatis, the same applies to the very words we use to gloss other words.
- As to it not being explained, the same applies to m, f, n, and plenty of terms which we use in labels, category names, etc, and even in definiens. We could link to [[†]] or to WT:GLOSS, approaches we have taken with a few of these others. DCDuring (talk) 15:25, 6 March 2018 (UTC)
- @DCDuring: It looks like you've "lost touch with normal dictionary users" if you think that they will know what the daggers mean and not be confused by their only occasional usage here. Gender is explained, if you merely hover your mouse over the letter. I agree with -sche that it is often appropriate to use "extinct" in the definition line, although not always (it probably isn't particularly useful for dinosaurs, for example). We can make those kinds of decisions without reliance on daggers. —Μετάknowledgediscuss/deeds 17:56, 6 March 2018 (UTC)
- I am glad you support the use of hover notices. That might be a desirable alternative to a link, though a link to a sense id gloss can include more information, eg context and is more accessible for us technically challenged contributors.
- I was actually surprised that you seemed to object to semantic content in your original complaint.
- The dagger is just the kind of orthographic element that we have lavishly honored in showing ligatures and obsolete English characters in entries, especially in alternative forms and in citations. DCDuring (talk) 20:05, 6 March 2018 (UTC)
- @DCDuring: Parts of your response are unintelligible to me, particularly the final sentence. I see no relationship between documenting all words that meet our criteria regardless of what characters they use and what we choose to put in our entries. Can you make a clear statement about what you want to do with this issue? To be abundantly clear, I want to remove all daggers and ensure that the word "extinct" is on definition lines where it is deemed useful. —Μετάknowledgediscuss/deeds 20:27, 6 March 2018 (UTC)
- Oppose DCDuring (talk) 19:44, 7 March 2018 (UTC)
- I hereby propose that someone, not me, create a means of linking a † to the appropriate English definition at [[†]] and that the daggers be permitted on the inflection line of taxonomic names and wherever else the taxonomic name of an extinct species may appear. DCDuring (talk) 19:44, 7 March 2018 (UTC)
- The dagger is the ordinary means of indicating in text that a species is extinct (as an aside, Wikipedia makes heavy use of it as well), and I see no reason to prohibit it. However, I'm also not opposed to replacing it with the word "extinct." Andrew Sheedy (talk) 11:18, 12 March 2018 (UTC)
- I intend to use the dagger as a means of locating entries that would benefit from the addition of {{R:Fossilworks}}. DCDuring (talk) 11:53, 12 March 2018 (UTC)
Hiding usexes
I assume it is OK to hide longer usexes in the same manner as quotations. The only thing that worries me is the click-on "quotations" heading is slightly misleading when it's a usex. I did an example at rekke. DonnanZ (talk) 11:42, 5 March 2018 (UTC)
- I think we should change it to examples ▼, and hide all 'usage examples' and 'quotations' by default. Wyang (talk) 11:44, 5 March 2018 (UTC)
- That would be better, or perhaps "usage examples and quotations" (perhaps a little long). As long as {{ux}}/{{usex}} or {{quote}} are used where appropriate. DonnanZ (talk) 13:50, 5 March 2018 (UTC)
- Well, they're not, so this discussion seems a little pointless. DTLHS (talk) 15:53, 5 March 2018 (UTC)
- It's not necessarily pointless if users are made aware that entries can be updated. DonnanZ (talk) 16:23, 5 March 2018 (UTC)
- I don’t like the idea very much (after all, if a usex is so long that it’s a good idea to hide it, it’s almost always a better idea to use a shorter one instead). But if you do, please keep using {{ux}} so that parsers can have a chance at knowing it is not a quotation. — Ungoliant (falai) 16:33, 5 March 2018 (UTC)
- In the example I gave above it was a sentence from Wikipedia which is I believe not allowable as a quote, so I treated it as a usex. DonnanZ (talk) 17:20, 5 March 2018 (UTC)
- I think sentences from Wikipedia should be treated as quotes. They don't count toward attestation for CFI purposes because they're not durably archived, but they're still quotes and ought to be properly attributed. —Mahāgaja (formerly Angr) · talk 20:16, 5 March 2018 (UTC)
- Oh right. So if a word is attestable for CFI, e.g. with dictionary references included, there's no problem with quotes from Wiktionary? DonnanZ (talk) 21:05, 5 March 2018 (UTC)
- God no. Please don't start adding quotes from ourselves to pages- what an awful idea. DTLHS (talk) 21:18, 5 March 2018 (UTC)
- We invent usexes so we can just shorten them as needed. Equinox ◑ 19:38, 5 March 2018 (UTC)
- Y-yes, in this case it made sense to include the whole sentence. DonnanZ (talk) 20:18, 5 March 2018 (UTC)
- Personally I like having more space between definitions- it makes the page easier to read for me. DTLHS (talk) 20:25, 5 March 2018 (UTC)
There's over 8,000 of them, but what exactly is needed: {{quote}} templates, or is it something else that needs attention? DonnanZ (talk) 13:42, 5 March 2018 (UTC)
- They might not need to be cleaned at all, the only problem with them is that they are using a generic {{quote-text}} instead of a specific variant ({{quote-book}} usually). If we are OK with some quotes being generic, you only need to clean the category out of the template. - TheDaveRoss 20:07, 5 March 2018 (UTC)
- @TheDaveRoss: Thanks. That was indeed the case in the entry I was looking at. That's one off the list, only 8,621 to go. DonnanZ (talk) 20:44, 5 March 2018 (UTC)
- @TheDaveRoss There would appear to be a problem with these, getting them to register in the category for quotes. I tried adding "en" to the quote at glowing, but that isn't the solution, there must be something else that should be done, a rewrite? DonnanZ (talk) 14:25, 9 March 2018 (UTC)
- @Donnanz What category for quotes? - TheDaveRoss 15:22, 9 March 2018 (UTC)
- @TheDaveRoss: Category:English terms with quotations, where all of these should go. DonnanZ (talk) 15:31, 9 March 2018 (UTC)
- @Donnanz: That is from {{quote}}, not the {{quote-book}} family. If you want all of those quotes to go into that category you will have to add the category to {{quote-meta}} or each of the family of templates. - TheDaveRoss 15:46, 9 March 2018 (UTC)
- @TheDaveRoss: I'm not allowed to edit that template, even if I could make head or tail of it. DonnanZ (talk) 16:02, 9 March 2018 (UTC)
- Anyway, I won't spend any time fixing these unless a solution is found. DonnanZ (talk) 10:19, 27 March 2018 (UTC)
Moving snowclones back to the mainspace
I think that snowclones (i.e., the ones listed in Appendix:English snowclones) should be included as dictionary entries in the main namespace, as long as they follow CFI's rules on attestation and idiomaticity. The rationale is that it is far less convenient for a dictionary reader to have to look in the appendix for this than it is to look in the mainspace. They are just like any other idioms and are just as lexical; it's just harder to fit general semantic variables like someone or something into them. I think things like X is the new Y or to X or not to X are perfectly decent entry-material. So, let's make it easier for the readers and move these exceptional idioms to the mainspace. (P.S.: I remember now that a while back I created ride the ... train. This was before I knew about the appendix page or about what a "snowclone" is. I just modeled the "..." after the already existing phrasebook entry I am ... year(s) old, which may need to be changed a bit too. That name looks a little funky.) PseudoSkull (talk) 04:28, 6 March 2018 (UTC)
- Support. --Daniel Carrero (talk) 08:40, 6 March 2018 (UTC)
- Oppose having them in the main space. We can create redirects to an appendix. --Per utramque cavernam (talk) 12:05, 6 March 2018 (UTC)
- Oppose per Per utramque cavernam. DCDuring (talk) 12:56, 6 March 2018 (UTC)
- Oppose Equinox ◑ 12:57, 6 March 2018 (UTC)
- Support Crom daba (talk) 13:52, 6 March 2018 (UTC)
- Oppose --WikiTiki89 16:18, 6 March 2018 (UTC)
- Comment: @Per utramque cavernam, DCDuring, Equinox, Wikitiki89 If I may kindly ask, can any of you do me the favor to explain (further) why you oppose this idea? PseudoSkull (talk) 16:41, 6 March 2018 (UTC)
- Oppose --Victar (talk) 23:03, 6 March 2018 (UTC)
- Oppose. Some, but by no means all, are already in mainspace. These constructions are not necessarily phrases suitable for inclusion in the body of a dictionary. bd2412 T 03:00, 7 March 2018 (UTC)
- Important note: I think I forgot to mention that I only propose to include idiomatic snowclones. Things like I know X better than you'll ever know X, which can be deduced from its parts, should not be included (if it's even attested in the first place). ride the X train, which cannot be deduced from its parts, should. PseudoSkull (talk) 04:49, 7 March 2018 (UTC)
- See also Category:English snowclones. That category is important for our discussion.
- I was thinking the same thing when I voted "support", though I failed to say it. Yes, only idiomatic ones. In my opinion, phrases like Appendix:Snowclones/X with a capital Y should be included in the mainspace, because you can't deduce the meaning from the sum of its parts.
- The entry awesome with a capital A was deleted in 2011 (RFD: Talk:awesome with a capital A) because it's a snowclone.
- But technically, the CFI doesn't currently offer a snowclone caveat, so it seems that in theory all attestable variations of "X with a capital Y" could be created as entries (jerk with a capital J, snowclone with a capital S, dictionary with a capital D...), they can't be deduced from their parts.
- In my opinion, having a snowclone entry (X with a capital Y) as opposed to entries for all variations of a snowclone is better, because it covers all possibilities.
- The entry name could be X with a capital Y or ... with a capital ..., but it's tempting to create a title like just with a capital for snowclones that have no variables in the middle, just at the extremities. Or to be even more minimalistic, just add a new sense at capital to explain the "X with a capital Y".
- I gave a few ideas to be discussed, but my preference is for having a mainspace snowclone title like this: X with a capital Y.
- Yes, some other snowclones are just common SOP phrases, like Appendix:Snowclones/X and Y and Z, oh my!. They don't merit mainspace entries whatsoever. --Daniel Carrero (talk) 08:50, 7 March 2018 (UTC)
I am trying to get a vote going to amend CFI to expressly allow retronyms. I think retronyms are interesting to people who study language. Quite often they will have a current meaning which is transparently equal to 'sum of parts' and that is why I feel they are deserving of special protection. As an example, analogue clock merits inclusion in my view even though it is really just analogue + clock. See also: Category:English retronyms. John Cross (talk) 06:33, 6 March 2018 (UTC)
- How interesting are retronyms to normal people who use dictionaries? — This unsigned comment was added by DCDuring (talk • contribs).
- A large number of words we have will be uninteresting to most normal people. I am interested in creating entries that a small proportion of our users find interesting. John Cross (talk) 21:09, 6 March 2018 (UTC)
- I feel as though this should at least be qualified, i.e. allow retronyms unless... What are the worst types of SoP etc. that this rule would permit? Equinox ◑ 13:00, 6 March 2018 (UTC)
- This would permit the likes of paper book and mechanical mouse. I don't think these are useful or belong in a dictionary. —Μετάknowledgediscuss/deeds 21:28, 6 March 2018 (UTC)
- And even biological mouse, perhaps even mammalian mouse. ←₰-→Lingo Bingo Dingo (talk) 13:05, 16 March 2018 (UTC)
- I also don't think this is a useful concept for us. Retronyms are just one example of general disambiguating techniques. Just like in England, what Americans call football is called American football, and who in England is called "the Queen", in America is called "the Queen of England". When these terms are idiomatic, we include them. When they are SOP, we don't. Retronyms are no different. --WikiTiki89 22:37, 6 March 2018 (UTC)
- The only thing that gives me pause is that something that wasn't SoP originally might become SoP over time. I wish I could think of an example; I'm sure this came up on here before. But suppose that a new phrase Adj+N is coined, and gets in all the dictionaries, and then Adj comes to be used more generally, with other nouns, significantly later: it seems wrong to delete the original Adj+N when it was the predecessor. Equinox ◑ 22:41, 6 March 2018 (UTC)
- We do have a rule that if a word was once idiomatic but is now SOP, it is to be included. --WikiTiki89 22:46, 6 March 2018 (UTC)
- If as an example, it can be shown that hammerhead shark was used first and hammerhead came later as shortened form then our policies appear to allow both to be included even if hammerhead shark ≤ hammerhead + shark. I want the same in reverse - sort of - if it can be shown that compass is the original term and magnetic compass comes second to distinguish from steady-state compasses then ... I would still want to be able to include magnetic compass even if magnetic compass ≤ magnetic + compass. -- John Cross (talk) 06:18, 7 March 2018 (UTC) (edited John Cross (talk) 06:29, 7 March 2018 (UTC))
- I disagree. All that would need to be done is to have enough definitions at compass that cover all historical usage. There's no need to have magnetic compass, unless it is shown to have its own specialized meaning. --WikiTiki89 15:31, 7 March 2018 (UTC)
- I agree with Wikitiki. (And the test referred to above is WT:JIFFY, for anyone who didn't already know.) - -sche (discuss) 16:07, 7 March 2018 (UTC)
- I would have thought a manual gearbox preceded an automatic gearbox. Anyway, go for the vote. DonnanZ (talk) 11:45, 11 March 2018 (UTC)
News from French Wiktionary
Hello!
The February issue of Wiktionary Actualités just came out in English!
A snowy issue of Actualités just fell on Wiktionary with not-so-chilled news and stats, surrounded by three articles: Wiktionarians allies, a dictionary that went through some trouble way know as well and a speaking orca! As usual, there are also some changes in Wiktionary projects and recommendations of videos about languages and linguistics (including some in English!).
This issue was written by seven people and was translated for you by Pamputt. This translation may be improved by readers (wiki-spirit) like it was last month by Xbony2 (thanks a lot!). We still receive zero money for this publication and your comments are welcome. You can also receive a notice on your talk page if you want. Noé 13:57, 6 March 2018 (UTC)
- I like that Bahubali got a mention, haha. —AryamanA (मुझसे बात करें • योगदान) 15:33, 6 March 2018 (UTC)
Romance words and Medieval Latin
I just thought about something. Can we truly say that a Romance word can be inherited from Medieval Latin (or Ecclesiastical)? So far I've been doing that at times. Like if a certain word is found mostly in Medieval or Ecclesiastical Latin, but underwent all the normal changes into the descendant language, and it is a common, popular word. Or if the meaning matches the Medieval Latin sense of a word more than the Classical (like coxa ("thigh" in Medieval and Romance languages but "hip" in Classical)). By all indicators these should be inherited terms.
But the problem is, the way I see Medieval Latin defined on Wikipedia for example, was as this kind of artificially preserved language that was no longer popularly spoken, but used in things like administration, writing, church, etc. By the time the Middle Ages came along, the Romance languages/vernaculars had already begun diverging from the spoken Vulgar Latin. So does it really make sense to say they're inherited from this register or form of Latin (the way the Wiktionary templates work now allows inherited to be used on Romance terms with any form of Latin, including New Latin even!)? How are we going to define "Medieval Latin" for Wiktionary's purposes? Is it possible that what we're really looking at for those terms is rather (inherited) descent from a parallel Late or Vulgar Latin term that was more or less the same in form (or at least meaning) as the attested Medieval Latin one? There are certainly many cases of obvious borrowings from Medieval Latin (and some Medieval Latin words even crafted or coined based on existing Romance, like Old French, words), but I'm talking about apparent inherited ones. Like how do we handle coxa for example? Maybe it's best to put that sense as Late Latin instead? Word dewd544 (talk) 16:00, 7 March 2018 (UTC)
- If the meaning "thigh" is what the Romance languages have, it's probably best to call that sense Vulgar Latin. Same with focus (“fire”) rather than "fireplace". —Mahāgaja (formerly Angr) · talk 16:48, 7 March 2018 (UTC)
- Indeed, there are many cases where a sense is shared between Vulgar Latin and Mediaeval Latin because the latter has borrowed it from a Romance language that inherited it from the former. —Μετάknowledgediscuss/deeds 18:25, 7 March 2018 (UTC)
- Ok, that works for me. But it will admittedly require a bit of backtracking and redoing of some etymologies in which I've put "inherited" from Medieval Latin (because at the time I didn't know how else to handle it; I used to incorrectly treat Medieval Latin in these contexts as essentially being Vulgar Latin in the Early Middle Ages, which is more accurately what we'd be looking at for inherited terms). I assume the same goes for Ecclesiastical Latin? Like all the religious related words like presbyter, episcopus, pascha, abbas, monachus, basilica, baptizo, blasphemo, etc.? And here's another issue: say if we have a Latin word that is listed in its entry as Medieval Latin, but in the Romance descendant's etymology we use Vulgar Latin instead. I imagine it would still be linked to the main entry that is described as "Medieval Latin" (without an asterisk), since making a separate VL. reconstruction page for each instance would be ridiculous. That's just reserved for terms that were unattested in any written form of Latin. Word dewd544 (talk) 17:16, 8 March 2018 (UTC)
sports ticker and score card abbreviations
Over in [Requests for verification:English] someone has tagged a number of sports ticker abbreviations, such as UTA, LAL, WIM, etc. They are clearly a thing, but are they a thing we want in Wiktionary? The RFV process doesn't work well for them, because sports tickers are not durably archived, but they seem pretty standard in their own subculture. Given the number of these, it seems like something we could use a policy decision on. Kiwima (talk) 01:50, 8 March 2018 (UTC)
First version of Lexicographical Data will be released in April
I come bearing a message from WikiData.
After several years of discussing it, and one year of development and discussion with the communities, the development team will deploy the first version of lexicographical data on Wikidata in April 2018.
A new namespace and several new datatypes will be created in order to model words and phrases in many languages. Editors will be able to describe words in Wikidata, and in the future, to query this information, and to reuse it inside and outside the Wikimedia movement.
If you’re curious to discover what these new data structures will look like, you can have a look at the data model. It is suggesting a technical structure, but the editors will remain free to model and organize data as they prefer, with the usual open discussions and community processes that we apply on Wikidata. The documentation will be improved step by step, with the different releases and help of the community.
Please note that the version that will be deployed in April is a first version, that will be improved in the future, thanks to your tests, comments and suggestions. Some features may be missing, some bugs may occur. We can already tell you that the following features will be included in the first version:
- Add, edit and delete Lexemes, Forms, statements, qualifiers, references
- Link from an Item or a Lexeme to an Item or a Lexeme
- Basic search feature
And the following features will not be included in the first version, but are planned for the future:
- RDF support (which means: the ability to query it with query.wikidata.org)
- Senses will not be included in the first version, to give you all some time to get properties, processes, etc in place for Lexemes and Forms
- Entity suggestion and better search features
- Merge Lexemes
You can have a look at a more detailed features list. After the first deployment, we will start a discussion with all of you about what are the most important features for you, so we know which ones you would like us to work on next.
Thanks to the people who already showed support and curiosity about lexicographical data on Wikidata. We hope that when it will be deployed, you will test it, experiment with the languages you know, and give us some feedback to improve the tools in the future.
While waiting for the release, here’s what you can do:
- Improve the list of tools with ideas of tools that could be built on the top of lexicographical data
- Add your ideas of cool queries you’d like to do with words and phrases in the future
- Have a look at the project page and especially the talk page, where people are already asking questions, and discussing about how to model data and other topics
- If you’re involved in a Wiktionary community, discuss with them and answer any questions they might have about Wikidata. You can also register as ambassador for your community.
Last but not least, we are kindly asking you to not plan any mass import from any source for the moment. There are several reasons behind that: first of all, like mentioned above, the release will be a first version and we need to observe how our system reacts to the manual edits before starting considering automatic ones. The system may not be ready for big massive imports at the beginning. Second reason is legal. Lexicographical data in Wikidata will be released under CC0, and the responsibility of each editor is to make sure that the data they will add is compatible with CC0. For more information, you can have a look at the advice of WMF Legal team. Finally, we strongly encourage you to discuss with the communities before considering any import from the Wiktionaries. Wiktionary editors have been putting a lot of efforts during years to build definitions, and we should be respectful of this work, and discuss with them to find common solutions to work on lexicographical data and enjoy the use of it together.
If you have any question or idea, feel free to write on Wikidata:Wikidata talk:Lexicographical data. Further discussion is also ongoing at Wikidata:Wikidata:Project chat#First version of Lexicographical Data will be released in April. Cheers! bd2412 T 02:59, 8 March 2018 (UTC)
- As an offshoot, but really unrelated to the Wikidata effort, I wonder how much content on en.wiktionary would become CC0 based on a small number of contributors ex post facto releasing their work in such a manner. There are many entries which have only been touched by a single primary author and then a number of bots for formatting, I don't know whether the bots even count as authors. If someone has the full edit history downloaded I imagine it would be possible to do some modeling and determine how many entries here would be CC0 if the top 10, 20, 50, etc. editors were willing to transfer the license. If we wanted to get fancy and remove from the edit history all reversions (that is any intermediate edits between two equivalent versions of the same page), or perhaps consider section by section. While I am not a fan of the process that Wikidata seems to favor when interacting with other projects, I would love to be able to back our project with a more structured data. I think this would open myriad doors for improvements in presentation and usefulness. - TheDaveRoss 13:02, 8 March 2018 (UTC)
- I would actually be hard-pressed to imagine a Wiktionary editor asserting any kind of copyright in their contributions. bd2412 T 21:22, 8 March 2018 (UTC)
- It seems like they're treating us in the same way that we treat other dictionaries- as a source of information that can't be copied directly but can be paraphrased and used as a source. DTLHS (talk) 21:27, 8 March 2018 (UTC)
- Except that unlike a print dictionary, we're an active community that they could collaborate with if they chose to do so. —Μετάknowledgediscuss/deeds 21:30, 8 March 2018 (UTC)
- Lots of frustrated voices over at the project chat discussion (now several pages long). Anyway, I encourage you to participate in the technical discussion happening right now, the project enters a crucial early phase where important decisions are made. I'm curious when and why the project "rebranding" of "Wikidata: Structured data for Wiktionary" to "Wikidata: Structured Lexicographical Data" happened, it was mentioned a few times in the discussion (and, as pointed out, changed the tone of the collaboration). – Jberkel 12:13, 12 March 2018 (UTC)
- Hi, I went to the Wikimedia conference in Berlin last week. My main goal was to address this situation of conflict with the Wikidata team. I already released the part of my report that is related to this point, but it's only available in French so far. Léa offered to help with the translation, so the translation's availability will depend on her schedule. If you have time and skills, be bold and translate :) --Psychoslave (talk) 13:41, 27 April 2018 (UTC)
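As a rough illustration of the modelling idea raised earlier in this thread (how many entries would be covered if the top 10, 20, 50, etc. editors agreed to relicense), here is a minimal Python sketch. It assumes a hypothetical mapping from page titles to the accounts that edited them, ranks editors by the number of pages they touched rather than by raw edit count, treats bots as non-authors, and ignores reverts and section-level history entirely. The function name, the sample data and the bot list are all made up for illustration.

```python
from collections import Counter


def relicensable_share(page_authors, top_n, bots=frozenset()):
    """Fraction of pages whose non-bot authors all belong to the top_n editors.

    page_authors: dict mapping a page title to an iterable of account names
                  (hypothetical input; a real run would derive it from a
                  full-history dump).
    top_n:        how many of the most prolific editors are assumed to relicense.
    bots:         account names to ignore (assumption: bots are not authors).
    """
    # Drop bot accounts from every page's author set.
    human = {
        page: {name for name in authors if name not in bots}
        for page, authors in page_authors.items()
    }
    if not human:
        return 0.0

    # Rank editors by the number of pages they have touched.
    pages_touched = Counter(name for authors in human.values() for name in authors)
    top_editors = {name for name, _ in pages_touched.most_common(top_n)}

    # A page counts only if it has at least one human author and all of them are in the top set.
    covered = sum(1 for authors in human.values() if authors and authors <= top_editors)
    return covered / len(human)


# Made-up sample data, purely for illustration.
SAMPLE_PAGES = {
    "a": ["Alice", "FormatBot"],
    "b": ["Alice", "Bob"],
    "c": ["Bob", "Carol"],
    "d": ["Alice"],
}
SAMPLE_BOTS = frozenset({"FormatBot"})

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, relicensable_share(SAMPLE_PAGES, n, SAMPLE_BOTS))
```

Whether bots count as authors, and whether reverted edits should be discounted, are exactly the open questions from the comment above; the sketch simply picks the most permissive answer to each.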
Category for Insurance terminology
Is there a particular process for agreeing and implementing a new category for English terms? I'd like to create and populate Category:en:Insurance for terminology used within the insurance industry (a subcategory of Category:en:Finance seems most appropriate), and add automated categorisation via labels using Module:labels/data/topical. I'm holding back from "being bold" to make sure I don't step on any toes. -Stelio (talk) 10:37, 8 March 2018 (UTC)
- Be bold. - TheDaveRoss 12:56, 8 March 2018 (UTC)
- Done -Stelio (talk) 14:58, 8 March 2018 (UTC)
- Some of the items in the category (eg, economic) seem not to have any distinct insurance sense. Unless they do, the category seems misleading. This is not unique to this topical category, but it might be well to address the problem now before populating the category recklessly. I think the problem is associated with the presence of hard categorization and the absence of {{label|insurance}} categorization. An "incategory" and "insource" Cirrus search should quickly identify the possible problems. DCDuring (talk) 16:09, 8 March 2018 (UTC)
- The category may be too new for the Cirrus search term "incategory:en:Insurance" to work. The "insource" term won't be allowed to run unless it is restricted to run over a readily identified, "not-too-big" subset of Wiktionary entries. DCDuring (talk) 16:18, 8 March 2018 (UTC)
- Thank you very much for the review, @DCDuring. Yes, I had two difficulties here:
- Words with definitions that are more generic than a specific insurance sense. For example term, definition 8, is "Duration of a set length...". That's the insurance definition: the term of an insurance policy is the amount of time from its inception to latest expected termination. But I shouldn't label that definition with "insurance" because it applies in wider circumstances too. Would an indented insurance definition be appropriate (8.1)?
- Avoiding SOP terms. For example "economic assumption" is a modelling assumption (assumption is on my list to update with that sense) that relates to economic factors. That feels SOP to me, but the economic/demographic split is sufficiently important in the insurance world to merit categorising those terms. Perhaps then an additional definition of economic with a sense that is labelled as "insurance" and "of an assumption", then?
- I'm definitely keen to get this right and conform to established site norms, so I value this feedback. -Stelio (talk) 16:41, 8 March 2018 (UTC)
- If the insurance sense of "term" is in fact covered by an existing, broader sense of "term", then I wouldn't add a subsense. (To give an extreme example: insurance documents also use "the", but it's the same "the" as everyone else also uses, so there's no need for an insurance-specific sense.) If the insurance sense is significantly different, then a subsense is merited. Whether or not terms that seem important to insurance but aren't specific/limited to it (like "economic" and maybe "term") should be categorized is less clear. As DCDuring suggests, it's an unclearness that plagues our category structure in general, and despite giving it thought, I don't know what to advise you. Many other categories do include terms that seem related/important without being limited to the category's named context. - -sche (discuss) 18:06, 8 March 2018 (UTC)
- The problem is that a topic-specific glossary is useful because it contains ONLY the relevant sense and usage of polysemic words like term, policy, and economic. As a comprehensive (and historical dictionary) we, by definition, try to include all definitions. Someone only interested in the insurance use of a term can get lost in our entries for such terms. We could have appendices (eg, Appendix:Glossary of terms used in insurance [term again!!!]) that contained links to the specific definitions using {{senseid}}. This would serve as a specialized portal for passive users as well as contributors who had specialized topical interests. DCDuring (talk) 21:30, 8 March 2018 (UTC)
Automatic transliteration of Biblical Hebrew
Would it be possible and desirable to implement automatic transliteration of the etymology-only language Biblical Hebrew (hbo) when it's fully pointed? That way, {{der|en|hbo|אָמֵן}} would automatically provide the transliteration ʾāmēn, but {{der|en|he|אָמֵן}} (and of course both {{der|en|he|אמן}} and {{der|en|hbo|אמן}}) would still require manual transliteration. Would that be technically possible, and if so, would other people find it a good idea? —Mahāgaja (formerly Angr) · talk 13:15, 8 March 2018 (UTC)
- There are still a lot of complications that need to be solved. We already have an experimental module Module:he-translit, which is able to transliterate about 90% of words (which is not at all good enough for automatic transliterations), but without stress marks. The next step would be to implement support for stress marking, but this would also require adding stress marks or cantillation marks to Hebrew text that needs to be transliterated. We decided a while ago not to allow stress marks or cantillation marks in Hebrew text due to poor font support, which has improved a little but not enough over the years. Additionally, we would need to start strictly using the Unicode HEBREW POINT QAMATS QATAN (U+05C7) instead of the regular qamats mark whenever it represents a short-o, and we would need a way to mark the distinction between sheva na and sheva nach, which currently do not have separate Unicode codepoints. Once all that is done, however, it should work equally well for Biblical Hebrew and Modern Hebrew. And then there is also a minor issue that it would be impossible to distinguish between abbreviations (which should be transliterated letter-for-letter) and Hebrew numerals (which should be transliterated as "Arabic" numerals). So in short, it's not possible yet until we can solve some of those problems. --WikiTiki89 17:19, 8 March 2018 (UTC)
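To make the proposal and the complications above a little more concrete, here is a minimal Python sketch. It is not how Module:he-translit works; it only illustrates the gating rule from the original question (pointed text tagged hbo qualifies, anything else stays manual) and counts the marks named above as problematic. Treating "fully pointed" as "contains at least one vowel point" is a simplifying assumption, and all function names are hypothetical.

```python
# Code points for the Hebrew marks mentioned in the discussion.
SHEVA = "\u05B0"         # HEBREW POINT SHEVA (sheva na and sheva nach share it)
QAMATS = "\u05B8"        # HEBREW POINT QAMATS (regular qamats)
QAMATS_QATAN = "\u05C7"  # HEBREW POINT QAMATS QATAN (short o)

# Sample strings written with escapes so the exact code points are visible.
AMEN_POINTED = "\u05D0\u05B8\u05DE\u05B5\u05DF"  # alef, qamats, mem, tsere, final nun
AMEN_PLAIN = "\u05D0\u05DE\u05DF"                # same letters, no points


def has_vowel_points(text):
    """True if text contains any vowel point (U+05B0..U+05BB or U+05C7).

    Simplification: this checks for at least one point, not that every
    letter carries the pointing it needs.
    """
    return any("\u05B0" <= ch <= "\u05BB" or ch == QAMATS_QATAN for ch in text)


def wants_auto_translit(lang_code, text):
    """Hypothetical gate: only pointed text tagged as hbo would be automatic."""
    return lang_code == "hbo" and has_vowel_points(text)


def ambiguous_marks(text):
    """Count the marks whose reading cannot be decided from the code point alone."""
    return {
        "sheva": text.count(SHEVA),
        "qamats": text.count(QAMATS),
        "qamats_qatan": text.count(QAMATS_QATAN),
    }


if __name__ == "__main__":
    print(wants_auto_translit("hbo", AMEN_POINTED))
    print(wants_auto_translit("he", AMEN_POINTED))
    print(wants_auto_translit("hbo", AMEN_PLAIN))
    print(ambiguous_marks(AMEN_POINTED))
```

A check like this can only flag where a sheva or a plain qamats occurs; deciding which reading is meant is the part that needs the encoding conventions described above.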
- Regarding the shevas, it was pointed out that Michael Everson is a wiki user, w:User talk:Evertype; if we can point (no pun intended) to texts where the shva na and shva nach are used contrastively, we could ask him about proposing a new Unicode codepoint. - -sche (discuss) 18:22, 8 March 2018 (UTC)
- @-sche: Here are a couple of examples:
- A page from the Koren Sacks Rosh Hashana Maḥzor (see in particular the word עַבְדְּךָ on the bottom line, which has both shevas side-by-side)
- A page from the Book of Esther (2:14-20) in a Tikkun Kor'im from an unidentified publisher (see in particular the word וּמָרְדְּכַי on the second-to-bottom line, which has both shevas side-by-side)
- Both of these examples also differentiate qamats qatan from qamats gadol. --WikiTiki89 19:39, 8 March 2018 (UTC)
- @Wikitiki89 Thanks. I'm writing to him now. I see that some references say the shvas are no longer normally pronounced differently in modern Hebrew; if so, which one are they pronounced as? If Unicode, instead of adding new codepoints for both, were to desire to assume that the existing shva codepoint could be taken to be one of them (with only one new codepoint added, for the other one, for those texts which distinguish it), which shva should be the "default" shva and which one should get a new codepoint? (I will see if Michael thinks it would be better to propose two new codepoints or just one.) - -sche (discuss) 20:18, 8 March 2018 (UTC)
- @-sche: Regarding your question about Modern Hebrew pronunciation: Generally, the old distinction of the shvas has disappeared, but there is a new distinction between null and /e/, depending on the phonological environment or morphology, with some environments having free variation between them. Regarding your question about codepoints, I definitely don't think we need two new codepoints and I would say that simply for graphical reasons the shva nach should share the current codepoint, and the shva na should get the new codepoint, because generally when they are distinguished, the shva nach has a normal or maybe slightly reduced size, while the shva na is clearly enlarged and/or bolded. --WikiTiki89 20:40, 8 March 2018 (UTC)
- OK, I left a message, with your informative explanation of how differently they are displayed. Hopefully he can either make the proposal or advise us on making it. - -sche (discuss) 21:28, 8 March 2018 (UTC)
- If the two are sometimes distinguished in writing, then by all means they should have separate code points. But isn't the distinction always clear from environment anyway? Are there any words where you can't tell whether a schwa is na or nach just from its environment? —Mahāgaja (formerly Angr) · talk 11:53, 9 March 2018 (UTC)
- No, the distinction is not always clear from the environment, otherwise we wouldn't have this problem. --WikiTiki89 20:11, 9 March 2018 (UTC)
- Alright, Michael Everson sent me an e-mail explaining that the next step is that we need a point person who is willing to use their real name, and it would be helpful but not obligatory (because you can always just check back in here if questions come up) if it was someone with some knowledge of these characters and/or of Hebrew script generally, to e-mail him and another gentleman. If someone here is willing to be that person, I will send you the contact information. - -sche (discuss) 02:29, 11 March 2018 (UTC)
- I'm willing to send the message and use my real name. I do not however have knowledge of Hebrew script (beyond the parallels it shares with Arabic script). So feel free to use me if a more suitable candidate does not arise. -Stelio (talk) 11:44, 12 March 2018 (UTC) I should probably ping you too, @-sche, in this response. -Stelio (talk) 12:16, 12 March 2018 (UTC)
- Since I know Michael Everson IRL I'm willing to use my real name too. I do have a fair knowledge of the Hebrew script, although until this thread I never knew that the two schwas were sometimes distinguished in writing, so maybe my knowledge of the Hebrew script is insufficient. —Mahāgaja (formerly Angr) · talk 13:10, 12 March 2018 (UTC)
- @Mahagaja: Don't beat yourself up over that. It's a recent phenomenon that is still limited to very specific religious publications, the same ones that also make a distinction between the two qamatses. If you look carefully at the second example I linked to above, you'll notice they even distinguish between the two types of dageshes. --WikiTiki89 15:01, 12 March 2018 (UTC)
- @Wikitiki89: Yes, I see that now; it also distinguishes the two types of qamats. Do we want to request a new code for dagesh forte while we're at it? —Mahāgaja (formerly Angr) · talk 15:10, 12 March 2018 (UTC)
- @Mahagaja: Maybe. It's less common, because it's much more straightforward to distinguish them (in 99.9% of cases). But I guess since it does exist, it might deserve a code. --WikiTiki89 15:16, 12 March 2018 (UTC)
- OK, I've passed the contact info on to Mahagaja (by e-mail). Hopefully we get us some shiny new codepoints! - -sche (discuss) 00:47, 13 March 2018 (UTC)
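The codepoint situation described in this thread can be checked with a short Python sketch. It is only an illustration using the standard `unicodedata` module: the helper names `describe` and `uses_qamats_qatan` are hypothetical and are not part of Module:he-translit. It shows that qamats qatan already has its own codepoint (U+05C7), while sheva has a single codepoint that sheva na and sheva nach currently share.

```python
import unicodedata

# Existing Unicode Hebrew points mentioned in the discussion above.
SHEVA = "\u05B0"         # one codepoint, shared by sheva na and sheva nach
QAMATS = "\u05B8"        # the regular qamats
QAMATS_QATAN = "\u05C7"  # dedicated codepoint for the short-o qamats


def describe(text):
    """Return (codepoint, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def uses_qamats_qatan(text):
    """True if the text contains the dedicated qamats qatan codepoint."""
    return QAMATS_QATAN in text


for codepoint, name in describe(SHEVA + QAMATS + QAMATS_QATAN):
    print(codepoint, name)
```

A transliteration module could only rely on a check like `uses_qamats_qatan` if entries were pointed consistently with U+05C7, which is the editing convention WikiTiki89 says would have to be adopted first.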
"Proverb" PoS isn't a PoS
PseudoSkull pointed out that "proverb" is not in fact a part of speech. Please see "Proverb"_POS_at_Wiktionary. We should presumably convert these PoS into "phrase", and possibly tag proverb as a gloss. Thoughts? Equinox ◑ 09:32, 10 March 2018 (UTC)
- It isn't, but phrase isn't either. Both are accepted per WT:EL#Part of speech. -84.161.29.236 21:24, 12 March 2018 (UTC)
I'm getting "Lua error in Module:headword/templates at line 103: attempt to index field 'falt' (a nil value)". What's this? ---> Tooironic (talk) 02:32, 12 March 2018 (UTC)
Wiki Indaba 2018
Hi, Benoît Prieur and I will be in Tunis from 16 to 18 March to attend the WikiIndaba conference 2018. We will go there to present the Wiktionary (especially the French version) and hope to incite some people (everybody?) to contribute to the Wiktionary in languages from Africa that are currently underrepresented on Wiktionaries (Arabic, Berber, Fula, ...). A conference and a workshop have been accepted. The goal of the presentation is to try to show the interest of the Wiktionary (mainly the French one) for the development and the visibility of languages spoken in Africa.
I have posted a first version of the presentation. I would really appreciate it if one of you could correct the English on the slides before Thursday so that I can take it into account. I can provide the odp file if needed. I would also be happy if you have comments about what we wrote. Thanks in advance. Pamputt (talk) 06:51, 12 March 2018 (UTC)
- I really appreciate that you're doing this! Obviously, Francophones will have you as a point of contact, but for people who are more comfortable with English, I would be happy to mentor people who want to contribute in African languages to en.wiktionary. —Μετάknowledgediscuss/deeds 07:17, 12 March 2018 (UTC)
- @Pamputt: In general, the slides are well written. :) On page 7, perhaps "allow understanding by all audiences" would be better than "allow understanding for all audiences". On page 8, it's not clear what "words with a confidential use" would be, or how Wiktionary would know a secret use; perhaps you mean "restricted (to certain contexts/jargons)" or "literary"...? On page 9, instead of "built languages", it's more natural to say "constructed languages" or "artificial languages". On page 11, the language is fine, it's just hard to read the yellow font "Austronesian: Malagasy" is in. On page 12, "Adding a new entry... (compare to..." sounds more natural than "Add a new entry... (to compare to..." IMO. And "limited knowledge" wouldn't normally take an article (a) AFAIK. In "Contributing help to learn its own language or rediscover it", it's unclear what "its" and "it" refer to... maybe "Language communities contributing helps them maintain and deepen knowledge of their own languages"? And one would normally speak of interest from linguists or of something being of interest to linguists. On page 15, "native speaker" is more natural and simpler (for any non-native speakers to understand) than "locutor". But again, on the whole, well-written; it sounds like an interesting and informative presentation! :) - -sche (discuss) 07:41, 12 March 2018 (UTC)
- @Metaknowledge thanks for your message. Sure I will give your name if some English speaker needs help to contribute here. Pamputt (talk) 22:36, 12 March 2018 (UTC)
- @-sche thank you very much for your corrections. I took them into account within the new version of the presentation. If you have more comments, do not hesitate to write them. :D Pamputt (talk) 22:36, 12 March 2018 (UTC)
Pazend
I'm wondering what the best way is to add Pazend, the Middle Persian variant of Avestan, which contains an extra character (𐬮) and several unique ligatures. Would this be inappropriate?
m["pal-Avst"] = {
	canonicalName = "Pazend",
	characters = m["Avst"].characters,
	direction = "rtl",
	parent = "Avestan",
}
--Victar (talk) 19:25, 12 March 2018 (UTC)
- What's the reason it needs to be a separate script? --WikiTiki89 21:02, 12 March 2018 (UTC)
- It wouldn't be inconsistent with there being so many language-specific versions of Arabic script. But if what's desired is different fonts for Pazend as opposed to Avestan proper, that can be achieved without a separate script code, using the CSS selector .Avst:lang(pal) (or additional selectors if other languages were also written in Pazend). — Eru·tuon 21:16, 12 March 2018 (UTC)
- It's not a difference of font, but rather the addition of a character and various ligatures, so the Unicode characters employed for spelling a word would be different. Incidentally, I'll need to create a modified typing-aids module dataset. It would also be nice to call up the name in templates, e.g. {{desc|pal|𐬯𐬞𐬁𐬵|sc=pal-Avst|sclb=1|tr=spāh}}, and populate categories specially for pal-Avst script requests. --Victar (talk) 21:30, 12 March 2018 (UTC)
- If it doesn't need a different font, that doesn't sound like the kind of thing that needs a new script code; I mean, different languages use different subsets of Latin letters while still using just either "Latn" or "Latinx". Just make sure "Avst" covers all the characters that are used. (I wonder if we need as many Arabic script codes as we have, to Erutuon's point...) - -sche (discuss) 22:02, 12 March 2018 (UTC)
- @Victar: Huh, how are the ligatures rendered if not by a separate font? — Eru·tuon 22:11, 12 March 2018 (UTC)
- @Erutuon: I'm not sure how it exactly works, but e.g. Noto Sans Devanagari has different styles for Hindi and Marathi. It's one font. —AryamanA (मुझसे बात करें • योगदान) 22:23, 12 March 2018 (UTC)
- @-sche: There are stylistic differences that could be better represented in a different font, but I'm not aware of any font specifically made for Pazend. I think the best comparison is Latn to Latf. Note that Fraktur uses the same range as Latin. @Erutuon: What's recommended is actually a silly system of using U+200C between letters to prevent ligatures, but your idea of a separate font would be much better. I wonder how the Unicode Avestan font compares to the Google Noto font. I'll have to check that out. --Victar (talk) 22:31, 12 March 2018 (UTC)
- I looked into it and there are actually only two free Unicode Avestan fonts: one, Noto Sans Avestan, uses ligatures, and the other, Ahuramzda, does not.
Ligature | Ahuramzda | Noto Sans Avestan
š + a | 𐬱 + 𐬀 𐬱𐬀 | 𐬱 + 𐬀 𐬱𐬀
š + c | 𐬱 + 𐬗 𐬱𐬗 | 𐬱 + 𐬗 𐬱𐬗
š + t | 𐬱 + 𐬙 𐬱𐬙 | 𐬱 + 𐬙 𐬱𐬙
a + h | 𐬀 + 𐬵 𐬀𐬵 | 𐬀 + 𐬵 𐬀𐬵
- So, we could use Ahuramzda for Avestan (Old and Younger) and Noto Sans Avestan for Pazend. The disadvantages of this are that their stylistic differences aren't along historical lines, and the Avestan language does employ some ligatures which aren't in Ahuramzda. I could make a variant of Noto Sans Avestan with the Pazend ligatures removed, but I don't know the legality of that, nor do I know if we could host that variant as a webfont. --Victar (talk) 04:40, 13 March 2018 (UTC)
- Could one of these CSS properties disable Noto ligatures? —suzukaze (t・c) 04:45, 13 March 2018 (UTC)
- @Suzukaze-c: HAH! I didn't even know that existed! It does indeed work: 𐬀𐬵. No IE support, but no surprise there. I'll have to look inside Noto Sans to see if they're using distinct suffixes for those Pazend ligatures. --Victar (talk) 05:02, 13 March 2018 (UTC)
- :D —suzukaze (t・c) 05:20, 13 March 2018 (UTC)
- It looks like those are the only ligatures supported by Avestan Unicode anyway. So, perhaps as a first step, I recommend that .Avst be set to font-family:"Noto Sans Avestan", "Ahuramzda"; font-variant-ligatures: none and .Avst:lang(pal) set to font-variant-ligatures: normal. @Erutuon, -sche, would that work for you? --Victar (talk) 05:32, 13 March 2018 (UTC)
- I'm not against adding a separate script code for it if it'll have a benefit. For most scripts, the benefit is in the form of a subscript needing a different font to display correctly (e.g. display the special non-Latn letters of Latinx at all) or accurately (e.g. when rendering Latf differently from Latn). For this, if it's not the font that's different so much as the presence or absence of ligatures, I don't know what's better. In the table above, all the words display in the same font and are identically ligatured, probably because I don't have both fonts installed yet. Does anyone know if a separate script code would result in correct display for a greater number of users than the lang(pal) approach? - -sche (discuss) 15:59, 13 March 2018 (UTC)
- @Erutuon, you seem most familiar with this. Are there any advantages/disadvantages to either approach? --Victar (talk) 14:55, 14 March 2018 (UTC)
- @Victar: Which approaches do you mean? Using a dedicated script code versus a combination of a script code and language code? — Eru·tuon 20:03, 14 March 2018 (UTC)
- @Erutuon, yes, I believe that's what @-sche is asking, .pal-Avst over .Avst:lang(pal), essentially. --Victar (talk) 20:11, 14 March 2018 (UTC)
- I believe having sub-scripts is a holdover from before CSS supported language tags. As long as we're confident that enough of a proportion of users are using browsers that support language tags, then we can start switching away from sub-scripts. --WikiTiki89 20:15, 14 March 2018 (UTC)
- The only benefit I can see for a language-script script code is if multiple languages will be using the same combination of CSS properties and you want to shorten the list of CSS selectors. Otherwise, both do the same thing equally well. — Eru·tuon 20:18, 14 March 2018 (UTC)
To approach this from a different angle, is having a different font and native name enough to warrant a subscript? If not, what makes Latf and the various Arab variants different? Why does it even matter? --Victar (talk) 06:31, 13 March 2018 (UTC)
- Latf is displayed in very different fonts from Latn. Some of the Arab subscripts may need different fonts to display closer to how they're written as far as letter-shapes (letters having dots, or entirely different shapes from in standard Arabic) or line-slanting, but I do wonder if we need so many. - -sche (discuss) 16:01, 13 March 2018 (UTC)
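For comparison with the CSS approach discussed above, here is a minimal Python sketch of the U+200C method that Victar mentions, in which a zero-width non-joiner is placed between letters so that a ligature-forming font keeps them apart. The function names are hypothetical, and whether a given font actually honours U+200C for these pairs is an assumption that would need testing.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


def suppress_ligatures(word):
    """Insert U+200C between every pair of adjacent characters."""
    # Python strings iterate by code point, so Avestan letters outside
    # the Basic Multilingual Plane are handled one letter at a time.
    return ZWNJ.join(word)


def strip_zwnj(word):
    """Remove every U+200C, restoring the plain spelling."""
    return word.replace(ZWNJ, "")


plain = "𐬀𐬵"  # the a + h pair from the table above
separated = suppress_ligatures(plain)
print(len(plain), len(separated))
```

The drawback is visible in the sketch: the stored spelling changes, so searching and linking would need something like `strip_zwnj`, whereas the `font-variant-ligatures` rule leaves the text untouched.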
Generalizing of Japanese infrastructure
Currently we have templates like {{ja-def}}, {{ja-pos}}, and {{ja-r}}, but nothing for other Japonic languages (except for forks of {{ja-def}}). We should consider making these templates usable for other Japonic languages as well.
See also Module_talk:ja-kanji-readings#Okinawan, Template_talk:ja-readings#Separate_template_for_Okinawan, and diff.
(Notifying Eirikr, Wyang, TAKASUGI Shinji, Nibiko, Atitarev, Dine2016, Poketalker, Cnilep, Britannic124, Fumiko Take, Dine2016): and @Erutuon. —suzukaze (t・c) 20:12, 13 March 2018 (UTC)
- I'd definitely be supportive of that.
- There are some Japonics with weird phonetics and weird spellings, like the Miyako dialect of Ryukyuan. In addition, there's at least one non-Japonic that uses katakana (Ainu) which could also benefit from this effort. ‑‑ Eiríkr Útlendi │Tala við mig 20:41, 13 March 2018 (UTC)
I support the move in principle, but probably don't have much to contribute technically. I was going to mention Ainu, but Eirikr has that well in hand. Cnilep (talk) 00:41, 15 March 2018 (UTC)
Old West and Old East Norse
I would like to add a language code for Old West and Old East Norse respectively. Just like with New and Medieval Latin. I can’t seem to find the right module though. Can someone add the two? I suggest non-oen (Old East Norse) and non-own (Old West Norse). I will use this in the etymology templates. Jonteemil (talk) 16:28, 15 March 2018 (UTC)
- Sounds like what you want is etymology-only language codes. You can add those at Module:etymology languages/data. --WikiTiki89 16:38, 15 March 2018 (UTC)
- For Old East Norse we do already have Old Danish and Old Swedish. —Mahāgaja (formerly Angr) · talk 20:56, 15 March 2018 (UTC)
- We use those names for the intermediate forms between Old Norse and the modern languages. "Old Danish" is ambiguous, as I said here. Check out Old Danish and Middle Danish.__Gamren (talk) 11:54, 12 April 2018 (UTC)
- Completely irrelevant tangent: I saw this heading pop up again on my watchlist, and initially mis-parsed it as "The Old West and Old East Norse" -- and now I'm stuck thinking of vikings in cowboy hats. :) ‑‑ Eiríkr Útlendi │Tala við mig 16:23, 12 April 2018 (UTC)
- Belatedly added, following Wiktionary:Beer parlour/2019/March#Old_Gutnish, see there for more. - -sche (discuss) 03:08, 23 March 2019 (UTC)
Is the war against the unified Serbo-Croatian raging on?
Is the war against the unified Serbo-Croatian raging on? Template talk:User hr. --Anatoli T. (обсудить/вклад) 01:11, 16 March 2018 (UTC)
- I will repeat what I said on the talk page: there's no reason to try to make our user language templates correspond exactly to what's in Module:languages. I find it baffling that you have an objection if someone wants to say they speak Croatian and not Serbo-Croatian. DTLHS (talk) 01:13, 16 March 2018 (UTC)
- The question is: if a user declares with the template "this user likes pig meat", then he/she likes pig meat; they did not say "this user likes lamb meat". Redirecting the template to some other form is not fair and not correct. Just for communication (talk) 01:28, 16 March 2018 (UTC)
- Fixing political issues is NOT our business. Use whatever ISO says because we have nothing better. Equinox ◑ 02:45, 16 March 2018 (UTC)
- I'm in favor of letting users specify whatever language variety they like in their userboxes. If Croatian is going to be banned in userboxes, American and British English should be as well: they don't have full language codes or get headers in entries. — Eru·tuon 03:21, 16 March 2018 (UTC)
- I agree. Just because we treat Serbo-Croatian as a single language for lexicographical purposes doesn't mean we can't allow userboxes to make finer distinctions. —Mahāgaja (formerly Angr) · talk 10:38, 16 March 2018 (UTC)
- I also agree that people should be able to say whatever they want on their userpages. --WikiTiki89 12:18, 16 March 2018 (UTC)
- This is an interesting statement because I see it and think "yes, anyone should be able to say anything on their page" (unless it's totally egregious, like posting our home addresses, or spam/propaganda without any contribs), and then I wonder why I oppose userboxes. Probably because they have a "viral" quality and people tend to copy them without thinking, and then we end up with a big infrastructure of needless rubbish. I suppose in theory I don't oppose an individual userbox. Huh! Oh well just ranting. Equinox ◑ 15:59, 16 March 2018 (UTC)
- IMO, as long as "sr", "hr" etc still ultimately categorize into the "User sh" categories so the users can be found when their expertise is needed, it's fine to have Serbian-specific and Croatian-specific boxes. I think the other Babel system (that pulls from a central, off-Wiktionary repository of codes) allows them regardless. - -sche (discuss) 16:52, 16 March 2018 (UTC)
- Good point. I can use {{#babel:hr}} overriding the local template or I can use a global user page on Meta including whatever the Babel extension permits. In fact, there is no link to update on pages currently in Category:User hr. --Vriullop (talk) 21:05, 16 March 2018 (UTC)
This is what I dreaded when Just for communication (a.k.a. Kubura) first contacted me – I share Atitarev's fear that this might be the first inkling of yet another war against unified Serbo-Croatian. Heaven knows I've reverted my fair share of anons who have tried to change Serbo-Croatian lemmas in one direction or another, blatantly disregarding our policy. With that said, I can also agree that letting users specify which language variety they speak in their userboxes might not be such a big deal after all. As long as we can agree – unanimously – that it's something we're willing to stomach. --Robbie SWE (talk) 19:25, 16 March 2018 (UTC)
- The template has been changed back to point to Croatian by User:DTLHS in diff with a summary "it is agreed". In my opinion, that was too early in the discussion. The original poster was even offended by the term "Serbo-Croatian", calling it the "so-called", and it has been our long-fought policy for years! If we say we are apolitical, then using Serbo-Croatian/Croato-Serbian is not a political statement but linguistic common sense. Do we stand for anything? Why have language policies, language treatment documents and modules, mergers, splits and votes? --Anatoli T. (обсудить/вклад) 01:42, 17 March 2018 (UTC)
- What do "language policies" have to do with what people put on their user pages? We unify Serbo-Croatian for lexicographic convenience and nothing more. And no, something isn't apolitical just because you happen to agree with it. DTLHS (talk) 01:47, 17 March 2018 (UTC)
- I tend to agree with DTLHS. To echo Mahagaja: "Just because we treat Serbo-Croatian as a single language for lexicographical purposes doesn't mean we can't allow userboxes to make finer distinctions." Any edits switching the mainspace should be dealt with separately, if and when they arise. --Dan Polansky (talk) 13:00, 18 March 2018 (UTC)
- As Erutuon said, there is no reason to allow British and American varieties of English and disallow varieties in other languages. If the user feels more comfortable with Croatian or Serbian, let them feel comfortable. It does not influence the main space in any means. --Jan Kameníček (talk) 13:14, 18 March 2018 (UTC)
Vote: PseudoSkull for admin
Hi! I'm a newbie who has rarely done any work on this project, and (seriously) I can hardly tell how to create a vote and I am not sure if I've done it right. It's very hard. (Usability question?) Anyway, I think it's about time that PseudoSkull becomes an admin and here is a vote about it. Please visit Wiktionary:Votes/sy-2018-03/User:PseudoSkull_for_admin Equinox ◑ 04:53, 16 March 2018 (UTC)
- If you're a newbie I must be a new-newbie. I know what you meant though. DonnanZ (talk) 22:02, 16 March 2018 (UTC)
Vote: Including translation hubs
FYI, I created Wiktionary:Votes/pl-2018-03/Including translation hubs.
Let us postpone the vote as much as discussion requires, if at all. --Dan Polansky (talk) 08:54, 17 March 2018 (UTC)
Czech noun phrases
lošák zprohýbaný should be marked as "Noun", just like černá díra and black hole should be marked as nouns. This is consistent with WT:EL, which forbids the part of speech "Noun phrase". Jan.Kamenicek disagrees. A discussion is at User_talk:Jan.Kamenicek#Noun phrases, but I was not convincing enough. I do not want to engage in a revert war. --Dan Polansky (talk) 12:06, 18 March 2018 (UTC)
- Terms that consist of a noun and an adjective (in either order) are phrases according to our definition of phrase but we always treat them as nouns here. Please change it to a noun (I don't know if it has a plural). If the other person continues to change it to a phrase, I'll give him a short block. SemperBlotto (talk) 12:14, 18 March 2018 (UTC)
- p.s. See also lišák zprohýbaný. SemperBlotto (talk) 12:16, 18 March 2018 (UTC)
- Let me note that Jan.Kamenicek does a lot of excellent work here. But the matter seems very clear to me, as for WT:EL and en wikt common practice. --Dan Polansky (talk) 12:22, 18 March 2018 (UTC)
- I do oppose such a solution for Czech entries. I understand that English nouns may include noun phrases as well, but Czech nouns do not. The general understanding is that only single-word expressions can be nouns in the Czech language, which I explained in detail on my talk page. --Jan Kameníček (talk) 12:24, 18 March 2018 (UTC) E. g. houpací křeslo (“rocking chair”) is always analyzed either as adj + noun, or as a noun phrase, but never as a single noun. --Jan Kameníček (talk) 13:21, 18 March 2018 (UTC)
- Czech and English are exactly the same as for noun vs. noun phrase, as I pointed out. No reason to treat Czech different from English or German. A dictionary treatment of a part of speech is not necessarily the same as a general linguistic treatment. The English linguistics does distinguish NP from N, no question about it. --Dan Polansky (talk) 12:27, 18 March 2018 (UTC)
- Yes there is a huge reason to treat it differently, and that is that linguistic sources on Czech nouns use it differently and so do all people no matter whether they are linguists or laypeople. It is very confusing for readers if they meet here an attitude that is so different from what they are used to in real Czech language usage as well as Czech language textbooks. --Jan Kameníček (talk) 12:33, 18 March 2018 (UTC)
- The first-time users of Merriam-Webster may experience the same confusion. But the confusion quickly withers away; they get used to it, some of them realizing that it is a part-of-speech classification and that it makes sense from a dictionary point of view. Czech = English as for linguistic sources distinguishing NP and N; no difference here. --Dan Polansky (talk) 12:47, 18 March 2018 (UTC)
- To argue about classification of Czech words we should seek something that deals with classification of Czech words, which Merriam-Webster is not.
- Here is also a link to an English book on Czech language that also differentiates between nouns and noun phrases. These two have always been understood as different things when analyzing Czech language and so it should be mirrored in Wiktionary as well.
- It is not true that it has to be a part of speech classification, as various different headings are allowed, such as Phrase (which is the one I used), Prepositional phrase, Proverb, Suffix and many more...
- The confusion does not wither away with non-regular users. --Jan Kameníček (talk) 13:00, 18 March 2018 (UTC)
- I will break it down.
- 1) General English linguistic sources about English distinguish NP from N.
- 2) English dictionaries do not distinguish NP from N.
- 3) The English Wiktionary has decided to abolish the distinction between NP and N, for all languages. It did so in keeping with 2).
- 4) We do not have any example of a Czech dictionary that has černá díra, and ranks it either as NP or P.
- 5) General Czech linguistic sources about Czech distinguish NP from N, similar to 1). No surprise here.
- 6) There is no grammatical difference between černá díra, black hole and schwarzes Loch.
- 7) ----- Therefore -----
- 8) Let us enter Czech in a way consistent with 3). Let us do so until the decision made in 3) is reverted via general consensus of the English Wiktionary.
- 9) Consistent with 8), please change lošák zprohýbaný to Noun, and leave it like that until you convince other people to change 3).
- --Dan Polansky (talk) 13:34, 18 March 2018 (UTC)
- Ad 1) and 2) Not applicable for Czech expressions.
- Ad 3) I avoided using the heading "Noun phrase" which was rejected and used just the allowed "Phrase", although it is probably meant for different cases. I believe it is a good compromise until it is agreed whether "noun phrases" are allowed to have their own heading at least in Czech entries. If not, I would be happy just with "Phrase", too: it is not ideal, but at least it is not wrong and confusing.
- Ad 4) Easy to explain, most dictionaries of Czech expressions do not have various phrases as separate entries, but among phrases and collocations connected with individual single words. Despite this there is evidence how the dictionary authors understand Czech nouns:
- a) I have never seen a dictionary of Czech expressions marking noun phrases as nouns.
- b) My Czech-English dictionary (by Josef Fronek, 2000) has loads of examples that suggest the authors do not consider noun phrases to be nouns. Every entry there has marked its POS. If they want to change it to a different POS within the same entry, they always mark it again. However, when I look up elektrický (“electric”) marked as adj., they have also got there elektrické křeslo (“electric chair”) within the same entry without marking any change of POS to a "noun". Instead they have got it among other phrases and collocations of the adjective electric with other words. If they considered it a noun, they would mark it so.
- c) My dictionary of Czech phraseology, part on non-verbal expressions, contains entries many of which are noun phrases. Although they never mark any POS (which can also be understood as evidence that they do not consider them to be PsOS but phrases), in various comments in the preface and other chapters they directly call them noun phrases and never nouns.
- Ad 5) And so do English sources about Czech.
- To sum up my arguments again: various sources on both Czech lexicography and Czech language generally do not consider Czech noun phrases to be nouns (but phrases consisting of words of various PsOS) and so do also laypeople. Because the heading "Phrase" as such is not disallowed in Wiktionary, I hope it can stay (although allowing "Noun phrase" would be even better). --Jan Kameníček (talk) 17:30, 18 March 2018 (UTC)
- Perhaps an argument can be made for using "Noun phrase" instead of "Noun", but this would have to apply to all of Wiktionary, Czech is not exceptional in this matter. Crom daba (talk) 17:56, 18 March 2018 (UTC)
- Thank you. One difference I can see is that while Czech noun phrases are not understood as nouns, English ones sometimes are. A general solution of allowing "Noun phrase" headings would be great (and I believe it will happen one day), but if such consensus does not occur, tolerating just the heading "Phrase" in Czech entries without specifying what kind of phrase it is would suffice. --Jan Kameníček (talk) 19:06, 18 March 2018 (UTC)
- I agree with Semper, we treat these as nouns because they function as nouns, can be replaced with single-word nouns without changing the grammar of the sentence, etc. There does not seem to be anything different about these terms in Czech versus Polish, French, German, English, etc that would justify treating them differently; to Jan's point about "confusing" Czech-speakers I would counter that it is likely confusing for non-Czech people (who seem more likely than Czechs to be looking up English definitions of Czech words, i.e. using en.Wikt instead of cs.Wikt) who are looking up words to see things that are clearly nouns labelled as "phrases"; certainly, it seems wrong to me, since "phrases" normally refers here to things like "a little bird told me" (and if I had encountered it without realizing there were an ongoing discussion like this, I would have simply considered it an obvious mistake and misunderstanding of en.Wikt conventions and changed it to "Noun"). - -sche (discuss) 20:01, 18 March 2018 (UTC)
- I second everything -sche said. —Μετάknowledgediscuss/deeds 20:05, 18 March 2018 (UTC)
- @-sche, Metaknowledge: Non-Czech speakers trying to learn Czech are not likely to use Wiktionary as the only source for learning. They are likely to use other sources and Wiktionary only as a secondary source. So Wiktionary should be in accordance with the others, not the only one which is different (and in the context of linguistics dealing with Czech language also wrong). --Jan Kameníček (talk) 20:29, 18 March 2018 (UTC)
- Displaying noun phrases as nouns makes sense specifically in English because English noun phrases and equivalent compound nouns are often found in free variation. Consider synonymous forms like healthcare vs. health care, distinguished only by a bit of whitespace; in a dictionary, it makes sense to format such pairs of entries in the same way. And perhaps the same consideration applies to other analytic languages such as Chinese.
By contrast, in strongly inflected languages, for instance in any Slavic language, noun phrases and compound nouns behave extremely differently. First of all, if you remove space between Slavic words, you don't end up with a compound noun; you end up with garbage that violates the language's morphological rules. (You cannot meld охрана здоровья (oxrana zdorovʹja) into *охраназдоровья; the correct compound would be здравоохранение (zdravooxranenije).) Second, you have a lot of freedom to reorder the words in a Slavic noun phrase or even interleave it with other words and phrases in the middle.
The English wiktionary's insistence on classifying Russian noun phrases as nouns has always grated on me; it feels and looks utterly wrong for my language, and I am sure that speakers of Czech like Jan.Kamenicek are of the same opinion. If someone starts a vote to abolish the noun/NP merger that makes no sense for non-English entries, I would be 100% in favor. Tetromino (talk) 16:04, 21 March 2018 (UTC)
- But these are morphological properties. From a syntactic point of view, noun phrases behave no differently than simple nouns. --Per utramque cavernam (talk) 23:01, 26 March 2018 (UTC)
- Sure enough. On the other hand, the classification of words by part of speech, which in some languages is called "word kind" or "word class", is classification both by position in sentence and morphology (inflection, characteristic suffix). Thus, in Czech, you can tell a word looks like an adjective even if you do not see the word used in a sentence; when picking an inflection pattern for a word, you first check what part of speech (cs:slovní druh) it is, and then, based on the part of speech, you select the table of inflection patterns. Word looking like an adjective is something that to a limited extent is also seen in English, probably via influence of Latin; in English, acidic looks like an adjective. An argument can be made that, since lošák zprohýbaný is not inflected like a noun but rather like a noun phrase, it should also be so marked. I believe both approaches (noun, noun phrase) are workable and sensible, but only one of them is the current en wikt tradition. This is an example of what I would call a broader moderate approach relativism: not anything goes, but there is often no single right way to do things. Not any bird design can fly, but there is no single "correct bird", or even "correct flying animal". --Dan Polansky (talk) 11:22, 30 March 2018 (UTC)
Headings linked?
Would you consider linking headings, or some headings? e.g. Participle. Often edited as Adjective or Verb. But is both, and in most pages there is no clarification. sarri.greek (talk) 13:21, 18 March 2018 (UTC)
- Some languages do have "participle" as a part of speech and POS header. But (standard) POS headers should never have links in them (I seem to recall that some "Abbreviation" or "Initialism" headers may have contained links at some point, but that is deprecated). - -sche (discuss) 20:07, 18 March 2018 (UTC)
- I see, thank you @-sche:. A pity: the PoS in many pages remains unexplained (αγιοποιημένη). I was comparing to @el.wiktionary with linked Pos. I presume, that at some stage in the future, all words in wiktionary will be clickable/linked. sarri.greek (talk) 21:26, 18 March 2018 (UTC)
German case ordering
Newer grammars tend to use the order "nominative-accusative-dative-genitive", which has the advantage that the often identical nominative/accusative forms are grouped together (easier to see patterns for learners). It also reflects usage: genitive is rare and listed last. I'd like to change our templates accordingly, any objections? – Jberkel 08:59, 19 March 2018 (UTC)
- A recent discussion about this. --Per utramque cavernam (talk) 09:13, 19 March 2018 (UTC)
- Ah, thanks. In general editors seem to be in favor of the change (this wasn't a general discussion though). As a compromise, I could change it to the proposed order with the option of reordering it back to the "traditional" layout via the script mentioned. – Jberkel 09:34, 19 March 2018 (UTC)
- Oppose, this is confusing for those of us who grew up with the classical ordering, which I suspect includes most Germans. Crom daba (talk) 13:17, 19 March 2018 (UTC)
- I also learned the traditional ordering at school (ages ago), but we should think about non-native readers who want to use Wiktionary as a resource. Imagine how confused they are with our current presentation. The proposed order is already used in DaF (German ESL) and is becoming a standard in modern grammars. What is currently taught in German schools, to native speakers, I do not know (any teachers reading?) – Jberkel 14:05, 19 March 2018 (UTC)
- Additionally, it doesn't fit with European case names like German dritter Fall (third case = dative), vierter Fall (fourth case = accusative), Dutch vierde naamval (fourth case = accusative), Czech čtvrtý pád (fourth case = accusative). Also the German ordering fits with the ordering of Greek, Latin, Czech and other languages. -84.161.12.28 16:27, 19 March 2018 (UTC)
- That's irrelevant; we don't have to make our system match these case names. What we do have to do is add some explanations in these entries. --Per utramque cavernam (talk) 16:37, 19 March 2018 (UTC)
- Support, as it is easier, quicker to see and is more appropriate for the German case system. --Mahmudmasri (talk) 13:56, 19 March 2018 (UTC)
- Support; that's the ordering we used when I was learning German. --Per utramque cavernam (talk) 15:53, 19 March 2018 (UTC)
- My bad, it was NAGD, not NADG. And I don't care too much about the issue, so Abstain. --Per utramque cavernam (talk) 10:48, 22 March 2018 (UTC)
- Support; it's a more logical order, IMO, grouping similar forms. - -sche (discuss) 16:06, 19 March 2018 (UTC)
- Oppose. We should follow the lexicographical conventions of the language. --WikiTiki89 16:43, 19 March 2018 (UTC)
- Oppose: per Wikitiki89. This mostly seems moot given the ability to reorder with JS, but as long as de.Wikt uses the traditional ordering, I don't think we should break with them. —*i̯óh₁n̥C[5] 21:57, 19 March 2018 (UTC)
- Confused: I'm quite surprised to learn that the Nominative-Accusative-Dative-Genitive ordering is "new". I learned German first from the First-year German textbook by Jedan, Helbling, Gewehr, and von Schmidt, first published in 1975 and republished in 1979 (Amazon link), and they used that ordering. Is 43 years old still "new"? ‑‑ Eiríkr Útlendi │Tala við mig 22:24, 19 March 2018 (UTC)
- It's new in comparison to two thousand years of nom-gen-dat-acc. Crom daba (talk) 22:37, 19 March 2018 (UTC)
- Forgive me for not believing that German grammars have been around for 2000 years. If you are referring to Latin, I fail to see how that has any direct bearing. ‑‑ Eiríkr Útlendi │Tala við mig 22:40, 19 March 2018 (UTC)
- FWIW, re-reading my post above, it comes across much snottier than I intended. Apologies, Crom daba. ‑‑ Eiríkr Útlendi │Tala við mig 05:15, 21 March 2018 (UTC)
- I'm not a big fan of monkey patching MediaWiki with user scripts to fix usability problems (how does this help anon/casual users?), but as there's no consensus for making global changes I adapted the noun declension tables to work with the changeCaseOrder.js script. @Erutuon could you change the order to NADG? There is also a table which lists ablative and vocative forms, {{de-decl-noun-table-sg-av}}; they should be last (currently listed first with the script). – Jberkel 10:03, 21 March 2018 (UTC)
- @Jberkel: Done. — Eru·tuon 21:10, 21 March 2018 (UTC)
- Oppose: The current order (nominative-genitive-dative-accusative) is just as easy, just as quick to see, just as appropriate, and just as logical. This order is found in many books, is the order many schools use in teaching, and is the order shared by many Indo-European languages on this Wiktionary, including Russian, Czech, Polish, German, Latin, Lithuanian, Belarusian, Ancient Greek, Serbo-Croatian, Upper Sorbian, Lower Sorbian, and Ukrainian. (including the German declension tables on de.wiktionary). When a student who has learnt declensions using a certain order is later asked to adjust to a different order, he thinks of little things such as two forms being adjacent, alphabetical order, chronological order (the order in which the forms were presented in class), or one of any number of other superficial features that he can think of to "prove" his argument that his way is best. It is simple rationalization (i.e., a psychological defense mechanism in which perceived controversial behaviors are logically justified, also known as "making excuses"). There is one good argument, and that is the order in which the student learned them. For anyone who learned declensions in the current order, the current order is best (easiest, quickest to see, most appropriate, most logical). For anyone who learned declensions in a different order, the different order is the easiest, etc. —Stephen (Talk) 12:46, 21 March 2018 (UTC)
- Oppose This doesn't seem more common than the current scheme, certainly not in dictionaries, and it introduces a strange difference with the declension tables of High German's ancestor languages, other Germanic extinct languages and many other European languages. And both are equally arbitrary and equally illogical. ←₰-→Lingo Bingo Dingo (talk) 10:44, 22 March 2018 (UTC)
- Oppose I am old school and prefer the traditional methods I learned German from, which also match the order of cases in Latin and Slavic languages. Classical Arabic/MSA also uses nominative, genitive, accusative cases in this order. --Anatoli T. (обсудить/вклад) 10:52, 22 March 2018 (UTC)
- I don't particularly care, but isn't it possible to make this customisable for each user via WT:PREFS somehow? Ƿidsiþ 06:38, 11 April 2018 (UTC)
- Support like a steel beam. I always took offence at the nonsensical distance between the nominative and the often identical accusative. We should change the order to NADG for every Germanic language with cases. As for reasons for opposition raised: 1. Be realistic, nobody is going to be 'confused' by anything as case names are always explicitly listed in our tables. 2. Ours is not to ape the errors of my countrymen. We are an English or at best international resource, and how others (including English resources) do anything is not worthy of mention in any discussion until the two options have been found to be completely equal in merit under every other structural, technical and user-experience reasoning. Korn [kʰũːɘ̃n] (talk) 11:37, 11 April 2018 (UTC)
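The reordering debated in this thread, switching a declension table between the traditional nominative-genitive-dative-accusative order and the proposed nominative-accusative-dative-genitive order, can be illustrated with a minimal sketch. This is not the changeCaseOrder.js script mentioned above; the function name `reorder_cases` and the list-of-tuples table layout are assumptions made purely for illustration. Cases outside the chosen order (such as ablative and vocative) are placed last, as requested in the thread.

```python
# Hypothetical sketch, not the actual changeCaseOrder.js user script.
TRADITIONAL = ["nominative", "genitive", "dative", "accusative"]
PROPOSED = ["nominative", "accusative", "dative", "genitive"]


def reorder_cases(rows, order):
    """Return the (case, form) rows sorted by the given case order.

    Cases missing from `order` (e.g. ablative, vocative) go last;
    sorted() is stable, so they keep their original relative order.
    """
    rank = {case: index for index, case in enumerate(order)}
    return sorted(rows, key=lambda row: rank.get(row[0], len(order)))


# Example table in the traditional order (singular of German "Tag").
table = [
    ("nominative", "der Tag"),
    ("genitive", "des Tages"),
    ("dative", "dem Tag"),
    ("accusative", "den Tag"),
]

for case, form in reorder_cases(table, PROPOSED):
    print(f"{case}: {form}")
```

Because the order is just a list passed as a parameter, the same helper could serve either preference, which is the kind of per-user switch raised in the thread.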
What is the purpose of Category:Numeral symbols by language exactly? Probably Category:Gujarati numeral symbols is what it is intended for, but Category:English numeral symbols and Category:Korean numeral symbols look very different. — TAKASUGI Shinji (talk) 05:53, 21 March 2018 (UTC)
- I don't understand your question. Why would it include Gujarati numerals, but exclude English and Korean numerals? And what does "look very different" have to do with it? There are many sets of numerals, and the numerals in one set usually look different from the numerals in any other set. If every language used one and the same set of numerals, there would be no need for this. It's needed because the numeral sets often vary according to language. —Stephen (Talk) 13:03, 21 March 2018 (UTC)
- Category:Gujarati numeral symbols appears to consist of mathematical digits (equivalent to "1", "2", "3", etc.); Category:English numeral symbols consists of the English alphabet, apparently because letters are sometimes used to index ordered lists of things; and Category:Korean numeral symbols seems to be a mixed bag of digit symbols and spelled-out names (not symbols!) for numbers, such as 하나, 열하나, and 열다섯. Arguably, the Gujarati category is the only category of the three which is used correctly. Tetromino (talk) 21:36, 21 March 2018 (UTC)
- Yes, it seems they are for digits of different writing systems. If so, most entries must be removed, but where should we categorize the English digits 0 through 9? — TAKASUGI Shinji (talk) 15:01, 24 March 2018 (UTC)
- 0 and 1 have been included in Category:Translingual numerals, which seems more appropriate here. There's nothing especially English about the Arabic numerals; 3, 5, 7, 9 don't even have English entries. 2, 4, 6, 8 only have entries for colloquial non-numeral uses, and they also duly appear in Category:English terms spelled with 2, etc. --Tropylium (talk) 14:43, 2 April 2018 (UTC)
Stock market indices
We have some entries for individual stock market indices, including CAC 40, FTSE 100, and IBEX 35. These are idiomatic and attestable. However also idiomatic and attestable are every other stock market index. For consistency, I believe we should take one of three stances:
- Inclusive: Include entries for all stock market indices.
- Partially inclusive: Establish rules for notability of stock market indices that allow us to objectively decide which should be allowed and which shouldn't.
- Exclusive: Do not include any individual stock market indices. Instead just include the parent term for the stock market series. (Also include certain abbreviations like DJIA and DJTA where the series name ("Dow Jones") does not form part of the lexical unit.)
My preference is for the last of these as #1 strikes me as encyclopaedic rather than lexicographical, and #2 is harder to enforce due to having to verify notability on a case-by-case basis. A direct analogy, in my opinion, is that we include makes of cars (Fiat, Ford, etc.) but not models (Fiat 500, Ford Escort), even though the model names are also idiomatic and attestable.
If this is better served by a formal policy vote to add to WT:CFI (rather than an informal discussion), here is a first draft of wording for comment:
Voting on: Adding the following paragraph to WT:CFI, at the end of the Names of specific entities section as a subheading of that section:
Specific stock market indices should not be included. Instead include the common term that forms the parent word of a series of index names:
- Include FTSE. Do not include FTSE 100, FTSE MID 250, FTSE 350, FTSE All-Share, FTSE SmallCap, FTSE Fledgling, FTSE techMark, FTSE Eurotop 100, FTSE Euromid, FTSE Euro 100.
- Include CAC. Do not include CAC 40, CAC Next 20, CAC Mid 60, CAC Small.
Attested initialisms of specific stock market indices may be included, where they form a single term that does not include the parent word:
- Include Dow Jones. Also include DJIA and DJTA but neither Dow Jones Industrial Average nor Dow Jones Transportation Average.
- Include WIG. Do not include WIG30, mWIG40, sWIG80.
Here is a list (not exhaustive) of some stock market index series and names to help inform discussion with real examples of what I propose should be included and excluded:
Entries for the names of stock market series can then be labelled/categorised (possibly by template), and cross-link to examples of the specific indices on Wikipedia.
What are your thoughts? -Stelio (talk) 13:00, 26 March 2018 (UTC)
- In the absence of discussion, I've created a vote: Wiktionary:Votes/pl-2018-04/Stock market indices. -Stelio (talk) 11:08, 24 April 2018 (UTC)
In Wiktionary:Requests for deletion/English and /Non-English, "brown leaf" is still used as an example of an SOP entry. This is confusing now, because not too long ago, an editor created the entry brown leaf with a sense that actually was idiomatic, something I never thought to be possible. I think we need to have a different universal SOP example. PseudoSkull (talk) 19:16, 26 March 2018 (UTC)
- How about green leaf? -Stelio (talk) 21:20, 26 March 2018 (UTC)
- Tout est accompli. — (((Romanophile))) ♞ (contributions) 03:52, 27 March 2018 (UTC)
- green leaf should likely exist as it is adjectival of non-fermented tea leaf. Just sayin'. - Amgine/ t·e 04:07, 27 March 2018 (UTC)
- green grass then? -Stelio (talk) 10:03, 27 March 2018 (UTC)
- Or large dog? —Mahāgaja (formerly Angr) · talk 09:20, 11 April 2018 (UTC)
- The definition is poor, BTW: "a condition that refers to dead leaves on plants". How can a condition refer to something? And if this means more than merely "leaves that are brown because dead", then what is the distinction? Equinox ◑ 04:15, 27 March 2018 (UTC)
- I've improved the definition. SemperBlotto (talk) 06:29, 27 March 2018 (UTC)
Aside from Wiktionary:Requests for deletion/Header, the other places where the example SOP term appears are:
I changed these both to "green leaf" for now. -Stelio (talk) 10:03, 27 March 2018 (UTC)
Mariupol Greek or Ruméika
I need to add a word in this language. Should we treat it under Pontic Greek (code pnt), Greek (code el) or create a new code (grk-rom)? See Mariupol Greek for context. Pinging @Saltmarsh, because he expressed interest in this dialect. --Vahag (talk) 22:43, 26 March 2018 (UTC)
- What a fascinating lect! The WP article suggests it shouldn't be considered Pontic per se. I see Robert Browning's Medieval and Modern Greek says "some of the Asia Minor dialects, together with the Greek of Mariupol, [...] show a rearrangement of the system of genders, resulting in a differentiation between animate and inanimate substantives. [...] Mariupol Greek has in addition lost the genitive case entirely, and expresses possession by a construction modelled on that of Tatar, e.g. spiti-t porta '(the door of the house)', tata-t tu spit '(his father's house)'." Browning calls it "the very strange dialect of Mariupol", "the curious dialect of Mariupol" and "the strange Greek of Mariupol". That seems suggestive of it meriting its own code. (It should probably have an etymology code if it doesn't get a 'full' language code.) Do you have enough words at hand to see how different its vocabulary is from the Greek of Greece? - -sche (discuss) 23:33, 26 March 2018 (UTC)
- If I may... I found this pdf (unfortunately in Greek only) which may be of help. It is by Sofronis Hatzisavvidis, n.d. in the pdf, post 2001. Uploaded 2014.10.15. at Thessaloniki University. He describes the differences of Pontic and Marioupoli affiliated dialects. Note, that after 2014, many ukraine-greeks left the area, so the number of speakers is further reduced. sarri.greek (talk) 00:43, 27 March 2018 (UTC)
- I am sadly not qualified to decide when a separate code should be used. The dialects are geographically separated so perhaps a separate code could be used - it might be easier to merge later if a firmer decision is taken. — Saltmarsh. 06:19, 27 March 2018 (UTC)
- Here is a comparison of seven random Swadesh list words in Mariupol Greek, Modern Greek and Pontic Greek. For the first, I used the 2006 Ruméika–Russian dictionary of Diamantupolo-Rionis et al. Mariupol Greek resources mainly use the Cyrillic script. I used the Russian transliteration as a fallback.
English | Mariupol Greek | Modern Greek | Pontic Greek |
---|---|---|---|
I | го (go), его́ (jevó) | εγώ (egó) | εγώ (egó) |
four | те́сера (tésera) | τέσσερα (téssera) | τέσερα (tésera) |
husband | а́дрась (ádrasʹ), а́ндрас (ándras), сте́фанс (stéfans), афендис (afendis) | σύζυγος (sýzygos), άντρας (ántras) | άντρας (ántras) |
to die | путъе́н(у) (putʺjén(u)), сухуре́фкум (suxuréfkum), схурэ́фкум(э) (sxurɛ́fkum(ɛ)) | πεθαίνω (pethaíno) | ποφάνω (pofáno) |
to drink | пине́ск (pinésk), пинэ́шку (pinɛ́šku), пинэ́шк (pinɛ́šk), пнэ́шку (pnɛ́šku), пи́ну (pínu) | πίνω (píno) | πίνω (píno) |
sky | урано́ (uranó) | ουρανός (ouranós) | ουρανόν (ouranón) |
dirty | лапме́нс (lapméns), гриндзо́с (grindzós), чапелкус (čapelkus) | βρόμικος (vrómikos) | λερός (lerós), τσαμουρωμένος (tsamouroménos) |
- I am inclined to creating a separate code. --Vahag (talk) 14:11, 27 March 2018 (UTC)
- Thanks! Given the difference in script and the likelihood of even more dissimilarities in vocabulary due to loanwords, on top of the differences in grammar, I am also inclined to a separate code. It should "match" the language name, so either grk-rum if we use "Rumeíka" or something like grk-mar if we call it "Mariupol Greek". "Mariupol Greek" seems to be slightly more common in books that merely mention the language, but both names are rare, so if references/dictionaries of the language tend to use one name or the other, that would be an argument in favour of that name. - -sche (discuss) 15:32, 27 March 2018 (UTC)
- PS can you add the Pontic and Mariupol Greek words for [[water]]? - -sche (discuss) 15:41, 27 March 2018 (UTC)
- My references/dictionaries of the language are all in Russian. They prefer the designation руме́йский (ruméjskij), but the Russian practice should not concern us. I think it is better to adopt "Mariupol Greek" as the language name in English, because it immediately tells something about the language to an uninformed reader.
- I added the Pontic word to water and will add the Mariupol Greek translation as soon as you create the code. --Vahag (talk) 16:15, 27 March 2018 (UTC)
- I think you'll find that this is far from the only Modern Greek lect called some variant of "Rum..." by native speakers. Chuck Entz (talk) 03:49, 28 March 2018 (UTC)
- True. OK, I've added it as "Mariupol Greek", grk-mar. - -sche (discuss) 05:38, 28 March 2018 (UTC)
- Thanks. I added the "water" translations. --Vahag (talk) 12:16, 28 March 2018 (UTC)
French verb conjugation
Imperative forms displayed for “s'en moquer” under “moquer” have been false; imperative forms of “s'en moquer” are “moque-t'en”, “moquons-nous-en”, and “moquez-vous-en”. --Paris91 (talk) 15:01, 28 March 2018 (UTC)
- You're right. Also, all the links were wrong, for example, (je) "m'en moque" should link as "m'en [[moque]]" rather than as "m'[[en moque]]". If we want to be able to support the addition of the pronoun en to conjugation tables, it should be added as a new feature to the conjugation module. Just sticking "en" to the beginning of the verb is not enough. --WikiTiki89 15:15, 28 March 2018 (UTC)
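The pattern behind the corrected forms quoted above (verb, hyphen, reflexive pronoun, then en, with toi elided to t' before en) can be sketched as follows. This is only an illustration of the affirmative imperative and is not the conjugation module's code; the helper name `imperative_with_en` and its input format are hypothetical.

```python
# Hypothetical helper, not part of any Wiktionary conjugation module.
# It covers only the affirmative imperative of a pronominal verb used with "en".
REFLEXIVE_BEFORE_EN = {
    "tu": "t'",       # "toi" elides to "t'" before "en"
    "nous": "nous-",  # "nous" and "vous" are joined to "en" by a hyphen
    "vous": "vous-",
}


def imperative_with_en(bare_forms):
    """Map each person to verb + '-' + reflexive pronoun + 'en'.

    `bare_forms` maps "tu"/"nous"/"vous" to the bare imperative form.
    """
    return {
        person: f"{verb}-{REFLEXIVE_BEFORE_EN[person]}en"
        for person, verb in bare_forms.items()
    }


forms = imperative_with_en({"tu": "moque", "nous": "moquons", "vous": "moquez"})
for person, form in forms.items():
    print(person, form)
```

Keeping the verb form separate from the clitics, as this sketch does, also matches the linking concern raised above: the verb can be linked on its own rather than as "en moque".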