Minecraft Name:
314.
Suggestion and reason:
To redesign the rules regarding Unicode characters in chat in order to make them more logical.
Right now, 'symbols' (Unicode characters outside the printable part of the 7-bit ASCII set, with a few exceptions) are only mentioned twice in the rules.
This forbids the use of symbols when spamming, not in normal conversation.
This explicitly targets shouts, not normal conversation.
To summarize:
ECC does not allow symbols to be used at all and treats it as a rule violation, but the server rules don't actually forbid their use in a normal chat message. Symbol rules should be applied to general chat rules instead of mentioning them in specific subsections of S4C1.
Let's go back to the second rule.
I'll be honest: "Standard keyboard" is a pretty useless phrase. What is a 'standard' keyboard? Let's take a look.
My German DIN 2137 T1 keyboard layout is pretty much the standard in Germany and Austria (although the T2 standard exists as well, but I digress). It offers umlauts and the German ß, plus a few dead key modifiers for accents. Millions of people would agree that it's a standard keyboard layout, but quite a lot of characters violate the current policies. I'm going to skip the additional third- and fourth-level characters, or this list is going to become far too long and annoying.
Code:
ÄÖÜäöüß°`´áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙâêîôûÂÊÎÔÛ§µ€²³
Well, this isn't going to work. Next try!
There are quite a few 'standard' keyboards, so I'm just going to pick one at random.
https://upload.wikimedia.org/wikipedia/commons/b/bc/KB_Japanese.svg
...I think you get my point...
The United Kingdom is the source of the English language; it should therefore count as a de facto standard for typing English texts on ECC. Surely its keyboard is not going to violate... any... rules...
Code:
¬¦£áéíóúÁÉÍÓÚ
Oh well.
ECC is located in Canada, so I assume Canadian keyboards are going to apply.
...oh, wait. Are we talking about the Canadian French keyboard layout or the Canadian Multilingual Standard?
https://upload.wikimedia.org/wikipe...ext.svg/800px-KB_Canadian_French_text.svg.png
https://upload.wikimedia.org/wikipe...dian_Multilingual_Standard_comment-en.svg.png
Err... possibly neither.
Alright, back to being serious.
We have seen that the term 'standard keyboard' can apply to pretty much anything. I know that this could be seen as intentionally misinterpreting the rules, so I'll just jump to this one: the keyboard that was most likely intended, the United States keyboard layout. Why didn't we refer to it as "a US keyboard" in the first place?
https://upload.wikimedia.org/wikipedia/commons/2/22/KB_US-International.svg
Oh, this may be the reason. I see a lot of fancy, policy-violating symbols here... wait, nevermind. There are two common US layouts, this is the international edition.
https://upload.wikimedia.org/wikipe...Gr.svg/900px-KB_United_States-NoAltGr.svg.png
Ah, this is the keyboard we have been searching for. Who could have known that 'standard keyboard' was referring to the ANSI-INCITS 154-1988 (R1999) standard?
Oooh, now I see. The rules don't call it a "Standard US keyboard" because ‘ and “ are being tolerated as well (at least according to multiple staff members). I'd say they might be the Unicode characters U+2018 and U+201C, but I'm not certain.
Code:
‘ “
What type of keyboard could possibly produce those characters?
...oh, right. Certain Apple keyboards do that automatically.
Let's use the Wikimedia Foundation's analytics to estimate Apple's share of the OS market.
So... the keyboard we are searching for must be an Apple Inc. keyboard that only exists on about 25% of all devices, including computers, phones, tablets and more.
But wait... according to a quick image search, an iOS keyboard also offers £, € and ¥, which aren't allowed.
TL;DR: We need to create a custom keyboard layout and call it the "ECC standard keyboard".
\end{sarcasm}
\end{rant}
\end{digression}
Mentioning an unspecified "standard" keyboard layout may be fine if everyone is from the same country. However, ECC's international audience occasionally tends to react with "I'm not using any symbols, it's on my keyboard".
After reading my minor rant we now know that the rules differ from actual moderation policies and that the term "standard keyboard" is bad. What could be worse than that?
...well, to be honest, a lot of the symbols we have seen in part B should actually be perfectly fine.
Minecraft used to have a few problems with non-ASCII characters: they caused decorative async chat info messages in the logs. However, this no longer seems to be the case. In addition, testing the logs generated by ECC's chat plugin has shown me how well the logs can deal with some special characters.
As an example, I'll use the German keyboard I mentioned above.
Code:
ÄÖÜäöüß°`´áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙâêîôûÂÊÎÔÛ§µ€²³
(Almost) all of those display perfectly fine in the logs; unreadable characters are replaced with a question mark. The message would still be readable even if I added an exotic character like π.
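The question-mark behaviour is what a lossy conversion to a narrower character set typically looks like. A minimal Python sketch, assuming the log writer substitutes `?` for anything it cannot represent (`to_ascii_log` is a hypothetical helper for illustration and has nothing to do with ECC's actual chat plugin):

```python
def to_ascii_log(message: str) -> str:
    """Simulate a log writer that can only store 7-bit ASCII.

    Assumption: every character outside ASCII is replaced with '?',
    which is what the logs appear to do.
    """
    return message.encode("ascii", errors="replace").decode("ascii")


# "Deb\u00fcg \u03c0" is "Debüg π" written with escapes.
print(to_ascii_log("Deb\u00fcg \u03c0"))  # prints: Deb?g ?
```

The message loses two characters but stays readable, which is the point made above.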
I'll be honest: I have absolutely no idea why only ASCII is allowed in chat. Maybe Jamie knows; I don't.
Speaking of ASCII, according to what I've been told a long time ago exceptions include some currency symbols like pound signs or cent signs. Those, too, display properly in the logs.
Oh, and there is one more thing...
Foreign languages usually include special characters. We are basically already allowing the use of symbols, albeit with limitations.
TL;DR: Read it. Also, thank you in case you actually made it through my keyboard-related rant.
Actual TL;DR:
1. The rules concerning the use of symbols do not match the actual policies that are applied in-game.
2. I don't like the term "standard keyboard" in the server rules.
3. A lot of 'illegal' symbols should be perfectly fine from a technical point of view (even though Jamie may know more about possible problems than me).
And now, the moment you've all been waiting for: Actual suggestions!
1. Include symbol regulations in the actual S4C1 instead of placing it in subclauses that only apply to specific parts of chat.
Choose one:
2.A.
Make it clear that only ASCII is allowed; no exceptions.
(Optional: Include a link to a reference table, preferably without unprintable control characters.)
Easy to moderate, but pretty limiting.
2.B.
Explicitly mention a "standard US keyboard layout" (even though it's not accurate, see the last few paragraphs of my rant above).
2.C.
Create an exhaustive list of allowed characters.
2.D.
Allow the use of all characters if 'special' characters are used in moderation.
2.E.
Allow all printable Unicode characters and remove the limits mentioned in the rules.
Potential solution: The JVM's -Dfile.encoding=UTF-8 option.
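Options 2.A, 2.C and 2.E each boil down to a different per-character check. A rough Python sketch of what those checks could look like (the function names and the example allowlist are invented for illustration; none of this is taken from ECC's rules or plugins):

```python
def is_printable_ascii(message: str) -> bool:
    """Option 2.A: only printable 7-bit ASCII (space through tilde)."""
    return all(" " <= ch <= "~" for ch in message)


def is_in_allowlist(message: str, extra_allowed: str) -> bool:
    """Option 2.C: printable ASCII plus an explicit list of extra characters."""
    return all(" " <= ch <= "~" or ch in extra_allowed for ch in message)


def is_printable_unicode(message: str) -> bool:
    """Option 2.E: anything Python's Unicode tables consider printable."""
    return message.isprintable()


# Hypothetical allowlist: curly quotes plus pound and cent signs.
EXAMPLE_EXTRAS = "\u2018\u201c\u00a3\u00a2"

print(is_printable_ascii("Deb\u00fcg"))             # False under 2.A
print(is_in_allowlist("\u00a35", EXAMPLE_EXTRAS))   # True under 2.C
print(is_printable_unicode("\u00c4\u00d6\u00dc"))   # True under 2.E
```

Option 2.D ("in moderation") needs human judgement and is deliberately left out.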
Any Other Information:
The previous section has pretty much become a concoction of the "Suggestion", "Reason" and "Any Other Information" categories.
Link To This Plugin/Is this a custom addition?:
*Points at the chat plugin that is running on ECC right now and would be completely unaffected by rule changes*
Fun fact: The BBCode for this humongous thread has exactly 8113 characters and has therefore exceeded 8.1% of xenForo's maximum post length.
- Thread Status:
- Not open for further replies.
Page 1 of 3
-
Expipiplusone BuilderBuilder ⛰️ Ex-Tycoon ⚜️⚜️⚜️ Premium Upgrade
Wait, how come I didn't +1 this?
-
I could probably do some trickery and enable the logs to be UTF-8 and possibly remove the entire rule, but that's not for now.
-
Tri-weekly bump because 2.[ABCD] are still available.
I wish I could provide chat statistics by Pivillean here, but I should have implemented proper UTF-8 handling for that...
-
Expipiplusone BuilderBuilder ⛰️ Ex-Tycoon ⚜️⚜️⚜️ Premium Upgrade
-
The Minecraft/Spigot server software is executed by Java, like normal Minecraft. Java can be supplied with additional parameters, including -Dfile.encoding. This flag sets the default character encoding Java uses when writing text to a file. UTF-8 is an encoding that can properly represent the (currently disallowed) symbols.
Code:
[Server thread/INFO]: <314> Debüg.
[Server thread/INFO]: <314> ẞß
However, I don't think the console is as relevant as the actual logs: when andrew and Jamie access the console, they probably have more important things to do than warning players for the use of special characters.
-
Expipiplusone BuilderBuilder ⛰️ Ex-Tycoon ⚜️⚜️⚜️ Premium Upgrade
A single byte can hold up to 256 different characters (usually less), due to the fact that a byte is made of 8 bits and therefore it can assume 2⁸=256 different values.
Usually that's more than enough for "standard characters" (i.e. 26 lowercase + 26 uppercase + 10 digits + a bunch of punctuation) and ASCII is a common example of how a single byte has more than enough room to encode all those "standard characters".
However, across all the world's languages there are far more than 256 different symbols (the latest Unicode standard encompasses more than a hundred thousand different symbols, including all modern and historic scripts as well as other symbols), so a single byte is not enough to represent every possible character.
UTF-8 is a common way to encode all possible Unicode characters in such a way that, most of the time (but not always)*, a single byte is sufficient to represent a character: basically, "standard characters" are encoded in one single byte, while "non-standard characters" might take 2, 3 or up to 4 bytes to be encoded.
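The one-to-four-byte range is easy to check. A small Python sketch (`utf8_length` is just an illustrative helper name):

```python
def utf8_length(char: str) -> int:
    """Number of bytes UTF-8 needs to encode a single character."""
    return len(char.encode("utf-8"))


# A plain letter, u-umlaut, the euro sign and an emoji.
for char in ["a", "\u00fc", "\u20ac", "\U0001F600"]:
    print(repr(char), utf8_length(char))
# The four lengths printed are 1, 2, 3 and 4.
```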
All this to explain why, in @314's screenshot, the character ü was misrepresented as two strange symbols: it was originally encoded in UTF-8 and therefore (being non-standard) required two bytes, but the console wrongly assumed the string to be in some fixed one-byte-per-character encoding (probably an 8-bit one such as ISO-8859-1 or Windows-1252 rather than plain ASCII, which defines no characters for those two byte values) and therefore printed the two symbols associated with those bytes in that (wrong) encoding.
Simply put: things break when one part encodes some output in one way, and another decodes this output assuming a different (wrong) encoding; therefore, before proceeding, we want to identify all parts that are talking to each other and figure out how to configure them all with the same encoding.
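The mismatch can be reproduced in a few lines of Python. This sketch assumes, purely for illustration, that the reading side uses ISO-8859-1 (Latin-1); which encoding the console actually used is not known from this thread, and `misdecode` is a made-up helper name:

```python
def misdecode(text: str, written_as: str = "utf-8", read_as: str = "latin-1") -> str:
    """Encode text one way and decode the resulting bytes another way."""
    return text.encode(written_as).decode(read_as)


# "Deb\u00fcg" is "Debüg"; the ü becomes the two bytes 0xC3 0xBC in UTF-8.
print(misdecode("Deb\u00fcg"))                   # prints: DebÃ¼g
print(misdecode("Deb\u00fcg", read_as="utf-8"))  # prints: Debüg
```

When both sides agree on the encoding, the text survives unchanged; when they disagree, one character turns into two.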
We don't want to avoid symbols: we want all symbols to be correctly represented as they were meant to be represented in the first place.
Hope this clears the confusion.
</teacher>
*This is true only for text made up mostly of Western characters, which is why UTF-8 is to be preferred here; for text made mostly of other symbols, such as Chinese ideograms, other encodings might be a better choice (I'm not an expert, maybe UTF-16?). But ECC is mostly an English-speaking server, so this is not our case and UTF-8 would be perfect.
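The trade-off in the footnote can be checked by counting bytes. A short Python sketch (the sample strings and the `encoded_sizes` helper are arbitrary illustrations):

```python
def encoded_sizes(text: str) -> dict:
    """Byte counts for the same text in UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" skips the 2-byte byte order mark that "utf-16" would add.
        "utf-16": len(text.encode("utf-16-le")),
    }


print(encoded_sizes("Hello ECC"))     # {'utf-8': 9, 'utf-16': 18}
print(encoded_sizes("\u4f60\u597d"))  # two Chinese characters: {'utf-8': 6, 'utf-16': 4}
```

For English chat, UTF-8 takes half the space of UTF-16; for Chinese text the balance tips the other way.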