Accepted - [Suggestion] Rule modification: Symbols

314 · May 14, 2017

Minecraft Name:
314.

Suggestion and reason:
To redesign the rules regarding Unicode characters in chat in order to make them more logical.

Right now, 'symbols' (illegal Unicode characters, i.e. those outside the printable part of the 7-bit ASCII set, plus a few exceptions) are mentioned only twice in the rules.
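
For reference, the "printable part of the 7-bit ASCII set" covers the code points 0x20 (space) through 0x7E (~). A minimal sketch of such a check in Python; the function names are made up for illustration and say nothing about how ECC's chat plugin actually filters messages:

```python
def is_printable_ascii(ch: str) -> bool:
    """True if ch lies in the printable 7-bit ASCII range (space through ~)."""
    return 0x20 <= ord(ch) <= 0x7E


def find_symbols(message: str) -> list:
    """Return the characters that fall outside printable ASCII, in order."""
    return [ch for ch in message if not is_printable_ascii(ch)]


print(find_symbols("Hello, world!"))  # []
print(find_symbols("Grüße für 5€"))   # ['ü', 'ß', 'ü', '€']
```

Note that this treats every umlaut as a 'symbol', which is exactly the behaviour the rest of this post complains about.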

Server rules - §4.1.1 - 'CAPS and Spam' said:

Spam is not allowed. Repetitive words, letters, symbols that aren’t used in their correct context, or many messages sent quickly are considered spam.

This forbids the use of symbols when spamming, not in normal conversation.

Server rules - §4.1.3 - 'Shouts and Lottery' said:

Only numbers, letters, basic symbols found on a standard keyboard may be used.

This explicitly targets shouts, not normal conversation.

To summarize:
ECC does not allow symbols at all and treats their use as a rule violation, but the server rules don't actually forbid them in a normal chat message. Symbol rules should be part of the general chat rules instead of being mentioned only in specific subsections of S4C1.
Let's go back to the second rule.

Server rules - §4.1.3 - 'Shouts and Lottery' said:

Only numbers, letters, basic symbols found on a standard keyboard may be used.

I'll be honest: "Standard keyboard" is a pretty useless phrase. What is a 'standard' keyboard? Let's take a look.
My German DIN 2137 T1 keyboard layout is pretty much the standard in Germany and Austria (although the T2 standard exists as well, but I digress). It offers umlauts and the German ß, plus a few dead key modifiers for accents. Millions of people would agree that it's a standard keyboard layout, but quite a lot of characters violate the current policies. I'm going to skip the additional third- and fourth-level characters, or this list is going to become far too long and annoying.
Code:
ÄÖÜäöüß°`´áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙâêîôûÂÊÎÔÛ§µ€²³
Well, this isn't going to work. Next try!
There are quite a few 'standard' keyboards; I'm just going to pick one at random.

https://upload.wikimedia.org/wikipedia/commons/b/bc/KB_Japanese.svg

...I think you get my point...
The United Kingdom is the source of the English language; it should therefore count as a de facto standard for typing English texts on ECC. Surely its keyboard is not going to violate... any... rules...
Code:
¬¦£áéíóúÁÉÍÓÚ
Oh well.
ECC is located in Canada, so I assume Canadian keyboards are going to apply.
...oh, wait. Are we talking about the Canadian French keyboard layout or the Canadian Multilingual Standard?

https://upload.wikimedia.org/wikipe...ext.svg/800px-KB_Canadian_French_text.svg.png
https://upload.wikimedia.org/wikipe...dian_Multilingual_Standard_comment-en.svg.png

Err... possibly neither.
Alright. We have seen that the term 'standard keyboard' can apply to pretty much anything. I know that this could be seen as intentionally misinterpreting the rules, so I'll just jump to this one: The keyboard that was most likely intended to be used - the United States keyboard layout. Why didn't we refer to it as "a US keyboard" in the first place?

https://upload.wikimedia.org/wikipedia/commons/2/22/KB_US-International.svg
Oh, this may be the reason. I see a lot of fancy, policy-violating symbols here... wait, never mind. There are two common US layouts; this is the international edition.

https://upload.wikimedia.org/wikipe...Gr.svg/900px-KB_United_States-NoAltGr.svg.png
Ah, this is the keyboard we have been searching for. Who could have known that 'standard keyboard' was referring to the ANSI-INCITS 154-1988 (R1999) standard?
Code:
‘ “
Oooh, now I see. The rules don't call it a "Standard US keyboard" because ‘ and “ are being tolerated as well (at least according to multiple staff members). I'd say they might be the Unicode characters U+2018 and U+201C, but I'm not certain.
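
If anyone wants to check which code points those two characters really are, pasting them into a few lines of Python should settle it. This is only a sketch; `code_point` is a helper name I made up, and the result depends on the characters surviving the copy and paste intact:

```python
import unicodedata


def code_point(ch: str) -> str:
    """Format a single character as a U+XXXX code point."""
    return f"U+{ord(ch):04X}"


# Paste the characters in question into this string.
for ch in "‘“":
    print(code_point(ch), unicodedata.name(ch, "<unnamed>"))
```

For the typographic ("curly") opening quotes this should report U+2018 and U+201C.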
What type of keyboard could possibly produce those characters?

...oh, right. Certain Apple keyboards do that automatically.

Let's use the Wikimedia Foundation's analytics to estimate Apple's share of the OS market.

The Wikimedia Analytics said:

iOS: 19%
Mac OS X: 5.9%

So... the keyboard we are searching for must be an Apple Inc. keyboard that only exists on about 25% of all devices, including computers, phones, tablets and more.

But wait... according to a quick image search, an iOS keyboard also offers £, € and ¥, which aren't allowed.

TL;DR: We need to create a custom keyboard layout and call it the "ECC standard keyboard".

\end{sarcasm}
\end{rant}
\end{digression}
Alright, back to being serious.

Mentioning an unspecified "standard" keyboard layout may be fine if everyone is from the same country. However, ECC's international audience occasionally tends to react with "I'm not using any symbols, it's on my keyboard".
After reading my minor rant we now know that the rules differ from actual moderation policies and that the term "standard keyboard" is bad. What could be worse than that?

...well, to be honest, a lot of the symbols we have seen in part B should actually be perfectly fine.

Minecraft used to have a few problems with non-ASCII characters: they caused decorative async chat info messages in the logs. However, this no longer seems to be the case. In addition, testing the logs generated by ECC's chat plugin has shown me how well the logs can deal with some special characters.

As an example, I'll use the German keyboard I mentioned above.
Code:
ÄÖÜäöüß°`´áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙâêîôûÂÊÎÔÛ§µ€²³
(Almost) all of those display perfectly fine in the logs; unreadable characters are replaced with a question mark. The message would still be readable even if I added an exotic character like π.
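
The question-mark substitution is what most single-byte encoders do with characters they cannot represent. A rough sketch in Python, assuming (purely for illustration) that the log file is written as Latin-1; the encoding actually used on ECC is not stated here, and `as_logged` is a made-up helper:

```python
def as_logged(message: str, log_encoding: str = "latin-1") -> str:
    """Round-trip a chat message through an assumed single-byte log encoding.

    Characters the encoding cannot represent are replaced with '?'.
    """
    raw = message.encode(log_encoding, errors="replace")
    return raw.decode(log_encoding)


print(as_logged("Grüße für 5€ und π"))  # Grüße für 5? und ?
```

Under that assumption the umlauts and ß survive, while € and π turn into question marks and the rest of the message stays readable.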

I'll be honest: I have absolutely no idea why only ASCII is allowed in chat. Maybe Jamie knows; I don't.

Speaking of ASCII: according to what I was told a long time ago, the exceptions include some currency symbols like pound signs or cent signs. Those, too, display properly in the logs.

Oh, and there is one more thing...

The server rules - §4.6 - 'Languages' said:

If you wish to play with a friend and speak other languages, that is fine in private chat channels, or in local chat that is not spawn. However, it must stay out of global chats.

Foreign languages usually include special characters. We are basically already allowing the use of symbols, albeit with limitations.
TL;DR: Read it. Also, thank you in case you actually made it through my keyboard-related rant.

Actual TL;DR:
1. The rules concerning the use of symbols do not match the actual policies that are applied in-game.
2. I don't like the term "standard keyboard" in the server rules.
3. A lot of 'illegal' symbols should be perfectly fine from a technical point of view (even though Jamie may know more about possible problems than me).

And now, the moment you've all been waiting for: Actual suggestions!

1. Include symbol regulations in the actual S4C1 instead of placing it in subclauses that only apply to specific parts of chat.

Choose one:
2.A.
Make it clear that only ASCII is allowed; no exceptions.
(Optional: Include a link to a reference table, preferably without unprintable control characters.)
Easy to moderate, but pretty limiting.
2.B.
Explicitly mention a "standard US keyboard layout" (even though it's not accurate, see the last few paragraphs of my rant above).
2.C.
Create an exhaustive list of allowed characters.
2.D.
Allow the use of all characters if 'special' characters are used in moderation.
2.E.
Allow all printable Unicode characters and remove the limits mentioned in the rules.
Potential solution: The JVM's -Dfile.encoding=UTF-8 option.

Any Other Information:
The previous section has pretty much become a concoction of the "Suggestion", "Reason" and "Any Other Information" categories.

Link To This Plugin/Is this a custom addition?:
*Points at the chat plugin that is running on ECC right now and would be completely unaffected by rule changes*

Fun fact: The BBCode for this humongous thread has exactly 8113 characters and has therefore exceeded 8.1% of XenForo's maximum post length.

padsen · May 30, 2017

+1 and I guess bump

6_28 · Jun 1, 2017

I just wasted 7 minutes of my life reading that,

+1

Expipiplusone · Jun 1, 2017

Wait, how come I didn't +1 this?

314 · Jun 23, 2017

Tri-weekly bump since the last post.

314 · Jul 11, 2017

18-daily bump.

JamieSinn · Jul 12, 2017

I could probably do some trickery and enable the logs to be UTF-8 and possibly remove the entire rule, but that's not for now.

314 · Aug 1, 2017

Tri-weekly bump because 2.[ABCD] are still available.

I wish I could provide chat statistics by Pivillean here, but I should have implemented proper UTF-8 handling for that...

314 · Aug 26, 2017

24-daily bump.

lambiser · Aug 26, 2017

+1 I didn't read it all the way through, but from reading bits and pieces I can agree with this 100%

(Here is a bump for ya, @314 )

G5_Gaming · Aug 27, 2017

I agree with the chat rule change, so here's a +1 and a bump for ya @314. Also, I read the whole document. Good job.

Netsui · Aug 28, 2017

You'd think you would have poked Jamie to look at this by now.

314 · Sep 8, 2017

JamieSinn said: ↑

I could probably do some trickery and enable the logs to be UTF-8 and possibly remove the entire rule, but that's not for now.

So... I made a few tests using the JVM option -Dfile.encoding=UTF-8. The results of my experiments look fine so far, am I missing something?

lambiser · Sep 8, 2017

314 said: ↑

So... I made a few tests using the JVM option -Dfile.encoding=UTF-8. The results of my experiments look fine so far, am I missing something?

Please say that in English. I'm a lamb, not a genius. XD

Expipiplusone · Sep 9, 2017

314 said: ↑

So... I made a few tests using the JVM option -Dfile.encoding=UTF-8. The results of my experiments look fine so far, am I missing something?

Is the (possible) "non-standard character" issue only with logs, or is there something else that might break as well?

314 · Sep 10, 2017

lambieplays said: ↑

Please say that in English. I'm a lamb, not a genius. XD

To clarify:
The Minecraft/Spigot server software is executed by Java, like normal Minecraft. Java can be supplied with additional parameters, including -Dfile.encoding. This flag sets the default character encoding Java uses when it reads or writes text, for example when writing to a log file. UTF-8 is an encoding that can properly represent the (currently disallowed) symbols.

Expipiplusone said: ↑

Is the (possible) "non-standard character" issue only with logs, or is there something else that might break as well?

Well... the console.
Code:
[Server thread/INFO]: <314> Debüg.
[Server thread/INFO]: <314> ẞß
However, I don't think the console is as relevant as the actual logs - when andrew and Jamie access the console they probably have more important things to do than warning players for the use of special characters.

lambiser · Sep 10, 2017

So like you want it so that there are no symbols in chat???

Expipiplusone · Sep 10, 2017

lambieplays said: ↑

So like you want it so that there are no symbols in chat???

I'll try to keep it as short as possible.
A single byte can represent up to 256 different characters (usually fewer), because a byte is made of 8 bits and can therefore take 2⁸ = 256 different values.
Usually that's more than enough for "standard characters" (i.e. 26 lowercase + 26 uppercase + 10 digits + a bunch of punctuation), and ASCII, which uses only 128 of those values, is a common example of how a single byte has more than enough room to encode all those "standard characters".
However, the world's languages contain far more than 256 different symbols (the latest Unicode standard encompasses more than a hundred thousand different symbols, including all modern and historic scripts as well as other symbols), so a single byte is not enough to represent every possible character.
UTF-8 is a common way to encode all possible Unicode characters in such a way that, most of the time (but not always)*, a single byte is sufficient to represent a character: basically, "standard characters" are encoded in one single byte, while "non-standard characters" might take 2, 3 or up to 4 bytes to be encoded.
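
To make the "1 to 4 bytes" part concrete, here is a small Python sketch that counts the bytes UTF-8 needs for a few sample characters (`utf8_length` is just a helper name picked for this example):

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses to encode the given character."""
    return len(ch.encode("utf-8"))


for ch in ["a", "ü", "€", "😀"]:
    print(ch, utf8_length(ch))
# a 1
# ü 2
# € 3
# 😀 4
```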

All this to explain why the character ü was misrepresented as two strange symbols in @314's screenshot: it was originally encoded in UTF-8 and therefore (being non-standard) took two bytes, but the console wrongly assumed a fixed one-byte-per-character encoding (probably an 8-bit extension of ASCII such as Latin-1 or Windows-1252) and printed the two symbols that those two bytes stand for in that (wrong) encoding.
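
The effect is easy to reproduce. A minimal Python sketch, assuming the console decodes as Latin-1 (that is a guess; the console's real encoding isn't known here, and both function names are made up):

```python
def misdecode(text: str, wrong_encoding: str = "latin-1") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled: str, wrong_encoding: str = "latin-1") -> str:
    """Undo misdecode() by reversing the two steps."""
    return garbled.encode(wrong_encoding).decode("utf-8")


print(misdecode("Debüg."))          # DebÃ¼g.
print(repair(misdecode("Debüg.")))  # Debüg.
```

The two bytes that UTF-8 uses for ü (0xC3 0xBC) are read as two separate Latin-1 characters, Ã and ¼.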

Simply put: things break when one part encodes output one way and another part decodes it assuming a different (wrong) encoding. Before proceeding, we therefore want to identify all the parts that talk to each other and figure out how to configure them all with the same encoding.
We don't want to avoid symbols: we want all symbols to be represented the way they were meant to be in the first place.
Hope this clears up the confusion.

</teacher>

*This is true only for text made up mostly of Western characters, which is why UTF-8 is preferred here. For text made up mostly of other symbols, such as Chinese ideograms, other encodings might be a better choice (I'm not an expert; maybe UTF-16?). But ECC is mostly an English-speaking server, so that is not our case and UTF-8 would be perfect.
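
A quick way to see the trade-off in that footnote is to compare encoded sizes. A small Python sketch (the sample strings and the helper name are arbitrary; "utf-16-le" is used so that no byte order mark is counted):

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of text in UTF-8 and in UTF-16 (little-endian, no BOM)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le")}


print(encoded_sizes("Hello"))     # {'utf-8': 5, 'utf-16-le': 10}
print(encoded_sizes("你好世界"))  # {'utf-8': 12, 'utf-16-le': 8}
```

English text is half the size in UTF-8, while these Chinese characters take 3 bytes each in UTF-8 but only 2 in UTF-16.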

lambiser · Sep 10, 2017

???
You know what? I'll just let the smart people handle this. XD

314 Irrational GameAdmin, former ServerAdmin
EcoLegend ⛰️⛰️⛰️⛰️ Ex-President ⚒️⚒️ Prestige ⭐ VI ⭐ Premium Upgrade

padsen Glory to Arstotzka!
President ⛰️⛰️ Ex-Tycoon ⚜️⚜️⚜️ Prestige ⭐ I ⭐ Premium Upgrade

6_28 2π
Builder ⛰️ Ex-President ⚒️⚒️ Premium Upgrade

Expipiplusone Builder
Builder ⛰️ Ex-Tycoon ⚜️⚜️⚜️ Premium Upgrade

JamieSinn Retired Lead Administrator/Developer
Builder ⛰️ Ex-Tycoon ⚜️⚜️⚜️ Premium Upgrade

lambiser Builder
Builder ⛰️ Ex-President ⚒️⚒️

G5_Gaming Resident
Resident ⛰️ Ex-President ⚒️⚒️

Netsui EcoLeader
EcoLeader ⛰️⛰️⛰️ Ex-Mayor ⚒️⚒️ Premium Upgrade
