Accepted [Suggestion] Rule modification: Symbols

Discussion in 'Suggestions' started by 314, May 14, 2017.

Thread Status:
Not open for further replies.
  1. 314

    314 Irrational Moderator, former ServerAdmin
    SuperMod EcoLegend ⛰️⛰️⛰️⛰️ Ex-President ⚒️⚒️ Prestige ⭐ VI ⭐ Premium Upgrade

    Joined:
    Apr 1, 2014
    Messages:
    7,050
    Trophy Points:
    97,160
    EcoDollars:
    $0
    Ratings:
    +4,917
    Minecraft Name:
    314.

    Suggestion and reason:
    To redesign the rules regarding unicode characters in chat in order to make them more logical.

    Right now, illegal Unicode characters ('symbols': anything outside the printable part of the 7-bit ASCII set, plus a few exceptions) are mentioned only twice in the rules.
    The first mention only forbids symbols when they are used for spamming, not in normal conversation.
    The second explicitly targets shouts, again not normal conversation.

    To summarize:
    ECC does not allow symbols to be used at all and treats their use as a rule violation, but the server rules don't actually forbid them in a normal chat message. Symbol rules should be part of the general chat rules instead of being mentioned in specific subsections of S4C1.
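    A minimal sketch of the 'symbol' definition above, assuming "printable 7-bit ASCII" means U+0020 (space) through U+007E (tilde). The function names and the `exceptions` parameter are hypothetical: the thread only says a few exceptions exist (such as the currency signs mentioned further down), not which ones.

```python
from typing import Iterable, List


def is_printable_ascii(ch: str) -> bool:
    """True if ch lies in the printable 7-bit ASCII range U+0020..U+007E."""
    return 0x20 <= ord(ch) <= 0x7E


def find_symbols(message: str, exceptions: Iterable[str] = ()) -> List[str]:
    """Return every character in message that would count as a 'symbol'.

    `exceptions` is a hypothetical allowlist of extra characters
    (e.g. a pound sign); the real list of exceptions is not documented.
    """
    allowed_extra = set(exceptions)
    return [ch for ch in message
            if not is_printable_ascii(ch) and ch not in allowed_extra]


if __name__ == "__main__":
    print(find_symbols("Hello, world!"))          # []
    print(find_symbols("Grüße aus Wien"))         # ['ü', 'ß']
    print(find_symbols("That costs £5", ["£"]))   # []
```

    Under this definition, every umlaut, accent and the ß on a German keyboard is flagged, which is exactly the problem described below.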

    Let's go back to the second rule.
    I'll be honest: "Standard keyboard" is a pretty useless phrase. What is a 'standard' keyboard? Let's take a look.


    My German DIN 2137 T1 keyboard layout is pretty much the standard in Germany and Austria (the T2 standard exists as well, but I digress). It offers umlauts and the German ß, plus a few dead key modifiers for accents. Millions of people would agree that it's a standard keyboard layout, but quite a lot of its characters violate the current policies. I'm going to skip the additional third- and fourth-level characters, or this list is going to become far too long and annoying.
    Code:
    ÄÖÜäöüß°`´áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙâêîôûÂÊÎÔÛ§µ€²³
    
    Well, this isn't going to work. Next try!
    There are quite a few 'standard' keyboards; I'm just going to pick one at random.

    https://upload.wikimedia.org/wikipedia/commons/b/bc/KB_Japanese.svg

    ...I think you get my point...
    The United Kingdom is the origin of the English language, so its keyboard layout should count as a de facto standard for typing English text on ECC. Surely its keyboard is not going to violate... any... rules...
    Code:
    ¬¦£áéíóúÁÉÍÓÚ
    
    Oh well.
    ECC is located in Canada, so I assume Canadian keyboards are going to apply.
    ...oh, wait. Are we talking about the Canadian French keyboard layout or the Canadian Multilingual Standard?

    https://upload.wikimedia.org/wikipe...ext.svg/800px-KB_Canadian_French_text.svg.png
    https://upload.wikimedia.org/wikipe...dian_Multilingual_Standard_comment-en.svg.png

    Err... possibly neither.
    Alright. We have seen that the term 'standard keyboard' can apply to pretty much anything. I know this could be seen as intentionally misinterpreting the rules, so I'll just jump to the keyboard that was most likely intended: the United States layout. Why didn't we refer to it as "a US keyboard" in the first place?

    https://upload.wikimedia.org/wikipedia/commons/2/22/KB_US-International.svg
    Oh, this may be the reason. I see a lot of fancy, policy-violating symbols here... wait, never mind. There are two common US layouts; this one is the international edition.

    https://upload.wikimedia.org/wikipe...Gr.svg/900px-KB_United_States-NoAltGr.svg.png
    Ah, this is the keyboard we have been searching for. Who could have known that 'standard keyboard' was referring to the ANSI-INCITS 154-1988 (R1999) standard?

    Code:
    ‘ “
    
    Oooh, now I see. The rules don't call it a "Standard US keyboard" because ‘ and “ are being tolerated as well (at least according to multiple staff members). I'd say they might be the Unicode characters U+2018 and U+201C, but I'm not certain.
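    One way to check that guess is to ask Python's standard `unicodedata` module for the code point and official name of each character. This is a small sketch; `describe` is a hypothetical helper name, and the result only applies to the characters exactly as they appear in this copy of the thread (a forum or browser could have substituted them).

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    # The two tolerated "smart" quotes, followed by their plain ASCII cousins.
    for ch in "‘“'\"":
        print(describe(ch))
    # U+2018 LEFT SINGLE QUOTATION MARK
    # U+201C LEFT DOUBLE QUOTATION MARK
    # U+0027 APOSTROPHE
    # U+0022 QUOTATION MARK
```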
    What type of keyboard could possibly produce those characters?

    ...oh, right. Certain Apple keyboards do that automatically.

    Let's use the Wikimedia Foundation's analytics to estimate Apple's share of the OS market.
    So... the keyboard we are searching for must be an Apple Inc. keyboard that only exists on about 25% of all devices, including computers, phones, tablets and more.

    But wait... according to a quick image search, an iOS keyboard also offers £, € and ¥, which aren't allowed.

    TL;DR: We need to create a custom keyboard layout and call it the "ECC standard keyboard".


    \end{sarcasm}
    \end{rant}
    \end{digression}
    Alright, back to being serious.

    Mentioning an unspecified "standard" keyboard layout may be fine if everyone is from the same country. However, ECC's international audience occasionally tends to react with "I'm not using any symbols, it's on my keyboard".

    After reading my minor rant we now know that the rules differ from actual moderation policies and that the term "standard keyboard" is bad. What could be worse than that?

    ...well, to be honest, a lot of the symbols we have seen in part B should actually be perfectly fine.

    Minecraft used to have a few problems with non-ASCII characters; they caused decorative async chat info messages in the logs. However, this no longer seems to be the case. In addition, testing the logs generated by ECC's chat plugin has shown me that they handle some special characters quite well.

    As an example, I'll use the German keyboard I mentioned above.
    Code:
    ÄÖÜäöüß°`´áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙâêîôûÂÊÎÔÛ§µ€²³
    
    (Almost) all of those display perfectly fine in the logs; unreadable characters are replaced with a question mark. The message would still be readable even if I added an exotic character like π.
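    The behaviour described above (most characters survive, the rest turn into a question mark) is consistent with a single-byte log encoding such as ISO-8859-1 (Latin-1). The sketch below assumes exactly that; `LOG_ENCODING` and `as_logged` are hypothetical names, and the thread does not say which encoding ECC's logs actually use. Under that assumption the euro sign is one of the "almost" cases, because it is not part of Latin-1, while pound and cent signs survive.

```python
# Hypothetical: the real log encoding on ECC is not stated in the thread.
LOG_ENCODING = "latin-1"


def as_logged(message: str, encoding: str = LOG_ENCODING) -> str:
    """Simulate writing a chat message to a log file and reading it back.

    Characters the encoding cannot represent are replaced with "?".
    """
    return message.encode(encoding, errors="replace").decode(encoding)


if __name__ == "__main__":
    print(as_logged("Debüg, 5 £, 10 ¢, 3 €, π"))
    # Debüg, 5 £, 10 ¢, 3 ?, ?
```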

    I'll be honest: I have absolutely no idea why only ASCII is allowed in chat. Maybe Jamie knows; I don't.

    Speaking of ASCII: according to what I was told a long time ago, the exceptions include some currency symbols like pound signs or cent signs. Those, too, display properly in the logs.

    Oh, and there is one more thing...
    Foreign languages usually include special characters. We are basically already allowing the use of symbols, albeit with limitations.

    TL;DR: Read it. Also, thank you in case you actually made it through my keyboard-related rant.

    Actual TL;DR:
    1. The rules concerning the use of symbols do not match the actual policies that are applied in-game.
    2. I don't like the term "standard keyboard" in the server rules.
    3. A lot of 'illegal' symbols should be perfectly fine from a technical point of view (even though Jamie may know more about possible problems than I do).

    And now, the moment you've all been waiting for: Actual suggestions!

    1. Include symbol regulations in the actual S4C1 instead of placing them in subclauses that only apply to specific parts of chat.

    Choose one:
    2.A.
    Make it clear that only ASCII is allowed; no exceptions.
    (Optional: Include a link to a reference table, preferably without unprintable control characters.)
    Easy to moderate, but pretty limiting.
    2.B.
    Explicitly mention a "standard US keyboard layout" (even though it's not accurate; see the last few paragraphs of my rant above).
    2.C.
    Create an exhaustive list of allowed characters.
    2.D.
    Allow the use of all characters if 'special' characters are used in moderation.
    2.E.
    Allow all printable Unicode characters and remove the limits mentioned in the rules.
    Potential solution: the JVM's -Dfile.encoding=UTF-8 option.
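    To make the difference between some of these options concrete, here is a rough sketch of how 2.A, 2.C and 2.E could each be checked mechanically. All names are hypothetical, nothing like this is claimed to run on ECC, and the 2.C allowlist is a placeholder rather than a proposed list. 2.B and 2.D are left out because they depend on a keyboard definition or on moderator judgement. For 2.E the sketch borrows Python's own definition of "printable" (`str.isprintable()`), which rejects characters in the Unicode "Other" and "Separator" categories (control characters, line breaks, non-ASCII spaces and so on) except the plain ASCII space.

```python
PRINTABLE_ASCII = frozenset(chr(c) for c in range(0x20, 0x7F))

# 2.C: placeholder allowlist; the extra characters are examples only.
ALLOWLIST_2C = PRINTABLE_ASCII | frozenset("ÄÖÜäöüß£¢")


def allowed_2a(message: str) -> bool:
    """Option 2.A: printable 7-bit ASCII only, no exceptions."""
    return all(ch in PRINTABLE_ASCII for ch in message)


def allowed_2c(message: str) -> bool:
    """Option 2.C: every character must be on an explicit allowlist."""
    return all(ch in ALLOWLIST_2C for ch in message)


def allowed_2e(message: str) -> bool:
    """Option 2.E: any printable Unicode character (Python's definition)."""
    return message.isprintable()


if __name__ == "__main__":
    for msg in ["hello", "Grüße", "π ≈ 3.14", "bell\x07"]:
        print(repr(msg), allowed_2a(msg), allowed_2c(msg), allowed_2e(msg))
```

    2.A is the easiest to moderate but rejects "Grüße"; 2.C accepts it only because ü and ß happen to be on the list; 2.E accepts any visible character and only rejects things like the bell control character.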


    Any Other Information:
    The previous section has pretty much become a concoction of the "Suggestion", "Reason" and "Any Other Information" categories.

    Link To This Plugin/Is this a custom addition?:
    *Points at the chat plugin that is running on ECC right now and would be completely unaffected by rule changes*

    Fun fact: The BBCode for this humongous thread has exactly 8113 characters and has therefore exceeded 8.1% of xenForo's maximum post length.
     
    • Winner x 12
    • Like x 3
    • Agree x 1
    • Informative x 1
    • Useful x 1
    • Optimistic x 1
    #1 314, May 14, 2017
    Last edited: Oct 31, 2017
  2. padsen

    padsen Glory to Arstotzka!
    Resident ⛰️ Ex-Tycoon ⚜️⚜️⚜️ Prestige ⭐ I ⭐ Premium Upgrade

    Joined:
    Apr 30, 2015
    Messages:
    295
    Trophy Points:
    39,910
    Gender:
    Male
    Ratings:
    +1,650
    +1 and I guess bump
     
  3. 6_28

    6_28
    Builder ⛰️ Ex-President ⚒️⚒️ Premium Upgrade

    Joined:
    Jul 5, 2016
    Messages:
    413
    Trophy Points:
    33,160
    Gender:
    Male
    EcoDollars:
    $0
    Ratings:
    +1,658
    I just wasted 7 minutes of my life reading that,

    +1
     
  4. Expipiplusone

    Expipiplusone Builder
    Builder ⛰️ Ex-Tycoon ⚜️⚜️⚜️ Premium Upgrade

    Joined:
    Sep 13, 2014
    Messages:
    1,592
    Trophy Points:
    37,590
    Gender:
    Male
    Ratings:
    +778
    Wait, how come I didn't +1 this?
     
  5. 314

    Tri-weekly bump since the last post.
     
    • Optimistic x 2
    • Like x 1
  6. 314

    18-daily bump.
     
    • Like x 2
    • Optimistic x 1
  7. JamieSinn

    JamieSinn Retired Lead Administrator/Developer
    Builder ⛰️ Ex-Tycoon ⚜️⚜️⚜️ Premium Upgrade

    Joined:
    Jun 4, 2011
    Messages:
    5,517
    Trophy Points:
    78,090
    Gender:
    Male
    Ratings:
    +4,588
    I could probably do some trickery and enable the logs to be UTF-8 and possibly remove the entire rule, but that's not for now.
     
  8. 314

    Tri-weekly bump because 2.[ABCD] are still available.

    I wish I could provide chat statistics by Pivillean here, but I should have implemented proper UTF-8 handling for that...
     
    • Optimistic x 1
    #8 314, Aug 1, 2017
    Last edited: Aug 1, 2017
  9. 314

    24-daily bump.
     
    • Like x 2
    • Optimistic x 1
  10. lambiser

    lambiser Builder
    Builder ⛰️ Ex-President ⚒️⚒️

    Joined:
    Jun 4, 2017
    Messages:
    858
    Trophy Points:
    13,590
    Gender:
    Female
    Ratings:
    +906
    +1 I didn't read it all the way through, but from the bits and pieces I did read, I agree with this 100%

    (Here is a bump for ya, @314 )
     
  11. G5_Gaming

    G5_Gaming Resident
    Resident ⛰️ Ex-President ⚒️⚒️

    Joined:
    Mar 3, 2017
    Messages:
    139
    Trophy Points:
    8,510
    Gender:
    Male
    Ratings:
    +203
    I agree with the chat rule change, so here's a +1 and a bump for ya @314. Also, I read the whole document. Good job.
     
  12. Netsui

    Netsui EcoLeader
    EcoLeader ⛰️⛰️⛰️ Ex-Mayor ⚒️⚒️ Premium Upgrade

    Joined:
    May 11, 2012
    Messages:
    2,348
    Trophy Points:
    45,360
    EcoDollars:
    $0
    Ratings:
    +274
    You'd think you would have poked Jamie to look at this by now.
     
  13. 314

    So... I made a few tests using the JVM option -Dfile.encoding=UTF-8. The results of my experiments look fine so far, am I missing something?
     
    • Optimistic x 1
    #13 314, Sep 8, 2017
    Last edited: Sep 8, 2017
  14. lambiser

    Please say that in English. I'm a lamb, not a genius. XD
     
  15. Expipiplusone

    Is the (possible) "non-standard character" issue only with logs, or is there something else that might break as well?
     
  16. 314

    To clarify:
    The Minecraft/Spigot server software runs on Java, just like the normal Minecraft game. Java can be started with additional parameters, including -Dfile.encoding. This flag sets Java's default character encoding, which is used, among other things, when writing text to files. UTF-8 is a character encoding that can represent every Unicode character, including the (currently disallowed) symbols.

    Well... the console.
    Code:
    [Server thread/INFO]: <314> Debüg.
    [Server thread/INFO]: <314> ẞß
    
    [Screenshot]
    However, I don't think the console is as relevant as the actual logs - when andrew and Jamie access the console they probably have more important things to do than warning players for the use of special characters.
     
    • Optimistic x 1
    #16 314, Sep 10, 2017
    Last edited: Sep 10, 2017
  17. lambiser

     
  18. lambiser

    So like you want it so that there are no symbols in chat???
     
  19. Expipiplusone

    I'll try to put it as short as possible.
    A single byte can hold up to 256 different characters (usually less), due to the fact that a byte is made of 8 bits and therefore it can assume 2⁸=256 different values.
    Usually that's more than enough for "standard characters" (i.e. 26 lowercase + 26 uppercase + 10 digits + a bunch of punctuation) and ASCII is a common example of how a single byte has more than enough room to encode all those "standard characters".
    However, across all the world's languages there are far more than 256 different symbols (the latest Unicode standard encompasses more than a hundred thousand different characters, including all modern and many historic scripts, as well as other symbols), so a single byte is not enough to represent every possible character.
    UTF-8 is a common way to encode all possible Unicode characters in such a way that, most of the time (but not always)*, a single byte is sufficient to represent a character: basically, "standard characters" are encoded in one single byte, while "non-standard characters" might take 2, 3 or up to 4 bytes to be encoded.
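    A short illustration of the byte counts mentioned above, using Python's built-in `str.encode`. The sample characters are picked for illustration (a letter, an umlaut, the euro sign and an emoji); `utf8_bytes` is a hypothetical helper name.

```python
def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a single character."""
    return ch.encode("utf-8")


if __name__ == "__main__":
    print(2 ** 8)  # 256 distinct values fit in one 8-bit byte
    for ch in ["a", "ü", "€", "\U0001F600"]:
        encoded = utf8_bytes(ch)
        print(f"U+{ord(ch):04X}: {len(encoded)} byte(s), {encoded.hex(' ')}")
    # U+0061: 1 byte(s), 61
    # U+00FC: 2 byte(s), c3 bc
    # U+20AC: 3 byte(s), e2 82 ac
    # U+1F600: 4 byte(s), f0 9f 98 80
```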

    All this is to explain why, in @314's screenshot, the character ü was shown as two strange symbols: it was originally encoded in UTF-8, so (being non-standard) it required two bytes, but the console wrongly assumed the text used some fixed one-byte-per-character encoding (probably a legacy code page such as Windows-1252; plain ASCII only defines 128 characters) and therefore printed the two symbols that those two bytes stand for in that (wrong) encoding.
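    The mismatch can be reproduced in a few lines. This sketch assumes the console decoded the bytes as Windows-1252 ("cp1252"); the thread does not say which encoding the console actually used, so the exact symbols in the original screenshot may differ.

```python
original = "Debüg."
raw = original.encode("utf-8")   # b'Deb\xc3\xbcg.' (ü takes two bytes)

# Mismatch: the bytes were written as UTF-8 but read back as Windows-1252,
# so each of the two bytes of ü becomes its own character.
wrong = raw.decode("cp1252")     # 'DebÃ¼g.'

# Match: the same encoding on both ends, so the text survives unchanged.
right = raw.decode("utf-8")      # 'Debüg.'

print(wrong)
print(right)
```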

    Simply put: things break when one part encodes some output in one way, and another decodes this output assuming a different (wrong) encoding; therefore, before proceeding, we want to identify all parts that are talking to each other and figure out how to configure them all with the same encoding.
    We don't want to avoid symbols: we want all symbols to be correctly represented as they were meant to be represented in the first place.
    Hope this clears the confusion.

    </teacher>


    *This is true only for text made up mostly of Western characters, which is why UTF-8 is preferable here; for text made up mostly of other symbols, such as Chinese characters, other encodings might be a better choice (I'm not an expert; maybe UTF-16?). But ECC is mostly an English-speaking server, so this doesn't apply to us and UTF-8 would be perfect.
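    A quick size comparison supports the footnote's guess. The sketch uses "utf-16-le" so that Python does not prepend a 2-byte byte-order mark; `encoded_sizes` is a hypothetical helper name and the sample strings are arbitrary.

```python
def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 size, UTF-16 size) in bytes, without a byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


if __name__ == "__main__":
    print(encoded_sizes("Hello, world"))  # (12, 24)
    print(encoded_sizes("你好世界"))        # (12, 8)
```

    For English text UTF-8 is half the size of UTF-16, while for these four Chinese characters UTF-16 needs 2 bytes each and UTF-8 needs 3.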
     
    • Informative x 1
    #19 Expipiplusone, Sep 10, 2017
    Last edited: Sep 10, 2017
  20. lambiser

    ???
    You know what? I'll just let the smart people handle this. XD
     