The Unicode standard uses the notation C<U+0041 LATIN CAPITAL LETTER A>, to give the hexadecimal code point and the normative name of the character.

Unicode also defines various I<properties> for the characters, like "uppercase" or "lowercase", "decimal digit", or "punctuation"; these properties are independent of the names of the characters. Furthermore, various operations on the characters like uppercasing, lowercasing, and collating (sorting) are defined.

A Unicode I<logical> "character" can actually consist of more than one internal I<actual> "character" or code point. For Western languages, this is adequately modelled by a I<base character> (like C<LATIN CAPITAL LETTER A>) followed by one or more I<modifiers> (like C<COMBINING ACUTE ACCENT>). This sequence of base character and modifiers is called a I<combining character sequence>. Some non-western languages require more complicated models, so Unicode created the I<grapheme cluster> concept, which was later further refined into the I<extended grapheme cluster>. For example, a Korean Hangul syllable is considered a single logical character, but most often consists of three actual Unicode characters: a leading consonant followed by an interior vowel followed by a trailing consonant.

Whether to call these extended grapheme clusters "characters" depends on your point of view. If you are a programmer, you probably would tend towards seeing each element in the sequences as one unit, or "character". However, from the user's point of view, the whole sequence could be seen as one "character", since that's probably what it looks like in the context of the user's language.

In this document, we take the programmer's point of view: one "character" is one Unicode code point.

For some combinations of base character and modifiers, there are I<precomposed> characters. There is a single character equivalent, for example, to the sequence C<LATIN CAPITAL LETTER A> followed by C<COMBINING ACUTE ACCENT>. It is called C<LATIN CAPITAL LETTER A WITH ACUTE>. These precomposed characters are, however, only available for some combinations, and are mainly meant to support round-trip conversions between Unicode and legacy standards (like ISO 8859). Using sequences, as Unicode does, allows for needing fewer basic building blocks (code points) to express many more potential grapheme clusters. To support conversion between equivalent forms, various I<normal forms> are also defined. Thus, C<LATIN CAPITAL LETTER A WITH ACUTE> is in I<Normalization Form Composed> (abbreviated NFC), and the sequence C<LATIN CAPITAL LETTER A> followed by C<COMBINING ACUTE ACCENT> represents the same character in I<Normalization Form Decomposed> (NFD).

Because of backward compatibility with legacy encodings, the "a unique number for every character" idea breaks down a bit: instead, there is "at least one number for every character". The same character could be represented differently in several legacy encodings. The converse is not true, either: some code points do not have an assigned character. Firstly, there are unallocated code points within otherwise used blocks. Secondly, there are special Unicode control characters that do not represent true characters.

When Unicode was first conceived, it was thought that all the world's characters could be represented using a 16-bit word; that is, a maximum of C<0x10000> (or 65536) characters, from C<0x0000> to C<0xFFFF>, would be needed. This soon proved to be false, and since Unicode 2.0 (July 1996), Unicode has been defined all the way up to 21 bits (C<0x10FFFF>), and Unicode 3.1 (March 2001) defined the first characters above C<0xFFFF>. The first C<0x10000> characters are called I<Plane 0>, or the I<Basic Multilingual Plane> (BMP). With Unicode 3.1, 17 (yes, seventeen) planes in all were defined--but they are nowhere near full of defined characters, yet.
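
For instance, in Perl you can create a character either by its code point or by its normative name, and you can test the properties mentioned above with C<\p{...}> in a regular expression; case mapping is available through the usual C<uc()> and C<lc()> operators. A minimal sketch, assuming a reasonably modern, Unicode-aware perl:

    use strict;
    use warnings;
    use charnames qw( :full );          # enables \N{CHARNAME} and charnames::viacode()
    binmode STDOUT, ':encoding(UTF-8)'; # so wide characters print cleanly

    my $a     = "\N{U+0041}";                    # by code point: LATIN CAPITAL LETTER A
    my $alpha = "\N{GREEK SMALL LETTER ALPHA}";  # by normative name

    print charnames::viacode(0x41), "\n";        # prints "LATIN CAPITAL LETTER A"

    # Properties are independent of the names:
    print "uppercase\n"   if $a     =~ /\p{Uppercase}/;
    print "lowercase\n"   if $alpha =~ /\p{Lowercase}/;
    print "punctuation\n" if "!"    =~ /\p{Punctuation}/;

    # Operations such as case mapping are defined as well:
    print uc($alpha), "\n";                      # GREEK CAPITAL LETTER ALPHA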
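
The distinction between one logical "character" and several actual code points shows up directly in pattern matching: C<.> matches a single code point, while C<\X> matches an entire extended grapheme cluster. A small sketch:

    use strict;
    use warnings;
    use charnames qw( :full );

    # One base character plus one modifier: two code points,
    # but a single extended grapheme cluster.
    my $str = "\N{LATIN CAPITAL LETTER A}\N{COMBINING ACUTE ACCENT}";

    my $code_points = () = $str =~ /./gs;   # 2
    my $graphemes   = () = $str =~ /\X/gs;  # 1

    print "$code_points code points, $graphemes grapheme cluster(s)\n";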
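
The precomposed and decomposed forms described above can be converted into one another with the core module C<Unicode::Normalize>, which implements the normal forms; a brief sketch:

    use strict;
    use warnings;
    use charnames qw( :full );
    use Unicode::Normalize qw( NFC NFD );

    my $decomposed = "\N{LATIN CAPITAL LETTER A}\N{COMBINING ACUTE ACCENT}";
    my $composed   = "\N{LATIN CAPITAL LETTER A WITH ACUTE}";

    # The two strings differ code point by code point ...
    print "different code points\n" unless $decomposed eq $composed;

    # ... but are the same character once normalized to either form.
    print "NFC equal\n" if NFC($decomposed) eq NFC($composed);
    print "NFD equal\n" if NFD($decomposed) eq NFD($composed);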
When a new language is being encoded, Unicode generally will choose a C<block> of consecutive unallocated code points for its characters. So far, the number of code points in these blocks has always been evenly divisible by 16. Extras in a block, not currently needed, are left unallocated, for future growth. But there have been occasions when a later release needed more code points than the available extras, and a new block had to be allocated somewhere else, not contiguous to the initial one, to handle the overflow. Thus, it became apparent early on that "block" wasn't an adequate organizing principle, and so the C<Script> property was created.
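
Both views are available in Perl regular expressions: C<\p{Block=...}> tests the block a code point was allocated in, while C<\p{Script=...}> tests the script it belongs to, even when that script is spread over several blocks. A rough sketch, using two Latin letters that happen to live in different blocks:

    use strict;
    use warnings;
    use charnames qw( :full );

    my $a = "\N{LATIN CAPITAL LETTER A}";                  # U+0041, block "Basic Latin"
    my $w = "\N{LATIN CAPITAL LETTER W WITH CIRCUMFLEX}";  # U+0174, block "Latin Extended-A"

    # Different blocks ...
    print "A is in Basic Latin\n"       if     $a =~ /\p{Block=Basic_Latin}/;
    print "W-circumflex is elsewhere\n" unless $w =~ /\p{Block=Basic_Latin}/;

    # ... but the same script.
    print "both are Latin script\n"
        if $a =~ /\p{Script=Latin}/ and $w =~ /\p{Script=Latin}/;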