Character Sets

GSM-7

“GSM-7” is a 7-bit character set designed for, and widely used in, the mobile industry. It consists of a 128-character basic alphabet plus an extension table that adds nine more printable characters (|, ^, {, }, €, [, ~, ] and \), for a maximum total of 137 different characters. Although the GSM alphabet includes some “special” characters (such as a handful of Greek capital letters), not all of them are supported reliably by carriers. A single 140-byte GSM message can fit a maximum of 160 7-bit characters (140 × 8 ÷ 7 = 160).

This character set suits use cases that do not require special characters, or that prioritize message length over character variety.

Keep in mind:

  • Using 7-bit character encoding allows a greater number of characters per message.
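  • The following GSM characters come from the extension table. Each is sent as an escape code plus the character itself, so it occupies two 7-bit character slots and reduces the available character length of your message: |, ^, {, }, €, [, ~, ] and \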
  • Most GSM networks fully support concatenation.
  • Supported by all mobile carriers.
  • Includes an extension table to add a few characters.
  • Of the character sets available for use in messaging, GSM is the least robust in terms of character availability.
  • Not all characters technically “available” in GSM are supported by carriers.
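
The length rules above can be checked mechanically. What follows is a minimal Python sketch, not Aerialink's implementation: the alphabet string is transcribed from the GSM 03.38 default alphabet, the extension set is the nine characters listed above, and the names `GSM7_BASIC`, `GSM7_EXTENDED`, `gsm7_septets`, and `fits_single_gsm7` are hypothetical. It does not account for characters that a particular carrier fails to deliver.

```python
# Sketch only: counts how many 7-bit slots (septets) a message needs in GSM-7.
# All names here are illustrative and not part of any Aerialink API.

# GSM 03.38 default alphabet in code order 0-127 (LF, CR and ESC included).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension-table characters: each costs an escape septet plus its own septet.
GSM7_EXTENDED = "|^{}€[~]\\"

SINGLE_MESSAGE_SEPTETS = 140 * 8 // 7  # 140 bytes hold 160 septets


def gsm7_septets(text):
    """Return the septet count for text, or None if a character is not in GSM-7."""
    total = 0
    for ch in text:
        if ch in GSM7_EXTENDED:
            total += 2  # escape code + character
        elif ch in GSM7_BASIC:
            total += 1
        else:
            return None
    return total


def fits_single_gsm7(text):
    """True if text is GSM-7 encodable and fits in one 140-byte message."""
    count = gsm7_septets(text)
    return count is not None and count <= SINGLE_MESSAGE_SEPTETS
```

For example, `gsm7_septets("Price: 5€")` counts 10 septets for 9 visible characters, because the euro sign comes from the extension table.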

This character set is specified in GSM 03.38 (now maintained as 3GPP TS 23.038).

Latin-1

“Latin-1” (ISO 8859-1) is an 8-bit character set which consists of 191 readable characters from the Latin script, up to 63 more characters than the 128 in the basic GSM-7 alphabet. Encoding with 8-bit characters means that the maximum length of a 140-byte message is 140 characters.

Furthermore, some special characters in Latin-1 are sent in 16-bit packages. While the message itself is still encoded as 8-bit, individual special characters may take up the space of two characters.

Keep in mind:

  • Even if no non-GSM characters are used, messages encoded as 8-bit still have a max character-per-message length of 140.
  • Some special characters take up 16 bits (two characters’ worth of space).
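
As a rough illustration of these two points, the sketch below estimates how much space a Latin-1 message needs. It rests on an assumption drawn from the "16-Bit Special Latin-1 Characters" table on this page: characters with decimal codes 160 to 255 are treated as costing two character slots, and every other Latin-1 character as costing one. The function names are hypothetical, and actual carrier behaviour may differ.

```python
# Sketch only: estimates the space a message needs when sent as 8-bit Latin-1.
# Assumption: codes 160-255 (the "16-bit special" range) cost two slots each;
# all other Latin-1 characters cost one slot.

MAX_LATIN1_SLOTS = 140  # one slot per byte of a 140-byte message


def latin1_slots(text):
    """Return the slots text needs, or None if it is not Latin-1 encodable."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return None
    return sum(2 if ord(ch) >= 160 else 1 for ch in text)


def fits_single_latin1(text):
    """True if text is Latin-1 encodable and fits in one 140-byte message."""
    count = latin1_slots(text)
    return count is not None and count <= MAX_LATIN1_SLOTS
```

Under that assumption, "café" needs five slots rather than four, and a string containing a character outside Latin-1 (such as €) is reported as not encodable.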

UTF-16 Unicode

“UTF-16” is capable of encoding all 1,112,064 valid code points in the Unicode code space, which runs from 0 to 0x10FFFF. Unlike ISO 8859, which is an umbrella for many separate character sets such as Latin-1 (ISO 8859-1, for Western European languages) and ISO 8859-7 (the Greek character set), Unicode is a single character set with room for over a million characters. This means that if a carrier and mobile device support Unicode, they do not need to support individual character sets or shift tables in order to offer Arabic, Chinese, Korean, Japanese, or Cyrillic-script languages.

Unfortunately, UTF-16 is not supported on all carriers, let alone on all mobile handsets. Furthermore, although the range of characters offered is robust, a 16-bit message holds only 70 characters (140 bytes at two bytes per character).

Keep in mind:

  • Nearly limitless variety of characters
  • Inherently multi-lingual
  • Can replace dozens of individual sets and shift tables with a single character set.
  • Not widely supported by carriers
  • Unreadable by many mobile devices
  • Highly reduced message length

Unicode is the best bet for users who need to support a wide variety of languages at all times, particularly a mix of Western, Middle Eastern, and East Asian languages. Those who value character variety above all else, and who can accept a shorter message length, reduced carrier support, and a smaller handset audience, will benefit from UTF-16.
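
The 70-character limit can be illustrated with a short Python sketch that counts 16-bit code units. The function names are hypothetical. One detail worth noting: characters outside the Basic Multilingual Plane, which includes many emoji, are encoded as two code units each, so they count double against the limit.

```python
# Sketch only: counts the 16-bit code units a message needs in UTF-16.

MAX_UTF16_UNITS = 140 // 2  # 140 bytes at two bytes per code unit = 70


def utf16_units(text):
    """Return the number of 16-bit code units needed to encode text."""
    # "utf-16-be" writes no byte-order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-be")) // 2


def fits_single_utf16(text):
    """True if text fits in one 140-byte message when encoded as UTF-16."""
    return utf16_units(text) <= MAX_UTF16_UNITS
```

For example, `utf16_units("Привет")` is 6, while a single "😀" takes 2 code units.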

Emoji

Support for emoji is not yet standardized across mobile carriers, let alone mobile devices. Because of this, the support of emoji characters cannot be guaranteed via SMS in either direction or on any specific carrier.

However, Aerialink does specifically support outbound emoji delivery when sent using MMS.

Supported Characters

Dec   Latin-1  Description           GSM 03.38  Description
9              Horizontal Tab
10             Line Feed                        Line Feed
13             Carriage Return                  Carriage Return
27             Escape
32             Space                            Space
33    !        Exclamation           !          Exclamation
34    "        Quotation             "          Quotation
35    #        Number (Pound)        #          Number (Pound)
36    $        Dollar
37    %        Percentage            %          Percentage
38    &        Ampersand             &          Ampersand
39    '        Apostrophe            '          Apostrophe
40    (        Open Parenthesis      (          Open Parenthesis
41    )        Close Parenthesis     )          Close Parenthesis
42    *        Asterisk              *          Asterisk
43    +        Plus                  +          Plus
44    ,        Comma                 ,          Comma
45    -        Hyphen                -          Hyphen
46    .        Period                .          Period
47    /        Slash                 /          Slash
48    0        Zero                  0          Zero
49    1        One                   1          One
50    2        Two                   2          Two
51    3        Three                 3          Three
52    4        Four                  4          Four
53    5        Five                  5          Five
54    6        Six                   6          Six
55    7        Seven                 7          Seven
56    8        Eight                 8          Eight
57    9        Nine                  9          Nine
58    :        Colon                 :          Colon
59    ;        Semicolon             ;          Semicolon
60    <        Less-than             <          Less-than
61    =        Equal                 =          Equal
62    >        Greater-than          >          Greater-than
63    ?        Question              ?          Question
64    @        At
65    A        Cap A                 A          Cap A
66    B        Cap B                 B          Cap B
67    C        Cap C                 C          Cap C
68    D        Cap D                 D          Cap D
69    E        Cap E                 E          Cap E
70    F        Cap F                 F          Cap F
71    G        Cap G                 G          Cap G
72    H        Cap H                 H          Cap H
73    I        Cap I                 I          Cap I
74    J        Cap J                 J          Cap J
75    K        Cap K                 K          Cap K
76    L        Cap L                 L          Cap L
77    M        Cap M                 M          Cap M
78    N        Cap N                 N          Cap N
79    O        Cap O                 O          Cap O
80    P        Cap P                 P          Cap P
81    Q        Cap Q                 Q          Cap Q
82    R        Cap R                 R          Cap R
83    S        Cap S                 S          Cap S
84    T        Cap T                 T          Cap T
85    U        Cap U                 U          Cap U
86    V        Cap V                 V          Cap V
87    W        Cap W                 W          Cap W
88    X        Cap X                 X          Cap X
89    Y        Cap Y                 Y          Cap Y
90    Z        Cap Z                 Z          Cap Z
91    [        Open Bracket
92    \        Backslash
93    ]        Close Bracket
94    ^        Caret
95    _        Underscore
96    `        Grave Accent
97    a        Low a                 a          Low a
98    b        Low b                 b          Low b
99    c        Low c                 c          Low c
100   d        Low d                 d          Low d
101   e        Low e                 e          Low e
102   f        Low f                 f          Low f
103   g        Low g                 g          Low g
104   h        Low h                 h          Low h
105   i        Low i                 i          Low i
106   j        Low j                 j          Low j
107   k        Low k                 k          Low k
108   l        Low l                 l          Low l
109   m        Low m                 m          Low m
110   n        Low n                 n          Low n
111   o        Low o                 o          Low o
112   p        Low p                 p          Low p
113   q        Low q                 q          Low q
114   r        Low r                 r          Low r
115   s        Low s                 s          Low s
116   t        Low t                 t          Low t
117   u        Low u                 u          Low u
118   v        Low v                 v          Low v
119   w        Low w                 w          Low w
120   x        Low x                 x          Low x
121   y        Low y                 y          Low y
122   z        Low z                 z          Low z
123   {        Open Curly Bracket
124   |        Vertical Bar
125   }        Close Curly Bracket
126   ~        Tilde
127            Delete

16-Bit Special Latin-1 Characters

Dec   Latin-1  Description
160            Non-breaking space*
161   ¡        Inverted Exclamation
162   ¢        Cent
163   £        Pound Sterling
164   ¤        General Currency
165   ¥        Yen
166   ¦        Broken Vertical Bar
167   §        Section
168   ¨        Umlaut
169   ©        Copyright
170   ª        Feminine Ordinal
171   «        Left Guillemet
172   ¬        Negation
173            Soft Hyphen
174   ®        Registered Trademark
175   ¯        Macron
176   °        Degree
177   ±        Plus-Minus
178   ²        Square
179   ³        Cube
180   ´        Acute Accent
181   µ        Micro
182   ¶        Paragraph
183   ·        Interpunct
184   ¸        Cedilla
185   ¹        Superscript 1
186   º        Masculine Ordinal
187   »        Right Guillemet
188   ¼        One Fourth
189   ½        One Half
190   ¾        Three Quarters
191   ¿        Inverted Question
192   À        Cap A Grave
193   Á        Cap A Acute
194   Â        Cap A Circumflex
195   Ã        Cap A Tilde
196   Ä        Cap A Umlaut (Dieresis)
197   Å        Cap A Ring
198   Æ        Cap AE Diphthong (Ligature)
199   Ç        Cap C Cedilla
200   È        Cap E Grave
201   É        Cap E Acute
202   Ê        Cap E Circumflex
203   Ë        Cap E Umlaut (Dieresis)
204   Ì        Cap I Grave
205   Í        Cap I Acute
206   Î        Cap I Circumflex
207   Ï        Cap I Umlaut (Dieresis)
208   Ð        Cap Eth (Icelandic)
209   Ñ        Cap N Tilde
210   Ò        Cap O Grave
211   Ó        Cap O Acute
212   Ô        Cap O Circumflex
213   Õ        Cap O Tilde
214   Ö        Cap O Umlaut (Dieresis)
215   ×        Multiplication
216   Ø        Cap O Slashed
217   Ù        Cap U Grave
218   Ú        Cap U Acute
219   Û        Cap U Circumflex
220   Ü        Cap U Umlaut (Dieresis)
221   Ý        Cap Y Acute
222   Þ        Cap Thorn (Icelandic)
223   ß        Small Sz (German Ligature)
224   à        Small a Grave
225   á        Small a Acute
226   â        Small a Circumflex
227   ã        Small a Tilde
228   ä        Small a Umlaut (Dieresis)
229   å        Small a Ring
230   æ        Small ae Diphthong
231   ç        Small c Cedilla
232   è        Small e Grave
233   é        Small e Acute
234   ê        Small e Circumflex
235   ë        Small e Umlaut (Dieresis)
236   ì        Small i Grave
237   í        Small i Acute
238   î        Small i Circumflex
239   ï        Small i Umlaut (Dieresis)
240   ð        Small eth (Icelandic)
241   ñ        Small n Tilde
242   ò        Small o Grave
243   ó        Small o Acute
244   ô        Small o Circumflex
245   õ        Small o Tilde
246   ö        Small o Umlaut (Dieresis)
247   ÷        Division
248   ø        Small o Slashed
249   ù        Small u Grave
250   ú        Small u Acute
251   û        Small u Circumflex
252   ü        Small u Umlaut
253   ý        Small y Acute
254   þ        Small thorn (Icelandic)
255   ÿ        Small y Umlaut (Dieresis)

This page was last updated 2018-03-27.