Character Sets

GSM 03.38

“GSM 03.38” is a 7-bit character set specification that consists of a 128-character alphabet with an available eight-character extension table, for a maximum of 136 different characters. This alphabet is intended for and widely used in the mobile industry, and is the most common character set for SMS in Western countries.

The documentation for this specification can be located here.

Special Characters

The GSM alphabet includes some special characters, among them characters not available in the standard Latin-1 character set (such as letters of the Greek alphabet), but not all of them are supported reliably by carriers.


IA5, or “International Alphabet 5”, is an international character set that refers to the ISO 646 specification. Within the mobile industry it is considered aligned with the GSM 03.38 specification, and sometimes with the standard ASCII character set, and for practical purposes should be treated as the same character set.


Set Name   DCS Value  Bits/Char  Char In Set  Char/Msg (Single)  Char/Msg (Concat)
GSM 03.38  1          7          128 (+8)     160                153
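
The single and concatenated limits in the table above imply some simple arithmetic for how many segments a message needs. The following is a minimal sketch of that arithmetic, not an Aerialink API: the function name `gsm_segments` is hypothetical, and it assumes every character counts as exactly one (characters from the extension table are not treated specially here).

```python
import math

# Limits taken from the GSM 03.38 table above.
GSM_SINGLE_LIMIT = 160  # characters in a single (non-concatenated) message
GSM_CONCAT_LIMIT = 153  # characters per segment once a message is concatenated


def gsm_segments(char_count: int) -> int:
    """Return how many SMS segments a GSM 03.38 message of this length needs.

    Assumption: each character occupies one character slot.
    """
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    if char_count <= GSM_SINGLE_LIMIT:
        return 1
    # Past the single-message limit, every segment holds at most 153 characters.
    return math.ceil(char_count / GSM_CONCAT_LIMIT)
```

Note the jump at the boundary: a 160-character message fits in one segment, while a 161-character message needs two, because each concatenated segment carries fewer characters than a single message.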


  • Encoding in 7 bits allows for a greater number of characters per message.
  • Most GSM networks fully support concatenation.
  • Supported by all mobile carriers.
  • Includes an extension table to add a few characters.


  • Of the three character sets outlined in this document, GSM is the least robust in terms of character availability.
  • Not all characters technically available in GSM are supported by carriers.


This character set is well suited to use cases that do not require special characters, or that prioritize message length over character variety.

ISO 8859-1 Latin-1

“ISO 8859-1” or “Latin-1” (not to be confused with “ISO-8859-1”) is an 8-bit character set also originally created for use on the internet. It consists of 191 readable characters from the Latin script.

Special Characters

In Latin-1 there are up to 63 characters in excess of those available in standard GSM. The encoding for these “special characters” requires a message’s character limit to be 140 rather than GSM’s 160.

Furthermore, special characters in Latin-1 are sent in 16-bit packages. While the message itself is still encoded in 8-bit mode, each individual special character takes up the space of two (16 bits). The max message length remains at 140, as standard characters are still sent as 8 bits each.
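
As a rough illustration of the double-width rule described above, the sketch below counts each special character as two against the 140-character limit. The function names are hypothetical, and the definition of "special" is an assumption: it uses decimal code points 160 through 255, the range covered by the 16-Bit Special Latin-1 Characters table on this page.

```python
LATIN1_SINGLE_LIMIT = 140  # single-message limit for 8-bit Latin-1, per the text above


def latin1_effective_length(text: str) -> int:
    """Count the character slots a message uses in 8-bit Latin-1.

    Assumption: code points 160-255 are the "special" characters and
    each takes the space of two standard characters.
    """
    total = 0
    for ch in text:
        code = ord(ch)
        if code > 255:
            raise ValueError(f"{ch!r} is outside the 8-bit Latin-1 range")
        total += 2 if code >= 160 else 1
    return total


def fits_single_latin1(text: str) -> bool:
    """True if the message fits in one 8-bit Latin-1 message."""
    return latin1_effective_length(text) <= LATIN1_SINGLE_LIMIT
```

For example, "café" uses five slots rather than four, so a message of 139 standard characters plus one "é" no longer fits in a single message.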

ISO-8859-1 (ISO-Latin-1) Superset

ISO-Latin-1 is a superset of Latin-1, created to expand available characters for use online. It is also an 8-bit character set but consists of Latin-1’s 191 characters, plus 65 additional characters for a total character count of 256, representing every possible 8-bit value. The difference in characters between these two sets represents limitations within the mobile industry and carriers.

Standard Specs

Set Name  DCS Value  Bits/Char  Char In Set  Char/Msg (Single)  Char/Msg (Concat)
  • 8-bit
  • 191 distinct characters available
  • 140 characters per single message
  • 134 characters per concatenated segment


  • A wide variety of special characters are available, even without the ISO-Latin-1 Superset.
  • Latin-1 (not ISO-Latin-1) is widely covered by carriers.
  • Reliable transcription of characters


  • Even if no special characters are used, max message length is still 140.
  • The ISO-Latin-1 Superset exists but is not yet reliably or publicly supported via mobile.
  • Special characters take up two characters’ worth of space.
  • Only reliable for standard English characters.

7-bit Latin-1 Specs

Although Latin-1 is an 8-bit character set, it is possible to send it via 7-bit character encoding. This allows for the full potential message length of 160 characters. However, the available length depends on whether special characters are present: using even one special character automatically reduces the available message length to 140 characters, as though the message were encoded in 8-bit.

  • 7-bit
  • 191 distinct characters available
  • 140-160 characters per single message
  • 134-153 characters per concatenated segment
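
The length rule for 7-bit Latin-1 can be sketched as a simple lookup: the limits depend only on whether any special character is present. This is an illustration rather than a definitive implementation. The function name is hypothetical, and "special" is again assumed to mean decimal code points 160 through 255.

```python
from typing import Tuple


def seven_bit_latin1_limits(text: str) -> Tuple[int, int]:
    """Return (single_limit, concat_segment_limit) for 7-bit Latin-1.

    Assumption: any character with code point 160 or higher is "special"
    and switches the whole message to the 8-bit limits.
    """
    has_special = any(ord(ch) >= 160 for ch in text)
    if has_special:
        return (140, 134)  # same limits as standard 8-bit Latin-1
    return (160, 153)      # same limits as GSM 03.38
```

Under these assumptions, `seven_bit_latin1_limits("hello")` returns `(160, 153)`, while `seven_bit_latin1_limits("señor")` returns `(140, 134)`.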


  • A wide variety of special characters are available.
  • Available message length is equal to that of GSM without use of special characters.
  • Widely covered by carriers.


  • Transcription of characters is slightly less reliable, though the difference is minor.
  • Presence of special characters reduces message length to that of standard 8-bit Latin-1.
  • Special characters take up two characters’ worth of space.
  • For standard Latin or Romanized languages only.


8-bit Latin-1 suits those who are not concerned with message length and prefer the available length to stay consistent from message to message, regardless of the presence of special characters. However, Aerialink’s top recommendation for most use cases is 7-bit Latin-1 encoding, as it offers the best of both GSM and Latin-1: 160 available characters per message, plus a larger variety of characters when needed.

UTF-16 Unicode

“UTF-16” is capable of encoding 1,112,064 numbers (code points) in the Unicode code space from 0 to 0x10FFFF. Unlike ISO 8859, which is an umbrella for many specified character sets such as Latin-1 (for Latin languages) and ISO 8859-7 (the Greek character set), Unicode is a single character set with over a million code points. This means that if a carrier and mobile device support Unicode, they do not need to support individual character sets or shift tables in order to offer Arabic, Chinese, Korean, Japanese, or Cyrillic languages. Unfortunately, UTF-16 is not supported on all carriers, let alone on all mobile handsets. Furthermore, although the range of characters offered is robust, the message length available in a 16-bit message is only 70 characters.


“UTF-8” is still Unicode. Passing a DCS value of 8 will default your messages to send in an 8-bit mode of Unicode. However, this applies only to the first 128 characters in the set; all other characters are sent as 16 bits.


Set Name  DCS Value  Bits/Char  Char In Set  Char/Msg (Single)  Char/Msg (Concat)
UTF-16    8, 16      16         1,112,064    70                 67
  • 16-bit
  • 1 million+ distinct characters available
  • 70 characters per single message
  • 67 characters per concatenated segment
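
To see how quickly a UTF-16 message fills up, the sketch below counts 16-bit code units and applies the 70 and 67 limits listed above. The function names are hypothetical. It assumes the limits are measured in 16-bit code units, so a character outside the Basic Multilingual Plane (most emoji, for example) counts as two. It also ignores the practical need to avoid splitting such a character across two segments.

```python
import math

UTF16_SINGLE_LIMIT = 70  # characters per single message, from the list above
UTF16_CONCAT_LIMIT = 67  # characters per concatenated segment


def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to encode the text as UTF-16."""
    # Big-endian encoding without a byte-order mark: two bytes per code unit.
    return len(text.encode("utf-16-be")) // 2


def utf16_segments(text: str) -> int:
    """Estimate how many segments a UTF-16 message needs."""
    units = utf16_code_units(text)
    if units <= UTF16_SINGLE_LIMIT:
        return 1
    return math.ceil(units / UTF16_CONCAT_LIMIT)
```

Comparing `utf16_code_units(text)` with `len(text)` is a quick way to spot messages containing characters that take two code units each.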


  • Nearly limitless variety of characters
  • Inherently multi-lingual
  • Can replace dozens of individual sets and shift tables with a single character set.


  • Not widely supported by carriers
  • Unreadable by many mobile devices
  • Highly reduced message length


Unicode is the best bet for users who need to support a wide variety of languages at all times, particularly Western, Middle Eastern, and East Asian languages. Those who value character variety above all else, and who do not mind a shortened message length, reduced carrier support, and a smaller handset audience, will benefit greatly from the use of UTF-16.


Support for emoji is not yet standardized across mobile carriers, let alone mobile devices. Because of this, the support of emoji characters cannot be guaranteed via SMS in either direction or on any specific carrier.

However, Aerialink does specifically support outbound emoji delivery when sent using MMS.

Supported Characters

Dec  Latin-1  Description           GSM 03.38  Description
9             Horizontal Tab
10            Line Feed                        Line Feed
13            Carriage Return                  Carriage Return
35   #        Number (Pound)        #          Number (Pound)
40   (        Open Parenthesis      (          Open Parenthesis
41   )        Close Parenthesis     )          Close Parenthesis
65   A        Cap A                 A          Cap A
66   B        Cap B                 B          Cap B
67   C        Cap C                 C          Cap C
68   D        Cap D                 D          Cap D
69   E        Cap E                 E          Cap E
70   F        Cap F                 F          Cap F
71   G        Cap G                 G          Cap G
72   H        Cap H                 H          Cap H
73   I        Cap I                 I          Cap I
74   J        Cap J                 J          Cap J
75   K        Cap K                 K          Cap K
76   L        Cap L                 L          Cap L
77   M        Cap M                 M          Cap M
78   N        Cap N                 N          Cap N
79   O        Cap O                 O          Cap O
80   P        Cap P                 P          Cap P
81   Q        Cap Q                 Q          Cap Q
82   R        Cap R                 R          Cap R
83   S        Cap S                 S          Cap S
84   T        Cap T                 T          Cap T
85   U        Cap U                 U          Cap U
86   V        Cap V                 V          Cap V
87   W        Cap W                 W          Cap W
88   X        Cap X                 X          Cap X
89   Y        Cap Y                 Y          Cap Y
90   Z        Cap Z                 Z          Cap Z
91   [        Open Bracket
93   ]        Close Bracket
96   `        Grave Accent
97   a        Low a                 a          Low a
98   b        Low b                 b          Low b
99   c        Low c                 c          Low c
100  d        Low d                 d          Low d
101  e        Low e                 e          Low e
102  f        Low f                 f          Low f
103  g        Low g                 g          Low g
104  h        Low h                 h          Low h
105  i        Low i                 i          Low i
106  j        Low j                 j          Low j
107  k        Low k                 k          Low k
108  l        Low l                 l          Low l
109  m        Low m                 m          Low m
110  n        Low n                 n          Low n
111  o        Low o                 o          Low o
112  p        Low p                 p          Low p
113  q        Low q                 q          Low q
114  r        Low r                 r          Low r
115  s        Low s                 s          Low s
116  t        Low t                 t          Low t
117  u        Low u                 u          Low u
118  v        Low v                 v          Low v
119  w        Low w                 w          Low w
120  x        Low x                 x          Low x
121  y        Low y                 y          Low y
122  z        Low z                 z          Low z
123  {        Open Curly Bracket
124  |        Vertical Bar
125  }        Close Curly Bracket

16-Bit Special Latin-1 Characters

Dec  Latin-1  Description
160           Non-breaking space*
161  ¡        Inverted Exclamation
163  £        Pound Sterling
164  ¤        General Currency
166  ¦        Broken Vertical Bar
170  ª        Feminine Ordinal
171  «        Left Guillemet
173           Soft Hyphen
174  ®        Registered Trademark
180  ´        Acute Accent
185  ¹        Superscript 1
186  º        Masculine Ordinal
187  »        Right Guillemet
188  ¼        One Fourth
189  ½        One Half
190  ¾        Three Quarters
191  ¿        Inverted Question
192  À        Cap A Grave
193  Á        Cap A Acute
194  Â        Cap A Circumflex
195  Ã        Cap A Tilde
196  Ä        Cap A Umlaut (Dieresis)
197  Å        Cap A Ring
198  Æ        Cap AE Diphthong (Ligature)
199  Ç        Cap C Cedilla
200  È        Cap E Grave
201  É        Cap E Acute
202  Ê        Cap E Circumflex
203  Ë        Cap E Umlaut (Dieresis)
204  Ì        Cap I Grave
205  Í        Cap I Acute
206  Î        Cap I Circumflex
207  Ï        Cap I Umlaut (Dieresis)
208  Ð        Cap Eth (Icelandic)
209  Ñ        Cap N Tilde
210  Ò        Cap O Grave
211  Ó        Cap O Acute
212  Ô        Cap O Circumflex
213  Õ        Cap O Tilde
214  Ö        Cap O Umlaut (Dieresis)
216  Ø        Cap O Slashed
217  Ù        Cap U Grave
218  Ú        Cap U Acute
219  Û        Cap U Circumflex
220  Ü        Cap U Umlaut (Dieresis)
221  Ý        Cap Y Acute
222  Þ        Cap Thorn (Icelandic)
223  ß        Small Sz (German Ligature)
224  à        Small a Grave
225  á        Small a Acute
226  â        Small a Circumflex
227  ã        Small a Tilde
228  ä        Small a Umlaut (Dieresis)
229  å        Small a Ring
230  æ        Small ae Diphthong
231  ç        Small c Cedilla
232  è        Small e Grave
233  é        Small e Acute
234  ê        Small e Circumflex
235  ë        Small e Umlaut (Dieresis)
236  ì        Small i Grave
237  í        Small i Acute
238  î        Small i Circumflex
239  ï        Small i Umlaut (Dieresis)
240  ð        Small eth (Icelandic)
241  ñ        Small n Tilde
242  ò        Small o Grave
243  ó        Small o Acute
244  ô        Small o Circumflex
245  õ        Small o Tilde
246  ö        Small o Umlaut (Dieresis)
248  ø        Small o Slashed
249  ù        Small u Grave
250  ú        Small u Acute
251  û        Small u Circumflex
252  ü        Small u Umlaut
253  ý        Small y Acute
254  þ        Small thorn (Icelandic)
255  ÿ        Small y Umlaut (Dieresis)

This page was last updated on March 14, 2018.