Commit f2bfbac

Unicode Consortium authored and khwilliamson committed
Use Unicode 9.0
This includes regenerating the files that depend on the Unicode 9 data files
1 parent b0e2440 commit f2bfbac

55 files changed, 17099 additions and 5421 deletions

charclass_invlists.h

Lines changed: 4217 additions & 611 deletions
Large diffs are not rendered by default.

lib/Unicode/UCD.t

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ use Test::More;
 
 use Unicode::UCD qw(charinfo charprop charprops_all);
 
-my $expected_version = '8.0.0';
+my $expected_version = '9.0.0';
 my $current_version = Unicode::UCD::UnicodeVersion;
 my $v_unicode_version = pack "C*", split /\./, $current_version;
 my $unknown_script = ($v_unicode_version lt v5.0.0)
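
The context lines in this hunk compare versions by packing each dotted component into a single byte and then comparing the packed strings. A rough Python analogue of that idea is sketched below. The helper name `pack_version` is hypothetical, and the sketch assumes every component fits in one byte (Python raises `ValueError` otherwise).

```python
def pack_version(version: str) -> bytes:
    """Pack a dotted version such as '9.0.0' into one byte per component.

    Hypothetical helper mirroring the idea of `pack "C*", split /\\./, ...`.
    """
    return bytes(int(part) for part in version.split("."))


# Byte-wise comparison orders versions component by component,
# so 10.0.0 sorts after 9.0.0 (a plain string comparison would not).
current = pack_version("9.0.0")
is_pre_5 = current < pack_version("5.0.0")
```

Comparing the packed form rather than the raw strings is what makes multi-digit components order correctly.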

lib/unicore/ArabicShaping.txt

Lines changed: 95 additions & 14 deletions
@@ -1,23 +1,23 @@
-# ArabicShaping-8.0.0.txt
-# Date: 2015-02-17, 23:33:00 GMT [RP]
+# ArabicShaping-9.0.0.txt
+# Date: 2016-02-24, 22:25:00 GMT [RP]
+# © 2016 Unicode®, Inc.
+# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
+# For terms of use, see http://www.unicode.org/terms_of_use.html
 #
 # This file is a normative contributory data file in the
 # Unicode Character Database.
 #
-# Copyright (c) 1991-2014 Unicode, Inc.
-# For terms of use, see http://www.unicode.org/terms_of_use.html
-#
 # This file defines the Joining_Type and Joining_Group property
 # values for Arabic, Syriac, N'Ko, Mandaic, and Manichaean positional
 # shaping, repeating in machine readable form the information
 # exemplified in Tables 9-3, 9-8, 9-9, 9-10, 9-14, 9-15, 9-16, 9-19,
 # 9-20, 10-4, 10-5, 10-6, 10-7, and 19-5 of The Unicode Standard core
 # specification. This file also defines Joining_Type values for
-# Mongolian, Phags-pa, and Psalter Pahlavi positional shaping, which
-# are not listed in tables in the standard.
+# Mongolian, Phags-pa, Psalter Pahlavi, and Adlam positional shaping,
+# which are not listed in tables in the standard.
 #
-# See Sections 9.2, 9.3, 9.5, 10.5, 10.6, 13.4, 14.3, 19.4 of
-# The Unicode Standard core specification for more information.
+# See Sections 9.2, 9.3, 9.5, 10.5, 10.6, 13.4, 14.3, 19.4, and 19.9
+# of The Unicode Standard core specification for more information.
 #
 # Each line contains four fields, separated by a semicolon.
 #
@@ -50,8 +50,8 @@
 # Field 3: defines the joining group (property name: Joining_Group)
 #
 # The values of the joining group are based schematically on character
-# names. Where a schematic character name consists of two or more parts separated
-# by spaces, the formal Joining_Group property value, as specified in
+# names. Where a schematic character name consists of two or more parts
+# separated by spaces, the formal Joining_Group property value, as specified in
 # PropertyValueAliases.txt, consists of the same name parts joined by
 # underscores. Hence, the entry:
 #
@@ -90,7 +90,7 @@
 # have joining type T.
 # - All others not explicitly listed have joining type U.
 #
-# For an explicit listing of characters of joining type T, see
+# For an explicit listing of all characters of joining type T, see
 # the derived property file DerivedJoiningType.txt.
 #
 # #############################################################
@@ -436,6 +436,15 @@
 08B2; REH WITH DOT AND INVERTED V ABOVE; R; REH
 08B3; AIN WITH 3 DOTS BELOW; D; AIN
 08B4; KAF WITH DOT BELOW; D; KAF
+08B6; BEH WITH MEEM ABOVE; D; BEH
+08B7; DOTLESS BEH WITH 3 DOTS BELOW AND MEEM ABOVE; D; BEH
+08B8; DOTLESS BEH WITH TEH ABOVE; D; BEH
+08B9; REH WITH NOON ABOVE; R; REH
+08BA; YEH WITH NOON ABOVE; D; YEH
+08BB; AFRICAN FEH; D; AFRICAN FEH
+08BC; AFRICAN QAF; D; AFRICAN QAF
+08BD; AFRICAN NOON; D; AFRICAN NOON
+08E2; ARABIC DISPUTED END OF AYAH; U; No_Joining_Group
 
 # Mongolian Characters
 
@@ -536,8 +545,8 @@
 1882; MONGOLIAN ALI GALI DAMARU; U; No_Joining_Group
 1883; MONGOLIAN ALI GALI UBADAMA; U; No_Joining_Group
 1884; MONGOLIAN ALI GALI INVERTED UBADAMA; U; No_Joining_Group
-1885; MONGOLIAN ALI GALI BALUDA; U; No_Joining_Group
-1886; MONGOLIAN ALI GALI THREE BALUDA; U; No_Joining_Group
+1885; MONGOLIAN ALI GALI BALUDA; T; No_Joining_Group
+1886; MONGOLIAN ALI GALI THREE BALUDA; T; No_Joining_Group
 1887; MONGOLIAN ALI GALI A; D; No_Joining_Group
 1888; MONGOLIAN ALI GALI I; D; No_Joining_Group
 1889; MONGOLIAN ALI GALI KA; D; No_Joining_Group
@@ -578,6 +587,7 @@
 
 200C; ZERO WIDTH NON-JOINER; U; No_Joining_Group
 200D; ZERO WIDTH JOINER; C; No_Joining_Group
+202F; NARROW NO-BREAK SPACE; U; No_Joining_Group
 2066; LEFT-TO-RIGHT ISOLATE; U; No_Joining_Group
 2067; RIGHT-TO-LEFT ISOLATE; U; No_Joining_Group
 2068; FIRST STRONG ISOLATE; U; No_Joining_Group
@@ -711,4 +721,75 @@ A873; PHAGS-PA CANDRABINDU; U; No_Joining_Group
 10BAE; PSALTER PAHLAVI TWENTY; D; No_Joining_Group
 10BAF; PSALTER PAHLAVI HUNDRED; U; No_Joining_Group
 
+# Adlam Characters
+
+1E900;ADLAM CAPITAL ALIF; D; No_Joining_Group
+1E901;ADLAM CAPITAL DAALI; D; No_Joining_Group
+1E902;ADLAM CAPITAL LAAM; D; No_Joining_Group
+1E903;ADLAM CAPITAL MIIM; D; No_Joining_Group
+1E904;ADLAM CAPITAL BA; D; No_Joining_Group
+1E905;ADLAM CAPITAL SINNYIIYHE; D; No_Joining_Group
+1E906;ADLAM CAPITAL PE; D; No_Joining_Group
+1E907;ADLAM CAPITAL BHE; D; No_Joining_Group
+1E908;ADLAM CAPITAL RA; D; No_Joining_Group
+1E909;ADLAM CAPITAL E; D; No_Joining_Group
+1E90A;ADLAM CAPITAL FA; D; No_Joining_Group
+1E90B;ADLAM CAPITAL I; D; No_Joining_Group
+1E90C;ADLAM CAPITAL O; D; No_Joining_Group
+1E90D;ADLAM CAPITAL DHA; D; No_Joining_Group
+1E90E;ADLAM CAPITAL YHE; D; No_Joining_Group
+1E90F;ADLAM CAPITAL WAW; D; No_Joining_Group
+1E910;ADLAM CAPITAL NUN; D; No_Joining_Group
+1E911;ADLAM CAPITAL KAF; D; No_Joining_Group
+1E912;ADLAM CAPITAL YA; D; No_Joining_Group
+1E913;ADLAM CAPITAL U; D; No_Joining_Group
+1E914;ADLAM CAPITAL JIIM; D; No_Joining_Group
+1E915;ADLAM CAPITAL CHI; D; No_Joining_Group
+1E916;ADLAM CAPITAL HA; D; No_Joining_Group
+1E917;ADLAM CAPITAL QAAF; D; No_Joining_Group
+1E918;ADLAM CAPITAL GA; D; No_Joining_Group
+1E919;ADLAM CAPITAL NYA; D; No_Joining_Group
+1E91A;ADLAM CAPITAL TU; D; No_Joining_Group
+1E91B;ADLAM CAPITAL NHA; D; No_Joining_Group
+1E91C;ADLAM CAPITAL VA; D; No_Joining_Group
+1E91D;ADLAM CAPITAL KHA; D; No_Joining_Group
+1E91E;ADLAM CAPITAL GBE; D; No_Joining_Group
+1E91F;ADLAM CAPITAL ZAL; D; No_Joining_Group
+1E920;ADLAM CAPITAL KPO; D; No_Joining_Group
+1E921;ADLAM CAPITAL SHA; D; No_Joining_Group
+1E922;ADLAM SMALL ALIF; D; No_Joining_Group
+1E923;ADLAM SMALL DAALI; D; No_Joining_Group
+1E924;ADLAM SMALL LAAM; D; No_Joining_Group
+1E925;ADLAM SMALL MIIM; D; No_Joining_Group
+1E926;ADLAM SMALL BA; D; No_Joining_Group
+1E927;ADLAM SMALL SINNYIIYHE; D; No_Joining_Group
+1E928;ADLAM SMALL PE; D; No_Joining_Group
+1E929;ADLAM SMALL BHE; D; No_Joining_Group
+1E92A;ADLAM SMALL RA; D; No_Joining_Group
+1E92B;ADLAM SMALL E; D; No_Joining_Group
+1E92C;ADLAM SMALL FA; D; No_Joining_Group
+1E92D;ADLAM SMALL I; D; No_Joining_Group
+1E92E;ADLAM SMALL O; D; No_Joining_Group
+1E92F;ADLAM SMALL DHA; D; No_Joining_Group
+1E930;ADLAM SMALL YHE; D; No_Joining_Group
+1E931;ADLAM SMALL WAW; D; No_Joining_Group
+1E932;ADLAM SMALL NUN; D; No_Joining_Group
+1E933;ADLAM SMALL KAF; D; No_Joining_Group
+1E934;ADLAM SMALL YA; D; No_Joining_Group
+1E935;ADLAM SMALL U; D; No_Joining_Group
+1E936;ADLAM SMALL JIIM; D; No_Joining_Group
+1E937;ADLAM SMALL CHI; D; No_Joining_Group
+1E938;ADLAM SMALL HA; D; No_Joining_Group
+1E939;ADLAM SMALL QAAF; D; No_Joining_Group
+1E93A;ADLAM SMALL GA; D; No_Joining_Group
+1E93B;ADLAM SMALL NYA; D; No_Joining_Group
+1E93C;ADLAM SMALL TU; D; No_Joining_Group
+1E93D;ADLAM SMALL NHA; D; No_Joining_Group
+1E93E;ADLAM SMALL VA; D; No_Joining_Group
+1E93F;ADLAM SMALL KHA; D; No_Joining_Group
+1E940;ADLAM SMALL GBE; D; No_Joining_Group
+1E941;ADLAM SMALL ZAL; D; No_Joining_Group
+1E942;ADLAM SMALL KPO; D; No_Joining_Group
+1E943;ADLAM SMALL SHA; D; No_Joining_Group
+
 # EOF
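
The header of ArabicShaping.txt describes each data line as four semicolon-separated fields, and says the formal Joining_Group value is the schematic name with its parts joined by underscores. A minimal reading sketch under those two statements might look like the following. The names `ShapingEntry`, `parse_shaping_line`, and `joining_group_alias` are hypothetical, and the sketch does not attempt the casing used in PropertyValueAliases.txt.

```python
from typing import NamedTuple, Optional


class ShapingEntry(NamedTuple):
    code_point: int       # field 0, parsed from hexadecimal
    schematic_name: str   # field 1
    joining_type: str     # field 2, e.g. "D", "R", "T", "U", "C"
    joining_group: str    # field 3, e.g. "BEH" or "No_Joining_Group"


def parse_shaping_line(line: str) -> Optional[ShapingEntry]:
    """Parse one data line; return None for comments and blank lines."""
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    fields = [field.strip() for field in line.split(";")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    code, name, joining_type, joining_group = fields
    return ShapingEntry(int(code, 16), name, joining_type, joining_group)


def joining_group_alias(group: str) -> str:
    """Join space-separated name parts with underscores (case left as-is)."""
    return "_".join(group.split())
```

Stripping whitespace around each field lets the same parser accept both the `08BB; AFRICAN FEH; ...` style and the Adlam lines, which have no space after the first semicolon.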

lib/unicore/BidiBrackets.txt

Lines changed: 14 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,17 @@
-# BidiBrackets-8.0.0.txt
-# Date: 2015-01-20, 19:00:00 GMT [AG, LI, KW]
+# BidiBrackets-9.0.0.txt
+# Date: 2016-06-07, 22:30:00 GMT [AG, LI, KW]
+# © 2016 Unicode®, Inc.
+# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
+# For terms of use, see http://www.unicode.org/terms_of_use.html
+#
+# Unicode Character Database
+# For documentation, see http://www.unicode.org/reports/tr44/
 #
 # Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type Properties
 #
 # This file is a normative contributory data file in the Unicode
 # Character Database.
 #
-# Copyright (c) 1991-2015 Unicode, Inc.
-# For terms of use, see http://www.unicode.org/terms_of_use.html
-#
 # Bidi_Paired_Bracket is a normative property of type Miscellaneous,
 # which establishes a mapping between characters that are treated as
 # bracket pairs by the Unicode Bidirectional Algorithm.
@@ -26,6 +29,12 @@
 # vice versa, and their Bidi_Paired_Bracket_Type (bpt) property values are
 # Open (o) and Close (c), respectively.
 #
+# The brackets with ticks U+298D LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
+# through U+2990 RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER are paired the
+# same way their glyphs form mirror pairs, according to their bmg property
+# values. They are not paired on the basis of a diagonal or antidiagonal
+# matching of the corner ticks inferred from code point order.
+#
 # For legacy reasons, the characters U+FD3E ORNATE LEFT PARENTHESIS and
 # U+FD3F ORNATE RIGHT PARENTHESIS do not mirror in bidirectional display
 # and therefore do not form a bracket pair.

lib/unicore/BidiMirroring.txt

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,21 @@
-# BidiMirroring-8.0.0.txt
-# Date: 2015-01-20, 18:30:00 GMT [KW, LI]
+# BidiMirroring-9.0.0.txt
+# Date: 2016-01-21, 22:00:00 GMT [KW, LI]
+# © 2016 Unicode®, Inc.
+# For terms of use, see http://www.unicode.org/terms_of_use.html
+#
+# Unicode Character Database
+# For documentation, see http://www.unicode.org/reports/tr44/
 #
 # Bidi_Mirroring_Glyph Property
 #
 # This file is an informative contributory data file in the
 # Unicode Character Database.
 #
-# Copyright (c) 1991-2015 Unicode, Inc.
-# For terms of use, see http://www.unicode.org/terms_of_use.html
-#
 # This data file lists characters that have the Bidi_Mirrored=Yes property
 # value, for which there is another Unicode character that typically has a glyph
 # that is the mirror image of the original character's glyph.
 #
-# The repertoire covered by the file is Unicode 8.0.0.
+# The repertoire covered by the file is Unicode 9.0.0.
 #
 # The file contains a list of lines with mappings from one code point
 # to another one for character-based mirroring.

lib/unicore/Blocks.txt

Lines changed: 15 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,9 @@
-# Blocks-8.0.0.txt
-# Date: 2014-11-10, 23:04:00 GMT [KW]
+# Blocks-9.0.0.txt
+# Date: 2016-02-05, 23:48:00 GMT [KW]
+# © 2016 Unicode®, Inc.
+# For terms of use, see http://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
-# Copyright (c) 1991-2014 Unicode, Inc.
-# For terms of use, see http://www.unicode.org/terms_of_use.html
 # For documentation, see http://www.unicode.org/reports/tr44/
 #
 # Format:
@@ -93,6 +93,7 @@
 1BC0..1BFF; Batak
 1C00..1C4F; Lepcha
 1C50..1C7F; Ol Chiki
+1C80..1C8F; Cyrillic Extended-C
 1CC0..1CCF; Sundanese Supplement
 1CD0..1CFF; Vedic Extensions
 1D00..1D7F; Phonetic Extensions
@@ -209,6 +210,7 @@ FFF0..FFFF; Specials
 10400..1044F; Deseret
 10450..1047F; Shavian
 10480..104AF; Osmanya
+104B0..104FF; Osage
 10500..1052F; Elbasan
 10530..1056F; Caucasian Albanian
 10600..1077F; Linear A
@@ -243,13 +245,17 @@ FFF0..FFFF; Specials
 11280..112AF; Multani
 112B0..112FF; Khudawadi
 11300..1137F; Grantha
+11400..1147F; Newa
 11480..114DF; Tirhuta
 11580..115FF; Siddham
 11600..1165F; Modi
+11660..1167F; Mongolian Supplement
 11680..116CF; Takri
 11700..1173F; Ahom
 118A0..118FF; Warang Citi
 11AC0..11AFF; Pau Cin Hau
+11C00..11C6F; Bhaiksuki
+11C70..11CBF; Marchen
 12000..123FF; Cuneiform
 12400..1247F; Cuneiform Numbers and Punctuation
 12480..1254F; Early Dynastic Cuneiform
@@ -260,6 +266,9 @@ FFF0..FFFF; Specials
 16AD0..16AFF; Bassa Vah
 16B00..16B8F; Pahawh Hmong
 16F00..16F9F; Miao
+16FE0..16FFF; Ideographic Symbols and Punctuation
+17000..187FF; Tangut
+18800..18AFF; Tangut Components
 1B000..1B0FF; Kana Supplement
 1BC00..1BC9F; Duployan
 1BCA0..1BCAF; Shorthand Format Controls
@@ -270,7 +279,9 @@ FFF0..FFFF; Specials
 1D360..1D37F; Counting Rod Numerals
 1D400..1D7FF; Mathematical Alphanumeric Symbols
 1D800..1DAAF; Sutton SignWriting
+1E000..1E02F; Glagolitic Supplement
 1E800..1E8DF; Mende Kikakui
+1E900..1E95F; Adlam
 1EE00..1EEFF; Arabic Mathematical Alphabetic Symbols
 1F000..1F02F; Mahjong Tiles
 1F030..1F09F; Domino Tiles
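
Each data line in Blocks.txt, as the hunks above show, has the shape `START..END; Block Name` with hexadecimal code points. A small lookup sketch over that shape could be written as follows. `parse_blocks` and `block_of` are hypothetical names, the sample text is a handful of lines copied from the hunks above, and the `No_Block` default for unlisted code points is an assumption about the full file rather than something visible in this diff.

```python
import bisect
import re

# START..END; Name  (hexadecimal code points)
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+);\s*(.+?)\s*$")

SAMPLE = """\
1C50..1C7F; Ol Chiki
1C80..1C8F; Cyrillic Extended-C
104B0..104FF; Osage
1E900..1E95F; Adlam
"""


def parse_blocks(text):
    """Return a sorted list of (start, end, name) tuples from Blocks.txt text."""
    blocks = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        match = BLOCK_LINE.match(line)
        if match:
            start, end, name = match.groups()
            blocks.append((int(start, 16), int(end, 16), name))
    blocks.sort()
    return blocks


def block_of(code_point, blocks, default="No_Block"):
    """Find the block containing code_point by binary search on range starts."""
    starts = [start for start, _, _ in blocks]
    index = bisect.bisect_right(starts, code_point) - 1
    if index >= 0 and blocks[index][0] <= code_point <= blocks[index][1]:
        return blocks[index][2]
    return default


blocks = parse_blocks(SAMPLE)
```

Because block ranges do not overlap, a binary search on the start values is enough; the end value only needs checking once for the candidate range.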

lib/unicore/CJKRadicals.txt

Lines changed: 13 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,28 +1,30 @@
-# CJKRadicals-8.0.0.txt
-# Date: 2015-02-19, 00:30:00 GMT [RC, KW, LI]
+# CJKRadicals-9.0.0.txt
+# Date: 2016-01-22, 06:00:00 GMT [RC, KW, LI]
+# © 2016 Unicode®, Inc.
+# For terms of use, see http://www.unicode.org/terms_of_use.html
 #
 # Unicode Character Database
-# Copyright (c) 1991-2015 Unicode, Inc.
-# For terms of use, see http://www.unicode.org/terms_of_use.html
-# For documentation, see UAX #38: Unicode Han Database (Unihan),
-# at http://www.unicode.org/reports/tr38/
+# For documentation, see http://www.unicode.org/reports/tr44/
 #
-# Mapping from radical numbers to characters.
+# Mapping from CJK radical numbers to characters
 #
-# This data file provides a mapping from the radical numbers used
+# This data file provides a mapping from the CJK radical numbers used
 # in the kRSUnicode property to the corresponding character in
 # the Kangxi Radicals block or the CJK Radicals Supplement block,
 # as well as to a CJK unified ideograph which is formed from that
 # radical only.
 #
-# There is one line per radical number. Each line contains three
+# There is one line per CJK radical number. Each line contains three
 # fields, separated by a semicolon (';'). The first field is the
-# radical number. The second field is the CJK radical character.
+# CJK radical number. The second field is the CJK radical character.
 # The third field is the CJK unified ideograph.
 #
-# Radical numbers match the regular expression [1-9][0-9]{0,2}\'?
+# CJK radical numbers match the regular expression [1-9][0-9]{0,2}\'?
 # and in particular they can end with a U+0027 ' APOSTROPHE.
 #
+# For more information, see UAX #38: Unicode Han Database (Unihan),
+# at http://www.unicode.org/reports/tr38/
+#
 # This file was created for Unicode 5.2 by Richard Cook.
 # Updated for Unicode 6.0 by Richard Cook.
 # Updated for Unicode 6.1 and 6.2 by Ken Whistler,
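
The CJKRadicals.txt header gives a regular expression for CJK radical numbers and describes three semicolon-separated fields per line. A hedged validation sketch built on that description is shown below. `parse_radical_line` is a hypothetical name, and treating the second and third fields as hexadecimal code points is an assumption, since the header shown here only says they identify the radical character and the unified ideograph.

```python
import re

# Radical numbers: 1 to 3 digits, no leading zero, optional trailing apostrophe.
RADICAL_NUMBER = re.compile(r"[1-9][0-9]{0,2}'?")


def parse_radical_line(line):
    """Parse 'number; radical; ideograph'; return None for comments/blank lines."""
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    fields = [field.strip() for field in line.split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    number, radical, ideograph = fields
    if not RADICAL_NUMBER.fullmatch(number):
        raise ValueError(f"invalid CJK radical number: {number!r}")
    # Assumption: fields 2 and 3 are hexadecimal code points.
    return number, int(radical, 16), int(ideograph, 16)
```

The radical number is kept as a string so that a trailing apostrophe, which the header explicitly allows, is preserved.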
