Name Description Size
bidi.rs This module exposes tooling for running the [unicode bidi algorithm](https://unicode.org/reports/tr9/) using ICU4X data.

`BidiClassAdapter` enables ICU4X to provide data to [`unicode-bidi`], an external crate implementing UAX #9.

✨ *Enabled with the `bidi` Cargo feature.*

# Examples

```
use icu_properties::bidi::BidiClassAdapter;
use icu_properties::maps;
use unicode_bidi::BidiInfo;

// This example text is defined using `concat!` because some browsers
// and text editors have trouble displaying bidi strings.
let text = concat!["א", // RTL#1
                   "ב", // RTL#2
                   "ג", // RTL#3
                   "a", // LTR#1
                   "b", // LTR#2
                   "c", // LTR#3
                   ];

let adapter = BidiClassAdapter::new(maps::bidi_class());

// Resolve embedding levels within the text. Pass `None` to detect the
// paragraph level automatically.
let bidi_info = BidiInfo::new_with_data_source(&adapter, text, None);

// This paragraph has embedding level 1 because its first strong character is RTL.
assert_eq!(bidi_info.paragraphs.len(), 1);
let para = &bidi_info.paragraphs[0];
assert_eq!(para.level.number(), 1);
assert!(para.level.is_rtl());

// Re-ordering is done after wrapping each paragraph into a sequence of
// lines. For this example, I'll just use a single line that spans the
// entire paragraph.
let line = para.range.clone();

let display = bidi_info.reorder_line(para, line);
assert_eq!(display, concat!["a", // LTR#1
                            "b", // LTR#2
                            "c", // LTR#3
                            "ג", // RTL#3
                            "ב", // RTL#2
                            "א", // RTL#1
                            ]);
```

5745
bidi_data.rs Data and APIs for supporting specific Bidi properties data in an efficient structure.

Supported properties are:
- `Bidi_Paired_Bracket`
- `Bidi_Paired_Bracket_Type`
- `Bidi_Mirrored`
- `Bidi_Mirroring_Glyph`

8806
error.rs 1391
exemplar_chars.rs This module provides APIs for getting exemplar characters for a locale.

Exemplars are characters used by a language, separated into different sets. The sets are: main, auxiliary, punctuation, numbers, and index.

The sets define, according to typical usage in the language, which characters occur in which contexts with which frequency. For more information, see the documentation in the [Exemplars section in Unicode Technical Standard #35](https://unicode.org/reports/tr35/tr35-general.html#Exemplars) of the LDML specification.

# Examples

```
use icu::locid::locale;
use icu::properties::exemplar_chars;

let locale = locale!("en-001").into();
let data = exemplar_chars::exemplars_main(&locale)
    .expect("locale should be present");
let exemplars_main = data.as_borrowed();

assert!(exemplars_main.contains_char('a'));
assert!(exemplars_main.contains_char('z'));
assert!(exemplars_main.contains("a"));
assert!(!exemplars_main.contains("ä"));
assert!(!exemplars_main.contains("ng"));
```

8548
lib.rs Definitions of [Unicode Properties] and APIs for retrieving property data in an appropriate data structure.

This module is published as its own crate ([`icu_properties`](https://docs.rs/icu_properties/latest/icu_properties/)) and as part of the [`icu`](https://docs.rs/icu/latest/icu/) crate. See the latter for more details on the ICU4X project.

APIs that return a [`CodePointSetData`] exist for binary properties and certain enumerated properties. See the [`sets`] module for more details.

APIs that return a [`CodePointMapData`] exist for certain enumerated properties. See the [`maps`] module for more details.

# Examples

## Property data as `CodePointSetData`s

```
use icu::properties::{maps, sets, GeneralCategory};

// A binary property as a `CodePointSetData`

assert!(sets::emoji().contains('🎃')); // U+1F383 JACK-O-LANTERN
assert!(!sets::emoji().contains('木')); // U+6728

// An individual enumerated property value as a `CodePointSetData`

let line_sep_data = maps::general_category()
    .get_set_for_value(GeneralCategory::LineSeparator);
let line_sep = line_sep_data.as_borrowed();

assert!(line_sep.contains32(0x2028));
assert!(!line_sep.contains32(0x2029));
```

## Property data as `CodePointMapData`s

```
use icu::properties::{maps, Script};

assert_eq!(maps::script().get('🎃'), Script::Common); // U+1F383 JACK-O-LANTERN
assert_eq!(maps::script().get('木'), Script::Han); // U+6728
```

[`ICU4X`]: ../icu/index.html
[Unicode Properties]: https://unicode-org.github.io/icu/userguide/strings/properties.html
[`CodePointSetData`]: crate::sets::CodePointSetData
[`CodePointMapData`]: crate::maps::CodePointMapData
[`sets`]: crate::sets

3843
maps.rs The functions in this module return a [`CodePointMapData`] representing, for each code point in the entire range of code points, the property values for a particular Unicode property. The descriptions of most properties are taken from [`TR44`], the documentation for the Unicode Character Database. [`TR44`]: https://www.unicode.org/reports/tr44 22814
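To make the idea of "a property value for every code point" concrete, here is a minimal std-only sketch of a code point map backed by sorted ranges. It is not how `CodePointMapData` is implemented; `ToyScript`, `ToyRangeMap`, and `toy_script_map` are hypothetical names introduced for illustration, and the table covers only a handful of ranges.

```rust
/// Hypothetical stand-in for an enumerated property value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ToyScript {
    Common,
    Latin,
    Greek,
}

/// A toy total map from code points to values: sorted, non-overlapping
/// inclusive ranges plus a default for everything not listed.
struct ToyRangeMap {
    ranges: Vec<(u32, u32, ToyScript)>,
    default: ToyScript,
}

impl ToyRangeMap {
    /// Looks up the value for a code point given as a `u32`.
    fn get32(&self, cp: u32) -> ToyScript {
        // Count the ranges whose start is <= cp; the last of them is the
        // only range that can contain cp.
        let idx = self.ranges.partition_point(|&(start, _, _)| start <= cp);
        match idx.checked_sub(1).map(|i| self.ranges[i]) {
            Some((_, end, value)) if cp <= end => value,
            _ => self.default,
        }
    }

    /// Looks up the value for a `char`.
    fn get(&self, ch: char) -> ToyScript {
        self.get32(ch as u32)
    }
}

/// Illustrative data only: ASCII letters and Greek lowercase alpha..omega.
/// Falling back to `Common` is an assumption of this sketch.
fn toy_script_map() -> ToyRangeMap {
    ToyRangeMap {
        ranges: vec![
            (0x41, 0x5A, ToyScript::Latin),
            (0x61, 0x7A, ToyScript::Latin),
            (0x3B1, 0x3C9, ToyScript::Greek),
        ],
        default: ToyScript::Common,
    }
}
```

A call such as `toy_script_map().get('β')` mirrors the shape of `maps::script().get(...)` shown in the `lib.rs` examples: the map is total, so every lookup yields some value.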
props.rs A collection of property definitions shared across contexts (ex: representing trie values). This module defines enums / newtypes for enumerated properties. String properties are represented as newtypes if their values represent code points. 111166
provider
provider.rs 🚧 \[Unstable\] Data provider struct definitions for this ICU4X component. <div class="stab unstable"> 🚧 This code is considered unstable; it may change at any time, in breaking or non-breaking ways, including in SemVer minor releases. While the serde representation of data structs is guaranteed to be stable, their Rust representation might not be. Use with caution. </div> Read more about data providers: [`icu_provider`] 38269
runtime.rs 🚧 \[Experimental\] This module is experimental and currently crate-private. Let us know if you have a use case for this! This module contains utilities for working with properties where the specific property in use is not known at compile time. For regex engines, [`crate::sets::load_for_ecma262_unstable()`] is a convenient API for working with properties at runtime tailored for the use case of ECMA262-compatible regex engines. 18612
script.rs Data and APIs for supporting both Script and Script_Extensions property values in an efficient structure. 25396
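The relationship between Script and Script_Extensions can be sketched with a small std-only example. The types and functions below (`ToyScript`, `ToyEntry`, `toy_lookup`, `toy_has_script`) are hypothetical and do not reflect the module's actual data layout; the table is a hand-written illustration. The rule it follows is that a code point with no explicit Script_Extensions is treated as having just its Script value.

```rust
/// Hypothetical stand-in for the Script property value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ToyScript {
    Common,
    Inherited,
    Latin,
    Arabic,
    Syriac,
}

/// One code point's Script value plus any explicit Script_Extensions.
struct ToyEntry {
    script: ToyScript,
    extensions: &'static [ToyScript],
}

/// Illustrative lookup covering only a few code points.
fn toy_lookup(cp: u32) -> ToyEntry {
    match cp {
        // U+0650 ARABIC KASRA: a combining mark shared by several scripts.
        0x0650 => ToyEntry {
            script: ToyScript::Inherited,
            extensions: &[ToyScript::Arabic, ToyScript::Syriac],
        },
        0x41..=0x5A | 0x61..=0x7A => ToyEntry {
            script: ToyScript::Latin,
            extensions: &[],
        },
        // Assumption of this sketch: everything else is Common.
        _ => ToyEntry {
            script: ToyScript::Common,
            extensions: &[],
        },
    }
}

/// Returns true if `script` is among the code point's Script_Extensions,
/// falling back to the Script value when no extensions are listed.
fn toy_has_script(cp: u32, script: ToyScript) -> bool {
    let entry = toy_lookup(cp);
    if entry.extensions.is_empty() {
        entry.script == script
    } else {
        entry.extensions.contains(&script)
    }
}
```

Storing both values side by side lets one lookup answer "what is the Script?" and "is this script among the extensions?" without consulting two separate tables.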
sets.rs The functions in this module return a [`CodePointSetData`] containing the set of characters with a particular Unicode property. The descriptions of most properties are taken from [`TR44`], the documentation for the Unicode Character Database. Some properties are instead defined in [`TR18`], the documentation for Unicode regular expressions. In particular, Annex C of this document defines properties for POSIX compatibility. [`CodePointSetData`]: crate::sets::CodePointSetData [`TR44`]: https://www.unicode.org/reports/tr44 [`TR18`]: https://www.unicode.org/reports/tr18 84655
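As a rough illustration of what "the set of characters with a particular Unicode property" looks like as data, the following std-only sketch stores a set as an inversion list, a common compact representation for code point sets. It is a toy and not the crate's implementation; `ToyInversionList` and `toy_ascii_alphabetic` are hypothetical names, and the data covers ASCII letters only.

```rust
/// A toy code point set stored as an inversion list: a sorted list of
/// boundaries at which membership flips. Even indices start a range
/// (inclusive) and odd indices end it (exclusive).
struct ToyInversionList {
    boundaries: Vec<u32>,
}

impl ToyInversionList {
    /// Builds a set from sorted, non-overlapping, non-adjacent inclusive ranges.
    fn from_ranges(ranges: &[(u32, u32)]) -> Self {
        let mut boundaries = Vec::with_capacity(ranges.len() * 2);
        for &(start, end) in ranges {
            boundaries.push(start);
            boundaries.push(end + 1);
        }
        Self { boundaries }
    }

    /// Membership test for a code point given as a `u32`.
    fn contains32(&self, cp: u32) -> bool {
        // An odd number of boundaries at or below cp means cp is inside a range.
        let crossed = self.boundaries.partition_point(|&b| b <= cp);
        crossed % 2 == 1
    }

    /// Membership test for a `char`.
    fn contains(&self, ch: char) -> bool {
        self.contains32(ch as u32)
    }
}

/// Illustrative data only: the ASCII subset of an alphabetic-style property.
fn toy_ascii_alphabetic() -> ToyInversionList {
    ToyInversionList::from_ranges(&[(0x41, 0x5A), (0x61, 0x7A)])
}
```

The `contains` / `contains32` pair is named after the calls used in the `lib.rs` examples (`sets::emoji().contains(...)`, `line_sep.contains32(...)`), but the behavior here is only that of the sketch.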
trievalue.rs 7315