replace any previously applied set of allowed characters.
*
* Adjustments, such as additions or deletions of certain classes of characters,
* can be made to the result of uspoof_setAllowedLocales() by
* fetching the resulting set with uspoof_getAllowedChars(),
* manipulating it with the Unicode Set API, then resetting the
* spoof detector's limits with uspoof_setAllowedChars().
*
* @param sc The USpoofChecker
* @param localesList A list of locales, from which the language
* and associated script are extracted. The locales
* are comma-separated if there is more than one.
* White space may not appear within an individual locale,
* but is ignored otherwise.
* The locales are syntactically like those from the
* HTTP Accept-Language header.
* If the localesList is empty, no restrictions will be placed on
* the allowed characters.
*
* @param status The error code, set if this function encounters a problem.
* @stable ICU 4.2
*/
U_STABLE void U_EXPORT2
uspoof_setAllowedLocales(USpoofChecker *sc, const char *localesList, UErrorCode *status);
/**
* Get a list of locales for the scripts that are acceptable in strings
* to be checked. If no limitations on scripts have been specified,
* an empty string will be returned.
*
* uspoof_setAllowedChars() will reset the list of allowed locales to be empty.
*
* The format of the returned list is the same as that supplied to
* uspoof_setAllowedLocales(), but the returned list may not be identical
* to the originally specified string; the string may be reformatted,
* and information other than languages from
* the originally specified locales may be omitted.
*
* @param sc The USpoofChecker
* @param status The error code, set if this function encounters a problem.
* @return A string containing a list of locales corresponding
* to the acceptable scripts, formatted like an
* HTTP Accept-Language value.
*
* @stable ICU 4.2
*/
U_STABLE const char * U_EXPORT2
uspoof_getAllowedLocales(USpoofChecker *sc, UErrorCode *status);
/**
* Limit the acceptable characters to those specified by a Unicode Set.
* Any previously specified character limit
* is replaced by the new settings. This includes limits on
* characters that were set with the uspoof_setAllowedLocales() function.
*
* The USPOOF_CHAR_LIMIT test is automatically enabled for this
* USpoofChecker by this function.
*
* @param sc The USpoofChecker
* @param chars A Unicode Set containing the list of
* characters that are permitted. Ownership of the set
* remains with the caller. The incoming set is cloned by
* this function, so there are no restrictions on modifying
* or deleting the USet after calling this function.
* @param status The error code, set if this function encounters a problem.
* @stable ICU 4.2
*/
U_STABLE void U_EXPORT2
uspoof_setAllowedChars(USpoofChecker *sc, const USet *chars, UErrorCode *status);
/**
* Get a USet for the characters permitted in an identifier.
* This corresponds to the limits imposed by the Set Allowed Characters
* functions. Limitations imposed by other checks will not be
* reflected in the set returned by this function.
*
* The returned set will be frozen, meaning that it cannot be modified
* by the caller.
*
* Ownership of the returned set remains with the Spoof Detector. The
* returned set will become invalid if the spoof detector is closed,
* or if a new set of allowed characters is specified.
*
*
* @param sc The USpoofChecker
* @param status The error code, set if this function encounters a problem.
* @return A USet containing the characters that are permitted by
* the USPOOF_CHAR_LIMIT test.
* @stable ICU 4.2
*/
U_STABLE const USet * U_EXPORT2
uspoof_getAllowedChars(const USpoofChecker *sc, UErrorCode *status);
#if U_SHOW_CPLUSPLUS_API
/**
* Limit the acceptable characters to those specified by a Unicode Set.
* Any previously specified character limit
* is replaced by the new settings. This includes limits on
* characters that were set with the uspoof_setAllowedLocales() function.
*
* The USPOOF_CHAR_LIMIT test is automatically enabled for this
* USpoofChecker by this function.
*
* @param sc The USpoofChecker
* @param chars A Unicode Set containing the list of
* characters that are permitted. Ownership of the set
* remains with the caller. The incoming set is cloned by
* this function, so there are no restrictions on modifying
* or deleting the USet after calling this function.
* @param status The error code, set if this function encounters a problem.
* @stable ICU 4.2
*/
U_STABLE void U_EXPORT2
uspoof_setAllowedUnicodeSet(USpoofChecker *sc, const icu::UnicodeSet *chars, UErrorCode *status);
/**
* Get a UnicodeSet for the characters permitted in an identifier.
* This corresponds to the limits imposed by the Set Allowed Characters /
* UnicodeSet functions. Limitations imposed by other checks will not be
* reflected in the set returned by this function.
*
* The returned set will be frozen, meaning that it cannot be modified
* by the caller.
*
* Ownership of the returned set remains with the Spoof Detector. The
* returned set will become invalid if the spoof detector is closed,
* or if a new set of allowed characters is specified.
*
*
* @param sc The USpoofChecker
* @param status The error code, set if this function encounters a problem.
* @return A UnicodeSet containing the characters that are permitted by
* the USPOOF_CHAR_LIMIT test.
* @stable ICU 4.2
*/
U_STABLE const icu::UnicodeSet * U_EXPORT2
uspoof_getAllowedUnicodeSet(const USpoofChecker *sc, UErrorCode *status);
#endif
/**
* Check the specified string for possible security issues.
* The text to be checked will typically be an identifier of some sort.
* The set of checks to be performed is specified with uspoof_setChecks().
*
* @param sc The USpoofChecker
* @param text The string to be checked for possible security issues,
* in UTF-16 format.
* @param length the length of the string to be checked, expressed in
* 16 bit UTF-16 code units, or -1 if the string is
* zero terminated.
* @param position An out parameter that receives the index of the
* first string position that fails the allowed character
* limitation checks.
* This parameter may be null if the position information
* is not needed.
* If the string passes the requested checks the
* parameter value will not be set.
* @param status The error code, set if an error occurred while attempting to
* perform the check.
* Spoofing or security issues detected with the input string are
* not reported here, but through the function's return value.
* @return An integer value with bits set for any potential security
* or spoofing issues detected. The bits are defined by
* enum USpoofChecks. Zero is returned if no issues
* are found with the input string.
* @stable ICU 4.2
*/
U_STABLE int32_t U_EXPORT2
uspoof_check(const USpoofChecker *sc,
const UChar *text, int32_t length,
int32_t *position,
UErrorCode *status);
/**
* Check the specified string for possible security issues.
* The text to be checked will typically be an identifier of some sort.
* The set of checks to be performed is specified with uspoof_setChecks().
*
* @param sc The USpoofChecker
* @param text A UTF-8 string to be checked for possible security issues.
* @param length the length of the string to be checked, or -1 if the string is
* zero terminated.
* @param position An out parameter that receives the index of the
* first string position that fails the allowed character
* limitation checks.
* This parameter may be null if the position information
* is not needed.
* If the string passes the requested checks the
* parameter value will not be set.
* @param status The error code, set if an error occurred while attempting to
* perform the check.
* Spoofing or security issues detected with the input string are
* not reported here, but through the function's return value.
* If the input contains invalid UTF-8 sequences,
* a status of U_INVALID_CHAR_FOUND will be returned.
* @return An integer value with bits set for any potential security
* or spoofing issues detected. The bits are defined by
* enum USpoofChecks. Zero is returned if no issues
* are found with the input string.
* @stable ICU 4.2
*/
U_STABLE int32_t U_EXPORT2
uspoof_checkUTF8(const USpoofChecker *sc,
const char *text, int32_t length,
int32_t *position,
UErrorCode *status);
#if U_SHOW_CPLUSPLUS_API
/**
* Check the specified string for possible security issues.
* The text to be checked will typically be an identifier of some sort.
* The set of checks to be performed is specified with uspoof_setChecks().
*
* @param sc The USpoofChecker
* @param text A UnicodeString to be checked for possible security issues.
* @param position An out parameter that receives the index of the
* first string position that fails the allowed character
* limitation checks.
* This parameter may be null if the position information
* is not needed.
* If the string passes the requested checks the
* parameter value will not be set.
* @param status The error code, set if an error occurred while attempting to
* perform the check.
* Spoofing or security issues detected with the input string are
* not reported here, but through the function's return value.
* @return An integer value with bits set for any potential security
* or spoofing issues detected. The bits are defined by
* enum USpoofChecks. Zero is returned if no issues
* are found with the input string.
* @stable ICU 4.2
*/
U_STABLE int32_t U_EXPORT2
uspoof_checkUnicodeString(const USpoofChecker *sc,
const icu::UnicodeString &text,
int32_t *position,
UErrorCode *status);
#endif
/**
* Check whether two specified strings are visually confusable.
* The types of confusability to be tested - single script, mixed script,
* or whole script - are determined by the check options set for the
* USpoofChecker.
*
* The tests to be performed are controlled by the flags
* USPOOF_SINGLE_SCRIPT_CONFUSABLE
* USPOOF_MIXED_SCRIPT_CONFUSABLE
* USPOOF_WHOLE_SCRIPT_CONFUSABLE
* At least one of these tests must be selected.
*
* USPOOF_ANY_CASE is a modifier for the tests. Select it if the identifiers
* may be of mixed case.
* If identifiers are case folded for comparison and
* display to the user, do not select the USPOOF_ANY_CASE option.
*
*
* @param sc The USpoofChecker
* @param s1 The first of the two strings to be compared for
* confusability. The strings are in UTF-16 format.
* @param length1 the length of the first string, expressed in
* 16 bit UTF-16 code units, or -1 if the string is
* zero terminated.
* @param s2 The second of the two strings to be compared for
* confusability. The strings are in UTF-16 format.
* @param length2 The length of the second string, expressed in
* 16 bit UTF-16 code units, or -1 if the string is
* zero terminated.
* @param status The error code, set if an error occurred while attempting to
* perform the check.
* Confusability of the strings is not reported here,
* but through this function's return value.
* @return An integer value with bit(s) set corresponding to
* the type of confusability found, as defined by
* enum USpoofChecks. Zero is returned if the strings
* are not confusable.
* @stable ICU 4.2
*/
U_STABLE int32_t U_EXPORT2
uspoof_areConfusable(const USpoofChecker *sc,
const UChar *s1, int32_t length1,
const UChar *s2, int32_t length2,
UErrorCode *status);
/**
* Check whether two specified strings are visually confusable.
* The types of confusability to be tested - single script, mixed script,
* or whole script - are determined by the check options set for the
* USpoofChecker.
*
* @param sc The USpoofChecker
* @param s1 The first of the two strings to be compared for
* confusability. The strings are in UTF-8 format.
* @param length1 the length of the first string, in bytes, or -1
* if the string is zero terminated.
* @param s2 The second of the two strings to be compared for
* confusability. The strings are in UTF-8 format.
* @param length2 The length of the second string in bytes, or -1
* if the string is zero terminated.
* @param status The error code, set if an error occurred while attempting to
* perform the check.
* Confusability of the strings is not reported here,
* but through this function's return value.
* @return An integer value with bit(s) set corresponding to
* the type of confusability found, as defined by
* enum USpoofChecks. Zero is returned if the strings
* are not confusable.
* @stable ICU 4.2
*/
U_STABLE int32_t U_EXPORT2
uspoof_areConfusableUTF8(const USpoofChecker *sc,
const char *s1, int32_t length1,
const char *s2, int32_t length2,
UErrorCode *status);
#if U_SHOW_CPLUSPLUS_API
/**
* Check whether two specified strings are visually confusable.
* The types of confusability to be tested - single script, mixed script,
* or whole script - are determined by the check options set for the
* USpoofChecker.
*
* @param sc The USpoofChecker
* @param s1 The first of the two strings to be compared for
* confusability.
* @param s2 The second of the two strings to be compared for
* confusability.
* @param status The error code, set if an error occurred while attempting to
* perform the check.
* Confusability of the strings is not reported here,
* but through this function's return value.
* @return An integer value with bit(s) set corresponding to
* the type of confusability found, as defined by
* enum USpoofChecks. Zero is returned if the strings
* are not confusable.
* @stable ICU 4.2
*/
U_STABLE int32_t U_EXPORT2
uspoof_areConfusableUnicodeString(const USpoofChecker *sc,
const icu::UnicodeString &s1,
const icu::UnicodeString &s2,
UErrorCode *status);
#endif
/**
* Get the "skeleton" for an identifier string.
* Skeletons are a transformation of the input string;
* two strings are confusable if their skeletons are identical.
* See Unicode UAX 39 for additional information.
*
* Using skeletons directly makes it possible to quickly check
* whether an identifier is confusable with any of some large
* set of existing identifiers, by creating an efficiently
* searchable collection of the skeletons.
*
* @param sc The USpoofChecker
* @param type The type of skeleton, corresponding to which
* of the Unicode confusable data tables to use.
* The default is Mixed-Script, Lowercase.
* Allowed options are USPOOF_SINGLE_SCRIPT_CONFUSABLE and
* USPOOF_ANY_CASE. The two flags may be ORed.
* @param s The input string whose skeleton will be computed.
* @param length The length of the input string, expressed in 16 bit
* UTF-16 code units, or -1 if the string is zero terminated.
* @param dest The output buffer, to receive the skeleton string.
* @param destCapacity The length of the output buffer, in 16 bit units.
* The destCapacity may be zero, in which case the function will
* return the actual length of the skeleton.
* @param status The error code, set if an error occurred while attempting to
* perform the check.
* @return The length of the skeleton string. The returned length
* is always that of the complete skeleton, even when the
* supplied buffer is too small (or of zero length).
*
* @stable ICU 4.2
*/
U_STABLE int32_t U_EXPORT2
uspoof_getSkeleton(const USpoofChecker *sc,
uint32_t type,
const UChar *s, int32_t length,
UChar *dest, int32_t destCapacity,
UErrorCode *status);
/**
* Get the "skeleton" for an identifier string.
* Skeletons are a transformation of the input string;
* two strings are confusable if their skeletons are identical.
* See Unicode UAX 39 for additional information.
*
* Using skeletons directly makes it possible to quickly check
* whether an identifier is confusable with any of some large
* set of existing identifiers, by creating an efficiently
* searchable collection of the skeletons.
*
* @param sc The USpoofChecker
* @param type The type of skeleton, corresponding to which
* of the Unicode confusable data tables to use.
* The default is Mixed-Script, Lowercase.
* Allowed options are USPOOF_SINGLE_SCRIPT_CONFUSABLE and
* USPOOF_ANY_CASE. The two flags may be ORed.
* @param s The UTF-8 format input string whose skeleton will be computed.
* @param length The length of the input string, in bytes,
* or -1 if the string is zero terminated.
* @param dest The output buffer, to receive the skeleton string.
* @param destCapacity The length of the output buffer, in bytes.
* The destCapacity may be zero, in which case the function will
* return the actual length of the skeleton.
* @param status The error code, set if an error occurred while attempting to
* perform the check. Possible errors include U_INVALID_CHAR_FOUND
* for invalid UTF-8 sequences, and
* U_BUFFER_OVERFLOW_ERROR if the destination buffer is too small
* to hold the complete skeleton.
* @return The length of the skeleton string, in bytes. The returned length
* is always that of the complete skeleton, even when the
* supplied buffer is too small (or of zero length).
*
* @stable ICU 4.2
*/
U_STABLE int32_t U_EXPORT2
uspoof_getSkeletonUTF8(const USpoofChecker *sc,
uint32_t type,
const char *s, int32_t length,
char *dest, int32_t destCapacity,
UErrorCode *status);
#if U_SHOW_CPLUSPLUS_API
/**
* Get the "skeleton" for an identifier string.
* Skeletons are a transformation of the input string;
* two strings are confusable if their skeletons are identical.
* See Unicode UAX 39 for additional information.
*
* Using skeletons directly makes it possible to quickly check
* whether an identifier is confusable with any of some large
* set of existing identifiers, by creating an efficiently
* searchable collection of the skeletons.
*
* @param sc The USpoofChecker.
* @param type The type of skeleton, corresponding to which
* of the Unicode confusable data tables to use.
* The default is Mixed-Script, Lowercase.
* Allowed options are USPOOF_SINGLE_SCRIPT_CONFUSABLE and
* USPOOF_ANY_CASE. The two flags may be ORed.
* @param s The input string whose skeleton will be computed.
* @param dest The output string, to receive the skeleton string.
* @param status The error code, set if an error occurred while attempting to
* perform the check.
* @return A reference to the destination (skeleton) string.
*
* @stable ICU 4.2
*/
U_I18N_API icu::UnicodeString & U_EXPORT2
uspoof_getSkeletonUnicodeString(const USpoofChecker *sc,
uint32_t type,
const icu::UnicodeString &s,
icu::UnicodeString &dest,
UErrorCode *status);
#endif /* U_SHOW_CPLUSPLUS_API */
/**
* Serialize the data for a spoof detector into a chunk of memory.
* The flattened spoof detection tables can later be used to efficiently
* instantiate a new Spoof Detector.
*
* @param sc the Spoof Detector whose data is to be serialized.
* @param data a pointer to 32-bit-aligned memory to be filled with the data,
* can be NULL if capacity==0
* @param capacity the number of bytes available at data,
* or 0 for preflighting
* @param status an in/out ICU UErrorCode; possible errors include:
* - U_BUFFER_OVERFLOW_ERROR if the data storage block is too small for serialization
* - U_ILLEGAL_ARGUMENT_ERROR if the data or capacity parameters are bad
* @return the number of bytes written or needed for the spoof data
*
* @see uspoof_openFromSerialized()
* @stable ICU 4.2
*/
U_STABLE int32_t U_EXPORT2
uspoof_serialize(USpoofChecker *sc,
void *data, int32_t capacity,
UErrorCode *status);
#endif
#endif /* USPOOF_H */