Bouncy Castle Cryptography Library 1.77.0

org.bouncycastle.util.encoders
Class UTF8

java.lang.Object
  |
  +--org.bouncycastle.util.encoders.UTF8

public class UTF8
extends java.lang.Object

Utilities for working with UTF-8 encodings.

Decoding of UTF-8 is based on a presentation by Bob Steagall at CppCon2018 (see https://github.com/BobSteagall/CppCon2018). It uses a Deterministic Finite Automaton (DFA) to recognize and decode multi-byte code points.
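The automaton idea can be sketched as follows. This is a minimal, hand-written state machine built from the well-formed byte ranges in RFC 3629 (Unicode Table 3-7). It is not Bouncy Castle's actual transition table: the class name `Utf8DfaSketch` and its state names are hypothetical, and it uses an explicit `switch` where a production DFA would more likely use lookup tables. Each state records which byte range the next input byte must fall in. That is how overlong forms, encoded surrogates and code points above U+10FFFF are rejected without extra checks.

```java
import java.util.Arrays;

/** Hypothetical DFA-style UTF-8 decoder sketch (not Bouncy Castle's implementation). */
public class Utf8DfaSketch {
    // States: ACCEPT = between code points; CSn = n continuation bytes (80..BF) still needed;
    // P3A/P3B/P4A/P4B = the next byte has a narrower range than 80..BF.
    static final int ACCEPT = 0, ERR = 1, CS1 = 2, CS2 = 3, CS3 = 4,
                     P3A = 5, P3B = 6, P4A = 7, P4B = 8;

    static boolean inRange(int b, int lo, int hi) {
        return b >= lo && b <= hi;
    }

    /** Transition function: next state for unsigned byte b (0..255). */
    static int next(int state, int b) {
        switch (state) {
            case ACCEPT:
                if (b <= 0x7F) return ACCEPT;
                if (inRange(b, 0xC2, 0xDF)) return CS1;
                if (b == 0xE0) return P3A;
                if (b == 0xED) return P3B;
                if (inRange(b, 0xE1, 0xEF)) return CS2;
                if (b == 0xF0) return P4A;
                if (inRange(b, 0xF1, 0xF3)) return CS3;
                if (b == 0xF4) return P4B;
                return ERR; // 80..C1 and F5..FF can never start a sequence
            case CS1: return inRange(b, 0x80, 0xBF) ? ACCEPT : ERR;
            case CS2: return inRange(b, 0x80, 0xBF) ? CS1 : ERR;
            case CS3: return inRange(b, 0x80, 0xBF) ? CS2 : ERR;
            case P3A: return inRange(b, 0xA0, 0xBF) ? CS1 : ERR; // rejects overlong 3-byte forms
            case P3B: return inRange(b, 0x80, 0x9F) ? CS1 : ERR; // rejects surrogates D800..DFFF
            case P4A: return inRange(b, 0x90, 0xBF) ? CS2 : ERR; // rejects overlong 4-byte forms
            case P4B: return inRange(b, 0x80, 0x8F) ? CS2 : ERR; // rejects code points > 10FFFF
            default:  return ERR;
        }
    }

    /** Decodes utf8 into code points, or returns null if the input is malformed or truncated. */
    static int[] decode(byte[] utf8) {
        int[] out = new int[utf8.length]; // at most one code point per byte
        int count = 0, state = ACCEPT, cp = 0;
        for (byte raw : utf8) {
            int b = raw & 0xFF;
            int prev = state;
            state = next(state, b);
            if (state == ERR) return null;
            if (prev == ACCEPT) {
                // Lead byte (already validated): keep only its payload bits
                cp = b < 0x80 ? b : b < 0xE0 ? b & 0x1F : b < 0xF0 ? b & 0x0F : b & 0x07;
            } else {
                // Continuation byte: append six more payload bits
                cp = (cp << 6) | (b & 0x3F);
            }
            if (state == ACCEPT) out[count++] = cp;
        }
        // Ending in a non-accepting state means the last sequence was truncated
        return state == ACCEPT ? Arrays.copyOf(out, count) : null;
    }

    public static void main(String[] args) {
        byte[] ok = {0x41, (byte) 0xC3, (byte) 0xA9, (byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80};
        System.out.println(Arrays.toString(decode(ok)));                              // [65, 233, 128512]
        System.out.println(Arrays.toString(decode(new byte[]{(byte) 0xC0, (byte) 0x80}))); // null (overlong NUL)
    }
}
```

Converting the resulting code points to UTF-16 is then a separate step: values below U+10000 map to one `char`, and larger values map to a surrogate pair.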


Constructor Summary
UTF8()
           
 
Method Summary
static int transcodeToUTF16(byte[] utf8, char[] utf16)
          Transcode a UTF-8 encoding into a UTF-16 representation.
static int transcodeToUTF16(byte[] utf8, int utf8Off, int utf8Length, char[] utf16)
          Transcode a UTF-8 encoding into a UTF-16 representation.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UTF8

public UTF8()

Method Detail

transcodeToUTF16

public static int transcodeToUTF16(byte[] utf8,
                                   char[] utf16)
Transcode a UTF-8 encoding into a UTF-16 representation. In the general case the output utf16 array should be at least as long as the input utf8 array to handle arbitrary inputs. The number of output UTF-16 code units is returned, or -1 if any errors are encountered (in which case an arbitrary amount of data may have been written into the output array). The errors detected are malformed UTF-8, including incomplete, truncated or "overlong" encodings, and unmappable code points. In particular, no unmatched surrogates will be produced. An error will also result if utf16 is found to be too small to store the complete output.
Parameters:
utf8 - A non-null array containing a well-formed UTF-8 encoding.
utf16 - A non-null array, at least as long as the utf8 array in order to ensure the output will fit.
Returns:
The number of UTF-16 code units written to utf16 (beginning from index 0), or -1 if the input was malformed or encoded any unmappable characters, or if utf16 is too small.
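To illustrate the return-value contract described above (the count of code units written from index 0, or -1 on malformed input or insufficient space), here is a rough equivalent that uses only the JDK's `CharsetDecoder`. This is a hypothetical stand-in (`TranscodeContractSketch.transcode`), not Bouncy Castle's code. It relies on the JDK's UTF-8 decoder rejecting overlong forms and encoded surrogates. Like the documented method, it writes directly into `utf16`, so partial output may remain there when it returns -1.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical JDK-only sketch of the transcodeToUTF16(byte[], char[]) contract. */
public class TranscodeContractSketch {
    static int transcode(byte[] utf8, char[] utf16) {
        // REPORT (rather than REPLACE) makes bad input surface as an error result
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer out = CharBuffer.wrap(utf16); // writes go straight into utf16, starting at 0
        // endOfInput = true: a truncated trailing sequence is reported as malformed
        CoderResult result = decoder.decode(ByteBuffer.wrap(utf8), out, true);
        if (result.isError() || result.isOverflow()) return -1; // bad input, or utf16 too small
        if (!decoder.flush(out).isUnderflow()) return -1;
        return out.position(); // number of UTF-16 code units written
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo \uD83D\uDE00".getBytes(StandardCharsets.UTF_8); // 11 bytes
        char[] utf16 = new char[utf8.length]; // sized per the documented rule
        System.out.println(transcode(utf8, utf16));                                  // 8
        System.out.println(transcode(new byte[]{(byte) 0xC0, (byte) 0x80}, new char[2])); // -1 (overlong)
        System.out.println(transcode(utf8, new char[3]));                           // -1 (too small)
    }
}
```

With Bouncy Castle on the classpath, the documented call has the same shape: `UTF8.transcodeToUTF16(utf8, utf16)`.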

transcodeToUTF16

public static int transcodeToUTF16(byte[] utf8,
                                   int utf8Off,
                                   int utf8Length,
                                   char[] utf16)
Transcode a UTF-8 encoding into a UTF-16 representation. In the general case the output utf16 array should be at least utf8Length elements long to handle arbitrary inputs. The number of output UTF-16 code units is returned, or -1 if any errors are encountered (in which case an arbitrary amount of data may have been written into the output array). The errors detected are malformed UTF-8, including incomplete, truncated or "overlong" encodings, and unmappable code points. In particular, no unmatched surrogates will be produced. An error will also result if utf16 is found to be too small to store the complete output.
Parameters:
utf8 - A non-null array containing a well-formed UTF-8 encoding.
utf8Off - The start position in utf8 of the well-formed encoding.
utf8Length - The length in bytes of the well-formed encoding.
utf16 - A non-null array, at least utf8Length elements long in order to ensure the output will fit.
Returns:
The number of UTF-16 code units written to utf16 (beginning from index 0), or -1 if the input was malformed or encoded any unmappable characters, or if utf16 is too small.
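The sizing rule holds because each 1-, 2- or 3-byte UTF-8 sequence yields exactly one UTF-16 code unit, and each 4-byte sequence yields two (a surrogate pair). The output therefore never has more code units than the input has bytes. The sketch below makes that arithmetic concrete for an (offset, length) range. `Utf16LengthSketch.utf16Units` is a hypothetical helper, not part of Bouncy Castle, and it assumes the range is already well-formed: it does no validation. Note that the offset applies only to the input; output is always written from index 0.

```java
/** Hypothetical helper: exact UTF-16 length of a WELL-FORMED UTF-8 range (no validation). */
public class Utf16LengthSketch {
    static int utf16Units(byte[] utf8, int utf8Off, int utf8Length) {
        int units = 0;
        for (int i = utf8Off; i < utf8Off + utf8Length; i++) {
            int b = utf8[i] & 0xFF;
            if ((b & 0xC0) != 0x80) units++; // every non-continuation byte starts one code point
            if (b >= 0xF0) units++;          // a 4-byte lead byte adds the second surrogate
        }
        return units;
    }

    public static void main(String[] args) {
        // Padding bytes around "A", U+00E9 and U+1F600, which occupy indices 1..7
        byte[] buf = {0x00, 0x41, (byte) 0xC3, (byte) 0xA9,
                      (byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80, 0x00};
        int off = 1, len = 7;
        System.out.println(utf16Units(buf, off, len) + " units from " + len + " bytes"); // 4 units from 7 bytes
    }
}
```

Allocating `new char[utf8Length]` is therefore always enough. An exact count like this one only saves space when the input is known to be valid.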
