org.bouncycastle.util.encoders
Class UTF8
java.lang.Object
|
+--org.bouncycastle.util.encoders.UTF8
public class UTF8
extends java.lang.Object
Utilities for working with UTF-8 encodings.
Decoding of UTF-8 is based on a presentation by Bob Steagall at CppCon2018 (see
https://github.com/BobSteagall/CppCon2018). It uses a Deterministic Finite Automaton (DFA) to
recognize and decode multi-byte code points.
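To illustrate the state-machine idea, here is a minimal hand-written sketch. It is not Bouncy Castle's table-driven DFA, nor the implementation from the presentation; the class name Utf8StateMachineSketch and its decode method are hypothetical. The "state" is the number of continuation bytes still expected plus the range allowed for the next one, which is enough to reject overlong forms, encoded surrogates, and truncated sequences.

```java
final class Utf8StateMachineSketch {
    /**
     * Hypothetical helper: decodes utf8 into utf16 and returns the number of
     * UTF-16 code units written, or -1 on malformed input or lack of space.
     */
    static int decode(byte[] utf8, char[] utf16) {
        int out = 0;
        int needed = 0;                   // continuation bytes still expected
        int codePoint = 0;                // bits accumulated so far
        int lower = 0x80, upper = 0xBF;   // range allowed for the next continuation byte

        for (byte value : utf8) {
            int b = value & 0xFF;
            if (needed == 0) {
                if (b < 0x80) {                        // ASCII: one byte, one code unit
                    if (out >= utf16.length) return -1;
                    utf16[out++] = (char) b;
                } else if (b >= 0xC2 && b <= 0xDF) {   // 2-byte lead (C0, C1 would be overlong)
                    needed = 1;
                    codePoint = b & 0x1F;
                } else if (b >= 0xE0 && b <= 0xEF) {   // 3-byte lead
                    if (b == 0xE0) lower = 0xA0;       // rejects overlong 3-byte forms
                    if (b == 0xED) upper = 0x9F;       // rejects encoded surrogates
                    needed = 2;
                    codePoint = b & 0x0F;
                } else if (b >= 0xF0 && b <= 0xF4) {   // 4-byte lead
                    if (b == 0xF0) lower = 0x90;       // rejects overlong 4-byte forms
                    if (b == 0xF4) upper = 0x8F;       // rejects values above U+10FFFF
                    needed = 3;
                    codePoint = b & 0x07;
                } else {
                    return -1;                         // stray continuation or invalid lead byte
                }
            } else {
                if (b < lower || b > upper) return -1;
                lower = 0x80;
                upper = 0xBF;
                codePoint = (codePoint << 6) | (b & 0x3F);
                if (--needed == 0) {
                    if (codePoint < 0x10000) {
                        if (out >= utf16.length) return -1;
                        utf16[out++] = (char) codePoint;
                    } else {                           // supplementary: emit a surrogate pair
                        if (out + 2 > utf16.length) return -1;
                        utf16[out++] = Character.highSurrogate(codePoint);
                        utf16[out++] = Character.lowSurrogate(codePoint);
                    }
                }
            }
        }
        return needed == 0 ? out : -1;                 // leftover state means truncated input
    }
}
```

A table-driven DFA encodes the same transitions in lookup arrays rather than branches, which is the approach the presentation describes.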
Constructor Summary
UTF8()
Method Summary
static int transcodeToUTF16(byte[] utf8, char[] utf16)
    Transcode a UTF-8 encoding into a UTF-16 representation.
static int transcodeToUTF16(byte[] utf8, int utf8Off, int utf8Length, char[] utf16)
    Transcode a UTF-8 encoding into a UTF-16 representation.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
UTF8
public UTF8()
transcodeToUTF16
public static int transcodeToUTF16(byte[] utf8,
char[] utf16)
Transcode a UTF-8 encoding into a UTF-16 representation. To handle arbitrary inputs, the
output array utf16 should be at least as long as the input array utf8. The number of UTF-16
code units written is returned, or -1 if any error is encountered (in which case an arbitrary
amount of data may have been written into the output array). The errors detected are
malformed UTF-8, including incomplete, truncated or "overlong" encodings, and unmappable code
points. In particular, no unmatched surrogates will be produced. An error also results if
utf16 is too small to store the complete output.
Parameters:
utf8 - A non-null array containing a well-formed UTF-8 encoding.
utf16 - A non-null array, at least as long as the utf8 array in order to ensure the output will fit.
Returns:
The number of UTF-16 code units written to utf16 (beginning from index 0), or else -1 if the
input was either malformed or encoded any unmappable characters, or if utf16 is too small.
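The sizing rule holds because no code point needs more UTF-16 code units than UTF-8 bytes: one to three UTF-8 bytes map to one code unit, and four bytes map to a surrogate pair. The following sketch checks this exhaustively using only JDK classes; the class name Utf16SizingCheck is hypothetical and is not part of Bouncy Castle.

```java
import java.nio.charset.StandardCharsets;

final class Utf16SizingCheck {
    /** Returns true if no code point needs more UTF-16 code units than UTF-8 bytes. */
    static boolean utf16NeverLongerThanUtf8() {
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue; // surrogates cannot appear in well-formed UTF-8
            }
            int utf8Bytes = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8).length;
            int utf16Units = Character.charCount(cp);
            if (utf16Units > utf8Bytes) {
                return false;
            }
        }
        return true;
    }
}
```

An output array allocated with the input's byte length is therefore always large enough, although it will usually be only partly filled.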
transcodeToUTF16
public static int transcodeToUTF16(byte[] utf8,
int utf8Off,
int utf8Length,
char[] utf16)
Transcode a UTF-8 encoding into a UTF-16 representation. To handle arbitrary inputs, the
output array utf16 should be at least as long as the input length taken from utf8. The number
of UTF-16 code units written is returned, or -1 if any error is encountered (in which case an
arbitrary amount of data may have been written into the output array). The errors detected
are malformed UTF-8, including incomplete, truncated or "overlong" encodings, and unmappable
code points. In particular, no unmatched surrogates will be produced. An error also results
if utf16 is too small to store the complete output.
Parameters:
utf8 - A non-null array containing a well-formed UTF-8 encoding.
utf8Off - Start position in the array for the well-formed encoding.
utf8Length - Length in bytes of the well-formed encoding.
utf16 - A non-null array, at least as long as the utf8 array in order to ensure the output will fit.
Returns:
The number of UTF-16 code units written to utf16 (beginning from index 0), or else -1 if the
input was either malformed or encoded any unmappable characters, or if utf16 is too small.
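The offset and length contract can be tried out without Bouncy Castle on the classpath. The sketch below is a stand-in built on the JDK's CharsetDecoder; it mirrors the documented behaviour (count on success, -1 on malformed input or insufficient space) but is not the library's implementation. The class name Utf8SliceContractSketch and its transcode method are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8SliceContractSketch {
    /**
     * Hypothetical stand-in: decodes utf8[off .. off + len) into utf16 starting
     * at index 0. Returns the number of code units written, or -1 on error.
     */
    static int transcode(byte[] utf8, int off, int len, char[] utf16) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(utf8, off, len);
        CharBuffer out = CharBuffer.wrap(utf16);

        // endOfInput = true makes a truncated trailing sequence a malformed-input error.
        CoderResult result = decoder.decode(in, out, true);
        if (!result.isUnderflow()) {
            return -1; // malformed, unmappable, or output overflow
        }
        result = decoder.flush(out);
        if (!result.isUnderflow()) {
            return -1;
        }
        return out.position();
    }
}
```

With either this stand-in or the real method, a caller would typically allocate new char[utf8Length], check the result for -1, and then build the text with new String(utf16, 0, n), where n is the returned count.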