Package jodd.io

Class UnicodeInputStream

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public class UnicodeInputStream
    extends java.io.InputStream
    Unicode input stream that detects UTF encodings and reads the byte order mark (BOM). Detects the following BOMs:
    • UTF-8
    • UTF-16BE
    • UTF-16LE
    • UTF-32BE
    • UTF-32LE
    • Constructor Summary

      Constructors 
      Constructor Description
      UnicodeInputStream(java.io.InputStream in, java.nio.charset.Charset targetEncoding)
      Creates new unicode stream.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void close()
      Closes input stream.
      int getBOMSize()
      Returns BOM size in bytes.
      java.nio.charset.Charset getDetectedEncoding()
      Returns detected UTF encoding or null if no UTF encoding has been detected (i.e. no BOM).
      protected void init()
      Detects and decodes encoding from BOM character.
      int read()
      Reads byte from the stream.
      • Methods inherited from class java.io.InputStream

        available, mark, markSupported, nullInputStream, read, read, readAllBytes, readNBytes, readNBytes, reset, skip, transferTo
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • BOM_UTF32_BE

        public static final byte[] BOM_UTF32_BE
      • BOM_UTF32_LE

        public static final byte[] BOM_UTF32_LE
      • BOM_UTF8

        public static final byte[] BOM_UTF8
      • BOM_UTF16_BE

        public static final byte[] BOM_UTF16_BE
      • BOM_UTF16_LE

        public static final byte[] BOM_UTF16_LE
    • Constructor Detail

      • UnicodeInputStream

        public UnicodeInputStream(java.io.InputStream in,
                                  java.nio.charset.Charset targetEncoding)
        Creates new unicode stream. It works in two modes: detect mode and read mode.

        Detect mode is active when target encoding is not specified. In detect mode, the stream tries to detect the encoding from the BOM, if one exists. If there is no BOM, no encoding is detected.

        Read mode is active when target encoding is set. In read mode, the stream reads the optional BOM for the given encoding. If there is no BOM, nothing is skipped.
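
        The read-mode behavior described above (consume the BOM of a known encoding if present, otherwise skip nothing) can be sketched with JDK classes alone. This is an illustration of the idea, not jodd's implementation: BomSkipper and skipBom are hypothetical names, and the caller is assumed to supply the BOM bytes for the target encoding and a PushbackInputStream whose buffer is at least as long as that BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.util.Arrays;

// Hypothetical helper (not part of jodd): skips an optional BOM for a known encoding.
final class BomSkipper {

    /**
     * Consumes expectedBom from the head of the stream if it is there.
     * Returns the number of bytes skipped (0 when no BOM was present).
     */
    static int skipBom(PushbackInputStream in, byte[] expectedBom) throws IOException {
        byte[] head = new byte[expectedBom.length];
        int n = in.readNBytes(head, 0, head.length);
        if (n == expectedBom.length && Arrays.equals(head, expectedBom)) {
            return n; // BOM matched: leave it consumed
        }
        in.unread(head, 0, n); // no BOM: push back everything that was read
        return 0;
    }

    public static void main(String[] args) throws IOException {
        byte[] utf16LeBom = {(byte) 0xFF, (byte) 0xFE};
        byte[] data = {(byte) 0xFF, (byte) 0xFE, 'a', 0}; // BOM followed by "a" in UTF-16LE
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data), 4);
        System.out.println("skipped bytes: " + skipBom(in, utf16LeBom));
    }
}
```

        Pushing the bytes back when nothing matches keeps the payload intact, which is what "nothing is skipped" requires.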

    • Method Detail

      • getDetectedEncoding

        public java.nio.charset.Charset getDetectedEncoding()
        Returns detected UTF encoding or null if no UTF encoding has been detected (i.e. no BOM). If the stream has not been read yet, it is initialized first.
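
        Because the detected encoding may be null, callers usually need a fallback charset before decoding. The sketch below shows one way to handle that with JDK classes only. CharsetFallback, orDefault and decode are hypothetical names, and the detected argument stands in for whatever getDetectedEncoding() returned.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of jodd): picks a charset when detection may have returned null.
final class CharsetFallback {

    /** Returns detected if it is non-null, otherwise fallback. */
    static Charset orDefault(Charset detected, Charset fallback) {
        return detected != null ? detected : fallback;
    }

    /** Decodes the remaining bytes of body with the detected charset, or with fallback if none was detected. */
    static String decode(InputStream body, Charset detected, Charset fallback) throws IOException {
        return new String(body.readAllBytes(), orDefault(detected, fallback));
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "plain text".getBytes(StandardCharsets.UTF_8);
        // No BOM was found, so detected is null and UTF-8 is used as the fallback.
        System.out.println(decode(new ByteArrayInputStream(bytes), null, StandardCharsets.UTF_8));
    }
}
```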
      • init

        protected void init()
                     throws java.io.IOException
        Detects and decodes encoding from BOM character. Reads ahead four bytes and checks them for BOM marks. Extra bytes are pushed back to the stream, so only BOM bytes are skipped.
        Throws:
        java.io.IOException
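
        The read-ahead and push-back technique described for init() can be sketched as follows. This is a minimal illustration under stated assumptions, not jodd's source: BomSniffer and detect are hypothetical names, the BOM byte values are the standard Unicode ones rather than values read from this class's fields, and the UTF-32 charsets are looked up by name on the assumption that the running JDK provides them.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper (not part of jodd): detects a BOM by reading ahead four bytes.
final class BomSniffer {

    // Standard Unicode BOMs, longest first so that FF FE 00 00 is tried as UTF-32LE
    // before FF FE is tried as UTF-16LE (the two share a prefix).
    private static final byte[][] BOMS = {
        {0x00, 0x00, (byte) 0xFE, (byte) 0xFF},   // UTF-32BE
        {(byte) 0xFF, (byte) 0xFE, 0x00, 0x00},   // UTF-32LE
        {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF},  // UTF-8
        {(byte) 0xFE, (byte) 0xFF},               // UTF-16BE
        {(byte) 0xFF, (byte) 0xFE},               // UTF-16LE
    };
    private static final String[] NAMES = {"UTF-32BE", "UTF-32LE", "UTF-8", "UTF-16BE", "UTF-16LE"};

    /**
     * Returns the charset named by a leading BOM, or null if there is none.
     * The stream is left positioned just after the BOM; in needs a pushback buffer of at least 4 bytes.
     */
    static Charset detect(PushbackInputStream in) throws IOException {
        byte[] head = new byte[4];
        int n = in.readNBytes(head, 0, head.length); // may be fewer than 4 on short input
        for (int i = 0; i < BOMS.length; i++) {
            byte[] bom = BOMS[i];
            if (n >= bom.length && Arrays.equals(head, 0, bom.length, bom, 0, bom.length)) {
                in.unread(head, bom.length, n - bom.length); // push back the bytes after the BOM
                return Charset.forName(NAMES[i]);
            }
        }
        in.unread(head, 0, n); // no BOM: push back everything
        return null;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data), 4);
        System.out.println(detect(in));       // charset named by the BOM
        System.out.println((char) in.read()); // first payload byte after the BOM
    }
}
```

        Reading four bytes covers the longest BOM (UTF-32), and pushing back the remainder ensures that only the BOM itself is consumed.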
      • close

        public void close()
                   throws java.io.IOException
        Closes the input stream. If the stream was never read, the detected encoding will be unavailable.
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Overrides:
        close in class java.io.InputStream
        Throws:
        java.io.IOException
      • read

        public int read()
                 throws java.io.IOException
        Reads byte from the stream.
        Specified by:
        read in class java.io.InputStream
        Throws:
        java.io.IOException
      • getBOMSize

        public int getBOMSize()
        Returns BOM size in bytes. Returns -1 if no BOM was found.