Orc Format

Format: Serialization Schema
Format: Deserialization Schema

The Apache Orc format allows reading and writing Orc data.

Dependencies

In order to set up the Orc format, the following table provides dependency information for both projects using a build automation tool (such as Maven or SBT) and SQL Client with SQL JAR bundles.

Maven dependency    SQL Client JAR
flink-orc_2.11      Download

How to create a table with Orc format

Here is an example of creating a table that uses the Filesystem connector with the Orc format.

CREATE TABLE user_behavior (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3),
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/user_behavior',
  'format' = 'orc'
)
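
Once the table exists, it can be written to and read from with regular INSERT and SELECT statements. The following is a minimal sketch; the sample row values are made up for illustration, and depending on the Flink version, INSERT ... VALUES against a filesystem table may need to run in batch execution mode.

-- write one illustrative row into the Orc-backed table
INSERT INTO user_behavior
VALUES (543462, 1715, 1464116, 'pv', TIMESTAMP '2017-11-26 01:00:00', '2017-11-26');

-- read it back, pruning by the partition column
SELECT * FROM user_behavior WHERE dt = '2017-11-26';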

Format Options

Option   Required   Default   Type     Description
format   required   (none)    String   Specify what format to use; here it should be 'orc'.

The Orc format also supports the table properties defined by Orc. For example, you can configure orc.compress=SNAPPY to enable snappy compression.
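
As a sketch of how such a property is passed, the example below adds the orc.compress option to the WITH clause alongside the format option; the table name and path are hypothetical.

CREATE TABLE user_behavior_snappy (
  user_id BIGINT,
  behavior STRING
) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/user_behavior_snappy',
  'format' = 'orc',
  'orc.compress' = 'SNAPPY'   -- Orc table property: enable snappy compression
)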

Data Type Mapping

The Orc format's type mapping is compatible with Apache Hive. The following table lists the mapping from Flink types to Orc types.

Flink Data Type   Orc physical type   Orc logical type
CHAR              bytes               CHAR
VARCHAR           bytes               VARCHAR
STRING            bytes               STRING
BOOLEAN           long                BOOLEAN
BYTES             bytes               BINARY
DECIMAL           decimal             DECIMAL
TINYINT           long                BYTE
SMALLINT          long                SHORT
INT               long                INT
BIGINT            long                LONG
FLOAT             double              FLOAT
DOUBLE            double              DOUBLE
DATE              long                DATE
TIMESTAMP         timestamp           TIMESTAMP

Attention: Composite data types (ARRAY, MAP, and ROW) are not supported.
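
As an illustration of the mapping above, the hypothetical table below uses only scalar types and therefore maps cleanly to Orc; uncommenting the ARRAY column would make the table unusable with this format.

CREATE TABLE sensor_readings (
  id BIGINT,                  -- Orc: long / LONG
  name STRING,                -- Orc: bytes / STRING
  reading DECIMAL(10, 2),     -- Orc: decimal / DECIMAL
  reading_time TIMESTAMP(3)   -- Orc: timestamp / TIMESTAMP
  -- tags ARRAY<STRING>       -- composite type: not supported by the Orc format
) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/sensor_readings',
  'format' = 'orc'
)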