Catalogs provide metadata, such as databases, tables, partitions, views, and functions, as well as the information needed to access data stored in a database or other external systems.
One of the most crucial aspects of data processing is managing metadata.
It may be transient metadata, like temporary tables or UDFs registered against the table environment, or permanent metadata, like that in a Hive Metastore.
Catalogs provide a unified API for managing metadata and making it accessible from the Table API and SQL queries.
A catalog enables users to reference existing metadata in their data systems, and automatically maps it to Flink’s corresponding metadata.
For example, Flink can map JDBC tables to Flink tables automatically, so users don’t have to manually rewrite DDL in Flink.
Catalogs greatly simplify the steps required to get started with Flink against users’ existing systems, and significantly improve the user experience.
The GenericInMemoryCatalog is an in-memory implementation of a catalog. All objects will be available only for the lifetime of the session.
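As a minimal sketch, an in-memory catalog only needs a name (the catalog name used here is illustrative):

```java
import org.apache.flink.table.catalog.GenericInMemoryCatalog;

// All objects registered in this catalog are lost when the session ends.
GenericInMemoryCatalog catalog = new GenericInMemoryCatalog("my_catalog");
```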
The JdbcCatalog enables users to connect Flink to relational databases over the JDBC protocol. PostgresCatalog is the only implementation of the JDBC catalog at the moment.
See JdbcCatalog documentation for more details on setting up the catalog.
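As a sketch, a PostgresCatalog can be created through the JdbcCatalog class from the flink-connector-jdbc module; this assumes the five-argument constructor (newer releases may additionally require a class loader argument), and all connection values below are placeholders:

```java
import org.apache.flink.connector.jdbc.catalog.JdbcCatalog;

String name            = "my_jdbc_catalog";                  // catalog name (placeholder)
String defaultDatabase = "mydb";                              // default database (placeholder)
String username        = "postgres";                          // placeholder credentials
String password        = "example-password";                  // placeholder credentials
String baseUrl         = "jdbc:postgresql://localhost:5432";  // base URL without a database name

// tableEnv is an existing TableEnvironment.
JdbcCatalog catalog = new JdbcCatalog(name, defaultDatabase, username, password, baseUrl);
tableEnv.registerCatalog(name, catalog);
```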
The HiveCatalog serves two purposes: as persistent storage for pure Flink metadata, and as an interface for reading and writing existing Hive metadata.
Flink’s Hive documentation provides full details on setting up the catalog and interfacing with an existing Hive installation.
Warning: The Hive Metastore stores all meta-object names in lower case. This is unlike GenericInMemoryCatalog, which is case-sensitive.
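As a sketch, a HiveCatalog is constructed from a catalog name, a default database, and the directory containing hive-site.xml; the path below is a placeholder:

```java
import org.apache.flink.table.catalog.hive.HiveCatalog;

String name            = "myhive";
String defaultDatabase = "default";
String hiveConfDir     = "/opt/hive-conf"; // placeholder path to the directory holding hive-site.xml

// tableEnv is an existing TableEnvironment.
HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
tableEnv.registerCatalog(name, hive);
```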
Catalogs are pluggable and users can develop custom catalogs by implementing the Catalog interface.
To use custom catalogs in SQL CLI, users should develop both a catalog and its corresponding catalog factory by implementing the CatalogFactory interface.
The catalog factory defines a set of properties for configuring the catalog when the SQL CLI bootstraps.
The set of properties will be passed to a discovery service, which tries to match the properties to a CatalogFactory and create a corresponding catalog instance.
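A minimal skeleton of such a factory might look like the following; this assumes the SPI-based CatalogFactory interface of older Flink releases (newer releases use ConfigOption-based factory methods instead), and MyCatalog is a hypothetical custom catalog class:

```java
import org.apache.flink.table.catalog.Catalog;
import org.apache.flink.table.factories.CatalogFactory;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MyCatalogFactory implements CatalogFactory {

    @Override
    public Map<String, String> requiredContext() {
        Map<String, String> context = new HashMap<>();
        // Matched against the 'type' property of the SQL CLI catalog definition.
        context.put("type", "mycatalog");
        return context;
    }

    @Override
    public List<String> supportedProperties() {
        List<String> properties = new ArrayList<>();
        properties.add("default-database");
        return properties;
    }

    @Override
    public Catalog createCatalog(String name, Map<String, String> properties) {
        String database = properties.getOrDefault("default-database", "default");
        return new MyCatalog(name, database); // hypothetical custom catalog
    }
}
```

For the discovery service to find the factory, it must also be registered via Java SPI, typically by listing the class in META-INF/services/org.apache.flink.table.factories.TableFactory in those releases.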
How to Create and Register Flink Tables to Catalog
Using SQL DDL
Users can use SQL DDL to create tables in catalogs in both Table API and SQL.
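For example, a catalog table can be created from the Table API by executing DDL; this sketch uses the built-in datagen connector, and the table name is illustrative:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

TableEnvironment tableEnv = TableEnvironment.create(EnvironmentSettings.newInstance().build());

// Create a table in the default catalog and database via SQL DDL.
tableEnv.executeSql(
    "CREATE TABLE default_catalog.default_database.my_table ("
        + "  id BIGINT,"
        + "  name STRING"
        + ") WITH ("
        + "  'connector' = 'datagen'"
        + ")");
```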
Using Java or Scala
Users can use Java or Scala to create catalog tables programmatically, as shown in the sketch below.
Note: only the programmatic catalog APIs are listed here. Users can achieve much of the same functionality with SQL DDL.
For detailed DDL information, please refer to SQL CREATE DDL.
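The sketch below creates a database and a table through the Catalog interface. It assumes the older-style APIs (TableSchema and CatalogTableImpl, which are deprecated in recent releases in favor of Schema and CatalogTable.of) and uses the built-in in-memory catalog and datagen connector so the example is self-contained:

```java
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.catalog.Catalog;
import org.apache.flink.table.catalog.CatalogDatabaseImpl;
import org.apache.flink.table.catalog.CatalogTableImpl;
import org.apache.flink.table.catalog.GenericInMemoryCatalog;
import org.apache.flink.table.catalog.ObjectPath;

import java.util.HashMap;
import java.util.Map;

public class CatalogApiExample {
    public static void main(String[] args) throws Exception {
        TableEnvironment tableEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().build());

        // Register an in-memory catalog under the name "my_catalog".
        Catalog catalog = new GenericInMemoryCatalog("my_catalog");
        tableEnv.registerCatalog("my_catalog", catalog);

        // Create a database in the catalog.
        catalog.createDatabase(
                "mydb",
                new CatalogDatabaseImpl(new HashMap<>(), "an example database"),
                false);

        // Create a table in the new database.
        TableSchema schema = TableSchema.builder()
                .field("name", DataTypes.STRING())
                .field("age", DataTypes.INT())
                .build();

        Map<String, String> options = new HashMap<>();
        options.put("connector", "datagen");

        catalog.createTable(
                new ObjectPath("mydb", "mytable"),
                new CatalogTableImpl(schema, options, "an example table"),
                false);
    }
}
```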
Table API and SQL for Catalog
Registering a Catalog
Users have access to a default in-memory catalog named default_catalog, which is always created. This catalog contains a single database called default_database.
Users can also register additional catalogs into an existing Flink session.
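For instance, a minimal sketch of registering a second in-memory catalog (the names are illustrative):

```java
import org.apache.flink.table.catalog.GenericInMemoryCatalog;

// tableEnv is an existing TableEnvironment.
tableEnv.registerCatalog("my_other_catalog", new GenericInMemoryCatalog("my_other_catalog"));
```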
All catalogs defined using YAML must provide a type property that specifies the type of catalog.
The following types are supported out of the box.
Changing the Current Catalog And Database
Flink will always search for tables, views, and UDFs in the current catalog and database.
Metadata from catalogs other than the current catalog is accessible by providing fully qualified names of the form catalog.database.object.
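As a sketch (catalog, database, and table names are illustrative):

```java
import org.apache.flink.table.api.Table;

// Make "my_catalog"/"mydb" the current catalog and database.
tableEnv.useCatalog("my_catalog");
tableEnv.useDatabase("mydb");

// Tables in the current catalog and database can be referenced by name alone...
Table t1 = tableEnv.from("mytable");

// ...while objects elsewhere need fully qualified names.
Table t2 = tableEnv.sqlQuery("SELECT * FROM default_catalog.default_database.my_table");
```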