Gelly is a Graph API for Flink. It contains a set of methods and utilities that aim to simplify the development of graph analysis applications in Flink. In Gelly, graphs can be transformed and modified using high-level functions similar to the ones provided by the batch processing API. Gelly provides methods to create, transform, and modify graphs, as well as a library of graph algorithms.
Gelly is currently part of the libraries Maven project. All relevant classes are located in the org.apache.flink.graph package.
Add the following dependency to your pom.xml to use Gelly.

For Java:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-gelly_2.10</artifactId>
    <version>1.2.0</version>
</dependency>

For Scala:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-gelly-scala_2.10</artifactId>
    <version>1.2.0</version>
</dependency>
Note that Gelly is not part of the binary distribution. See linking for instructions on packaging Gelly libraries into Flink user programs.
The remaining sections provide a description of available methods and present several examples of how to use Gelly and how to mix it with the Flink DataSet API.
To run the Gelly examples, the flink-gelly (for Java) or flink-gelly-scala (for Scala) jar must be copied to Flink’s lib directory.
cp opt/flink-gelly_*.jar lib/
cp opt/flink-gelly-scala_*.jar lib/
Gelly’s examples jar includes drivers for the library methods as well as additional example algorithms. After configuring and starting the cluster, list the available algorithm classes:
./bin/start-cluster.sh
./bin/flink run opt/flink-gelly-examples_*.jar
The Gelly drivers can generate RMat graph data or read the edge list from a CSV file. Each node in a cluster must have access to the input file. Calculate graph metrics on a directed generated graph:
./bin/flink run -c org.apache.flink.graph.drivers.GraphMetrics opt/flink-gelly-examples_*.jar \
    --directed true --input rmat
The size of the graph is adjusted by the --scale and --edge_factor parameters. The library generator provides access to additional configuration to adjust the power-law skew and random noise.
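To get a feel for how the two parameters interact, the sketch below estimates the size of a generated graph before submitting a job. It assumes the usual R-MAT convention of 2^scale vertices and edge_factor edges per vertex, and the values 12 and 8 are purely illustrative. The generator may emit duplicate edges, so the number of distinct edges in the final graph can be lower than this estimate.

```shell
# Assumption: the generator follows the usual R-MAT convention of
# 2^scale vertices and edge_factor generated edges per vertex.
scale=12          # illustrative value for --scale
edge_factor=8     # illustrative value for --edge_factor

# Upper-bound estimate of the generated graph size.
vertex_count=$((1 << scale))
edge_count=$((vertex_count * edge_factor))

echo "vertices: $vertex_count"
echo "edges:    $edge_count"
```

Under these assumptions, the same values would be passed to a driver as --scale 12 --edge_factor 8 together with --input rmat.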
Download a sample graph from the Stanford Network Analysis Project (SNAP) and run several drivers on it:

wget -O - http://snap.stanford.edu/data/bigdata/communities/com-lj.ungraph.txt.gz | gunzip -c > com-lj.ungraph.txt

./bin/flink run -q -c org.apache.flink.graph.drivers.GraphMetrics opt/flink-gelly-examples_*.jar \
    --directed true --input csv --type integer --input_filename com-lj.ungraph.txt --input_field_delimiter '\t'

./bin/flink run -q -c org.apache.flink.graph.drivers.ClusteringCoefficient opt/flink-gelly-examples_*.jar \
    --directed true --input csv --type integer --input_filename com-lj.ungraph.txt --input_field_delimiter '\t' \
    --output hash

./bin/flink run -q -c org.apache.flink.graph.drivers.JaccardIndex opt/flink-gelly-examples_*.jar \
    --input csv --type integer --simplify true --input_filename com-lj.ungraph.txt --input_field_delimiter '\t' \
    --output hash
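For a quick trial without downloading a large data set, a small edge list can be written by hand. The sketch below assumes the CSV input is a plain edge list with one edge per line, holding a source vertex ID and a target vertex ID separated by a tab. The file name tiny-graph.txt and the four edges are hypothetical.

```shell
# Write a small tab-delimited edge list (hypothetical file name and data).
# Assumption: each line holds a source vertex ID and a target vertex ID.
edge_file=tiny-graph.txt
printf '%s\t%s\n' \
    0 1 \
    0 2 \
    1 2 \
    2 3 > "$edge_file"

# Count the edges and the distinct vertex IDs as a quick sanity check.
edge_count=$(wc -l < "$edge_file" | tr -d ' ')
vertex_count=$(tr '\t' '\n' < "$edge_file" | sort -u | wc -l | tr -d ' ')
echo "edges: $edge_count, vertices: $vertex_count"
```

If the assumed format holds, such a file could be passed to the drivers with --input csv --type integer --input_filename tiny-graph.txt --input_field_delimiter '\t'. As noted earlier, each node in the cluster must be able to read the file.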