GraphBuilder 1.0 Beta Release

BY 01 Staff (not verified) ON Dec 04, 2012

We are pleased to announce the initial beta release of GraphBuilder, our first public release of the code base.

About GraphBuilder

Data relationships play a vital role in various data analytics, structured machine learning, and data mining applications. These relationships can be expressed as graphs. GraphBuilder is a Java library for preparing graphs from large volumes of unstructured and semi-structured data.  

Processing large graphs in a distributed computing environment presents multiple system-level challenges, from parallelizing the graph construction algorithms to achieving balanced system utilization. In addition, the resulting graph must be carefully partitioned so that subsequent application processing is properly balanced. GraphBuilder addresses these challenges with a carefully designed multi-stage pipeline that uses the Apache Hadoop framework to achieve scalability.

We are using GraphBuilder to carry out systems research at Intel Labs. We hope that you too will find this tool to be useful for your research and commercial use!



The GraphBuilder beta release supports the following features:

  • Graph Construction - Feature extractor for XML and a clean interface for plug-in custom feature extractors.
  • Graph Cleaning - Options to eliminate dangling, self, and duplicate edges.
  • Graph Transformation - Options for edge directionality conversion.
  • Graph Normalization - Compression technique for sparse graph labels.
  • Graph Partitioning - Scheme to ensure even distribution of graph data for balanced distributed computation (supports the vertex partitioning scheme for GraphLab 2.1).
  • Graph Serialization - Support for JSON serialization.
  • Algorithms that are optimized for natural graphs, which possess scale-free structure and power-law degree distribution.
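
To make the Graph Cleaning item above concrete, here is a minimal single-machine sketch in Java. It is not GraphBuilder's API or implementation: the class EdgeCleaningSketch, the Edge type, and the clean method are hypothetical names chosen for illustration. It also assumes that a "dangling" edge is one whose source or target is missing from the known vertex set, and it ignores the Hadoop pipeline entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration only; none of these names come from GraphBuilder.
class EdgeCleaningSketch {

    /** A directed edge between two vertex labels. */
    static final class Edge {
        final String source;
        final String target;

        Edge(String source, String target) {
            this.source = source;
            this.target = target;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Edge)) {
                return false;
            }
            Edge other = (Edge) o;
            return source.equals(other.source) && target.equals(other.target);
        }

        @Override
        public int hashCode() {
            return Objects.hash(source, target);
        }

        @Override
        public String toString() {
            return source + "->" + target;
        }
    }

    /**
     * Drops self edges, dangling edges (assumed here to mean an endpoint
     * that is not in the vertex set), and duplicate edges, keeping the
     * first occurrence of each remaining edge in input order.
     */
    static List<Edge> clean(Set<String> vertices, List<Edge> edges) {
        Set<Edge> kept = new LinkedHashSet<>();
        for (Edge e : edges) {
            boolean selfEdge = e.source.equals(e.target);
            boolean dangling = !vertices.contains(e.source)
                    || !vertices.contains(e.target);
            if (!selfEdge && !dangling) {
                kept.add(e); // the set silently ignores duplicates
            }
        }
        return new ArrayList<>(kept);
    }

    public static void main(String[] args) {
        Set<String> vertices = new HashSet<>(Arrays.asList("a", "b", "c"));
        List<Edge> edges = Arrays.asList(
                new Edge("a", "b"),
                new Edge("a", "b"),  // duplicate
                new Edge("b", "b"),  // self edge
                new Edge("c", "z"),  // dangling: "z" is not a known vertex
                new Edge("b", "c"));
        System.out.println(clean(vertices, edges)); // prints [a->b, b->c]
    }
}
```

In a distributed setting the same three filters would have to be expressed as steps over partitioned edge data rather than over an in-memory list; the sketch only shows what each cleaning option removes.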


GraphBuilder is released under the Apache 2 license.

Project Details

Mailing list:

The GraphBuilder development Git repositories are hosted on GitHub.

Note: We will not use any of the integrate, merge, or tracking functions of GitHub, so please continue to use the provided lists.

Our thanks

For this release, we would like to give special thanks to the following contributors, listed in no particular order, and to all contributors to GraphBuilder:
Haijie (Jay) Gu [CMU], Diana Hu, Ted Willke, Frank Berry, Joseph Gonzalez [UCB], Yucheng Low [UW], Danny Bickson [CMU].

Current v1.0 goals under consideration are:

  • Continued improvements to the beta release for robustness
  • New features and support for different frameworks