Graph computation

2014-12-17 — 2020-09-19

Suspiciously similar content

Engines for calculating things on or about graphs. Which is everything, in a trivial case, but usually when we talk about graph computation we mean things that are more simply or more elegantly represented as graphs, which usually implies having some kind of sparsity in edges.

For specific applications of such computations, see e.g. graphical models or complex networks, inference on social graphs etc. Sometimes we use a linear algebra representation of a graph connectivity pattern, e.g. a graph Laplacian, and in that context some graph algorithms can be represented as matrix factorisation or similar problems.

A prime example of such a thing is Laplacians.jl (Julia):

Laplacians is a package containing graph algorithms, with an emphasis on tasks related to spectral and algebraic graph theory. It contains (and will contain more) code for solving systems of linear equations in graph Laplacians, low stretch spanning trees, sparsification, clustering, local clustering, and optimization on graphs.

All graphs are represented by sparse adjacency matrices. This is both for speed, and because our main concerns are algebraic tasks. It does not handle dynamic graphs. It would be very slow to implement dynamic graphs this way.

The other options that follow are more general creatures than this.

Peregrine: A Pattern-Aware Graph Mining System
SNAP.py is engineered to do Stuff To Large Networks in Python. It plugs into snapvx, a ‘nearly-convex’ solver for graph-like data.
Apache Giraph is a large-scale graph processor.

Apache Giraph is an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections. Giraph originated as the open-source counterpart to Pregel, the graph processing architecture developed at Google and described in a 2010 paper. Both systems are inspired by the Bulk Synchronous Parallel model of distributed computation introduced by Leslie Valiant. Giraph adds several features beyond the basic Pregel model, including master computation, sharded aggregators, edge-oriented input, out-of-core computation, and more. With a steady development cycle and a growing community of users worldwide, Giraph is a natural choice for unleashing the potential of structured datasets at a massive scale. To learn more, consult the User Docs section above.
neo4j is the previous season’s hot graph query system.
ASAP presents some background on graph pattern mining versus graph analysis.

Today, a deluge of graph processing frameworks exist, both in academia and open-source. […] These frameworks typically provide high-level abstractions that make it easy for developers to implement many graph algorithms. A vast majority of the existing graph processing frameworks however have focused on graph analysis algorithms. These frameworks are fast and can scale out to handle very large graph analysis settings: for instance, GraM can run one iteration of page rank on a trillion-edge graph in 140 seconds in a cluster. In contrast, systems that support graph pattern mining fail to scale to even moderately sized graphs, and are slow, taking several hours to mine simple patterns
Gephi is a classic graph visualizer, which I have played with a lot, although I never found a use for it outside of pretty pictures.

Gephi is the leading visualization and exploration software for all kinds of graphs and networks. Gephi is open-source and free.
scikit-network Bonald et al. (2020)

Scikit-network is a Python package inspired by scikit-learn for the analysis of large graphs. Graphs are represented by their adjacency matrix in the sparse CSR format of SciPy. The package provides state-of-the-art algorithms for ranking, clustering, classifying, embedding and visualising the nodes of a graph. High performance is achieved through a mix of fast matrix-vector products (using SciPy), compiled code (using Cython) and parallel processing. The package is distributed under the BSD license, with dependencies limited to NumPy and SciPy. It is compatible with Python 3.6 and newer. Source code, documentation and installation instructions are available online.
graphlab, now a Turi product has “graph” in its name. I think it tries to solve general non-graphish ML problems but also… solves graphish ones? Do they also run a cloud? Or something? Maybe someone should actually read their website. I clearly didn’t.

1 References

Bonald, de Lara, Lutz, et al. 2020. “Scikit-Network: Graph Analysis in Python.” Journal of Machine Learning Research.

Otasek, Morris, Bouças, et al. 2019. “Cytoscape Automation: Empowering Workflow-Based Network Analysis.” Genome Biology.

Shannon, Markiel, Ozier, et al. 2003. “Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks.” Genome Research.