In this paper we consider the problem of graph-based transductive classification, and we are particularly interested in the directed graph scenario which is a natural form for many real world applications.Different from existing research efforts that either only deal with undirected graphs or circumvent directionality by means of symmetrization, we propose a novel random walk approach on directed graphs using absorbing Markov chains, which can be regarded as maximizing the accumulated expected number of visits from the unlabeled transient states. Our algorithm is simple, easy to implement, and works with large-scale graphs on binary, multiclass, and multi-label prediction problems. Moreover, it is capable of preserving the graph structure even when the input graph is sparse and changes over time, as well as retaining weak signals presented in the directed edges. We present its intimate connections to a number of existing methods, including graph kernels, graph Laplacian based methods, and interestingly, spanning forest of graphs. Its computational complexity and the generalization error are also studied. Empirically our algorithm is systematically evaluated on a wide range of applications, where it has shown to perform competitively comparing to a suite of state-of-the-art methods. In particular, our algorithm is shown to work exceptionally well with large sparse directed graphs with e.g. millions of nodes and tens of millions of edges, where it significantly outperforms other state-of-the-art methods. In the dynamic graph setting involving insertion or deletion of nodes and edge-weight changes over time, it also allows efficient online updates that produce the same results as of the batch update counterparts.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Issue 99, 2017, doi:10.1109/TPAMI.2017.2730871