Stream processing and reactive programming

October 31, 2014 — July 1, 2015

computers are awful
concurrency hell
premature optimization
signal processing

Lazy bookmark for practical details to processing and transforming possibly-infinite streams of data, from signals to parse trees. Disambiguating “transducers”.

Used in parallel/offline processing of large data sets that do not fit in core, or processing things that happen in realtime such as UI.

I am imagining more general objects than singly-indexed real-valued signals; Tokens, maybe. Classic DSP can be elsewhere. Infrastructure to do stream processing in a distributed fashion is filed under message queues.

In statistics and machine learning, stream processing connects with online learning; incorporating data as it comes in, as in distributed statistics.

1 Functional reactive programming

Closely related: Communicating Sequential Processes. I think the main difference is supposed to be perhaps that CSP is not necessarily in a function style? But in practice, FRP I have seen in the wild is not necessarily in a functional style?

Question: why are there not more visual pipeline programs for this programming domain? It looks like an obvious fit.

A parallelism/streaming thing. Communicating Sequential Processes. Functional Reactive Programming.

  • CSP and transducers — I don’t think these are transducers as I understand them, i.e. stack machines, but I could be wrong. See also the expositions from the clojure authors:

  • ReactiveX is a particular stream processing paradigm with implementations for many languages

  • Reactive manifesto:

    We believe that a coherent approach to systems architecture is needed, and we believe that all necessary aspects are already recognized individually: we want systems that are Responsive, Resilient, Elastic and Message Driven. We call these Reactive Systems.

    Systems built as Reactive Systems are more flexible, loosely-coupled and scalable. This makes them easier to develop and amenable to change. They are significantly more tolerant of failure and when failure does occur they meet it with elegance rather than disaster. Reactive Systems are highly responsive, giving users effective interactive feedback.

  • A collection of links for streaming algorithms and data structures

1.1 Javascript

See FRP in Javascript.

1.2 Python

See Python See FRP in Python.

2 Streaming data analysis

Online, possibly realtime, certainly memory-constrained.

  • Apache Storm

    • Storm-compatible, Heron aims to be Storm-but-more-reliable.

3 To read

4 References

Hu, Pehlevan, and Chklovskii. 2014. A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization.” In 2014 48th Asilomar Conference on Signals, Systems and Computers.
McSherry, Isaacs, Isard, et al. 2013. Differential dataflow. US20130304744 A1.
Murray, McSherry, Isaacs, et al. 2013. Naiad: A Timely Dataflow System.” In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. SOSP ’13.
Pan, Zhang, Wu, et al. 2014. Online Community Detection for Large Complex Networks.” PLoS ONE.
Ryabko, and Ryabko. 2010. Nonparametric Statistical Inference for Ergodic Processes.” IEEE Transactions on Information Theory.
Sorensen, and Gardner. 2010. Programming with Time: Cyber-Physical Programming with Impromptu.” In ACM Sigplan Notices.