Stream processing and reactive programming

2014-10-30 — 2015-07-01


Lazy bookmark for practical details on processing and transforming possibly infinite streams of data, from signals to parse trees. Disambiguating “transducers”.
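One sense of "transducer" is the Clojure one: a transformation from one reducing function to another. It is independent of where the data comes from and where it goes. This is distinct from the finite-state transducers of automata theory. The following is a minimal Python sketch of the Clojure sense. The helper names `mapping`, `filtering`, `compose` and `transduce` are hypothetical analogues chosen for illustration and are not a real library API. As in Clojure, the composed transducers apply left to right in data-flow order.

```python
from functools import reduce


def mapping(f):
    """Transducer: transform each input with f before passing it downstream."""
    def xform(step):
        def new_step(acc, x):
            return step(acc, f(x))
        return new_step
    return xform


def filtering(pred):
    """Transducer: pass downstream only the inputs that satisfy pred."""
    def xform(step):
        def new_step(acc, x):
            return step(acc, x) if pred(x) else acc
        return new_step
    return xform


def compose(*xforms):
    """Compose transducers so that the first one listed sees the data first."""
    def composed(step):
        for xf in reversed(xforms):
            step = xf(step)
        return step
    return composed


def transduce(xform, step, init, source):
    """Fold `source` with the reducing function `step` wrapped by `xform`."""
    return reduce(xform(step), source, init)


# Square everything, then keep the even results.
xf = compose(mapping(lambda x: x * x), filtering(lambda x: x % 2 == 0))

# The same transformation, reused with two different sinks.
print(transduce(xf, lambda acc, x: acc + x, 0, range(10)))     # 120
print(transduce(xf, lambda acc, x: acc + [x], [], range(10)))  # [0, 4, 16, 36, 64]
```

Because `xf(step)` is just a function of `(acc, x)`, it can also be fed one element at a time from an unbounded stream. No intermediate collections are built.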

Useful both for parallel/offline processing of large data sets that do not fit in core, and for processing things that happen in real time, such as UI events.
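The out-of-core case can be sketched with Python generators. Each stage pulls values on demand, so memory stays bounded even when the source is unbounded. The stage names `parse_records` and `batch_means` are hypothetical. In practice `lines` might be an open file object, which Python iterates lazily line by line.

```python
import itertools
from typing import Iterable, Iterator


def parse_records(lines: Iterable[str]) -> Iterator[float]:
    """Lazily parse one numeric record per line, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)


def batch_means(xs: Iterable[float], size: int) -> Iterator[float]:
    """Yield the mean of each consecutive, non-overlapping batch of `size` values.

    Only one batch is held in memory at a time; the final batch may be short.
    """
    it = iter(xs)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield sum(batch) / len(batch)


# An infinite source: nothing is computed until values are pulled from the end.
source = (f"{i}\n" for i in itertools.count())
first_three = list(itertools.islice(batch_means(parse_records(source), 4), 3))
print(first_three)  # [1.5, 5.5, 9.5]
```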

I am imagining more general objects than singly-indexed real-valued signals: tokens, maybe. Classic DSP belongs elsewhere. Infrastructure for doing stream processing in a distributed fashion is filed under message queues.
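With tokens instead of samples, a typical wrinkle is that input arrives in arbitrary chunks, and a token may straddle a chunk boundary. The following is a minimal sketch of an incremental lexer over such a chunked stream. The toy grammar (integers, identifiers, single-character operators), the `TOKEN_RE` pattern and the `tokenize` function are all hypothetical illustrations. The rule used is to hold back any match that reaches the end of the buffer, because more input could extend it. The held-back tail is flushed once the input is exhausted.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical toy grammar: integers | identifiers | any other single non-space char.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")


def _classify(m: "re.Match[str]") -> Tuple[str, str]:
    if m.group(1) is not None:
        return ("NUM", m.group(1))
    if m.group(2) is not None:
        return ("IDENT", m.group(2))
    return ("OP", m.group(3))


def tokenize(chunks: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Lex a stream of text chunks, emitting tokens as soon as they are complete."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        pos = 0
        while True:
            m = TOKEN_RE.match(buf, pos)
            # A match touching the end of the buffer might continue in the next chunk.
            if m is None or m.end() == len(buf):
                break
            yield _classify(m)
            pos = m.end()
        buf = buf[pos:]  # keep only the unconsumed tail
    # Input exhausted: nothing can extend the tail, so flush it.
    pos = 0
    while (m := TOKEN_RE.match(buf, pos)) is not None:
        yield _classify(m)
        pos = m.end()


print(list(tokenize(["x1 = 12", "3 + y", "z"])))
# [('IDENT', 'x1'), ('OP', '='), ('NUM', '123'), ('OP', '+'), ('IDENT', 'yz')]
```

Holding back every match at the buffer end is conservative, since single-character operators can never be extended. However, it keeps the rule uniform.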

In statistics and machine learning, stream processing connects with online learning, that is, incorporating data as it arrives, and also overlaps with distributed statistics.
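A classic small instance of "incorporating data as it arrives" is Welford's one-pass algorithm for the mean and variance. The following sketch also includes the pairwise merge step usually credited to Chan et al. That merge lets summaries computed on separate shards be combined, which is where the link to distributed statistics shows up. The class name `RunningMoments` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningMoments:
    """Welford's online algorithm: mean and variance in one pass, O(1) memory."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # uses the already-updated mean

    def merge(self, other: "RunningMoments") -> "RunningMoments":
        """Combine summaries of two disjoint streams (Chan et al. pairwise update)."""
        n = self.n + other.n
        if n == 0:
            return RunningMoments()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningMoments(n, mean, m2)

    @property
    def variance(self) -> float:
        """Unbiased sample variance; nan until two observations have arrived."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


rm = RunningMoments()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:
    rm.update(x)
print(rm.n, rm.mean, rm.variance)
```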

1 Functional reactive programming

See FRP.

2 Streaming data analysis

Online, possibly real-time, certainly memory-constrained.
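A standard example of a memory-constrained streaming algorithm is reservoir sampling. The sketch below implements Vitter's Algorithm R, which keeps a uniform random sample of `k` items from a stream of unknown length in O(k) memory. Taking an explicit `random.Random` instance is a design choice made here so that runs are reproducible.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Uniform sample of k items from a stream of unknown length (Algorithm R)."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            # Having seen i + 1 items, keep this one with probability k / (i + 1),
            # evicting a uniformly chosen current member.
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir


print(reservoir_sample(range(1_000_000), 5, random.Random(0)))
```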

- Apache Storm…
  - Storm-compatible, Heron aims to be Storm-but-more-reliable.
- A collection of links for streaming algorithms and data structures
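One of the staple streaming data structures is the Count-Min sketch (Cormode and Muthukrishnan). It estimates item frequencies in fixed memory and never undercounts. With width around ⌈e/ε⌉ and depth around ⌈ln(1/δ)⌉, the overcount is at most εN with probability at least 1 − δ, where N is the total count. The following is a minimal sketch. Its one assumption is that the per-row hash functions come from `hashlib.blake2b` salted with the row index. That is convenient, but it is not the pairwise-independent family the textbook analysis assumes.

```python
import hashlib
from typing import Iterator, List, Tuple


class CountMinSketch:
    """Approximate frequency counts in fixed memory; estimates never undercount."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _buckets(self, item: str) -> Iterator[Tuple[int, int]]:
        # One hash function per row, made distinct by salting with the row index.
        for row in range(self.depth):
            h = hashlib.blake2b(
                item.encode("utf-8"), digest_size=8, salt=row.to_bytes(16, "little")
            )
            yield row, int.from_bytes(h.digest(), "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        # Each row can only overcount (collisions add), so take the minimum.
        return min(self.table[row][col] for row, col in self._buckets(item))


cms = CountMinSketch(width=272, depth=5)
for word in "the cat sat on the mat with the hat".split():
    cms.add(word)
print(cms.estimate("the"), cms.estimate("dog"))
```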

2.1 QMiner

QMiner's own feature summary:

UNSTRUCTURED DATA

QMiner provides support for unstructured data, such as text and social networks, across the entire processing pipeline, from feature engineering and indexing to aggregation and machine learning.

SEARCH

QMiner provides out-of-the-box support for indexing, querying and aggregating structured, unstructured and geospatial data using a simple query language.

JAVASCRIPT API

QMiner applications are implemented in JavaScript, making it easy to get started. Using the JavaScript API, it is easy to compose complete data processing pipelines and integrate with other systems via RESTful web services.

C++ LIBRARY

QMiner is implemented in C++ and can be included as a library into custom C++ projects, thus providing them with stream processing and data analytics capabilities.

3 To read

- Adrian Colyer explains McSherry et al.’s Naiad system
- The (interstellar) streaming problem, from a different, information-theoretic angle:
  - Streaming Algorithms
  - Jeremy Kun: The Complexity of Communication
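Naiad and the related differential dataflow work (McSherry, Isaacs, Isard, et al. 2013, in the references) represent collections as streams of updates, each carrying a record and a signed multiplicity ("diff"). Operators respond to input changes by emitting only the corresponding *output* changes. The following toy sketch covers that one idea for a per-key count. It is heavily simplified: it uses a single implicit timestamp per batch, whereas the real systems use partially ordered timestamps to support iteration, and it has none of their indexing or scheduling. The class `IncrementalCount` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Update = Tuple[str, int]          # (record, diff): +1 insertion, -1 deletion
OutUpdate = Tuple[Tuple[str, int], int]  # ((key, count), diff)


class IncrementalCount:
    """Per-key counts maintained from (record, diff) updates, emitting output diffs."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def apply(self, batch: Iterable[Update]) -> List[OutUpdate]:
        # Consolidate the batch first, so cancelling updates produce no output.
        delta: Dict[str, int] = {}
        for record, diff in batch:
            delta[record] = delta.get(record, 0) + diff
        out: List[OutUpdate] = []
        for key, d in sorted(delta.items()):
            if d == 0:
                continue
            old = self.counts.get(key, 0)
            new = old + d
            if old != 0:
                out.append(((key, old), -1))  # retract the previous output record
            if new != 0:
                out.append(((key, new), +1))  # assert the new output record
                self.counts[key] = new
            else:
                self.counts.pop(key, None)
        return out


ic = IncrementalCount()
print(ic.apply([("a", 1), ("b", 1), ("a", 1)]))  # [(('a', 2), 1), (('b', 1), 1)]
print(ic.apply([("a", -1), ("c", 1)]))           # [(('a', 2), -1), (('a', 1), 1), (('c', 1), 1)]
```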

4 References

Hu, Pehlevan, and Chklovskii. 2014. “A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization.” In 2014 48th Asilomar Conference on Signals, Systems and Computers.

McSherry, Isaacs, Isard, et al. 2013. Differential dataflow. US20130304744 A1.

Murray, McSherry, Isaacs, et al. 2013. “Naiad: A Timely Dataflow System.” In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. SOSP ’13.

Pan, Zhang, Wu, et al. 2014. “Online Community Detection for Large Complex Networks.” PLoS ONE.

Ryabko and Ryabko. 2010. “Nonparametric Statistical Inference for Ergodic Processes.” IEEE Transactions on Information Theory.

Sorensen and Gardner. 2010. “Programming with Time: Cyber-Physical Programming with Impromptu.” ACM SIGPLAN Notices.