Gesture recognition

I want to recognise gestures made with generic interface devices for artistic purposes, in realtime. Is that so much to ask?

Related: synestizer, time warping, functional data analysis, controller mapping.

To Use

  • Reverse engineer Face the music.

  • Gesture variation following has algorithms optimised specifically for realtime music and video control, using (AFAICT) a particle filter. This is a different approach from the other ones, which use off-the-shelf algorithms for the purpose, which leads to some difficulties. (Source is C++; puredata and maxmsp interfaces are available.)

  • GRT: The Gesture Recognition Toolkit. Other software for gesture recognition; lower level than Wekinator (default API is raw C++), with more powerful algorithms, although a less beguiling demo video. Now also includes a GUI and puredata/opensoundcontrol interfaces in addition to the original C++ API.

  • Eyesweb: An inscrutably under-explained GUI(?) for integrating UI stuff somehow or other.

  • Wekinator: Software for using machine learning to build real-time interactive systems. (Which is to say, a workflow optimised for ad-hoc, slippery, artsy applications of cold, hard, calculating machine learning techniques.)

  • Beautifully simple “graffiti” letter recogniser (NN-search on normalised characters, neat hack. Why you should always start from the simplest thing.) (via Chr15m)

  • How the Kinect recognises (spoiler: random forests)
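To make the "simplest thing" idea from the graffiti recogniser entry concrete, here is a minimal sketch of nearest-neighbour search on normalised strokes. It is not the linked recogniser's code; the function names (`resample`, `normalise`, `classify`), the 32-point resampling and the mean point-to-point distance are all my own assumptions for illustration.

```python
import math

def resample(points, n=32):
    """Resample a 2D polyline to n points evenly spaced along its arc length."""
    dists = [0.0]  # cumulative arc length at each input point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1]
    if total == 0.0:  # degenerate stroke: a single point or repeated points
        return [points[0]] * n
    out, j = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        # advance to the segment that contains the target arc length
        while j < len(dists) - 2 and dists[j + 1] < target:
            j += 1
        seg = dists[j + 1] - dists[j]
        t = 0.0 if seg == 0.0 else (target - dists[j]) / seg
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def normalise(points, n=32):
    """Resample, move the centroid to the origin, scale to fit a unit box."""
    pts = resample(points, n)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    centred = [(x - cx, y - cy) for x, y in pts]
    scale = max(max(abs(x), abs(y)) for x, y in centred) or 1.0
    return [(x / scale, y / scale) for x, y in centred]

def distance(a, b):
    """Mean Euclidean distance between corresponding points."""
    return sum(math.hypot(ax - bx, ay - by)
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)

def classify(stroke, templates, n=32):
    """Return the label of the nearest (label, normalised_points) template."""
    query = normalise(stroke, n)
    label, _ = min(templates, key=lambda t: distance(query, t[1]))
    return label

# Hypothetical templates: one example stroke per label.
templates = [
    ("line", normalise([(0, 0), (1, 1)])),
    ("vee", normalise([(0, 1), (0.5, 0), (1, 1)])),
]
print(classify([(10, 10), (20, 21), (30, 29)], templates))
```

Note that this sketch is sensitive to stroke direction and rotation: a line drawn backwards is a different gesture as far as it is concerned. Whether that is a bug or a feature depends on the application.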

BTW, you can also roll your own with any machine learning library; it’s not clear how much you need all the fancy time-warping tricks.
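For reference, the most basic of those time-warping tricks is small enough to write by hand. The following is a sketch of textbook dynamic time warping, not the method used by any of the tools listed above; `dtw_distance` is a name I made up, and the quadratic-time table is the naive version with no windowing or early abandonment.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of equal-dimension points.

    Naive O(len(a) * len(b)) dynamic programme; each step may advance
    one sequence, the other, or both.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance, Python 3.8+
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# The same 1-D gesture performed at two speeds, plus a reversed one.
fast = [(0,), (1,), (2,)]
slow = [(0,), (0,), (1,), (1,), (2,), (2,)]
backwards = [(2,), (1,), (0,)]

print(dtw_distance(fast, slow))       # tempo difference is warped away
print(dtw_distance(fast, backwards))  # a genuinely different gesture
```

The point of the warping is that the slow and fast performances align at zero cost, while the reversed gesture does not. If gestures arrive at a roughly consistent tempo, or are resampled to a fixed length first, a plain classifier on fixed-length features may do just as well.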

Likely bottlenecks are constructing a training data set and getting the cursed thing to work in real time. I should make some notes on that theme.

Apropos that, Museplayer can record opensoundcontrol data.


