TPL Dataflow

This blog post is coming a couple of weeks late due to my copy of Windows becoming corrupted – and strangely the only things I didn’t have backed up were this Oscilloscope app!

Anyway, last time I took a look at charting performance, but this post will investigate TPL Dataflow.

In my original F# post, I discussed that I felt that the code would have been better implemented in RX, which I then went and did in this post. However, RX isn’t the only recent new technology coming out of Microsoft that deals with streams of data. TPL Dataflow has a confusingly similar overlap with the usages of RX, and this is what its whitepaper has to say:

“Astute readers may notice some similarities between TPL Dataflow and Reactive Extensions (Rx), currently available as a download from the MSDN Data developer center. Rx is predominantly focused on coordination and composition of event streams with a LINQ-based API, providing a rich set of combinators for manipulating IObservables of data. In contrast, TPL Dataflow is focused on providing building blocks for message passing and parallelizing CPU- and I/O-intensive applications with high-throughput and low-latency, while also providing developers explicit control over how data is buffered and moves about the system. As such, Rx and TPL Dataflow, while potentially viewed as similar at a 30,000 foot level, address distinct needs. Even so, TPL Dataflow and Rx provide a better together story.”

That does sound very interesting – who doesn’t want better performance!

Implementation

I’ll dive in straight away and look at some code. The code is structured similarly to the RX code in my previous post.

/blog/2012-02-15-tpl-dataflow/images/clip_image00225255B425255D.png — clip_image001

/blog/2012-02-15-tpl-dataflow/images/clip_image00325255B425255D.png — clip_image002

clip_image004

/blog/2012-02-15-tpl-dataflow/images/clip_image00625255B425255D.png — clip_image005

/blog/2012-02-15-tpl-dataflow/images/clip_image00725255B525255D.png — clip_image006

/blog/2012-02-15-tpl-dataflow/images/clip_image00825255B425255D.png — clip_image007

http://www.taumuon.co.uk/wp-content/uploads/2012/02/clip_image008_thumb25255B125255D.png?imgmax=800 — clip_image008

The processor usage between the apps is in the same ballpark, but the TPL Dataflow app has many more garbage collections, and # bytes in all heaps also is running slightly higher. I had a quick look in the profiler, and it seems that there are many Task allocations from the Dataflow.Post() operation from within the SlidingWindow and Sample implementations.

The extra generation 1 collections don’t really matter for this application with the functionality that it currently has – the % time in GC barely registers, and there are no Gen 2 garbage collections over the minute or so that the app was running.

Once the app is doing more calculations and drawing more complex charts it would be interesting to see whether any latency spikes due gen 2 GCs cause similar slowdowns to the one discussed at the start of my previous post.

It would be an interesting exercise to limit the amount of GCs throughout the app, for instance there’s no need for the microphone access observable to return an IObservable<float[]> instead of IObservable; currently every read from the microphone allocates a new float[]. Similarly, new List are created to more easily interface with DynamicDataDisplay – it would be better to change the types of data that D3 takes to be more observable-friendly, and to save having so many allocations. Again, there’s not much point doing this, other than an interesting theoretical exercise, until the garbage collection overhead proves to be a performance issue.

Conclusion

For an application as simple as this, there isn’t any benefit to using TPL Dataflow – it is a powerful library, with functionality such as blocks being able to decline offered blocks, and request them later, which would be difficult to implement in RX. As my app doesn’t (currently) need that level of functionality, there’s no benefit to using the library.

I may revisit this in the future – if I had some computationally expensive operation (FFT for instance) where I’d want greater control over the flow of data through the system.