William McKnight

Stream Computing: When Low Latency Processing Matters

By William McKnight on May 5, 2011

I spent some time with Violin Memory on an expo show floor recently. They were demonstrating their memory arrays. A series of lights tracked the data flow through the drives, simulating where transactions are processed in flight. As the demonstration was tuned up to higher levels of concurrent data flow, the activity of the lights rose correspondingly. I was warned that if we got to a major enterprise production workload simulation, the demonstration would cease to be very demonstrative because the flow would not be visible to the naked eye, but I had to see it for myself. Sure enough, it looked as if the hundreds of lights had simply been turned “on,” with barely a flicker among them. That’s how much data movement there is in an enterprise.

I thought about the several companies I work with that are processing directly on these data streams rather than waiting until the data hits a database.

Every business that has a heartbeat has data streams. However, these become “streams” in the vernacular when processing them in real time becomes a requirement. That processing often involves advanced techniques that would be considered data mining (some using PMML), frequently applied to multiple streams at once. Streams are post-processing in the sense that the data is in play: it has already been entered into an input channel and is in the process of flowing to its next organizational landing spot.

For example, readings from multiple medical devices attached to a patient are analyzed to detect abnormalities that need to be acted upon right away. Similar activity happens on semiconductor fabrication lines, where streams of readouts are immediately correlated to determine whether the processes are within acceptable ranges. Irregularities do happen, and spotting one just a few minutes earlier, which translates into several hundred parts produced, can save the company millions of dollars.

But perhaps the biggest use of streams is by financial companies, which work with the smallest units of time to place trades, analyze profit and detect fraud. This is an example where multiple streams come into play. Financial companies can have hundreds of concurrent streams, all doing the same processing, because it takes that many to keep up with demand. Frequently, a given customer’s sequential transactions will be spread across many of these streams, and they all need to be analyzed together.
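To make the multi-stream point concrete, here is a minimal Python sketch of pulling one customer's transactions back together from several parallel streams. It is an illustration only, not how any particular product does it: the event layout (timestamp, customer id, amount) and the function names are assumptions, and each stream is assumed to already be in time order.

```python
import heapq
from collections import defaultdict


def merge_streams(*streams):
    """Merge several time-ordered streams into one time-ordered flow.

    Each event is a hypothetical (timestamp, customer_id, amount) tuple.
    heapq.merge is lazy, so it never holds a whole stream in memory.
    """
    return heapq.merge(*streams, key=lambda event: event[0])


def history_by_customer(events):
    """Group merged events per customer, preserving time order."""
    history = defaultdict(list)
    for timestamp, customer_id, amount in events:
        history[customer_id].append((timestamp, amount))
    return dict(history)


# Two parallel streams that each carry part of customer c1's activity.
stream_a = [(1, "c1", 100.0), (4, "c2", 50.0)]
stream_b = [(2, "c1", 25.0), (3, "c1", 75.0)]

merged = list(merge_streams(stream_a, stream_b))
history = history_by_customer(merged)
```

After the merge, `history["c1"]` holds that customer's three transactions in sequence even though they arrived on different streams, which is the view a fraud check needs.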

Stream programming can be pretty accessible to developers, since many of the products use an extended SQL. The extended SQL includes time series and pattern matching syntax that allows analysis to take place on the “last 5 minutes” or the “last 30 rows” of data. This brings up an interesting point about stream data: stream processing may be the only way to process it.
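The products express these windows in their own extended SQL, which varies by vendor. As a vendor-neutral sketch of the idea, the two kinds of window can be written in plain Python. The class names below are hypothetical, and timestamps are assumed to be seconds that arrive in non-decreasing order.

```python
from collections import deque


class LastNRows:
    """Count-based window: keeps only the most recent n values."""

    def __init__(self, n):
        self.values = deque(maxlen=n)  # older values fall off automatically

    def add(self, value):
        self.values.append(value)

    def average(self):
        return sum(self.values) / len(self.values)


class LastSeconds:
    """Time-based window: keeps values from the last `seconds` seconds."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        # Evict events that have aged out, measured from the newest event.
        while self.events and self.events[0][0] <= timestamp - self.seconds:
            self.events.popleft()

    def total(self):
        return sum(value for _, value in self.events)


rows = LastNRows(3)
for reading in [10, 20, 30, 40]:
    rows.add(reading)

five_minutes = LastSeconds(300)
five_minutes.add(0, 5)
five_minutes.add(100, 7)
five_minutes.add(301, 1)  # the event at t=0 is now older than 5 minutes
```

Neither window ever stores more than its own span of data, which is why this style of analysis can run against a flow that is too large to land in a database.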

Data flow from a stream is generally voluminous. Therefore, it may not be possible for the data to flow to a database. Consider terabytes of information per minute off a sensor reading system, or audio/video from the internet. Stream processing may be as good as it gets for that data. If you want integration, it should be “light” so as not to cause processing to fall behind. This is actually quite key to stream processing. As you fall behind the stream, you lose the real-time benefit. Careful analysis of the stream flow against processing and technology capability is required, but it is a misconception that stream processing can handle only a single transaction in a single flow.

Some of the best integration for stream data is with master data management hubs. These hubs contain summarized information about a customer profile and preset thresholds of acceptable activity, making it easy to compare a customer’s latest transaction(s) to that customer’s profile. For example, some customers routinely charge $50,000 items, which is known to be acceptable for them, while for others a $10,000 charge is out of character and would be cause for alarm. The “last…” feature of stream processing allows anyone’s transactions to be analyzed in light of recent transactions by that customer as well as, perhaps, by associated or unrelated customers: whatever might indicate fraud or call for intervention in the transaction processing stream.
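A rough Python sketch of that “light” profile lookup follows. The profile table, customer names and threshold values are made up for illustration; a real MDM hub would supply them. Treating an unknown customer as suspicious is also an assumption of this sketch, not a rule from any product.

```python
# Hypothetical per-customer thresholds, standing in for what an MDM hub
# would hold: the largest single charge considered in character.
PROFILES = {
    "big_spender": 60000.0,
    "modest_spender": 5000.0,
}


def is_out_of_character(customer_id, amount, profiles=PROFILES):
    """Return True when a charge exceeds the customer's preset threshold.

    Customers with no profile are flagged (an assumption of this sketch).
    """
    threshold = profiles.get(customer_id)
    if threshold is None:
        return True
    return amount > threshold


routine = is_out_of_character("big_spender", 50000.0)
alarming = is_out_of_character("modest_spender", 10000.0)
```

Here the $50,000 charge passes for the first customer while the $10,000 charge is flagged for the second. The lookup is a single in-memory read per transaction, which keeps the integration light enough not to fall behind the stream.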

Interestingly, some are finding that one of the benefits of stream processing is offloading burdened transaction systems.

The opportunity for real-time processing is everywhere. Stream processing is very important in the march to real-time. Robust application environments are being developed that will allow not only for stream processing, but also for learning from transaction workloads and incorporating that input into the decision basis for stream processing. For example, once fraudulent transactions are correlated with the transaction profiles that led up to the fraud, a matching pattern in the stream can be used to stop the transaction at an early point and with a high degree of confidence.

Data modeling in this environment is minimal. There is the MDM data modeling. An understanding of the transaction records is also very important. But there is no data modeling for a persistent store, at least not for the stream processing part of the data lifecycle. This is extreme low latency processing that requires minimal disruption to the existing look of the data. It would be worthwhile for every data modeler to learn about this exciting approach to data analysis and look for ways that stream processing can intercept flow and deliver microsecond insights to systems and people.


About the Author

William functions as Strategist, Lead Enterprise Information Architect, and Program Manager for complex, high-volume full life-cycle implementations worldwide utilizing the disciplines of data warehousing, master data management, business intelligence, data quality and operational business intelligence.

Vinay Dalviakar
May 10, 2011

Once again, great content. You make the complex understandable.  My company is financial and we look for “wash sales” in the stream with CEP.  It’s the only way we could do it and I’m sure we’ll be doing more in the stream.
