Our journal paper titled Viper: A Module for Communication-Layer Determinism and Scaling in Low-Latency Stream Processing has been accepted for publication in the Elsevier journal Future Generation Computer Systems!
The abstract follows:
Stream Processing Engines (SPEs) process continuous streams of data and produce results in real time, typically by analyzing one tuple at a time. In Fog architectures, the limited resources of edge devices, which enable scalable analysis close to the data sources, demand computationally efficient and energy-efficient SPEs. Among the vital processing properties that applications require from SPEs, determinism, which ensures consistent results regardless of how the analysis is parallelized, holds a strong position alongside throughput scalability and low processing latency. SPEs scale in throughput and latency by relying on shared-nothing parallelism: they deploy multiple copies of each operator and distribute tuples among them according to the operator's semantics. The coordination of the parallel operators' asynchronous analysis, required to enforce determinism, is then carried out by additional dedicated sorting operators. To prevent this costly coordination from becoming a bottleneck, we introduce the Viper communication module, which can be integrated into the SPE communication layer to speed up the coordination of the parallel threads analyzing the data. Using Apache Storm and data extracted from the Linear Road benchmark and a real-world smart grid system, we show that the Viper module improves throughput, latency and energy efficiency.
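The abstract does not spell out what a sorting operator has to do, so here is a rough Python illustration of the general technique. It is a minimal sketch, not Viper's implementation (which lives in Storm's communication layer). It assumes that each parallel upstream thread emits tuples in non-decreasing timestamp order. Under that assumption, a tuple is released only once every input has advanced past its timestamp, and ties are broken by input id, so the output order does not depend on how the threads happen to interleave. The names `DeterministicMerger` and `merge_all` are hypothetical.

```python
import heapq
from typing import Any, List, Optional, Sequence, Tuple


class DeterministicMerger:
    """Hypothetical sorting operator: merges tuples from parallel upstream
    threads into one timestamp-ordered stream whose order is independent of
    arrival interleaving. Assumes each input is sorted by timestamp."""

    def __init__(self, num_inputs: int) -> None:
        # Pending tuples ordered by (timestamp, input id, arrival counter).
        self._heap: List[Tuple[int, int, int, Any]] = []
        # Latest timestamp seen from each input (None until its first tuple).
        self._latest: List[Optional[int]] = [None] * num_inputs
        self._counter = 0

    def add(self, input_id: int, timestamp: int, payload: Any) -> List[Tuple[int, Any]]:
        """Buffer one tuple and return every tuple that is now safe to emit."""
        self._latest[input_id] = timestamp
        # The counter keeps per-input FIFO order and avoids comparing payloads.
        heapq.heappush(self._heap, (timestamp, input_id, self._counter, payload))
        self._counter += 1
        return self._release()

    def flush(self) -> List[Tuple[int, Any]]:
        """Emit everything still buffered, e.g. once all inputs have ended."""
        out = []
        while self._heap:
            ts, _, _, payload = heapq.heappop(self._heap)
            out.append((ts, payload))
        return out

    def _release(self) -> List[Tuple[int, Any]]:
        # While some input is silent, it may still send a smaller timestamp.
        if any(ts is None for ts in self._latest):
            return []
        # Every future tuple has timestamp >= bound, so tuples strictly
        # below it are already in their final position.
        bound = min(self._latest)
        out = []
        while self._heap and self._heap[0][0] < bound:
            ts, _, _, payload = heapq.heappop(self._heap)
            out.append((ts, payload))
        return out


def merge_all(num_inputs: int, arrivals: Sequence[Tuple[int, int, Any]]) -> List[Tuple[int, Any]]:
    """Feed (input_id, timestamp, payload) arrivals and collect the output."""
    merger = DeterministicMerger(num_inputs)
    out: List[Tuple[int, Any]] = []
    for input_id, ts, payload in arrivals:
        out.extend(merger.add(input_id, ts, payload))
    out.extend(merger.flush())
    return out
```

Feeding the same per-input sequences in two different global interleavings yields the same merged stream. Releasing only tuples strictly below the bound, rather than equal to it, is what keeps ties between inputs stable. The price is a final `flush`.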
Our paper titled LoCoVolt: Distributed Detection of Broken Meters in Smart Grids through Stream Processing has been accepted at the industrial track of the 12th ACM International Conference on Distributed and Event-Based Systems (DEBS)!
The abstract follows:
Smart Grids and Advanced Metering Infrastructures are rapidly replacing traditional energy grids. The cumulative computational power of their IT devices, which can be leveraged to continuously monitor the state of the grid, is nonetheless vastly underused.
This paper provides evidence of the potential of streaming analysis run on smart grid devices. We propose a structural component, which we name LoCoVolt (Local Comparison of Voltages), that detects, in a distributed fashion, malfunctioning smart meters that report erroneous power-quality information. It does so by comparing the voltage readings of meters that, because of their proximity in the network, are expected to report readings following similar trends. This information can enable utilities to react promptly and thus improve the timeliness, quality and safety of their services to society and, implicitly, their business value. Our implementation on Apache Flink, evaluated on resource-constrained hardware (i.e., with capacity similar to that of the hardware deployed in smart grids) with data from a real-world network, shows that the streaming paradigm can deliver efficient and effective monitoring tools, achieving these goals at almost no additional computational cost.
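The core idea is that meters close to each other in the network should report voltages with similar trends. The abstract does not fix a concrete comparison rule, so the following Python sketch is a hypothetical batch illustration of that idea, not LoCoVolt's Flink implementation. For each meter, it builds a reference trend from the per-timestep median of its neighbours' readings. It then flags the meter when the Pearson correlation with that reference falls below a threshold. The function names, the neighbour map and the 0.8 threshold are all assumptions.

```python
from statistics import median
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat (e.g. stuck) series has no trend to compare
    return cov / (vx * vy) ** 0.5


def suspicious_meters(
    readings: Dict[str, Sequence[float]],
    neighbours: Dict[str, List[str]],
    min_corr: float = 0.8,  # hypothetical threshold
) -> List[str]:
    """Flag meters whose voltage trend disagrees with their neighbours'."""
    flagged = []
    for meter, series in readings.items():
        peers = [readings[p] for p in neighbours.get(meter, []) if p in readings]
        if not peers:
            continue  # nothing to compare against
        # Per-timestep median of the neighbours: robust to one broken peer.
        reference = [median(vals) for vals in zip(*peers)]
        if pearson(series, reference) < min_corr:
            flagged.append(meter)
    return sorted(flagged)
```

Using the median rather than the mean as the reference keeps a single faulty neighbour from dragging down the scores of the healthy meters around it.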
Our paper titled Continuous and Parallel LiDAR Point-cloud Clustering has been accepted at the 38th IEEE International Conference on Distributed Computing Systems (ICDCS)!
The abstract follows:
In distributed, digitalized Internet of Things environments, we often need to analyze big data originating from high-rate sensors at the edge of the infrastructure. A characteristic example is light detection and ranging (LiDAR) technology, which senses surrounding objects at fine-grained resolution over large areas. Its data (known as point clouds), generated continuously at very high rates, can, through appropriate analysis, provide information that supports automated functionality in distributed cyber-physical systems; clustering point clouds is a key problem in extracting this type of information. Methods that solve this problem continuously can improve processing in fog architectures by enabling low-latency, efficient, streaming processing of data close to the sources; moreover, parallelism is a key requirement for exploiting the variety of computing architectures in this context.
We propose Lisco, a single-pass, continuous, Euclidean-distance-based clustering method for LiDAR point clouds that maximizes the granularity of the data-processing pipeline and thus exposes potential for data and pipeline parallelism. We further present its parallel version, P-Lisco, which is architecture-independent and exploits the parallelism revealed by Lisco’s algorithmic approach. Besides their algorithmic analysis, we provide a thorough experimental evaluation on architectures representative of high-end servers and of resource-constrained embedded devices, and we highlight the multiplicative improvements and scalability benefits of the proposed algorithms compared to the baseline, using both real-world and synthetic datasets to explore a wide spectrum of stress levels for the algorithms.
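For readers new to the underlying problem, a minimal Python sketch of plain Euclidean-distance clustering may help. Two points end up in the same cluster when they are connected by a chain of points, each within distance `eps` of the next. This is the textbook batch formulation, using a quadratic neighbour search and union-find. It does not reproduce Lisco's single-pass, pipelined approach or P-Lisco's parallelism. The function name `euclidean_clusters` and the `eps` value in the usage are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) of one LiDAR return


def euclidean_clusters(points: Sequence[Point], eps: float) -> List[List[int]]:
    """Group point indices into clusters of points chained within eps."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Follow parents to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)  # smallest index becomes root

    # O(n^2) neighbour search; real systems use spatial indexes or scan order.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                union(i, j)

    clusters: Dict[int, List[int]] = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

With `eps=0.6`, three points spaced 0.5 apart along the x axis form one cluster even though the outer two are 1.0 apart. This transitive chaining is what distinguishes Euclidean clustering from simple pairwise thresholding.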