Our paper titled “GeneaLog: Fine-Grained Data Streaming Provenance at the Edge” has been accepted at the 2018 ACM/IFIP/USENIX International Middleware Conference.
The abstract follows:
Fine-grained data provenance in stream processing allows linking each result tuple back to the source data that contributed to its generation, a capability beneficial to many big data applications: in security- and safety-related applications, for example, it can help debug analytical queries and thus facilitate the inspection of the conditions that trigger an alert. Furthermore, when data transmission or storage must be minimized, as in edge computing and cyber-physical systems, it can help identify which fraction of the source data should be prioritized.
The memory and processing time costs of fine-grained data provenance, which can be afforded by high-end servers, can nonetheless be prohibitive for the resource-constrained devices deployed in edge computing and cyber-physical systems. Motivated by this challenge, we present GeneaLog, a novel fine-grained data provenance technique for data streaming applications. Leveraging the logical dependencies of the data, GeneaLog takes advantage of cross-layer properties of the software stack and incurs a minimal, constant-size per-tuple overhead. Furthermore, it allows for a modular and efficient algorithmic implementation using only standard data streaming operators. This is particularly useful for streaming applications distributed across different physical nodes, since the provenance processing can be executed at separate nodes, orthogonally to the data processing. We evaluate a full-fledged implementation of GeneaLog using vehicular and smart grid applications, confirming that it efficiently captures fine-grained provenance data with minimal overhead.
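To give a flavor of what fine-grained provenance means in practice, here is a toy sketch (not GeneaLog's actual algorithm, which the paper describes in detail): each derived tuple keeps references to the input tuples it was computed from, so any result can be traced back to the source tuples that contributed to it.

```python
# Toy illustration of fine-grained provenance: derived tuples carry
# links to their inputs, so results can be traced back to sources.
# This is NOT GeneaLog's constant-overhead technique, just the idea
# of linking results to contributing source data.

class StreamTuple:
    def __init__(self, value, sources=()):
        self.value = value
        self.sources = sources  # provenance links to input tuples

def source(value):
    # A source tuple has no provenance links of its own
    return StreamTuple(value)

def op_sum(inputs):
    # An aggregate operator: the output links back to all its inputs
    return StreamTuple(sum(t.value for t in inputs), sources=tuple(inputs))

def provenance(t):
    # Walk the links back to the original source tuples
    if not t.sources:
        return [t]
    out = []
    for s in t.sources:
        out.extend(provenance(s))
    return out

a, b = source(3), source(4)
result = op_sum([a, b])
print([t.value for t in provenance(result)])  # [3, 4]
```

The naive version above stores a growing tree of references per result; the point of GeneaLog is precisely to avoid this, keeping the per-tuple overhead constant.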
Our journal paper titled “Viper: A Module for Communication-Layer Determinism and Scaling in Low-Latency Stream Processing” has been accepted for publication in the Elsevier journal Future Generation Computer Systems!
The abstract follows:
Stream Processing Engines (SPEs) process continuous streams of data and produce results in a real-time fashion, typically through one-at-a-time tuple analysis. In Fog architectures, the limited resources of the edge devices, which enable close-to-the-source scalable analysis, demand computationally and energy-efficient SPEs. Among the vital SPE processing properties required by applications, determinism, which ensures consistent results independently of the way the analysis is parallelized, has a strong position alongside scalability in throughput and low processing latency. SPEs scale in throughput and latency by relying on shared-nothing parallelism, deploying multiple copies of each operator, to which tuples are distributed based on the operator's semantics. The coordination of the asynchronous analysis of parallel operators required to enforce determinism is then carried out by additional dedicated sorting operators. To prevent this costly coordination from becoming a bottleneck, we introduce the Viper communication module, which can be integrated in the SPE communication layer to boost the coordination of the parallel threads analyzing the data. Using Apache Storm and data extracted from the Linear Road benchmark and a real-world smart grid system, we show benefits in throughput, latency, and energy efficiency from the use of the Viper module.
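The determinism-enforcing coordination the abstract mentions can be illustrated with a toy sketch (not Viper's implementation): if each parallel operator instance emits its tuples ordered by timestamp, merging the streams by timestamp yields one deterministic output order, regardless of how the work was split across threads.

```python
import heapq

# Toy sketch of the sorting step that enforces determinism: merge
# per-instance, timestamp-ordered output streams into a single
# deterministic stream. This illustrates the coordination cost Viper
# targets; it is not Viper's actual communication-layer mechanism.

# Output of two parallel operator instances, each sorted by timestamp
stream_a = [(1, "a1"), (4, "a2"), (6, "a3")]  # (timestamp, payload)
stream_b = [(2, "b1"), (3, "b2"), (5, "b3")]

# heapq.merge performs a streaming k-way merge of sorted inputs
merged = list(heapq.merge(stream_a, stream_b, key=lambda t: t[0]))
print(merged)
# [(1, 'a1'), (2, 'b1'), (3, 'b2'), (4, 'a2'), (5, 'b3'), (6, 'a3')]
```

However the scheduler interleaves the two instances at runtime, the merged order is always the same, which is exactly the consistency property determinism demands.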
Our paper titled “LoCoVolt: Distributed Detection of Broken Meters in Smart Grids through Stream Processing” has been accepted at the industrial track of the 12th ACM International Conference on Distributed and Event-Based Systems (DEBS)!
The abstract follows:
Smart Grids and Advanced Metering Infrastructures are rapidly replacing traditional energy grids. The cumulative computational power of their IT devices, which can be leveraged to continuously monitor the state of the grid, is nonetheless vastly underused.
This paper provides evidence of the potential of streaming analysis run at smart grid devices. We propose a structural component, which we name LoCoVolt (Local Comparison of Voltages), that is able to detect, in a distributed fashion, malfunctioning smart meters that report erroneous information about power quality. This is achieved by comparing the voltage readings of meters that, because of their proximity in the network, are expected to report readings following similar trends. This information allows utilities to react promptly and thus increase the timeliness, quality, and safety of their services to society and, implicitly, their business value. As we show, based on our implementation on Apache Flink and an evaluation conducted with resource-constrained hardware (i.e., with capacity similar to that of the hardware deployed in smart grids) and data from a real-world network, the streaming paradigm can deliver efficient and effective monitoring tools and thus achieve the desired goals at almost no additional computational cost.
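The core comparison idea can be sketched in a few lines (a simplification, not LoCoVolt's actual distributed algorithm; the threshold and neighbor structure here are invented for illustration): flag a meter whose voltage reading deviates from the median of its nearby meters by more than a threshold, since close-by meters should follow similar trends.

```python
from statistics import median

# Toy sketch of neighbor-based broken-meter detection. The threshold
# value and neighbor lists are illustrative assumptions, not values
# from the LoCoVolt paper.

def suspect_meters(readings, neighbors, threshold=5.0):
    """readings: meter id -> latest voltage reading (V);
    neighbors: meter id -> ids of meters nearby in the network."""
    flagged = []
    for meter, volts in readings.items():
        nearby = [readings[n] for n in neighbors[meter] if n in readings]
        # A meter far from its neighbors' median is a candidate fault
        if nearby and abs(volts - median(nearby)) > threshold:
            flagged.append(meter)
    return flagged

readings = {"m1": 230.1, "m2": 229.8, "m3": 251.0, "m4": 230.4}
neighbors = {"m1": ["m2", "m3", "m4"], "m2": ["m1", "m3", "m4"],
             "m3": ["m1", "m2", "m4"], "m4": ["m1", "m2", "m3"]}
print(suspect_meters(readings, neighbors))  # ['m3']
```

Each meter only needs the readings of its neighbors, which is what makes a distributed, close-to-the-source deployment on constrained smart grid hardware plausible.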