ScaleJoin accepted at IEEE Big Data 2015!

We got our ScaleJoin paper accepted at the IEEE International Conference on Big Data (IEEE Big Data)!
You can find the abstract below. See you in California!

Abstract
The inherently large and varying volumes of data generated to facilitate autonomous functionality in large scale cyber-physical systems demand near real-time processing of data streams, often as close to the sensing devices as possible. In this context, data streaming is imperative for data-intensive processing infrastructures. Stream joins, the streaming counterpart of database joins, compare tuples coming from different streams and constitute one of the most important and expensive data streaming operators. Dictated by the needs of big data streaming analytics, algorithmic implementations of stream joins have to be capable of efficiently processing bursty and rate-varying data streams in a deterministic and skew-resilient fashion. To leverage the design of modern multicore architectures, scalability and parallelism need to be addressed also in the algorithmic design.
In this paper we present ScaleJoin, an algorithmic construction for deterministic and parallel stream joins that guarantees all the above properties, thus filling in a gap in the existing state of the art. Key to the novelty of ScaleJoin is a new data structure, ScaleGate, and its lock-free implementation. ScaleGate facilitates concurrent data exchange and balances independent actions among processing threads; it also enables fine-grain parallelism while providing the necessary synchronization for deterministic processing. As a result, it allows ScaleJoin to run on an arbitrary number of processing threads that can evenly share the overall comparisons run in parallel and achieve high processing throughput and low processing latency. As we show, ScaleJoin not only guarantees deterministic, disjoint and skew-resilient parallelism, but also achieves higher throughput than state-of-the-art parallel stream joins.

Posted in Concurrent Data Structures, Data Streaming, Research, ScaleGate

STONE: A Streaming DDoS Defense Framework published! – Expert Systems With Applications (ESwA) 2015

Our STONE journal paper was recently published in Expert Systems With Applications (ESwA)! The abstract is below.

Abstract
Distributed Denial-of-Service (DDoS) attacks aim at rapidly exhausting the communication and computational power of a network target by flooding it with large volumes of malicious traffic. In order to be effective, a DDoS defense mechanism should detect and mitigate threats quickly, while allowing legitimate users access to the attack’s target. Nevertheless, defense mechanisms proposed in the literature tend not to address detection and mitigation challenges jointly, but rather focus solely on the detection or the mitigation facet. At the same time, they usually overlook the limitations of centralized defense frameworks that, when deployed physically close to a possible target, become ineffective if DDoS attacks are able to saturate the target’s incoming links.
This paper presents STONE, a framework with expert system functionality that provides effective and joint DDoS detection and mitigation. STONE characterizes regular network traffic of a service by aggregating it into common prefixes of IP addresses, and detecting attacks when the aggregated traffic deviates from the regular one. Upon detection of an attack, STONE allows traffic from known sources to access the service while discarding suspicious traffic. STONE relies on the data streaming processing paradigm in order to characterize and detect anomalies in real time. We implemented STONE on top of StreamCloud, an elastic and parallel-distributed Stream Processing Engine. The evaluation, conducted on real network traces, shows that STONE detects DDoS attacks rapidly, provides minimal degradation of legitimate traffic while mitigating a threat, and also exhibits a processing throughput that scales linearly with the number of nodes used to deploy and run it.

Posted in Data Streaming, DDoS, Research

5-minute Storm installation guide (single-node setup)

The following is a short guide to set up Storm quickly!

Note: this installation does not require sudo. Logs and other data maintained by ZooKeeper and Storm are kept in my home folder.

Create a directory for storm and enter it
mkdir storm
cd storm

Create a data directory
mkdir -p datadir/zookeeper

Download ZooKeeper and unzip it (adjust the version in the URL as appropriate)
wget http://apache.mirrors.spacedump.net/zookeeper/current/zookeeper-3.4.6.tar.gz
tar -xvf zookeeper-3.4.6.tar.gz

Download Storm and unzip it (adjust the version in the URL as appropriate)
wget http://apache.mirrors.spacedump.net/storm/apache-storm-0.9.5/apache-storm-0.9.5.tar.gz
tar -xvf apache-storm-0.9.5.tar.gz

Configure ZooKeeper
Add the following to zookeeper-3.4.6/conf/zoo.cfg (create the file if it does not exist)
tickTime=2000
dataDir=/home/username/storm/datadir/zookeeper
clientPort=2181
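
If you would rather not hard-code the username, the same file can be generated from a base directory. This is only a sketch: the function name write_zoo_cfg and its argument are made up for this example, and it assumes the base directory is the storm directory created in the first step.

```shell
# Hypothetical helper (not part of ZooKeeper): writes a minimal zoo.cfg under
# <base>/zookeeper-3.4.6/conf with dataDir pointing at <base>/datadir/zookeeper.
write_zoo_cfg() {
  local base="$1"
  local conf_dir="$base/zookeeper-3.4.6/conf"
  # Create both directories in case they do not exist yet.
  mkdir -p "$conf_dir" "$base/datadir/zookeeper"
  cat > "$conf_dir/zoo.cfg" <<EOF
tickTime=2000
dataDir=$base/datadir/zookeeper
clientPort=2181
EOF
}
```

Calling write_zoo_cfg "$HOME/storm" should produce the three lines above with the path expanded for the current user. Note that it overwrites any existing zoo.cfg.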

Configure Storm
Uncomment/add the following to apache-storm-0.9.5/conf/storm.yaml
storm.zookeeper.servers:
  - "127.0.0.1"
nimbus.host: "127.0.0.1"
storm.local.dir: "/home/username/storm/datadir/storm"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703

Start ZooKeeper
zookeeper-3.4.6/bin/zkServer.sh start

Start nimbus (in a separate shell, or in the background with &)
apache-storm-0.9.5/bin/storm nimbus

Start supervisor (in a separate shell, or in the background with &)
apache-storm-0.9.5/bin/storm supervisor

Start UI (in a separate shell, or in the background with &)
apache-storm-0.9.5/bin/storm ui
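
The three start commands can also be launched in the background from one shell. The sketch below is one possible way to do it, not something shipped with Storm: the function name start_storm_daemons, its arguments, and the .out file names are all made up for this example.

```shell
# Hypothetical helper: starts nimbus, supervisor and ui in the background,
# sending each daemon's console output to its own file in <log_dir>.
start_storm_daemons() {
  local storm_bin="$1" log_dir="$2"
  mkdir -p "$log_dir"
  for daemon in nimbus supervisor ui; do
    # nohup keeps the daemon running after the shell that started it exits.
    nohup "$storm_bin" "$daemon" > "$log_dir/$daemon.out" 2>&1 &
    echo "started $daemon (pid $!)"
  done
}
```

From the storm directory, this would be called as start_storm_daemons apache-storm-0.9.5/bin/storm datadir/console. The printed pids are handy for stopping the daemons later with kill.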

Try connecting to 127.0.0.1:8080. If you see the Storm UI, enjoy!
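
If the UI does not show up, a quick port check can tell which service is missing. The following is a rough sketch under a few assumptions: bash and the timeout command (GNU coreutils) are installed, and the ports are the ones used in this guide (2181 from zoo.cfg, 8080 for the UI). The function names port_open and check_storm_ports are made up for this example.

```shell
# Hypothetical helper: succeeds if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp pseudo-device, so bash must be installed.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Prints one status line per service, using the ports from this guide.
check_storm_ports() {
  local host="$1"
  for entry in zookeeper:2181 ui:8080; do
    local name="${entry%%:*}" port="${entry##*:}"
    if port_open "$host" "$port"; then
      echo "$name ($port): up"
    else
      echo "$name ($port): down"
    fi
  done
}
```

Running check_storm_ports 127.0.0.1 prints one line for ZooKeeper and one for the UI. A port that accepts connections only shows that something is listening there, not that the service is healthy.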

Notes:

  • I work on a remote server over ssh, so I run ssh -L 8080:localhost:8080 username@server to access port 8080 on the server from my local machine
  • This setup creates 4 workers. If you want more or fewer, modify supervisor.slots.ports
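
For a different number of workers, the supervisor.slots.ports block can be generated instead of typed by hand. This is a small sketch: the function name slots_yaml is made up, and it assumes consecutive ports starting at 6700 as in the configuration above.

```shell
# Hypothetical helper: prints a supervisor.slots.ports block with <count>
# consecutive ports, starting at <first_port> (6700 if not given).
slots_yaml() {
  local count="$1" first_port="${2:-6700}"
  echo "supervisor.slots.ports:"
  local i=0
  while [ "$i" -lt "$count" ]; do
    echo "    - $((first_port + i))"
    i=$((i + 1))
  done
}
```

For example, slots_yaml 6 prints the header line followed by ports 6700 to 6705. Paste the output into storm.yaml in place of the existing block and restart the supervisor.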
Posted in Data Streaming, programming, Storm