Hazelcast 1.8 (comparison of the grid/cloud computing frameworks - part II)
Test environment
During the fail-over tests we used the same test environment as in part I of our comparison. This time, however, we simulated node failures in the following order:
- intel4 went down 60 seconds after the computations began
- intel3 went down 180 seconds after the computations began
- intel2 went down 300 seconds after the computations began
Code
During the fail-over tests we used slightly modified code from part I of our comparison. We had to change two things:
- We added a new fail-over hook just before the task generation phase. It allowed us to precisely synchronize the test environment.
- We had to handle MemberLeftException. Hazelcast throws this exception from the Future.get() method when a node leaves the cluster while it is computing the task. In our example we simply resubmitted the failed tasks.
    while (tasks.size() > 0) {
        Map<String[], Future<FastBigInt128>> missingTasks = new HashMap<String[], Future<FastBigInt128>>();
        for (Map.Entry<String[], Future<FastBigInt128>> entry : tasks.entrySet()) {
            FastBigInt128 result;
            try {
                result = entry.getValue().get();
            } catch (MemberLeftException e) {
                Future<FastBigInt128> future = executorService.submit(new Agent(entry.getKey(), ++count));
                missingTasks.put(entry.getKey(), future);
                continue;
            }
            results.add(result);
        }
        tasks = missingTasks;
    }
You can find all the code in our repository: http://dacframe.org/lab
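The resubmission loop above can also be tried without a Hazelcast cluster. The following is only a minimal sketch of the same pattern on a plain java.util.concurrent thread pool. The names ResubmitSketch, NodeLeftException, Outcome and runAll are hypothetical and do not come from our test code. The "node failure" is simulated by letting selected tasks throw on their first attempt, and squaring a number stands in for the real computation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ResubmitSketch {
    // Hypothetical stand-in for Hazelcast's MemberLeftException.
    static class NodeLeftException extends RuntimeException {}

    // Collected results plus the number of tasks that had to be repeated.
    static class Outcome {
        final List<Integer> results = new ArrayList<>();
        int resubmitted = 0;
    }

    // Submits a task that squares its input; inputs listed in failOnce
    // throw NodeLeftException on their first attempt only.
    static Future<Integer> submit(ExecutorService executor, int input,
                                  Set<Integer> failOnce, Set<Integer> alreadyFailed) {
        return executor.submit(() -> {
            if (failOnce.contains(input) && alreadyFailed.add(input)) {
                throw new NodeLeftException();
            }
            return input * input;
        });
    }

    static Outcome runAll(List<Integer> inputs, Set<Integer> failOnce) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        Set<Integer> alreadyFailed = ConcurrentHashMap.newKeySet();
        Outcome outcome = new Outcome();
        try {
            Map<Integer, Future<Integer>> tasks = new LinkedHashMap<>();
            for (int input : inputs) {
                tasks.put(input, submit(executor, input, failOnce, alreadyFailed));
            }
            // Keep going until no task is left waiting for a result.
            while (!tasks.isEmpty()) {
                Map<Integer, Future<Integer>> missingTasks = new LinkedHashMap<>();
                for (Map.Entry<Integer, Future<Integer>> entry : tasks.entrySet()) {
                    try {
                        outcome.results.add(entry.getValue().get());
                    } catch (ExecutionException e) {
                        if (!(e.getCause() instanceof NodeLeftException)) {
                            throw new IllegalStateException(e.getCause());
                        }
                        // Simulated node failure: resubmit the same input.
                        missingTasks.put(entry.getKey(),
                                submit(executor, entry.getKey(), failOnce, alreadyFailed));
                        outcome.resubmitted++;
                    }
                }
                tasks = missingTasks;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
        return outcome;
    }

    public static void main(String[] args) {
        Outcome outcome = runAll(List.of(1, 2, 3, 4), Set.of(2, 4));
        System.out.println(outcome.results + ", repeated tasks: " + outcome.resubmitted);
    }
}
```

Note that a plain thread pool wraps the task's exception in an ExecutionException, so the sketch inspects the cause, whereas our Hazelcast code catches MemberLeftException directly. Results of repeated tasks arrive in a later pass, so the output order differs from the submission order.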
Results
The following table shows all the results, together with the average and standard deviation of the computation time:
| Hazelcast 1.8 | Computation time |
|---|---|
| Average | 501 223.60 |
| Std Deviation | 10 513.81 |

| Computation time | Repeated tasks |
|---|---|
| 494 447.00 | 52 |
| 526 467.00 | 56 |
| 500 203.00 | 73 |
| 508 741.00 | 73 |
| 497 407.00 | 48 |
| 492 832.00 | 67 |
| 501 321.00 | 73 |
| 490 616.00 | 98 |
| 494 899.00 | 20 |
| 505 303.00 | 73 |
As the table above shows, both the average time and the standard deviation are higher in the fail-over test than in our first comparison, which is to be expected given that three nodes left the cluster during the computation. The average time increased from 321 922.70 by 55.70%, and the standard deviation increased from 4 687.92 by 124%. A lot of tasks were computed several times (which is very cheap in Hazelcast), yet every run still completed. This means that Hazelcast is quite resilient to unexpected node failures.
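The summary figures can be rechecked from the ten computation times in the table. The sketch below is one way to do it. The class name FailoverStats and its methods are hypothetical, and the use of the sample formula (dividing by n - 1) is an inference from the published value of 10 513.81, not something stated in our original test scripts. The baseline values 321 922.70 and 4 687.92 are the part I figures quoted above.

```java
import java.util.Locale;

class FailoverStats {
    // Computation times from the fail-over results table.
    static final double[] TIMES = {
        494447, 526467, 500203, 508741, 497407,
        492832, 501321, 490616, 494899, 505303
    };

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1); assumed, see the note above.
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    // Relative increase from 'before' to 'after', in percent.
    static double percentIncrease(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        double avg = mean(TIMES);
        double sd = sampleStdDev(TIMES);
        System.out.printf(Locale.ROOT, "average = %.2f, std deviation = %.2f%n", avg, sd);
        System.out.printf(Locale.ROOT, "average increase = %.2f%%%n",
                percentIncrease(321922.70, avg));
        System.out.printf(Locale.ROOT, "std deviation increase = %.2f%%%n",
                percentIncrease(4687.92, sd));
    }
}
```

The computed values agree with the published ones once rounded to the precision used in the text.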
CPU
CPU usage (%user) gathered on all machines:
CPU usage (%system) gathered on all machines:
Memory
Memory usage gathered on all machines:
Network
Network usage (received bytes/s) gathered on all machines:
Network usage (transmitted bytes/s) gathered on all machines:



