Hadoop 0.20.1 (comparison of the grid/cloud computing frameworks - part II)
Test environment
For the fail-over tests we used the same test environment as in part I of our comparison, with one small configuration change: we had to increase dfs.replication from 1 to 3, so that HDFS blocks stay available when a node goes down.
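As an illustration of the replication change, the fragment below shows how dfs.replication is typically set in Hadoop 0.20.x. It is a sketch that assumes the standard conf/hdfs-site.xml file; the actual configuration file used in the tests is not shown here.

```xml
<!-- conf/hdfs-site.xml (assumed location, not taken from the test setup) -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- raised from 1 to 3 for the fail-over tests -->
    <value>3</value>
  </property>
</configuration>
```

Note that this value applies to files written after the change; files already stored in HDFS keep their old replication factor unless it is changed explicitly, for example with hadoop fs -setrep.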
We also simulated node failures during the tests, in the following order:
- intel4 went down 60 seconds after the start of the computation
- intel3 went down 180 seconds after the start of the computation
- intel2 went down 300 seconds after the start of the computation
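The failure order above can be sketched as a small timer-based schedule. This is only an illustration: the `take_down` callback and the `time_scale` parameter are hypothetical, and the way the nodes were actually brought down is not described here.

```python
import threading

# Offsets in seconds after the start of the computation, as listed above.
FAILURE_SCHEDULE = [("intel4", 60), ("intel3", 180), ("intel2", 300)]


def schedule_failures(schedule, take_down, time_scale=1.0):
    """Start one timer per node; each timer calls take_down(host) at its offset.

    take_down is a hypothetical callback that stops the given node.
    time_scale shrinks the offsets, which is handy for a dry run.
    """
    timers = []
    for host, offset_s in schedule:
        timer = threading.Timer(offset_s * time_scale, take_down, args=(host,))
        timer.daemon = True  # do not keep the process alive for pending timers
        timer.start()
        timers.append(timer)
    return timers
```

A dry run with `time_scale=0.001` and a callback that only records the host name is a quick way to check the order before pointing the callback at real machines.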
Code
For the fail-over tests we used the same code as in part I of our comparison. The only change was a new fail-over hook placed just before the job.waitForCompletion(true) invocation. It allowed us to precisely synchronize the test environment.
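The idea of a hook that fires just before the blocking call can be sketched as follows. This is not the code from the repository: `run_job_with_hook`, `failover_hook` and `wait_for_completion` are hypothetical names, and `wait_for_completion` merely stands in for job.waitForCompletion(true).

```python
import time


def run_job_with_hook(wait_for_completion, failover_hook):
    """Call the hook right before the blocking job call and time the job.

    failover_hook receives the start timestamp, so anything it triggers
    (for example failure timers) is measured from the same moment as the job.
    """
    start = time.monotonic()
    failover_hook(start)            # hypothetical hook: signal "computation started"
    result = wait_for_completion()  # stand-in for job.waitForCompletion(true)
    elapsed_s = time.monotonic() - start
    return result, elapsed_s
```

Because the hook runs immediately before the job is submitted, the failure offsets and the measured computation time share one reference point.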
You can find all the code in our repository: http://dacframe.org/lab
Results
The table below lists all results, together with the average and standard deviation of the computation time:
Hadoop 0.20.1

| Statistic (computation time) | Value |
|---|---|
| Average | 1 307 526.80 |
| Std Deviation | 181 570.01 |

| Computation time | Repeated tasks |
|---|---|
| 1 593 734.00 | 123 |
| 1 194 784.00 | 45 |
| 1 177 549.00 | 43 |
| 1 164 319.00 | 44 |
| 1 615 523.00 | 113 |
| 1 178 317.00 | 40 |
| 1 463 073.00 | 107 |
| 1 145 385.00 | 44 |
| 1 268 696.00 | 50 |
| 1 273 888.00 | 63 |
As the table above shows, both the average time and the standard deviation are much larger in the fail-over test than in our first comparison. The average time increased by 240.21% (from 384 331.60), and the standard deviation increased by roughly 3126% (from 5 629.01). Moreover, many tasks were computed more than once. This means that Hadoop is quite sensitive to unexpected node failures.
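The figures quoted above can be rechecked with a few lines of Python. The sketch assumes that the reported standard deviation is the sample form (dividing by n - 1), which is the variant that matches the table; the variable and function names are only illustrative.

```python
from statistics import mean, stdev

# Computation times from the fail-over results table (units as reported there).
failover_times = [
    1593734.00, 1194784.00, 1177549.00, 1164319.00, 1615523.00,
    1178317.00, 1463073.00, 1145385.00, 1268696.00, 1273888.00,
]

# Part I figures as quoted in the text.
baseline_average = 384331.60
baseline_std_dev = 5629.01


def percent_increase(old, new):
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


average = mean(failover_times)
std_dev = stdev(failover_times)  # sample standard deviation (divides by n - 1)

print(f"average:          {average:.2f}")
print(f"std deviation:    {std_dev:.2f}")
print(f"average increase: {percent_increase(baseline_average, average):.2f}%")
print(f"std dev increase: {percent_increase(baseline_std_dev, std_dev):.2f}%")
```

Run as-is, the script should reproduce the average and standard deviation from the table and the 240.21% increase of the average; the increase of the standard deviation comes out at about 3125.6%.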
CPU
CPU usage (%user) gathered on all machines:
CPU usage (%system) gathered on all machines:
Memory
Memory usage gathered on all machines:
Network
Network usage (received bytes/s) gathered on all machines:
Network usage (transmitted bytes/s) gathered on all machines:



