GridGain 2.1.1 (comparison of the grid/cloud computing frameworks - part I)
Test environment
GridGain 2.1.1 nodes are multi-threaded, so we launched only one process on each machine in order to fully utilize the available processing units. GridGain is fully distributed, so we did not have to launch any additional processes. The following figure (GridGain-test-env.png) shows the architecture of the test environment:
Code
We prepared two versions of the GridGain test:
- with GridTasks
- with ExecutorService
The first version had some issues with a large number of tasks (a known edge-case problem with "siblings" explosion). The second one performed much better with a large number of tasks, but was slightly slower in the other cases.
Test case I - using GridTasks
GridGain operates on GridTasks. Executing such a task is very simple:
    Grid grid = GridFactory.getGrid();
    GridTaskFuture<FastBigInt128> future = grid.execute(CMBFtask.class, args);
    log.info("Final result = " + future.get());
To perform the computations, we had to divide the problem into small tasks in the split method:
    public Collection<? extends GridJob> split(int gridSize, String[] arg) throws GridException {
        List<GridJob> jobs = new ArrayList<GridJob>();
        int n = arg.length > 0 ? Integer.parseInt(arg[0]) : 3;
        int level = arg.length > 1 ? Integer.parseInt(arg[1]) : 1000;
        int k = 1;
        Worker cmbfWorker = new Worker();

        log.info("Generating tasks (n=" + n + ", level=" + level + ")");
        for (int i = 1; i <= n; i++) {
            for (final String[] imageDesc : cmbfWorker.generateImages(i, level)) {
                // One job per generated image; k is passed as the job argument.
                jobs.add(new GridJobAdapter<Integer>(k++) {
                    public Serializable execute() throws GridException {
                        int k = getArgument();
                        Worker cmbfWorkr = new Worker();
                        FastBigInt128 result = cmbfWorkr.countInImage(imageDesc);
                        log.info("Image #" + k + " (" + sentTasks + "), result = " + result);
                        return result;
                    }
                });
                sentTasks++;
            }
        }
        log.info("Sent " + sentTasks + " tasks. Receiving results...");
        return jobs;
    }
After that, we had to gather the results in the reduce method:
    public FastBigInt128 reduce(List<GridJobResult> results) throws GridException {
        FastBigInt128 result = new FastBigInt128(0);
        // Accumulate the partial result returned by every job.
        for (GridJobResult res : results) {
            FastBigInt128 charCnt = res.getData();
            result.add(charCnt);
        }
        return result;
    }
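The split/reduce flow described above can be tried out without a grid. The following is a minimal sketch, not GridGain code: `SplitReduceSketch`, its `split`, `reduce` and `run` methods, and the squaring workload are all hypothetical stand-ins, and `BigInteger` is assumed as a replacement for `FastBigInt128`, whose API is not shown here. The jobs run one after another in a single JVM.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for the split/reduce flow; none of these names come from GridGain.
class SplitReduceSketch {

    // "split": turn the input range 1..n into independent jobs, one per value.
    static List<Supplier<BigInteger>> split(int n) {
        List<Supplier<BigInteger>> jobs = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            final int value = i;
            // Placeholder work: each job just squares its input.
            jobs.add(() -> BigInteger.valueOf(value).pow(2));
        }
        return jobs;
    }

    // "reduce": fold the partial results into a single total.
    static BigInteger reduce(List<BigInteger> partials) {
        BigInteger total = BigInteger.ZERO;
        for (BigInteger p : partials) {
            // BigInteger is immutable, so the value returned by add() must be kept.
            total = total.add(p);
        }
        return total;
    }

    // Runs the jobs sequentially in this JVM; a grid would distribute them to nodes.
    static BigInteger run(int n) {
        List<BigInteger> partials = new ArrayList<>();
        for (Supplier<BigInteger> job : split(n)) {
            partials.add(job.get());
        }
        return reduce(partials);
    }

    public static void main(String[] args) {
        System.out.println("Sum of squares 1..10 = " + run(10));
    }
}
```

One design point worth noting: the reduce method above calls result.add(charCnt) and ignores the return value, which only works if FastBigInt128.add mutates the object in place. With an immutable type such as BigInteger, the returned value has to be reassigned as in the sketch.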
Test case II - using ExecutorService
GridGain's distributed executor service operates on the Callable/Runnable interfaces. The main work was done in the Agent class, which implements the Callable interface:
    public class Agent implements Callable<FastBigInt128>, Serializable {
        private static final long serialVersionUID = 1L;
        private String[] imageDesc;
        private int z;

        public Agent(String[] imageDesc, int z) {
            this.imageDesc = imageDesc;
            this.z = z;
        }

        public FastBigInt128 call() {
            Worker cmbfWorker = new Worker();
            FastBigInt128 result = cmbfWorker.countInImage(imageDesc);
            System.out.println("\tResult from task #" + z + ", " + "\tvalue: " + result);
            return result;
        }
    }
To perform the computations, we had to divide the problem into small tasks and submit them to the executor service:
    for (int i = 1; i <= n; i++) {
        for (final String[] imageDesc : cmbfWorker.generateImages(i, level)) {
            Future<FastBigInt128> future = executorService.submit(new Agent(imageDesc, ++count));
            tasks.add(future);
        }
    }
After that, we had to gather the results:
    for (Future<FastBigInt128> w : tasks) {
        results.add(w.get());
    }
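The same submit-then-gather pattern can be run on a single machine with a plain java.util.concurrent thread pool. This is only a sketch under stated assumptions: the local pool from `Executors.newFixedThreadPool` stands in for GridGain's distributed executor service, `SquareAgent` and `sumOfSquares` are hypothetical names, and `BigInteger` replaces `FastBigInt128`.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Local, single-JVM illustration of the submit/gather pattern (not GridGain code).
class LocalExecutorSketch {

    // Hypothetical counterpart of Agent: squares its input instead of counting in an image.
    static class SquareAgent implements Callable<BigInteger> {
        private final int value;

        SquareAgent(int value) {
            this.value = value;
        }

        @Override
        public BigInteger call() {
            return BigInteger.valueOf(value).pow(2);
        }
    }

    static BigInteger sumOfSquares(int n, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService executorService = Executors.newFixedThreadPool(threads);
        try {
            // Submit phase: queue every task and keep its Future.
            List<Future<BigInteger>> tasks = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                tasks.add(executorService.submit(new SquareAgent(i)));
            }
            // Gather phase: get() blocks until the corresponding task has finished.
            BigInteger total = BigInteger.ZERO;
            for (Future<BigInteger> f : tasks) {
                total = total.add(f.get());
            }
            return total;
        } finally {
            // Let the worker threads terminate so the JVM can exit.
            executorService.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Total = " + sumOfSquares(10, 4));
    }
}
```

Unlike Agent, the local task does not need to implement Serializable, because it never leaves the JVM; on a grid the task object has to be shipped to another node.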
You can find all the above code in our code repository: http://dacframe.org/lab
Results
The following tables show all the results, together with the average and standard deviation values:
Test case I - using GridTasks
| GridGain 2.1.1 | Tasks: 341 | Tasks: 2705 | Tasks: 33700 |
|---|---|---|---|
| | 345 910 | 320 765 | 2 061 685 |
| | 343 918 | 318 508 | 2 101 760 |
| | 342 872 | 321 696 | 2 069 198 |
| | 344 427 | 320 416 | 2 086 688 |
| | 344 607 | 321 002 | 2 077 271 |
| | 346 853 | 321 822 | 2 073 270 |
| | 344 891 | 321 645 | 2 070 147 |
| | 344 304 | 321 304 | 2 099 394 |
| | 333 861 | 330 634 | 2 057 745 |
| | 346 812 | 320 915 | 2 104 562 |
| Average | 343 845.50 | 321 870.70 | 2 080 172.00 |
| Std Deviation | 3 726.36 | 3 223.44 | 16 979.01 |
As the table above shows, the standard deviation is largest, in absolute terms, for the 33700-task test case. This suggests that some sort of task pre-fetching took place and that the load balancer could be improved. Moreover, the significant growth in the number of tasks to compute (from 341 to 33700) also increased the computation time, roughly sixfold on average; this is a known edge-case problem with "siblings" explosion.
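The Average and Std Deviation rows can be re-derived from the ten individual runs. The sketch below assumes the sample standard deviation (dividing by n - 1), which appears to be what the table uses: for the "Tasks: 341" column it should give 343 845.50 and 3 726.36. The class and method names are hypothetical.

```java
// Recomputes the summary rows of the results table for one column.
class RunStats {

    static double mean(long[] xs) {
        double sum = 0;
        for (long x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation: squared deviations are divided by n - 1.
    static double sampleStdDev(long[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (long x : xs) {
            double d = x - m;
            squares += d * d;
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    public static void main(String[] args) {
        // The ten runs of the "Tasks: 341" column of the GridTasks table.
        long[] tasks341 = {345910, 343918, 342872, 344427, 344607,
                           346853, 344891, 344304, 333861, 346812};
        double m = mean(tasks341);
        double sd = sampleStdDev(tasks341);
        // Relative standard deviation makes columns with different averages comparable.
        System.out.printf("mean=%.2f stddev=%.2f relative=%.2f%%%n", m, sd, 100 * sd / m);
    }
}
```

Dividing the table's Std Deviation by its Average gives about 1.1% for 341 tasks and about 0.8% for 33700 tasks, so the spread of the largest test case is the biggest in absolute terms but not relative to its average.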
Test case II - using ExecutorService
| GridGain 2.1.1 ExecutorService | Tasks: 341 | Tasks: 2705 | Tasks: 33700 |
|---|---|---|---|
| | 367 343 | 339 679 | 349 300 |
| | 382 181 | 322 160 | 344 902 |
| | 346 471 | 341 664 | 354 741 |
| | 405 436 | 351 684 | 349 322 |
| | 350 801 | 329 408 | 347 721 |
| | 354 800 | 337 101 | 358 790 |
| | 391 611 | 339 180 | 352 707 |
| | 368 977 | 336 430 | 352 219 |
| | 366 933 | 344 230 | 353 664 |
| | 388 244 | 341 568 | 344 074 |
| Average | 372 279.70 | 338 310.40 | 350 744.00 |
| Std Deviation | 19 214.47 | 8 051.70 | 4 559.89 |
As the table above shows, the standard deviation is quite large in all test cases. This suggests that some sort of task pre-fetching took place and that the load balancer could be improved. However, the significant growth in the number of tasks to compute (from 341 to 33700) did not increase the computation time, which indicates that the communication overhead is small.
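To put the two scaling observations side by side, the averages from both tables can be compared directly. The snippet below is a small arithmetic sketch with hypothetical names; it divides the average for 33700 tasks by the average for 341 tasks in each test case.

```java
// Compares how the average time changes when going from 341 to 33700 tasks.
class ScalingRatio {

    // Ratio of the average at the largest task count to the average at the smallest.
    static double ratio(double avgSmall, double avgLarge) {
        return avgLarge / avgSmall;
    }

    public static void main(String[] args) {
        // Averages taken from the "Tasks: 341" and "Tasks: 33700" columns.
        double gridTasks = ratio(343845.50, 2080172.00);
        double executorService = ratio(372279.70, 350744.00);
        System.out.println("GridTasks ratio:       " + gridTasks);
        System.out.println("ExecutorService ratio: " + executorService);
    }
}
```

The GridTasks ratio comes out at roughly 6, while the ExecutorService ratio stays slightly below 1, which matches the comments under the two tables.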
CPU
CPU usage (%user and %system) gathered on the intel1 machine:
Memory
Memory usage gathered on the intel1 machine:
Network
Network usage (received and transmitted bytes/s) gathered on the intel1 machine:
Attachments
- GridGain-test-env.png (13.4 KB), added by jeremian 2 years ago: GridGain-test-env
- GridGain_2_2_1-cpu.png (28.1 KB), added by klider 2 years ago: GridGain-cpu
- GridGain_2_2_1-memory.png (25.2 KB), added by klider 2 years ago: GridGain-memory
- GridGain_2_2_1-network.png (35.6 KB), added by klider 2 years ago: GridGain-network



