You are asked to develop a replicator (client) that distributes a large job over a number of computers (a server group) on a single switched LAN; it should run on Linux. In this setup, a large (simulation) job can be divided into a number of small jobs, each of which can be assigned to one machine from the server group for execution. The execution results of the small jobs can be merged once all of them successfully terminate.
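For illustration, here is a minimal sketch of one way such a replicator could split a job, run the small jobs on the server group in parallel over SSH, and merge the results only after all of them finish. The host addresses, the /opt/job/run_chunk.sh worker script, and the job size are placeholders, not part of the actual requirement.

```python
# Sketch of a replicator that divides a large job, runs each small job on one
# machine from the server group over SSH, and merges the partial results.
# Assumes passwordless SSH and a hypothetical worker script on each host.
import subprocess
from concurrent.futures import ThreadPoolExecutor

SERVERS = ["192.168.1.11", "192.168.1.12", "192.168.1.13"]  # placeholder server group

def split_job(total_items, n_parts):
    """Divide the large job into (start, end) ranges, one per small job."""
    step = (total_items + n_parts - 1) // n_parts
    return [(i, min(i + step, total_items)) for i in range(0, total_items, step)]

def run_small_job(host, chunk):
    """Run one small job on one machine; its stdout is the partial result."""
    start, end = chunk
    proc = subprocess.run(
        ["ssh", host, "/opt/job/run_chunk.sh", str(start), str(end)],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout

def main():
    chunks = split_job(total_items=1_000_000, n_parts=len(SERVERS))
    with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
        partials = list(pool.map(run_small_job, SERVERS, chunks))
    # Merge only once every small job has terminated successfully
    # (pool.map re-raises if any ssh command failed).
    merged = "".join(partials)
    print(merged)

if __name__ == "__main__":
    main()
```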
Hello
I have already developed such a system. In my case, all the work runs on Linux VPS servers; right now I have 15 of them. Each one can run 3 predefined tasks, and I know for each VPS whether it is actually doing its work. When something goes wrong, I log into the broken VPS to bring it back up.
The tasks can update the scripts they use: if the script on the main server is newer than the one a VPS is running, that VPS downloads the new version for further use (see the sketch below).
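A minimal sketch of that update check, assuming the main server serves the script and its SHA-256 checksum over HTTP; the server address, URL paths, and file locations are placeholders for illustration.

```python
# Update check each VPS could run before starting a task: compare the local
# script's hash against the one published by the main server and re-download
# the script when they differ. All addresses and paths are assumptions.
import hashlib
import urllib.request
from pathlib import Path

MAIN_SERVER = "http://10.0.0.1"             # placeholder main server address
LOCAL_SCRIPT = Path("/opt/farm/task.sh")    # script the tasks run locally

def remote_hash():
    with urllib.request.urlopen(f"{MAIN_SERVER}/task.sh.sha256") as resp:
        return resp.read().decode().strip()

def local_hash():
    if not LOCAL_SCRIPT.exists():
        return ""
    return hashlib.sha256(LOCAL_SCRIPT.read_bytes()).hexdigest()

def update_if_newer():
    """Download the script from the main server when the local copy differs."""
    if remote_hash() != local_hash():
        with urllib.request.urlopen(f"{MAIN_SERVER}/task.sh") as resp:
            LOCAL_SCRIPT.write_bytes(resp.read())
        LOCAL_SCRIPT.chmod(0o755)

if __name__ == "__main__":
    update_if_newer()
```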
The control website has a panel where it is easy to turn task loading on or off for each VPS, to increase or decrease the capacity of the farm.
Requirements: one main server to collect the results of every task (in my case a powerful VPS with a lot of RAM and 4 CPU cores), plus one or more small VPSs to coordinate processing among the other VPSs.
I think your task is something similar. You can describe it in more detail in the chat message box.
Thanks
Have a nice day