MapReduce Cloud Computing Question:
How does fault tolerance work in MapReduce?
Answer:
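In a MapReduce job, the master pings each worker periodically. If a worker does not respond within a certain amount of time, the master marks that worker as failed. Any task that was in progress on the failed worker is rescheduled on another worker. Even completed map tasks are rescheduled, because their output was stored on the local disk of the failed worker and can no longer be read. Completed reduce tasks do not need to be re-run, since their output is written to a global file system. MapReduce can therefore handle large-scale worker failures simply by re-executing the affected tasks. The master can also write periodic checkpoints of its own state, so that if the master fails, a new copy can be started from the last checkpoint.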
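The ping-and-reschedule logic described above can be sketched in a few lines of Python. This is only an illustrative model, not the code of any real MapReduce implementation: the names `Master`, `Task`, `State`, `record_reply`, `check_workers` and the `timeout` parameter are all hypothetical, and time is passed in as a plain number so the example stays deterministic. It also assumes, as in the original MapReduce design, that map output lives on the worker's local disk while reduce output lives in a global file system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    IDLE = "idle"                # waiting to be (re)scheduled
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"


@dataclass
class Task:
    task_id: str
    kind: str                    # "map" or "reduce"
    state: State = State.IDLE
    worker: Optional[str] = None


class Master:
    """Hypothetical master that tracks ping replies and reschedules tasks."""

    def __init__(self, timeout):
        self.timeout = timeout   # assumed: max silence before a worker is failed
        self.last_seen = {}      # worker name -> time of last ping reply
        self.tasks = []

    def record_reply(self, worker, now):
        # Called whenever a worker answers a ping.
        self.last_seen[worker] = now

    def check_workers(self, now):
        # Mark every worker that has been silent for too long as failed.
        failed = [w for w, t in self.last_seen.items()
                  if now - t > self.timeout]
        for worker in failed:
            del self.last_seen[worker]
            self._reschedule(worker)
        return failed

    def _reschedule(self, worker):
        for task in self.tasks:
            if task.worker != worker:
                continue
            # Completed map output sat on the failed worker's local disk,
            # so it is lost; completed reduce output is assumed to be safe
            # in a global file system and is left alone.
            lost_output = (task.kind == "map"
                           and task.state is State.COMPLETED)
            if task.state is State.IN_PROGRESS or lost_output:
                task.state = State.IDLE
                task.worker = None
```

In use, the master would call `record_reply` each time a worker answers a ping and `check_workers` on a timer. Any task reset to `State.IDLE` is then eligible to be handed to a healthy worker by whatever scheduler is in place.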