Mars RemoteΒΆ

Mars remote provides a simple but powerful way to execute Python functions in parallel.

Assume we have the code below.

>>> def add_one(x):
>>>     return x + 1
>>>
>>> def sum_all(xs):
>>>     return sum(xs)
>>>
>>> x_list = []
>>> for i in range(10):
>>>     x_list.append(add_one(i))
>>>
>>> print(sum_all(x_list))
55

Here we call add_one 10 times, then call sum_all to get the summation.

In order to make 10 add_one running in parallel, we can rewrite the code as below.

>>> import mars.remote as mr
>>>
>>> def add_one(x):
>>>     return x + 1
>>>
>>> def sum_all(xs):
>>>     return sum(xs)
>>>
>>> x_list = []
>>> for i in range(10):
>>>    x_list.append(mr.spawn(add_one, args=(i,)))
>>> print(mr.spawn(sum_all, args=(x_list,)).execute().fetch())
55

The code is quite similar with the previous one, except that calls to add_one and sum_all is replaced by mars.remote.spawn. mars.remote.spawn does not trigger execution, but instead returns a Mars Object, and the object can be passed to another mars.remote.spawn as an argument. Once .execute() is triggered, the 10 add_one will run in parallel. Once they were finished, sum_all will be triggered. Mars can handle the dependencies correctly, and for the distributed setting, Users need not to worry about the data movements between different workers, Mars can handle them automatically.

Refer to guidance for Mars remote for more information.