_images/mars-logo-title.png

Mars is a tensor-based unified framework for large-scale data computation.

Mars tensor

documentation

Mars tensor provides a familiar interface like Numpy.

Numpy Mars tensor
import numpy as np
a = np.random.rand(1000, 2000)
(a + 1).sum(axis=1)
import mars.tensor as mt
a = mt.random.rand(1000, 2000)
(a + 1).sum(axis=1).execute()

Mars dataframe

documentation

Mars DataFrame provides a familiar interface like pandas.

Pandas Mars DataFrame
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(100000000, 4),
                  columns=list('abcd'))
print(df.sum())
import mars.tensor as mt
import mars.dataframe as md
df = md.DataFrame(mt.random.rand(100000000, 4),
                  columns=list('abcd'))
print(df.sum().execute())

Mars learn

documentation

Mars learn provides a familiar interface like scikit-learn.

Scikit-learn Mars learn
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
X, y = make_blobs(
    n_samples=100000000, n_features=3,
    centers=[[3, 3, 3], [0, 0, 0],
             [1, 1, 1], [2, 2, 2]],
    cluster_std=[0.2, 0.1, 0.2, 0.2],
    random_state=9)
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
from mars.learn.datasets import make_blobs
from mars.learn.decomposition import PCA
X, y = make_blobs(
    n_samples=100000000, n_features=3,
    centers=[[3, 3, 3], [0, 0, 0],
              [1, 1, 1], [2, 2, 2]],
    cluster_std=[0.2, 0.1, 0.2, 0.2],
    random_state=9)
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)

Easy to scale in and scale out

Mars can scale in to a single machine, and scale out to a cluster with hundreds of machines. Both the local and distributed version share the same piece of code, it’s fairly simple to migrate from a single machine to a cluster due to the increase of data.