API Reference

This is the class and function reference for Mars learn.


Samples generator

datasets.make_blobs([n_samples, n_features, …]) Generate isotropic Gaussian blobs for clustering.
datasets.make_classification([n_samples, …]) Generate a random n-class classification problem.
datasets.make_low_rank_matrix([n_samples, …]) Generate a mostly low-rank matrix with bell-shaped singular values.

Matrix Decomposition



Classification metrics

metrics.accuracy_score(y_true, y_pred[, …]) Accuracy classification score.
metrics.auc(x, y[, session, run_kwargs]) Compute the Area Under the Curve (AUC) using the trapezoidal rule.
metrics.roc_curve(y_true, y_score[, …]) Compute the Receiver Operating Characteristic (ROC) curve.
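
The trapezoidal rule named in the metrics.auc entry can be illustrated without Mars. The following is a minimal plain-Python sketch of the arithmetic only; the helper name trapezoid_auc and the sample ROC points are hypothetical and are not part of the Mars API.

```python
def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i]).

    Hypothetical helper for illustration; this is not the Mars implementation.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        raise ValueError("at least two points are needed")
    area = 0.0
    for i in range(1, len(x)):
        # Each segment contributes width * average height.
        area += (x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2.0
    return area


# Made-up ROC points: false positive rate on x, true positive rate on y.
fpr = [0.0, 0.0, 0.5, 1.0]
tpr = [0.0, 0.5, 1.0, 1.0]
area = trapezoid_auc(fpr, tpr)
print(area)
```

In practice the x and y inputs would typically be the false positive and true positive rates produced by a ROC computation.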

Pairwise metrics

metrics.pairwise.cosine_similarity(X[, Y, …]) Compute cosine similarity between samples in X and Y.
metrics.pairwise.cosine_distances(X[, Y]) Compute cosine distance between samples in X and Y.
metrics.pairwise.euclidean_distances(X[, Y, …]) Considering the rows of X (and Y=X) as vectors, compute the distance matrix between each pair of vectors.
metrics.pairwise.haversine_distances(X[, Y]) Compute the Haversine distance between samples in X and Y.
metrics.pairwise.manhattan_distances(X[, Y, …]) Compute the L1 distances between the vectors in X and Y.
metrics.pairwise.rbf_kernel(X[, Y, gamma]) Compute the rbf (gaussian) kernel between X and Y.
metrics.pairwise_distances(X[, Y, metric])
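
As a rough illustration of what the pairwise metrics listed above compute, here is a plain-Python sketch that builds the matrix of metric values between every row of X and every row of Y. All function names in it are hypothetical stand-ins for the arithmetic, not the Mars functions themselves, and the cosine helper assumes non-zero vectors.

```python
import math


def cosine_similarity(u, v):
    # Dot product divided by the product of the L2 norms (assumes non-zero vectors).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    # Cosine distance is defined as 1 minus the cosine similarity.
    return 1.0 - cosine_similarity(u, v)


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    # L1 distance: sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(u, v))


def pairwise(X, Y, metric):
    # Entry [i][j] holds metric(X[i], Y[j]).
    return [[metric(x, y) for y in Y] for x in X]


X = [[1.0, 0.0], [0.0, 2.0]]
Y = [[3.0, 4.0]]
cos_sim = pairwise(X, Y, cosine_similarity)
cos_dist = pairwise(X, Y, cosine_distance)
l1 = pairwise(X, Y, manhattan_distance)
l2 = pairwise(X, Y, euclidean_distance)
```

The result always has one row per sample in X and one column per sample in Y, which is the shape convention the entries above describe.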

Splitter Functions

model_selection.train_test_split(*arrays, …) Split arrays or matrices into random train and test subsets.
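
The idea behind a random train/test split can be sketched in plain Python as shuffling the sample indices and cutting them at a chosen fraction. The helper split_indices, its parameters, and its defaults are assumptions made for this illustration and do not reflect the Mars signature.

```python
import math
import random


def split_indices(n_samples, test_size=0.25, seed=0):
    """Return (train_indices, test_indices) after a seeded shuffle.

    Hypothetical helper for illustration; not the Mars implementation.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_samples * test_size)
    return indices[n_test:], indices[:n_test]


features = [[i, i * 10] for i in range(8)]
labels = [i % 2 for i in range(8)]

train_idx, test_idx = split_indices(len(features), test_size=0.25, seed=42)

# Apply the same indices to every array so rows stay aligned.
X_train = [features[i] for i in train_idx]
X_test = [features[i] for i in test_idx]
y_train = [labels[i] for i in train_idx]
y_test = [labels[i] for i in test_idx]
```

Using one index permutation for all arrays is what keeps each feature row paired with its label after the split.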

Nearest Neighbors


Preprocessing and Normalization

preprocessing.normalize(X[, norm, axis, …]) Scale input vectors individually to unit norm (vector length).
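
Scaling each input vector to unit norm amounts to dividing every row by its own norm. The sketch below shows that computation in plain Python; the helper normalize_rows is hypothetical, and leaving all-zero rows unchanged is an assumption of this sketch rather than documented Mars behavior.

```python
import math


def normalize_rows(X, norm="l2"):
    """Scale each row of X to unit norm (illustrative helper, not the Mars API)."""
    result = []
    for row in X:
        if norm == "l2":
            scale = math.sqrt(sum(v * v for v in row))
        elif norm == "l1":
            scale = sum(abs(v) for v in row)
        elif norm == "max":
            scale = max(abs(v) for v in row)
        else:
            raise ValueError("norm must be 'l1', 'l2' or 'max'")
        # Assumption: rows whose norm is zero are returned unchanged.
        result.append([v / scale for v in row] if scale else list(row))
    return result


X = [[3.0, 4.0], [0.0, 0.0], [1.0, -1.0]]
unit_l2 = normalize_rows(X, norm="l2")
unit_l1 = normalize_rows(X, norm="l1")
```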

Semi-Supervised Learning



Utilities

utils.assert_all_finite(X[, allow_nan, …])
utils.check_array(array[, accept_sparse, …]) Input validation on a tensor, list, sparse matrix or similar.
utils.check_consistent_length(*arrays[, …]) Check that all arrays have consistent first dimensions.
utils.multiclass.type_of_target(y) Determine the type of data indicated by the target.
utils.multiclass.is_multilabel(y) Check if y is in a multilabel format.
utils.shuffle(*arrays, **options)
utils.validation.column_or_1d(y[, warn]) Ravel a column or 1-D NumPy array; otherwise raise an error.

TensorFlow Integration

contrib.tensorflow.run_tensorflow_script(…) Run TensorFlow script in Mars cluster.

XGBoost Integration

contrib.xgboost.MarsDMatrix(data[, label, …])
contrib.xgboost.train(params, dtrain[, evals]) Train XGBoost model in Mars manner.
contrib.xgboost.predict(model, data[, …])