mars.dataframe.DataFrame

class mars.dataframe.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False, chunk_size=None, gpu=None, sparse=None, num_partitions=None)[source]
__init__(data=None, index=None, columns=None, dtype=None, copy=False, chunk_size=None, gpu=None, sparse=None, num_partitions=None)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__([data, index, columns, dtype, …])

Initialize self.

abs()

add(other[, axis, level, fill_value])

Get Addition of dataframe and other, element-wise (binary operator add).

agg([func, axis])

aggregate([func, axis])

all([axis, bool_only, skipna, level, …])

any([axis, bool_only, skipna, level, …])

append(other[, ignore_index, …])

apply(func[, axis, raw, result_type, args, …])

Apply a function along an axis of the DataFrame.

astype(dtype[, copy, errors])

Cast a pandas object to a specified dtype.

backfill([axis, inplace, limit, downcast])

Synonym for DataFrame.fillna() with method='bfill'.

bfill([axis, inplace, limit, downcast])

Synonym for DataFrame.fillna() with method='bfill'.

cartesian_chunk(right, func[, args])

copy()

copy_from(obj)

copy_to(target)

corr([method, min_periods])

Compute pairwise correlation of columns, excluding NA/null values.

corrwith(other[, axis, drop, method])

Compute pairwise correlation.

count([axis, level, numeric_only, combine_size])

cummax([axis, skipna])

cummin([axis, skipna])

cumprod([axis, skipna])

cumsum([axis, skipna])

describe([percentiles, include, exclude])

diff([periods, axis])

First discrete difference of element.

div(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator truediv).

dot(other)

Compute the matrix multiplication between the DataFrame and other.

drop([labels, axis, index, columns, level, …])

Drop specified labels from rows or columns.

drop_duplicates([subset, keep, inplace, …])

Return DataFrame with duplicate rows removed.

dropna([axis, how, thresh, subset, inplace])

Remove missing values.

duplicated([subset, keep, method])

Return boolean Series denoting duplicate rows.

eq(other[, axis, level])

Get Equal to of dataframe and other, element-wise (binary operator eq).

eval(expr[, inplace])

Evaluate a string describing operations on DataFrame columns.

ewm([com, span, halflife, alpha, …])

Provide exponential weighted functions.

execute([session])

expanding([min_periods, center, axis])

Provide expanding transformations.

explode(column[, ignore_index])

Transform each element of a list-like to a row, replicating index values.

ffill([axis, inplace, limit, downcast])

Synonym for DataFrame.fillna() with method='ffill'.

fillna([value, method, axis, inplace, …])

Fill NA/NaN values using the specified method.

floordiv(other[, axis, level, fill_value])

Get Integer division of dataframe and other, element-wise (binary operator floordiv).

from_records(records, **kw)

from_tensor(in_tensor[, index, columns])

ge(other[, axis, level])

Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).

groupby([by, level, as_index, sort, group_keys])

gt(other[, axis, level])

Get Greater than of dataframe and other, element-wise (binary operator gt).

head([n])

Return the first n rows.

insert(loc, column, value[, allow_duplicates])

Insert column into DataFrame at specified location.

isin(values)

Whether each element in the DataFrame is contained in values.

isna()

Detect missing values.

isnull()

Detect missing values.

iterrows([batch_size, session])

Iterate over DataFrame rows as (index, Series) pairs.

itertuples([index, name, batch_size, session])

Iterate over DataFrame rows as namedtuples.

join(other[, on, how, lsuffix, rsuffix, …])

keys()

Get the ‘info axis’ (see Indexing for more).

kurt([axis, skipna, level, numeric_only, …])

kurtosis([axis, skipna, level, …])

le(other[, axis, level])

Get Less than or equal to of dataframe and other, element-wise (binary operator le).

lt(other[, axis, level])

Get Less than of dataframe and other, element-wise (binary operator lt).

map_chunk(func[, args])

Apply function to each chunk.

mask(cond[, other, inplace, axis, level, …])

Replace values where the condition is True.

max([axis, skipna, level, numeric_only, …])

mean([axis, skipna, level, numeric_only, …])

melt([id_vars, value_vars, var_name, …])

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

memory_usage([index, deep])

Return the memory usage of each column in bytes.

merge(right[, how, on, left_on, right_on, …])

min([axis, skipna, level, numeric_only, …])

mod(other[, axis, level, fill_value])

Get Modulo of dataframe and other, element-wise (binary operator mod).

mul(other[, axis, level, fill_value])

Get Multiplication of dataframe and other, element-wise (binary operator mul).

multiply(other[, axis, level, fill_value])

Get Multiplication of dataframe and other, element-wise (binary operator mul).

ne(other[, axis, level])

Get Not equal to of dataframe and other, element-wise (binary operator ne).

notna()

Detect existing (non-missing) values.

notnull()

Detect existing (non-missing) values.

nunique([axis, dropna, combine_size])

Count distinct observations over requested axis.

pad([axis, inplace, limit, downcast])

Synonym for DataFrame.fillna() with method='ffill'.

pct_change([periods, fill_method, limit, freq])

Percentage change between the current and a prior element.

pop(item)

Return item and drop from frame.

pow(other[, axis, level, fill_value])

Get Exponential power of dataframe and other, element-wise (binary operator pow).

prod([axis, skipna, level, min_count, …])

product([axis, skipna, level, min_count, …])

quantile([q, axis, numeric_only, interpolation])

Return values at the given quantile over requested axis.

query(expr[, inplace])

Query the columns of a DataFrame with a boolean expression.

radd(other[, axis, level, fill_value])

Get Addition of dataframe and other, element-wise (binary operator radd).

rdiv(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

rebalance([factor, axis, num_partitions, …])

Make data more balanced across the entire cluster.

rechunk(chunk_size[, threshold, …])

reindex(*args, **kwargs)

Conform Series/DataFrame to new index with optional filling logic.

reindex_like(other[, method, copy, limit, …])

Return an object whose indices match those of the other object.

rename([mapper, index, columns, axis, copy, …])

Alter axes labels.

rename_axis([mapper, index, columns, axis, …])

Set the name of the axis for the index or columns.

replace([to_replace, value, inplace, limit, …])

Replace values given in to_replace with value.

reset_index([level, drop, inplace, …])

Reset the index, or a level of it.

rfloordiv(other[, axis, level, fill_value])

Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).

rmod(other[, axis, level, fill_value])

Get Modulo of dataframe and other, element-wise (binary operator rmod).

rmul(other[, axis, level, fill_value])

Get Multiplication of dataframe and other, element-wise (binary operator rmul).

rolling(window[, min_periods, center, …])

Provide rolling window calculations.

round([decimals])

Round a DataFrame to a variable number of decimal places.

rpow(other[, axis, level, fill_value])

Get Exponential power of dataframe and other, element-wise (binary operator rpow).

rsub(other[, axis, level, fill_value])

Get Subtraction of dataframe and other, element-wise (binary operator rsub).

rtruediv(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

sample([n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

select_dtypes([include, exclude])

Return a subset of the DataFrame’s columns based on the column dtypes.

sem([axis, skipna, level, ddof, …])

set_axis(labels[, axis, inplace])

Assign desired index to given axis.

set_index(keys[, drop, append, inplace, …])

shift([periods, freq, axis, fill_value])

Shift index by desired number of periods with an optional time freq.

skew([axis, skipna, level, numeric_only, …])

sort_index([axis, level, ascending, …])

Sort object by labels (along an axis).

sort_values(by[, axis, ascending, inplace, …])

Sort by the values along either axis.

stack([level, dropna])

Stack the prescribed level(s) from columns to index.

std([axis, skipna, level, ddof, …])

sub(other[, axis, level, fill_value])

Get Subtraction of dataframe and other, element-wise (binary operator sub).

sum([axis, skipna, level, min_count, …])

tail([n])

Return the last n rows.

tiles()

to_cpu()

to_csv(path[, sep, na_rep, float_format, …])

Write object to a comma-separated values (csv) file.

to_gpu()

to_pandas([session])

to_parquet(path[, engine, compression, …])

Write a DataFrame to the binary Parquet format. Each chunk is written to its own Parquet file.

to_sql(name, con[, schema, if_exists, …])

Write records stored in a DataFrame to a SQL database.

to_tensor()

to_vineyard([vineyard_socket])

transform(func[, axis, dtypes])

Call func on self producing a DataFrame with transformed values.

truediv(other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator truediv).

tshift([periods, freq, axis])

Shift the time index, using the index’s frequency if available.

var([axis, skipna, level, ddof, …])

where(cond[, other, inplace, axis, level, …])

Replace values where the condition is False.
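Several of the methods listed above come in pairs whose one-line summaries are easy to confuse: the flexible arithmetic methods and their reverse forms (sub/rsub), ffill/bfill, and where/mask. The sketch below illustrates their semantics. It uses plain pandas rather than Mars, on the assumption that the Mars methods follow the pandas behaviour their summaries are taken from; the frames and variable names are made up for illustration, and a Mars DataFrame would additionally need to be executed before its values can be inspected.

```python
import pandas as pd

left = pd.DataFrame({"x": [1.0, None, 3.0]})
right = pd.DataFrame({"x": [10.0, 20.0, None]})

# fill_value stands in for a value that is missing on one side only.
total = left.add(right, fill_value=0)   # x: 11.0, 20.0, 3.0

# Reverse operators swap the operands: left.rsub(10) computes 10 - left.
reversed_diff = left.rsub(10)           # x: 9.0, NaN, 7.0

# ffill carries the last valid value forward; bfill pulls the next one back.
forward = left.ffill()                  # x: 1.0, 1.0, 3.0
backward = left.bfill()                 # x: 1.0, 3.0, 3.0

# where keeps entries whose condition is True and replaces the rest;
# mask does the opposite and replaces entries whose condition is True.
nums = pd.DataFrame({"x": [1, 2, 3, 4]})
kept = nums.where(nums > 2, other=0)    # x: 0, 0, 3, 4
hidden = nums.mask(nums > 2, other=0)   # x: 1, 2, 0, 0
```

Note that fill_value only applies when exactly one operand is missing; a position that is missing on both sides stays missing.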

Attributes

at

Access a single value for a row/column label pair.

columns

data

dtypes

Return the dtypes in the DataFrame.

iat

iloc

index

loc

ndim

Return an int representing the number of axes / array dimensions.

shape

size

type_name

values
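
Most of the attributes above carry no summary, so here is a minimal sketch of what the shape-related attributes and the four indexers return. It again uses plain pandas, assuming the Mars attributes mirror their pandas namesakes; the sample frame is hypothetical. In Mars the computation is deferred, so metadata such as shape may not be fully known until the DataFrame has been executed.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]},
    index=["r0", "r1", "r2"],
)

# Shape metadata: (rows, columns), number of axes, total element count.
shape = df.shape            # (3, 2)
ndim = df.ndim              # 2
size = df.size              # 6

# at / iat read a single value by label or by integer position.
by_label = df.at["r1", "b"]     # 1.5
by_position = df.iat[2, 0]      # 3

# loc / iloc select by label or by integer position and accept slices.
label_slice = df.loc["r0":"r1", "a"]   # label slices include the end label
position_slice = df.iloc[0:2]          # position slices exclude the end

column_names = list(df.columns)        # ['a', 'b']
```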