pyspark.pandas.concat

pyspark.pandas.concat(objs: List[Union[pyspark.pandas.frame.DataFrame, pyspark.pandas.series.Series]], axis: Union[int, str] = 0, join: str = 'outer', ignore_index: bool = False, sort: bool = False) → Union[pyspark.pandas.series.Series, pyspark.pandas.frame.DataFrame]

Concatenate pandas-on-Spark objects along a particular axis with optional set logic along the other axes.
- Parameters
- objs : a sequence of Series or DataFrame
Any None objects will be dropped silently unless they are all None, in which case a ValueError will be raised.
- axis : {0/'index', 1/'columns'}, default 0
The axis to concatenate along.
- join : {'inner', 'outer'}, default 'outer'
How to handle indexes on the other axis (or axes).
- ignore_index : bool, default False
If True, do not use the index values along the concatenation axis. The resulting axis will be labeled 0, …, n - 1. This is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information. Note that the index values on the other axes are still respected in the join.
- sort : bool, default False
Sort the non-concatenation axis if it is not already aligned.
- Returns
- object, type of objs
When concatenating all Series along the index (axis=0), a Series is returned. When objs contains at least one DataFrame, a DataFrame is returned. When concatenating along the columns (axis=1), a DataFrame is returned.
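
The return-type rules above, together with the handling of None entries described under objs, can be summarized in a small plain-Python sketch. It does not call pyspark: the helper name `result_kind` and the string labels "Series" and "DataFrame" are assumptions made purely for illustration, not part of the library.

```python
from typing import List, Optional, Union


def result_kind(kinds: List[Optional[str]], axis: Union[int, str] = 0) -> str:
    """Predict the result type of a concat call from the kinds of its inputs.

    `kinds` holds "Series", "DataFrame", or None for each entry of objs.
    Hypothetical helper: it mirrors the documented rules, not library code.
    """
    # None entries are dropped silently, unless nothing is left afterwards.
    present = [k for k in kinds if k is not None]
    if not present:
        raise ValueError("all objects passed were None")

    # Concatenating along the columns always yields a DataFrame.
    if axis in (1, "columns"):
        return "DataFrame"

    # Along the index, only an all-Series input stays a Series.
    if all(k == "Series" for k in present):
        return "Series"
    return "DataFrame"


print(result_kind(["Series", "Series"]))           # Series
print(result_kind(["DataFrame", None, "Series"]))  # DataFrame
print(result_kind(["Series", "Series"], axis=1))   # DataFrame
```

The sketch only encodes the three documented cases; it says nothing about how the actual data or indexes are combined.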
See also
Series.append : Concatenate Series.
DataFrame.join : Join DataFrames using indexes.
DataFrame.merge : Merge DataFrames by indexes or columns.
Examples
>>> import pyspark.pandas as ps
>>> from pyspark.pandas.config import set_option, reset_option
>>> set_option("compute.ops_on_diff_frames", True)

Combine two Series.

>>> s1 = ps.Series(['a', 'b'])
>>> s2 = ps.Series(['c', 'd'])
>>> ps.concat([s1, s2])
0    a
1    b
0    c
1    d
dtype: object

Clear the existing index and reset it in the result by setting the ignore_index option to True.

>>> ps.concat([s1, s2], ignore_index=True)
0    a
1    b
2    c
3    d
dtype: object

Combine two DataFrame objects with identical columns.

>>> df1 = ps.DataFrame([['a', 1], ['b', 2]],
...                    columns=['letter', 'number'])
>>> df1
  letter  number
0      a       1
1      b       2
>>> df2 = ps.DataFrame([['c', 3], ['d', 4]],
...                    columns=['letter', 'number'])
>>> df2
  letter  number
0      c       3
1      d       4

>>> ps.concat([df1, df2])
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4

Combine DataFrame and Series objects with different columns.

>>> ps.concat([df2, s1])
  letter  number     0
0      c     3.0  None
1      d     4.0  None
0   None     NaN     a
1   None     NaN     b

Combine DataFrame objects with overlapping columns and return everything. Columns outside the intersection will be filled with None values.

>>> df3 = ps.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
...                    columns=['letter', 'number', 'animal'])
>>> df3
  letter  number animal
0      c       3    cat
1      d       4    dog

>>> ps.concat([df1, df3])
  letter  number animal
0      a       1   None
1      b       2   None
0      c       3    cat
1      d       4    dog

Sort the columns.

>>> ps.concat([df1, df3], sort=True)
  animal letter  number
0   None      a       1
1   None      b       2
0    cat      c       3
1    dog      d       4

Combine DataFrame objects with overlapping columns and return only those that are shared by passing inner to the join keyword argument.

>>> ps.concat([df1, df3], join="inner")
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4
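
The column selection in the outer, sorted, and inner examples above can be sketched in plain Python. This is only an illustration of the set logic, assuming labels are kept in first-seen order when sort is False. The helper name `result_columns` is hypothetical, and the library's actual implementation may differ.

```python
from typing import List


def result_columns(column_lists: List[List[str]], join: str = "outer",
                   sort: bool = False) -> List[str]:
    """Return the column labels a row-wise (axis=0) concat would produce."""
    if join not in ("inner", "outer"):
        raise ValueError("join must be 'inner' or 'outer'")

    # Collect every label in first-seen order, dropping duplicates (the union).
    seen: List[str] = []
    for columns in column_lists:
        for label in columns:
            if label not in seen:
                seen.append(label)

    if join == "inner":
        # Keep only labels present in every input (the intersection).
        seen = [c for c in seen if all(c in cols for cols in column_lists)]

    return sorted(seen) if sort else seen


# Column labels of df1 and df3 from the examples above.
df1_cols = ["letter", "number"]
df3_cols = ["letter", "number", "animal"]

print(result_columns([df1_cols, df3_cols]))                # ['letter', 'number', 'animal']
print(result_columns([df1_cols, df3_cols], sort=True))     # ['animal', 'letter', 'number']
print(result_columns([df1_cols, df3_cols], join="inner"))  # ['letter', 'number']
```

With join="outer", rows from an input that lacks a column are the ones filled with None in the concatenated result.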

>>> df4 = ps.DataFrame([['bird', 'polly'], ['monkey', 'george']],
...                    columns=['animal', 'name'])

Combine along the column axis.

>>> ps.concat([df1, df4], axis=1)
  letter  number  animal    name
0      a       1    bird   polly
1      b       2  monkey  george

>>> reset_option("compute.ops_on_diff_frames")