Sparklyr

What is Sparklyr?

Sparklyr is an open-source R package that bridges the gap between R and Apache Spark, letting you use Spark's distributed data processing from within an R session. It supports interactive work on datasets too large to fit in local memory, which makes it well suited to data manipulation, analysis, and visualization at scale.

A key feature of Sparklyr is its ability to coordinate distributed machine learning from R using either Spark MLlib or H2O Sparkling Water. Sparklyr also provides a complete dplyr backend, so you can filter and aggregate Spark datasets with familiar dplyr verbs and then bring the results into R for further analysis.
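As a minimal sketch of the dplyr and MLlib workflow described above, the following assumes a local Spark installation is available (for example, one set up with `spark_install()`). The table name `mtcars_spark` and the choice of the built-in `mtcars` dataset are illustrative assumptions, not part of the original text.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (e.g. via sparklyr::spark_install()).
sc <- spark_connect(master = "local")

# Copy a local R data frame into Spark; "mtcars_spark" is an arbitrary name.
cars_tbl <- copy_to(sc, mtcars, "mtcars_spark", overwrite = TRUE)

# dplyr verbs are translated to Spark SQL and executed inside Spark.
mpg_by_cyl <- cars_tbl %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE), n = n()) %>%
  collect()  # bring the small aggregated result back into R

# Fit a linear regression with Spark MLlib from R.
fit <- ml_linear_regression(cars_tbl, formula = mpg ~ wt + cyl)

spark_disconnect(sc)
```

Only the aggregated result crosses into R through `collect()`; the filtering and grouping run in Spark, which is the usual pattern when the full dataset is too large for local memory.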

Sparklyr is also extensible: you can write extensions that call the full Spark API and provide R interfaces to Spark packages. It can load data into Spark DataFrames from various sources, such as local R data frames, Hive tables, CSV, JSON, and Parquet files.
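Loading data from files might look like the sketch below. It again assumes a local Spark installation, and it writes a temporary CSV from the built-in `mtcars` dataset so that the example is self-contained; the table names `cars_csv` and `cars_parquet` are hypothetical.

```r
library(sparklyr)

# Assumption: Spark is installed locally (e.g. via sparklyr::spark_install()).
sc <- spark_connect(master = "local")

# Write a small CSV to a temporary path so the example has a file to read.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Read the CSV into a Spark DataFrame; "cars_csv" is an arbitrary table name.
cars_csv <- spark_read_csv(sc, name = "cars_csv", path = csv_path,
                           header = TRUE, infer_schema = TRUE)

# Round-trip the same data through Parquet.
parquet_path <- tempfile()
spark_write_parquet(cars_csv, path = parquet_path)
cars_parquet <- spark_read_parquet(sc, name = "cars_parquet",
                                   path = parquet_path)

# Count rows in Spark without collecting the data into R.
n_rows <- sdf_nrow(cars_parquet)

spark_disconnect(sc)
```

JSON files can be read in the same style with `spark_read_json()`, local data frames can be copied in with `copy_to()`, and a table already registered in the Hive metastore can typically be referenced with `dplyr::tbl(sc, "table_name")`.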

Whether you connect to a local Spark instance or a remote Spark cluster, Sparklyr gives you the same R interface for working with large datasets interactively.