
Spark SQL

What is Spark SQL?

Spark SQL is a Spark module for structured data processing that adds native SQL support to Spark. It lets users run unmodified Hadoop Hive queries up to 100x faster on existing deployments and data, while integrating seamlessly with the rest of the Spark ecosystem. With Spark SQL, developers can import relational data from Parquet files and Hive tables, run SQL queries over both the imported data and existing RDDs, and write RDDs out to Hive tables or Parquet files. Spark SQL also includes a cost-based optimizer, columnar storage, and code generation, which keep queries fast and let them scale to thousands of nodes.

If you're a data scientist, analyst, or general business intelligence user who wants to explore data through interactive SQL queries, Spark SQL is an essential tool to have in your toolkit.
