Presto
is an open source distributed SQL query engine.
Install
1
brew install presto
Concepts
Server Types
Coordinator
- parsing statements
- planning queries
- managing Presto worker nodes
Coordinators communicate with workers and clients using a REST API
.
Worker
- executing tasks
- processing data
Worker nodes fetch data from connectors and exchange intermediate data with each other.
Data Sources
Connector
You can think of a connector the same way you think of a driver for a database.
Catalog
A Presto catalog contains schemas and references a data source via a connector.
Schema
Schemas are a way to organize tables.
Together, a catalog and schema define a set of tables that can be queried.
Table
This is the same as in any relational database.
The mapping from source data to tables is defined by the connector
.
Query Execution Model
Presto
executes SQL
statements and turns these statements into queries that are executed across a distributed cluster of coordinator and workers.
Statement
Query
Stage
When Presto
executes a query, it does so by breaking up the execution into a hierarchy of stages.
Task
A stage is implemented as a series of tasks distributed over a network of Presto workers.
Split
Tasks operate on splits which are sections of a larger data set.
Driver
Tasks contain one or more parallel drivers.
Drivers act upon data and combine operators to produce output that is then aggregated by a task and then delivered to another task in another stage.
Operator
An operator consumes, transforms and produces data.
Exchange
Exchanges transfer data between Presto nodes for different stages of a query.
Functions & Operators
Logical
AND
OR
NOT
Comparison
>
<
>=
<=
=
<>
!=