Presto is an open source distributed SQL query engine.
Install
brew install presto
Concepts
Server Types
Coordinator
The coordinator is responsible for:
- parsing statements
- planning queries
- managing Presto worker nodes
Coordinators communicate with workers and clients using a REST API.
Worker
A worker is responsible for:
- executing tasks
- processing data
Worker nodes fetch data from connectors and exchange intermediate data with each other.
Data Sources
Connector
You can think of a connector the same way you think of a driver for a database.
Catalog
A Presto catalog contains schemas and references a data source via a connector.
Schema
Schemas are a way to organize tables.
Together, a catalog and schema define a set of tables that can be queried.
Table
A table is a set of unordered rows organized into named, typed columns, the same as in any relational database.
The mapping from source data to tables is defined by the connector.
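The catalog, schema and table hierarchy above can be sketched as a nested lookup. This is a toy model, not Presto code: it assumes tables are addressed by a fully qualified name of the form catalog.schema.table, and every catalog, schema and table name in it is hypothetical.

```python
# Toy model of the naming hierarchy: catalog -> schema -> table.
# All catalog, schema, and table names below are hypothetical.
CATALOGS = {
    "hive": {                           # a catalog, backed by some connector
        "web": ["clicks", "sessions"],  # schema -> tables
        "sales": ["orders"],
    },
    "mysql": {
        "crm": ["customers"],
    },
}


def resolve(qualified_name, catalogs=CATALOGS):
    """Split 'catalog.schema.table' and check that each level exists."""
    parts = qualified_name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got {qualified_name!r}")
    catalog, schema, table = parts
    if catalog not in catalogs:
        raise KeyError(f"unknown catalog: {catalog}")
    if schema not in catalogs[catalog]:
        raise KeyError(f"unknown schema: {catalog}.{schema}")
    if table not in catalogs[catalog][schema]:
        raise KeyError(f"unknown table: {qualified_name}")
    return catalog, schema, table


def tables_in(catalog, schema, catalogs=CATALOGS):
    """A catalog and a schema together define a set of queryable tables."""
    return sorted(catalogs[catalog][schema])


print(resolve("hive.web.clicks"))
print(tables_in("hive", "web"))
```

The same table name can exist under different schemas or catalogs without conflict, which is why the lookup checks each level in turn.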
Query Execution Model
Presto executes SQL statements and turns them into queries that run across a distributed cluster of a coordinator and workers.
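Statement
A statement is the textual SQL passed to Presto.
Query
A query is the plan and the set of components Presto creates to execute a statement.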
Stage
When Presto executes a query, it does so by breaking up the execution into a hierarchy of stages.
Task
A stage is implemented as a series of tasks distributed over a network of Presto workers.
Split
Tasks operate on splits, which are sections of a larger data set.
Driver
Tasks contain one or more parallel drivers.
Drivers act upon data and combine operators to produce output, which the task aggregates and delivers to another task in another stage.
Operator
An operator consumes, transforms and produces data.
Exchange
Exchanges transfer data between Presto nodes for different stages of a query.
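To make the vocabulary above concrete, here is a single-process sketch of a two-stage count. It only mirrors the roles described in these notes (split, operator, driver, task, exchange, stage). The function names, the sample rows and the way splits are assigned to tasks are all assumptions for illustration and say nothing about how Presto schedules work internally.

```python
from collections import Counter

# Hypothetical input rows of (country, amount); not real data.
ROWS = [("us", 5), ("de", 3), ("us", 7), ("fr", 1), ("de", 4), ("us", 2)]


def make_splits(rows, size):
    """A split is a section of a larger data set."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


# Operators: each one consumes, transforms, and produces data.
def filter_op(rows, min_amount):
    return [row for row in rows if row[1] >= min_amount]


def partial_count_op(rows):
    return Counter(country for country, _ in rows)


def driver(split):
    """A driver combines operators into a pipeline over one split."""
    return partial_count_op(filter_op(split, min_amount=2))


def task(splits):
    """A task runs one driver per split and aggregates their output."""
    out = Counter()
    for split in splits:
        out += driver(split)
    return out


def exchange(task_outputs):
    """Stand-in for an exchange: hands upstream output to the next stage."""
    return list(task_outputs)


def final_stage(partials):
    """Downstream stage: merge the partial counts into one result."""
    total = Counter()
    for partial in partials:
        total += partial
    return dict(total)


splits = make_splits(ROWS, size=2)      # three splits of two rows each
leaf_tasks = [splits[:2], splits[2:]]   # splits assigned to two tasks
partials = exchange(task(s) for s in leaf_tasks)
result = final_stage(partials)
print(result)
```

In a real cluster the drivers would run in parallel and the tasks on different workers. Here everything runs sequentially so the data flow is easy to follow.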
Functions & Operators
Logical
AND OR NOT
Comparison
> <
>= <=
= <> !=
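The notes list only the operator names. As a sketch of how the logical operators behave when an operand is NULL, the functions below model standard SQL three-valued logic in Python, with None standing in for NULL. The function names are made up for this example, and the code models the semantics only; it does not call Presto.

```python
# Sketch of SQL three-valued logic; None stands in for NULL.
# Function names are hypothetical and exist only for this example.

def sql_and(a, b):
    """AND: FALSE wins over NULL; otherwise NULL is contagious."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def sql_or(a, b):
    """OR: TRUE wins over NULL; otherwise NULL is contagious."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def sql_not(a):
    """NOT NULL stays NULL."""
    return None if a is None else (not a)


def sql_ne(a, b):
    """<> and != are two spellings of the same comparison."""
    if a is None or b is None:
        return None
    return a != b


print(sql_and(None, False), sql_and(None, True))
print(sql_or(None, True), sql_or(None, False))
print(sql_not(None), sql_ne(1, 2), sql_ne(1, None))
```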