Presto

字数 352 · 2021-04-23

Presto is an open source distributed SQL query engine.

Install

brew install presto

Coordinators communicate with workers and clients using a REST API.

Worker nodes fetch data from connectors and exchange intermediate data with each other.

You can think of a connector the same way you think of a driver for a database.

A Presto catalog contains schemas and references a data source via a connector.

Schemas are a way to organize tables.

Together, a catalog and schema define a set of tables that can be queried.

This is the same as in any relational database.

The mapping from source data to tables is defined by the connector.

Presto executes SQL statements and turns these statements into queries that are executed across a distributed cluster of coordinator and workers.

When Presto executes a query, it does so by breaking up the execution into a hierarchy of stages.

A stage is implemented as a series of tasks distributed over a network of Presto workers.

Tasks operate on splits which are sections of a larger data set.

Tasks contain one or more parallel drivers.

Drivers act upon data and combine operators to produce output that is then aggregated by a task and then delivered to another task in another stage.

An operator consumes, transforms and produces data.

Exchanges transfer data between Presto nodes for different stages of a query.

> <
>= <=
= <> !=