Presto

字数 352 · 2021-04-23

Presto is an open source distributed SQL query engine.

Install

1
brew install presto

Concepts

Server Types

Coordinator

  • parsing statements
  • planning queries
  • managing Presto worker nodes

Coordinators communicate with workers and clients using a REST API.

Worker

  • executing tasks
  • processing data

Worker nodes fetch data from connectors and exchange intermediate data with each other.

Data Sources

Connector

You can think of a connector the same way you think of a driver for a database.

Catalog

A Presto catalog contains schemas and references a data source via a connector.

Schema

Schemas are a way to organize tables.

Together, a catalog and schema define a set of tables that can be queried.

Table

This is the same as in any relational database.

The mapping from source data to tables is defined by the connector.

Query Execution Model

Presto executes SQL statements and turns these statements into queries that are executed across a distributed cluster of coordinator and workers.

Statement

Query

Stage

When Presto executes a query, it does so by breaking up the execution into a hierarchy of stages.

Task

A stage is implemented as a series of tasks distributed over a network of Presto workers.

Split

Tasks operate on splits which are sections of a larger data set.

Driver

Tasks contain one or more parallel drivers.

Drivers act upon data and combine operators to produce output that is then aggregated by a task and then delivered to another task in another stage.

Operator

An operator consumes, transforms and produces data.

Exchange

Exchanges transfer data between Presto nodes for different stages of a query.

Functions & Operators

Logical

  • AND
  • OR
  • NOT

Comparison

> <
>= <=
= <> !=