
A Trino cluster is sensitive to its memory setup. Because Trino is developed in Java, the JVM settings are the foundation of its configuration. In many cases, a Trino server fails to start because of a memory misconfiguration. During server launch, validation rules are applied to make sure that the major memory settings are consistent. Passing them does not guarantee cluster stability or performance, so time spent on the initial memory setup contributes to the success of your cluster.

This article is based on a CentOS 7 environment and Trino Starburst release 393-e.2. Release 369-e introduced a breaking change to the memory configuration: the query.max-total-memory-per-node setting was removed in favor of query.max-memory-per-node.

Disable Linux swap

Trino assumes that swap is disabled and that no swap space is mounted.

Check the current swappiness setting.

cat /proc/sys/vm/swappiness

Turn off swappiness temporarily (the change lasts until the next reboot).

sudo sysctl vm.swappiness=0

Turn off swappiness permanently by setting vm.swappiness=0 in the file below.

sudo nano /etc/sysctl.conf

Check swap memory usage. With swap disabled, the Swap line shows zeros.

free -h

Output

              total        used        free      shared  buff/cache   available
Mem:           503G        182G        297G         10M         23G        319G
Swap:            0B          0B          0B
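
The same check can be scripted, for instance as a pre-flight step before a worker is started. The sketch below is a minimal Python example that reads the SwapTotal field from /proc/meminfo-style text. The helper names swap_total_kb and swap_is_disabled are hypothetical and are not part of Trino.

```python
def swap_total_kb(meminfo_text):
    """Return the SwapTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Expected line format: "SwapTotal:       0 kB"
            return int(line.split()[1])
    raise ValueError("SwapTotal not found")


def swap_is_disabled(meminfo_text):
    """True when the kernel reports no swap space at all."""
    return swap_total_kb(meminfo_text) == 0
```

On a Linux host, passing the contents of /proc/meminfo to swap_is_disabled should return True once swap is off.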

JVM configuration

The maximum JVM heap size (-Xmx) is defined in the jvm.config file. Set it to 70-80% of the server's physical memory. If you are tight on resources, you can go higher, but keep it at least 12GB below the physical memory. The higher the value, the less stable the system becomes.

-Xmx480G
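
The sizing rule above can be written down as a small calculation. This is a minimal sketch, assuming whole-gigabyte values; the function names recommend_xmx_gb and jvm_flag are hypothetical, and the 12GB reserve is the figure quoted in this article rather than a Trino constant.

```python
def recommend_xmx_gb(physical_gb, fraction=0.7, min_reserve_gb=12):
    """Suggest a JVM heap size in GB from the physical memory of a server.

    fraction       -- share of physical memory given to the heap (0.7 to 0.8 here)
    min_reserve_gb -- memory always left outside the heap (assumption: 12GB)
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    by_fraction = round(physical_gb * fraction)
    cap = physical_gb - min_reserve_gb
    # Never exceed the cap, and never return a negative size.
    return max(0, min(by_fraction, cap))


def jvm_flag(xmx_gb):
    """Format the value as a jvm.config line."""
    return f"-Xmx{xmx_gb}G"
```

For a 512GB server, recommend_xmx_gb(512) gives 358 and recommend_xmx_gb(512, 0.8) gives 410. On a small 32GB server with fraction 0.8, the 12GB reserve takes over and the result is capped at 20.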

Default Trino configuration

The main memory settings are listed below. Any setting that is omitted takes its default value.

query.max-memory-per-node
Default value: JVM max memory * 0.3
Description: Max amount of user memory a query can use on a worker.

query.max-memory
Default value: 20GB
Description: Max amount of user memory a query can use across the entire cluster.

query.max-total-memory
Default value: query.max-memory * 2
Description: Max amount of memory a query can use across the entire cluster, including revocable memory.

memory.heap-headroom-per-node
Default value: JVM max memory * 0.3
Description: Amount of memory set aside as headroom/buffer in the JVM heap for allocations that are not tracked by Trino.
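
To see what a partially configured node ends up with, the defaults listed above can be resolved in code. This is a minimal sketch working in GB; the function effective_memory_settings is hypothetical, and it only mirrors the default values stated in this article.

```python
from typing import Optional


def effective_memory_settings(xmx_gb: float, overrides: Optional[dict] = None) -> dict:
    """Fill in defaults for any memory setting that is not explicitly set.

    All values are in GB. Defaults follow the list in this article.
    """
    overrides = overrides or {}
    settings = {}
    settings["query.max-memory-per-node"] = overrides.get(
        "query.max-memory-per-node", xmx_gb * 0.3)
    settings["query.max-memory"] = overrides.get("query.max-memory", 20)
    # This default depends on the effective query.max-memory, not on its default.
    settings["query.max-total-memory"] = overrides.get(
        "query.max-total-memory", settings["query.max-memory"] * 2)
    settings["memory.heap-headroom-per-node"] = overrides.get(
        "memory.heap-headroom-per-node", xmx_gb * 0.3)
    return settings
```

With a 100GB heap and no overrides, each query gets about 30GB per node and 30GB is kept as headroom. Overriding only query.max-memory to 50 raises query.max-total-memory to 100.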

Basic memory setup

  • Physical memory: 512GB
  • Workers: 10
  • JVM Xmx: physical memory * 70% = 358GB
  • query.max-memory-per-node: JVM Xmx * 0.5 = 179GB
  • memory.heap-headroom-per-node: 50GB
  • query.max-memory: workers * query.max-memory-per-node = 1,790GB

Highly concurrent memory setup

  • Physical memory: 512GB
  • Workers: 10
  • JVM Xmx: physical memory * 70% = 358GB
  • query.max-memory-per-node: JVM Xmx * 0.1 = 36GB
  • memory.heap-headroom-per-node: 50GB
  • query.max-memory: workers * query.max-memory-per-node = 360GB

Large data memory setup

  • Physical memory: 512GB
  • Workers: 10
  • JVM Xmx: physical memory * 80% = 410GB
  • query.max-memory-per-node: JVM Xmx * 0.7 = 287GB
  • memory.heap-headroom-per-node: 30GB
  • query.max-memory: workers * query.max-memory-per-node = 2,870GB
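
The arithmetic behind the three setups (basic, highly concurrent, large data) can be reproduced with one helper. This is a minimal sketch; the function plan_memory and its parameter names are hypothetical, values are rounded to whole GB as in the lists above, and the headroom is passed in as chosen rather than derived.

```python
def plan_memory(physical_gb, workers, xmx_fraction, per_node_fraction, headroom_gb):
    """Derive the memory settings (in GB) for one cluster profile."""
    xmx = round(physical_gb * xmx_fraction)
    per_node = round(xmx * per_node_fraction)
    return {
        "xmx": xmx,
        "query.max-memory-per-node": per_node,
        "memory.heap-headroom-per-node": headroom_gb,
        # Cluster-wide limit: every worker contributes its per-node limit.
        "query.max-memory": workers * per_node,
    }


profiles = {
    "basic": plan_memory(512, 10, 0.7, 0.5, 50),
    "highly concurrent": plan_memory(512, 10, 0.7, 0.1, 50),
    "large data": plan_memory(512, 10, 0.8, 0.7, 30),
}

for name, settings in profiles.items():
    print(name, settings)
```

A lower per-node fraction leaves room for more queries to run side by side, while a higher one lets a single large query use most of the heap.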

Validation rules

  • JVM Xmx > query.max-memory-per-node + memory.heap-headroom-per-node
  • query.max-total-memory > query.max-memory
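
These rules can be checked before the server is started. The sketch below is a minimal Python version that takes values in GB and applies the two rules exactly as written above, with strict inequalities. The function validate_memory is hypothetical, and Trino's own startup check may treat the boundary case differently.

```python
def validate_memory(xmx_gb, per_node_gb, headroom_gb, max_memory_gb, max_total_memory_gb):
    """Return a list of violated rules; an empty list means the settings pass."""
    errors = []
    if not xmx_gb > per_node_gb + headroom_gb:
        errors.append(
            "JVM Xmx must exceed query.max-memory-per-node "
            "+ memory.heap-headroom-per-node")
    if not max_total_memory_gb > max_memory_gb:
        errors.append("query.max-total-memory must exceed query.max-memory")
    return errors
```

For the basic setup, validate_memory(358, 179, 50, 1790, 3580) returns an empty list, since 179 + 50 = 229 is below 358 and the default query.max-total-memory is twice query.max-memory.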

Killer policy in case of out of memory

The cluster's behavior when it runs out of memory (OOM) is customizable. The relevant settings are:

  • query.low-memory-killer.policy
  • task.low-memory-killer.policy
  • query.low-memory-killer.delay

Spill to disk

OOM failures can be mitigated by enabling spill to disk, although it does not cover all possible cases, and performance is heavily affected while a query is spilling. Spilling is configured in the config.properties file. For example,

spill-enabled=true
spiller-spill-path=/mnt/trino/data/spill
max-spill-per-node=500GB
query-max-spill-per-node=200GB
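
A quick sanity check of the spill properties can catch obvious mistakes before a restart. The sketch below is a minimal Python example; the helpers parse_properties, size_in_mb, and spill_warnings are hypothetical, only MB, GB, and TB units are handled, and the two checks are this sketch's own assumptions rather than validation performed by Trino.

```python
import re

# Multipliers to MB (assumption: binary units).
_UNITS = {"MB": 1, "GB": 1024, "TB": 1024 * 1024}


def parse_properties(text):
    """Parse key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def size_in_mb(value):
    """Convert a size such as '500GB' to MB."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(MB|GB|TB)", value)
    if match is None:
        raise ValueError(f"unsupported size: {value!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def spill_warnings(props):
    """Return human-readable warnings for a spill configuration."""
    warnings = []
    if props.get("spill-enabled", "false").lower() != "true":
        return warnings  # spilling is off, nothing to check
    if "spiller-spill-path" not in props:
        warnings.append("spill-enabled=true but spiller-spill-path is not set")
    node = props.get("max-spill-per-node")
    query = props.get("query-max-spill-per-node")
    if node and query and size_in_mb(query) > size_in_mb(node):
        warnings.append("query-max-spill-per-node exceeds max-spill-per-node")
    return warnings
```

Running spill_warnings on the parsed example above yields no warnings, because a spill path is set and the 200GB per-query limit fits within the 500GB per-node limit.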

Revocable memory

To reduce the number of OOM failures and improve performance, Trino uses the concept of revocable memory. When the cluster is idle, a query can be assigned more memory than the memory settings allow, but the memory manager can revoke the additional memory at any time. When memory is revoked, the query spills to disk to resolve the memory shortage.
