Scaling ClickHouse Vertically


🌇 Sunsetting Kubernetes Deployments

This page covers our PostHog Kubernetes deployment, which we are currently in the process of sunsetting. Existing customers will receive support until May 31, 2023 and we will continue to provide security updates for the next year.

For existing customers
We highly recommend migrating to PostHog Cloud (US or EU). Take a look at this guide for more information on the migration process.
Looking to continue self-hosting?
We still maintain our Open-source Docker Compose deployment. Instructions for deploying can be found here.

How to scale ClickHouse vertically

Currently, the easiest way to scale up a ClickHouse environment deployed with our Helm chart is to pin ClickHouse to a dedicated node and give that node more CPU and memory. The steps are:

  • Create a node or node group with more CPU and memory in your Kubernetes cluster and set the label clickhouse=true on it (this label is used to target that node for the ClickHouse deployment). There are several ways to create a node group, and most are specific to your Kubernetes platform. References for creating and managing node groups can be found for GKE, EKS, and DigitalOcean.
    • If you already know which node ClickHouse should run on, you can label it with kubectl label nodes <desired-clickhouse-node-name> clickhouse=true
    • To keep other pods off that node, add a taint with kubectl taint nodes <desired-clickhouse-node-name> dedicated=clickhouse:NoSchedule
  • Update your values.yaml:
    clickhouse:
      nodeSelector:
        clickhouse: "true"
      tolerations:
        - key: "dedicated"
          value: "clickhouse"
          operator: "Equal"
          effect: "NoSchedule"
  • You might need to trigger rescheduling of the ClickHouse pod so that it moves to the new node, e.g. run kubectl delete pod chi-posthog-posthog-0-0-0
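
The steps above can be sketched as a small shell script. This is only an illustration under a few assumptions: the node name, the posthog namespace, the DRY_RUN switch, and the helper function names are hypothetical and not part of the chart. By default it only prints the kubectl commands so you can review them before running anything.

```shell
#!/usr/bin/env bash
# Sketch only: pin ClickHouse to a dedicated node, then reschedule the pod.
set -euo pipefail

# Assumptions (adjust to your cluster): node name, namespace, DRY_RUN switch.
NODE="${NODE:-my-clickhouse-node}"
NAMESPACE="${NAMESPACE:-posthog}"
POD="${POD:-chi-posthog-posthog-0-0-0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: label the node and taint it so other pods stay off it.
pin_clickhouse_node() {
  local node="$1"
  run kubectl label nodes "$node" clickhouse=true
  run kubectl taint nodes "$node" dedicated=clickhouse:NoSchedule
}

# Step 3: delete the pod so it is rescheduled, then list pods on the node.
# Run this only after the updated values.yaml has been applied.
reschedule_and_check() {
  run kubectl -n "$NAMESPACE" delete pod "$POD"
  run kubectl -n "$NAMESPACE" get pods -o wide --field-selector "spec.nodeName=$NODE"
}

# Choose the step with the first argument: "pin" (default) or "reschedule".
case "${1:-pin}" in
  pin)        pin_clickhouse_node "$NODE" ;;
  reschedule) reschedule_and_check ;;
  *)          echo "usage: $0 [pin|reschedule]" >&2; exit 1 ;;
esac
```

Run it once with pin, apply the updated values.yaml with your usual Helm upgrade, then run it again with reschedule. Set DRY_RUN=0 only once the printed commands look right for your cluster.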

You can find more information about optional settings like these here, as well as more about nodeSelectors and taints and tolerations.
