[xpk cpu] Increase cpu nodes on cluster creation (#33)
* [xpk cpu] Set up multizonal cpu nodes on cluster creation

Multizonal cpu nodes have better performance for cluster creation and
tpu creation.

Once the cluster is created, the autoscaler will adjust the number of
nodes to fit the demand.

I tested both single-zone and multizone cpu setups and observed
better performance with multizone cpus for the same total number of nodes.

I also verified that low demand reduces the number of cpu nodes after
cluster creation.

* Move to zonal cpu node pool after more testing
Obliviour committed Dec 5, 2023
1 parent 93d66c0 commit 0efa2b0
Showing 1 changed file with 9 additions and 8 deletions.
17 changes: 9 additions & 8 deletions xpk.py
@@ -698,13 +698,15 @@ def run_gke_cluster_create_command(args) -> int:
     0 if successful and 1 otherwise.
   """
 
-  # Create the regional cluster with one CPU nodepool in the requested zone.
-  # Set the number of cpu nodes to start a 1 and auto-scale to fit the need.
+  # Create the regional cluster with `num-nodes` CPU nodes in the same zone as
+  # TPUs. This has been tested with clusters of 300 VMs. Larger clusters will
+  # benefit from a larger initial `--num-nodes`. After the cluster is created,
+  # the auto-scaler can reduce/increase the nodes based on the load.
   command = (
       'gcloud beta container clusters create'
       f' {args.cluster} --release-channel rapid --enable-autoscaling'
-      f' --max-nodes 1000 --min-nodes 1 --node-locations={args.zone}'
-      ' --num-nodes=1'
+      ' --total-min-nodes 1 --total-max-nodes 1000 --num-nodes 6'
+      f' --node-locations={args.zone}'
       f' --project={args.project} --region={zone_to_region(args.zone)}'
       f' --cluster-version={args.gke_version} --location-policy=BALANCED'
       f' --machine-type={args.cluster_cpu_machine_type}'
@@ -1883,11 +1885,10 @@ def directory_path_type(value):
 cluster_create_optional_arguments.add_argument(
     '--cluster-cpu-machine-type',
     type=str,
-    default='e2-standard-4',
+    default='e2-standard-16',
     help=(
-        'Set the machine tpu within the default cpu node pool. For zonal '
-        'clusters, make sure that the zone supports the machine type, and for '
-        'regional clusters, all zones in the region supports the machine type.'
+        'Set the machine tpu within the default cpu node pool. For'
+        ' regional clusters, all zones must support the machine type.'
     )
 )
 cluster_create_optional_arguments.add_argument(
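To see what the first hunk changes in the command that actually runs, here is a minimal sketch that rebuilds only the flags visible in the diff and counts the starting CPU nodes. It is an illustration, not xpk's code: `build_cluster_create_command`, `initial_cpu_nodes`, and this `zone_to_region` are hypothetical stand-ins, the example argument values are made up, and the real command continues past the last line shown in the hunk. It assumes gcloud's documented semantics that `--num-nodes` is a per-zone count while `--total-min-nodes`/`--total-max-nodes` bound the whole node pool, so with a single zone in `--node-locations` the cluster starts with 6 CPU nodes.

```python
import argparse
import shlex


def zone_to_region(zone: str) -> str:
  # Hypothetical stand-in for xpk's helper: drop the zone suffix,
  # e.g. 'us-central2-b' -> 'us-central2'.
  return zone.rsplit('-', 1)[0]


def build_cluster_create_command(args: argparse.Namespace) -> str:
  # Only the flags visible in the diff; the real command has more after these.
  return (
      'gcloud beta container clusters create'
      f' {args.cluster} --release-channel rapid --enable-autoscaling'
      ' --total-min-nodes 1 --total-max-nodes 1000 --num-nodes 6'
      f' --node-locations={args.zone}'
      f' --project={args.project} --region={zone_to_region(args.zone)}'
      f' --cluster-version={args.gke_version} --location-policy=BALANCED'
      f' --machine-type={args.cluster_cpu_machine_type}'
  )


def initial_cpu_nodes(command: str) -> int:
  # --num-nodes is per zone, so the starting size is num-nodes x zone count.
  tokens = shlex.split(command)
  per_zone = int(tokens[tokens.index('--num-nodes') + 1])
  locations = next(t for t in tokens if t.startswith('--node-locations='))
  zones = locations.split('=', 1)[1].split(',')
  return per_zone * len(zones)


# Made-up example values, for illustration only.
example_args = argparse.Namespace(
    cluster='demo-cluster',
    zone='us-central2-b',
    project='my-project',
    gke_version='1.28',
    cluster_cpu_machine_type='e2-standard-16',
)
cmd = build_cluster_create_command(example_args)
print(cmd)
print(initial_cpu_nodes(cmd))  # 6: one zone x 6 nodes per zone
```

Because the count applies per zone, listing three zones in `--node-locations` with the same `--num-nodes 6` would start 18 nodes instead of 6.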

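The second hunk only changes a default, so the previous `e2-standard-4` machine type is still available by passing the flag explicitly. A minimal argparse sketch, assuming a stand-alone parser in place of xpk's real `cluster create` subcommand (the parser and group names here are hypothetical, and the help text's "tpu" is written as "type"):

```python
import argparse

# Hypothetical stand-alone parser; xpk wires this flag into its cluster create subcommand.
parser = argparse.ArgumentParser(prog='xpk-sketch')
cluster_create_optional_arguments = parser.add_argument_group('Optional Arguments')
cluster_create_optional_arguments.add_argument(
    '--cluster-cpu-machine-type',
    type=str,
    default='e2-standard-16',
    help=(
        'Set the machine type within the default cpu node pool. For'
        ' regional clusters, all zones must support the machine type.'
    ),
)

# With no flag the new default applies; passing the flag restores the old type.
default_args = parser.parse_args([])
override_args = parser.parse_args(['--cluster-cpu-machine-type', 'e2-standard-4'])
print(default_args.cluster_cpu_machine_type)  # e2-standard-16
print(override_args.cluster_cpu_machine_type)  # e2-standard-4
```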