Quickstart Colab does not succeed on a T4 #49

@gstranger

Description

To Reproduce

Use the currently linked train.ipynb notebook to train a policy on a T4 runtime. Running all of the setup cells and then the cell that launches the HumanoidWalkingTask fails with the following error:

WARNING:xax.task.base:Could not resolve task path for HumanoidWalkingTask, returning current working directory
WARNING 2025-06-13 22:00:23 [xax.task.base] Could not resolve task path for HumanoidWalkingTask, returning current working directory
INFO:xax.task.mixins.compile:Setting JAX logging level to INFO
  INFO  2025-06-13 22:00:24 [xax.task.mixins.compile] Setting JAX logging level to INFO
INFO:xax.task.mixins.compile:Setting JAX compilation cache directory to /root/.cache/jax/jaxcache
  INFO  2025-06-13 22:00:24 [xax.task.mixins.compile] Setting JAX compilation cache directory to /root/.cache/jax/jaxcache
INFO:xax.task.mixins.compile:Configuring JAX compilation cache parameters
  INFO  2025-06-13 22:00:24 [xax.task.mixins.compile] Configuring JAX compilation cache parameters
INFO:2025-06-13 22:00:25,080:jax._src.xla_bridge:924: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
INFO:jax._src.xla_bridge:Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
  INFO  2025-06-13 22:00:25 [jax._src.xla_bridge] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
INFO:2025-06-13 22:00:25,107:jax._src.xla_bridge:924: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
INFO:jax._src.xla_bridge:Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
  INFO  2025-06-13 22:00:25 [jax._src.xla_bridge] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
WARNING:xax.task.mixins.artifacts:Could not resolve task path for HumanoidWalkingTask, returning current working directory
WARNING 2025-06-13 22:00:26 [xax.task.mixins.artifacts] Could not resolve task path for HumanoidWalkingTask, returning current working directory
STATUS:xax.task.mixins.artifacts:/content/humanoid_walking_task/run_0
 STATUS 2025-06-13 22:00:26 [xax.task.mixins.artifacts] /content/humanoid_walking_task/run_0
WARNING:xax.task.base:Could not resolve task path for %s, returning current working directory
WARNING 2025-06-13 22:00:26 [xax.task.base] Could not resolve task path for %s, returning current working directory
STATUS:xax.task.mixins.train:/content
 STATUS 2025-06-13 22:00:26 [xax.task.mixins.train] /content
WARNING:py.warnings:/usr/local/lib/python3.11/dist-packages/kscale/conf.py:44: UserWarning: Settings directory does not exist: /root/.kscale. Creating it now.
  warnings.warn(f"Settings directory does not exist: {dir_path}. Creating it now.")

WARNING 2025-06-13 22:00:26 [py.warnings] /usr/local/lib/python3.11/dist-packages/kscale/conf.py:44: UserWarning: Settings directory does not exist: /root/.kscale. Creating it now.
  warnings.warn(f"Settings directory does not exist: {dir_path}. Creating it now.")

STATUS:xax.task.mixins.train:humanoid_walking_task
 STATUS 2025-06-13 22:00:26 [xax.task.mixins.train] humanoid_walking_task
STATUS:xax.task.mixins.train:JAX devices: [CudaDevice(id=0)]
 STATUS 2025-06-13 22:00:26 [xax.task.mixins.train] JAX devices: [CudaDevice(id=0)]
WARNING:xax.task.base:Could not resolve task path for HumanoidWalkingTask, returning current working directory
WARNING 2025-06-13 22:00:26 [xax.task.base] Could not resolve task path for HumanoidWalkingTask, returning current working directory
INFO:httpx:HTTP Request: GET https://api.kscale.dev/robots/urdf/kbot "HTTP/1.1 200 OK"
  INFO  2025-06-13 22:00:27 [httpx] HTTP Request: GET https://api.kscale.dev/robots/urdf/kbot "HTTP/1.1 200 OK"
INFO:kscale.web.clients.robot_class:Downloading URDF file from https://kscale-www-production.s3.amazonaws.com/urdfs/a852021cad90fba8/robot.tgz?AWSAccessKeyId=ASIA2R4HRCAHS5LUR3TY&Signature=S59CHBBrSNFOSVOQv7MF9Mfn0Vk%3D&x-amz-security-token=...
  INFO  2025-06-13 22:00:27 [kscale.web.clients.robot_class] Downloading URDF file from https://kscale-www-production.s3.amazonaws.com/urdfs/a852021cad90fba8/robot.tgz?AWSAccessKeyId=ASIA2R4HRCAHS5LUR3TY&Signature=S59CHBBrSNFOSVOQv7MF9Mfn0Vk%3D&x-amz-security-token=...
INFO:httpx:HTTP Request: GET https://kscale-www-production.s3.amazonaws.com/urdfs/a852021cad90fba8/robot.tgz?AWSAccessKeyId=ASIA2R4HRCAHS5LUR3TY&Signature=S59CHBBrSNFOSVOQv7MF9Mfn0Vk%3D&x-amz-security-token=... "HTTP/1.1 200 OK"
  INFO  2025-06-13 22:00:28 [httpx] HTTP Request: GET https://kscale-www-production.s3.amazonaws.com/urdfs/a852021cad90fba8/robot.tgz?AWSAccessKeyId=ASIA2R4HRCAHS5LUR3TY&Signature=S59CHBBrSNFOSVOQv7MF9Mfn0Vk%3D&x-amz-security-token=... "HTTP/1.1 200 OK"
INFO:kscale.web.clients.robot_class:Checking MD5 hash of downloaded file
  INFO  2025-06-13 22:00:30 [kscale.web.clients.robot_class] Checking MD5 hash of downloaded file
INFO:kscale.web.clients.robot_class:Updating downloaded file information
  INFO  2025-06-13 22:00:30 [kscale.web.clients.robot_class] Updating downloaded file information
INFO:kscale.web.clients.robot_class:Unpacking URDF file
  INFO  2025-06-13 22:00:30 [kscale.web.clients.robot_class] Unpacking URDF file
INFO:kscale.web.clients.robot_class:Updating downloaded file information
  INFO  2025-06-13 22:00:30 [kscale.web.clients.robot_class] Updating downloaded file information
INFO:httpx:HTTP Request: GET https://api.kscale.dev/robots/name/kbot "HTTP/1.1 200 OK"
  INFO  2025-06-13 22:00:32 [httpx] HTTP Request: GET https://api.kscale.dev/robots/name/kbot "HTTP/1.1 200 OK"
INFO:xax.task.mixins.train:Starting a new training run
  INFO  2025-06-13 22:00:33 [xax.task.mixins.train] Starting a new training run
PING:ksim.task.rl:Model size: 1,090,861 parameters
  PING  2025-06-13 22:00:36 [ksim.task.rl] Model size: 1,090,861 parameters
PING:ksim.task.rl:Optimizer size: 2,181,722 parameters
  PING  2025-06-13 22:00:36 [ksim.task.rl] Optimizer size: 2,181,722 parameters
INFO:root:Using JAX default device: cuda:0.
  INFO  2025-06-13 22:00:36 [root] Using JAX default device: cuda:0.
INFO:root:MJX Warp is disabled via MJX_WARP_ENABLED=false.
  INFO  2025-06-13 22:00:36 [root] MJX Warp is disabled via MJX_WARP_ENABLED=false.
INFO:root:Using JAX default device: cuda:0.
  INFO  2025-06-13 22:00:40 [root] Using JAX default device: cuda:0.
INFO:root:MJX Warp is disabled via MJX_WARP_ENABLED=false.
  INFO  2025-06-13 22:00:40 [root] MJX Warp is disabled via MJX_WARP_ENABLED=false.

Status
 ✦ JAX devices: [CudaDevice(id=0)]
 ✦ humanoid_walking_task
 ✦ /content
 ✦ /content/humanoid_walking_task/run_0

Pings
 ✦ Optimizer size: 2,181,722 parameters
 ✦ Model size: 1,090,861 parameters
 ✦ Could not resolve task path for HumanoidWalkingTask, returning current working directory
 ✦ /usr/local/lib/python3.11/dist-packages/kscale/conf.py:44: UserWarning: Settings directory does not exist: /root/.kscale. Creating it now.
  warnings.warn(f"Settings directory does not exist: {dir_path}. Creating it now.")

 ✦ Could not resolve task path for %s, returning current working directory
 ✦ Could not resolve task path for HumanoidWalkingTask, returning current working directory
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-682409069> in <cell line: 0>()
      1 if __name__ == "__main__":
----> 2     HumanoidWalkingTask.launch(
      3         HumanoidWalkingTaskConfig(
      4             # Training parameters.
      5             num_envs=2048,

    [... skipping hidden 7 frame]

    [... skipping hidden 1 frame]

    [... skipping hidden 15 frame]

/usr/lib/python3.11/dataclasses.py in replace(obj, **changes)
   1501     # changes that aren't fields, this will correctly raise a
   1502     # TypeError.
-> 1503     return obj.__class__(**changes)

TypeError: Data.__init__() got an unexpected keyword argument 'cacc'
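The final frame shows where the error is raised: as the quoted comment in dataclasses.py says, `dataclasses.replace` passes the requested changes to the class constructor, so any keyword that is not a field of the class raises `TypeError`. In this run, some caller invoked `replace(..., cacc=...)` on a `Data` instance whose class has no `cacc` field; the hidden frames do not show which module defines that class. The sketch below reproduces the same failure mode with a hypothetical stand-in `Data` class (its fields are invented for illustration, not taken from the real type) and a hypothetical helper, `unknown_fields`, that reports which requested changes are not fields.

```python
"""Minimal reproduction of the TypeError mechanism seen in the traceback.

`Data` is a hypothetical stand-in, NOT the real class from the issue.
"""
import dataclasses
from typing import Any


@dataclasses.dataclass(frozen=True)
class Data:
    # Hypothetical fields for illustration; note there is no `cacc` field.
    qpos: float = 0.0
    qvel: float = 0.0


def unknown_fields(obj: Any, changes: dict[str, Any]) -> list[str]:
    """Return the keys in `changes` that are not fields of the dataclass `obj`."""
    names = {f.name for f in dataclasses.fields(obj)}
    return sorted(key for key in changes if key not in names)


if __name__ == "__main__":
    data = Data()
    changes = {"qvel": 1.0, "cacc": 0.5}
    print(unknown_fields(data, changes))  # ['cacc']
    try:
        # replace() forwards every key to Data.__init__, so `cacc` is rejected.
        dataclasses.replace(data, **changes)
    except TypeError as err:
        print(err)  # the message names the unexpected 'cacc' keyword
```

Calling a helper like this on the real object at the failing call site would show whether `cacc` is the only field the caller expects but the installed class lacks.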

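The report does not list package versions. Since the error means the calling code expects a `cacc` field that the installed `Data` class does not define, a version mismatch between installed packages is one plausible cause (an assumption, not confirmed in this issue). A sketch like the following, run in the same Colab runtime, records versions worth attaching; the distribution names in `PACKAGES` are assumptions about which packages are relevant.

```python
"""Record package versions for a bug report (a sketch; the package list is an assumption)."""
from importlib import metadata
from typing import Optional

# Distribution names assumed relevant to this traceback; adjust as needed.
PACKAGES = ("jax", "jaxlib", "mujoco", "mujoco-mjx", "ksim", "xax", "kscale")


def package_versions(names: tuple[str, ...] = PACKAGES) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    import platform

    print(f"python: {platform.python_version()}")
    for name, version in package_versions().items():
        print(f"{name}: {version or 'not installed'}")
```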
Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests