- Generating train split: 100%|███| 37429/37429 [00:01<00:00, 28527.49 examples/s]
- wandb: Currently logged in as: spammmmm1997. Use `wandb login --relogin` to force relogin
- Traceback (most recent call last):
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 1074, in <module>
- main()
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 890, in main
- latents = vae.encode(batch["pixel_values"].to(weight_dtype)).latent_dist.sample()
- File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
- return method(self, *args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoder_kl.py", line 260, in encode
- h = self.encoder(x)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/vae.py", line 141, in forward
- sample = self.conv_in(sample)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
- return self._conv_forward(input, self.weight, self.bias)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
- return F.conv2d(input, weight, bias, self.stride,
- RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
- wandb: Tracking run with wandb version 0.16.1
- wandb: Run data is saved locally in /workspace/diffusers/examples/text_to_image/wandb/run-20231209_210203-cxbfvkj4
- wandb: Run `wandb offline` to turn off syncing.
- wandb: Syncing run celestial-frog-115
- wandb: ⭐️ View project at https://wandb.ai/spammmmm1997/text2image-fine-tune
- wandb: 🚀 View run at https://wandb.ai/spammmmm1997/text2image-fine-tune/runs/cxbfvkj4
- 12/09/2023 21:02:04 - INFO - __main__ - ***** Running training *****
- 12/09/2023 21:02:04 - INFO - __main__ - Num examples = 37429
- 12/09/2023 21:02:04 - INFO - __main__ - Num Epochs = 3847
- 12/09/2023 21:02:04 - INFO - __main__ - Instantaneous batch size per device = 184
- 12/09/2023 21:02:04 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1472
- 12/09/2023 21:02:04 - INFO - __main__ - Gradient Accumulation steps = 1
- 12/09/2023 21:02:04 - INFO - __main__ - Total optimization steps = 100000
- Steps: 0%| | 0/100000 [00:00<?, ?it/s]
- Traceback (most recent call last):
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 1074, in <module>
- main()
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 890, in main
- latents = vae.encode(batch["pixel_values"].to(weight_dtype)).latent_dist.sample()
- File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
- return method(self, *args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoder_kl.py", line 260, in encode
- h = self.encoder(x)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/vae.py", line 141, in forward
- sample = self.conv_in(sample)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
- return self._conv_forward(input, self.weight, self.bias)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
- return F.conv2d(input, weight, bias, self.stride,
- RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
- Traceback (most recent call last):
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 1074, in <module>
- main()
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 890, in main
- latents = vae.encode(batch["pixel_values"].to(weight_dtype)).latent_dist.sample()
- File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
- return method(self, *args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoder_kl.py", line 260, in encode
- h = self.encoder(x)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/vae.py", line 141, in forward
- sample = self.conv_in(sample)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
- return self._conv_forward(input, self.weight, self.bias)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
- return F.conv2d(input, weight, bias, self.stride,
- RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
- Traceback (most recent call last):
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 1074, in <module>
- main()
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 890, in main
- latents = vae.encode(batch["pixel_values"].to(weight_dtype)).latent_dist.sample()
- File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
- return method(self, *args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoder_kl.py", line 260, in encode
- h = self.encoder(x)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/vae.py", line 141, in forward
- sample = self.conv_in(sample)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
- return self._conv_forward(input, self.weight, self.bias)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
- return F.conv2d(input, weight, bias, self.stride,
- RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
- Traceback (most recent call last):
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 1074, in <module>
- main()
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 890, in main
- latents = vae.encode(batch["pixel_values"].to(weight_dtype)).latent_dist.sample()
- File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
- return method(self, *args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoder_kl.py", line 260, in encode
- h = self.encoder(x)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/vae.py", line 141, in forward
- sample = self.conv_in(sample)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
- return self._conv_forward(input, self.weight, self.bias)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
- return F.conv2d(input, weight, bias, self.stride,
- RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
- Traceback (most recent call last):
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 1074, in <module>
- main()
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 890, in main
- latents = vae.encode(batch["pixel_values"].to(weight_dtype)).latent_dist.sample()
- File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
- return method(self, *args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoder_kl.py", line 260, in encode
- h = self.encoder(x)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/vae.py", line 141, in forward
- sample = self.conv_in(sample)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
- return self._conv_forward(input, self.weight, self.bias)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
- return F.conv2d(input, weight, bias, self.stride,
- RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
- Traceback (most recent call last):
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 1074, in <module>
- main()
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 890, in main
- latents = vae.encode(batch["pixel_values"].to(weight_dtype)).latent_dist.sample()
- File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
- return method(self, *args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoder_kl.py", line 260, in encode
- h = self.encoder(x)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/vae.py", line 141, in forward
- sample = self.conv_in(sample)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
- return self._conv_forward(input, self.weight, self.bias)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
- return F.conv2d(input, weight, bias, self.stride,
- RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
- Traceback (most recent call last):
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 1074, in <module>
- main()
- File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 890, in main
- latents = vae.encode(batch["pixel_values"].to(weight_dtype)).latent_dist.sample()
- File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
- return method(self, *args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoder_kl.py", line 260, in encode
- h = self.encoder(x)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/diffusers/models/vae.py", line 141, in forward
- sample = self.conv_in(sample)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
- return self._call_impl(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
- return forward_call(*args, **kwargs)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
- return self._conv_forward(input, self.weight, self.bias)
- File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
- return F.conv2d(input, weight, bias, self.stride,
- RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
- [2023-12-09 21:02:09,165] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 917 closing signal SIGTERM
- [2023-12-09 21:02:09,531] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 1 (pid: 918) of binary: /usr/bin/python
- Traceback (most recent call last):
- File "/usr/local/bin/accelerate", line 8, in <module>
- sys.exit(main())
- File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
- args.func(args)
- File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1008, in launch_command
- multi_gpu_launcher(args)
- File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 666, in multi_gpu_launcher
- distrib_run.run(args)
- File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 797, in run
- elastic_launch(
- File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
- return launch_agent(self._config, self._entrypoint, list(args))
- File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
- raise ChildFailedError(
- torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
- ============================================================
- train_text_to_image.py FAILED
- ------------------------------------------------------------
- Failures:
- [1]:
- time : 2023-12-09_21:02:09
- host : 43ec3d4110a7
- rank : 2 (local_rank: 2)
- exitcode : 1 (pid: 919)
- error_file: <N/A>
- traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
- [2]:
- time : 2023-12-09_21:02:09
- host : 43ec3d4110a7
- rank : 3 (local_rank: 3)
- exitcode : 1 (pid: 920)
- error_file: <N/A>
- traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
- [3]:
- time : 2023-12-09_21:02:09
- host : 43ec3d4110a7
- rank : 4 (local_rank: 4)
- exitcode : 1 (pid: 921)
- error_file: <N/A>
- traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
- [4]:
- time : 2023-12-09_21:02:09
- host : 43ec3d4110a7
- rank : 5 (local_rank: 5)
- exitcode : 1 (pid: 922)
- error_file: <N/A>
- traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
- [5]:
- time : 2023-12-09_21:02:09
- host : 43ec3d4110a7
- rank : 6 (local_rank: 6)
- exitcode : 1 (pid: 923)
- error_file: <N/A>
- traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
- [6]:
- time : 2023-12-09_21:02:09
- host : 43ec3d4110a7
- rank : 7 (local_rank: 7)
- exitcode : 1 (pid: 924)
- error_file: <N/A>
- traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
- ------------------------------------------------------------
- Root Cause (first observed failure):
- [0]:
- time : 2023-12-09_21:02:09
- host : 43ec3d4110a7
- rank : 1 (local_rank: 1)
- exitcode : 1 (pid: 918)
- error_file: <N/A>
- traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
- ============================================================