- [2024-09-27 13:40:31,969] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
- [WARNING] On Ampere and higher architectures please use CUDA 11+
- [WARNING] On Ampere and higher architectures please use CUDA 11+
- [WARNING] On Ampere and higher architectures please use CUDA 11+
- [WARNING] On Ampere and higher architectures please use CUDA 11+
- [WARNING] On Ampere and higher architectures please use CUDA 11+
- [WARNING] On Ampere and higher architectures please use CUDA 11+
- W0927 13:40:33.228857 140401003063104 torch/distributed/run.py:779]
- W0927 13:40:33.228857 140401003063104 torch/distributed/run.py:779] *****************************************
- W0927 13:40:33.228857 140401003063104 torch/distributed/run.py:779] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
- W0927 13:40:33.228857 140401003063104 torch/distributed/run.py:779] *****************************************
- Traceback (most recent call last):
- File "/usr/local/lib/python3.8/dist-packages/accelerate/utils/deepspeed.py", line 51, in __init__
- config_decoded = base64.urlsafe_b64decode(config_file_or_dict).decode("utf-8")
- File "/usr/lib/python3.8/base64.py", line 133, in urlsafe_b64decode
- return b64decode(s)
- File "/usr/lib/python3.8/base64.py", line 87, in b64decode
- return binascii.a2b_base64(s)
- binascii.Error: Incorrect padding
- During handling of the above exception, another exception occurred:
- Traceback (most recent call last):
- File "examples/dreambooth/train_dreambooth_flux.py", line 1801, in <module>
- main(args)
- File "examples/dreambooth/train_dreambooth_flux.py", line 998, in main
- accelerator = Accelerator(
- File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 285, in __init__
- DeepSpeedPlugin() if os.environ.get("ACCELERATE_USE_DEEPSPEED", "false") == "true" else None
- File "<string>", line 15, in __init__
- File "/usr/local/lib/python3.8/dist-packages/accelerate/utils/dataclasses.py", line 1025, in __post_init__
- self.hf_ds_config = HfDeepSpeedConfig(self.hf_ds_config)
- File "/usr/local/lib/python3.8/dist-packages/accelerate/utils/deepspeed.py", line 54, in __init__
- raise ValueError(
- ValueError: Expected a string path to an existing deepspeed config, or a dictionary, or a base64 encoded string. Received: ./deepspeed.json
- W0927 13:40:36.132854 140401003063104 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 10890 closing signal SIGTERM
- E0927 13:40:36.133734 140401003063104 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 1 (pid: 10891) of binary: /usr/bin/python
- Traceback (most recent call last):
- File "/usr/local/bin/accelerate", line 8, in <module>
- sys.exit(main())
- File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 48, in main
- args.func(args)
- File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1159, in launch_command
- deepspeed_launcher(args)
- File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 852, in deepspeed_launcher
- distrib_run.run(args)
- File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 892, in run
- elastic_launch(
- File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 133, in __call__
- return launch_agent(self._config, self._entrypoint, list(args))
- File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
- raise ChildFailedError(
- torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
- ============================================================
- examples/dreambooth/train_dreambooth_flux.py FAILED
- ------------------------------------------------------------
- Failures:
- <NO_OTHER_FAILURES>
- ------------------------------------------------------------
- Root Cause (first observed failure):
- [0]:
- time : 2024-09-27_13:40:36
- host : x2-h100.internal.cloudapp.net
- rank : 1 (local_rank: 1)
- exitcode : 1 (pid: 10891)
- error_file: <N/A>
- traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
- ============================================================
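
The traceback suggests the config value `./deepspeed.json` was not accepted as an existing file path, so the loader fell back to treating it as base64, which failed with `Incorrect padding` and was re-raised as the `ValueError`. Below is a minimal sketch of that fallback order, assuming the three accepted forms named in the error message (dictionary, existing path, base64 string). `load_ds_config` is a hypothetical stand-in written for illustration, not accelerate's actual implementation.

```python
import base64
import binascii
import json
import os
from copy import deepcopy


def load_ds_config(config_file_or_dict):
    """Hypothetical helper: resolve a DeepSpeed config from a dict, a path, or base64."""
    # 1. A dictionary is used directly (copied so the caller's dict is not mutated).
    if isinstance(config_file_or_dict, dict):
        return deepcopy(config_file_or_dict)
    # 2. A string that names an existing file is read as JSON.
    if os.path.exists(config_file_or_dict):
        with open(config_file_or_dict, encoding="utf-8") as f:
            return json.load(f)
    # 3. Anything else is assumed to be base64-encoded JSON. A path that does not
    #    exist ends up here and typically fails to decode.
    try:
        decoded = base64.urlsafe_b64decode(config_file_or_dict).decode("utf-8")
        return json.loads(decoded)
    except (binascii.Error, UnicodeDecodeError, ValueError) as exc:
        raise ValueError(
            "Expected a string path to an existing deepspeed config, or a dictionary, "
            f"or a base64 encoded string. Received: {config_file_or_dict}"
        ) from exc
```

Under this reading, the likely cause is that `./deepspeed.json` does not exist relative to the working directory of the launched processes. Checking `os.path.exists("./deepspeed.json")` from the directory where `accelerate launch` is run, or passing an absolute path, would confirm it.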