- The following values were not passed to `accelerate launch` and had defaults used instead:
- `--num_cpu_threads_per_process` was set to `40` to improve out-of-box performance when training on CPUs
- To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
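This block is only accelerate filling in unspecified launch defaults. Passing the value explicitly (or running `accelerate config` once) silences it; the script name below is a placeholder for whatever training script produced this log:

    accelerate launch --num_cpu_threads_per_process 40 train_dreambooth_flux.py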
- [W925 11:14:22.847626929 Utils.hpp:164] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function operator())
- [W925 11:14:22.847654179 Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
- [W925 11:14:22.848692186 Utils.hpp:164] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function operator())
- [W925 11:14:22.848717216 Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
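These four warnings (two per rank) mean the job was launched with the old NCCL_* spellings of two PyTorch environment variables. A minimal sketch of the rename, applied before launch (the fallback defaults "0" and "1" are assumptions, not values taken from this log):

    import os

    # Move the deprecated NCCL_* variables to their TORCH_NCCL_* replacements.
    os.environ["TORCH_NCCL_BLOCKING_WAIT"] = os.environ.pop("NCCL_BLOCKING_WAIT", "0")
    os.environ["TORCH_NCCL_ASYNC_ERROR_HANDLING"] = os.environ.pop("NCCL_ASYNC_ERROR_HANDLING", "1")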
- 09/25/2024 11:14:22 - INFO - __main__ - Distributed environment: FSDP Backend: nccl
- Num processes: 2
- Process index: 0
- Local process index: 0
- Device: cuda:0
- Mixed precision type: bf16
- You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
- 09/25/2024 11:14:22 - INFO - __main__ - Distributed environment: FSDP Backend: nccl
- Num processes: 2
- Process index: 1
- Local process index: 1
- Device: cuda:1
- Mixed precision type: bf16
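Both ranks report the same FSDP/nccl/bf16 setup, i.e. a 2-process single-node run with one GPU per rank. A minimal sketch of an Accelerator initialization consistent with these lines (the actual script may differ; the FSDP details come from `accelerate config`, not from this call):

    from accelerate import Accelerator

    accelerator = Accelerator(mixed_precision="bf16")
    # These attributes are what the per-rank lines above report.
    print(accelerator.num_processes, accelerator.process_index, accelerator.device)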
- You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
- You are using a model of type t5 to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
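These two warnings come from instantiating the CLIP and T5 text encoders through a generic model class and are usually harmless. Loading each encoder with its concrete class avoids them; a sketch assuming the usual FLUX repository layout (the repo id is an assumption, not taken from this log):

    from transformers import CLIPTextModel, T5EncoderModel

    # Concrete classes instead of a generic one, so no type-mismatch warning is raised.
    text_encoder = CLIPTextModel.from_pretrained("black-forest-labs/FLUX.1-dev", subfolder="text_encoder")
    text_encoder_2 = T5EncoderModel.from_pretrained("black-forest-labs/FLUX.1-dev", subfolder="text_encoder_2")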
- Downloading shards: 100%|███████████████████████| 2/2 [00:00<00:00, 4888.47it/s]
- Downloading shards: 100%|██████████████████████| 2/2 [00:00<00:00, 15307.68it/s]
- Loading checkpoint shards: 100%|██████████████████| 2/2 [00:01<00:00, 1.31it/s]
- Fetching 3 files: 100%|████████████████████████| 3/3 [00:00<00:00, 73584.28it/s]
- Loading checkpoint shards: 100%|██████████████████| 2/2 [00:57<00:00, 28.72s/it]
- Fetching 3 files: 100%|█████████████████████████| 3/3 [00:00<00:00, 8726.01it/s]
- {'axes_dims_rope'} was not found in config. Values will be initialized to default values.
- Using decoupled weight decay
- Using decoupled weight decay
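"Using decoupled weight decay" is printed once per rank by the optimizer. Together with the constant lr=1 in the progress bar further down, this is consistent with the Prodigy optimizer, which applies weight decay decoupled from the gradient update (AdamW-style); a sketch under that assumption:

    import torch
    from prodigyopt import Prodigy  # assumption: Prodigy is the optimizer in use

    model = torch.nn.Linear(8, 8)  # stand-in for the trained transformer
    # decouple=True plus a nonzero weight_decay triggers the message above.
    optimizer = Prodigy(model.parameters(), lr=1.0, weight_decay=1e-2, decouple=True)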
- x2-h100:3217:3217 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ^docker0,lo
- x2-h100:3217:3217 [0] NCCL INFO Bootstrap : Using eth0:10.0.0.16<0>
- x2-h100:3217:3217 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
- x2-h100:3217:3217 [0] NCCL INFO cudaDriverVersion 12020
- NCCL version 2.20.5+cuda12.4
- x2-h100:3218:3218 [1] NCCL INFO cudaDriverVersion 12020
- x2-h100:3218:3218 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ^docker0,lo
- x2-h100:3218:3218 [1] NCCL INFO Bootstrap : Using eth0:10.0.0.16<0>
- x2-h100:3218:3218 [1] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
- x2-h100:3218:4557 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
- x2-h100:3218:4557 [1] NCCL INFO Failed to open libibverbs.so[.1]
- x2-h100:3218:4557 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ^docker0,lo
- x2-h100:3218:4557 [1] NCCL INFO NET/Socket : Using [0]eth0:10.0.0.16<0>
- x2-h100:3217:4556 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
- x2-h100:3217:4556 [0] NCCL INFO Failed to open libibverbs.so[.1]
- x2-h100:3217:4556 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ^docker0,lo
- x2-h100:3218:4557 [1] NCCL INFO Using non-device net plugin version 0
- x2-h100:3218:4557 [1] NCCL INFO Using network Socket
- x2-h100:3217:4556 [0] NCCL INFO NET/Socket : Using [0]eth0:10.0.0.16<0>
- x2-h100:3217:4556 [0] NCCL INFO Using non-device net plugin version 0
- x2-h100:3217:4556 [0] NCCL INFO Using network Socket
- x2-h100:3218:4557 [1] NCCL INFO comm 0x2f5704c0 rank 1 nranks 2 cudaDev 1 nvmlDev 1 busId 200000 commId 0xe3ead5030c28e9f0 - Init START
- x2-h100:3217:4556 [0] NCCL INFO comm 0x27143e10 rank 0 nranks 2 cudaDev 0 nvmlDev 0 busId 100000 commId 0xe3ead5030c28e9f0 - Init START
- x2-h100:3217:4556 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffffffff
- x2-h100:3218:4557 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ffffff00,00000000
- x2-h100:3217:4556 [0] NCCL INFO comm 0x27143e10 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0
- x2-h100:3218:4557 [1] NCCL INFO comm 0x2f5704c0 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0
- x2-h100:3218:4557 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] 0/-1/-1->1->-1 [5] 0/-1/-1->1->-1 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] -1/-1/-1->1->0 [9] -1/-1/-1->1->0 [10] -1/-1/-1->1->0 [11] -1/-1/-1->1->0 [12] 0/-1/-1->1->-1 [13] 0/-1/-1->1->-1 [14] 0/-1/-1->1->-1 [15] 0/-1/-1->1->-1
- x2-h100:3218:4557 [1] NCCL INFO P2P Chunksize set to 524288
- x2-h100:3217:4556 [0] NCCL INFO Channel 00/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 01/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 02/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 03/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 04/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 05/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 06/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 07/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 08/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 09/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 10/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 11/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 12/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 13/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 14/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 15/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] -1/-1/-1->0->1 [5] -1/-1/-1->0->1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] -1/-1/-1->0->1 [13] -1/-1/-1->0->1 [14] -1/-1/-1->0->1 [15] -1/-1/-1->0->1
- x2-h100:3217:4556 [0] NCCL INFO P2P Chunksize set to 524288
- x2-h100:3217:4556 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Connected all rings
- x2-h100:3218:4557 [1] NCCL INFO Connected all rings
- x2-h100:3217:4556 [0] NCCL INFO Connected all trees
- x2-h100:3218:4557 [1] NCCL INFO Connected all trees
- x2-h100:3218:4557 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
- x2-h100:3218:4557 [1] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
- x2-h100:3217:4556 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
- x2-h100:3217:4556 [0] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
- x2-h100:3218:4557 [1] NCCL INFO comm 0x2f5704c0 rank 1 nranks 2 cudaDev 1 nvmlDev 1 busId 200000 commId 0xe3ead5030c28e9f0 - Init COMPLETE
- x2-h100:3217:4556 [0] NCCL INFO comm 0x27143e10 rank 0 nranks 2 cudaDev 0 nvmlDev 0 busId 100000 commId 0xe3ead5030c28e9f0 - Init COMPLETE
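Summary of the init block above: no InfiniBand is available (libibverbs fails to open), so NCCL falls back to plain sockets for its network transport; but since both ranks sit on one node, all 16 channels run over direct GPU peer-to-peer (P2P/CUMEM) and the socket fallback is irrelevant for the actual traffic. A quick check that matches the "via P2P/CUMEM" lines:

    import torch

    # True means GPU 0 can address GPU 1's memory directly, as the channels above do.
    print(torch.cuda.can_device_access_peer(0, 1))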
- 09/25/2024 11:16:58 - INFO - __main__ - ***** Running training *****
- 09/25/2024 11:16:58 - INFO - __main__ - Num examples = 10
- 09/25/2024 11:16:58 - INFO - __main__ - Num batches each epoch = 5
- 09/25/2024 11:16:58 - INFO - __main__ - Num Epochs = 1
- 09/25/2024 11:16:58 - INFO - __main__ - Instantaneous batch size per device = 1
- 09/25/2024 11:16:58 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 8
- 09/25/2024 11:16:58 - INFO - __main__ - Gradient Accumulation steps = 4
- 09/25/2024 11:16:58 - INFO - __main__ - Total optimization steps = 2
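The printed numbers are mutually consistent; a sketch of the bookkeeping:

    import math

    num_examples, per_device_batch, num_processes, grad_accum = 10, 1, 2, 4
    batches_per_epoch = num_examples // (per_device_batch * num_processes)  # 5
    total_batch = per_device_batch * num_processes * grad_accum             # 8 (w. parallel, distributed & accumulation)
    steps = math.ceil(batches_per_epoch / grad_accum)                       # 2 optimization steps
    print(batches_per_epoch, total_batch, steps)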
- Steps: 0%| | 0/2 [00:00<?, ?it/s]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
- /usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
- with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined]
- /usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
- with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined]
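This FutureWarning is raised inside torch.utils.checkpoint itself (note the file path), so the training script cannot fix the call site; upgrading PyTorch makes it go away. For user code, the replacement spelling it points at looks like this:

    import torch

    # New-style autocast; torch.cpu.amp.autocast(...) is the deprecated form.
    with torch.amp.autocast("cpu", dtype=torch.bfloat16):
        pass  # CPU-side mixed-precision region goes here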
- Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
- Steps: 0%| | 0/2 [01:27<?, ?it/s, loss=0.4, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
- Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
- Steps: 0%| | 0/2 [01:31<?, ?it/s, loss=0.416, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
- Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
- Steps: 0%| | 0/2 [01:35<?, ?it/s, loss=0.327, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
- Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
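The repeated txt_ids message is a diffusers deprecation: the Flux transformer now expects the text position ids without a batch dimension. A sketch of the fix in the calling code (the shapes are illustrative, not taken from this log):

    import torch

    txt_ids = torch.zeros(1, 512, 3)  # hypothetical 3d ids, (batch, seq_len, 3), as older code built them
    if txt_ids.ndim == 3:
        txt_ids = txt_ids[0]          # 2d (seq_len, 3), as the warning asks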
- Steps: 50%|█████████▌ | 1/2 [01:40<01:40, 100.45s/it, loss=0.327, lr=1]09/25/2024 11:18:39 - INFO - accelerate.accelerator - Saving current state to /flux-dreambooth-outputs/dreamboot-yaremovaa/checkpoint-1
- 09/25/2024 11:18:39 - INFO - accelerate.accelerator - Saving FSDP model
- /usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
- warnings.warn(
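The saving path still goes through the FSDP1 state_dict_type API, hence this warning from inside torch/accelerate rather than from the training script. A minimal sketch of the replacement API the warning names (the Linear module and AdamW here are placeholders for the real FSDP-wrapped model and its optimizer):

    import torch
    from torch.distributed.checkpoint.state_dict import get_state_dict

    model = torch.nn.Linear(4, 4)
    optimizer = torch.optim.AdamW(model.parameters())
    # Returns (model_state_dict, optimizer_state_dict); works for FSDP, DDP, and plain modules.
    model_state, optim_state = get_state_dict(model, optimizer)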
- /usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:737: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
- local_shape = tensor.shape
- /usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
- tensor.shape,
- /usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
- tensor.dtype,
- /usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:752: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
- tensor.device,
- 09/25/2024 11:18:45 - INFO - accelerate.utils.fsdp_utils - Saving model to /flux-dreambooth-outputs/dreamboot-yaremovaa/checkpoint-1/pytorch_model_fsdp_0
- /usr/local/lib/python3.8/dist-packages/accelerate/utils/fsdp_utils.py:107: FutureWarning: `save_state_dict` is deprecated and will be removed in future versions.Please use `save` instead.
- dist_cp.save_state_dict(
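Likewise, accelerate still calls the deprecated dist_cp.save_state_dict here; the replacement is dist_cp.save. A sketch (single-process, with an illustrative output path):

    import torch
    import torch.distributed.checkpoint as dist_cp

    state_dict = {"model": torch.nn.Linear(4, 4).state_dict()}
    # `save` replaces the deprecated `save_state_dict`; checkpoint_id is the target directory.
    dist_cp.save(state_dict, checkpoint_id="/tmp/ckpt")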