- The following values were not passed to `accelerate launch` and had defaults used instead:
- `--num_cpu_threads_per_process` was set to `40` to improve out-of-box performance when training on CPUs
- To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
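This block is only accelerate filling in unspecified launch defaults. Passing the value explicitly (or running `accelerate config` once) silences it; the script name below is a placeholder for whatever training script produced this log:

    accelerate launch --num_cpu_threads_per_process 40 train_dreambooth_flux.py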
- [W925 11:14:22.847626929 Utils.hpp:164] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function operator())
- [W925 11:14:22.847654179 Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
- [W925 11:14:22.848692186 Utils.hpp:164] Warning: Environment variable NCCL_BLOCKING_WAIT is deprecated; use TORCH_NCCL_BLOCKING_WAIT instead (function operator())
- [W925 11:14:22.848717216 Utils.hpp:135] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
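These four warnings (two per rank) mean the job was launched with the old NCCL_* spellings of two PyTorch environment variables. A minimal sketch of the rename, applied before launch (the fallback defaults "0" and "1" are assumptions, not values taken from this log):

    import os

    # Move the deprecated NCCL_* variables to their TORCH_NCCL_* replacements.
    os.environ["TORCH_NCCL_BLOCKING_WAIT"] = os.environ.pop("NCCL_BLOCKING_WAIT", "0")
    os.environ["TORCH_NCCL_ASYNC_ERROR_HANDLING"] = os.environ.pop("NCCL_ASYNC_ERROR_HANDLING", "1")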
- 09/25/2024 11:14:22 - INFO - __main__ - Distributed environment: FSDP Backend: nccl
- Num processes: 2
- Process index: 0
- Local process index: 0
- Device: cuda:0
- Mixed precision type: bf16
- You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
- 09/25/2024 11:14:22 - INFO - __main__ - Distributed environment: FSDP Backend: nccl
- Num processes: 2
- Process index: 1
- Local process index: 1
- Device: cuda:1
- Mixed precision type: bf16
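Both ranks report the same FSDP/nccl/bf16 setup, i.e. a 2-process single-node run with one GPU per rank. A minimal sketch of an Accelerator initialization consistent with these lines (the actual script may differ; the FSDP details come from `accelerate config`, not from this call):

    from accelerate import Accelerator

    accelerator = Accelerator(mixed_precision="bf16")
    # These attributes are what the per-rank lines above report.
    print(accelerator.num_processes, accelerator.process_index, accelerator.device)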
- You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
- You are using a model of type t5 to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
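These two warnings come from instantiating the CLIP and T5 text encoders through a generic model class and are usually harmless. Loading each encoder with its concrete class avoids them; a sketch assuming the usual FLUX repository layout (the repo id is an assumption, not taken from this log):

    from transformers import CLIPTextModel, T5EncoderModel

    # Concrete classes instead of a generic one, so no type-mismatch warning is raised.
    text_encoder = CLIPTextModel.from_pretrained("black-forest-labs/FLUX.1-dev", subfolder="text_encoder")
    text_encoder_2 = T5EncoderModel.from_pretrained("black-forest-labs/FLUX.1-dev", subfolder="text_encoder_2")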
- Downloading shards: 100%|███████████████████████| 2/2 [00:00<00:00, 4888.47it/s]
- Downloading shards: 100%|██████████████████████| 2/2 [00:00<00:00, 15307.68it/s]
- Loading checkpoint shards: 100%|██████████████████| 2/2 [00:01<00:00, 1.31it/s]
- Fetching 3 files: 100%|████████████████████████| 3/3 [00:00<00:00, 73584.28it/s]
- Loading checkpoint shards: 100%|██████████████████| 2/2 [00:57<00:00, 28.72s/it]
- Fetching 3 files: 100%|█████████████████████████| 3/3 [00:00<00:00, 8726.01it/s]
- {'axes_dims_rope'} was not found in config. Values will be initialized to default values.
- Using decoupled weight decay
- Using decoupled weight decay
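"Using decoupled weight decay" is printed once per rank by the optimizer. Together with the constant lr=1 in the progress bar further down, this is consistent with the Prodigy optimizer, which applies weight decay decoupled from the gradient update (AdamW-style); a sketch under that assumption:

    import torch
    from prodigyopt import Prodigy  # assumption: Prodigy is the optimizer in use

    model = torch.nn.Linear(8, 8)  # stand-in for the trained transformer
    # decouple=True plus a nonzero weight_decay triggers the message above.
    optimizer = Prodigy(model.parameters(), lr=1.0, weight_decay=1e-2, decouple=True)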
- x2-h100:3217:3217 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ^docker0,lo
- x2-h100:3217:3217 [0] NCCL INFO Bootstrap : Using eth0:10.0.0.16<0>
- x2-h100:3217:3217 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
- x2-h100:3217:3217 [0] NCCL INFO cudaDriverVersion 12020
- NCCL version 2.20.5+cuda12.4
- x2-h100:3218:3218 [1] NCCL INFO cudaDriverVersion 12020
- x2-h100:3218:3218 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ^docker0,lo
- x2-h100:3218:3218 [1] NCCL INFO Bootstrap : Using eth0:10.0.0.16<0>
- x2-h100:3218:3218 [1] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
- x2-h100:3218:4557 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
- x2-h100:3218:4557 [1] NCCL INFO Failed to open libibverbs.so[.1]
- x2-h100:3218:4557 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ^docker0,lo
- x2-h100:3218:4557 [1] NCCL INFO NET/Socket : Using [0]eth0:10.0.0.16<0>
- x2-h100:3217:4556 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
- x2-h100:3217:4556 [0] NCCL INFO Failed to open libibverbs.so[.1]
- x2-h100:3217:4556 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ^docker0,lo
- x2-h100:3218:4557 [1] NCCL INFO Using non-device net plugin version 0
- x2-h100:3218:4557 [1] NCCL INFO Using network Socket
- x2-h100:3217:4556 [0] NCCL INFO NET/Socket : Using [0]eth0:10.0.0.16<0>
- x2-h100:3217:4556 [0] NCCL INFO Using non-device net plugin version 0
- x2-h100:3217:4556 [0] NCCL INFO Using network Socket
- x2-h100:3218:4557 [1] NCCL INFO comm 0x2f5704c0 rank 1 nranks 2 cudaDev 1 nvmlDev 1 busId 200000 commId 0xe3ead5030c28e9f0 - Init START
- x2-h100:3217:4556 [0] NCCL INFO comm 0x27143e10 rank 0 nranks 2 cudaDev 0 nvmlDev 0 busId 100000 commId 0xe3ead5030c28e9f0 - Init START
- x2-h100:3217:4556 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffffffff
- x2-h100:3218:4557 [1] NCCL INFO Setting affinity for GPU 1 to ffff,ffffff00,00000000
- x2-h100:3217:4556 [0] NCCL INFO comm 0x27143e10 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0
- x2-h100:3218:4557 [1] NCCL INFO comm 0x2f5704c0 rank 1 nRanks 2 nNodes 1 localRanks 2 localRank 1 MNNVL 0
- x2-h100:3218:4557 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0 [2] -1/-1/-1->1->0 [3] -1/-1/-1->1->0 [4] 0/-1/-1->1->-1 [5] 0/-1/-1->1->-1 [6] 0/-1/-1->1->-1 [7] 0/-1/-1->1->-1 [8] -1/-1/-1->1->0 [9] -1/-1/-1->1->0 [10] -1/-1/-1->1->0 [11] -1/-1/-1->1->0 [12] 0/-1/-1->1->-1 [13] 0/-1/-1->1->-1 [14] 0/-1/-1->1->-1 [15] 0/-1/-1->1->-1
- x2-h100:3218:4557 [1] NCCL INFO P2P Chunksize set to 524288
- x2-h100:3217:4556 [0] NCCL INFO Channel 00/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 01/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 02/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 03/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 04/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 05/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 06/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 07/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 08/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 09/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 10/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 11/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 12/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 13/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 14/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Channel 15/16 : 0 1
- x2-h100:3217:4556 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] -1/-1/-1->0->1 [5] -1/-1/-1->0->1 [6] -1/-1/-1->0->1 [7] -1/-1/-1->0->1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] -1/-1/-1->0->1 [13] -1/-1/-1->0->1 [14] -1/-1/-1->0->1 [15] -1/-1/-1->0->1
- x2-h100:3217:4556 [0] NCCL INFO P2P Chunksize set to 524288
- x2-h100:3217:4556 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3218:4557 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM
- x2-h100:3217:4556 [0] NCCL INFO Connected all rings
- x2-h100:3218:4557 [1] NCCL INFO Connected all rings
- x2-h100:3217:4556 [0] NCCL INFO Connected all trees
- x2-h100:3218:4557 [1] NCCL INFO Connected all trees
- x2-h100:3218:4557 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
- x2-h100:3218:4557 [1] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
- x2-h100:3217:4556 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
- x2-h100:3217:4556 [0] NCCL INFO 16 coll channels, 0 collnet channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
- x2-h100:3218:4557 [1] NCCL INFO comm 0x2f5704c0 rank 1 nranks 2 cudaDev 1 nvmlDev 1 busId 200000 commId 0xe3ead5030c28e9f0 - Init COMPLETE
- x2-h100:3217:4556 [0] NCCL INFO comm 0x27143e10 rank 0 nranks 2 cudaDev 0 nvmlDev 0 busId 100000 commId 0xe3ead5030c28e9f0 - Init COMPLETE
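Summary of the init block above: no InfiniBand is available (libibverbs fails to open), so NCCL falls back to plain sockets for its network transport; but since both ranks sit on one node, all 16 channels run over direct GPU peer-to-peer (P2P/CUMEM) and the socket fallback is irrelevant for the actual traffic. A quick check that matches the "via P2P/CUMEM" lines:

    import torch

    # True means GPU 0 can address GPU 1's memory directly, as the channels above do.
    print(torch.cuda.can_device_access_peer(0, 1))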
- 09/25/2024 11:16:58 - INFO - __main__ - ***** Running training *****
- 09/25/2024 11:16:58 - INFO - __main__ - Num examples = 10
- 09/25/2024 11:16:58 - INFO - __main__ - Num batches each epoch = 5
- 09/25/2024 11:16:58 - INFO - __main__ - Num Epochs = 1
- 09/25/2024 11:16:58 - INFO - __main__ - Instantaneous batch size per device = 1
- 09/25/2024 11:16:58 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 8
- 09/25/2024 11:16:58 - INFO - __main__ - Gradient Accumulation steps = 4
- 09/25/2024 11:16:58 - INFO - __main__ - Total optimization steps = 2
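The printed numbers are mutually consistent; a sketch of the bookkeeping:

    import math

    num_examples, per_device_batch, num_processes, grad_accum = 10, 1, 2, 4
    batches_per_epoch = num_examples // (per_device_batch * num_processes)  # 5
    total_batch = per_device_batch * num_processes * grad_accum             # 8 (w. parallel, distributed & accumulation)
    steps = math.ceil(batches_per_epoch / grad_accum)                       # 2 optimization steps
    print(batches_per_epoch, total_batch, steps)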
- Steps: 0%| | 0/2 [00:00<?, ?it/s]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
- /usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
- with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined]
- /usr/local/lib/python3.8/dist-packages/torch/utils/checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
- with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined]
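This FutureWarning is raised inside torch.utils.checkpoint itself (note the file path), so the training script cannot fix the call site; upgrading PyTorch makes it go away. For user code, the replacement spelling it points at looks like this:

    import torch

    # New-style autocast; torch.cpu.amp.autocast(...) is the deprecated form.
    with torch.amp.autocast("cpu", dtype=torch.bfloat16):
        pass  # CPU-side mixed-precision region goes here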
- Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
- Steps: 0%| | 0/2 [01:27<?, ?it/s, loss=0.4, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
- Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
- Steps: 0%| | 0/2 [01:31<?, ?it/s, loss=0.416, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
- Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
- Steps: 0%| | 0/2 [01:35<?, ?it/s, loss=0.327, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
- Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
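The repeated txt_ids message is a diffusers deprecation: the Flux transformer now expects the text position ids without a batch dimension. A sketch of the fix in the calling code (the shapes are illustrative, not taken from this log):

    import torch

    txt_ids = torch.zeros(1, 512, 3)  # hypothetical 3d ids, (batch, seq_len, 3), as older code built them
    if txt_ids.ndim == 3:
        txt_ids = txt_ids[0]          # 2d (seq_len, 3), as the warning asks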
- Steps: 50%|█████████▌ | 1/2 [01:40<01:40, 100.45s/it, loss=0.327, lr=1]09/25/2024 11:18:39 - INFO - accelerate.accelerator - Saving current state to /flux-dreambooth-outputs/dreamboot-yaremovaa/checkpoint-1
- 09/25/2024 11:18:39 - INFO - accelerate.accelerator - Saving FSDP model
- /usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
- warnings.warn(
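The saving path still goes through the FSDP1 state_dict_type API, hence this warning from inside torch/accelerate rather than from the training script. A minimal sketch of the replacement API the warning names (the Linear module and AdamW here are placeholders for the real FSDP-wrapped model and its optimizer):

    import torch
    from torch.distributed.checkpoint.state_dict import get_state_dict

    model = torch.nn.Linear(4, 4)
    optimizer = torch.optim.AdamW(model.parameters())
    # Returns (model_state_dict, optimizer_state_dict); works for FSDP, DDP, and plain modules.
    model_state, optim_state = get_state_dict(model, optimizer)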
- /usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:737: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
- local_shape = tensor.shape
- /usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:749: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
- tensor.shape,
- /usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:751: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
- tensor.dtype,
- /usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_state_dict_utils.py:752: FutureWarning: Please use DTensor instead and we are deprecating ShardedTensor.
- tensor.device,
- 09/25/2024 11:18:45 - INFO - accelerate.utils.fsdp_utils - Saving model to /flux-dreambooth-outputs/dreamboot-yaremovaa/checkpoint-1/pytorch_model_fsdp_0
- /usr/local/lib/python3.8/dist-packages/accelerate/utils/fsdp_utils.py:107: FutureWarning: `save_state_dict` is deprecated and will be removed in future versions.Please use `save` instead.
- dist_cp.save_state_dict(
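Likewise, accelerate still calls the deprecated dist_cp.save_state_dict here; the replacement is dist_cp.save. A sketch (single-process, with an illustrative output path):

    import torch
    import torch.distributed.checkpoint as dist_cp

    state_dict = {"model": torch.nn.Linear(4, 4).state_dict()}
    # `save` replaces the deprecated `save_state_dict`; checkpoint_id is the target directory.
    dist_cp.save(state_dict, checkpoint_id="/tmp/ckpt")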