kopyl

Untitled

Dec 9th, 2023
Generating train split: 100%|███| 37429/37429 [00:01<00:00, 28527.49 examples/s]
wandb: Currently logged in as: spammmmm1997. Use `wandb login --relogin` to force relogin
Traceback (most recent call last):
  File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 1074, in <module>
    main()
  File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 890, in main
    latents = vae.encode(batch["pixel_values"].to(weight_dtype)).latent_dist.sample()
  File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoder_kl.py", line 260, in encode
    h = self.encoder(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/vae.py", line 141, in forward
    sample = self.conv_in(sample)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
wandb: Tracking run with wandb version 0.16.1
wandb: Run data is saved locally in /workspace/diffusers/examples/text_to_image/wandb/run-20231209_210203-cxbfvkj4
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run celestial-frog-115
wandb: ⭐️ View project at https://wandb.ai/spammmmm1997/text2image-fine-tune
wandb: 🚀 View run at https://wandb.ai/spammmmm1997/text2image-fine-tune/runs/cxbfvkj4
12/09/2023 21:02:04 - INFO - __main__ - ***** Running training *****
12/09/2023 21:02:04 - INFO - __main__ - Num examples = 37429
12/09/2023 21:02:04 - INFO - __main__ - Num Epochs = 3847
12/09/2023 21:02:04 - INFO - __main__ - Instantaneous batch size per device = 184
12/09/2023 21:02:04 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1472
12/09/2023 21:02:04 - INFO - __main__ - Gradient Accumulation steps = 1
12/09/2023 21:02:04 - INFO - __main__ - Total optimization steps = 100000
Steps: 0%| | 0/100000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 1074, in <module>
    main()
  File "/workspace/diffusers/examples/text_to_image/train_text_to_image.py", line 890, in main
    latents = vae.encode(batch["pixel_values"].to(weight_dtype)).latent_dist.sample()
  File "/usr/local/lib/python3.10/dist-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/autoencoder_kl.py", line 260, in encode
    h = self.encoder(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/vae.py", line 141, in forward
    sample = self.conv_in(sample)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
[2023-12-09 21:02:09,165] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 917 closing signal SIGTERM
[2023-12-09 21:02:09,531] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 1 (pid: 918) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1008, in launch_command
    multi_gpu_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 666, in multi_gpu_launcher
    distrib_run.run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train_text_to_image.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-12-09_21:02:09
  host      : 43ec3d4110a7
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 919)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2023-12-09_21:02:09
  host      : 43ec3d4110a7
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 920)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
  time      : 2023-12-09_21:02:09
  host      : 43ec3d4110a7
  rank      : 4 (local_rank: 4)
  exitcode  : 1 (pid: 921)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
  time      : 2023-12-09_21:02:09
  host      : 43ec3d4110a7
  rank      : 5 (local_rank: 5)
  exitcode  : 1 (pid: 922)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
  time      : 2023-12-09_21:02:09
  host      : 43ec3d4110a7
  rank      : 6 (local_rank: 6)
  exitcode  : 1 (pid: 923)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
  time      : 2023-12-09_21:02:09
  host      : 43ec3d4110a7
  rank      : 7 (local_rank: 7)
  exitcode  : 1 (pid: 924)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-12-09_21:02:09
  host      : 43ec3d4110a7
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 918)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================