Morkeleb

koboldcpp_rocm log

Dec 8th, 2024
***
Welcome to KoboldCpp - Version 1.79.1.yr0-ROCm
For command line arguments, please refer to --help
***
Auto Selected HIP Backend...

Auto Recommended GPU Layers: 24
Attempting to use hipBLAS library for faster prompt ingestion. A compatible AMD GPU will be required.
Initializing dynamic library: koboldcpp_hipblas.dll
==========
Namespace(model='', model_param='E:/LargeLanguageModels/EVA-Qwen2.5-32B-v0.2-Q4_K_S.gguf', port=5001, port_param=5001, host='', launch=False, config=None, threads=7, usecublas=['normal', '0'], usevulkan=None, useclblast=None, usecpu=False, contextsize=16384, gpulayers=24, tensor_split=None, checkforupdates=False, ropeconfig=[0.0, 10000.0], blasbatchsize=512, blasthreads=7, lora=None, noshift=False, nofastforward=False, nommap=False, usemlock=False, noavx2=False, debugmode=0, onready='', benchmark=None, prompt='', promptlimit=100, multiuser=1, multiplayer=False, remotetunnel=False, highpriority=False, foreground=False, preloadstory=None, quiet=False, ssl=None, nocertify=False, mmproj=None, draftmodel=None, draftamount=8, password=None, ignoremissing=False, chatcompletionsadapter=None, flashattention=False, quantkv=0, forceversion=0, smartcontext=False, unpack='', nomodel=False, showgui=False, skiplauncher=False, hordemodelname='', hordeworkername='', hordekey='', hordemaxctx=0, hordegenlen=0, sdmodel='', sdthreads=7, sdclamped=0, sdt5xxl='', sdclipl='', sdclipg='', sdvae='', sdvaeauto=False, sdquant=False, sdlora='', sdloramult=1.0, whispermodel='', hordeconfig=None, sdconfig=None, noblas=False)
==========
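
[Annotation] The Namespace dump above mirrors the launcher settings that were in effect. A minimal sketch of what an equivalent command-line launch might look like, assuming the stock KoboldCpp flag names (--model, --port, --threads, --usecublas, --contextsize, --gpulayers); this ROCm fork was likely started from the GUI launcher, so the exact invocation is an assumption:

    import subprocess

    # Hypothetical reconstruction of the launch behind the Namespace dump above.
    # Flag names are assumed from the standard KoboldCpp CLI; values come from the log.
    subprocess.run([
        "koboldcpp.exe",
        "--model", "E:/LargeLanguageModels/EVA-Qwen2.5-32B-v0.2-Q4_K_S.gguf",
        "--port", "5001",
        "--threads", "7",
        "--usecublas", "normal", "0",   # hipBLAS/ROCm path is selected via the usecublas option
        "--contextsize", "16384",
        "--gpulayers", "24",
    ])
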
Loading model: E:\LargeLanguageModels\EVA-Qwen2.5-32B-v0.2-Q4_K_S.gguf

The reported GGUF Arch is: qwen2
Arch Category: 5

---
Identified as GGUF model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling for GGUF. If the model has custom RoPE settings, they'll be used directly instead!
It means that the RoPE values written above will be replaced by the RoPE values indicated after loading.
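
[Annotation] The two lines above describe a precedence rule: the model's own RoPE metadata overrides the launcher's default ropeconfig. A minimal sketch of that rule, using hypothetical helper names (the real koboldcpp internals are not shown in this log, and the [scale, base] ordering of ropeconfig is assumed):

    # Hypothetical illustration of the precedence described above: the GGUF metadata
    # (freq_base 1000000.0 for this model, reported later in the log) replaces the
    # launcher default ropeconfig of [0.0, 10000.0].
    def effective_rope(ropeconfig, gguf_freq_base=None, gguf_freq_scale=None):
        freq_scale, freq_base = ropeconfig      # assumed [scale, base] ordering
        if gguf_freq_base is not None:          # model ships its own RoPE settings
            freq_base = gguf_freq_base
        if gguf_freq_scale is not None:
            freq_scale = gguf_freq_scale
        return freq_base, freq_scale

    print(effective_rope([0.0, 10000.0], gguf_freq_base=1000000.0, gguf_freq_scale=1.0))
    # -> (1000000.0, 1.0), matching freq_base / freq_scale reported at context creation
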
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | AMX_INT8 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 0 |
CUBLAS: Warning, you are running Qwen2 without Flash Attention and may observe incoherent output.
---
Initializing CUDA/HIP, please wait, the following step may take a few minutes for first launch...
---
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XT, compute capability 11.0, VMM: no
llama_load_model_from_file: using device ROCm0 (AMD Radeon RX 7900 XT) - 20106 MiB free
llama_model_loader: loaded meta data with 37 key-value pairs and 771 tensors from E:\LargeLanguageModels\EVA-Qwen2.5-32B-v0.2-Q4_K_S.gguf
llm_load_vocab: special tokens cache size = 22
llm_load_vocab: token to piece cache size = 0.9310 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = qwen2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 152064
llm_load_print_meta: n_merges = 151387
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_layer = 64
llm_load_print_meta: n_head = 40
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 5
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 27648
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 131072
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 32B
llm_load_print_meta: model ftype = all F32
llm_load_print_meta: model params = 32.76 B
llm_load_print_meta: model size = 17.49 GiB (4.59 BPW)
llm_load_print_meta: general.name = Qwen2.5 32B
llm_load_print_meta: BOS token = 11 ','
llm_load_print_meta: EOS token = 151643 '<|endoftext|>'
llm_load_print_meta: EOT token = 151645 '<|im_end|>'
llm_load_print_meta: PAD token = 151643 '<|endoftext|>'
llm_load_print_meta: LF token = 148848 'ÄĬ'
llm_load_print_meta: FIM PRE token = 151659 '<|fim_prefix|>'
llm_load_print_meta: FIM SUF token = 151661 '<|fim_suffix|>'
llm_load_print_meta: FIM MID token = 151660 '<|fim_middle|>'
llm_load_print_meta: FIM PAD token = 151662 '<|fim_pad|>'
llm_load_print_meta: FIM REP token = 151663 '<|repo_name|>'
llm_load_print_meta: FIM SEP token = 151664 '<|file_sep|>'
llm_load_print_meta: EOG token = 151643 '<|endoftext|>'
llm_load_print_meta: EOG token = 151645 '<|im_end|>'
llm_load_print_meta: EOG token = 151662 '<|fim_pad|>'
llm_load_print_meta: EOG token = 151663 '<|repo_name|>'
llm_load_print_meta: EOG token = 151664 '<|file_sep|>'
llm_load_print_meta: max token length = 256
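
[Annotation] A quick sanity check on the figures printed above, worked in Python using only values from this log: 32.76 B parameters at 4.59 bits per weight comes to roughly 17.5 GiB, and n_head / n_head_kv = 40 / 8 gives the reported GQA factor of 5.

    # Cross-checking the loader's own numbers (no external data assumed).
    params = 32.76e9        # llm_load_print_meta: model params
    bpw = 4.59              # llm_load_print_meta: bits per weight
    size_gib = params * bpw / 8 / 1024**3
    print(f"{size_gib:.2f} GiB")   # ~17.5 GiB, matching "model size = 17.49 GiB"

    n_head, n_head_kv = 40, 8
    print(n_head // n_head_kv)     # 5, matching "n_gqa = 5"
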
llm_load_tensors: tensor 'token_embd.weight' (q4_K) (and 482 others) cannot be used with preferred buffer type ROCm_Host, using CPU instead
(This is not an error, it just means some tensors will use CPU instead.)
llm_load_tensors: offloading 24 repeating layers to GPU
llm_load_tensors: offloaded 24/65 layers to GPU
llm_load_tensors: CPU_Mapped model buffer size = 11629.41 MiB
llm_load_tensors: ROCm0 model buffer size = 6279.09 MiB
.................................................................................................
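
[Annotation] The two buffer sizes above line up with the 24-of-64 repeating-layer split; a small worked check below, using only the values printed by llm_load_tensors (the per-layer figure is a rough estimate, since non-repeating tensors are not split evenly):

    # The CPU and ROCm0 model buffers sum back to the reported model size, and
    # the ROCm0 share corresponds to the 24 offloaded repeating layers.
    cpu_mib, rocm_mib = 11629.41, 6279.09
    print(f"{(cpu_mib + rocm_mib) / 1024:.2f} GiB")  # ~17.49 GiB, the reported model size
    print(f"{rocm_mib / 24:.1f} MiB/layer")          # ~262 MiB per offloaded layer (approximate)
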
Automatic RoPE Scaling: Using model internal value.
llama_new_context_with_model: n_seq_max = 1
llama_new_context_with_model: n_ctx = 16512
llama_new_context_with_model: n_ctx_per_seq = 16512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: n_ctx_per_seq (16512) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_kv_cache_init: CPU KV buffer size = 2580.00 MiB
llama_kv_cache_init: ROCm0 KV buffer size = 1548.00 MiB
llama_new_context_with_model: KV self size = 4128.00 MiB, K (f16): 2064.00 MiB, V (f16): 2064.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.58 MiB
llama_new_context_with_model: ROCm0 compute buffer size = 1416.77 MiB
llama_new_context_with_model: ROCm_Host compute buffer size = 42.26 MiB
llama_new_context_with_model: graph nodes = 2246
llama_new_context_with_model: graph splits = 564 (with bs=512), 3 (with bs=1)
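
[Annotation] The KV cache numbers above follow directly from the metadata printed earlier; a worked check in Python using only values from this log (n_ctx = 16512, n_layer = 64, n_embd_k_gqa = n_embd_v_gqa = 1024, f16 = 2 bytes, 24 of 64 layers offloaded):

    # KV cache size = n_ctx * n_layer * head_dim_total * 2 bytes (f16), per K and per V.
    n_ctx, n_layer, n_embd_kv_gqa = 16512, 64, 1024
    k_mib = n_ctx * n_layer * n_embd_kv_gqa * 2 / 1024**2
    print(k_mib)                     # 2064.0 -> "K (f16): 2064.00 MiB"
    print(2 * k_mib)                 # 4128.0 -> "KV self size = 4128.00 MiB"
    print(2 * k_mib * 24 / n_layer)  # 1548.0 -> ROCm0 KV buffer (24 offloaded layers)
    print(2 * k_mib * 40 / n_layer)  # 2580.0 -> CPU KV buffer (remaining 40 layers)
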
Load Text Model OK: True
Embedded KoboldAI Lite loaded.
Embedded API docs loaded.
Starting Kobold API on port 5001 at http://localhost:5001/api/
Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/
======
Please connect to custom endpoint at http://localhost:5001
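
[Annotation] With both endpoints up, a client can talk to the server through either the native Kobold API or the OpenAI-compatible one announced above. A minimal sketch against the OpenAI-compatible route; the /v1/chat/completions path and payload fields follow the usual OpenAI API shape, and the model string is illustrative only:

    import requests

    # Minimal request against the OpenAI-compatible endpoint started above.
    resp = requests.post(
        "http://localhost:5001/v1/chat/completions",
        json={
            "model": "EVA-Qwen2.5-32B-v0.2-Q4_K_S",  # informational; the server uses the loaded model
            "messages": [{"role": "user", "content": "Say hello in one sentence."}],
            "max_tokens": 64,
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])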