Serving qwq-32b with vLLM on SCNet for Auto-coder

张小明 2025/12/30 15:19:18
Recently g4f has not been working well, so I set up vLLM on SCNet to run a coder model and keep Auto-coder useful. This time I tried the qwen32b model first. Conclusion up front: this 32B model is not good enough; it does not seem very smart.

Starting the vLLM service

First, create an SCNet AI server. Log in to the SCNet site (https://www.scnet.cn/), choose a DCU asynchronous server, pick a single card, and select the qwq32b_vllm image. That way the vLLM environment is ready-made and there is nothing to set up by hand.

Launching from the Jupyter notebook

After the instance starts, enter the container and first try the commands from the Jupyter notebook that ships with the image. Start the vLLM service from the notebook:

```
python app.py   # port: 7860
```

The app.py that gets launched:

```python
import gradio as gr
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Initialize the tokenizer and the vLLM engine
tokenizer = AutoTokenizer.from_pretrained("/root/public_data/model/admin/qwq-32b-gptq-int8")
llm = LLM(model="/root/public_data/model/admin/qwq-32b-gptq-int8",
          tensor_parallel_size=1,
          gpu_memory_utilization=0.9,
          max_model_len=32768)
sampling_params = SamplingParams(temperature=0.7, top_p=0.8,
                                 repetition_penalty=1.05, max_tokens=512)

# Inference function
def generate_response(prompt):
    # e.g. prompt = 'How many "r"s are in the word "strawberry"?'
    messages = [
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    # Generate and extract the text of the first output
    outputs = llm.generate([text], sampling_params)
    response = outputs[0].outputs[0].text
    return response

# Build the Gradio UI
def create_interface():
    with gr.Blocks() as demo:
        gr.Markdown("# Qwen/QwQ-32B 大模型问答系统")
        with gr.Row():
            input_text = gr.Textbox(label="输入你的问题", placeholder="请输入问题...", lines=3)
            output_text = gr.Textbox(label="模型的回答", lines=5, interactive=False)
        submit_button = gr.Button("提交")
        submit_button.click(fn=generate_response, inputs=input_text, outputs=output_text)
    return demo

# Launch the Gradio app
if __name__ == "__main__":
    demo = create_interface()
    demo.launch(server_name="0.0.0.0", share=True, debug=True)
```

As you can see, the model is loaded straight from the public directory, so there is nothing to download. The model finished loading in about 5 minutes and the service was up in about 8:

```
INFO 12-11 08:20:28 model_runner.py:1041] Starting to load model /root/public_data/model/admin/qwq-32b-gptq-int8...
INFO 12-11 08:20:28 selector.py:121] Using ROCmFlashAttention backend.
Loading safetensors checkpoint shards:   0% Completed | 0/8 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  12% Completed | 1/8 [00:34<03:58, 34.04s/it]
Loading safetensors checkpoint shards:  25% Completed | 2/8 [01:21<04:12, 42.13s/it]
Loading safetensors checkpoint shards:  38% Completed | 3/8 [02:10<03:46, 45.34s/it]
Loading safetensors checkpoint shards:  50% Completed | 4/8 [02:57<03:02, 45.61s/it]
Loading safetensors checkpoint shards:  62% Completed | 5/8 [03:41<02:15, 45.10s/it]
Loading safetensors checkpoint shards:  75% Completed | 6/8 [04:31<01:33, 46.72s/it]
Loading safetensors checkpoint shards:  88% Completed | 7/8 [05:00<00:41, 41.16s/it]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [05:04<00:00, 29.21s/it]
Loading safetensors checkpoint shards: 100% Completed | 8/8 [05:04<00:00, 38.05s/it]
INFO 12-11 08:25:34 model_runner.py:1052] Loading model weights took 32.8657 GB
INFO 12-11 08:26:58 gpu_executor.py:122] # GPU blocks: 4291, # CPU blocks: 1024
INFO 12-11 08:27:16 model_runner.py:1356] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set enforce_eager=True or use --enforce-eager in the CLI.
INFO 12-11 08:27:16 model_runner.py:1360] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing gpu_memory_utilization or enforcing eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
INFO 12-11 08:28:18 model_runner.py:1483] Graph capturing finished in 62 secs.
* Running on local URL:  http://0.0.0.0:7860
* Running on public URL: https://ad18c32dd20881d8aa.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
```

The nice thing about using Gradio is that the service is reachable from the public internet right away, through the two URLs printed above.

I opened the page from an external browser and asked it: "Please help me think this through: I want to run an LLM API service on a single 64 GB DCU, mainly for AI-assisted coding. Which model should I launch with vLLM?" Its answer was disappointing, so I will not paste it here: it was all considerations and no conclusion. I am not sure whether that is simply because the output token budget was too short.
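If the conclusion-free answers really were a length problem, the knob would be max_tokens in the SamplingParams shown in app.py above. A minimal sketch of that tweak, assuming a larger cap is acceptable (the value 4096 is an arbitrary example of mine, not from the original script):

```python
from vllm import SamplingParams

# Illustrative only: raise the completion cap from the image's default of 512
# so the model has room to finish its reasoning and state a conclusion.
sampling_params = SamplingParams(temperature=0.7, top_p=0.8,
                                 repetition_penalty=1.05, max_tokens=4096)
```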
Starting the vLLM service directly from the command line

Not ready to give up, I started the service from the command line so it could be called through the API. Launch it directly with the vllm command (the Debugging section at the end explains how I arrived at these two flags):

```
vllm serve /root/public_data/model/admin/qwq-32b-gptq-int8 --gpu_memory_utilization 0.95 --max_model_len 105152
```

After it starts, map port 8000 out; in my case it is exposed at https://c-1998910428559491073.ksai.scnet.cn:58043/v1/models, which returns:

```json
{
  "object": "list",
  "data": [
    {
      "id": "/root/public_data/model/admin/qwq-32b-gptq-int8",
      "object": "model",
      "created": 1765416800,
      "owned_by": "vllm",
      "root": "/root/public_data/model/admin/qwq-32b-gptq-int8",
      "parent": null,
      "max_model_len": 105152,
      "permission": [
        {
          "id": "modelperm-17616f8047064f4dac923291dd0ce429",
          "object": "model_permission",
          "created": 1765416800,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}
```

So the model name is /root/public_data/model/admin/qwq-32b-gptq-int8, the base_url is https://c-1998910428559491073.ksai.scnet.cn:58043/v1/, and the API key can be anything, e.g. hello. Now it can be tested with CherryStudio. The CherryStudio test passed, which confirms API calls work.
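For a scriptable version of the same check CherryStudio performs, here is a minimal sketch using the openai Python client against the vLLM OpenAI-compatible endpoint (the package choice and the test prompt are my additions, not from the original post):

```python
from openai import OpenAI

# Point the standard OpenAI client at the vLLM server exposed above.
# The server here was started without API-key checking, so any placeholder works.
client = OpenAI(
    base_url="https://c-1998910428559491073.ksai.scnet.cn:58043/v1/",
    api_key="hello",
)

resp = client.chat.completions.create(
    model="/root/public_data/model/admin/qwq-32b-gptq-int8",
    messages=[{"role": "user", "content": "Say hi in one short sentence."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```

If this prints a sensible reply, the endpoint is ready for any OpenAI-compatible client, including Auto-coder below.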
Calling it from Auto-coder

Start Auto-coder:

```
auto-coder.chat
```

Configure the model:

```
/models /add_model name=qwq-32b-gptq-int8 model_name=/root/public_data/model/admin/qwq-32b-gptq-int8 base_url=https://c-1998910428559491073.ksai.scnet.cn:58043/v1/ api_key=hello
/conf model:qwq-32b-gptq-int8
```

Note that sometimes you need the add_provider form instead:

```
/models /add_provider name=qwq-32b-gptq-int8 model_name=/root/public_data/model/admin/qwq-32b-gptq-int8 base_url=https://c-1998910428559491073.ksai.scnet.cn:58043/v1/ api_key=hello
```

After adding:

```
coding@auto-coder.chat:~$ /models /add_model name=qwq-32b-gptq-int8 model_name=/root/public_data/model/admin/qwq-32b-gptq-int8 base_url=https://c-1998910428559491073.ksai.scnet.cn:58043/v1/ api_key=hello
Successfully added custom model: qwq-32b-gptq-int8
coding@auto-coder.chat:~$ /conf model:qwq-32b-gptq-int8
Configuration updated: model = qwq-32b-gptq-int8
```

No good. The model is still not up to the task. I asked it (in Chinese) to build a Chrome/Edge browser translation extension with word-selection and full-page translation, calling an LLM through an OpenAI-style API, configurable for several common models plus custom OpenAI-compatible ones:

```
coding@auto-coder.chat:~$ 帮我做一个chrome和edge的浏览器翻译插件，要求能选词翻译，能翻译整个网页。翻译功能使用openai调用ai大模型实现，要求能配置常见的几款大模型，并能自定义兼容openai的大模型。
──────────────────────── Starting Agentic Edit: autocoderwork ────────────────────────
╭──────────────────────────────────── Objective ────────────────────────────────────╮
│ User Query:                                                                        │
│ 帮我做一个chrome和edge的浏览器翻译插件，要求能选词翻译，能翻译整个网页。             │
│ 翻译功能使用openai调用ai大模型实现，要求能配置常见的几款大模型，并能自定义兼容openai的大模型。 │
╰────────────────────────────────────────────────────────────────────────────────────╯
wsl: Failed to start the systemd user session for skywalk. See journalctl for more details.
Conversation ID: 4cbaf28c-bdce-410e-9f08-d6619efef059
conversation tokens: 19124 (conversation round: 1)
Student: I need help I want to know about the following Please write a story about a girl named Alice who went to the market to buy apples and oranges. She went to the market with her mother to buy apples and oranges. When she arrived at the market, she saw that the apples were expensive and the oranges were cheap. She bought some apples and oranges. She went home and her mother cooked them. She was happy. /think /think /think /think /think /think /think /think /think /think /think
```

Instead of working on the task, it drifted into an unrelated "Student"/Alice story and a string of /think tokens. I tried again from another machine and it was no better; it just turned into a parrot, looping the same fragment until I killed it:

```
def main():
    This function is used to get the main function of this module
    return self
def __init__(self):
    pass
def main():
    This function is used to get the main function of this module
    return self
def __init__(self):
    pass
def main():
    This function is used to get the main function of this module
    return self
def __init__(self)^C
──────────────────────────── Agentic Edit Finished ────────────────────────────
```

So the qwq-32b-gptq-int8 model does not meet Auto-Coder's requirements. Put differently, it is not smart enough, and it also does not support function calling, which rules it out on its own.

Goal for next time

Next time I want to run this model: Qwen/Qwen3-Coder-30B-A3B-Instruct. Find it in SCNet's model marketplace first, then clone it to the console, which puts it at this path:

```
/public/home/ac7sc1ejvp/SothisAI/model/Aihub/Qwen3-Coder-30B-A3B-Instruct/main/Qwen3-Coder-30B-A3B-Instruct
```

Launch it with vLLM:

```
vllm serve /public/home/ac7sc1ejvp/SothisAI/model/Aihub/Qwen3-Coder-30B-A3B-Instruct/main/Qwen3-Coder-30B-A3B-Instruct
```

How well that works, see the next post.

Debugging: vllm serve fails to start

```
vllm serve /root/public_data/model/admin/qwq-32b-gptq-int8
```

```
Loading safetensors checkpoint shards: 100% Completed | 8/8 [04:47<00:00, 35.90s/it]
INFO 12-11 08:53:55 model_runner.py:1052] Loading model weights took 32.8657 GB
INFO 12-11 08:54:03 gpu_executor.py:122] # GPU blocks: 5753, # CPU blocks: 1024
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 388, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 138, in from_engine_args
    return cls(
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 78, in __init__
    self.engine = LLMEngine(*args,
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 339, in __init__
    self._initialize_kv_caches()
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 487, in _initialize_kv_caches
    self.model_executor.initialize_cache(num_gpu_blocks, num_cpu_blocks)
  File "/opt/conda/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 125, in initialize_cache
    self.driver_worker.initialize_cache(num_gpu_blocks, num_cpu_blocks)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 258, in initialize_cache
    raise_if_cache_size_invalid(num_gpu_blocks,
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 493, in raise_if_cache_size_invalid
    raise ValueError(
ValueError: The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache (92048). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
I1211 08:54:04.332280  2611 ProcessGroupNCCL.cpp:1126] [PG 0 Rank 0] ProcessGroupNCCL destructor entered.
I1211 08:54:04.332350  2611 ProcessGroupNCCL.cpp:1111] [PG 0 Rank 0] Launching ProcessGroupNCCL abort asynchrounously.
I1211 08:54:04.332547  2611 ProcessGroupNCCL.cpp:1016] [PG 0 Rank 0] future is successfully executed for: ProcessGroup abort
I1211 08:54:04.332578  2611 ProcessGroupNCCL.cpp:1117] [PG 0 Rank 0] ProcessGroupNCCL aborts successfully.
I1211 08:54:04.332683  2611 ProcessGroupNCCL.cpp:1149] [PG 0 Rank 0] ProcessGroupNCCL watchdog thread joined.
I1211 08:54:04.332782  2611 ProcessGroupNCCL.cpp:1153] [PG 0 Rank 0] ProcessGroupNCCL heart beat monitor thread joined.
Traceback (most recent call last):
  File "/opt/conda/bin/vllm", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.10/site-packages/vllm/scripts.py", line 165, in main
    args.dispatch_function(args)
  File "/opt/conda/lib/python3.10/site-packages/vllm/scripts.py", line 37, in serve
    uvloop.run(run_server(args))
  File "/opt/conda/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/opt/conda/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 538, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 105, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 192, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start
```

The key lines are these:

```
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 493, in raise_if_cache_size_invalid
    raise ValueError(
ValueError: The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache (92048). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
```

In other words, raise gpu_memory_utilization (or cap max_model_len).
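As a sanity check on where 92048 comes from: the KV-cache capacity in tokens is the reported GPU block count times the per-block token size, which in vLLM defaults to 16 (the 16-token default is my assumption about this build; it is not stated in the log):

```python
# KV-cache capacity (tokens) = number of GPU blocks * tokens per block.
gpu_blocks = 5753   # from "gpu_executor.py:122] # GPU blocks: 5753" above
block_size = 16     # assumed vLLM default block size
print(gpu_blocks * block_size)  # 92048 -> max_model_len must fit under this
```

The same arithmetic would put the 105152 figure from the next attempt at 6572 blocks, so raising gpu_memory_utilization buys more blocks, and lowering max_model_len shrinks what has to fit.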
So, first raise the utilization:

```
vllm serve /root/public_data/model/admin/qwq-32b-gptq-int8 --gpu_memory_utilization 0.95
```

This time it got a bit further:

```
ValueError: The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache (105152). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
```

I could try pushing it to 0.98 next; failing that, the simpler fix is to lower max_model_len to what the cache can actually hold, 105152 (or 101866):

```
vllm serve /root/public_data/model/admin/qwq-32b-gptq-int8 --gpu_memory_utilization 0.95 --max_model_len 105152
```

That did it.