llm vllm

route

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
INFO llm_engine.py:436] init engine (profile, create kv cache, warmup model) took 135.23 seconds
INFO api_server.py:958] Starting vLLM API server on http://0.0.0.0:8000
Available routes are:
Route: /openapi.json, Methods: GET, HEAD
Route: /docs, Methods: GET, HEAD
Route: /docs/oauth2-redirect, Methods: GET, HEAD
Route: /redoc, Methods: GET, HEAD
Route: /health, Methods: GET
Route: /ping, Methods: GET, POST
Route: /tokenize, Methods: POST
Route: /detokenize, Methods: POST
Route: /v1/models, Methods: GET
Route: /version, Methods: GET
Route: /v1/chat/completions, Methods: POST
Route: /v1/completions, Methods: POST
Route: /v1/embeddings, Methods: POST
Route: /pooling, Methods: POST
Route: /score, Methods: POST
Route: /v1/score, Methods: POST
Route: /v1/audio/transcriptions, Methods: POST
Route: /rerank, Methods: POST
Route: /v1/rerank, Methods: POST
Route: /v2/rerank, Methods: POST
Route: /invocations, Methods: POST
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.