sglang diffusion code walkthrough
Cheat sheet
- generate
- _send_to_scheduler_and_wait_for_response -> event_loop
- scheduler::recv_reqs, self.worker.execute_forward -> pipeline.forward
- build_pipeline
    - model maybe download
    - get pipeline cls => e.g. PreprocessPipelineI2V
    - init
        - executor or self.build_executor
        - load module
            - lazy
    - post_init -> initialize_pipeline -> create_pipeline_stages
        - e.g. WAN stages
- forward => pipeline forward -> self.executor.execute, e.g. SyncExecutor
    - class ParallelExecutor
    - DenoisingStage
    - …
- post_process_sample
- response sent back to the waiting _send_to_scheduler_and_wait_for_response -> event_loop
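To make the round trip above concrete, here is a minimal sketch of the scheduler event-loop pattern. The class and queue names are illustrative placeholders, not the actual sglang-diffusion classes.

```python
# Minimal sketch of the request round trip above; all names here are
# hypothetical placeholders, not the actual sglang-diffusion classes.
import queue


class Scheduler:
    def __init__(self, worker):
        self.worker = worker
        self.req_queue = queue.Queue()
        self.resp_queue = queue.Queue()

    def event_loop(self):
        while True:
            req = self.req_queue.get()              # recv_reqs
            out = self.worker.execute_forward(req)  # -> pipeline.forward(...)
            self.resp_queue.put(out)                # reply to the caller


def generate(scheduler: Scheduler, req):
    # _send_to_scheduler_and_wait_for_response: enqueue, then block on the reply
    scheduler.req_queue.put(req)
    return scheduler.resp_queue.get()
```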
Tips
with set_forward_context
A global variable _forward_context holds the metadata that the forward pass needs, keeping call signatures clean
The global is accessed through a getter and a setter
```python
def get_forward_context() -> Optional[ForwardContext]:
    return _forward_context
```
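A minimal self-contained sketch of this pattern, assuming ForwardContext just bundles attention metadata and the current timestep; the field names and the set_forward_context signature are assumptions for illustration, not the actual sglang definitions.

```python
# Sketch of the global forward-context pattern; ForwardContext fields and the
# set_forward_context signature are assumptions for illustration only.
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ForwardContext:
    attn_metadata: Any        # metadata the model forward needs
    current_timestep: int = 0


_forward_context: Optional[ForwardContext] = None


def get_forward_context() -> Optional[ForwardContext]:
    return _forward_context


@contextmanager
def set_forward_context(attn_metadata: Any, current_timestep: int = 0):
    # Stash metadata in a module-level global so model.forward signatures stay clean.
    global _forward_context
    prev = _forward_context
    _forward_context = ForwardContext(attn_metadata, current_timestep)
    try:
        yield
    finally:
        _forward_context = prev
```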
Pipeline stages
InputValidationStage
- metadata validation and preparation
TextEncodingStage
- text encoder
- positive & negative prompt embeddings
- multi encoder
- encode_text flow
- preprocess -> returns the str directly
- tokenize
- encoder fwd
- postprocess -> e.g. T5: 1) trim to the actual length 2) zero-pad to the target shape (sketch below)
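As a concrete illustration of the postprocess step, a hypothetical sketch of the T5 path: trim each embedding to its real token count, then zero-pad back to a fixed shape. The function and argument names are made up here.

```python
# Hypothetical sketch of the T5 postprocess: trim each embedding to its real
# token count, then zero-pad back to a fixed target length.
import torch


def postprocess_t5(hidden_states: torch.Tensor,
                   seq_lens: torch.Tensor,
                   target_len: int = 512) -> torch.Tensor:
    # hidden_states: [batch, padded_len, dim]; seq_lens: real length per sample
    out = []
    for emb, n in zip(hidden_states, seq_lens.tolist()):
        n = min(int(n), target_len)
        trimmed = emb[:n]                                    # 1) trim to actual length
        pad = trimmed.new_zeros(target_len - n, emb.size(-1))
        out.append(torch.cat([trimmed, pad], dim=0))         # 2) zero-pad to target shape
    return torch.stack(out)
```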
ConditioningStage
- currently just returns as-is
TimestepPreparationStage
- scheduler.set_timesteps
- FlowUniPCMultistepScheduler
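A small sketch of what timestep preparation amounts to, assuming a diffusers-style set_timesteps API; the exact arguments FlowUniPCMultistepScheduler accepts may differ.

```python
# Sketch of timestep preparation assuming a diffusers-style scheduler API.
import torch


def prepare_timesteps(scheduler, num_inference_steps: int, device: torch.device):
    scheduler.set_timesteps(num_inference_steps, device=device)
    return scheduler.timesteps  # consumed by the denoising loop below
```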
LatentPreparationStage
- adjust_video_length
- prepare latent shape
- width and height divided by the compression ratio
- random latent tensor
```python
batch.latents = latents  # random
```
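Putting the pieces above together, a sketch of latent preparation; the compression ratios, channel count, and frame-length formula are illustrative assumptions, the real values come from the VAE/model config.

```python
# Sketch of latent preparation; ratios and channel count are illustrative.
import torch


def prepare_latents(batch_size, num_frames, height, width,
                    spatial_compress=8, temporal_compress=4,
                    latent_channels=16, generator=None, device="cuda"):
    shape = (
        batch_size,
        latent_channels,
        (num_frames - 1) // temporal_compress + 1,  # adjust_video_length (assumed causal-VAE convention)
        height // spatial_compress,                 # height divided by the compression ratio
        width // spatial_compress,                  # width divided by the compression ratio
    )
    return torch.randn(shape, generator=generator, device=device)  # random latent tensor
```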
DenoisingStage
- prepare
- prepare sp (sequence parallelism)
- prepare args
- …
- Denoising loop (for each timestep)
- _select_and_manage_model
- different models for the high-noise and low-noise phases
- super-resolution
- replace: offload the unused one
- self._manage_device_placement: model.to("cpu") is enough (see the sketch after this list)
- scheduler.scale_model_input: for now just returns the input directly
- attn_metadata = _build_attn_metadata
- noise_pred = self._predict_noise_with_cfg(latent)
- pos pass + neg pass => DiT model fwd => WanTransformer3DModel (CFG sketch after this list)
- patch embedding
- condition_embedder: time + text + image
- for each block: forward
- latents = self.scheduler.step(model_output=noise_pred, sample=latents)
- combines the old and new latents to denoise (multistep_uni_p_bh_update)
- _post_denoising_loop: post sp
- post_denoising_loop currently just returns as-is
- _select_and_manage_model
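Referring back to _select_and_manage_model above, a sketch of the switching idea: keep only the model needed for the current timestep on the GPU and offload the other. The boundary threshold and argument names are assumptions.

```python
# Sketch of high-/low-noise model switching; threshold and names are assumed.
def select_and_manage_model(t, high_noise_model, low_noise_model,
                            boundary_timestep, device="cuda"):
    active, inactive = (
        (high_noise_model, low_noise_model)
        if t >= boundary_timestep
        else (low_noise_model, high_noise_model)
    )
    inactive.to("cpu")   # offload the model that is not needed at this step
    active.to(device)
    return active
```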
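And a hedged sketch of the classifier-free guidance combination behind _predict_noise_with_cfg above: one positive-prompt pass and one negative-prompt pass through the DiT, mixed with a guidance scale. The transformer call signature is an assumption.

```python
# Sketch of the CFG combination; the transformer call signature is assumed.
import torch


@torch.no_grad()
def predict_noise_with_cfg(transformer, latents, t, pos_embeds, neg_embeds,
                           guidance_scale: float = 5.0):
    noise_cond = transformer(latents, t, encoder_hidden_states=pos_embeds)    # positive pass
    noise_uncond = transformer(latents, t, encoder_hidden_states=neg_embeds)  # negative pass
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```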
DecodingStage
- VAE decode: latents to pixels
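A sketch of the decode step using the common diffusers VAE convention; the scaling-factor and output-range handling here are assumptions.

```python
# Sketch of VAE decoding; scaling-factor/range handling follows the usual
# diffusers convention and is an assumption here.
import torch


@torch.no_grad()
def decode_latents(vae, latents, scaling_factor=None):
    if scaling_factor is None:
        scaling_factor = getattr(vae.config, "scaling_factor", 1.0)
    video = vae.decode(latents / scaling_factor).sample  # latents -> pixels
    return (video / 2 + 0.5).clamp(0, 1)                 # map [-1, 1] to [0, 1]
```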