OpenTinker's architecture consists of three core components connected by a streamlined three-phase communication protocol:
Architecture Components
- Client: Lightweight local interface for defining environments and submitting training jobs. Requires no local GPU (a minimal job-definition sketch follows this list).
- Scheduler & Worker Pool: Central coordinator that manages GPU resource allocation and maintains a pool of available Workers.
- Training/Inference Server (GPU Worker): Dedicated GPU-powered worker that executes model training and rollout generation.
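To make the client-side picture concrete, here is a minimal sketch of what a job definition might look like. The names (`ModelConfig`, `JobConfig`, the individual fields) are illustrative assumptions, not the actual OpenTinker API:

```python
from dataclasses import dataclass, field

# Hypothetical client-side job description; field names are assumptions
# for illustration, not taken from the OpenTinker codebase.
@dataclass
class ModelConfig:
    base_model: str          # identifier of the model to fine-tune
    lora_rank: int = 16      # assumption: adapter-style fine-tuning is configurable

@dataclass
class JobConfig:
    model: ModelConfig
    num_gpus: int = 1                              # requested size of the GPU Worker allocation
    training_args: dict = field(default_factory=dict)
    inference_args: dict = field(default_factory=dict)

# The Client only needs to build this config and send it to the Scheduler;
# no local GPU is required.
job = JobConfig(
    model=ModelConfig(base_model="example/base-model"),
    num_gpus=4,
    training_args={"lr": 1e-5, "batch_size": 64},
    inference_args={"max_tokens": 512},
)
```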
Protocol Flow
- Job Submit: Client sends a job request (model config, training/inference args) to the Scheduler.
- Allocation: Scheduler allocates GPU Worker(s) from the pool and spawns a Training/Inference Server instance.
- Data Streaming: Client establishes a direct link with the Training/Inference Server for real-time metrics (see the end-to-end sketch below).
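The three phases can be read as a simple request/allocate/stream exchange. The sketch below mimics that flow in-process; the `Scheduler` and `TrainingServer` classes, their methods, and the metric format are assumptions for illustration, not the actual OpenTinker protocol:

```python
import queue
import threading

# --- Phase 1: Job Submit ------------------------------------------------
# The Client sends the job request (model config + training/inference args)
# to the Scheduler. Here the "network hop" is just a method call.
class Scheduler:
    def __init__(self, worker_pool):
        self.worker_pool = worker_pool          # pool of idle GPU Workers

    def submit(self, job_config):
        # --- Phase 2: Allocation ---------------------------------------
        # Take Worker(s) from the pool and spawn a Training/Inference Server.
        workers = [self.worker_pool.get() for _ in range(job_config["num_gpus"])]
        server = TrainingServer(workers, job_config)
        threading.Thread(target=server.run, daemon=True).start()
        return server                           # handle the Client uses in Phase 3

class TrainingServer:
    def __init__(self, workers, job_config):
        self.workers = workers
        self.job_config = job_config
        self.metrics = queue.Queue()            # channel for real-time metrics

    def run(self):
        for step in range(3):                   # stand-in for the training loop
            self.metrics.put({"step": step, "loss": 1.0 / (step + 1)})
        self.metrics.put(None)                  # end-of-stream sentinel

# --- Phase 3: Data Streaming --------------------------------------------
# The Client keeps a link to the Training/Inference Server and consumes
# metrics as they are produced.
pool = queue.Queue()
for gpu_id in range(8):
    pool.put(f"gpu-worker-{gpu_id}")

server = Scheduler(pool).submit({"num_gpus": 2, "lr": 1e-5})
while (metric := server.metrics.get()) is not None:
    print(metric)
```

In the real system the three arrows cross machine boundaries (Client to Scheduler, Scheduler to GPU Workers, Client to Training/Inference Server), but the shape of the exchange is the same: one request, one allocation, one long-lived metrics stream.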