- Problem
- Industrial fleet telemetry across 700+ vehicles arrives fragmented, asynchronous, and model-specific; raw payloads have to become production evidence.
- Role
- Team delivery. Stated scope: InfluxDB ops, mqtt → InfluxDB Converter refactor, expression-evaluation visualization, and Celery module ops.
- System
- Webhook ingest, Bridge InfluxDB as raw-hex landing/replay storage, multi-process converter, measurement InfluxDB, Celery batch, Avro/GCS output.
- Transfer
- The same P1-P3 primitives map to robot teleop data, manufacturing-line telemetry, and foundation-data pipelines.
- Proof
- 1 year 5 months of production operation across a 700+ vehicle fleet. Pipeline performance validated through a heterogeneous asynchronous telemetry load test at 7,000 events/sec (7k TPS).
[vehicle terminal] → Webhook → Bridge InfluxDB
→ V2InfluxConverterProcess (multi-process)
→ Measurement InfluxDB → Celery batch → Avro/GCS
Bridge InfluxDB was not a Kafka replacement; it acted as a time-series landing / replay layer for raw hex payloads so failed conversions could be reprocessed. The converter then normalized those payloads into measurement InfluxDB. The pipeline was designed and tested to absorb a 7,000 events/sec asynchronous event-stream load without becoming the bottleneck.
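A minimal sketch of the landing-and-replay idea described above, assuming an in-memory list in place of Bridge InfluxDB. `RawLanding`, `convert`, and `replay` are hypothetical names for illustration, not the production code, and the "conversion" here only decodes the hex string.

```python
from dataclasses import dataclass


@dataclass
class RawRecord:
    ts_ns: int
    vehicle_id: str
    payload_hex: str
    converted: bool = False


class RawLanding:
    """In-memory stand-in for the raw-hex landing store (assumption)."""

    def __init__(self):
        self._records = []

    def land(self, ts_ns, vehicle_id, payload_hex):
        # The payload is stored untouched so it can be reprocessed later.
        self._records.append(RawRecord(ts_ns, vehicle_id, payload_hex))

    def pending(self, start_ns, end_ns):
        # Unconverted records inside the half-open window [start_ns, end_ns).
        return [
            r for r in self._records
            if start_ns <= r.ts_ns < end_ns and not r.converted
        ]


def convert(record):
    """Placeholder normalization: decode the hex and report its length."""
    raw = bytes.fromhex(record.payload_hex)  # ValueError on malformed hex
    return {"vehicle_id": record.vehicle_id, "ts_ns": record.ts_ns, "length": len(raw)}


def replay(landing, start_ns, end_ns, sink):
    """Re-run conversion for a time window; failed records stay pending."""
    ok = failed = 0
    for rec in landing.pending(start_ns, end_ns):
        try:
            sink.append(convert(rec))
            rec.converted = True
            ok += 1
        except ValueError:
            failed += 1
    return ok, failed
```

Because a record is marked converted only after the sink accepts it, running `replay` twice over the same window does not write a successful record twice; anything that failed remains available for the next pass.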
Load-test boundary
- Fleet scale: 700+ vehicles in production
- Throughput: 7k events/sec synthetic heterogeneous telemetry
- Stage: raw landing → converter normalization → measurement write path
- Validation: parser stability, bounded memory, replay path
- Not claimed: Kafka-style consumer groups or exactly-once semantics
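The bounded-memory check in the list above can be illustrated with a small synthetic driver. This is a sketch under stated assumptions, not the actual load-test harness: the model names, payload sizes, and batch size are invented, and the write callback is a no-op. It shows that a writer which flushes at a fixed batch size never holds more than one batch in memory, whatever the input rate.

```python
import random

MODELS = ("model_a", "model_b", "model_c")  # hypothetical model names


def synthetic_events(rate_per_sec, seconds, seed=0):
    """Yield rate_per_sec * seconds events with mixed models and payload sizes."""
    rng = random.Random(seed)
    for i in range(rate_per_sec * seconds):
        size = rng.choice((8, 16, 64))  # heterogeneous payload lengths
        yield {
            "ts_ms": (i * 1000) // rate_per_sec,
            "vehicle_id": rng.randrange(700),
            "model": rng.choice(MODELS),
            "payload_hex": bytes(rng.getrandbits(8) for _ in range(size)).hex(),
        }


class BatchWriter:
    """Buffers events and flushes once batch_size is reached."""

    def __init__(self, batch_size, write):
        self.batch_size = batch_size
        self.write = write
        self.pending = []
        self.peak_pending = 0
        self.written = 0

    def add(self, event):
        self.pending.append(event)
        self.peak_pending = max(self.peak_pending, len(self.pending))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.write(self.pending)
            self.written += len(self.pending)
            self.pending = []


def run_load(rate_per_sec=7000, seconds=2, batch_size=500):
    """Push the synthetic stream through the writer; return (written, peak)."""
    writer = BatchWriter(batch_size, write=lambda batch: None)
    for event in synthetic_events(rate_per_sec, seconds):
        writer.add(event)
    writer.flush()  # drain the final partial batch
    return writer.written, writer.peak_pending
```

With the defaults, `run_load()` pushes 14,000 events through the writer and the peak buffer size stays at the batch size of 500.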
- Tier 1 (ingest): Django / Flask webhook · raw hex payload preserved
- Tier 2 (decode): ISO-TP reassembly + expression DSL + 4-pack BMS alignment
- Tier 3 (analytics): Celery module plug-ins (summary / driving_score / submatrix / avro)
- Tier 4 (output): measurement InfluxDB + Avro on GCS
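The ISO-TP reassembly step in Tier 2 can be sketched as follows. This is a minimal illustration for classic 8-byte CAN frames, assuming the frames of one message are already collected in order; it ignores flow-control frames, CAN FD, and the extended-length first frame, and `reassemble_isotp` is a hypothetical name rather than the converter's actual function.

```python
def reassemble_isotp(frames):
    """Reassemble one ISO-TP message from ordered CAN data fields (bytes)."""
    first = frames[0]
    frame_type = first[0] >> 4

    if frame_type == 0x0:
        # Single frame: low nibble of byte 0 is the payload length.
        length = first[0] & 0x0F
        return bytes(first[1:1 + length])

    if frame_type != 0x1:
        raise ValueError("message must start with a single or first frame")

    # First frame: 12-bit total length, then the first 6 payload bytes.
    length = ((first[0] & 0x0F) << 8) | first[1]
    data = bytearray(first[2:8])

    expected_sn = 1  # sequence numbers start at 1 and wrap 15 -> 0
    for frame in frames[1:]:
        if len(data) >= length:
            break
        if frame[0] >> 4 != 0x2:
            raise ValueError("expected a consecutive frame")
        if frame[0] & 0x0F != expected_sn:
            raise ValueError("sequence number gap")
        data.extend(frame[1:8])
        expected_sn = (expected_sn + 1) & 0x0F

    if len(data) < length:
        raise ValueError("incomplete message")
    return bytes(data[:length])  # drop padding in the last frame
```

Raising on a sequence gap rather than guessing keeps the raw frames eligible for replay once the missing frame is available.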
Production reliability
- Failure-time reprocessing ran under an SLA in production. When the converter or the measurement-tier InfluxDB had a transient outage, the raw-hex payloads held in Bridge InfluxDB were replayed automatically within the defined SLA.
- Metadata DB on MHA (Master High Availability for MySQL) for zero-loss operation. Automatic failover on master failure, with no data loss in production.
Why this transfers to robot / manufacturing
| Industrial vehicle fleet | Robot / manufacturing |
|---|---|
| 4-pack BMS async signals per vehicle | 30+ joints + F/T + vision per robot · N machines per line |
| CAN ISO-TP multi-frame | ROS 2 chunked / OPC UA chunked |
| Per-model .dbc / Excel DSL | Per-robot URDF / per-PLC vendor protocol |
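The last row is the part that changes per model or per robot: decode rules kept as data rather than code. A minimal sketch of how such a per-signal expression could be evaluated safely, assuming plain arithmetic over named raw signals. The grammar, the `evaluate` function, and the example rule are hypothetical and do not reproduce the actual DSL.

```python
import ast
import operator

# Only these arithmetic operators are allowed in an expression.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr, signals):
    """Evaluate an arithmetic expression over a dict of raw signal values."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return signals[node.id]  # KeyError if the signal is missing
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        # Calls, attribute access, and everything else are rejected.
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expr, mode="eval"))
```

For example, a hypothetical temperature rule `"raw * 0.5 - 40"` applied to `{"raw": 650}` evaluates to `285.0`, while an expression containing a function call is rejected with `ValueError` instead of being executed.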
- Contribution
- mqtt → InfluxDB Converter refactor · expression-evaluation visualization · Celery module ops
- Period
- 1 year 5 months