# RunPod Serverless - GPU Services

## Overview

RunPod Serverless provides on-demand GPUs for the GRACE, PENNY, and FACTORY services.

**Account**: rpd@tzr.systems
**Balance**: ~$69 USD
**Maximum quota**: 5 concurrent workers

---

## Active Endpoints

| Service | Endpoint ID | Max Workers | Modules |
|----------|-------------|-------------|---------|
| **GRACE** | `rfltzijgn1jno4` | 2 | ASR, OCR, TTS, Face, Embeddings, Avatar |
| **PENNY** | `zsu7eah0fo7xt6` | 2 | TTS (voice) |
| **FACTORY** | `hffu4q5pywjzng` | 1 | Embeddings, processing |

**Supported GPUs**: RTX 3090, RTX 4090, RTX A4000, NVIDIA L4

---

## Critical Configuration

### Problem Solved (2025-12-27)

Serverless workers would not start when endpoints were created through the GraphQL API. The fix was to use the **REST API** with specific parameters.

### Correct API: REST (not GraphQL)

```bash
# CORRECT - REST API
curl -X POST "https://rest.runpod.io/v1/endpoints" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mi-endpoint",
    "templateId": "TEMPLATE_ID",
    "gpuTypeIds": ["NVIDIA GeForce RTX 3090", "NVIDIA GeForce RTX 4090"],
    "scalerType": "QUEUE_DELAY",
    "scalerValue": 4,
    "workersMin": 0,
    "workersMax": 2,
    "idleTimeout": 60,
    "executionTimeoutMs": 600000,
    "flashboot": true
  }'
```

**Required fields**:

- `gpuTypeIds`: array of strings (NOT a comma-separated string)
- `scalerType`: "QUEUE_DELAY"
- `scalerValue`: 4 (seconds of queue delay before scaling up)
- `flashboot`: true (fast cold start)

---

## API Usage

### Submit a Job

```bash
curl -X POST "https://api.runpod.ai/v2/{ENDPOINT_ID}/run" \
  -H "Authorization: $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": {"module": "ASR_ENGINE", "audio_base64": "..."}}'
```

### Check Status

```bash
curl "https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{JOB_ID}" \
  -H "Authorization: $RUNPOD_API_KEY"
```

### Health Check

```bash
curl "https://api.runpod.ai/v2/{ENDPOINT_ID}/health" \
  -H "Authorization: $RUNPOD_API_KEY"
```
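The submit/status calls above can be wrapped in a small client. Below is a minimal Python sketch using only the standard library; the helper names (`run_url`, `status_url`, `job_payload`, `submit_job`) are illustrative, not part of any RunPod SDK, and the payload shape simply mirrors the `{"input": {...}}` body shown in the curl example.

```python
import json
import urllib.request

API_BASE = "https://api.runpod.ai/v2"


def run_url(endpoint_id: str) -> str:
    """URL for submitting a job to a serverless endpoint."""
    return f"{API_BASE}/{endpoint_id}/run"


def status_url(endpoint_id: str, job_id: str) -> str:
    """URL for polling a submitted job's status."""
    return f"{API_BASE}/{endpoint_id}/status/{job_id}"


def job_payload(module: str, **fields) -> dict:
    """Build the {"input": {...}} body expected by /run."""
    return {"input": {"module": module, **fields}}


def submit_job(api_key: str, endpoint_id: str, payload: dict) -> dict:
    """POST the job and return the parsed JSON response."""
    req = urllib.request.Request(
        run_url(endpoint_id),
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage (assuming `RUNPOD_API_KEY` is loaded as described in the Credentials section): `submit_job(api_key, "rfltzijgn1jno4", job_payload("ASR_ENGINE", audio_base64="..."))`, then poll `status_url(...)` with the returned job ID.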
Expected response:

```json
{
  "jobs": {"completed": 5, "failed": 0, "inQueue": 0},
  "workers": {"idle": 1, "ready": 1, "running": 0}
}
```

---

## GRACE Modules

| Module | Description | Model |
|--------|-------------|--------|
| `ASR_ENGINE` | Speech-to-Text | Faster Whisper Large V3 |
| `OCR_CORE` | Text recognition | GOT-OCR 2.0 |
| `TTS` | Text-to-Speech | XTTS-v2 |
| `FACE_VECTOR` | Face vectors | InsightFace |
| `EMBEDDINGS` | Text embeddings | BGE-Large |
| `AVATAR_GEN` | Avatar generation | SDXL |

---

## Docker Image

**Temporary registry**: `ttl.sh/tzzr-grace:24h` (expires after 24h)

### Rebuild

```bash
# On the deck server (72.62.1.113)
cd /tmp/docker-grace
docker build -t ttl.sh/tzzr-grace:24h .
docker push ttl.sh/tzzr-grace:24h
```

### Key Dockerfile lines

```dockerfile
FROM runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04

# Fix blinker before installing requirements
RUN pip install --no-cache-dir --ignore-installed blinker
RUN pip install --no-cache-dir -r requirements.txt
```

---

## Troubleshooting

### Workers do not start (0 workers)

1. Use the REST API, not GraphQL
2. Verify that `gpuTypeIds` is an array
3. Include `flashboot: true`
4. Verify the worker quota is not exceeded

### Job stuck in queue indefinitely

```bash
# Purge the queue
curl -X POST "https://api.runpod.ai/v2/{ENDPOINT_ID}/purge-queue" \
  -H "Authorization: $RUNPOD_API_KEY"
```

### Model error (TOS)

Add the environment variable: `COQUI_TOS_AGREED=1`

---

## Credentials

Location: `/home/orchestrator/.secrets/runpod_api_key`

```bash
export RUNPOD_API_KEY=$(cat ~/.secrets/runpod_api_key)
```

---

## References

- [RunPod REST API](https://docs.runpod.io/api-reference)
- [Serverless Endpoints](https://docs.runpod.io/serverless/endpoints/manage-endpoints)
- [Status Page](https://uptime.runpod.io)
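For monitoring scripts, the health-check response shown earlier can be interpreted programmatically. A minimal sketch, assuming the `jobs`/`workers` JSON shape documented above; the helper names (`endpoint_is_ready`, `queue_depth`) are hypothetical, chosen here for illustration:

```python
import json


def endpoint_is_ready(health: dict) -> bool:
    """True when at least one worker is idle or ready to take jobs."""
    workers = health.get("workers", {})
    return workers.get("idle", 0) + workers.get("ready", 0) > 0


def queue_depth(health: dict) -> int:
    """Number of jobs still waiting in the endpoint's queue."""
    return health.get("jobs", {}).get("inQueue", 0)


# Sample response from the Health Check section:
sample = json.loads(
    '{"jobs": {"completed": 5, "failed": 0, "inQueue": 0},'
    ' "workers": {"idle": 1, "ready": 1, "running": 0}}'
)
```

With `sample`, `endpoint_is_ready(sample)` is true and `queue_depth(sample)` is 0, which matches the "jobs flowing, workers available" state the runbook treats as healthy; a nonzero `queue_depth` with no ready workers is the symptom covered in Troubleshooting.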