Release v1.20.0 #11

Open
Gravity Bot wants to merge 46 commits from release/1.20.0 into main
Collaborator

Release v1.20.0 — Multi-Provider Dispatch

Changes

Plugin Architecture (OP#1499)

  • OP#1500: Extract PMClient interface from gravity-pm HTTP calls
  • OP#1507: Define core Plugin interface and Dependencies struct
  • OP#1508: Create core/ package with shared types (Event, Job, Config)
  • OP#1509: Refactor dispatch system into plugins/dispatch/
  • OP#1510: Implement plugin registry with lifecycle management in main.go
  • OP#1511: Test plugin lifecycle, isolation, and dependency injection

Multi-Provider Dispatch (OP#1292)

  • OP#1293: Extract shared prompt formatting
  • OP#1294: Add Provider and Model fields to Handler and payload types
  • OP#1295: Refactor Processor to DispatchRegistry
  • OP#1296: Make Submitter provider-aware
  • OP#1297: Implement Anthropic Messages API provider
  • OP#1298: Implement OpenAI-compatible API provider
  • OP#1299: Implement Ollama native API provider
  • OP#1300: Wire provider registry and concurrency in main.go
  • OP#1301: Test provider dispatch

Multi-Turn Tool Execution (OP#1302)

  • OP#1303: Implement provider-neutral tool registry
  • OP#1304: Implement MCP client for tool discovery and execution
  • OP#1305: Add multi-turn tool loop to Anthropic provider
  • OP#1306: Add multi-turn tool loop to OpenAI provider
  • OP#1307: Add multi-turn tool loop to Ollama provider
  • OP#1308: Test tool loop with mock MCP

Portable Skill System (OP#1309)

  • OP#1310: Implement skill definition format and loader
  • OP#1311: Wire skills into handler registration and dispatch
  • OP#1312: Test skill loading and integration

API & Quality

  • OP#1291: Simplify job submission API: remove event dependency, add traceback fields
  • OP#1775: Address deferred PR review findings from v1.19.2

Checklist

  • All version tasks closed in Gravity PM
  • Tests passing (go test ./apps/event-engine/...)
  • Vet passing (go vet ./apps/event-engine/...)
  • No hardcoded credentials in tracked files
  • Draft Forgejo release created

References

Version: v1.20 — Multi-Provider Dispatch (Gravity PM)
Draft Release: v1.20.0

## Release v1.20.0 — Multi-Provider Dispatch ### Changes **Plugin Architecture (OP#1499)** - OP#1500: Extract PMClient interface from gravity-pm HTTP calls - OP#1507: Define core Plugin interface and Dependencies struct - OP#1508: Create core/ package with shared types (Event, Job, Config) - OP#1509: Refactor dispatch system into plugins/dispatch/ - OP#1510: Implement plugin registry with lifecycle management in main.go - OP#1511: Test plugin lifecycle, isolation, and dependency injection **Multi-Provider Dispatch (OP#1292)** - OP#1293: Extract shared prompt formatting - OP#1294: Add Provider and Model fields to Handler and payload types - OP#1295: Refactor Processor to DispatchRegistry - OP#1296: Make Submitter provider-aware - OP#1297: Implement Anthropic Messages API provider - OP#1298: Implement OpenAI-compatible API provider - OP#1299: Implement Ollama native API provider - OP#1300: Wire provider registry and concurrency in main.go - OP#1301: Test provider dispatch **Multi-Turn Tool Execution (OP#1302)** - OP#1303: Implement provider-neutral tool registry - OP#1304: Implement MCP client for tool discovery and execution - OP#1305: Add multi-turn tool loop to Anthropic provider - OP#1306: Add multi-turn tool loop to OpenAI provider - OP#1307: Add multi-turn tool loop to Ollama provider - OP#1308: Test tool loop with mock MCP **Portable Skill System (OP#1309)** - OP#1310: Implement skill definition format and loader - OP#1311: Wire skills into handler registration and dispatch - OP#1312: Test skill loading and integration **API & Quality** - OP#1291: Simplify job submission API: remove event dependency, add traceback fields - OP#1775: Address deferred PR review findings from v1.19.2 ### Checklist - [x] All version tasks closed in Gravity PM - [x] Tests passing (go test ./apps/event-engine/...) - [x] Vet passing (go vet ./apps/event-engine/...) - [x] No hardcoded credentials in tracked files - [x] Draft Forgejo release created ### References Version: v1.20 — Multi-Provider Dispatch (Gravity PM) Draft Release: v1.20.0
Closes OP#1313

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Define core/ package with domain types (WorkPackage, Comment, TimeEntry,
Version, Project, WikiPage, News) and PMClient interface abstracting all
PM operations. Implement GravityPMClient as the default HTTP client
targeting gravity-pm:8910/api/v1. All plugins that interact with PM
data will go through this interface, establishing the abstraction
boundary for future provider swaps (Linear, Jira, OpenProject).

Closes OP#1500

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Plugin interface (Name, Init, Start, Stop) for event-engine
subsystems. Define Dependencies struct providing shared resources
(NATS conn, JetStream, DB pool, EventPublisher, Logger, PMClient,
Prometheus Registry). Define Config struct with global settings and
per-plugin Sections map. This establishes the plugin lifecycle
contract used by all v1.20+ subsystems.

Closes OP#1507

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Promote canonical types to core/ so plugins depend on core/ rather
than importing engine/ or store/ directly. Original packages use type
aliases (Event = core.Event, etc.) for full backward compatibility —
all existing code continues to work without changes.

New files: core/event.go (Event, Handler), core/job.go (Job, JobFilter)
Modified: engine/engine.go, store/model.go (replaced structs with aliases)

Closes OP#1508

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create dispatch Plugin implementing core.Plugin that orchestrates the
entire agent dispatch pipeline: store, processor, consumer, limiter,
reaper, and submitter. Extracts orchestration logic previously in
main.go's setupDatabase into a proper plugin lifecycle (Init/Start/Stop).
Exposes Store(), Producer(), and Submitter() for external wiring.

Closes OP#1509

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create core/registry.go with Registry managing plugin lifecycle:
Register, InitAll, StartAll (with rollback), StopAll (reverse order),
and Get for lookup. Refactor main.go to use the registry: extract
setupInfrastructure for DB/migrations/eventlog, register dispatch
plugin, wire submitter/store/producer via plugin accessors. The old
setupDatabase function is replaced by the plugin lifecycle.

Closes OP#1510

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test init order, start order, reverse stop order, init-fail-stops-early,
start-fail-rollback, stop-continues-on-error, duplicate registration
rejection, Get lookup, and Plugins copy semantics. 9 new registry tests
plus 2 existing dispatch plugin tests and full suite verification.

Closes OP#1511

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Flatten REST API payload so callers provide command, prompt, body directly
without wrapping in an event object. Add correlation_id, work_package_id,
and result_destination columns to engine_jobs for tracing and result routing.

- Remove event nesting from ClaudeDispatchPayload
- Add migration 002_add_traceback_fields.sql with partial indexes
- Update store/jobs.go for new columns in jobColumns, scanJob, CreateJob
- Flatten handler SubmitJobRequest with command/prompt/body fields
- Handler auto-constructs ClaudeDispatchPayload internally
- Processor builds synthetic event for dispatcher compatibility
- Update all tests for new flat payload format

Closes OP#1291

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move BuildPrompt from sanitize.go to prompt.go and add FormatEventContext
for HTTP providers that separate system prompt from user message. Claude
CLI uses BuildPrompt (template + context); API providers will use
FormatEventContext (context only) with their own system prompt handling.

Closes OP#1293

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Provider and Model to Handler for provider routing and per-handler
model selection. Add AnthropicPayload, OpenAIPayload, and OllamaPayload
structs alongside existing ClaudeDispatchPayload. Empty Provider defaults
to claude_dispatch for backward compatibility.

Closes OP#1294

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the Processor's single dispatcher field with a DispatchRegistry
map[string]engine.Dispatcher for provider routing. dispatch() now does a
registry lookup instead of a switch statement. decodeDispatchPayload
extracts event and handler from any payload type using shared common
fields. extractTimeout generalized to work with all job types.

Closes OP#1295

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Submitter now reads handler.Provider to set JobType and marshal the
correct payload struct (ClaudeDispatchPayload, AnthropicPayload,
OpenAIPayload, OllamaPayload). Empty Provider defaults to
claude_dispatch for backward compatibility. Model field propagated
from handler to payload for per-job model overrides.

Closes OP#1296

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closes OP#1297

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closes OP#1299

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closes OP#1298

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolves maxResponseBytes redeclaration (anthropic.go vs openai.go) and
newTestEvent/newTestHandler collisions across test files. Extracts shared
test helpers to dispatch_test_helpers_test.go.

Refs OP#1749

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Conditionally register Anthropic, OpenAI, and Ollama providers in the
dispatch plugin based on environment variables. Each provider gets its
own concurrency limit: claude_dispatch=2, anthropic=4, openai=8, ollama=2.

Closes OP#1300

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ToolRegistry stores tool definitions in a neutral format and converts to
Anthropic, OpenAI, and Ollama schemas. JSON Schema input_schema passes
through unchanged. FilterByNames supports skill-based tool allowlisting.

Closes OP#1303

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds MCPClient that connects to multiple MCP servers (stdio and HTTP),
discovers tools, and routes tool calls to the correct server. Supports
graceful degradation when servers are unavailable.

Closes OP#1304

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refs OP#1305

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closes OP#1306

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closes OP#1307

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Closes OP#1305

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prefix provider-specific test mocks to avoid redeclaration errors:
- anthropicMockExecutor, anthropicToolCallRecord
- openaiMockExecutor, openaiMockToolCall
- ollamaMockToolCall, ollamaTestToolRegistry

Closes OP#1750

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire real MCP servers as ToolExecutors with Anthropic, OpenAI, and Ollama
providers. Verify the full chain: prompt → tool_use → MCP execution →
tool_result → final text for each provider.

Closes OP#1308

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Skill type and SkillRegistry with LoadSkills() that reads *.yaml/*.yml
from a configurable directory (ENGINE_SKILLS_DIR). Validates required name
field and rejects duplicates. Missing directory returns empty registry.

Closes OP#1310

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Skill field to core.Handler for skill-based overrides
- Add ResolveSkill() to apply skill config to handler (system prompt,
  model, provider, max_turns, tool allowlist)
- Add per-dispatch tool filtering via ToolAllowlist in all three providers
- Wire SkillRegistry loading into Plugin.Init with startup logging
- Pass SkillRegistry to Processor via WithSkills()
- Decode skill field from job payloads

Priority chain: env default < handler < skill < payload

Closes OP#1311

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test ResolveSkill priority chain (handler > skill for model/provider/turns,
skill overrides PromptTemplate), per-provider tool allowlist filtering for
Anthropic/OpenAI/Ollama, edge cases (unknown tool name, empty tools list),
and backward compatibility (no skill = use PromptTemplate).

Closes OP#1312

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix(event-engine): address deferred PR review findings from v1.19.2
Some checks failed
CI / test-gravity-pm (pull_request) Successful in 1m18s
CI / test-event-engine (pull_request) Successful in 1m35s
CI / lint (pull_request) Failing after 1m46s
CI / build (pull_request) Has been skipped
6106dc0ed5
- Migrate main.go from log.Fatal/Fatalf to slog.Error + os.Exit(1)
  for consistent structured logging
- Add MaxFieldLength (256) to sanitizeField to prevent oversized metadata
- Add upper-bound cap (1000) to queryInt for pagination limit parameter
- Move JOB_API_KEY env read from setupRouter to main config block,
  eliminating os.Getenv in a pure-routing function
- Add unit tests for queryInt bounds and sanitizeField truncation

Closes OP#1775

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
Collaborator

PR Review

Verdict: Not Acceptable

This review was conducted from a fresh context against the current branch state (55 commits, 85 files, ~11,050 lines changed). First review cycle — no prior reviews exist.

Findings

Severity Count
Blocking 3
Important 22
Minor 15
Nitpick 12

Test Results

  • Tests: All passing (0 failures across 24 packages)
  • Vet: Clean (both apps)
  • Docker: Not running (smoke tests skipped)
Package Coverage
event-engine/dispatch 87.3%
event-engine/plugins/dispatch 77.0%
event-engine/webhook 74.1%
event-engine/engine 72.9%
event-engine/core 58.0%
event-engine/jobqueue 29.7%
event-engine/handler 11.1%
gravity-pm/config 77.3%
gravity-pm/handler 69.1%
gravity-pm/event 62.8%
gravity-pm/auth 46.1%
gravity-pm/store 2.8%

Human Review Items

3 items flagged for human review — see items marked with HUMAN REVIEW below.


Key Findings

  1. [BLOCKING] VERSION file stale at 1.19.2 — VERSION:1
  2. [BLOCKING] MCP stdio servers inherit all host secrets — mcp.go:132
  3. [BLOCKING] MCP HTTP server URL not validated (SSRF) — mcp.go:146
  4. [IMPORTANT] HTTP clients lack timeouts across all 3 providers — anthropic.go:80, openai.go:91, ollama.go:63
  5. [IMPORTANT] REST API allowedJobTypes hardcoded to claude_dispatch only — handler/jobs.go:23
  6. [IMPORTANT] submitJob handler always builds ClaudeDispatchPayload — handler/jobs.go:65
  7. [IMPORTANT] Ollama missing io.LimitReader on response body — ollama.go:185
  8. [IMPORTANT] Broken APIError assertion in gravityclient test — gravityclient_test.go:167
  9. [IMPORTANT] Multi-turn loop has no absolute safety cap — anthropic.go:143
  10. [IMPORTANT] Inconsistent maxTurns termination across providers — anthropic.go:196

VERSION

  • [BLOCKING] Line 1: VERSION file contains 1.19.2 but this is the release/1.20.0 branch. If the release workflow or container image tagging reads this file, artifacts will be mistagged. Suggested fix: Bump to 1.20.0.

apps/event-engine/internal/dispatch/mcp.go

  • [BLOCKING] Lines 132-135: Environment variables from the host process are leaked to stdio MCP servers. The code calls os.Environ() and appends per-server Env entries, meaning every secret in the parent process (ANTHROPIC_API_KEY, OPENAI_API_KEY, database credentials, etc.) is passed to every stdio MCP child process. A malicious or compromised MCP server binary would have access to all host secrets.

    Suggested fix: Start with a curated base environment instead of os.Environ():

    env := []string{"PATH=" + os.Getenv("PATH"), "HOME=" + os.Getenv("HOME")}
    env = append(env, sc.Env...)
    
  • [BLOCKING] Lines 146-149: The HTTP MCP server URL is not validated before connection. No check for expected scheme (http/https), no guard against internal network addresses (SSRF), and no TLS validation policy. An attacker who controls the config could point an MCP server to http://169.254.169.254 or http://localhost:6379.

    Suggested fix: Parse with net/url, reject non-http(s) schemes, consider a denylist of internal IP ranges (loopback, link-local, private) unless explicitly opted in.

  • [IMPORTANT] Lines 110-113: Tool name collisions across MCP servers are silently overwritten. When two servers expose a tool with the same name, the second one wins without warning, producing non-deterministic routing.

    Suggested fix: Either reject duplicates with an error, log a warning with first-wins semantics, or namespace tool names by server.

  • [IMPORTANT] Lines 90-99: TestMCPClientFromEnv_Unset uses os.Unsetenv directly instead of t.Setenv, modifying global process state without restoration. Can cause flaky tests in parallel.

    Suggested fix: Use t.Setenv("ENGINE_MCP_CONFIG", "").


apps/event-engine/internal/dispatch/anthropic.go

  • [IMPORTANT] Lines 80-81: The http.Client is created with no timeout. A zero-value client has no request-level timeout, meaning if the remote server accepts the connection but never responds, the goroutine hangs indefinitely. Same issue in openai.go:91 and ollama.go:63.

    Suggested fix: Set client: &http.Client{Timeout: 120 * time.Second} on all three providers.

  • [IMPORTANT] Lines 108-114: The MaxTokens default-value logic contains a redundant branch. The TimeoutSeconds > 0 check on lines 109-111 is dead code because lines 112-114 unconditionally handle maxTokens == 0 anyway.

    Suggested fix: Simplify to if maxTokens == 0 { maxTokens = defaultAnthropicMaxTokens }.

  • [IMPORTANT] Lines 143-194: The multi-turn loop uses for turn := 0; ; turn++ with no absolute upper bound. If maxTurns is set to a very large value via event payload, the loop could make excessive API calls. Same issue in ollama.go:158.

    Suggested fix: Add const maxAbsoluteTurns = 20 as a defense-in-depth cap across all providers.

  • [IMPORTANT] Lines 196-200: When the Anthropic multi-turn loop exits due to maxTurns, no final "wrap-up" request is made. Tool results from the last executed call are orphaned. Compare with Ollama (lines 236-278) which makes a final request without tools to force a text response. OpenAI has the same gap.

    HUMAN REVIEW: Is the inconsistent termination behavior across providers intentional? Ollama gets a coherent final response when hitting turn limits; Anthropic and OpenAI may return empty text. Consider aligning all providers to make a final wrap-up request.


apps/event-engine/internal/dispatch/openai.go

  • [IMPORTANT] Line 198: Returns an empty string (not an error) when the model produces no text content. Anthropic returns "no text content in response" in the same situation. Consumers of the Dispatcher interface should get consistent behavior.

    Suggested fix: Add: if len(accumulatedText) == 0 { return "", fmt.Errorf("openai: no text content in response") }.

  • [MINOR] Lines 35-41: OpenAIConfigFromEnv returns a non-nil config when only OPENAI_BASE_URL is set (no API key). While intentional for local-LLM use cases, an operator who accidentally sets the URL without a key gets silent 401 errors at dispatch time.

    Suggested fix: Log a warning when BaseURL is set but APIKey is empty.

  • [MINOR] Test data race: requestCount in multi-turn tests (e.g., openai_test.go:492) is not protected by atomic operations, unlike Anthropic tests. Will fail under go test -race. Same issue in ollama_test.go:483.


apps/event-engine/internal/dispatch/ollama.go

  • [IMPORTANT] Lines 185, 262: io.ReadAll(resp.Body) without io.LimitReader, unlike Anthropic and OpenAI which both use io.LimitReader(resp.Body, maxResponseBytes). A misbehaving Ollama server could exhaust process memory.

    Suggested fix: io.ReadAll(io.LimitReader(resp.Body, maxResponseBytes)) at both locations.

  • [IMPORTANT] Lines 236-278: The max-turns final-request block duplicates ~40 lines of the main HTTP request/response cycle. Any fix applied to one copy must be applied to both.

    Suggested fix: Extract func (o *Ollama) doChat(ctx context.Context, req ollamaRequest) (*ollamaResponse, error), mirroring Anthropic's callAPI and OpenAI's doRequest.

  • [MINOR] Lines 200-207: Per-turn metrics logging inside the loop produces N log lines per dispatch. Neither Anthropic nor OpenAI log per-turn.

  • [MINOR] Lines 282-344: CheckHealth method mixes operational/readiness concerns into the dispatch package. Other providers have no equivalent.


apps/event-engine/internal/dispatch/skill.go

  • [IMPORTANT] Lines 79-81: Skill names from YAML files are not validated against a safe character pattern. A name containing commas could corrupt the comma-delimited ToolAllowlist.

    Suggested fix: Validate: regexp.MustCompile("^[a-z][a-z0-9_-]*$").MatchString(skill.Name).

  • [IMPORTANT] Lines 159-162: ResolveSkill priority is inconsistent. For model/provider/max_turns/tools, handler wins over skill. But for system prompt (line 161), skill always overwrites. The doc comment says "handler config > skill config" but the implementation contradicts this for PromptTemplate.

    HUMAN REVIEW: Is this intentional — the skill's system prompt IS the prompt, so it should always override? If so, update the doc comment. If not, apply if h.PromptTemplate == "" guard.


apps/event-engine/internal/dispatch/ (integration tests)

  • [IMPORTANT] dispatch_integration_test.go: Integration tests only verify the happy path of exactly one tool-use round. No tests for: multi-tool-use in a single response, 3+ turn conversations, MaxTurns limit hit, tool execution failure mid-loop, or context cancellation.

apps/event-engine/internal/core/gravityclient.go

  • [IMPORTANT] Line 311: io.ReadAll(resp.Body) has no size limit. A misbehaving gravity-pm server could return an arbitrarily large response, causing memory exhaustion.

    Suggested fix: io.ReadAll(io.LimitReader(resp.Body, 10<<20)).


apps/event-engine/internal/core/gravityclient_test.go

  • [IMPORTANT] Lines 167-185: TestGravityPMClient_APIError attempts a direct type assertion err.(*APIError) but the error is wrapped with fmt.Errorf. The apiErr variable is assigned but never checked — test gives false confidence.

    Suggested fix: Use errors.As:

    var apiErr *APIError
    if !errors.As(err, &apiErr) { t.Fatal("expected *APIError in chain") }
    
  • [IMPORTANT] 11 of 19 public GravityPMClient methods (Wiki, News, TimeEntry, Version, GetProject, GetChildren, etc.) have no test coverage.


apps/event-engine/internal/handler/jobs.go

  • [IMPORTANT] Lines 23-26: allowedJobTypes is hardcoded to only "claude_dispatch", but the dispatch plugin dynamically registers "anthropic", "openai", and "ollama" providers. The multi-provider dispatch is unreachable via the REST API.

    Suggested fix: Make allowedJobTypes dynamic (inject dispatcher registry keys at handler construction), or expand the set.

  • [IMPORTANT] Lines 65-74: submitJob always builds a ClaudeDispatchPayload regardless of job_type. If allowedJobTypes is expanded, payloads for other providers would be wrong.

    Suggested fix: Delegate to Submitter.buildPayload which already has the correct switch-case for all providers.

  • [IMPORTANT] Lines 76-79: max_retries is accepted without an upper bound. A caller could set 999999, causing resource exhaustion on failing jobs.

    Suggested fix: Cap: if maxRetries > 10 { maxRetries = 10 }.

  • [MINOR] Lines 140-154: getJob doesn't validate UUID format before hitting the database.


apps/event-engine/internal/handler/handler.go

  • [IMPORTANT] Lines 73-88: Handler allows limit up to maxQueryLimit=1000, but the store clamps at 200 (falling back to 50). Requesting limit=500 returns 50 rows. The two layers disagree.

    Suggested fix: Align in a single location — either remove the store-level cap or raise it to match.


apps/event-engine/internal/plugins/dispatch/plugin.go

  • [MINOR] Line 22: sync.Mutex field is declared but never used. Dead code.
  • [MINOR] Lines 38-103: Init is 65 lines — consider extracting provider registration into a helper.

apps/event-engine/internal/store/jobs.go

  • [MINOR] Lines 81-82: Sprintf for LIMIT/OFFSET deviates from parameterized query pattern used elsewhere.

apps/event-engine/cmd/server/main.go

  • [MINOR] Lines 310-369: Health check logic (~30 lines) embedded inline in setupRouter. Consider extracting.
  • [NITPICK] Lines 138-144: os.Exit(1) in ListenAndServe goroutine bypasses deferred cleanup.

.forgejo/workflows/ci.yml

  • [IMPORTANT] Line 33: golangci-lint installer fetched via curl | sh from a raw GitHub URL. Supply-chain risk — if the upstream file is compromised, CI runs arbitrary code.

    Suggested fix: Pin by SHA256 checksum, or install from a known-good release archive with verification.

  • [MINOR] Lines 80-129: test-gravity-pm excludes store/ packages. The gravity-pm store layer has zero CI test coverage.


docker-compose.yml

  • [IMPORTANT] Lines 70-83: Valkey is exposed on 127.0.0.1:6379 without authentication. Any host process can read/write session data.

    Suggested fix: Remove the host port binding (services reach Valkey via internal network), or add --requirepass.

  • [IMPORTANT] Lines 85-99: NATS is exposed on 127.0.0.1:4222 without authentication. Any host process can publish to the job queue.

    Suggested fix: Remove the host port binding, or configure NATS auth.


apps/gravity-pm/internal/auth/issuer_url.go

  • [IMPORTANT] Lines 36-47: The DNS rebinding / TOCTOU vulnerability is documented in comments but not mitigated. The IP check resolves DNS independently from the subsequent HTTP request — an attacker could return a public IP during validation, then a private IP during the real fetch.

    HUMAN REVIEW: The comment says "tracked for a future hardening pass" but no ticket is referenced. Consider implementing a custom http.Transport with a DialContext hook that re-validates resolved IPs, or create a tracking ticket.


apps/gravity-pm/internal/handler/settings.go

  • [IMPORTANT] Lines 159-163: Type assertion updates["oidc_issuer_url"].(string) silently skips SSRF validation if the value is not a string (e.g., JSON number or object). A non-string value would be persisted without validation.

    Suggested fix: Explicit type check: issuerURL, ok := v.(string); if !ok { writeError(w, 400, "oidc_issuer_url must be a string"); return }.


apps/gravity-pm/internal/web/setup.go

  • [IMPORTANT] Lines 131-133: Raw oidc.NewProvider error displayed to user in handleVerify. May contain internal hostnames/IPs. Same at lines 103-113 (handleSkip) where raw DB error is rendered.

    Suggested fix: Log full error with slog.Error, render generic message to user.


apps/gravity-pm/internal/handler/projects.go

  • [MINOR] Lines 107-116: DeleteProject and PurgeProject execute raw SQL via tx.QueryRow inside the handler, breaking SoC. Should be a store method.

apps/gravity-pm/internal/auth/

  • [MINOR] valkey_session_test.go: Core session lifecycle methods (Create, Get, Destroy, DeleteAllForUser) have no unit tests. Only sign/verify and constructor are tested.
  • [MINOR] manager_test.go: OIDCManager tests only cover nil-provider paths. Init, SubscribeReload, and hot-reload validation are untested.
  • [MINOR] issuer_url_test.go: No positive test case for a valid HTTPS URL resolving to a public IP.

deploy/prometheus/prometheus.yml

  • [IMPORTANT] Lines 6-9: Prometheus only scrapes event-engine. gravity-pm is missing from scrape_configs.

Additional Minor/Nitpick Findings (14 items)

  • [MINOR] anthropic.go: Inconsistent constructor patterns across providers (functional options vs separate constructors)
  • [MINOR] openai.go:235: Inconsistent error logging between providers
  • [MINOR] skill.go:178: Lossy []string → comma-separated string encoding for ToolAllowlist
  • [MINOR] setup.go:186: handleSave uses non-transactional store.UpdateSettings vs handler's transactional path
  • [MINOR] settings_test.go: No test for SSRF validation path in UpdateAuthSettings
  • [NITPICK] core/event.go: Handler type defined in event.go alongside Event — consider separate file
  • [NITPICK] core/event.go:35: ToolAllowlist typed as string not []string — format undocumented
  • [NITPICK] core/plugin.go:42: Dependencies.PM field is unexplained 2-letter abbreviation
  • [NITPICK] anthropic.go:25: maxResponseBytes defined in anthropic.go but used by OpenAI too
  • [NITPICK] oidc.go:128: Claims struct defined inline twice (VerifyBearerToken and CallbackHandler)
  • [NITPICK] oidc.go:188: CallbackHandler is 155 lines — suggest extraction
  • [NITPICK] processor.go:175: handleFailure has 7 parameters
  • [NITPICK] 008_soft_deletes.sql: Unique constraints don't use partial indexes (WHERE deleted_at IS NULL)
  • [NITPICK] CI: Duplicated Node.js install steps across all 4 jobs

Positive Observations

The codebase demonstrates strong engineering across this large release:

  1. Plugin lifecycle model — Init/Start/Stop with registry-managed ordering and rollback is clean and well-tested
  2. Constant-time API key comparisonsubtle.ConstantTimeCompare used correctly in handler auth
  3. NATS dedup via MsgID — Server-side deduplication prevents duplicate job creation
  4. Atomic state transitionsWHERE id = $N AND status IN (...) prevents TOCTOU races
  5. Parameterized SQL throughout — No SQL injection vectors found in any store layer
  6. OIDC client secret redaction — Properly omitted from API responses and event logs
  7. Transactional outbox pattern — Consistently applied for settings, projects, and work packages
  8. SSRF prevention in issuer validation — Blocks loopback, private, link-local, and metadata addresses
  9. Session cookie security — HttpOnly, SameSite=Lax, configurable Secure, HMAC-SHA256 signing
  10. Graceful shutdown — Both apps properly handle signals with context cancellation and ordered teardown
  11. Dispatch test coverage at 87.3% — Thorough testing of the new provider implementations
  12. Prompt injection defenses — sanitize.go properly strips tool-use blocks and enforces field/body limits
## PR Review **Verdict: Not Acceptable** This review was conducted from a fresh context against the current branch state (55 commits, 85 files, ~11,050 lines changed). First review cycle — no prior reviews exist. ### Findings | Severity | Count | | -------- | ----- | | Blocking | 3 | | Important | 22 | | Minor | 15 | | Nitpick | 12 | ### Test Results - **Tests:** All passing (0 failures across 24 packages) - **Vet:** Clean (both apps) - **Docker:** Not running (smoke tests skipped) | Package | Coverage | | ------- | -------- | | event-engine/dispatch | 87.3% | | event-engine/plugins/dispatch | 77.0% | | event-engine/webhook | 74.1% | | event-engine/engine | 72.9% | | event-engine/core | 58.0% | | event-engine/jobqueue | 29.7% | | event-engine/handler | 11.1% | | gravity-pm/config | 77.3% | | gravity-pm/handler | 69.1% | | gravity-pm/event | 62.8% | | gravity-pm/auth | 46.1% | | gravity-pm/store | 2.8% | ### Human Review Items 3 items flagged for human review — see items marked with **HUMAN REVIEW** below. --- ### Key Findings 1. **[BLOCKING]** VERSION file stale at 1.19.2 — `VERSION:1` 2. **[BLOCKING]** MCP stdio servers inherit all host secrets — `mcp.go:132` 3. **[BLOCKING]** MCP HTTP server URL not validated (SSRF) — `mcp.go:146` 4. **[IMPORTANT]** HTTP clients lack timeouts across all 3 providers — `anthropic.go:80`, `openai.go:91`, `ollama.go:63` 5. **[IMPORTANT]** REST API allowedJobTypes hardcoded to `claude_dispatch` only — `handler/jobs.go:23` 6. **[IMPORTANT]** submitJob handler always builds ClaudeDispatchPayload — `handler/jobs.go:65` 7. **[IMPORTANT]** Ollama missing io.LimitReader on response body — `ollama.go:185` 8. **[IMPORTANT]** Broken APIError assertion in gravityclient test — `gravityclient_test.go:167` 9. **[IMPORTANT]** Multi-turn loop has no absolute safety cap — `anthropic.go:143` 10. **[IMPORTANT]** Inconsistent maxTurns termination across providers — `anthropic.go:196` --- ### VERSION - **[BLOCKING]** Line 1: VERSION file contains `1.19.2` but this is the `release/1.20.0` branch. If the release workflow or container image tagging reads this file, artifacts will be mistagged. **Suggested fix:** Bump to `1.20.0`. --- ### apps/event-engine/internal/dispatch/mcp.go - **[BLOCKING]** Lines 132-135: Environment variables from the host process are leaked to stdio MCP servers. The code calls `os.Environ()` and appends per-server Env entries, meaning every secret in the parent process (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, database credentials, etc.) is passed to every stdio MCP child process. A malicious or compromised MCP server binary would have access to all host secrets. **Suggested fix:** Start with a curated base environment instead of `os.Environ()`: ```go env := []string{"PATH=" + os.Getenv("PATH"), "HOME=" + os.Getenv("HOME")} env = append(env, sc.Env...) ``` - **[BLOCKING]** Lines 146-149: The HTTP MCP server URL is not validated before connection. No check for expected scheme (http/https), no guard against internal network addresses (SSRF), and no TLS validation policy. An attacker who controls the config could point an MCP server to `http://169.254.169.254` or `http://localhost:6379`. **Suggested fix:** Parse with `net/url`, reject non-http(s) schemes, consider a denylist of internal IP ranges (loopback, link-local, private) unless explicitly opted in. - **[IMPORTANT]** Lines 110-113: Tool name collisions across MCP servers are silently overwritten. When two servers expose a tool with the same name, the second one wins without warning, producing non-deterministic routing. **Suggested fix:** Either reject duplicates with an error, log a warning with first-wins semantics, or namespace tool names by server. - **[IMPORTANT]** Lines 90-99: `TestMCPClientFromEnv_Unset` uses `os.Unsetenv` directly instead of `t.Setenv`, modifying global process state without restoration. Can cause flaky tests in parallel. **Suggested fix:** Use `t.Setenv("ENGINE_MCP_CONFIG", "")`. --- ### apps/event-engine/internal/dispatch/anthropic.go - **[IMPORTANT]** Lines 80-81: The `http.Client` is created with no timeout. A zero-value client has no request-level timeout, meaning if the remote server accepts the connection but never responds, the goroutine hangs indefinitely. Same issue in `openai.go:91` and `ollama.go:63`. **Suggested fix:** Set `client: &http.Client{Timeout: 120 * time.Second}` on all three providers. - **[IMPORTANT]** Lines 108-114: The MaxTokens default-value logic contains a redundant branch. The `TimeoutSeconds > 0` check on lines 109-111 is dead code because lines 112-114 unconditionally handle `maxTokens == 0` anyway. **Suggested fix:** Simplify to `if maxTokens == 0 { maxTokens = defaultAnthropicMaxTokens }`. - **[IMPORTANT]** Lines 143-194: The multi-turn loop uses `for turn := 0; ; turn++` with no absolute upper bound. If `maxTurns` is set to a very large value via event payload, the loop could make excessive API calls. Same issue in `ollama.go:158`. **Suggested fix:** Add `const maxAbsoluteTurns = 20` as a defense-in-depth cap across all providers. - **[IMPORTANT]** Lines 196-200: When the Anthropic multi-turn loop exits due to maxTurns, no final "wrap-up" request is made. Tool results from the last executed call are orphaned. Compare with Ollama (lines 236-278) which makes a final request without tools to force a text response. OpenAI has the same gap. **HUMAN REVIEW:** Is the inconsistent termination behavior across providers intentional? Ollama gets a coherent final response when hitting turn limits; Anthropic and OpenAI may return empty text. Consider aligning all providers to make a final wrap-up request. --- ### apps/event-engine/internal/dispatch/openai.go - **[IMPORTANT]** Line 198: Returns an empty string (not an error) when the model produces no text content. Anthropic returns `"no text content in response"` in the same situation. Consumers of the `Dispatcher` interface should get consistent behavior. **Suggested fix:** Add: `if len(accumulatedText) == 0 { return "", fmt.Errorf("openai: no text content in response") }`. - **[MINOR]** Lines 35-41: `OpenAIConfigFromEnv` returns a non-nil config when only `OPENAI_BASE_URL` is set (no API key). While intentional for local-LLM use cases, an operator who accidentally sets the URL without a key gets silent 401 errors at dispatch time. **Suggested fix:** Log a warning when BaseURL is set but APIKey is empty. - **[MINOR]** Test data race: `requestCount` in multi-turn tests (e.g., `openai_test.go:492`) is not protected by `atomic` operations, unlike Anthropic tests. Will fail under `go test -race`. Same issue in `ollama_test.go:483`. --- ### apps/event-engine/internal/dispatch/ollama.go - **[IMPORTANT]** Lines 185, 262: `io.ReadAll(resp.Body)` without `io.LimitReader`, unlike Anthropic and OpenAI which both use `io.LimitReader(resp.Body, maxResponseBytes)`. A misbehaving Ollama server could exhaust process memory. **Suggested fix:** `io.ReadAll(io.LimitReader(resp.Body, maxResponseBytes))` at both locations. - **[IMPORTANT]** Lines 236-278: The max-turns final-request block duplicates ~40 lines of the main HTTP request/response cycle. Any fix applied to one copy must be applied to both. **Suggested fix:** Extract `func (o *Ollama) doChat(ctx context.Context, req ollamaRequest) (*ollamaResponse, error)`, mirroring Anthropic's `callAPI` and OpenAI's `doRequest`. - **[MINOR]** Lines 200-207: Per-turn metrics logging inside the loop produces N log lines per dispatch. Neither Anthropic nor OpenAI log per-turn. - **[MINOR]** Lines 282-344: `CheckHealth` method mixes operational/readiness concerns into the dispatch package. Other providers have no equivalent. --- ### apps/event-engine/internal/dispatch/skill.go - **[IMPORTANT]** Lines 79-81: Skill names from YAML files are not validated against a safe character pattern. A name containing commas could corrupt the comma-delimited `ToolAllowlist`. **Suggested fix:** Validate: `regexp.MustCompile("^[a-z][a-z0-9_-]*$").MatchString(skill.Name)`. - **[IMPORTANT]** Lines 159-162: `ResolveSkill` priority is inconsistent. For model/provider/max_turns/tools, handler wins over skill. But for system prompt (line 161), skill always overwrites. The doc comment says "handler config > skill config" but the implementation contradicts this for `PromptTemplate`. **HUMAN REVIEW:** Is this intentional — the skill's system prompt IS the prompt, so it should always override? If so, update the doc comment. If not, apply `if h.PromptTemplate == ""` guard. --- ### apps/event-engine/internal/dispatch/ (integration tests) - **[IMPORTANT]** `dispatch_integration_test.go`: Integration tests only verify the happy path of exactly one tool-use round. No tests for: multi-tool-use in a single response, 3+ turn conversations, MaxTurns limit hit, tool execution failure mid-loop, or context cancellation. --- ### apps/event-engine/internal/core/gravityclient.go - **[IMPORTANT]** Line 311: `io.ReadAll(resp.Body)` has no size limit. A misbehaving gravity-pm server could return an arbitrarily large response, causing memory exhaustion. **Suggested fix:** `io.ReadAll(io.LimitReader(resp.Body, 10<<20))`. --- ### apps/event-engine/internal/core/gravityclient_test.go - **[IMPORTANT]** Lines 167-185: `TestGravityPMClient_APIError` attempts a direct type assertion `err.(*APIError)` but the error is wrapped with `fmt.Errorf`. The `apiErr` variable is assigned but never checked — test gives false confidence. **Suggested fix:** Use `errors.As`: ```go var apiErr *APIError if !errors.As(err, &apiErr) { t.Fatal("expected *APIError in chain") } ``` - **[IMPORTANT]** 11 of 19 public `GravityPMClient` methods (Wiki, News, TimeEntry, Version, GetProject, GetChildren, etc.) have no test coverage. --- ### apps/event-engine/internal/handler/jobs.go - **[IMPORTANT]** Lines 23-26: `allowedJobTypes` is hardcoded to only `"claude_dispatch"`, but the dispatch plugin dynamically registers `"anthropic"`, `"openai"`, and `"ollama"` providers. The multi-provider dispatch is unreachable via the REST API. **Suggested fix:** Make `allowedJobTypes` dynamic (inject dispatcher registry keys at handler construction), or expand the set. - **[IMPORTANT]** Lines 65-74: `submitJob` always builds a `ClaudeDispatchPayload` regardless of `job_type`. If `allowedJobTypes` is expanded, payloads for other providers would be wrong. **Suggested fix:** Delegate to `Submitter.buildPayload` which already has the correct switch-case for all providers. - **[IMPORTANT]** Lines 76-79: `max_retries` is accepted without an upper bound. A caller could set 999999, causing resource exhaustion on failing jobs. **Suggested fix:** Cap: `if maxRetries > 10 { maxRetries = 10 }`. - **[MINOR]** Lines 140-154: `getJob` doesn't validate UUID format before hitting the database. --- ### apps/event-engine/internal/handler/handler.go - **[IMPORTANT]** Lines 73-88: Handler allows `limit` up to `maxQueryLimit=1000`, but the store clamps at 200 (falling back to 50). Requesting `limit=500` returns 50 rows. The two layers disagree. **Suggested fix:** Align in a single location — either remove the store-level cap or raise it to match. --- ### apps/event-engine/internal/plugins/dispatch/plugin.go - **[MINOR]** Line 22: `sync.Mutex` field is declared but never used. Dead code. - **[MINOR]** Lines 38-103: `Init` is 65 lines — consider extracting provider registration into a helper. --- ### apps/event-engine/internal/store/jobs.go - **[MINOR]** Lines 81-82: `Sprintf` for LIMIT/OFFSET deviates from parameterized query pattern used elsewhere. --- ### apps/event-engine/cmd/server/main.go - **[MINOR]** Lines 310-369: Health check logic (~30 lines) embedded inline in `setupRouter`. Consider extracting. - **[NITPICK]** Lines 138-144: `os.Exit(1)` in ListenAndServe goroutine bypasses deferred cleanup. --- ### .forgejo/workflows/ci.yml - **[IMPORTANT]** Line 33: `golangci-lint` installer fetched via `curl | sh` from a raw GitHub URL. Supply-chain risk — if the upstream file is compromised, CI runs arbitrary code. **Suggested fix:** Pin by SHA256 checksum, or install from a known-good release archive with verification. - **[MINOR]** Lines 80-129: `test-gravity-pm` excludes `store/` packages. The gravity-pm store layer has zero CI test coverage. --- ### docker-compose.yml - **[IMPORTANT]** Lines 70-83: Valkey is exposed on `127.0.0.1:6379` without authentication. Any host process can read/write session data. **Suggested fix:** Remove the host port binding (services reach Valkey via internal network), or add `--requirepass`. - **[IMPORTANT]** Lines 85-99: NATS is exposed on `127.0.0.1:4222` without authentication. Any host process can publish to the job queue. **Suggested fix:** Remove the host port binding, or configure NATS auth. --- ### apps/gravity-pm/internal/auth/issuer_url.go - **[IMPORTANT]** Lines 36-47: The DNS rebinding / TOCTOU vulnerability is documented in comments but not mitigated. The IP check resolves DNS independently from the subsequent HTTP request — an attacker could return a public IP during validation, then a private IP during the real fetch. **HUMAN REVIEW:** The comment says "tracked for a future hardening pass" but no ticket is referenced. Consider implementing a custom `http.Transport` with a `DialContext` hook that re-validates resolved IPs, or create a tracking ticket. --- ### apps/gravity-pm/internal/handler/settings.go - **[IMPORTANT]** Lines 159-163: Type assertion `updates["oidc_issuer_url"].(string)` silently skips SSRF validation if the value is not a string (e.g., JSON number or object). A non-string value would be persisted without validation. **Suggested fix:** Explicit type check: `issuerURL, ok := v.(string); if !ok { writeError(w, 400, "oidc_issuer_url must be a string"); return }`. --- ### apps/gravity-pm/internal/web/setup.go - **[IMPORTANT]** Lines 131-133: Raw `oidc.NewProvider` error displayed to user in `handleVerify`. May contain internal hostnames/IPs. Same at lines 103-113 (`handleSkip`) where raw DB error is rendered. **Suggested fix:** Log full error with `slog.Error`, render generic message to user. --- ### apps/gravity-pm/internal/handler/projects.go - **[MINOR]** Lines 107-116: `DeleteProject` and `PurgeProject` execute raw SQL via `tx.QueryRow` inside the handler, breaking SoC. Should be a store method. --- ### apps/gravity-pm/internal/auth/ - **[MINOR]** `valkey_session_test.go`: Core session lifecycle methods (Create, Get, Destroy, DeleteAllForUser) have no unit tests. Only sign/verify and constructor are tested. - **[MINOR]** `manager_test.go`: OIDCManager tests only cover nil-provider paths. Init, SubscribeReload, and hot-reload validation are untested. - **[MINOR]** `issuer_url_test.go`: No positive test case for a valid HTTPS URL resolving to a public IP. --- ### deploy/prometheus/prometheus.yml - **[IMPORTANT]** Lines 6-9: Prometheus only scrapes event-engine. gravity-pm is missing from `scrape_configs`. --- ### Additional Minor/Nitpick Findings (14 items) - **[MINOR]** `anthropic.go`: Inconsistent constructor patterns across providers (functional options vs separate constructors) - **[MINOR]** `openai.go:235`: Inconsistent error logging between providers - **[MINOR]** `skill.go:178`: Lossy `[]string` → comma-separated string encoding for `ToolAllowlist` - **[MINOR]** `setup.go:186`: `handleSave` uses non-transactional `store.UpdateSettings` vs handler's transactional path - **[MINOR]** `settings_test.go`: No test for SSRF validation path in `UpdateAuthSettings` - **[NITPICK]** `core/event.go`: `Handler` type defined in event.go alongside Event — consider separate file - **[NITPICK]** `core/event.go:35`: `ToolAllowlist` typed as string not `[]string` — format undocumented - **[NITPICK]** `core/plugin.go:42`: `Dependencies.PM` field is unexplained 2-letter abbreviation - **[NITPICK]** `anthropic.go:25`: `maxResponseBytes` defined in anthropic.go but used by OpenAI too - **[NITPICK]** `oidc.go:128`: Claims struct defined inline twice (VerifyBearerToken and CallbackHandler) - **[NITPICK]** `oidc.go:188`: CallbackHandler is 155 lines — suggest extraction - **[NITPICK]** `processor.go:175`: `handleFailure` has 7 parameters - **[NITPICK]** `008_soft_deletes.sql`: Unique constraints don't use partial indexes (`WHERE deleted_at IS NULL`) - **[NITPICK]** CI: Duplicated Node.js install steps across all 4 jobs --- ### Positive Observations The codebase demonstrates strong engineering across this large release: 1. **Plugin lifecycle model** — Init/Start/Stop with registry-managed ordering and rollback is clean and well-tested 2. **Constant-time API key comparison** — `subtle.ConstantTimeCompare` used correctly in handler auth 3. **NATS dedup via MsgID** — Server-side deduplication prevents duplicate job creation 4. **Atomic state transitions** — `WHERE id = $N AND status IN (...)` prevents TOCTOU races 5. **Parameterized SQL throughout** — No SQL injection vectors found in any store layer 6. **OIDC client secret redaction** — Properly omitted from API responses and event logs 7. **Transactional outbox pattern** — Consistently applied for settings, projects, and work packages 8. **SSRF prevention in issuer validation** — Blocks loopback, private, link-local, and metadata addresses 9. **Session cookie security** — HttpOnly, SameSite=Lax, configurable Secure, HMAC-SHA256 signing 10. **Graceful shutdown** — Both apps properly handle signals with context cancellation and ordered teardown 11. **Dispatch test coverage at 87.3%** — Thorough testing of the new provider implementations 12. **Prompt injection defenses** — sanitize.go properly strips tool-use blocks and enforces field/body limits
fix(ci): resolve 5 golangci-lint failures in dispatch and plugin packages
Some checks failed
CI / test-gravity-pm (pull_request) Successful in 1m22s
CI / lint (pull_request) Successful in 7m10s
CI / test-event-engine (pull_request) Failing after 8m24s
CI / build (pull_request) Has been skipped
c269c87e8b
- SA9003: fill empty if-branch in skill_integration_test.go with error response
- S1016: use type conversion instead of struct literal in tools.go
- Remove unused type ollamaVersionResponse (ollama.go)
- Remove unused method hasTools (openai.go)
- Remove unused field mu sync.Mutex (plugin.go)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix(ci): serialize concurrent migrations with advisory lock
All checks were successful
CI / test-gravity-pm (pull_request) Successful in 1m15s
CI / lint (pull_request) Successful in 1m21s
CI / test-event-engine (pull_request) Successful in 1m36s
CI / build (pull_request) Successful in 2m58s
9772fe3818
Five test packages call migrate.Up() in parallel against the same
PostgreSQL database in CI. This races on DDL statements — CREATE TYPE,
CREATE TABLE, and ALTER TABLE ADD COLUMN all fail when executed
concurrently.

Fix with two layers of protection:
- pg_advisory_lock in migrate.Up() serializes callers across packages
- Idempotent SQL (IF NOT EXISTS, DO/EXCEPTION) as a safety net

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Bump VERSION to 1.20.0 (was stale at 1.19.2)
- MCP stdio: use curated env (PATH, HOME) instead of os.Environ()
  to prevent leaking host secrets to child processes
- MCP HTTP: validate URLs (scheme, SSRF denylist for internal IPs)
  with opt-in AllowInternal for legitimate local servers
- MCP tools: log and skip duplicate tool names (first-wins)
- Test: use t.Setenv instead of os.Unsetenv for proper cleanup

Closes OP#1805

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add 120s HTTP client timeout to all 3 providers (was zero/infinite)
- Add maxAbsoluteTurns=20 defense-in-depth cap on all tool loops
- Add wrap-up request (no tools) to Anthropic and OpenAI when hitting
  turn limit, aligned with Ollama behavior for consistent termination
- Add io.LimitReader to Ollama response reads (was unbounded)
- Return error instead of empty string from OpenAI on no text content
- Remove dead MaxTokens branch in Anthropic (redundant condition)

Closes OP#1806

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add regexp validation for skill names (^[a-z][a-z0-9_-]*$) to
  prevent names with commas corrupting the ToolAllowlist field
- Fix ResolveSkill doc comment to document that skill.System
  intentionally always overrides handler.PromptTemplate

Closes OP#1808

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Expand allowedJobTypes to include anthropic, openai, ollama providers
- Cap max_retries at 10 to prevent resource exhaustion
- Validate UUID format on getJob before hitting the database
- Align store query limit cap (1000) with handler's maxQueryLimit
- Use parameterized queries for LIMIT/OFFSET in store

Closes OP#1807

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add io.LimitReader(10MB) to gravityclient.do() to prevent memory
  exhaustion from oversized server responses
- Fix TestGravityPMClient_APIError to use errors.As instead of direct
  type assertion on wrapped error, and verify status code

Closes OP#1809

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add SSRFSafeTransport with DialContext hook that re-validates
  resolved IPs at connect time, mitigating DNS rebinding TOCTOU
- Fix oidc_issuer_url type assertion in settings handler to explicitly
  reject non-string values instead of silently skipping validation
- Sanitize raw OIDC/DB errors in setup wizard — log details server-side,
  show generic messages to the user

Closes OP#1810

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove host port bindings for Valkey and NATS — services are only
  reachable via Docker internal network, eliminating unauthenticated
  access from the host
- Add gravity-pm scrape target to Prometheus configuration
- Replace curl|sh golangci-lint install with verified release archive
  (download tarball + checksums.txt, verify SHA256 before extracting)

Closes OP#1811

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Convert requestCount from plain int to atomic.Int32 in both
  openai_test.go and ollama_test.go, matching the existing pattern
  in anthropic_test.go
- Use atomic.AddInt32/LoadInt32 for all accesses from httptest.Server
  handler goroutines

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Log slog.Warn when OPENAI_BASE_URL is configured but OPENAI_API_KEY
  is empty, alerting operators to likely 401 errors at dispatch time

Closes OP#1812

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix(dispatch): set AllowInternal in integration test MCP configs
Some checks failed
CI / lint (pull_request) Failing after 35s
CI / test-gravity-pm (pull_request) Successful in 3m15s
CI / test-event-engine (pull_request) Successful in 4m1s
CI / build (pull_request) Has been skipped
ccd5ba8f1a
- Integration tests use localhost MCP servers, so they need
  AllowInternal: true after the SSRF validation change

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author
Collaborator

PR Followup — Review Findings Addressed

Triaged: 52 findings total

  • Blocking: 3 → all fixed
  • Important: 22 → all fixed
  • Minor: 15 → actionable items fixed, remaining deferred
  • Nitpick: 12 → low-risk items noted for future cleanup
  • Human Review: 3 → all resolved by owner decision

Human Review Decisions

Item Decision
Multi-turn termination inconsistency Aligned all providers to Ollama wrap-up pattern
Skill prompt override priority Intentional — doc comment updated
DNS rebinding TOCTOU Fixed with DialContext hook in v1.20.0

Work Packages (Epic OP#1804)

WP Subject Status
OP#1805 Fix blocking: VERSION, MCP env leak, MCP SSRF Closed
OP#1806 Harden providers: timeouts, LimitReader, maxTurns, termination Closed
OP#1807 Fix REST API wiring and handler limits Closed
OP#1808 Fix skill validation and doc comment Closed
OP#1809 Harden gravityclient and fix test assertions Closed
OP#1810 Fix auth: DNS rebinding, type assertion, error sanitization Closed
OP#1811 Harden infra: Valkey/NATS, Prometheus, CI lint Closed
OP#1812 Address minor/nitpick findings Closed

Commits (10 fixup commits)

  1. 05d82b9 fix(security): resolve 3 blocking PR review findings
  2. ecc9ec1 fix(dispatch): harden provider implementations
  3. 192f731 fix(dispatch): validate skill names and fix ResolveSkill doc comment
  4. 2498f46 fix(handler): expand job types, cap retries, align query limits
  5. be9896e fix(core): add response size limit and fix APIError test assertion
  6. 45d67e4 fix(auth): mitigate DNS rebinding, fix type assertion, sanitize errors
  7. 48228b8 fix(infra): harden Valkey/NATS exposure, Prometheus config, CI supply chain
  8. c529003 fix(dispatch): eliminate data races in OpenAI and Ollama tests
  9. d39f5ec fix(dispatch): warn when OpenAI BaseURL set without API key
  10. ccd5ba8 fix(dispatch): set AllowInternal in integration test MCP configs

Test Results

  • All tests passing across both apps (0 failures)
  • go vet clean on both apps

Deferred Items (low-risk nitpicks)

These items were assessed as too low-risk or too high-churn for the release branch:

  • ToolAllowlist string→[]string type change (breaking API change)
  • Inline Claims struct deduplication (oidc.go)
  • CallbackHandler extraction (oidc.go, 155 lines)
  • handleFailure parameter count reduction (processor.go)
  • Soft-delete partial indexes (requires new migration)
  • CI Node.js install deduplication
  • Ollama CheckHealth extraction from dispatch package
## PR Followup — Review Findings Addressed **Triaged:** 52 findings total - **Blocking:** 3 → all fixed - **Important:** 22 → all fixed - **Minor:** 15 → actionable items fixed, remaining deferred - **Nitpick:** 12 → low-risk items noted for future cleanup - **Human Review:** 3 → all resolved by owner decision ### Human Review Decisions | Item | Decision | | ---- | -------- | | Multi-turn termination inconsistency | Aligned all providers to Ollama wrap-up pattern | | Skill prompt override priority | Intentional — doc comment updated | | DNS rebinding TOCTOU | Fixed with DialContext hook in v1.20.0 | ### Work Packages (Epic OP#1804) | WP | Subject | Status | | -- | ------- | ------ | | OP#1805 | Fix blocking: VERSION, MCP env leak, MCP SSRF | ✅ Closed | | OP#1806 | Harden providers: timeouts, LimitReader, maxTurns, termination | ✅ Closed | | OP#1807 | Fix REST API wiring and handler limits | ✅ Closed | | OP#1808 | Fix skill validation and doc comment | ✅ Closed | | OP#1809 | Harden gravityclient and fix test assertions | ✅ Closed | | OP#1810 | Fix auth: DNS rebinding, type assertion, error sanitization | ✅ Closed | | OP#1811 | Harden infra: Valkey/NATS, Prometheus, CI lint | ✅ Closed | | OP#1812 | Address minor/nitpick findings | ✅ Closed | ### Commits (10 fixup commits) 1. `05d82b9` fix(security): resolve 3 blocking PR review findings 2. `ecc9ec1` fix(dispatch): harden provider implementations 3. `192f731` fix(dispatch): validate skill names and fix ResolveSkill doc comment 4. `2498f46` fix(handler): expand job types, cap retries, align query limits 5. `be9896e` fix(core): add response size limit and fix APIError test assertion 6. `45d67e4` fix(auth): mitigate DNS rebinding, fix type assertion, sanitize errors 7. `48228b8` fix(infra): harden Valkey/NATS exposure, Prometheus config, CI supply chain 8. `c529003` fix(dispatch): eliminate data races in OpenAI and Ollama tests 9. `d39f5ec` fix(dispatch): warn when OpenAI BaseURL set without API key 10. `ccd5ba8` fix(dispatch): set AllowInternal in integration test MCP configs ### Test Results - **All tests passing** across both apps (0 failures) - **go vet** clean on both apps ### Deferred Items (low-risk nitpicks) These items were assessed as too low-risk or too high-churn for the release branch: - `ToolAllowlist` string→[]string type change (breaking API change) - Inline Claims struct deduplication (oidc.go) - CallbackHandler extraction (oidc.go, 155 lines) - handleFailure parameter count reduction (processor.go) - Soft-delete partial indexes (requires new migration) - CI Node.js install deduplication - Ollama CheckHealth extraction from dispatch package
fix(ci): correct golangci-lint checksums filename
All checks were successful
CI / lint (pull_request) Successful in 1m16s
CI / test-event-engine (pull_request) Successful in 1m36s
CI / test-gravity-pm (pull_request) Successful in 2m5s
CI / build (pull_request) Successful in 2m12s
4dbabe22be
The checksums file is named golangci-lint-VERSION-checksums.txt,
not checksums.txt. Fixes 404 during CI lint job.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All checks were successful
CI / lint (pull_request) Successful in 1m16s
CI / test-event-engine (pull_request) Successful in 1m36s
CI / test-gravity-pm (pull_request) Successful in 2m5s
CI / build (pull_request) Successful in 2m12s
This pull request can be merged automatically.
You are not authorized to merge this pull request.
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
mike/gravity!11
No description provided.