Release v1.20.0 #11
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "release/1.20.0"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Release v1.20.0 — Multi-Provider Dispatch
Changes
Plugin Architecture (OP#1499)
Multi-Provider Dispatch (OP#1292)
Multi-Turn Tool Execution (OP#1302)
Portable Skill System (OP#1309)
API & Quality
Checklist
References
Version: v1.20 — Multi-Provider Dispatch (Gravity PM)
Draft Release: v1.20.0
PR Review
Verdict: Not Acceptable
This review was conducted from a fresh context against the current branch state (55 commits, 85 files, ~11,050 lines changed). First review cycle — no prior reviews exist.
Findings
Test Results
Human Review Items
3 items flagged for human review — see items marked with HUMAN REVIEW below.
Key Findings
VERSION:1mcp.go:132mcp.go:146anthropic.go:80,openai.go:91,ollama.go:63claude_dispatchonly —handler/jobs.go:23handler/jobs.go:65ollama.go:185gravityclient_test.go:167anthropic.go:143anthropic.go:196VERSION
1.19.2but this is therelease/1.20.0branch. If the release workflow or container image tagging reads this file, artifacts will be mistagged. Suggested fix: Bump to1.20.0.apps/event-engine/internal/dispatch/mcp.go
[BLOCKING] Lines 132-135: Environment variables from the host process are leaked to stdio MCP servers. The code calls
os.Environ()and appends per-server Env entries, meaning every secret in the parent process (ANTHROPIC_API_KEY,OPENAI_API_KEY, database credentials, etc.) is passed to every stdio MCP child process. A malicious or compromised MCP server binary would have access to all host secrets.Suggested fix: Start with a curated base environment instead of
os.Environ():[BLOCKING] Lines 146-149: The HTTP MCP server URL is not validated before connection. No check for expected scheme (http/https), no guard against internal network addresses (SSRF), and no TLS validation policy. An attacker who controls the config could point an MCP server to
http://169.254.169.254orhttp://localhost:6379.Suggested fix: Parse with
net/url, reject non-http(s) schemes, consider a denylist of internal IP ranges (loopback, link-local, private) unless explicitly opted in.[IMPORTANT] Lines 110-113: Tool name collisions across MCP servers are silently overwritten. When two servers expose a tool with the same name, the second one wins without warning, producing non-deterministic routing.
Suggested fix: Either reject duplicates with an error, log a warning with first-wins semantics, or namespace tool names by server.
[IMPORTANT] Lines 90-99:
TestMCPClientFromEnv_Unsetusesos.Unsetenvdirectly instead oft.Setenv, modifying global process state without restoration. Can cause flaky tests in parallel.Suggested fix: Use
t.Setenv("ENGINE_MCP_CONFIG", "").apps/event-engine/internal/dispatch/anthropic.go
[IMPORTANT] Lines 80-81: The
http.Clientis created with no timeout. A zero-value client has no request-level timeout, meaning if the remote server accepts the connection but never responds, the goroutine hangs indefinitely. Same issue inopenai.go:91andollama.go:63.Suggested fix: Set
client: &http.Client{Timeout: 120 * time.Second}on all three providers.[IMPORTANT] Lines 108-114: The MaxTokens default-value logic contains a redundant branch. The
TimeoutSeconds > 0check on lines 109-111 is dead code because lines 112-114 unconditionally handlemaxTokens == 0anyway.Suggested fix: Simplify to
if maxTokens == 0 { maxTokens = defaultAnthropicMaxTokens }.[IMPORTANT] Lines 143-194: The multi-turn loop uses
for turn := 0; ; turn++with no absolute upper bound. IfmaxTurnsis set to a very large value via event payload, the loop could make excessive API calls. Same issue inollama.go:158.Suggested fix: Add
const maxAbsoluteTurns = 20as a defense-in-depth cap across all providers.[IMPORTANT] Lines 196-200: When the Anthropic multi-turn loop exits due to maxTurns, no final "wrap-up" request is made. Tool results from the last executed call are orphaned. Compare with Ollama (lines 236-278) which makes a final request without tools to force a text response. OpenAI has the same gap.
HUMAN REVIEW: Is the inconsistent termination behavior across providers intentional? Ollama gets a coherent final response when hitting turn limits; Anthropic and OpenAI may return empty text. Consider aligning all providers to make a final wrap-up request.
apps/event-engine/internal/dispatch/openai.go
[IMPORTANT] Line 198: Returns an empty string (not an error) when the model produces no text content. Anthropic returns
"no text content in response"in the same situation. Consumers of theDispatcherinterface should get consistent behavior.Suggested fix: Add:
if len(accumulatedText) == 0 { return "", fmt.Errorf("openai: no text content in response") }.[MINOR] Lines 35-41:
OpenAIConfigFromEnvreturns a non-nil config when onlyOPENAI_BASE_URLis set (no API key). While intentional for local-LLM use cases, an operator who accidentally sets the URL without a key gets silent 401 errors at dispatch time.Suggested fix: Log a warning when BaseURL is set but APIKey is empty.
[MINOR] Test data race:
requestCountin multi-turn tests (e.g.,openai_test.go:492) is not protected byatomicoperations, unlike Anthropic tests. Will fail undergo test -race. Same issue inollama_test.go:483.apps/event-engine/internal/dispatch/ollama.go
[IMPORTANT] Lines 185, 262:
io.ReadAll(resp.Body)withoutio.LimitReader, unlike Anthropic and OpenAI which both useio.LimitReader(resp.Body, maxResponseBytes). A misbehaving Ollama server could exhaust process memory.Suggested fix:
io.ReadAll(io.LimitReader(resp.Body, maxResponseBytes))at both locations.[IMPORTANT] Lines 236-278: The max-turns final-request block duplicates ~40 lines of the main HTTP request/response cycle. Any fix applied to one copy must be applied to both.
Suggested fix: Extract
func (o *Ollama) doChat(ctx context.Context, req ollamaRequest) (*ollamaResponse, error), mirroring Anthropic'scallAPIand OpenAI'sdoRequest.[MINOR] Lines 200-207: Per-turn metrics logging inside the loop produces N log lines per dispatch. Neither Anthropic nor OpenAI log per-turn.
[MINOR] Lines 282-344:
CheckHealthmethod mixes operational/readiness concerns into the dispatch package. Other providers have no equivalent.apps/event-engine/internal/dispatch/skill.go
[IMPORTANT] Lines 79-81: Skill names from YAML files are not validated against a safe character pattern. A name containing commas could corrupt the comma-delimited
ToolAllowlist.Suggested fix: Validate:
regexp.MustCompile("^[a-z][a-z0-9_-]*$").MatchString(skill.Name).[IMPORTANT] Lines 159-162:
ResolveSkillpriority is inconsistent. For model/provider/max_turns/tools, handler wins over skill. But for system prompt (line 161), skill always overwrites. The doc comment says "handler config > skill config" but the implementation contradicts this forPromptTemplate.HUMAN REVIEW: Is this intentional — the skill's system prompt IS the prompt, so it should always override? If so, update the doc comment. If not, apply
if h.PromptTemplate == ""guard.apps/event-engine/internal/dispatch/ (integration tests)
dispatch_integration_test.go: Integration tests only verify the happy path of exactly one tool-use round. No tests for: multi-tool-use in a single response, 3+ turn conversations, MaxTurns limit hit, tool execution failure mid-loop, or context cancellation.apps/event-engine/internal/core/gravityclient.go
[IMPORTANT] Line 311:
io.ReadAll(resp.Body)has no size limit. A misbehaving gravity-pm server could return an arbitrarily large response, causing memory exhaustion.Suggested fix:
io.ReadAll(io.LimitReader(resp.Body, 10<<20)).apps/event-engine/internal/core/gravityclient_test.go
[IMPORTANT] Lines 167-185:
TestGravityPMClient_APIErrorattempts a direct type assertionerr.(*APIError)but the error is wrapped withfmt.Errorf. TheapiErrvariable is assigned but never checked — test gives false confidence.Suggested fix: Use
errors.As:[IMPORTANT] 11 of 19 public
GravityPMClientmethods (Wiki, News, TimeEntry, Version, GetProject, GetChildren, etc.) have no test coverage.apps/event-engine/internal/handler/jobs.go
[IMPORTANT] Lines 23-26:
allowedJobTypesis hardcoded to only"claude_dispatch", but the dispatch plugin dynamically registers"anthropic","openai", and"ollama"providers. The multi-provider dispatch is unreachable via the REST API.Suggested fix: Make
allowedJobTypesdynamic (inject dispatcher registry keys at handler construction), or expand the set.[IMPORTANT] Lines 65-74:
submitJobalways builds aClaudeDispatchPayloadregardless ofjob_type. IfallowedJobTypesis expanded, payloads for other providers would be wrong.Suggested fix: Delegate to
Submitter.buildPayloadwhich already has the correct switch-case for all providers.[IMPORTANT] Lines 76-79:
max_retriesis accepted without an upper bound. A caller could set 999999, causing resource exhaustion on failing jobs.Suggested fix: Cap:
if maxRetries > 10 { maxRetries = 10 }.[MINOR] Lines 140-154:
getJobdoesn't validate UUID format before hitting the database.apps/event-engine/internal/handler/handler.go
[IMPORTANT] Lines 73-88: Handler allows
limitup tomaxQueryLimit=1000, but the store clamps at 200 (falling back to 50). Requestinglimit=500returns 50 rows. The two layers disagree.Suggested fix: Align in a single location — either remove the store-level cap or raise it to match.
apps/event-engine/internal/plugins/dispatch/plugin.go
sync.Mutexfield is declared but never used. Dead code.Initis 65 lines — consider extracting provider registration into a helper.apps/event-engine/internal/store/jobs.go
Sprintffor LIMIT/OFFSET deviates from parameterized query pattern used elsewhere.apps/event-engine/cmd/server/main.go
setupRouter. Consider extracting.os.Exit(1)in ListenAndServe goroutine bypasses deferred cleanup..forgejo/workflows/ci.yml
[IMPORTANT] Line 33:
golangci-lintinstaller fetched viacurl | shfrom a raw GitHub URL. Supply-chain risk — if the upstream file is compromised, CI runs arbitrary code.Suggested fix: Pin by SHA256 checksum, or install from a known-good release archive with verification.
[MINOR] Lines 80-129:
test-gravity-pmexcludesstore/packages. The gravity-pm store layer has zero CI test coverage.docker-compose.yml
[IMPORTANT] Lines 70-83: Valkey is exposed on
127.0.0.1:6379without authentication. Any host process can read/write session data.Suggested fix: Remove the host port binding (services reach Valkey via internal network), or add
--requirepass.[IMPORTANT] Lines 85-99: NATS is exposed on
127.0.0.1:4222without authentication. Any host process can publish to the job queue.Suggested fix: Remove the host port binding, or configure NATS auth.
apps/gravity-pm/internal/auth/issuer_url.go
[IMPORTANT] Lines 36-47: The DNS rebinding / TOCTOU vulnerability is documented in comments but not mitigated. The IP check resolves DNS independently from the subsequent HTTP request — an attacker could return a public IP during validation, then a private IP during the real fetch.
HUMAN REVIEW: The comment says "tracked for a future hardening pass" but no ticket is referenced. Consider implementing a custom
http.Transportwith aDialContexthook that re-validates resolved IPs, or create a tracking ticket.apps/gravity-pm/internal/handler/settings.go
[IMPORTANT] Lines 159-163: Type assertion
updates["oidc_issuer_url"].(string)silently skips SSRF validation if the value is not a string (e.g., JSON number or object). A non-string value would be persisted without validation.Suggested fix: Explicit type check:
issuerURL, ok := v.(string); if !ok { writeError(w, 400, "oidc_issuer_url must be a string"); return }.apps/gravity-pm/internal/web/setup.go
[IMPORTANT] Lines 131-133: Raw
oidc.NewProvidererror displayed to user inhandleVerify. May contain internal hostnames/IPs. Same at lines 103-113 (handleSkip) where raw DB error is rendered.Suggested fix: Log full error with
slog.Error, render generic message to user.apps/gravity-pm/internal/handler/projects.go
DeleteProjectandPurgeProjectexecute raw SQL viatx.QueryRowinside the handler, breaking SoC. Should be a store method.apps/gravity-pm/internal/auth/
valkey_session_test.go: Core session lifecycle methods (Create, Get, Destroy, DeleteAllForUser) have no unit tests. Only sign/verify and constructor are tested.manager_test.go: OIDCManager tests only cover nil-provider paths. Init, SubscribeReload, and hot-reload validation are untested.issuer_url_test.go: No positive test case for a valid HTTPS URL resolving to a public IP.deploy/prometheus/prometheus.yml
scrape_configs.Additional Minor/Nitpick Findings (14 items)
anthropic.go: Inconsistent constructor patterns across providers (functional options vs separate constructors)openai.go:235: Inconsistent error logging between providersskill.go:178: Lossy[]string→ comma-separated string encoding forToolAllowlistsetup.go:186:handleSaveuses non-transactionalstore.UpdateSettingsvs handler's transactional pathsettings_test.go: No test for SSRF validation path inUpdateAuthSettingscore/event.go:Handlertype defined in event.go alongside Event — consider separate filecore/event.go:35:ToolAllowlisttyped as string not[]string— format undocumentedcore/plugin.go:42:Dependencies.PMfield is unexplained 2-letter abbreviationanthropic.go:25:maxResponseBytesdefined in anthropic.go but used by OpenAI toooidc.go:128: Claims struct defined inline twice (VerifyBearerToken and CallbackHandler)oidc.go:188: CallbackHandler is 155 lines — suggest extractionprocessor.go:175:handleFailurehas 7 parameters008_soft_deletes.sql: Unique constraints don't use partial indexes (WHERE deleted_at IS NULL)Positive Observations
The codebase demonstrates strong engineering across this large release:
subtle.ConstantTimeCompareused correctly in handler authWHERE id = $N AND status IN (...)prevents TOCTOU racesPR Followup — Review Findings Addressed
Triaged: 52 findings total
Human Review Decisions
Work Packages (Epic OP#1804)
Commits (10 fixup commits)
05d82b9fix(security): resolve 3 blocking PR review findingsecc9ec1fix(dispatch): harden provider implementations192f731fix(dispatch): validate skill names and fix ResolveSkill doc comment2498f46fix(handler): expand job types, cap retries, align query limitsbe9896efix(core): add response size limit and fix APIError test assertion45d67e4fix(auth): mitigate DNS rebinding, fix type assertion, sanitize errors48228b8fix(infra): harden Valkey/NATS exposure, Prometheus config, CI supply chainc529003fix(dispatch): eliminate data races in OpenAI and Ollama testsd39f5ecfix(dispatch): warn when OpenAI BaseURL set without API keyccd5ba8fix(dispatch): set AllowInternal in integration test MCP configsTest Results
Deferred Items (low-risk nitpicks)
These items were assessed as too low-risk or too high-churn for the release branch:
ToolAllowliststring→[]string type change (breaking API change)