Frontend · AI Programming · CI/CD

A 200-Line Python Flask Service That Posts AI Code Reviews Straight to GitLab MRs

By 雨夜寻晴天 · Jun 29, 2026

Read original on juejin.cn ↗ Google Translate ↗ Alt translation

Teams waste hours on repetitive review comments that an LLM can catch in seconds. This pattern turns a spec document into an enforceable, always-on reviewer without touching the CI runner's language or framework, and the 200-line footprint means a single developer can own it.

Summary

A lightweight Python service automates the first pass of code review by connecting GitLab CI to any OpenAI-compatible large language model. When a merge request opens, the service fetches the diff via GitLab's API, constructs a prompt that includes a synced team specification file, and calls the model for a structured review. The resulting JSON—containing a score, file-level issues, and severity—gets posted back to the MR as a comment.

The entire stack runs on roughly 200 lines of Flask code. A separate Git repository holds the review rules, which the service clones on startup so that changing the spec never requires touching the application code. Environment variables control the model endpoint, GitLab token, and spec repo URL, making it straightforward to swap between OpenAI, a local Ollama model, or a corporate LLM proxy.

Frontend developers who know Express and axios can follow the logic directly: Flask maps to Express, the `requests` library maps to axios, and `python-dotenv` mirrors the Node `dotenv` package. The guide walks through installation, `.env` configuration, a line-by-line breakdown of the code, GitLab CI integration, and the most common failure modes—401 token errors, Docker networking gotchas, and JSON parsing failures from the model.

Takeaways

— A Flask service under 200 lines can fetch a GitLab MR diff, call an LLM, and post structured review comments back to the MR.

— Review rules live in a separate Git repository that the service clones on startup, so updating the spec never requires a code change.

— The service uses a single `.env` file to switch between OpenAI, a local Ollama model, or any compatible API endpoint.

— GitLab CI triggers the review via a `curl` command in `.gitlab-ci.yml`, passing built-in variables like `$CI_PROJECT_PATH` and `$CI_MERGE_REQUEST_IID`.

— GitLab personal access tokens need `api` scope to post MR comments; `read_user` alone causes a 401.

— Running the service on a Mac and calling it from a Docker-based GitLab Runner requires `host.docker.internal` instead of `localhost`.

— The prompt includes a full team specification file and asks the model to return JSON with a score, file paths, line numbers, and severity levels.

— Code includes a fallback that extracts JSON from markdown code fences when the model wraps its output in explanatory text.

— Flask's synchronous default removes the need for async/await, making the control flow easier for developers coming from JavaScript.

Conclusions

Treating the review specification as a separate, version-controlled artifact decouples policy from implementation and lets non-engineers contribute rules.

The hardest part of an AI review pipeline is not the model call but the plumbing: GitLab API authentication, diff extraction, and reliable JSON parsing from LLM output.

Flask's synchronous-by-default model is a better fit for small integration services than async Python frameworks because it eliminates an entire class of concurrency bugs.

Hardcoding a test token in a local-only service is a pragmatic trade-off that avoids secret-management overhead during development, provided it is replaced before any network exposure.

Frontend developers can transfer their mental model of Express + axios directly to Flask + requests; the syntax differences are minor compared to the architectural patterns.

LLM-based review works best as a first-pass filter for mechanical issues—null checks, naming conventions, missing error handling—rather than as a replacement for architectural judgement.

Concepts & terms

GitLab MR diff API

The endpoint `GET /projects/:id/merge_requests/:iid/changes` returns a JSON object containing a `changes` array, where each element holds the `diff` of a modified file. The service extracts and concatenates these diffs to form the prompt payload.

Review specification repository

A standalone Git repository containing a `SKILL.md` file with team coding rules in natural language. The service clones it on startup so that updating review criteria is a documentation change, not a code deployment.

LLM temperature

A parameter (typically 0–2) that controls randomness in model output. Lower values like 0.1 produce more deterministic, parseable JSON, which matters when the downstream system expects structured data.

GitLab CI predefined variables

Variables such as `$CI_PROJECT_PATH`, `$CI_MERGE_REQUEST_IID`, and `$CI_MERGE_REQUEST_SOURCE_BRANCH_NAME` that GitLab injects into every pipeline job, removing the need to hardcode project-specific values in CI configuration.

Shell executor vs Docker executor (GitLab Runner)

A shell executor runs jobs directly on the host OS, so `localhost` resolves to the host. A Docker executor runs jobs inside containers, where `localhost` points to the container itself, requiring `host.docker.internal` on Mac to reach the host.

Source: juejin.cn ↗ Google Translate ↗ Backup ↗