跪拜 Guibai

From a Pet Peeve to a Public-Facing Tool: One Developer's Journey from HTML to Java with an AI Coding Partner

From a Simple File-Sharing Need to a Public-Facing "Instant Transfer": I Built It from HTML All the Way to Java with WorkBuddy

As someone who connects to servers every day, my pain point is very specific: moving small data across machines is ridiculously expensive.

I mostly use Sunlogin or Remote Desktop to connect to production/test machines. Working is fine, but moving things is pure torture. A modified config file from my local machine, a snippet of an error log captured on site, a temporary jar I just built: all of it has to get onto that server. WeChat File Assistant requires logging in on both ends; cloud drives require uploading first, then downloading, and they leave traces; rz/sz can make you question your life on a weak network. Text is even worse than files: the clipboard in Remote Desktop is unreliable, and a piece of nginx config or a temporary token often has to be typed out character by character while staring at the screen.

What I wanted was extremely simple: a web page, drop something in, the other side enters a code and gets it, auto-destroys after use, no registration, no login, no history. The tools on the market are either too heavy, force login, or record every single transfer. None of them hit the mark.

So I decided to write it myself. I'm fine with Java for the backend, but I really didn't want to touch the frontend — perfect opportunity to let WorkBuddy accompany me through the whole process and see how far it could help.

Homepage / Receive State

Step 1: Choosing the Right Track

On the WorkBuddy homepage, several main tracks are laid out at the top: Daily Office, Code Development, Design & Creativity. I wanted to build a software project, so I clicked Code Development directly.

Selecting Code Development

It Didn't Rush to Write Code; It Helped Me "Deconstruct" the Requirements First

I didn't write a requirements document; I just threw in the complaint above as is. What surprised me most was that it didn't immediately enter the excited state of writing code. Instead, it first abstracted my pain point into a single sentence — "temporary file and text transfer in remote environments" — and then directly gave me three technical routes, complete with a pros/cons comparison table:

It leaned towards C and explained its reasoning.

Requirements and Three Plans

The experience at this step already won half the battle. I fed it a vague complaint, and it returned a technical selection with trade-offs — this is the conversational style an engineer wants, not just two hundred lines of runnable but misguided code.

Repeatedly Discussing "Security"

I didn't make a decision immediately; I first threw in my most important constraint: this thing needs to be exposed to the public internet for strangers to use; it can't be bare. It followed this line and broke down the security model into "User Side" and "Server Side", with granularity fine enough to be directly copied into a design document:

User/Access Side: identity is a one-time access code (password) with no registration system; a code is generated for each access and opens an independent space. Sharing goes through invitation links that carry a token and have a built-in validity period, after which they stop working.

Upload Side: force HTTPS, block dangerous extensions (exe/sh/php) via a blacklist, limit single-file size, and limit total files per user.

Download Side: download links are unguessable URLs with a validity period, and support view-once-then-burn (deleted after one download).

Security Architecture Thinking · User and Upload Side
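To make the upload-side rules concrete, here is a minimal sketch of what such a check could look like. The class name, method names, and rejection messages are hypothetical and not taken from the project; the blocked extensions come from the examples above, and the 200MB cap is the single-file limit the final homepage advertises.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical upload policy check; names and messages are illustrative only. */
final class UploadPolicy {
    // Blocked extensions, taken from the examples in the text (exe/sh/php).
    private static final Set<String> BLOCKED = Set.of("exe", "sh", "php");
    // 200 MB, assumed to match the single-file limit shown on the homepage.
    static final long MAX_FILE_BYTES = 200L * 1024 * 1024;

    /** Returns the lower-cased extension after the last dot, or "" if there is none. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Returns null when the upload is acceptable, otherwise a rejection reason. */
    static String rejectReason(String fileName, long sizeBytes) {
        if (sizeBytes > MAX_FILE_BYTES) {
            return "file too large";
        }
        if (BLOCKED.contains(extensionOf(fileName))) {
            return "file type not allowed";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(rejectReason("nginx.conf", 4_096));            // null
        System.out.println(rejectReason("Setup.EXE", 4_096));             // file type not allowed
        System.out.println(rejectReason("dump.bin", MAX_FILE_BYTES + 1)); // file too large
    }
}
```

A name-based blacklist only looks at the final extension, so it is a first filter rather than a guarantee; that is presumably why it sits alongside size limits and per-user quotas in the list above.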

Server Side: files are isolated and stored per user, invisible to each other; text content is encrypted at rest (AES-like, the server stores only ciphertext); a unified TTL drives auto-cleanup; per-IP rate limiting prevents abuse; audit logs record only time and metadata, not content.

Security Architecture Thinking · Server Side and Additional Ideas

Its sensitivity to security exceeded my expectations. All I said was "it can't be bare", and it laid out the key model, isolated storage, rate limiting, and audit boundaries all at once. These points were carried over almost directly into the final implementation.

Nailing Down Features and Flow, Conveniently Cutting Half the Requirements

After thoroughly discussing security, we moved on to defining features. I verbally described the process for the sender and receiver, and it casually drew an end-to-end state transition:

Sender page → Upload file/text → Generate unique password (URL) → Set validity period / download count / password → Receiver page views and downloads via the link.

Basic Features and Overall Architecture Diagram

Then it separated multi-person rooms into a standalone module: temporary space via WebSocket, multiple people enter the same password to enter the same room to send and receive text; the room has a creator, a validity period, and a user limit; the creator can kick people, can destroy the room, and messages are cleared upon expiration.

The truly valuable part was the concluding suggestion — don't try to eat the whole elephant at once; split into two phases based on delivered value:

  1. MVP: Single file/text sharing + link + validity period/download count;
  2. Enhanced Version: Stack rooms, QR codes, password protection on top of the MVP.

"Do the MVP first, add features after it's running, lowest cost."

Splitting MVP / Enhanced Version

I was in that excited "I want all the features" state, and instead of going along and piling on, it helped me subtract. An AI assistant that can hit the brakes when you're excited and pull you back to an MVP rhythm matters more than one that writes code fast. This split proved to be the right approach later.

A Reference Image, It Read Out a "Design Specification"

Features were set, but the look wasn't defined. I was too lazy to describe it, so I just grabbed a screenshot of a style I liked and threw it over. It didn't just praise it; it reverse-engineered that image into an executable visual specification: light gray background + white cards, green primary color (for button highlights and selected states), large rounded corners, low information density, three main entries at the top (Receive/Send/Room), right-side info panel + QR code, security tips at the bottom.

Then it defined the MVP form and casually asked if I wanted to create a long-term archive for the project. I said yes, named it "YiZhiNiuBo", and proceeded in this direction. It replied: "Okay, NiuBo, let's start."

Giving a reference image, setting the style

From a casually grabbed screenshot to a structured design token, this step translated "my indescribable aesthetic" into "rules it could implement", cutting communication costs in half.

First Version: Native HTML/JS to Set Up the Skeleton First

The first version came quickly, pure native HTML/JS, running on localhost:3000. The skeleton was already complete: three tabs at the top, send area in the middle, validity period selection, info card on the right. It was rough, but the vibe of the reference image was there.

First version effect

I asked it to run directly, and it provided the startup commands: npm install, npm start, and it was up in a few lines. It doesn't just hand over code; it also thinks about "how to run it" for you.

Asking whether to run

Second Version: Switch to Vue, and Casually Ask for Frosted Glass

As I wrote with native JS, the imperative DOM manipulation started to get messy. I asked it to refactor with Vue, and also mentioned an effect I always wanted — frosted glass. It used the backdrop-filter: blur() approach with semi-transparency, and the overall texture immediately improved.

Refactoring with Vue

Third Version: Pure Detail Polish, Adjusting the Frosted Glass Opacity

The first version of the frosted glass was too blurry: the background illustration was completely obscured, which wasted it. I asked it to pull back the opacity. This step had no technical content, just aesthetic back-and-forth: starting from very high opacity and gradually adjusting down, we landed at around 45%, where the background illustration could be faintly seen without overpowering the foreground content. In the screenshot I marked "a bit of an effect", meaning this version finally looked right.

Continuing to optimize opacity

It's worth mentioning that it handled this kind of "almost there" tactile tuning very steadily: I didn't give specific values, just said "too transparent" or "a bit more solid", and it converged in the right direction based on the semantics rather than requiring me to specify parameters.

Finally Landing on Java: The Tech Stack Was Driven by Deployment

When we started discussing going live, the direction changed — and it changed reasonably.

This thing needed to be publicly accessible long-term for people to use. The Node.js deployment would require installing a runtime, starting a process, configuring a daemon, and handling restarts. My backend was already Java anyway, so it was simpler to land entirely on Spring Boot 3.3.5 + Java 17: package it as a single executable jar, configure a multi-stage Dockerfile (Maven builds first, artifact goes into a JRE runtime image), docker-compose up to start, front it with Nginx reverse proxy + Let's Encrypt automatic certificate signing.

The frontend was scaled back instead: no longer carrying Vue's build chain, but returning to modular native JS — split into features / ui / utils / api directories. The backend Java took over all the heavy lifting: storage, rate limiting, scheduled cleanup, password generation, QR codes (ZXing direct output). One jar to rule them all, and deployment became clean. It also listed the deployment commands one by one.

Giving deployment and final effect

The seemingly jumpy path of HTML → Vue → Java actually has a hidden thread: native was used to validate the form, Vue to test interaction and texture, Java to handle deployment and security. The tech stack wasn't changed on a whim; it naturally grew out of following "how is this thing actually used and deployed?" This judgment of "choosing for constraints" is precisely where its professionalism lies.

At this point, the final engineering capability was no longer a toy. Let me casually post a few real parameters from my final draft to prove it wasn't just fluff:

Looking at the Code: How Well Did It Write?

Good parameters don't mean good code. I specifically picked out a few key implementations it generated and posted them — these are precisely the parts most likely to be poorly written and best reveal skill level.

First piece, the key handling for end-to-end encryption. The linchpin of the whole E2E design is "where to put the key". Its approach was to encode the key only into the #k= fragment of the URL. Browsers do not include the URL fragment in the HTTP request, so as long as the page's own scripts never transmit it, the server never receives the key and can only store ciphertext.

// After encryption, the key is appended to the URL's hash fragment, which is never part of the HTTP request
export function encryptedUrl(url, key) {
  return `${url}#k=${encodeURIComponent(key)}`;
}

// The receiver retrieves the key from location.hash and decrypts locally
export function keyFromLocation() {
  const params = new URLSearchParams(location.hash.replace(/^#/, ""));
  return params.get("k") || "";
}

async function encryptBytes(plainBytes, rawKey) {
  const iv = randomBytes(IV_BYTES);                       // Random 12-byte IV each time
  const key = await importKey(rawKey);
  const cipherBuffer = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, plainBytes);
  return concatBytes(iv, new Uint8Array(cipherBuffer));   // IV prepended to ciphertext, extracted during decryption
}

Using WebCrypto's AES-GCM, random IV each time, IV prepended — this is textbook correct practice. It neither reinvents the wheel nor mistakenly sends the key to the server. A developer without a cryptography background could easily mess this up; it didn't.
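The project's encryption runs in the browser, but the same "IV || ciphertext" layout is easy to illustrate with the JDK's own crypto classes. The following is a sketch in Java, not the project's code: the class name GcmEnvelope is hypothetical, and the 128-bit tag length is an assumption chosen because it is WebCrypto's default for AES-GCM.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical illustration of the "12-byte IV prepended to ciphertext" layout. */
final class GcmEnvelope {
    static final int IV_BYTES = 12;   // same IV length as the browser code
    static final int TAG_BITS = 128;  // assumed: WebCrypto's default AES-GCM tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Encrypts with a fresh random IV and returns IV || ciphertext (tag included). */
    static byte[] encrypt(byte[] plain, byte[] rawKey) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(rawKey, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] cipherText = cipher.doFinal(plain);
            byte[] out = Arrays.copyOf(iv, iv.length + cipherText.length);
            System.arraycopy(cipherText, 0, out, iv.length, cipherText.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Splits the IV off the front, then decrypts and verifies the rest. */
    static byte[] decrypt(byte[] envelope, byte[] rawKey) {
        try {
            byte[] iv = Arrays.copyOfRange(envelope, 0, IV_BYTES);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(rawKey, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            return cipher.doFinal(envelope, IV_BYTES, envelope.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed (wrong key or tampered data)", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[32];
        RANDOM.nextBytes(key);
        byte[] plain = "server { listen 443 ssl; }".getBytes(StandardCharsets.UTF_8);
        byte[] envelope = encrypt(plain, key);
        System.out.println(Arrays.equals(plain, decrypt(envelope, key)));
    }
}
```

Because GCM is authenticated, decrypting with the wrong key or a modified envelope fails outright instead of returning garbage, which is what lets the receive page trust what it shows.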

Second piece, rate limiting. It used a fixed-window counter (the count resets once the window has elapsed), and synchronized on the counter object to prevent race conditions in concurrent scenarios:

public void require(String key, int maxCount, Duration window, String message) {
    Instant now = Instant.now();
    WindowCounter counter = counters.computeIfAbsent(key, ignored -> new WindowCounter(now, 0));
    synchronized (counter) {
        if (Duration.between(counter.windowStart, now).compareTo(window) >= 0) {
            counter.windowStart = now;   // Window expired, reset
            counter.count = 0;
        }
        if (counter.count >= maxCount) {
            throw new ApiException(HttpStatus.TOO_MANY_REQUESTS, message);
        }
        counter.count++;
    }
}

One method, using the passed key + window + maxCount, simultaneously serves "upload per minute/per hour", "query per minute", "message per minute", and all other scenarios, without duplicating logic for each type of rate limit. The lock granularity is also kept on the individual counter object, not the entire table, so concurrent throughput isn't dragged down by a single big lock.
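To show how one method can cover several limits just by varying the key, here is a self-contained sketch of the same idea. It is not the project's class: FixedWindowLimiter, the boolean return value, the explicit now parameter (added so the behavior can be checked without waiting), and the key naming scheme are all assumptions for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-alone version of a keyed fixed-window limiter. */
final class FixedWindowLimiter {
    private static final class WindowCounter {
        Instant windowStart;
        int count;

        WindowCounter(Instant windowStart) {
            this.windowStart = windowStart;
        }
    }

    private final ConcurrentHashMap<String, WindowCounter> counters = new ConcurrentHashMap<>();

    /** Returns true if the call is allowed, false if the key has used up its quota. */
    boolean tryAcquire(String key, int maxCount, Duration window, Instant now) {
        WindowCounter counter = counters.computeIfAbsent(key, ignored -> new WindowCounter(now));
        synchronized (counter) { // lock only this key's counter, not the whole map
            if (Duration.between(counter.windowStart, now).compareTo(window) >= 0) {
                counter.windowStart = now; // window elapsed: start a new one
                counter.count = 0;
            }
            if (counter.count >= maxCount) {
                return false;
            }
            counter.count++;
            return true;
        }
    }

    public static void main(String[] args) {
        FixedWindowLimiter limiter = new FixedWindowLimiter();
        String ip = "203.0.113.7";
        Instant now = Instant.now();
        // The same method serves different limits; only the key and parameters change.
        boolean perMinute = limiter.tryAcquire("upload:minute:" + ip, 10, Duration.ofMinutes(1), now);
        boolean perHour = limiter.tryAcquire("upload:hour:" + ip, 100, Duration.ofHours(1), now);
        System.out.println(perMinute && perHour);
    }
}
```

One known trade-off of a fixed window is that a burst straddling a window boundary can briefly reach about twice maxCount; for an abuse guard on a small site that is usually an acceptable price for the simplicity.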

Third piece, the part I admire most — why the upload concurrency gate was placed in the Filter layer. It didn't put this protection in the Controller; instead, it made a separate OncePerRequestFilter, and wrote the reason in the comment:

/**
 * Acquires upload concurrency permits before Spring parses multipart bodies.
 *
 * <p>Controller-level guards run too late for large uploads because the request
 * body may already be parsed. Keeping this protection at the filter layer
 * limits concurrent body ingestion from the same IP as well as application
 * processing.</p>
 */
@Component
public class UploadConcurrencyFilter extends OncePerRequestFilter {
    // ...
    try (TransferGuardService.Guard ignored = transferGuardService.upload(ClientIpUtil.resolve(request))) {
        filterChain.doFilter(request, response);   // Only proceeds if permit acquired, auto-releases out of scope
    } catch (ApiException exception) {
        writeApiError(response, exception);
    }
}

"Controller-level guards run too late for large uploads because the request body may already be parsed" — this is a comment written by someone who has been burned and truly understands the Spring request lifecycle. Combined with try-with-resources for automatic permit release, it both blocks malicious concurrent uploads from filling the disk and prevents permit leaks. This single piece, taken out and put into any production project's Code Review, would pass without issue.

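The excerpt does not show what TransferGuardService.upload returns, so the following is only a guess at the shape of such a guard: a per-IP Semaphore wrapped in an AutoCloseable handle. UploadGuards, the available helper, and the use of IllegalStateException in place of the project's ApiException are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical per-IP concurrency gate usable with try-with-resources. */
final class UploadGuards {
    /** Handle returned on success; closing it gives the permit back exactly once. */
    static final class Guard implements AutoCloseable {
        private final Semaphore semaphore;
        private final AtomicBoolean released = new AtomicBoolean(false);

        private Guard(Semaphore semaphore) {
            this.semaphore = semaphore;
        }

        @Override
        public void close() {
            if (released.compareAndSet(false, true)) {
                semaphore.release();
            }
        }
    }

    private final ConcurrentHashMap<String, Semaphore> perIp = new ConcurrentHashMap<>();
    private final int maxConcurrentPerIp;

    UploadGuards(int maxConcurrentPerIp) {
        this.maxConcurrentPerIp = maxConcurrentPerIp;
    }

    /** Takes a permit without blocking; throws if this IP already has too many uploads in flight. */
    Guard upload(String clientIp) {
        Semaphore semaphore = perIp.computeIfAbsent(clientIp, ip -> new Semaphore(maxConcurrentPerIp));
        if (!semaphore.tryAcquire()) {
            throw new IllegalStateException("too many concurrent uploads from " + clientIp);
        }
        return new Guard(semaphore);
    }

    /** Permits currently free for the given IP (for inspection and tests). */
    int available(String clientIp) {
        Semaphore semaphore = perIp.get(clientIp);
        return semaphore == null ? maxConcurrentPerIp : semaphore.availablePermits();
    }

    public static void main(String[] args) {
        UploadGuards guards = new UploadGuards(2);
        try (Guard first = guards.upload("203.0.113.7")) {
            System.out.println("in flight, permits left: " + guards.available("203.0.113.7"));
        }
        System.out.println("after close, permits left: " + guards.available("203.0.113.7"));
    }
}
```

Failing fast with tryAcquire rather than blocking matters at the filter layer: a blocked request thread would still hold a connection open, which is the resource the gate is trying to protect.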
Fourth piece, the restraint in cleanup scheduling. The reclamation of all expired data (shares, zombie uploads, rooms, rate limit counters) is strung together in a single @Scheduled method, with the period injectable from configuration:

@Scheduled(fixedDelayString = "${app.cleanup-interval-seconds:60}000")
public void cleanup() {
    shareStorageService.cleanupExpired();
    shareStorageService.cleanupStaleUploads();
    roomStorageService.cleanupExpired();
    rateLimitService.cleanup();
}

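No separate timers for each data type, no cleanup logic scattered across various Services running stealthily: everything is consolidated in one place, dependency-injected, with a configurable period. The property is read in seconds (default 60), and the literal 000 appended after the placeholder turns it into the milliseconds that fixedDelayString expects. Keep it simple where it should be simple.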
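The bodies of the cleanupExpired methods are not shown, so here is a minimal sketch of what one such sweep could look like over an in-memory index. ExpiringStore and its methods are hypothetical, and a real implementation would also have to delete the stored files; passing now explicitly is an assumption made so the sweep can be exercised deterministically.

```java
import java.time.Instant;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory index of share codes and their expiry times. */
final class ExpiringStore {
    private final ConcurrentHashMap<String, Instant> expiresAtByCode = new ConcurrentHashMap<>();

    void put(String code, Instant expiresAt) {
        expiresAtByCode.put(code, expiresAt);
    }

    boolean contains(String code) {
        return expiresAtByCode.containsKey(code);
    }

    /** Removes every entry whose expiry is at or before now; returns how many were removed. */
    int cleanupExpired(Instant now) {
        int removed = 0;
        Iterator<Map.Entry<String, Instant>> it = expiresAtByCode.entrySet().iterator();
        while (it.hasNext()) {
            if (!it.next().getValue().isAfter(now)) {
                it.remove(); // ConcurrentHashMap iterators support removal during iteration
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        ExpiringStore store = new ExpiringStore();
        Instant now = Instant.now();
        store.put("short-lived", now.plusSeconds(600));   // 10 minutes
        store.put("long-lived", now.plusSeconds(7200));   // 2 hours
        System.out.println(store.cleanupExpired(now.plusSeconds(601))); // 1
    }
}
```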

Looking at these four pieces together, its code is not at the "as long as it runs" level: separation of concerns is clear, concurrency and boundaries are considered, comments are written where they are truly needed, no repetition and no over-engineering. Honestly, this quality is close to the work of a decent mid-to-senior engineer.

The Final Product: Designs Forced by Product Logic

I won't go through the interface item by item. What I want to say is, several decisions in this final interaction were forced by the core nature of "temporary transfer" — they look like UI, but are essentially product judgments. WorkBuddy had basically thought of these for me in the earlier conversations.

First decision: Default to "Receive", not "Send". I strongly agree with this choice. The sender is proactive; they know what they want to do. The receiver is passive; they are most likely led here by a password or QR code. The sooner they see "where to enter the code", the better. So the homepage defaults to the receive state, with a large password box right in the center, placing the most frequent, least patient path at zero clicks. The info bar on the right (max 2 hours, single file 200MB, no login required) succinctly explains "this is a temporary thing" without interrupting the main flow.

Homepage defaults to receive state

Second decision: Make "Validity Period" and "Receive Count" first-class citizens. In typical cloud drive sharing, expiration is an advanced option hidden in a submenu. Here, they are two unavoidable knobs in the creation flow: validity period (10 minutes to 2 hours) is laid out directly as buttons, and receive count (default 1) is prominently displayed. This isn't feature stacking; it's using interaction to put the product's value proposition in front of the user: this thing is born to disappear, so you must make a decision about its "short life". The real-time byte count and 256KB limit indicator in the bottom right of the text box also continuously hint at boundaries.

Sender side puts time and count front and center

Third decision: Give password, QR code, and link, all three entry points at once. This was forced by real scenarios: my original pain point was "cross-device", and cross-device means no unified copy-paste channel. So the result page simultaneously outputs an 8-digit password (suitable for telling someone next to you / manual typing), a QR code (suitable for scanning from a computer screen with a phone), and a full link with key (suitable for throwing in an IM). Worth mentioning separately is the #k=E6LWXY1i0Ptt... at the tail of the link: it's the AES key from the encryption code earlier that "only lives in the fragment, never sent to the server". The bottom text "End-to-End Encrypted · Server Only Stores Ciphertext" is backed by code here, not just decorative.

Three entry points and visible E2E key
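The article does not show how the 8-character password is produced. A minimal sketch of one reasonable approach follows, assuming a cryptographically secure random source and an alphabet with look-alike characters removed so the code survives being read aloud; AccessCodes and the exact alphabet are hypothetical choices, not the project's.

```java
import java.security.SecureRandom;

/** Hypothetical generator for short, hand-typable access codes. */
final class AccessCodes {
    // Assumed alphabet: digits and upper-case letters minus look-alikes (0/O, 1/I/L).
    static final String ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ";
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns a random code of the given length drawn uniformly from ALPHABET. */
    static String next(int length) {
        StringBuilder code = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            code.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return code.toString();
    }

    public static void main(String[] args) {
        System.out.println(next(8));
    }
}
```

With 31 symbols and 8 positions there are about 8.5 × 10^11 possible codes. That is far too small to resist offline guessing on its own, which is why a code like this only makes sense together with the short TTL and the per-IP rate limiting on queries described earlier.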

Fourth decision: Make security "perceivable". Encryption that is done but invisible to the user is as good as not done. Every piece of content on the receive page has an E2E tag in front of it, next to the remaining count, byte size, and creation time, with a ticking countdown at the top. The user doesn't need to understand AES-GCM, but they need to believe at a glance that "this thing is encrypted, will expire, and is only for me". This translation of backend guarantees into visible frontend signals is something many tools skip, but it's precisely what most affects trust.

Receive page makes encryption and timeliness visible

Fifth decision: The destroyed state is designed as a proper state, not just an error. Most applications handle "content gone" with a 404 or a line of red text. But for a product that promotes "view-once-then-burn", "destroyed" is precisely the story it should tell best. So here is a complete state card: clearly telling you "Receive count exhausted or content expired, server has deleted temporary content", and directly providing the next step of "Recreate". The key is that this copy is backed by reality — the server's scheduled task physically deleted the data, not just hid it on the frontend. The original requirement of "auto-disappear after use, leave no trace" is fulfilled on this screen.

Destroyed state is a carefully designed state
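For "receive count exhausted" to be trustworthy, consuming a receive and deleting the content have to happen as one step, otherwise two simultaneous receivers could both get a single-use share. A minimal sketch of that idea, assuming an in-memory map; BurnAfterReadStore and its methods are hypothetical and not the project's storage code.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical store where each share can be received a limited number of times. */
final class BurnAfterReadStore {
    private static final class Share {
        final byte[] cipherText;
        int remaining;

        Share(byte[] cipherText, int remaining) {
            this.cipherText = cipherText;
            this.remaining = remaining;
        }
    }

    private final ConcurrentHashMap<String, Share> shares = new ConcurrentHashMap<>();

    void put(String code, byte[] cipherText, int receiveCount) {
        if (receiveCount < 1) {
            throw new IllegalArgumentException("receiveCount must be at least 1");
        }
        shares.put(code, new Share(cipherText, receiveCount));
    }

    /** Returns the content and consumes one receive; the share is removed when none remain. */
    Optional<byte[]> receive(String code) {
        byte[][] result = new byte[1][];
        // computeIfPresent runs atomically per key, so two receivers cannot both take the last copy.
        shares.computeIfPresent(code, (key, share) -> {
            result[0] = share.cipherText;
            share.remaining--;
            return share.remaining > 0 ? share : null; // returning null removes the mapping
        });
        return Optional.ofNullable(result[0]);
    }

    public static void main(String[] args) {
        BurnAfterReadStore store = new BurnAfterReadStore();
        store.put("demo-code", new byte[] {1, 2, 3}, 1);
        System.out.println(store.receive("demo-code").isPresent()); // true
        System.out.println(store.receive("demo-code").isPresent()); // false
    }
}
```

The second call finding nothing is exactly the moment the receive page should switch to the "destroyed" state card rather than a generic error.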

Sixth decision: The enhanced version's multi-person room was actually implemented, and validated mobile-first. The "temporary space" that was separated out in the design phase and powered by WebSocket didn't stay on the slides. After entering the room on a phone, the top directly shows 30 people online + a 01:58:13 destruction countdown, putting the two core attributes of "multi-person" and "temporary" on the first screen, followed by the real-time message stream with timestamps and byte sizes, QR code invitation, and exit. For an AI-assisted project to complete the Phase 2 feature with the same design language as Phase 1, and to maintain the same restrained layout on a narrow screen, exceeded my expectations.

Multi-person temporary space implemented on mobile

Final Thoughts

The entire project was discussed intermittently, and I hardly typed any code myself, but honestly, it wasn't as simple as "just talk". The real time was spent on the back-and-forth earlier — which of the three plans to choose, how deep security needed to go, how much to cut for the MVP, how to realize the vibe of the reference image. Once these were figured out, writing the code was the least effortful part.

A simple thought of "file and text transfer is annoying" turned into a small site that can be exposed to the public internet, used by scanning a QR code, and auto-disappears after use.