
I recently built a small resume matcher app in TypeScript that compares PDF resumes to job postings. I wanted a fast way to prototype an API, so I chose tRPC for the backend. tRPC is a TypeScript-first RPC framework that promises “end-to-end typesafe APIs”, meaning I could share types between client and server without writing OpenAPI schemas or GraphQL SDL. In practice that meant I could focus on writing logic instead of boilerplate. Unlike REST or GraphQL, tRPC doesn’t expose a generic schema – it just exposes procedures (essentially functions) on the server that the client can call, sharing input/output types directly.
Why is that useful? In short, I was building an internal tool (an MVP) and I was already using TypeScript on both ends. tRPC’s zero-build-step, type-safe model fit the bill. The official tRPC docs even tout automatic type-safety: if I change a server function’s input or output, TypeScript will warn me on the client before I even send a request. That was a big win for catching bugs early. In contrast, with REST or GraphQL I’d have to manually sync or generate schemas. On the flip side, I knew tRPC ties my API and client code closely together (it’s not a language-agnostic API) – so it’s best for “TypeScript-first” projects like this, not public cross-platform APIs.
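To make that "warn me before I even send a request" point concrete, here is a tiny self-contained sketch (hypothetical code, not the app's actual router) of the kind of inference tRPC leans on: the client-side result type is derived from the server function itself.

```typescript
// Minimal sketch of tRPC-style inference in plain TypeScript (hypothetical,
// not the project's real code): the "client" type is derived from the server fn.
async function analyzePdfs(input: { cvName: string; jobName: string }) {
  // Pretend this did real work; the returned shape is all the client sees.
  return { score: 0.82, strengths: ["TypeScript", "Node.js"] };
}

// Derive the result type directly from the function – no hand-written DTO.
type MatchResult = Awaited<ReturnType<typeof analyzePdfs>>;

const demo: MatchResult = { score: 0.82, strengths: ["TypeScript", "Node.js"] };
// Renaming "score" in analyzePdfs would break this line before any request is sent.
```

tRPC automates exactly this pattern across the network boundary, which is why no schema file or code-generation step is needed.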
Defining the tRPC Router and Input Validation
With tRPC set up, I wrote a simple router for the main operation: analyzing two uploaded PDF files (a CV and a job description). Using tRPC with the zod-form-data library, I could validate file uploads easily. Here’s a simplified version of the router code:
export const matchRouter = router({
  analyzePdfs: baseProcedure
    .input(zfd.formData({
      vacancyPdf: zfd.file().refine(file => file.type === "application/pdf", {
        message: "Only PDF files are allowed",
      }),
      cvPdf: zfd.file().refine(file => file.type === "application/pdf", {
        message: "Only PDF files are allowed",
      }),
    }))
    .mutation(async ({ input }) => {
      // id for tracking this analysis (simplified here; the full app
      // produces it during the FileService upload step)
      const matchRequestId = crypto.randomUUID();
      const [cvText, vacancyText] = await Promise.all([
        PDFService.extractText(input.cvPdf),
        PDFService.extractText(input.vacancyPdf),
      ]);
      const result = await MatcherService.match(cvText, vacancyText);
      return { matchRequestId, ...result };
    }),
});
Above, the analyzePdfs mutation takes a multipart form with two PDF files. The zfd.file().refine(...) calls ensure each file is a PDF. Once the files are validated and uploaded (via a helper FileService), I use PDFService.extractText(...) to pull out the raw text from each PDF. Then I call MatcherService.match(cvText, vacancyText), which does the actual analysis. Because tRPC knows the input/output types, my frontend gets fully typed results without me writing extra DTOs. This rapid setup and tight type safety saved a lot of time on the MVP.
Extracting Skills with Basic NLP
Once I had the plain text of the CV and job description, I needed to extract meaningful keywords or skills from them. I kept it simple: I used a combination of natural (for tokenization), compromise (for part-of-speech like nouns), and a stopword filter. For example, in MatcherService I have a helper like this:
private static extractSkills(text: string): Set<string> {
  const doc = nlp(text); // "nlp" is compromise's default export
  const nouns = doc.nouns().out("array"); // nouns are often skills or keywords
  // also pick up capitalized words (like frameworks or proper nouns)
  const capitalizedWords = text.match(/\b[A-Z][a-zA-Z0-9.-]+\b/g) || [];
  return new Set([...nouns, ...capitalizedWords].map(w => w.toLowerCase()));
}
In plain terms, this code runs the text through compromise NLP to grab nouns, and also regex-matches any capitalized word (which often catches tech names). Merging those lists and lowercasing them gives me a deduplicated set of candidate “skills” from each document. This basic keyword extraction isn’t fancy ML – it’s just a heuristic – but it’s fast and served well for highlighting matching skills. (It reminds me of some old-school resume parsers.) No external model needed yet, just some handy libraries and a bit of regex in a shared service class.
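Once both documents yield a skill set, the simplest thing to do with them is an overlap score. Here is a self-contained sketch (the function name and formula are mine, not the app's exact code) of scoring how much of the job's keywords the CV covers:

```typescript
// Hypothetical helper: score = |CV ∩ JD| / |JD|, i.e. the fraction of the
// job description's keywords that also appear in the CV.
function overlapScore(cvSkills: Set<string>, jdSkills: Set<string>): number {
  if (jdSkills.size === 0) return 0; // avoid division by zero
  let hits = 0;
  for (const skill of jdSkills) {
    if (cvSkills.has(skill)) hits++;
  }
  return hits / jdSkills.size;
}

const cv = new Set(["typescript", "react", "node.js", "sql"]);
const jd = new Set(["typescript", "react", "graphql", "sql"]);
console.log(overlapScore(cv, jd)); // 3 of 4 JD skills matched → 0.75
```

Because everything was lowercased during extraction, the set lookup is effectively case-insensitive, which is one reason the heuristic above normalizes casing at the end.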
Integrating Vertex AI (Gemini 1.5 Flash) for Matching
For the core matching logic, I decided to call out to Google’s Vertex AI with the new Gemini 1.5 Flash model. This was mostly about getting a structured comparison result (like a score and suggestions) without me implementing complex NLP logic. In MatcherService, after cleaning up the text and extracting skills, I build a prompt and fetch from Vertex. For example:
const aiPrompt = `
Analyze the job description and candidate's CV to provide a structured evaluation.

Job Description:
${cleanedJD}

Candidate CV:
${cleanedCV}

Provide a structured analysis in JSON format with fields "score", "strengths", and "suggestions".
`;

const response = await fetch(process.env.AI_API_ENDPOINT!, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.AI_API_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    contents: [
      { role: "user", parts: [{ text: aiPrompt }] },
    ],
  }),
});

if (!response.ok) {
  throw new AIServiceError(`AI request failed: ${response.status}`);
}

const data = await response.json();
if (!data?.candidates?.[0]?.content?.parts?.[0]?.text) {
  throw new AIServiceError("Invalid AI response format");
}
const rawResponse = data.candidates[0].content.parts[0].text;
// Then parse rawResponse as JSON for score, strengths, suggestions...
Here I’m using fetch to POST to a Vertex AI endpoint (configured in AI_API_ENDPOINT), passing a user prompt in the request body. The prompt tells the model to compare the job description and CV and output a JSON with a match score, strengths, etc. I then parse the JSON text out of data.candidates[0].content.parts[0].text. This approach was super helpful – Gemini churned out a result without me writing a ranking algorithm. It feels like treating the model as a black-box comparator. Of course, this means I’m trusting the AI, and sometimes the output needed cleaning or validation. But overall, embedding Gemini in the service let me focus on UI and data flow, not on LLM prompting. (I did have to handle some errors and rate-limiting around the call.)
Why tRPC Was a Good Fit (and Its Trade-Offs)
Using tRPC definitely sped up development. With no API schemas to write, I could spin up the endpoint in minutes. Full-stack TypeScript means the router types I wrote above are inferred on the client through tRPC's typed client, so I get compile-time checks. In practice, when I changed the Zod validation or the return shape, my React UI immediately failed to compile until I adjusted the UI types. This “autocompletion” feel is exactly what the tRPC site promises. And because tRPC has essentially zero boilerplate (no controller classes, no code-gen step), the code stayed concise.
On the other hand, I’m aware of tRPC’s limits. It ties my frontend directly to this server implementation, so if I ever needed a public REST or mobile client, I’d have to reconsider. I also had to think about caching and rate limits myself (tRPC doesn’t do caching out of the box like GraphQL might). The Directus blog hit the nail on the head: tRPC is great for internal, TypeScript-heavy tools, but it “limits your options” if you need broad compatibility. For this project – essentially an internal demo – those trade-offs were acceptable. I even implemented a simple rate limiter middleware just in case my Vertex AI calls overwhelmed the quota.
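For completeness, the rate limiter I mention can be as small as a fixed-window counter. This is a minimal self-contained sketch (the real version hooks into a tRPC middleware; the names and window logic here are my own):

```typescript
// Hypothetical fixed-window rate limiter: allow `limit` calls per `windowMs`
// per key (e.g. a client IP). In the app this guards the Vertex AI call.
function createRateLimiter(limit: number, windowMs: number) {
  const hits = new Map<string, { count: number; windowStart: number }>();
  return function allow(key: string, now: number = Date.now()): boolean {
    const entry = hits.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      hits.set(key, { count: 1, windowStart: now }); // start a fresh window
      return true;
    }
    if (entry.count >= limit) return false; // over quota in this window
    entry.count++;
    return true;
  };
}

const allow = createRateLimiter(2, 60_000);
console.log(allow("1.2.3.4", 0));       // true  (1st call in window)
console.log(allow("1.2.3.4", 1_000));   // true  (2nd call)
console.log(allow("1.2.3.4", 2_000));   // false (limit reached)
console.log(allow("1.2.3.4", 61_000));  // true  (new window)
```

An in-memory map like this resets on restart and doesn't share state across instances, which was fine for a single-process internal demo.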
Lessons Learned
Building this project with modern TypeScript APIs was pretty enjoyable. I got end-to-end typing (client knows exactly what { score: number } shape comes back) and no separate client library to maintain. The code feels very “SDK-like”, just calling functions on matchRouter as if it were local code. On the NLP side, I learned that even simple heuristics (nouns + capitalized words) can do a passable job of keyword extraction in a pinch. And finally, integrating Vertex AI reminded me that a lot of the “AI magic” can be outsourced with a well-crafted prompt.
All that said, nothing is a silver bullet. If I had more time, I’d refine error handling around the AI service and maybe add caching of results (since PDF-to-text and AI calls are expensive). And if this app grew beyond a quick demo, I might swap tRPC for a more conventional REST/GraphQL API if I needed a public interface. For now though, tRPC gave me exactly what I needed: a fast MVP with end-to-end typesafety and minimal ceremony.
You can find the full code for this project here: GitHub Repository.
Sources: I leaned on several resources while exploring this setup. The tRPC website calls out the “move fast, break nothing” end-to-end TS approach, and blog posts compare how tRPC fits among REST/GraphQL, noting its TypeScript-first advantages and constraints.
tRPC Official Site – Move Fast and Break Nothing. End-to-end typesafe APIs made easy. (Shows tRPC’s focus on full-stack TypeScript and type safety).
Viljami Kuosmanen, Comparing REST, GraphQL & tRPC (dev.to, Oct 2023) – Discusses how tRPC exposes RPC-style functions and shares types instead of a generic schema.
Bryant Gillespie, REST vs. GraphQL vs. tRPC (Directus blog, Feb 2025) – Covers tRPC’s strengths (minimal boilerplate, type safety) and trade-offs (TypeScript-only, limited API reach).