Alessandro Fuda

LongTermMemory Is Now on iOS: Spaced Repetition in Your Pocket

2026-06-25T00:00:00+00:00

The best study session is the one you actually do. For most people that means five minutes on the train, ten minutes before bed, a quick review during lunch. The LongTermMemory iOS app is built around that reality.

I’ve been working on LongTermMemory for a while now, and the web application has been available long enough that I’ve watched how people actually use it. The pattern is consistent: they upload materials and generate flashcards on a laptop, then they want to review on their phone. The LongTermMemory iOS app closes that loop.

What the App Does

The core of LongTermMemory is AI-powered flashcard generation combined with spaced repetition scheduling. You upload a study document (PDF, PowerPoint, a photo of handwritten notes, or plain text), the AI reads it and produces question-answer pairs from the content, and the spaced repetition algorithm schedules each card at the optimal interval to move it into long-term memory before you’d naturally forget it.

The iOS app brings the review side of that workflow to your iPhone. Your account, your decks, and your progress sync from the web platform, so the workflow looks like this in practice: upload and review generated cards on a computer, then do daily review sessions on your phone whenever you have a spare few minutes.

The app runs on iOS 15.1 or later and also works on Apple Silicon Macs through the Mac App Store. It’s a 34 MB install.

Why Mobile Matters for Spaced Repetition

Spaced repetition only works if you show up consistently. The algorithm calculates exactly when each card should surface based on your personal forgetting curve, but if you miss the session in which a card is due, the schedule loses its precision.

Consistency is much easier when your review queue is on the device you already have with you. A five-minute session on the phone while waiting for coffee does more for long-term retention than an hour of passive re-reading at a desk. That’s not intuition; it’s what decades of memory research (starting with Ebbinghaus in the 1880s) established about the spacing effect.

The review interface in the app is optimized for short sessions. You see the question, produce your answer mentally, flip the card, rate your recall, and move to the next one. The interaction is low-friction by design, because the goal is to remove every excuse not to review.

Who Benefits Most

Students with dense course materials. Uploading a chapter PDF and getting a study deck in minutes is a different experience from spending two hours in Anki building cards before you can start learning. The AI does the card creation; you do the learning.

Professionals studying for certifications. AWS, USMLE, CFA, bar exam, NCLEX — these all involve official study guides and reference documents that map well to flashcard generation. Upload the material, get the deck, let the algorithm manage your review calendar.

Anyone who has tried Anki and stopped using it because making cards was too slow. The research on spaced repetition is unambiguous: it works dramatically better than re-reading. The adoption gap has always been the upfront effort. Automating card creation removes that gap.

A Few Honest Notes

The AI card generation performs best on text-dense content. For material that relies heavily on diagrams or flowcharts, the AI works from surrounding text, which means visual concepts may need a few manually added cards to cover properly. For most academic and professional study materials, though, the hit rate is high enough that editing a few cards is far less work than building a deck from scratch.

The app is currently at version 1.0.1. The core review experience is solid; feature depth will grow over time as the platform develops. The web application at longtermemory.com remains the more complete environment for uploading and managing content.

The app is free to download and use.

How to Start

Download the LongTermMemory app from the App Store
Sign in or create a free account (the same account works on web and mobile)
Upload a piece of study material you’re actively working with on the web platform
Open the app on your phone and start your first review session
Come back the next day — the algorithm will show you exactly what’s due

The spaced repetition system takes care of the scheduling from there. Your job is to show up for the sessions it queues. With the app in your pocket, that’s a much easier commitment to keep.

Turn Any Google Doc Into a Study Session With Quick Q&A Generator

2026-06-15T00:00:00+00:00

Most study material lives in Google Docs: lecture notes, research summaries, technical specs you need to internalize before a certification. The gap between having a document and actually learning its content is where most studying goes wrong. Quick Q&A Generator closes that gap without making you leave the page.

I built Quick Q&A Generator - LongTermMemory as a Google Docs add-on to solve a problem I kept running into while building LongTermMemory: people had great source material, but turning it into active study prompts required switching tools, copy-pasting, or just hoping passive re-reading would work. It doesn’t. Active recall does.

What It Does

The add-on installs directly into Google Docs and surfaces as a sidebar. Open any document (meeting notes, a chapter summary, a technical RFC), click Generate, and within seconds the sidebar shows 3 to 5 core question-and-answer pairs extracted by AI from the active document.

No copy-paste. No switching tabs. No prompt engineering. The AI reads the document, identifies the concepts most worth testing, and frames them as Q&A pairs you can actually study from.

Once you have your pairs, a single Sync button pushes the document and its generated Q&A set directly to your LongTermMemory dashboard, where they enter a spaced repetition schedule and become part of your review queue.

Why This Fits Into a Real Study Flow

The bottleneck in most study workflows is not access to information; it is converting information into a retrievable form. Highlighting and re-reading feel productive but produce weak retention. Q&A pairs force retrieval, which is the mechanism that actually moves knowledge into long-term memory.

The strengths that make this add-on worth using:

It works where the content already lives. You do not import anything or change your writing habits. Your Google Doc stays your Google Doc; the add-on reads it and generates the study material in place. There is no friction between taking notes and starting to learn from them.

AI identifies what matters. Writing good flashcards is a skill, and most people write them too broadly or miss the core concept entirely. The add-on extracts the high-signal concepts (the ones likely to show up in a quiz or surface during an exam) rather than turning every sentence into a question.

One-click pipeline to spaced repetition. Generating Q&A pairs is only half the work. The real value is that syncing them to LongTermMemory puts them on a schedule: the SM-2-inspired algorithm (covered in this earlier post) ensures you review them at the right intervals, soon after learning, then at increasing delays, so the knowledge sticks rather than fading within a week.

It is free. There is no paywall for the add-on itself. Install it, use it, sync as many documents as you need.

Zero context switching. The sidebar interface means you can review the generated questions, compare them against the source text, and decide whether to sync, all without leaving the document. For people who study in focused sessions, this matters.

Who It Is Built For

The add-on is useful for anyone who consumes a lot of text and needs to retain it:

Students turning lecture notes or textbook summaries into active flashcard sets
Professionals preparing for technical certifications (AWS, GCP, security exams) from documentation they are already reading
Lifelong learners who collect research notes and want a low-friction way to actually internalize them
Teachers and examiners who want a fast first draft of quiz questions from course material

If you read something in Google Docs and care whether you remember it, this fits.

How to Get Started

Install Quick Q&A Generator - LongTermMemory from the Google Workspace Marketplace (free)
Open any Google Doc you want to study from
Open the add-on from the Extensions menu → Quick Q&A Generator
Click Generate in the sidebar
Review the Q&A pairs, then click Sync to push them to your LongTermMemory dashboard

From there, the spaced repetition engine handles scheduling. Your only job is to show up for the review sessions it queues.

The best study tool is the one that gets out of the way. If your material is already in Google Docs (and for most people it is), having Q&A generation built directly into the editor removes the last excuse not to study actively.

Building a RAG-Powered Study App: Laravel + Python Microservices

2026-03-17T00:00:00+00:00

How I combined Laravel, FastAPI, Celery, Qdrant, and OpenAI into an AI study platform: what worked, what didn’t, and the chunking problem nobody warns you about.

A few years ago I was grinding through certification study material (thick PDFs, documentation pages, whitepapers) and kept hitting the same wall: the tools that could help me learn efficiently were either too dumb (static flashcard decks you had to write yourself), too expensive, or didn’t understand my material. What I wanted was something that could read my PDFs and generate questions for me, then schedule those questions based on how well I actually knew them.

So I built it. LongTermMemory is a SaaS study platform that uses Retrieval-Augmented Generation (RAG) to auto-generate question-answer pairs from uploaded materials and implements spaced repetition to move knowledge into long-term memory. This post is a technical walkthrough of the interesting engineering decisions, the mistakes I made, and specifically the one problem that took longer to solve than anything else: chunking.

The Architecture Decision: Why Two Languages?

My first instinct was to build everything in Laravel. I’ve been writing PHP professionally for years, Laravel is excellent, and managing two runtimes, two Dockerfiles, and two test suites isn’t thrilling.

The problem is that the AI/RAG ecosystem lives in Python. LlamaIndex, LangChain, the OpenAI Python client, all of the tooling for embeddings and vector operations: it’s mature, well-documented, and under active development. The PHP equivalents are either nonexistent or years behind.

The compromise: Laravel handles every product concern (authentication, billing, user management, the REST API the frontend talks to, email notifications, database schema). FastAPI + Celery handles every AI concern (document ingestion, chunking, embedding generation, vector storage, Q&A generation). The two services communicate over an internal Docker network.

Here’s the rough topology:

React (5173)
    │
    ▼
Nginx → PHP-FPM (Laravel 12)      ←→  MySQL
                │
                ▼
         FastAPI (8000)
                │
         Celery Worker  ←──────────── MinIO (raw documents)
                │        ←──────── Redis (broker + job state)
         ┌──────┴──────┐
         ▼             ▼
      Qdrant        OpenAI API
   (vectors)       (embeddings + LLM)

Documents live in MinIO (S3-compatible object storage). When a user uploads a PDF, Laravel stores it in MinIO and records the metadata in MySQL. When they trigger Q&A generation, Laravel POSTs a job request to the FastAPI service. Celery picks it up, retrieves the files from MinIO, processes them, and when done POSTs a callback to Laravel with the results.

The complete technical documentation is available on ReadTheDocs and GitBook.

Async Processing and the Push Callback Model

Document processing is slow. A large PDF can take 30 to 120 seconds: extract text, chunk it semantically, generate embeddings for each chunk, store vectors in Qdrant, run the LLM to generate Q&A pairs. You can’t hold an HTTP connection open for that long.

The flow is: Laravel calls POST /api/generate-qa → FastAPI immediately returns a job_id → Celery picks up the task → when done, Celery calls back to Laravel with the results.

I chose push callbacks over polling for the same reason webhooks are better than polling: the server-side work happens exactly once, at the right time, rather than on every tick of a polling loop.

import httpx

# When the Celery task finishes, it notifies Laravel directly
def _notify_laravel_job_finished(job_id, project_id, job_data, settings):
    payload = {
        "job_id": job_id,
        "project_id": project_id,
        "status": job_data.get("status"),
        "qa_pairs": job_data.get("qa_pairs", []),
        "error": job_data.get("error"),
    }
    url = f"{settings.laravel_app_url}/api/job-finished"
    # Assumption: the shared API key lives on the settings object as settings.api_key
    headers = {"X-API-Key": settings.api_key}
    with httpx.Client(timeout=10.0) as client:
        client.post(url, json=payload, headers=headers)

Laravel receives this at a dedicated callback endpoint, saves the Q&A pairs, and fires an email notification to the user, all immediately when the job finishes.

Preventing Duplicate Jobs

One early bug: if a user clicked “Generate Study Plan” twice quickly, two Celery jobs would run in parallel, both writing Q&A pairs to the same project, producing duplicate questions and double API costs.

The fix is a Redis key per project: project_job:{project_id}. Before queuing a new task, the API checks if that key exists and the referenced job is still active. If so, it returns HTTP 409. Laravel propagates this to the frontend as “generation already in progress.” The key is cleared when the job completes, fails, or is cancelled.
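The guard described above can be sketched in a few lines. This is a minimal illustration, not the project’s code: `InMemoryStore` stands in for Redis so the example runs without a server, and the status names, the `job_status` lookup, and the 202 return value are assumptions made for the example.

```python
class InMemoryStore:
    """Stand-in for Redis so the sketch runs without a server."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


ACTIVE_STATES = {"queued", "running"}  # assumed status names


def try_start_job(store, job_status, project_id, job_id):
    """Return 202 if the job may start, 409 if the project already has an active job.

    job_status is a hypothetical job_id -> status mapping.
    """
    key = f"project_job:{project_id}"
    existing = store.get(key)
    if existing is not None and job_status.get(existing) in ACTIVE_STATES:
        return 409
    store.set(key, job_id)
    job_status[job_id] = "queued"
    return 202


def finish_job(store, job_status, project_id, job_id, final_status):
    """Record the final state (completed, failed, cancelled) and clear the key."""
    job_status[job_id] = final_status
    store.delete(f"project_job:{project_id}")
```

Note that a separate get followed by a set is not atomic. Against a real Redis server, `SET key value NX` performs the check and the write in one step, which is the safer choice if two requests can arrive at the same instant.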

The Hardest Problem: Chunking

This is the part nobody really prepares you for when you read RAG tutorials.

Naive chunking is terrible

The obvious first approach is fixed-size chunking: split the document into 512-token windows with some overlap. Quick to implement, works on toy examples. In practice the Q&A quality was noticeably bad: questions would reference “the above equation” or “as mentioned in the previous section” with no context for either, because the split happened mid-concept.
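For reference, fixed-size chunking with overlap looks roughly like this. It is a minimal sketch over an already-tokenized list; the overlap of 64 is an assumed value, since only the 512-token window is stated above.

```python
def fixed_size_chunks(tokens, size=512, overlap=64):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks
```

Nothing in this function knows where a sentence or an argument ends, which is exactly why a window boundary can land in the middle of a concept.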

Semantic chunking with LlamaIndex

LlamaIndex’s SemanticSplitterNodeParser uses embedding similarity between consecutive sentences to decide where to split. Instead of splitting every N tokens, it splits when the semantic distance between adjacent sentences exceeds a threshold, keeping conceptually related content together.

My implementation uses a two-stage approach: first SentenceSplitter for structural splits on paragraph breaks, then SemanticSplitterNodeParser for semantic coherence within those units. The result is chunks that read like coherent paragraphs rather than arbitrary text windows.

The length problem

Here’s the thing nobody tells you: the parameters that work well for a 10-page article are completely wrong for a 300-page textbook.

With the same settings on a long document you get hundreds of tiny chunks, many of them mid-sentence fragments. The LLM generates questions that are too narrow, testing individual sentences rather than concepts. Embedding costs scale linearly with chunk count, and a 300-page book produces far more chunks than you’d want.

I discovered this when a user uploaded a comprehensive textbook and the generation took 8 minutes and produced 400+ Q&A pairs, most of them nearly identical questions about adjacent paragraphs.

The fix is dynamic parameter selection based on estimated content length:

total_tokens = estimated_total_tokens if estimated_total_tokens else len(text) // 4

if total_tokens > 10_000:  # ~15 pages
    stage1_chunk_size = 2048
    stage2_buffer_size = 3
    stage2_breakpoint_threshold = 97  # only split at major topic shifts
else:
    stage1_chunk_size = 1024
    stage2_buffer_size = 1
    stage2_breakpoint_threshold = 95

For long content: larger chunk size, wider semantic buffers, higher breakpoint threshold. The result is ~75% fewer chunks for book-length content, with each chunk containing a full concept.

The `breakpoint_percentile_threshold` confusion

This took me embarrassingly long to get right. The parameter name suggests a higher value means more splits, but it’s the opposite. The threshold is a percentile of embedding distances across all sentence pairs. Setting it to the 97th percentile means “only split when the distance is in the top 3% of all distances”, so only the most dramatic topic shifts trigger a split. Higher = fewer splits = larger chunks.

My initial instinct was to lower the threshold for long documents. That made things worse. For long documents, you want fewer, larger chunks: you’re looking for major topic boundaries, not every paragraph break.
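The direction of the threshold is easy to check with plain numbers. The sketch below illustrates the idea only and is not LlamaIndex’s code: `percentile` and `split_points` are hypothetical helpers, and the distances are synthetic.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def split_points(distances, threshold_pct):
    """Indices of sentence gaps whose distance exceeds the percentile cutoff."""
    cutoff = percentile(distances, threshold_pct)
    return [i for i, d in enumerate(distances) if d > cutoff]


# Synthetic distances between 100 pairs of adjacent sentences
distances = [i / 100 for i in range(100)]
splits_at_95 = split_points(distances, 95)
splits_at_97 = split_points(distances, 97)
```

With these inputs the 95th percentile yields five split points and the 97th yields three: raising the percentile raises the cutoff, so fewer gaps qualify and the chunks get larger.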

Cost impact

Chunk count directly drives OpenAI API costs. Every chunk needs an embedding (input cost). Every chunk generates one Q&A pair (completion cost). If your 200-page textbook creates 800 chunks instead of 200, you’re paying 4x. Adaptive chunking isn’t just a quality improvement; it’s a billing concern.
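The 4x figure follows from simple arithmetic under one simplifying assumption: each generation call has roughly the same token footprint regardless of how the document was split. The function below is a back-of-the-envelope sketch; every parameter value in the usage lines is a placeholder, not a real OpenAI price.

```python
def generation_cost(chunk_count, prompt_tokens_per_call, completion_tokens_per_call,
                    input_price_per_1k, output_price_per_1k):
    """Estimated cost of one LLM call per chunk, with prices given per 1,000 tokens."""
    input_cost = chunk_count * prompt_tokens_per_call / 1000 * input_price_per_1k
    output_cost = chunk_count * completion_tokens_per_call / 1000 * output_price_per_1k
    return input_cost + output_cost


# Placeholder numbers chosen only to show the ratio
few_chunks = generation_cost(200, 1500, 200, 0.001, 0.002)
many_chunks = generation_cost(800, 1500, 200, 0.001, 0.002)
ratio = many_chunks / few_chunks
```

Whatever prices you plug in, the ratio stays at the ratio of chunk counts, which is why chunk count is the number worth watching.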

Making Q&A Generation Actually Good

Once chunking is right, quality depends on how you use retrieved context and how you prompt the LLM.

RAG retrieval for question generation

The naive approach: for each chunk, ask the LLM to generate a question. The problem is that a single chunk often lacks context: it references concepts defined elsewhere.

The better approach: before generating a question for a chunk, retrieve the 3 most semantically similar chunks from Qdrant. Include those as “related context” in the prompt. The LLM can now generate questions that test understanding across related concepts.

The 0.7 cosine similarity threshold matters: below it, the “related” chunks aren’t actually related, they just share common words. Including irrelevant context actively hurts question quality.
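The retrieval step reduces to “score, filter, take the top three.” In the real system Qdrant performs the search; the plain-Python sketch below only illustrates the logic, with hypothetical function names and toy two-dimensional vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related_chunks(query_vec, candidates, top_k=3, min_score=0.7):
    """Return up to top_k chunk ids scoring at least min_score, best first.

    candidates maps chunk_id -> embedding vector.
    """
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in candidates.items()]
    kept = sorted((pair for pair in scored if pair[0] >= min_score), reverse=True)
    return [chunk_id for _, chunk_id in kept[:top_k]]


candidates = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.5, 0.5],
    "d": [0.0, 1.0],
    "e": [0.8, 0.6],
}
context_ids = related_chunks([1.0, 0.0], candidates)
```

Here `d` is filtered out by the threshold and `c` passes it but loses its place to three better matches. If nothing clears 0.7 the function returns an empty list, and the prompt simply goes out without related context.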

Prompt engineering

The system prompt is terse and specific: an expert educational content specialist designing for mastery learning. The user message template enforces constraints: the question must test conceptual understanding (not factual recall), be self-contained, and promote long-term retention.

Key insight: “quality over quantity” as an explicit instruction in the prompt measurably improves output. Without it, the LLM generates multiple surface-level questions (“What is X?”) instead of one deeper one (“How does X relate to Y, and what are the implications for Z?”).

The LLM returns structured JSON with question, answer, key_concepts (array), and difficulty_level (easy/medium/hard), all stored in MySQL and exposed to the frontend for filtering.
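LLM output is worth validating before it reaches the database. The sketch below checks the four fields named above; the `QAPair` class and `parse_qa_pair` function are hypothetical names, and the real service may validate differently.

```python
import json
from dataclasses import dataclass

ALLOWED_DIFFICULTIES = {"easy", "medium", "hard"}
REQUIRED_FIELDS = {"question", "answer", "key_concepts", "difficulty_level"}


@dataclass
class QAPair:
    question: str
    answer: str
    key_concepts: list
    difficulty_level: str


def parse_qa_pair(raw):
    """Parse one JSON object from the LLM and reject malformed output."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(data["key_concepts"], list):
        raise ValueError("key_concepts must be an array")
    if data["difficulty_level"] not in ALLOWED_DIFFICULTIES:
        raise ValueError("difficulty_level must be easy, medium, or hard")
    return QAPair(data["question"], data["answer"],
                  data["key_concepts"], data["difficulty_level"])
```

Rejecting a bad pair at this point is cheaper than discovering an unknown difficulty value later in a frontend filter.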

Spaced Repetition

Spaced repetition schedules reviews at increasing intervals based on recall performance. The SM-2 algorithm is the most widely used variant: performance is rated 1 to 5, and the next review interval is computed from the previous interval, the performance score, and an ease factor that adjusts over time.
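For readers who have not seen it, here is one common formulation of the textbook SM-2 update. It is a sketch of the published algorithm, not LongTermMemory’s scheduler: the function name is made up, and the classic version grades recall on a 0 to 5 scale where anything below 3 counts as a lapse.

```python
def sm2_review(quality, repetitions, interval_days, ease):
    """Apply one review and return (repetitions, interval_days, ease).

    quality: recall grade, 0 (blackout) to 5 (perfect).
    """
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Lapse: restart the repetition sequence, leave the ease factor unchanged
        return 0, 1, ease
    if repetitions == 0:
        interval_days = 1
    elif repetitions == 1:
        interval_days = 6
    else:
        # Interval grows by the ease factor held before this review
        interval_days = round(interval_days * ease)
    # Ease rises after easy recalls, falls after hard ones, never below 1.3
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return repetitions + 1, interval_days, ease
```

A new card answered perfectly comes back after 1 day, then 6 days, then at intervals multiplied by the ease factor. A failed card drops back to a 1-day interval.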

The current schema stores Q&A pairs and their scheduling state together: scheduled_at = NULL means the item is new and has never been studied. Email reminders use a push model: an hourly artisan command finds users whose local time is 8 AM and sends a single consolidated email listing due items.

The study session UI (answering questions, rating recall quality, seeing the interval adjust) is the next major frontend feature to build.
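The “whose local time is 8 AM” check is the only subtle part of the reminder job. The real command is a Laravel artisan command; the sketch below shows the same idea in Python with hypothetical function names, using fixed UTC offsets to stay self-contained.

```python
from datetime import datetime, timedelta, timezone


def is_reminder_hour(now_utc, user_tz, target_hour=8):
    """True when the user's local clock is within the target hour."""
    return now_utc.astimezone(user_tz).hour == target_hour


def users_to_remind(now_utc, users, target_hour=8):
    """users maps user_id -> tzinfo; returns the ids due for a reminder this hour."""
    return [user_id for user_id, tz in users.items()
            if is_reminder_hour(now_utc, tz, target_hour)]


now = datetime(2026, 3, 17, 7, 0, tzinfo=timezone.utc)
users = {
    "utc_plus_1": timezone(timedelta(hours=1)),
    "utc_minus_4": timezone(timedelta(hours=-4)),
    "utc": timezone.utc,
}
due = users_to_remind(now, users)
```

Because the job runs once per hour and matches on the hour, each user is selected at most once per day. A production version would use named zones (for example via `zoneinfo.ZoneInfo`) so that daylight saving changes are handled.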

Production Gotcha: Celery Doesn’t Auto-Reload

The Celery gotcha that everyone hits: Celery workers do not auto-reload code changes. FastAPI (via Uvicorn with --reload) picks up changes automatically. Celery doesn’t. If you modify celery_tasks.py or any service module it imports and don’t restart the worker, the old code keeps running.

docker compose restart celery-worker

The symptom is confusing: your FastAPI endpoints reflect the new code, but background processing behaves as if nothing changed. This is now in every CLAUDE.md and README for the project, and I still forget it regularly.

What I’d Do Differently

Start with semantic chunking from day one. I started with fixed-size chunks as a “quick first pass” and spent more time undoing that than I would have spent implementing semantic chunking correctly from the start.

Adaptive chunk sizing should be a first-class concern. I didn’t think about variable document lengths until users started uploading textbooks. PDFs range from a 2-page note to a 500-page manual and need fundamentally different treatment.

Use a proper task result store earlier. I started tracking Celery job state with ad-hoc Redis key patterns and built the abstraction layer later as things grew. Starting with a clean interface for job state (create, read, update, expire, index by project) would have saved refactoring time.

The push callback model was the right call. I’ve worked on systems that poll job status from a frontend timer. It always becomes a source of race conditions and extra load. The callback model is simpler to reason about and delivers results faster.

Open Problems

Multi-modal documents: PDFs with diagrams and mathematical notation are common in technical study material. Current text extraction ignores images entirely.
Self-hosted LLM: Some users are uncomfortable uploading sensitive professional material to an OpenAI-backed system. LlamaIndex supports provider-swapping; the work is validating quality parity.
Chunk attribution: Q&A pairs are stored with no reference back to the specific chunks they were generated from. Adding a source_chunk_id would enable “show me the source” functionality in the study interface.

The most interesting engineering happened at the intersection of the two services. The boundary between Laravel and FastAPI isn’t just a language split; it forced clear thinking about which concerns belong where. Auth, billing, user data: PHP. Embeddings, vectors, async AI tasks: Python.

The chunking problem genuinely surprised me. Most RAG resources treat chunking as a detail: pick a size, move on. In practice it’s where the most user-visible quality variation comes from, and adaptive sizing based on document length is not optional if your use case involves documents of wildly different lengths.

If you’re building something similar, the project is at longtermemory.com.

Passwordless Auth in Laravel 12: Implementing Magic Link Login with Sanctum

2026-03-06T00:00:00+00:00

No passwords, no reset flows, no bcrypt. Just an email, a signed URL, and a Sanctum token. Here’s how to implement magic link authentication in Laravel 12 from scratch, including the edge cases that bite you in production.

Passwords are a liability. Users forget them, reuse them, and your team ends up maintaining reset flows, email verification, and “remember me” cookie logic for years. For LongTermMemory I went fully passwordless from day one: the only way to log in is to receive a magic link by email. This post walks through the complete implementation (backend in Laravel 12 + Sanctum, frontend in React), including the production gotchas that aren’t in any tutorial.

The Flow at a Glance

User submits email → POST /api/auth/magic-link
Backend generates a signed URL + a short-lived code → sends email
User clicks link → GET /auth/magic-login/{user_id}?signature=...
Backend validates signature → generates a one-time code → redirects to frontend
Frontend receives code → POST /api/auth/exchange
Backend validates code → issues Sanctum token → user is authenticated

There are two extra branches: new users (who need email verification before getting a login link) and an OTP fallback (a 6-digit code in the same email for users whose email client breaks links). Both share the same token-issuing endpoint at the end.

Step 1: Send the Magic Link

The entry point is a single endpoint that accepts an email address:

public function magicLink(Request $request): JsonResponse
{
    $request->validate(['email' => 'required|email']);

    $user = User::where('email', $request->email)->first();

    if ($user && $user->email_verified_at) {
        $url = URL::temporarySignedRoute(
            'magic.login',
            now()->addMinutes(15),
            ['user_id' => $user->id]
        );
        $email = $user->email;
        $status = 200;
    } else {
        $email = $request->email;
        $user = $this->createNewUserInDB($email);
        $url = URL::temporarySignedRoute(
            'magic.register',
            now()->addMinutes(15),
            ['user_id' => $user->id]
        );
        $status = 404;
    }

    $otp = $this->generateAndSaveOtp($user->id);
    Mail::to($email)->send(new MagicLoginLink($url, $otp));

    return response()->json(['status' => $status]);
}

A few design decisions here:

Known vs unknown email. If the email exists and is verified, the user gets a magic.login link. If the email is new or unverified, a user record is created and they get a magic.register link (which also marks email_verified_at on click). Both flows converge at the same code-generation step.

URL::temporarySignedRoute() generates a URL with an HMAC signature and an expiry timestamp baked in. Laravel validates both automatically when you call $request->hasValidSignature(). The link expires in 15 minutes: long enough to be usable, short enough to limit exposure.

OTP in the same email. Every magic link email also contains a 6-digit OTP (rand(100000, 999999)), valid for 15 minutes. Users on mobile apps or email clients that mangle URLs can type the code instead. Same security properties, different UX.

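Never enumerate users. Both branches return HTTP 200 to the caller; the $status field inside the JSON body differs (200 vs 404), but the HTTP status code is always 200. This hides the difference from probes that only look at the HTTP status code. A client that reads the body can still tell known from unknown addresses, so full enumeration protection would also require an identical body.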
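The OTP fallback is small enough to sketch. The version below is illustrative Python rather than the app’s PHP, with hypothetical function names. One point worth carrying over: PHP’s rand() is not a cryptographically secure generator, while random_int() is, so a sketch of this kind draws from a secure source.

```python
import hmac
import secrets
from datetime import datetime, timedelta, timezone


def generate_otp():
    """Six-digit code in the range 100000-999999 from a secure random source."""
    return str(secrets.randbelow(900_000) + 100_000)


def otp_is_valid(submitted, stored, expires_at, now):
    """Accept only an unexpired code; compare in constant time."""
    if now >= expires_at:
        return False
    return hmac.compare_digest(submitted.encode(), stored.encode())


issued_at = datetime(2026, 3, 6, 12, 0, tzinfo=timezone.utc)
otp = generate_otp()
expires_at = issued_at + timedelta(minutes=15)  # same lifetime as the magic link
```

A six-digit code has far less entropy than a signed URL, so in practice it should also be paired with attempt limits on the endpoint that accepts it.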

Step 2: Validate the Signature and Generate a Code

When the user clicks the link, the backend validates the signature and exchanges it for a short-lived one-time code:

public function magicLogin(Request $request, $user_id)
{
    if ($errorResponse = $this->validateSignatureOrFail($request)) {
        return $errorResponse;
    }

    $user = User::findOrFail($user_id);
    return $this->generateCodeAndRedirect($user);
}

private function generateCodeAndRedirect(User $user, ?string $redirectTo = null)
{
    $code = Str::uuid()->toString();

    DB::table('magic_login_codes')->insert([
        'code'       => hash('sha256', $code),
        'user_id'    => $user->id,
        'expires_at' => now()->addMinutes(5),
        'used'       => false,
        'created_at' => now(),
    ]);

    $callbackUrl = config('app.frontend_url') . '/auth/callback?code=' . urlencode($code);

    if ($redirectTo !== null) {
        $callbackUrl .= '&redirect_to=' . urlencode($redirectTo);
    }

    return redirect($callbackUrl);
}

The signed URL is valid for 15 minutes. After validation, a fresh UUID code is generated, but only the SHA-256 hash is stored, never the plaintext. This is the same principle as storing hashed passwords: if your database leaks, raw codes can’t be replayed. The code itself lives in the URL for 5 minutes before it expires.

The redirect sends the browser to the React frontend at /auth/callback?code=, which then exchanges it for a Sanctum token.

Step 3: Exchange the Code for a Token

public function exchangeCodeWithToken(Request $request): JsonResponse
{
    $request->validate(['code' => 'required|string']);

    $record = DB::table('magic_login_codes')
        ->where('code', hash('sha256', $request->code))
        ->where('expires_at', '>', now())
        ->where('used', false)
        ->first();

    if (! $record) {
        return response()->json(['message' => 'Invalid or expired code'], 401);
    }

    DB::table('magic_login_codes')
        ->where('code', hash('sha256', $request->code))
        ->update(['used' => true]);

    $user = User::findOrFail($record->user_id);

    if (! $user->notifications_enabled) {
        $user->update(['notifications_enabled' => true]);
    }

    $token = $user->createToken('auth_token')->plainTextToken;

    return response()->json(['token' => $token]);
}

Three checks before issuing a token: the hash matches, the code hasn’t expired, and it hasn’t been used before. The used flag is set to true immediately after the record is found, before the token is issued. This stops casual replay attempts, though two fully concurrent submissions could both pass the where('used', false) query before either update lands. A transaction alone does not close that window; a proper fix either locks the row inside the transaction or makes the update itself conditional on used = false and checks the affected-row count. For a low-traffic auth endpoint this race window is acceptable, but worth noting.
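One way to close the double-submit window is to make the update itself the check: mark the code as used only if it is still unused and unexpired, then look at how many rows changed. The sketch below demonstrates the idea with Python’s built-in sqlite3; the table mirrors the columns above, timestamps are plain integers for brevity, and the function names are hypothetical.

```python
import hashlib
import sqlite3
import uuid


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE magic_login_codes ("
        "code TEXT PRIMARY KEY, user_id INTEGER, expires_at INTEGER, used INTEGER)"
    )
    return conn


def create_code(conn, user_id, now, ttl_seconds=300):
    """Store only the SHA-256 hash; return the plaintext code for the redirect URL."""
    code = str(uuid.uuid4())
    digest = hashlib.sha256(code.encode()).hexdigest()
    conn.execute("INSERT INTO magic_login_codes VALUES (?, ?, ?, 0)",
                 (digest, user_id, now + ttl_seconds))
    conn.commit()
    return code


def consume_code(conn, code, now):
    """Atomically claim the code; return the user id, or None if it cannot be claimed."""
    digest = hashlib.sha256(code.encode()).hexdigest()
    cursor = conn.execute(
        "UPDATE magic_login_codes SET used = 1 "
        "WHERE code = ? AND used = 0 AND expires_at > ?",
        (digest, now),
    )
    conn.commit()
    if cursor.rowcount != 1:
        return None  # unknown, expired, or already claimed by another request
    row = conn.execute("SELECT user_id FROM magic_login_codes WHERE code = ?",
                       (digest,)).fetchone()
    return row[0]
```

Only one request can flip used from 0 to 1, so the database decides the winner and the second submission sees zero affected rows.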

The notifications_enabled re-enable on login is a deliberate UX choice: users who were auto-disabled after 30 days of inactivity get their reminders back the moment they log in again. Logging in is an implicit signal of renewed interest.

$user->createToken('auth_token')->plainTextToken creates a Sanctum personal access token. The frontend stores this in localStorage as auth_token and sends it as Authorization: Bearer {token} on every subsequent request.

The Reverse Proxy Signature Gotcha

In production, the app runs behind Nginx. Signed URLs are generated using config('app.url') as the base, which might be https://api.longtermemory.com. But the request that arrives at Laravel’s validation layer may have http://localhost as its host (the proxy doesn’t forward X-Forwarded-Proto correctly in all configurations).

Laravel’s $request->hasValidSignature() reconstructs the URL from the incoming request to verify the HMAC. If the scheme or host differs from what was signed, validation silently fails.
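To see why a scheme or host mismatch breaks validation, it helps to look at what a signed URL is: an HMAC computed over the full URL, expiry included. The sketch below is a simplified model in Python, not Laravel’s actual URL format or signing code; the secret, the example host, and the function names are placeholders.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode, urlsplit

SECRET = b"demo-secret"  # placeholder; a real app uses its application key


def sign_url(base_url, expires_at):
    """Append an expiry and an HMAC computed over the full unsigned URL."""
    unsigned = f"{base_url}?{urlencode({'expires': expires_at})}"
    signature = hmac.new(SECRET, unsigned.encode(), hashlib.sha256).hexdigest()
    return f"{unsigned}&signature={signature}"


def has_valid_signature(url, now):
    """Recompute the HMAC from the URL as received, then check the expiry."""
    unsigned, separator, signature = url.rpartition("&signature=")
    if not separator:
        return False
    expected = hmac.new(SECRET, unsigned.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return False
    params = dict(parse_qsl(urlsplit(unsigned).query))
    return now < int(params["expires"])


signed = sign_url("https://api.example.com/auth/magic-login/1", expires_at=2000)
seen_behind_proxy = signed.replace("https://api.example.com", "http://localhost")
```

The link verifies when checked as issued, but the same path and query rebuilt with a different scheme and host produce a different HMAC input, so verification fails even though nothing was tampered with.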

The fix is a fallback that normalizes the URL against config('app.url') before validating:

private function hasValidAppUrlSignature(Request $request, array $ignoreQuery = []): bool
{
    // Try standard validation first
    $standardMethod = empty($ignoreQuery)
        ? $request->hasValidSignature()
        : $request->hasValidSignatureWhileIgnoring($ignoreQuery);

    if ($standardMethod) {
        return true;
    }

    // Fallback: rebuild the URL using config('app.url') as the base
    $appUrl = rtrim(config('app.url'), '/');
    $normalizedUrl = $appUrl . $request->getRequestUri();
    $normalizedRequest = Request::create($normalizedUrl);

    return URL::hasValidSignature($normalizedRequest, true, $ignoreQuery);
}

The test that covers this scenario is worth reading: it sets APP_URL to HTTPS, generates a signed URL with that scheme, then sends the request as a relative path (simulating a proxy that strips the scheme):

public function test_magic_login_with_redirect_works_when_scheme_differs_from_app_url(): void
{
    $user = User::factory()->create();

    $httpsUrl = preg_replace('/^http:/', 'https:', config('app.url'));
    config(['app.url' => $httpsUrl]);
    URL::forceRootUrl($httpsUrl);
    URL::forceScheme('https');

    $signedUrl = URL::temporarySignedRoute('magic.login.redirect', now()->addDays(30), ['user_id' => $user->id]);
    $url = $signedUrl . '&redirect_to=' . urlencode('/study-plan/pr/1');

    URL::forceScheme(null); // reset so the test request doesn't force https

    $parsedUrl = parse_url($url);
    $pathAndQuery = ($parsedUrl['path'] ?? '/') . '?' . ($parsedUrl['query'] ?? '');

    $response = $this->get($pathAndQuery);

    $response->assertRedirect();
    $this->assertStringContainsString('/auth/callback', $response->headers->get('Location'));
}

Without this fallback, every production login fails silently with a redirect to /login?error=Invalid+or+expired+link, a very confusing bug to diagnose.

Open Redirect Protection

The magic.login.redirect route accepts a redirect_to query parameter so that notification emails can deep-link users directly to their study plan after login. But this parameter must not be part of the URL signature: it’s appended after signing because the destination URL is determined at notification send time, not at route generation time.

This means redirect_to must be validated separately:

public function magicLoginWithRedirect(Request $request, $user_id)
{
    if (! $this->hasValidAppUrlSignature($request, ['redirect_to'])) {
        return redirect(config('app.frontend_url') . '/login?error=...');
    }

    $user = User::findOrFail($user_id);

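    $redirectTo = $request->query('redirect_to');
    // Accept only same-site relative paths; "//host" is protocol-relative, so reject it too
    if ($redirectTo !== null && (!str_starts_with($redirectTo, '/') || str_starts_with($redirectTo, '//'))) {
        $redirectTo = null; // reject any absolute or external URL
    }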

    return $this->generateCodeAndRedirect($user, $redirectTo);
}

hasValidAppUrlSignature($request, ['redirect_to']) is the custom wrapper from the previous section. Internally it calls $request->hasValidSignatureWhileIgnoring(['redirect_to']), which validates the HMAC while ignoring that specific query parameter. Then the value itself is checked: only relative paths (starting with /) are forwarded. Anything else (http://evil.com/steal, javascript:) is silently dropped. A protocol-relative URL such as //evil.com also starts with /, so it needs its own explicit rejection; a bare "starts with /" check does not cover that case.

The test covers this:

public function test_magic_login_with_redirect_ignores_external_redirect_to(): void
{
    // ... generate signed URL
    $url = $signedUrl . '&redirect_to=' . urlencode('http://evil.com/steal');

    $response = $this->get($url);

    $location = $response->headers->get('Location');
    $this->assertStringContainsString('/auth/callback', $location);
    $this->assertStringNotContainsString('redirect_to', $location);
    $this->assertStringNotContainsString('evil.com', $location);
}
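The relative-path rule can be sketched in Python as follows. This is an illustrative helper, not the project's code: safe_redirect_path is a hypothetical name, it folds in the // and :// checks that the frontend callback later in this post applies, and the backslash check is an extra precaution that the original does not mention.

```python
from typing import Optional


def safe_redirect_path(value: Optional[str]) -> Optional[str]:
    """Return value only if it is a same-site relative path, else None."""
    if not value:
        return None
    if not value.startswith("/"):
        return None  # absolute URLs, javascript:, bare hostnames
    if value.startswith("//") or value.startswith("/\\"):
        return None  # protocol-relative forms that browsers treat as external
    if "://" in value:
        return None  # an embedded scheme anywhere in the value
    return value
```

Returning None rather than raising mirrors the "silently drop" behaviour described above: the login still succeeds, the user just lands on the default page.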

The React Side: Handling StrictMode’s Double Invocation

In React 19 with <StrictMode>, effects run twice in development. For most effects that’s fine. For an auth callback that exchanges a one-time code for a token, it’s a problem: the second call hits the endpoint with a code that’s already been marked used, gets a 401, and the user sees an auth error.

The fix is a ref guard:

function AuthCallback() {
  const [searchParams] = useSearchParams();
  const navigate = useNavigate();
  const { completeAuthentication } = usePostAuthFlow();
  const code = searchParams.get('code');
  const redirect_to = searchParams.get('redirect_to');
  const [error, setError] = useState<string | null>(null);
  const hasExchanged = useRef(false);

  useEffect(() => {
    const exchangeCode = async () => {
      if (!code) { navigate('/login'); return; }

      if (hasExchanged.current) return; // prevent StrictMode double-fire
      hasExchanged.current = true;

      try {
        const response = await authApi.exchange(code);

        // Validate redirect_to: relative paths only
        const validRedirect = redirect_to &&
          redirect_to.startsWith('/') &&
          !redirect_to.startsWith('//') &&
          !redirect_to.includes('://') &&
          !['/login', '/auth/callback'].includes(redirect_to)
          ? redirect_to
          : undefined;

        await completeAuthentication(response.token, validRedirect);
      } catch (err: any) {
        setError(err.response?.data?.message || 'Authentication failed');
        setTimeout(() => navigate('/login'), 3000);
      }
    };

    exchangeCode();
  }, [code, navigate, completeAuthentication]);

  // ...
}

useRef persists across re-renders and across StrictMode’s double-mount cycle. Once hasExchanged.current is set to true, any subsequent invocation of the effect exits immediately. Note that this guard is intentionally not in the dependency array: it’s a one-shot flag, not reactive state.

The frontend also validates redirect_to independently, even though the backend already validated it. Defense in depth: the frontend ensures it never navigates to an external URL regardless of what arrives in the URL parameter.

After completeAuthentication(), the browser auto-detects and sends the user’s timezone to POST /api/user/update-timezone:

// In usePostAuthFlow or AuthCallback, after token is stored
const timezone = Intl.DateTimeFormat().resolvedOptions().timeZone;
await authApi.updateTimezone(timezone); // e.g. "Europe/Rome"

This powers the timezone-aware 8 AM study reminder emails, but that’s a topic for another post.

Testing: `actingAsUser()` vs `actingAs()`

Laravel’s built-in actingAs($user) sets the authenticated user but doesn’t create a real Sanctum token. This is fine for most tests, but breaks any code that calls $request->user()->currentAccessToken(), specifically the logout endpoint.

The solution is a custom actingAsUser() helper in TestCase:

// tests/TestCase.php
protected function actingAsUser(int $planId = CommercialPlan::FREE): User
{
    $user = User::factory()->withPlan($planId)->create();
    $token = $user->createToken('auth_token')->plainTextToken;
    $this->withHeader('Authorization', 'Bearer ' . $token);
    return $user;
}

This creates a real personal_access_tokens record. The logout test can then assert the token was actually deleted:

public function test_logout_deletes_token(): void
{
    $user = $this->actingAsUser();

    $response = $this->postJson('/api/logout');

    $response->assertStatus(200);
    $this->assertDatabaseEmpty('personal_access_tokens');
}

With plain actingAs(), currentAccessToken() returns null and the logout controller throws. With the real token helper, both the controller behavior and the database assertion are tested correctly.

What the Database Looks Like

Three tables drive the auth system:

magic_login_codes (short-lived one-time codes):

id, code (SHA-256 hash), user_id, expires_at, used (bool), created_at

otps (6-digit fallback codes):

id, user_id, otp, expires_at, used (bool), created_at, updated_at

personal_access_tokens (Sanctum tokens; Laravel manages this table automatically):

id, tokenable_type, tokenable_id, name, token (SHA-256), last_used_at, expires_at, created_at, updated_at

Cleanup: the custom:clean-table-in-db personal_access_tokens artisan command prunes old tokens on a schedule, keeping the table from growing unbounded.
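To make the magic_login_codes lifecycle concrete, here is a small Python sketch of issuing and exchanging a one-time code using the same columns (hashed code, expires_at, used). It is an in-memory illustration under assumptions: issue_code, exchange_code, and the 300-second default lifetime are hypothetical, not taken from the implementation.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# In-memory stand-in for the magic_login_codes table, keyed by code hash.
_codes: Dict[str, dict] = {}


def issue_code(user_id: int, now: datetime, ttl_seconds: int = 300) -> str:
    """Create a one-time code, storing only its SHA-256 hash."""
    code = secrets.token_urlsafe(32)
    digest = hashlib.sha256(code.encode()).hexdigest()
    _codes[digest] = {
        "user_id": user_id,
        "expires_at": now + timedelta(seconds=ttl_seconds),  # assumed lifetime
        "used": False,
    }
    return code


def exchange_code(code: str, now: datetime) -> Optional[int]:
    """Return the user id on first valid use; None if unknown, expired, or reused."""
    row = _codes.get(hashlib.sha256(code.encode()).hexdigest())
    if row is None or row["used"] or now >= row["expires_at"]:
        return None
    row["used"] = True
    return row["user_id"]
```

The second exchange of the same code returning None is exactly the situation the StrictMode guard on the frontend avoids: a duplicate call arrives after the code has been marked used.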

What I’d Do Differently

Separate the “new user” and “returning user” email templates. Currently both get the same MagicLoginLink mailable. The register link should have a welcome tone; the login link should be brief. Small thing, but it affects user perception.

Rate-limit the magic link endpoint. Right now a bad actor can trigger unlimited emails to any address. A simple RateLimiter::attempt('magic-link:' . $email, 5, fn() => ..., 60) per email address per minute would be enough.

Store the code in Redis instead of MySQL. The magic_login_codes table has high write churn (insert on every login, update on exchange, prune periodically). Redis with a 5-minute TTL is a better fit: auto-expiry, no cleanup job, lower latency.
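The per-email rate limit suggested above boils down to a counter per key that resets after a window. A minimal Python sketch of that idea follows; FixedWindowLimiter is a hypothetical class for illustration and is not how Laravel's RateLimiter is implemented. The clock is injected so the behaviour can be checked without sleeping.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most max_attempts per key within each decay_seconds window."""

    def __init__(self, max_attempts: int, decay_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_attempts = max_attempts
        self.decay_seconds = decay_seconds
        self.clock = clock
        self._windows: Dict[str, Tuple[float, int]] = {}  # key -> (window start, count)

    def attempt(self, key: str) -> bool:
        """Record one attempt; return False when the key is over its limit."""
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.decay_seconds:
            start, count = now, 0  # window elapsed: start a fresh one
        if count >= self.max_attempts:
            return False
        self._windows[key] = (start, count + 1)
        return True
```

With FixedWindowLimiter(5, 60), the sixth attempt for "magic-link:" plus the same address inside one minute is refused, while other addresses are unaffected.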

Passwordless auth is one of those features that looks simple until you implement it properly. The signed URL mechanics, the reverse proxy normalization, the open redirect validation, and the StrictMode guard are all edge cases that don’t appear in tutorials but will bite you in production. Hopefully this saves you some debugging time.

The full implementation is part of LongTermMemory, an AI-powered study platform built on Laravel 12 and React 19.

Preventing Duplicate Background Jobs in Celery with Redis: A Production Pattern

2026-03-04T00:00:00+00:00

A user double-clicks “Generate Study Plan”. Two parallel Celery workers start processing the same project simultaneously, doubling OpenAI costs and writing duplicate Q&A pairs to the database. Here’s how to fix it with a Redis index key, and why TTL alone isn’t enough.

The Bug

LongTermMemory has a Q&A generation pipeline: users upload documents, and a FastAPI service queues a Celery task that runs a RAG pipeline (chunking documents, generating embeddings with OpenAI, producing Q&A flashcard pairs, and calling back to the Laravel backend with the results).

The pipeline is expensive. A moderate document set can cost several cents in OpenAI tokens and take a minute to complete. A double-click on “Generate Study Plan” would trigger two POST /api/generate-qa requests in quick succession, each passing the duplicate check (there was none), each creating its own Celery task, both running in parallel on the same project data.

The result: doubled costs, duplicate Q&A pairs in the database, and a callback race where both tasks notify Laravel they’re “done”, potentially with partial results overwriting each other.

The fix is a per-project active job index in Redis.

Two Redis Keys, Two Responsibilities

The JobStorage class uses two distinct key namespaces:

job:{job_id} stores the full job metadata as a JSON blob (status, progress counters, Q&A pairs, errors). One key per job, 24-hour TTL.
project_job:{project_id} stores the currently active job_id for a project. One key per project, 24-hour TTL.

The second key is the deduplication index. Its only purpose is to answer one question at request time: does this project already have a running job?

def _job_key(self, job_id: str) -> str:
    return f"job:{job_id}"

def _project_job_key(self, project_id: int) -> str:
    return f"project_job:{project_id}"

The TTL is set to 86400 seconds (24 hours) on both key types. This is a safety net: if a task crashes without hitting any of its cleanup paths, the lock releases automatically the next day rather than blocking the project forever.

The Index: Set, Check, Clear

Three methods manage the index:

set_project_active_job is called immediately after the job is created in Redis, before the Celery task is queued:

def set_project_active_job(self, project_id: int, job_id: str) -> None:
    key = self._project_job_key(project_id)
    self.redis_client.setex(key, self.job_ttl, job_id)

setex writes the value and its TTL atomically in one call. No separate expire needed.

get_project_active_job is called at the start of every POST /api/generate-qa request:

def get_project_active_job(self, project_id: int) -> Optional[str]:
    key = self._project_job_key(project_id)
    job_id = self.redis_client.get(key)

    if job_id is None:
        return None

    # Verify the job still exists and is in an active state
    job_data = self.get_job(job_id)
    if job_data is None or job_data.get("status") not in ("queued", "processing"):
        # Job finished or expired: clean up stale index
        self.redis_client.delete(key)
        return None

    return job_id

The key detail: the function doesn’t just check whether the index key exists; it also checks the referenced job’s status. If the job has status = "completed" or status = "failed", or if the job:{job_id} key has expired, the index is stale and gets deleted. The function returns None, allowing a new job to proceed.

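This handles the edge case where clear_project_active_job was never called even though the job record reached a terminal status (for example, a Redis error in the cleanup call itself), or where the job:{job_id} key has already expired. In those cases a new request automatically heals the stale state instead of waiting out the index TTL. A task killed by the OS before reaching its exception handlers is a different matter: its job record still says processing, so the 24-hour TTL remains the safety valve there.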

clear_project_active_job is called in the Celery task at every terminal state:

def clear_project_active_job(self, project_id: int) -> None:
    key = self._project_job_key(project_id)
    self.redis_client.delete(key)
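The stale-index healing is easiest to see end to end. The sketch below replays it in Python against a tiny in-memory stand-in for Redis; FakeRedis is hypothetical, ignores TTLs, and implements only the three calls used here, so it illustrates the control flow rather than the real JobStorage class.

```python
import json
from typing import Dict, Optional


class FakeRedis:
    """Tiny in-memory stand-in exposing only the calls used here (TTL is ignored)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def setex(self, key: str, ttl: int, value: str) -> None:
        self.data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def delete(self, key: str) -> None:
        self.data.pop(key, None)


ACTIVE = ("queued", "processing")


def get_project_active_job(r: FakeRedis, project_id: int) -> Optional[str]:
    """Return the active job id, deleting the index if it points at a dead job."""
    index_key = f"project_job:{project_id}"
    job_id = r.get(index_key)
    if job_id is None:
        return None
    raw = r.get(f"job:{job_id}")
    status = json.loads(raw).get("status") if raw else None
    if status not in ACTIVE:
        r.delete(index_key)  # stale index: job finished or its key expired
        return None
    return job_id


r = FakeRedis()
r.setex("job:abc", 86400, json.dumps({"status": "processing"}))
r.setex("project_job:1", 86400, "abc")
first = get_project_active_job(r, 1)    # job is running, so the id comes back
r.setex("job:abc", 86400, json.dumps({"status": "completed"}))
second = get_project_active_job(r, 1)   # index is stale and gets removed
```

After the second call the project_job:1 key is gone, so the next generation request for that project proceeds without waiting for the TTL.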

The FastAPI Endpoint: Check Before Queue

The POST /api/generate-qa endpoint in routers/qa.py does the duplicate check before creating anything:

@router.post("/generate-qa", response_model=GenerateQAResponse)
async def generate_qa(request: GenerateQARequest, settings: Settings = Depends(get_settings)):
    # Check if there's already an active job for this project
    active_job_id = job_storage.get_project_active_job(request.project_id)
    if active_job_id:
        active_job = job_storage.get_job(active_job_id)
        active_status = active_job["status"] if active_job else "unknown"
        raise HTTPException(
            status_code=409,
            detail=f"A study plan generation is already in progress for this project "
                   f"(status: {active_status}). Please wait for it to complete..."
        )

    # Create job in Redis
    job_id = str(uuid.uuid4())
    job_data = {"id": job_id, "project_id": request.project_id, "status": "queued", ...}
    job_storage.create_job(job_id, job_data)
    job_storage.set_project_active_job(request.project_id, job_id)

    # Queue Celery task: same UUID used as both job_id and Celery task_id
    task = process_content_task.apply_async(
        args=[job_id, request.project_id, ...],
        task_id=job_id,
        queue="rag_processing"
    )

    return GenerateQAResponse(job_id=job_id, status="queued", ...)

The job_id and the Celery task_id are the same UUID. This simplifies status polling: GET /api/generate-qa/{job_id} can look up both job:{job_id} in Redis and AsyncResult(job_id) in Celery using a single identifier.

If get_project_active_job returns a job ID, the endpoint raises 409 immediately: before allocating a new job ID, before writing to Redis, before touching the Celery queue. The duplicate request is rejected at the earliest possible point.

The 409 Propagation: FastAPI → Laravel → React

The FastAPI service runs as a private Python microservice, not directly accessible from the browser. Requests flow through Laravel, which proxies them to FastAPI. The 409 is intercepted and re-thrown in StudyPlansController::callPythonRagApi():

private function callPythonRagApi($request, Collection $documents, Collection $weblinks, $userNotes): array
{
    try {
        $response = Http::withHeaders([
            'X-API-Key' => config('services.rag-service.api_key'),
            'Accept'    => 'application/json',
        ])->post(config('services.rag-service.url') . '/api/generate-qa', [
            'project_id' => $request->project_id,
            'user_id'    => $request->user()->id,
            // ...
        ]);

        if ($response->status() === 409) {
            $detail = $response->json('detail') ?? 'A study plan generation is already in progress for this project.';
            throw new HttpException(409, $detail);
        }

        if ($response->failed()) {
            throw new Exception("Python RAG service error: {$response->body()}");
        }

    } catch (HttpException $e) {
        throw $e; // re-throw to preserve HTTP status code
    } catch (Exception $e) {
        throw new Exception("RAG service error: {$e->getMessage()}");
    }

    return $response->json();
}

The catch(HttpException $e){ throw $e; } re-throw is load-bearing. Without it, the outer catch(Exception $e) would catch the HttpException (which extends Exception) and wrap it in a plain Exception("RAG service error: ..."), destroying the 409 status code. The explicit re-throw preserves the HttpException(409, ...) so it reaches Laravel’s exception handler, which serializes it into a JSON response with the original detail message. The React frontend receives a 409 with the error text and displays it inline: “A study plan generation is already in progress for this project.”

The message intentionally includes the current job status (queued or processing) so the user knows whether the first request is still waiting for a worker or actively running.
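The handler-ordering point is not specific to PHP. The same shape in Python looks like the sketch below, where HttpError and call_service are hypothetical names used only to illustrate the pattern: the specific handler must precede the generic one, or the status code is lost in the wrapping.

```python
class HttpError(Exception):
    """Hypothetical exception carrying an HTTP status code."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status
        self.detail = detail


def call_service(upstream_status: int) -> dict:
    """Simulate the proxy call; only the exception handling matters here."""
    try:
        if upstream_status == 409:
            raise HttpError(409, "generation already in progress")
        if upstream_status >= 400:
            raise RuntimeError(f"upstream returned {upstream_status}")
        return {"ok": True}
    except HttpError:
        raise  # must come first, or the generic handler below would wrap it
    except Exception as exc:
        raise Exception(f"RAG service error: {exc}") from exc
```

Removing the first except clause would turn the 409 into a plain Exception with no status attribute, which is the failure mode described above.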

Cleanup in the Celery Task

clear_project_active_job is called at every terminal exit point in process_content_task:

# Success
job_storage.update_job(job_id, job_data)
job_storage.clear_project_active_job(project_id)
_notify_laravel_job_finished(job_id, project_id, job_data, settings)

# OpenAI errors (EmbeddingError, LLMError)
except (EmbeddingError, LLMError) as e:
    error_info = _categorize_openai_error(str(e))
    job_storage.set_job_error(job_id, error_info["user_message"], error_info)
    job_storage.clear_project_active_job(project_id)
    _notify_laravel_job_finished(...)

# Any other exception
except Exception as e:
    job_storage.set_job_error(job_id, f"Unexpected error: {str(e)}", {...})
    job_storage.clear_project_active_job(project_id)
    _notify_laravel_job_finished(...)

Three branches, three cleanup calls. This covers every path the task can exit through. The index key is deleted before the Laravel callback is sent, so if Laravel immediately triggers a new generation in response to the failure notification, the check at the top of generate_qa will find no active job.

The 24-hour TTL is the last line of defense for situations the code can’t handle: a worker process killed by OOM, a Docker container restarted mid-task, a Redis connection error in the cleanup call itself.

Why Not Celery’s Built-In Task Result Backend?

Celery has a native result backend (Redis, database, or others) that stores task state (PENDING, STARTED, SUCCESS, FAILURE). It’s tempting to use this directly for deduplication: store the last task ID per project, check AsyncResult(task_id).state.

The issue is visibility boundaries. The Celery result backend tracks task state from Celery’s perspective. The custom job:{job_id} Redis key tracks job state from the application’s perspective, including progress counters, Q&A pair counts, error details, and the multi-stage pipeline status that Celery has no concept of. The two states can diverge: a task that’s STARTED in Celery may be on step 2 of 6 in the pipeline, and the job key reflects that granularity.

The project_job:{project_id} index is a thin layer on top of the existing job tracking system. It adds one key per project, costs one Redis read per incoming request, and doesn’t require polling Celery at all. The check in get_project_active_job calls get_job() (a Redis GET on job:{job_id}) rather than AsyncResult(job_id).state, staying within the same storage layer.

What I’d Do Differently

Use SET NX for atomic lock acquisition. The current implementation calls create_job then set_project_active_job as two separate Redis writes. In theory, two simultaneous requests could both pass the get_project_active_job check before either has written the index key. Using SET project_job:{project_id} {job_id} NX EX 86400 (set if not exists, with TTL) would make the lock acquisition atomic: only one of the two requests would succeed, and the other would get the key’s existing value on its next read. For the current traffic volume this race window is negligible, but it’s the correct approach at scale.

Expose a job cancellation hook to the frontend. The POST /api/generate-qa/{job_id}/cancel endpoint exists on the FastAPI side and calls celery_app.control.revoke(job_id, terminate=True) followed by clear_project_active_job. But the React frontend has no cancel button: users who trigger a generation and want to abort it have no way to do so short of waiting it out. Surfacing this as a UI action would also naturally resolve the UX problem that prompted the deduplication fix in the first place.

Log the duplicate attempt for cost attribution. When a 409 fires, the only record is a logger.warning() line in the FastAPI service. Persisting a lightweight audit record (project ID, timestamp, rejected job details) would make it easy to track which projects hit the duplicate guard most often, which is useful data if per-project generation quotas become relevant.
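The SET NX idea from the first item can be sketched as follows. With redis-py the real call would be redis_client.set(key, job_id, nx=True, ex=86400), which returns None when the key already exists. To keep the example runnable without a server, the sketch uses a hypothetical in-memory FakeRedis whose lock imitates the atomicity Redis provides; try_acquire is likewise an illustrative name.

```python
import threading
from typing import Dict, List, Optional


class FakeRedis:
    """In-memory stand-in; a lock makes set(nx=True) atomic, as it is in Redis."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._lock = threading.Lock()

    def set(self, key: str, value: str, nx: bool = False,
            ex: Optional[int] = None) -> Optional[bool]:
        with self._lock:
            if nx and key in self._data:
                return None  # mirrors redis-py: None when NX prevents the write
            self._data[key] = value  # ex (TTL) is accepted but ignored here
            return True

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


def try_acquire(r: FakeRedis, project_id: int, job_id: str, ttl: int = 86400) -> bool:
    """Claim the project for job_id; False means another job already holds it."""
    return bool(r.set(f"project_job:{project_id}", job_id, nx=True, ex=ttl))


r = FakeRedis()
winners: List[str] = []


def request(job_id: str) -> None:
    if try_acquire(r, 1, job_id):
        winners.append(job_id)


# Eight simultaneous "double-clicks" on the same project.
threads = [threading.Thread(target=request, args=(f"job-{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Exactly one of the eight requests ends up in winners; the check and the write are a single step, so there is no window between them for a second request to slip through.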

The core pattern is simple: one Redis key per project, pointing to the active job ID. The complexity is in the edge cases: stale keys after unexpected termination, the two-key read in get_project_active_job, the three-branch cleanup in the Celery task. Getting those right is what separates a deduplication scheme that works in testing from one that holds up in production.

The full implementation is part of LongTermMemory, an AI study platform built on FastAPI, Celery, Redis, and Laravel 12.

Timezone-Aware Email Notifications in Laravel: Sending at 8 AM in Every User’s Local Time

2026-02-28T00:00:00+00:00

The problem sounds simple: send a study reminder email at 8 AM. The catch is that your users live in Tokyo, Rome, New York, and Nairobi. Here’s how to build a Laravel artisan command that fires for every user at their local 8 AM, including the N+1-avoidance pattern, the deduplication scheme, and the rate-limit stagger that keeps the email provider happy.

Why “Send at 8 AM” Is Non-Trivial

A cron job that runs at 0 8 * * * sends email at 8 AM UTC, which is fine for users in London in winter and confusing for everyone else. The standard alternative, running the job every hour and checking whether it’s currently 8 AM in each user’s timezone, introduces its own problems: N+1 queries, duplicate sends when the cron overlaps, and edge cases around NULL timezone values.

LongTermMemory sends daily study reminder emails to users who have due flashcard items. The requirement: each notification lands at 8 AM in the user’s local time, contains direct links to their study sessions (via magic link deep-links), and fires at most once per day regardless of cron retries.

The implementation is a single artisan command, custom:send-study-review-notifications, that runs hourly.

The Core Idea: Collect Timezones at Target Hour

Rather than querying users first and then checking their timezones, the command inverts the approach: it starts by collecting every IANA timezone identifier where the current local hour matches the target, then queries only the users in those timezones.

protected $signature = 'custom:send-study-review-notifications {--hour=8 : The local hour to target (0-23)}';

private function getCandidateUserIds(): Collection
{
    $targetHour = (int) $this->option('hour');
    $targetTimezones = collect(timezone_identifiers_list())
        ->filter(fn (string $tz) => Carbon::now($tz)->hour === $targetHour)
        ->values();

    if ($targetTimezones->isEmpty()) {
        $this->info("No timezones currently at {$targetHour}:00.");
        return collect();
    }
    // ...
}

timezone_identifiers_list() returns all ~400 valid IANA timezone identifiers. Carbon::now($tz)->hour gives the current local hour for each one. At any given moment, roughly 15 to 25 of those timezones will be at hour 8, depending on DST state.

This is evaluated in PHP, not SQL: a collect()->filter() loop over 400 strings is cheap enough to be negligible for an hourly command, and it avoids the complexity of storing UTC-offset data in MySQL.
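The same "collect zones at the target hour" filter can be sketched in Python with the standard zoneinfo module. The names zones_at_hour and load_iana_zones are hypothetical, and the sample uses fixed-offset zones (which are not IANA identifiers) so that it runs without a tz database installed.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Dict, List
from zoneinfo import ZoneInfo, available_timezones


def load_iana_zones() -> Dict[str, tzinfo]:
    """All IANA zones known to the local tz database (needs tzdata installed)."""
    return {name: ZoneInfo(name) for name in available_timezones()}


def zones_at_hour(now_utc: datetime, target_hour: int,
                  zones: Dict[str, tzinfo]) -> List[str]:
    """Names of the zones whose current local hour equals target_hour."""
    return sorted(
        name for name, tz in zones.items()
        if now_utc.astimezone(tz).hour == target_hour
    )


# Fixed-offset zones keep the example independent of the tz database.
sample = {
    "UTC": timezone.utc,
    "UTC+01:00": timezone(timedelta(hours=1)),
    "UTC+05:30": timezone(timedelta(hours=5, minutes=30)),
}
now = datetime(2026, 3, 3, 8, 0, tzinfo=timezone.utc)
```

In real use the call would be zones_at_hour(datetime.now(timezone.utc), 8, load_iana_zones()), and the resulting names feed the whereIn-style filter on the users table.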

The Two-Query Pattern: Candidates, Then Due Items

Fetching users and their due items in a single query would require a complex self-join that’s hard to read and harder to extend. The command uses two separate queries:

Query 1 (candidate users): Who is at 8 AM right now and has at least one study plan?

$query = DB::table('users')
    ->where('notifications_enabled', true)
    ->whereIn('timezone', $targetTimezones)
    ->whereExists(fn ($q) => $q->select(DB::raw(1))
        ->from('projects')
        ->whereColumn('projects.user_id', 'users.id')
        ->whereExists(fn ($q2) => $q2->select(DB::raw(1))
            ->from('study_plans')
            ->whereColumn('study_plans.project_id', 'projects.id')
        )
    );

DB::raw(1) in the SELECT of the EXISTS subquery is idiomatic SQL: the optimizer ignores the selected value in an EXISTS context, so SELECT 1 signals “I only care whether a row exists.” This is a lint-friendly convention, not a performance trick.

NULL timezone handling. Users who registered before timezone detection was added have NULL in the timezone column. The command treats them as UTC:

if ($targetTimezones->contains('UTC')) {
    $query->orWhere(fn ($q) => $q->where('notifications_enabled', true)
        ->whereNull('timezone')
        ->whereExists(fn ($q2) => $q2->select(DB::raw(1))
            ->from('projects')
            ->whereColumn('projects.user_id', 'users.id')
            ->whereExists(fn ($q3) => $q3->select(DB::raw(1))
                ->from('study_plans')
                ->whereColumn('study_plans.project_id', 'projects.id')
            )
        )
    );
}

The condition only activates when UTC is among the target timezones; at any other hour, NULL-timezone users are simply excluded.

Query 2 (due items, grouped by user): Which projects actually have items to review?

private function getDueProjectsByUser(Collection $candidateUserIds): Collection
{
    return DB::table('study_plans')
        ->join('projects', 'study_plans.project_id', '=', 'projects.id')
        ->whereIn('projects.user_id', $candidateUserIds)
        ->where(fn ($q) => $q->where(fn ($q2) => $q2
                ->where('study_plans.scheduled_at', '<=', Carbon::now())
                ->where('study_plans.is_strict', false)
            )->orWhereNull('study_plans.scheduled_at')
        )
        ->select('projects.user_id', 'study_plans.project_id')
        ->distinct()
        ->get()
        ->groupBy('user_id');
}

The query returns one row per (user_id, project_id) pair. ->groupBy('user_id') is a Collection method (not SQL GROUP BY) that organizes those rows into a keyed structure:

[
    5  => [ { user_id: 5, project_id: 10 }, { user_id: 5, project_id: 23 } ],
    12 => [ { user_id: 12, project_id: 31 } ],
]

The due item filter mirrors the session fetching logic: an item qualifies if its scheduled_at <= now() and is_strict = false, or if scheduled_at IS NULL (new item, never reviewed). Strict items (those rated again or hard, meaning the algorithm wants the user to revisit them soon) are excluded from the notification trigger. They’ll reappear once their countdown elapses.

Two queries. No N+1. No model hydration on the candidate pass (just pluck('id')).

Deduplication with `insertOrIgnore`

The cron runs hourly. At 8:00 AM UTC, the command fires. If it also runs at 8:05 due to a retry or overlap, the same users would get a second email. The notification_logs table prevents this:

$userToday = Carbon::now($user->timezone ?? 'UTC')->toDateString();
$projectIds = $userProjects->pluck('project_id')->values();

$inserted = DB::table('notification_logs')->insertOrIgnore([
    'user_id'    => $user->id,
    'type'       => 'study_review_reminder',
    'sent_date'  => $userToday,
    'created_at' => now(),
    'updated_at' => now(),
]);

if ($inserted === 1) {
    $user->notify((new StudyReviewReminder($projectIds))->delay(...));
}

notification_logs has a unique composite index on (user_id, type, sent_date). insertOrIgnore maps to INSERT IGNORE in MySQL: if a row with that combination already exists, the insert silently does nothing and returns 0. If it succeeds, it returns 1 and the notification is dispatched.

Crucially, the sent_date is the user’s local date (Carbon::now($user->timezone ?? 'UTC')->toDateString()), not UTC. A user in UTC+14 (Line Islands) whose 8 AM fires at 2026-03-02 18:00 UTC gets sent_date = 2026-03-03 (their local date), so a retry at 18:05 UTC still deduplicates correctly.
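The unique-index deduplication and the local-date detail can be replayed with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite's INSERT OR IGNORE stands in for MySQL's INSERT IGNORE, the table is reduced to the three indexed columns, and claim_send is a hypothetical helper that takes a fixed UTC offset instead of an IANA timezone.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE notification_logs (
           user_id INTEGER NOT NULL,
           type TEXT NOT NULL,
           sent_date TEXT NOT NULL,
           UNIQUE (user_id, type, sent_date)
       )"""
)


def claim_send(user_id: int, now_utc: datetime, utc_offset_hours: float) -> bool:
    """Return True only for the first call per user and local calendar date."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_date = now_utc.astimezone(local_tz).date()
    cur = conn.execute(
        "INSERT OR IGNORE INTO notification_logs (user_id, type, sent_date) "
        "VALUES (?, ?, ?)",
        (user_id, "study_review_reminder", local_date.isoformat()),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 means the unique index rejected a duplicate


run = datetime(2026, 3, 2, 18, 0, tzinfo=timezone.utc)  # 08:00 on 2026-03-03 at UTC+14
first = claim_send(1, run, 14)
retry = claim_send(1, run + timedelta(minutes=5), 14)
stored = conn.execute("SELECT sent_date FROM notification_logs").fetchall()
```

The first call claims the send, the retry five minutes later is ignored, and the single stored row carries the local date 2026-03-03 rather than the UTC date.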

Rate Limiting: 1-Second Stagger

Notifications are dispatched via Laravel’s queue. The underlying mail provider (Resend, on the free plan) has a 2 req/s rate limit. Queuing all notifications simultaneously would burst well past that.

The fix is incremental dispatch delay:

$user->notify(
    (new StudyReviewReminder($projectIds))->delay(now()->addSeconds($sentCount))
);
$sentCount++;

User 1 → 0s delay (immediate)
User 2 → 1s delay
User 3 → 2s delay
…

The delay is set at dispatch time, before the notification hits the queue. Each notification is processed at least 1 second after the previous one, keeping throughput at ≤ 1 req/s, safely under the 2 req/s ceiling. The comment in the code flags this as a free-plan constraint: on a paid plan with higher rate limits, the stagger can be reduced or removed.
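The stagger arithmetic is simple enough to write out. The Python sketch below computes the earliest send time for each notification in a batch; staggered_send_times is a hypothetical helper, not part of the command.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def staggered_send_times(start: datetime, count: int,
                         step_seconds: float = 1.0) -> List[datetime]:
    """Earliest send time for each of count notifications, spaced step_seconds apart."""
    return [start + timedelta(seconds=i * step_seconds) for i in range(count)]


start = datetime(2026, 3, 3, 8, 0, tzinfo=timezone.utc)
times = staggered_send_times(start, 4)
offsets = [(t - start).total_seconds() for t in times]  # seconds after the run starts
```

One consequence worth keeping in mind: with a 1-second step, the last of 500 notifications is delayed by 499 seconds, a little over 8 minutes after the hour.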

The Notification: Deep-Link Magic and Signed Unsubscribe URLs

The StudyReviewReminder notification is queued (implements ShouldQueue) and builds one action URL per project:

foreach ($this->projectIds as $projectId) {
    $projectName = Project::find($projectId)?->name ?? "Project #{$projectId}";
    $signedUrl = URL::temporarySignedRoute(
        'magic.login.redirect',
        now()->addDays(30),
        ['user_id' => $notifiable->id]
    );
    // redirect_to is appended after signing to avoid %2F double-encoding issues.
    $actionUrl = $signedUrl . '&redirect_to=' . urlencode("/study-plan/pr/{$projectId}");

    $projects[] = ['name' => $projectName, 'url' => $actionUrl];
}

Each link is a 30-day temporary signed URL for magic.login.redirect, the same passwordless login route used elsewhere in the app. After validating the signature, the backend generates a short-lived one-time code, then redirects the browser to /auth/callback?code=...&redirect_to=/study-plan/pr/{projectId}. The user lands directly in their study session, authenticated, without entering any credentials.

redirect_to is appended after signing rather than included in the signed payload because the destination URL is determined at notification send time, and because including a %2F-encoded path among the signed parameters invites double-encoding mismatches that would invalidate the HMAC. The backend validates redirect_to separately, accepting only relative paths that start with /.
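The "sign first, append later, ignore on verify" flow can be sketched in Python. This is an illustration of the idea, not Laravel's algorithm: SECRET, sign_url, verify_ignoring, and the example.com URL are hypothetical, and the sketch assumes the signature is an HMAC-SHA256 over the URL without the signature and ignored parameters.

```python
import hashlib
import hmac
from typing import Set
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

SECRET = b"app-key-for-illustration"  # hypothetical signing key


def sign_url(url: str) -> str:
    """Append a signature parameter computed over the URL as given."""
    sig = hmac.new(SECRET, url.encode(), hashlib.sha256).hexdigest()
    return f"{url}{'&' if '?' in url else '?'}signature={sig}"


def verify_ignoring(url: str, ignore: Set[str]) -> bool:
    """Recompute the HMAC after dropping the signature and any ignored parameters."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    given = dict(pairs).get("signature", "")
    kept = [(k, v) for k, v in pairs if k != "signature" and k not in ignore]
    original = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    expected = hmac.new(SECRET, original.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, given)


signed = sign_url("https://app.example.com/magic/42?expires=1790000000")
# redirect_to is added after signing, percent-encoded (slashes become %2F).
link = signed + "&redirect_to=" + quote("/study-plan/pr/7", safe="")
```

verify_ignoring(link, {"redirect_to"}) accepts the link, while verifying without the ignore set rejects it, because the appended parameter was never part of the signed string. That is also why redirect_to needs its own validation: the signature says nothing about its value.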

The unsubscribe URL uses a permanent signed route (no expiry):

$unsubscribeUrl = URL::signedRoute(
    'notifications.unsubscribe',
    ['user_id' => $notifiable->id]
);

URL::signedRoute() (without temporary) generates a URL that’s valid indefinitely. When the user clicks it, the notifications.unsubscribe route validates the signature and sets notifications_enabled = false on the user record. No login required, no expiry to worry about. The user simply never gets another reminder.

The auto-re-enable on next login: when the user authenticates again (via magic link), the exchangeCodeWithToken endpoint checks notifications_enabled and flips it back to true if it was disabled. Logging in is treated as an implicit signal of renewed interest.

Testing With Frozen Time

Timezone logic is notoriously hard to test without time control. Carbon::setTestNow() makes it tractable:

protected function tearDown(): void
{
    Carbon::setTestNow(); // reset after each test
    parent::tearDown();
}

public function test_sends_notification_to_user_in_8am_timezone(): void
{
    Notification::fake();

    // Freeze time so UTC is 08:00
    Carbon::setTestNow(Carbon::create(2026, 3, 3, 8, 0, 0, 'UTC'));

    $user = $this->createUserAtEightAm('UTC');
    $project = Project::factory()->create(['user_id' => $user->id]);
    StudyPlan::factory()->due()->create(['project_id' => $project->id]);

    $this->artisan('custom:send-study-review-notifications');

    Notification::assertSentTo($user, StudyReviewReminder::class);
}

The test suite covers seven cases:

User at 8 AM in their timezone → notification sent
User outside the 8 AM window → nothing sent
User with notifications_enabled = false → nothing sent
Existing notification_logs row for today → insertOrIgnore skips the send
Only strict study plans (e.g., rated again 5 minutes ago) → nothing sent
All items scheduled in the future → nothing sent
New items with scheduled_at IS NULL → notification sent

The due() and future() factory states from StudyPlanFactory keep the fixtures readable:

public function due(): static
{
    return $this->state(['scheduled_at' => now()->subHour()]);
}

public function future(): static
{
    return $this->state(['scheduled_at' => now()->addDay()]);
}

What I’d Do Differently

Batch the Project::find() calls inside the notification. StudyReviewReminder::toMail() calls Project::find($projectId) in a loop, one query per project. For a user with ten projects, that’s ten queries inside a queued job. A single Project::whereIn('id', $this->projectIds)->get()->keyBy('id') before the loop would reduce it to one.

Add a configurable notification window. The --hour option already makes the target hour configurable, but there’s no way to send a second notification (e.g., an evening reminder at 20:00) without running the command with --hour=20 and managing two cron entries. A window-based approach (“send between 7 and 9 AM, once per day”) would be more resilient to users whose 8 AM falls between two hourly runs.

Prune notification_logs on a schedule. The table grows by one row per user per day. A weekly cleanup (DELETE FROM notification_logs WHERE sent_date < NOW() - INTERVAL 7 DAY) prevents it from becoming a performance liability as the user base scales.

Timezone-aware notifications look like a three-liner until you account for NULL timezones, N+1 queries, deduplication across cron retries, and email provider rate limits. The implementation pattern , collect target timezones in PHP, query users with EXISTS, dedup with insertOrIgnore, stagger dispatch , handles all four without anything exotic.

SEO for React SPAs Without SSR: Puppeteer Prerendering in Production

2026-02-19T00:00:00+00:00

React SPAs are nearly invisible to social media crawlers and slower to index on Google. Here’s how I solved SEO for LongTermMemory without migrating to Next.js, using a two-variant routing pattern and a Puppeteer script that prerenders the landing page at build time.

The Problem

A React SPA with client-side routing serves one thing to every visitor: a nearly empty index.html with a bare mount element and a JavaScript bundle. Google’s crawler can execute JavaScript and will eventually index the content, but it does so in a second wave, days or weeks after the initial crawl. Social media crawlers (Facebook, Twitter/X, LinkedIn, Slack) don’t execute JavaScript at all. They see the empty shell, find no og:title or og:description meta tags, and either show a blank card or scrape the minimal fallback tags from the static index.html.

For a SaaS landing page, this is a real problem. Pricing, FAQ, feature descriptions: all the content that matters for SEO and social sharing exists only in JavaScript. It never lands in the raw HTML that crawlers read.

The standard answer is server-side rendering: Next.js, Remix, or a similar framework. But LongTermMemory’s frontend is a standalone Vite + React 19 SPA that has been in production for months. Migrating to Next.js would mean rewriting routing, data fetching patterns, authentication callbacks, Stripe integration, and the Tailwind configuration. That is weeks of work for a feature that benefits one route.

The alternative: prerender the landing page at build time using Puppeteer, and serve the resulting static HTML as dist/index.html.

The Architecture: Two Landing Page Variants

The core idea is a split: one version of the landing page for authenticated users (the full interactive app), and a separate static version for everyone else (which search engines and crawlers see).

In App.tsx, the root route renders a LandingRoute component instead of directly mounting a page:

// src/App.tsx
function LandingRoute() {
  const isAuthenticated = !!localStorage.getItem('auth_token');
  const [searchParams] = useSearchParams();
  const unsubscribed = searchParams.get('unsubscribed') === '1';

  return (
    <>
      {unsubscribed && (
        <div className="fixed top-4 left-1/2 -translate-x-1/2 z-[60] ...">
          <p className="text-sm text-green-800">
            You have successfully disabled your notifications.
          </p>
        </div>
      )}
      <Suspense fallback={<PageLoader />}>
        {isAuthenticated ? <Landing /> : <LandingPublic />}
      </Suspense>
    </>
  );
}

Landing: the full authenticated experience, with upload forms, Stripe checkout, UserContext for user state, and real-time plan limits.
LandingPublic: a stateless version with no UserContext, no Stripe, no authenticated API calls. Its only job is to render all the landing page content as static HTML that Puppeteer can capture.

Both components look identical to users. The split is invisible at runtime.

`LandingPublic`: Designed for Prerendering

LandingPublic has two responsibilities: look like the real landing page, and be fully renderable by a headless browser.

The data-prerender-ready marker. The prerender script needs to know when React has finished mounting. Rather than relying on arbitrary timeouts, LandingPublic puts a data-prerender-ready attribute on its root element:

<div className="min-h-screen bg-slate-50" data-prerender-ready>
  {/* ...full page content... */}
</div>

The Puppeteer script waits for [data-prerender-ready] before proceeding.

Lazy sections with IntersectionObserver. Below-fold sections (Pricing, FAQ, Educational modals) use a LazySection wrapper that only renders children when the container scrolls into view:

import { useEffect, useRef, useState } from 'react';

function LazySection({ children, fallback, id }) {
  const [visible, setVisible] = useState(false);
  const ref = useRef(null);

  useEffect(() => {
    const node = ref.current;
    if (!node) return; // nothing mounted to observe
    // Reveal the children once the container is within 200px of the viewport
    const observer = new IntersectionObserver(
      ([entry]) => { if (entry.isIntersecting) { setVisible(true); observer.disconnect(); } },
      { rootMargin: '200px', threshold: 0 }
    );
    observer.observe(node);
    return () => observer.disconnect();
  }, []);

  return (
    <div ref={ref} id={id}>
      {visible ? children : (fallback || <div style={{ minHeight: '200px' }} />)}
    </div>
  );
}

This keeps the initial bundle fast for real users. For Puppeteer, the script needs to scroll through the entire page to fire the observers and reveal all sections before capturing the HTML.

Pricing data from the API. The static variant still fetches live pricing from the backend:

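useEffect(() => {
  const fetchPlans = async () => {
    try {
      const fetchedPlans = await plansApi.getCommercialPlans();
      setPlans(fetchedPlans.slice(1)); // skip Free plan
    } catch (error) {
      console.error('Failed to load pricing plans', error);
    } finally {
      setIsLoading(false); // clear the loading state even when the request fails
    }
  };
  fetchPlans();
}, []);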

This means the prerendered HTML contains real pricing numbers, not hardcoded values that would go stale.

react-helmet-async for full meta coverage. All SEO meta tags (Open Graph, Twitter Card, canonical URL, keywords, JSON-LD structured data) are injected via <Helmet> in LandingPublic. Because Puppeteer captures the page after JavaScript executes, these tags end up in the final HTML:

<Helmet>
  <title>LongTerm Memory - AI-Powered Study & Exam Preparation</title>
  <meta name="description" content="Master any subject with AI-powered question-answer generation..." />
  <meta property="og:title" content="LongTerm Memory - AI-Powered Study & Exam Preparation" />
  <meta property="og:image" content="https://longtermemory.com/og-image.jpg" />
  <meta name="twitter:card" content="summary_large_image" />
  <link rel="canonical" href="https://longtermemory.com" />
  <script type="application/ld+json">
    {JSON.stringify({ "@context": "https://schema.org", "@type": "SoftwareApplication", ... })}
  </script>
</Helmet>

The JSON-LD block includes the live pricing offers built from the API response, making it valid structured data for Google’s Rich Results.

The Prerender Script

scripts/prerender.mjs runs after the Vite build and overwrites dist/index.html with the prerendered HTML. Key parts of the script:

import puppeteer from 'puppeteer';
import { promises as fs } from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';
import http from 'http';
import handler from 'serve-handler';

const distPath = path.join(path.dirname(fileURLToPath(import.meta.url)), '../dist');
const indexPath = path.join(distPath, 'index.html');

function tryListenOnPort(port) {
  return new Promise((resolve, reject) => {
    const server = http.createServer((req, res) => handler(req, res, { public: distPath }));
    server.on('error', reject);
    server.listen(port, () => resolve({ server, port }));
  });
}

async function startServer(startPort = 3010) {
  const maxPort = startPort + 3;
  for (let port = startPort; port <= maxPort; port++) {
    try {
      return await tryListenOnPort(port);
    } catch (err) {
      if (err.code === 'EADDRINUSE') continue;
      throw err;
    }
  }
  throw new Error(`All ports ${startPort}-${maxPort} are already in use`);
}

async function prerenderPage() {
  const { server, port } = await startServer(3010);

  try {
    const browser = await puppeteer.launch({
      headless: 'new',
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });
    const page = await browser.newPage();

    // Simulate unauthenticated user: clear localStorage before any script runs
    await page.evaluateOnNewDocument(() => { localStorage.clear(); });

    await page.goto(`http://localhost:${port}`, {
      waitUntil: 'networkidle0',
      timeout: 30000
    });

    // Wait for React to finish mounting
    await page.waitForSelector('[data-prerender-ready]', { timeout: 10000 })
      .catch(() => console.log('⚠ data-prerender-ready not found, continuing...'));

    // Scroll to trigger all IntersectionObserver-gated sections
    await page.evaluate(async () => {
      const delay = ms => new Promise(r => setTimeout(r, ms));
      for (let y = 0; y < document.body.scrollHeight; y += 400) {
        window.scrollTo(0, y);
        await delay(100);
      }
      window.scrollTo(0, 0);
    });

    // Wait for pricing plan cards to appear
    await page.waitForFunction(
      () => {
        const pricing = document.querySelector('#pricing');
        if (!pricing) return false;
        return pricing.querySelectorAll('.rounded-lg.bg-white.p-5').length >= 2;
      },
      { timeout: 15000 }
    ).catch(() => console.log('⚠ Pricing data did not load in time, continuing...'));

    // Final wait for animations
    await new Promise(resolve => setTimeout(resolve, 2000));

    const html = await page.content();
    await fs.writeFile(indexPath, html, 'utf8');

    await browser.close();
  } finally {
    server.close();
  }
}

prerenderPage();

A few design choices worth noting:

evaluateOnNewDocument vs evaluate. Using evaluateOnNewDocument to clear localStorage runs the code before any page script executes, including React’s initial render. If you cleared localStorage after navigation with evaluate, the React component tree would already have read auth_token, rendered Landing instead of LandingPublic, and it would be too late.

waitUntil: 'networkidle0' waits until there have been no in-flight network requests for at least 500 ms. This is the right choice here because LandingPublic fetches pricing data on mount, so the API response needs to arrive before the HTML is captured.

The scroll loop. IntersectionObserver only fires when elements enter the viewport. A headless browser has a viewport but doesn’t scroll automatically. The 400px step with a 100ms delay gives each observer time to fire and its component time to mount before moving to the next section.

The pricing check uses waitForFunction with a DOM selector rather than a fixed timeout. If the API is slow, a 2-second setTimeout would produce HTML with a loading spinner instead of pricing data. Polling for the actual DOM element is reliable regardless of API latency.

Port cycling tries 3010 through 3013. CI environments often have ports occupied by other services; retrying automatically avoids flaky build failures.

Build Commands

"scripts": {
  "build": "tsc -b && vite build",
  "build:prerender": "npm run build:dev && node scripts/prerender.mjs",
  "build:production": "tsc -b && vite build --mode production && node scripts/prerender.mjs"
}

build:prerender: development build + prerender (uses localhost:8080 API, for local testing)
build:production: production build + prerender (uses https://api.longtermemory.com)

The prerender step requires the backend API to be reachable during the build, because LandingPublic fetches live pricing. In the CI/CD pipeline this means the production build runs against the live API.

What This Solves and What It Doesn’t

Solved:

All landing page text (hero copy, feature descriptions, FAQ answers) is in the raw HTML, so Google indexes it in the first crawl wave with no JavaScript execution required
og:title, og:description, og:image, and twitter:card are baked into the HTML, so social share previews work correctly on all platforms
JSON-LD structured data with real pricing is present, making the page eligible for Google Rich Results (price, availability, product type)
<title> and <meta name="description"> exist as static HTML, not injected by JavaScript; the minimal fallback in index.html is a backup, not the primary

Not solved:

Dynamic routes (/study-plan/pr/:id, /study-session/pr/:id, /dashboard) are not prerendered; they require authentication anyway, so they don’t need to be indexed
The /privacy and /terms routes are plain React pages with no prerendering; they’re text-heavy and could benefit from it, but haven’t been a priority
On-page SEO beyond the landing page (canonical tags, sitemap) is handled separately

The Fallback Layer: `index.html`

Before the prerender script runs, index.html contains minimal static meta tags as a safety net:

<title>LongTerm Memory - AI-Powered Study & Exam Preparation</title>
<meta name="description" content="Master any subject with AI-powered question-answer generation and spaced repetition..." />

If the prerender fails (API unreachable, timeout, Puppeteer crash), the build still succeeds and the fallback tags are served. They’re not as rich as the fully rendered HTML (no OG tags, no JSON-LD), but they’re better than nothing and they prevent the build pipeline from blocking on an SEO failure.

What I’d Do Differently

Prerender /privacy and /terms too. These pages are static content and would benefit from prerendering. The current script only handles /. Extending it to run against multiple routes and write each to its own dist/{path}/index.html would be straightforward.

Decouple pricing from the prerender. Requiring the live API to be reachable during the build is a fragile dependency. A better approach: cache the pricing data in a JSON file committed to the repo (updated by a separate scheduled job), and have LandingPublic read from that file during prerendering. The build would then be fully offline-capable.

Add a prerender verification step. The script has no post-check that the output HTML actually contains expected content. A simple grep for a known FAQ string or a pricing number would catch cases where the API timed out and the HTML was captured in a loading state.
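Such a post-check could be as small as a substring scan over the written file. The sketch below is in Python for brevity, and the marker strings are hypothetical choices rather than values taken from the real page.

```python
def find_missing_markers(html: str, required: list[str]) -> list[str]:
    """Return the required substrings that do not appear in the prerendered HTML."""
    return [marker for marker in required if marker not in html]

# Hypothetical markers: an Open Graph tag, the JSON-LD script type, the pricing section id.
REQUIRED = ['property="og:title"', 'application/ld+json', 'id="pricing"']

good_html = (
    '<head><meta property="og:title" content="x">'
    '<script type="application/ld+json">{}</script></head>'
    '<div id="pricing"></div>'
)
loading_html = '<head><meta property="og:title" content="x"></head><div class="spinner"></div>'

missing_good = find_missing_markers(good_html, REQUIRED)
missing_loading = find_missing_markers(loading_html, REQUIRED)
```

A build step could exit non-zero whenever the returned list is non-empty, or fall back to the unrendered index.html if a failed prerender should not block the deploy.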

The full setup (LandingRoute, LandingPublic, LazySection, and the prerender script) took about a day to build and deploy. The authenticated app was entirely untouched. Google Search Console now shows the landing page content indexed in the first crawl, and social share previews work correctly across all platforms.

The SM-2 Algorithm in Practice: Building a Spaced Repetition System in Laravel

2026-02-05T00:00:00+00:00

Spaced repetition is the most evidence-backed technique for long-term memorization. Here’s how to go from the original SM-2 paper to a working Laravel implementation, including the scheduling logic, timezone handling, strict mode, and the honest parts where this diverges from full SM-2.

When I built LongTermMemory, an AI-powered study platform that auto-generates Q&A pairs from uploaded documents, the spaced repetition engine was the part I most wanted to get right. The AI can generate great questions; spaced repetition is what moves the answers into long-term memory. This post walks through the full implementation: the database schema, the scheduling enum, the item-fetching logic, and the React evaluation UI.

What SM-2 Actually Does

The SM-2 algorithm (Piotr Woźniak, SuperMemo, 1987) schedules reviews at increasing intervals based on how well you recalled each item. After every review, you rate your performance on a scale of 0 to 5. SM-2 then updates two values per item:

Interval (I): days until the next review
Ease Factor (EF): a multiplier, starting at 2.5, that adjusts based on performance

The update rules:

If score < 3 (failure): reset interval to 1, keep EF unchanged
If score ≥ 3 (success): new_interval = old_interval * EF, then adjust EF: new_EF = EF + (0.1 - (5 - score) * (0.08 + (5 - score) * 0.02))
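The two rules translate almost line for line into code. This is a minimal Python sketch of the update exactly as summarized above, not the LongTermMemory implementation; the function name is made up, and the 1.3 floor on EF comes from the original SM-2 description rather than from the rules listed here.

```python
def sm2_update(interval_days: float, ease: float, score: int) -> tuple[float, float]:
    """Apply one review; returns (new_interval_days, new_ease_factor)."""
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    if score < 3:
        return 1.0, ease  # failure: restart at 1 day, EF unchanged
    new_interval = interval_days * ease
    delta = 0.1 - (5 - score) * (0.08 + (5 - score) * 0.02)
    new_ease = max(1.3, ease + delta)  # assumption: floor taken from the original SM-2 text
    return new_interval, new_ease

after_easy = sm2_update(4, 2.5, 5)   # EF rises by 0.10
after_ok = sm2_update(4, 2.5, 3)     # EF falls by 0.14
after_fail = sm2_update(10, 2.5, 1)  # interval resets, EF untouched
```

A score of 4 leaves EF where it was, which is why only clearly easy or clearly shaky recalls move an item’s schedule over time.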

The key insight: EF drifts down when you struggle and up when recall is easy. Over time, hard items get reviewed more frequently and easy ones less frequently, automatically, without you managing it.

The current LongTermMemory implementation is SM-2 inspired but intentionally simplified: fixed intervals, four rating levels instead of six, no adaptive ease factor yet. That last part matters and I’ll be explicit about it.

The Schema

Everything starts with the study_plans table, created in the initial migration:

Schema::create('study_plans', function (Blueprint $table) {
    $table->id();
    $table->unsignedBigInteger('project_id');
    $table->text('question');
    $table->text('answer')->nullable();
    $table->timestamp('scheduled_at')->nullable();  // UTC; NULL = new, never studied
    $table->integer('batch');
    $table->boolean('completed')->default(false);
    $table->timestamps();

    $table->foreign('project_id')->references('id')->on('projects')->onDelete('cascade');
});

A second migration adds is_strict:

$table->boolean('is_strict')->default(false);

And columns defined in subsequent migrations and populated by the RAG service callback (key_concepts, difficulty_level, session_id) complete the picture:

study_plans
├── id
├── project_id          -- FK to projects
├── question            -- generated by LLM
├── answer              -- generated by LLM
├── key_concepts        -- comma-separated string, from LLM
├── difficulty_level    -- easy / medium / hard, from LLM
├── scheduled_at        -- UTC timestamp; NULL = new item
├── is_strict           -- 1 = must wait until scheduled_at; 0 = show anytime
├── completed           -- session-level flag
├── batch               -- generation batch number
├── session_id          -- FK to study_sessions
└── created_at / updated_at

scheduled_at = NULL is the sentinel for “new item, never reviewed.” Once the user rates it for the first time, scheduled_at gets set and it enters the review cycle.

is_strict is a nuance I’ll explain below: it controls whether an item must be held until its exact scheduled time or can float.

The Scheduling Logic: `AnswerEvaluation` Enum

The entire scheduling decision lives in a PHP 8.1 enum:

enum AnswerEvaluation: string
{
    case AGAIN = 'again';
    case HARD  = 'hard';
    case GOOD  = 'good';
    case EASY  = 'easy';

    public function getNextSchedule(Carbon $now, object $context): Carbon
    {
        return match($this) {
            self::AGAIN => $now->copy()->addMinute(),
            self::HARD  => $now->copy()->addMinutes(10),
            self::GOOD  => $this->calculateNewScheduledAtByTz(4),
            self::EASY  => $this->calculateNewScheduledAtByTz(8),
        };
    }

    public function getStrictStatus(): int
    {
        return match($this) {
            self::AGAIN, self::HARD => 1,
            self::GOOD, self::EASY  => 0,
        };
    }

    private function calculateNewScheduledAtByTz(int $delay_days): Carbon
    {
        $userTime = Carbon::now(request()->user()->timezone ?? null);
        $new_scheduled_at = $userTime->addDays($delay_days)->setTime(4, 0);
        $new_scheduled_at->setTimezone('UTC');
        return $new_scheduled_at;
    }
}

The four levels map to:

Button	Meaning	Next review
Again	Wrong / blank	1 minute
Hard	Struggled but got it	10 minutes
Good	Remembered well	4 days
Easy	Perfect recall	8 days

calculateNewScheduledAtByTz is where the timezone handling lives. Rather than scheduling “4 days from now in UTC”, it schedules “4 days from now at 4:00 AM in the user’s local timezone, stored as UTC”. This ensures that a user in Tokyo and a user in Rome both get their review queued for early morning local time, not at some random hour dictated by UTC offset.
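The same arithmetic can be checked outside Laravel. The sketch below mirrors the idea in Python with the timezone passed in as a parameter; the function name is hypothetical, and fixed UTC offsets stand in for the IANA zone names a real user record would carry.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

def next_review_utc(now_utc: datetime, user_tz: tzinfo, delay_days: int) -> datetime:
    """Schedule `delay_days` ahead at 04:00 on the user's local clock, returned in UTC."""
    local_now = now_utc.astimezone(user_tz)
    target_date = local_now.date() + timedelta(days=delay_days)
    local_target = datetime.combine(target_date, time(4, 0), tzinfo=user_tz)
    return local_target.astimezone(timezone.utc)

TOKYO = timezone(timedelta(hours=9))  # stand-in for Asia/Tokyo
ROME = timezone(timedelta(hours=1))   # stand-in for Europe/Rome in winter
now = datetime(2026, 2, 5, 20, 30, tzinfo=timezone.utc)

tokyo_review = next_review_utc(now, TOKYO, 4)  # Tokyo is already on Feb 6 locally
rome_review = next_review_utc(now, ROME, 4)
```

The two users rate a card at the same instant but get different UTC timestamps, because the local calendar date and the 4 AM anchor are both resolved in the user’s zone before converting back.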

is_strict and getStrictStatus() encode a review discipline rule:

AGAIN and HARD → is_strict = 1. The item must be held until its scheduled_at time. You failed or struggled, so the algorithm wants you to revisit it soon, and it won’t let you skip ahead.
GOOD and EASY → is_strict = 0. The item can be shown any time on or after its scheduled date. You remembered it well; a bit of flexibility is fine.

The Controller: Evaluating an Answer

The QaItemEvaluation method in StudyPlansController delegates entirely to the enum:

public function QaItemEvaluation(QaItemEvaluationRequest $request): JsonResponse
{
    $now        = Carbon::now('UTC');
    $difficulty = AnswerEvaluation::from($request->difficulty);
    $new_scheduled_at = $difficulty->getNextSchedule($now, $this);
    $strict_status    = $difficulty->getStrictStatus();

    $item = StudyPlan::find($request->item_id);
    $item->update([
        'scheduled_at' => $new_scheduled_at,
        'is_strict'    => $strict_status,
    ]);

    return response()->json(['new_scheduled_at' => $item->scheduled_at]);
}

AnswerEvaluation::from($request->difficulty) converts the string 'again'|'hard'|'good'|'easy' to the enum case and throws a ValueError if the value is invalid. That path should not be hit in practice, because the QaItemEvaluationRequest form request validates the input before it reaches this point.

Fetching the Next Item: Order and Strict Mode

Two private methods handle how items are selected for a session.

getOrderedStudyPlanItems defines the canonical order:

private function getOrderedStudyPlanItems(int $project_id): Collection
{
    return StudyPlan::where('project_id', $project_id)
        ->orderByRaw('scheduled_at IS NULL')
        ->orderBy('scheduled_at')
        ->orderBy('id')
        ->get();
}

orderByRaw('scheduled_at IS NULL') sorts dated items before NULL items. In MySQL, IS NULL returns 1 for nulls and 0 for non-null values, so ordering ascending puts 0 (dated items) before 1 (new items). The result: overdue reviews come first, then new items, which is the correct SM-2 priority.
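The three ORDER BY clauses behave like a tuple sort key. A small Python sketch of the same ordering, using plain dictionaries as stand-ins for study_plans rows:

```python
from datetime import datetime

def review_order_key(item: dict):
    """Dated items first (oldest due date first), then never-studied items, ties by id."""
    scheduled = item["scheduled_at"]
    # False sorts before True, just as 0 sorts before 1 for `scheduled_at IS NULL`.
    return (scheduled is None, scheduled or datetime.min, item["id"])

items = [
    {"id": 1, "scheduled_at": None},
    {"id": 2, "scheduled_at": datetime(2026, 2, 6, 4, 0)},
    {"id": 3, "scheduled_at": datetime(2026, 2, 1, 4, 0)},
    {"id": 4, "scheduled_at": None},
]
ordered_ids = [item["id"] for item in sorted(items, key=review_order_key)]
```

The most overdue item (id 3) leads, the later-dated one follows, and the two new items keep their id order at the end.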

getTodayQaItemsCollection adds the date filter for the current session:

private function getTodayQaItemsCollection(int $project_id): Collection
{
    $userEndOfDay = Carbon::now(request()->user()->timezone ?? null)->endOfDay();
    $utcEndOfDay  = $userEndOfDay->addHours(4)->setTimezone('UTC');

    return StudyPlan::where('project_id', $project_id)
        ->where(function ($query) use ($utcEndOfDay) {
            $query->where('scheduled_at', '<=', $utcEndOfDay)
                  ->orWhereNull('scheduled_at');
        })
        ->orderByRaw('scheduled_at IS NULL')
        ->orderBy('scheduled_at')
        ->get();
}

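The cutoff is endOfDay() in the user’s timezone plus a 4-hour buffer, converted to UTC. The buffer ties the session’s day boundary to the scheduling anchor: GOOD and EASY items are stamped at 4:00 AM local time via setTime(4, 0), so the review day effectively runs until just before 4 AM instead of ending at midnight. Without the buffer, endOfDay() (23:59:59) would exclude anything due in the small hours, and those items would only appear the following day. One edge worth knowing: 23:59:59 plus four hours is 03:59:59, so an item stamped exactly 04:00:00 the next morning still sits just outside the <= comparison.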
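The cutoff arithmetic is easy to check in isolation. The Python sketch below reproduces it under the assumption that endOfDay() means 23:59:59.999999 on the local calendar day; the function name is hypothetical and a fixed offset stands in for a real zone name.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

def session_cutoff_utc(now_utc: datetime, user_tz: tzinfo, buffer_hours: int = 4) -> datetime:
    """End of the user's local day plus a buffer, expressed in UTC."""
    local_now = now_utc.astimezone(user_tz)
    end_of_day = datetime.combine(local_now.date(), time.max, tzinfo=user_tz)  # 23:59:59.999999
    return (end_of_day + timedelta(hours=buffer_hours)).astimezone(timezone.utc)

ROME = timezone(timedelta(hours=1))
now = datetime(2026, 2, 5, 20, 0, tzinfo=timezone.utc)  # 21:00 local
cutoff = session_cutoff_utc(now, ROME)

due_0300 = datetime(2026, 2, 6, 3, 0, tzinfo=ROME)  # 03:00 local the next morning
due_0400 = datetime(2026, 2, 6, 4, 0, tzinfo=ROME)  # exactly 04:00 local
includes_0300 = due_0300 <= cutoff
includes_0400 = due_0400 <= cutoff
```

The window reaches 03:59:59 local time on the following morning, so an item due at 03:00 is in today’s session while one stamped exactly 04:00:00 is not. If the 4 AM items are meant to be included, the buffer or the comparison is the place to adjust.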

fetchQaItemFromCollection enforces the strict mode rule at fetch time:

private function fetchQaItemFromCollection(Collection $qa_items)
{
    return $qa_items->first(function ($item) {
        if ((int) $item->is_strict !== 1) {  // cast so a boolean-cast attribute still compares correctly
            return true;  // non-strict: always eligible
        }
        return $item->scheduled_at <= now();  // strict: only if time has passed
    });
}

This iterates the ordered collection and returns the first item that passes the check. A strict item scheduled for 10 minutes from now is skipped in favour of the next non-strict item. Once the 10 minutes have elapsed, it becomes eligible again.
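The skip-and-return behaviour can be seen in a few lines. This Python sketch uses dictionaries in place of Eloquent models and a made-up function name; it illustrates the rule and is not a port of the controller.

```python
from datetime import datetime, timedelta
from typing import Optional

def first_eligible(items: list[dict], now: datetime) -> Optional[dict]:
    """Return the first item that may be shown right now, or None if all are held back."""
    for item in items:
        if not item["is_strict"]:
            return item  # non-strict: always eligible
        if item["scheduled_at"] <= now:
            return item  # strict: only once its time has passed
    return None

now = datetime(2026, 2, 5, 12, 0)
queue = [
    {"id": 1, "is_strict": True, "scheduled_at": now + timedelta(minutes=10)},
    {"id": 2, "is_strict": False, "scheduled_at": now + timedelta(days=4)},
]
picked_now = first_eligible(queue, now)
picked_later = first_eligible(queue, now + timedelta(minutes=10))
```

At noon the strict item is skipped and item 2 is served; ten minutes later the strict item is first in line again.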

Sessions and Progress Tracking

When the user starts a session, a StudySession record is created and up to 50 items are tagged with its ID:

public function createNewStudySession(CreateNewStudySessionRequest $request): JsonResponse
{
    $today_qa_items     = $this->getTodayQaItemsCollection($request->project_id);
    $estimated_questions = min(count($today_qa_items), $this->today_session_limit); // 50

    $study_session = StudySession::create([
        'project_id'          => $request->project_id,
        'estimated_questions' => $estimated_questions,
    ]);

    $subset = $this->getOrderedStudyPlanItems($request->project_id)
                   ->take($this->today_session_limit);

    StudyPlan::whereIn('id', $subset->pluck('id'))
             ->update(['session_id' => $study_session->id, 'completed' => 0]);

    return response()->json(['study_session' => $study_session]);
}

When an item is answered, completed is flipped to true and its session_id is set. This powers the two progress bars in the UI:

Session progress (blue): completed = 1 within the current session_id ÷ estimated_questions
Global study plan progress (green): items with scheduled_at > now() (future reviews) ÷ total items

$total_answered_questions = StudyPlan::where('project_id', $request->project_id)
    ->where('scheduled_at', '>', now())
    ->count();

$session_question_completed = StudyPlan::where('project_id', $request->project_id)
    ->where('session_id', $study_session_id)
    ->where('completed', 1)
    ->count();

The React Evaluation UI

The QAItemDisplay component renders the four evaluation buttons, each of which maps directly to an AnswerEvaluation enum case:

export type EvaluationDifficulty = 'again' | 'hard' | 'good' | 'easy';

const handleEvaluate = async (difficulty: EvaluationDifficulty) => {
  try {
    await studyPlansApi.evaluateQAItem({ item_id: item.id, difficulty });
    onNextQuestion();
  } catch (error) {
    onNextQuestion(); // still advance even if API call fails
  }
};

The buttons appear only when the answer is visible, forcing the user to actually read the answer before rating themselves. Each button shows the next-review interval inline so users understand what they’re committing to:

<button onClick={() => handleEvaluate('again')} className="... border-red-300 bg-red-50 ...">
  <span>Again</span>
  <span>1 min or less</span>
  <span>I was wrong or didn't remember it at all</span>
</button>

<button onClick={() => handleEvaluate('easy')} className="... border-green-300 bg-green-50 ...">
  <span>Easy</span>
  <span>8 days</span>
  <span>I remembered it perfectly</span>
</button>

New items display a new label (green); items in the review cycle display review (orange). Both are derived directly from item.scheduled_at === null.

Testing: Factory States

The StudyPlanFactory exposes two states that make scheduling tests readable:

public function due(): static
{
    return $this->state(['scheduled_at' => now()->subHour()]);
}

public function future(): static
{
    return $this->state(['scheduled_at' => now()->addDay()]);
}

A test for the notification command that checks whether due items are included can write:

StudyPlan::factory()->due()->create(['project_id' => $project->id]);
StudyPlan::factory()->future()->create(['project_id' => $project->id]);

// Only the due item should trigger a notification

Combined with Carbon::setTestNow() for freezing time in timezone tests, these states make it straightforward to test edge cases without constructing timestamps by hand.

What This Is Not (Yet): Full SM-2

Being explicit about the gap between this implementation and the original SM-2:

What’s implemented:

Four-level self-assessment (maps to SM-2’s 0 to 2 as fail, 3 to 4 as hard/good, 5 as easy)
Short re-show intervals for failures (1 min, 10 min)
Multi-day intervals for successes (4 days, 8 days)
Strict mode to enforce minimum wait on failures
Timezone-aware scheduling at 4 AM local time
Session-capped daily reviews (50 items)

What’s missing:

Adaptive ease factor. Real SM-2 adjusts the per-item EF based on history. An item you consistently rate Easy slowly gets a longer interval; one you rate Hard repeatedly gets reviewed more often. The current implementation uses fixed intervals regardless of history.
Growing intervals. After the first Good review (4 days), the second should be 4 * EF ≈ 10 days, the third ≈ 25 days, and so on. Currently every Good review resets to 4 days.
Interval tracking per item. The schema doesn’t yet store the current interval or ease factor , they’d need to be added as columns on study_plans.
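The growth curve that the fixed intervals currently skip is plain repeated multiplication. A short Python sketch, assuming EF stays constant across the streak (which it would only do for a run of score-4 reviews in real SM-2):

```python
def projected_intervals(first_interval: float, ease_factor: float, reviews: int) -> list[float]:
    """Intervals for consecutive successful reviews, each one the previous times EF."""
    if reviews < 1:
        return []
    intervals = [float(first_interval)]
    for _ in range(reviews - 1):
        intervals.append(intervals[-1] * ease_factor)
    return intervals

good_streak = projected_intervals(4, 2.5, 4)  # days: 4, 10, 25, 62.5
```

Four good reviews in a row would push an item roughly two months out, whereas the current fixed schedule brings it back every four days.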

The fixed intervals are a pragmatic first pass that still produces the core benefit of spaced repetition: items you fail come back soon; items you know well come back later. Adding the adaptive ease factor is the next step: it requires two new columns on study_plans (current_interval and ease_factor) and updating the scheduling logic in AnswerEvaluation.

What I’d Do Differently

Store interval and ease_factor on study_plans from day one. Adding them later means a migration plus updating the scheduling logic, and any items reviewed before the migration have no history. Start with the columns even if they’re unused initially.

Separate the scheduling logic from the enum. The calculateNewScheduledAtByTz method reads request()->user()->timezone directly inside the enum, which couples the scheduling logic to the HTTP request context. A SpacedRepetitionScheduler service that accepts a user timezone as a parameter would be easier to test and reuse.

Cap the minimum interval at the user’s next waking hours. Scheduling a review for “1 minute from now” at 11:50 PM means it’ll appear at 11:51 PM. A smarter implementation would schedule short-interval items for the next morning if the user is at end of day. The calculateNewScheduledAtByTz logic with setTime(4, 0) already does this for multi-day intervals, but not for the minute-level ones.

Spaced repetition looks deceptively simple on paper: a few intervals, a rating, a timestamp. The complexity is in the details: timezone handling, strict mode for failures, item ordering, progress tracking across sessions. The fixed-interval foundation works; the adaptive ease factor is the next layer to build on top of it.

The full implementation is part of LongTermMemory, an AI study platform built on Laravel 12, FastAPI, and React 19.

Two-Stage Semantic Chunking for RAG in Python: Structural Splitting + Semantic Coherence

2026-01-29T00:00:00+00:00

Fixed-size chunking splits text at arbitrary token boundaries, cutting mid-sentence and blending unrelated topics into the same chunk. Here’s how to build a two-stage pipeline with LlamaIndex (structural splitting first, semantic coherence second) and why adaptive sizing matters for long documents.

The Problem With Fixed-Size Chunking

The simplest chunking strategy is a sliding window: split every N tokens with M tokens of overlap. It’s easy to implement and works reasonably well on clean, uniform text. It breaks down in two common situations.

Mid-sentence splits. A chunk that ends at token 512 may cut a sentence in half. The embedding for that chunk represents a dangling thought, and when the retriever pulls it back, the LLM receives incomplete context. Overlap helps but doesn’t eliminate the problem: two consecutive chunks now share a sentence fragment, both pulling each other slightly off-topic.

Topic bleed. A 1,024-token window over a textbook chapter will often straddle two sections: the end of “Cellular Respiration” and the start of “Photosynthesis.” The embedding averages those topics, making the chunk a poor match for queries about either one.
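Both failure modes show up even on two sentences. The sketch below is a deliberately naive sliding window in Python; it counts whitespace-separated words instead of real tokenizer tokens, which is an assumption made to keep it dependency-free.

```python
def fixed_size_chunks(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Sliding window: `size` tokens per chunk, `overlap` of them shared with the previous chunk."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, len(tokens), step)]

text = "Cells break down glucose to release energy. Plants capture light to build glucose."
chunks = fixed_size_chunks(text.split(), size=6, overlap=2)
first = " ".join(chunks[0])   # stops before the first sentence is finished
second = " ".join(chunks[1])  # tail of one topic glued to the head of the next
```

The first chunk ends at “release”, one word short of the sentence, and the second carries the end of the respiration sentence together with the start of the photosynthesis one.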

The alternative is semantic chunking: let the content’s own structure guide the split points.

LongTermMemory’s DocumentProcessor uses a two-stage pipeline (structural splitting followed by semantic coherence), implemented in about 90 lines of Python using LlamaIndex.

The Architecture at a Glance

Raw text
    │
    ▼
Stage 1: SentenceSplitter         ← structural: paragraph breaks, chapter boundaries
    │         (respects "\n\n")
    ▼
Stage 2: SemanticSplitterNodeParser  ← semantic: merge/split by embedding similarity
    │         (OpenAI embeddings)
    ▼
Structural heading extraction     ← heuristics on first line + node metadata
    │
    ▼
LLM title fallback                ← GPT-3.5-turbo when no heading found
    │
    ▼
TextChunk objects → Qdrant

Stage 1: Structural Splitting With `SentenceSplitter`

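The first stage uses LlamaIndex’s SentenceSplitter to break the document into structurally coherent pieces. The key parameter is separator="\n\n": the intent is for the splitter to prefer paragraph breaks before falling back to sentence boundaries. (SentenceSplitter also exposes a separate paragraph_separator argument, so it is worth checking how the two interact in the LlamaIndex version you are running.)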

sentence_splitter = SentenceSplitter(
    chunk_size=stage1_chunk_size,
    chunk_overlap=stage1_chunk_overlap,
    separator="\n\n",  # Split on paragraph breaks
)
initial_nodes = sentence_splitter.get_nodes_from_documents([llama_doc])

With chunk_size=1024, each initial node is at most 1,024 tokens. But because "\n\n" is the preferred split point, a section that ends at token 900 followed by a paragraph break will produce a 900-token chunk (no mid-paragraph split) rather than running over into the next section to pad out to 1,024 tokens.
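The packing behaviour is easier to see in a toy model. The sketch below is not LlamaIndex's implementation: pack_paragraphs is a hypothetical helper, and word counts stand in for token counts. It only illustrates the idea of closing a chunk at a paragraph break instead of padding it to the size limit.

```python
def pack_paragraphs(text: str, max_words: int) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most `max_words` words.

    A paragraph longer than max_words becomes its own oversized chunk here;
    a real splitter would fall back to sentence boundaries at that point.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = len(para.split())
        # Close the current chunk rather than splitting the next paragraph
        if current and current_len + n > max_words:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


doc = "one two three\n\nfour five six seven\n\neight nine"
packed = pack_paragraphs(doc, max_words=8)
# → ["one two three\n\nfour five six seven", "eight nine"]
```

The first chunk stops at 7 words even though the budget is 8, because taking any of the third paragraph would mean breaking it.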

This stage doesn’t require any API call. It’s pure text processing, fast and free.

Stage 2: Semantic Coherence With `SemanticSplitterNodeParser`

The second stage takes the structural chunks from Stage 1 and re-examines their boundaries using embedding similarity. Adjacent sentences are grouped by semantic similarity: if two consecutive sentences are closely related, they stay in the same chunk; if similarity drops below a threshold, a new split is inserted.

semantic_splitter = SemanticSplitterNodeParser(
    buffer_size=stage2_buffer_size,
    breakpoint_percentile_threshold=stage2_breakpoint_threshold,
    embed_model=self.embed_model,
)
semantic_nodes = semantic_splitter.get_nodes_from_documents(
    [LlamaDocument(text=node.get_content(), metadata=node.metadata)
     for node in initial_nodes]
)

The Stage 1 nodes are re-wrapped as LlamaDocument objects before being passed to the semantic splitter. This is necessary because SemanticSplitterNodeParser.get_nodes_from_documents expects Document inputs, not TextNode inputs; passing initial_nodes directly would raise a type error.

buffer_size controls how many surrounding sentences are included when computing the embedding for a sentence. buffer_size=1 means each sentence is embedded with one sentence of context on each side; buffer_size=3 means three sentences of context. A larger buffer makes the embeddings smoother and more stable, reducing over-splitting on long content.
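As a rough illustration of the buffering idea (a simplified sketch, not LlamaIndex's code; buffered_windows is a hypothetical name), each sentence is joined with its neighbours before it is embedded:

```python
def buffered_windows(sentences: list[str], buffer_size: int) -> list[str]:
    """For each sentence, join it with up to `buffer_size` neighbours on each side."""
    windows = []
    for i in range(len(sentences)):
        lo = max(0, i - buffer_size)
        hi = min(len(sentences), i + buffer_size + 1)
        windows.append(" ".join(sentences[lo:hi]))
    return windows


sentences = ["A.", "B.", "C.", "D."]
w1 = buffered_windows(sentences, buffer_size=1)
# → ["A. B.", "A. B. C.", "B. C. D.", "C. D."]
w3 = buffered_windows(sentences, buffer_size=3)
# → every window is "A. B. C. D." for a four-sentence input
```

With the larger buffer, neighbouring windows overlap almost entirely, so their embeddings differ less from one position to the next. That is the smoothing effect that reduces over-splitting.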

breakpoint_percentile_threshold sets how large the drop in similarity must be, expressed as a percentile of all the drops observed in the text, before a split is inserted. At 95, only the most semantically divergent sentence boundaries become chunk boundaries, so the splitter produces fewer, larger chunks. At 97, even fewer splits.
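The percentile logic can be sketched in plain Python. This is an approximation of the idea under my own assumptions, not the library's code: percentile and breakpoints are hypothetical helpers, and the distances list is made-up data standing in for the embedding distance between each sentence window and the next.

```python
def percentile(values: list[float], pct: float) -> float:
    """Linear-interpolation percentile, with pct in [0, 100]."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def breakpoints(distances: list[float], threshold_pct: float) -> list[int]:
    """Indices i where the distance between window i and i+1 exceeds the percentile cutoff."""
    cutoff = percentile(distances, threshold_pct)
    return [i for i, d in enumerate(distances) if d > cutoff]


# Illustrative distances: two clear topic shifts (0.60 and 0.45) among small fluctuations
distances = [0.10, 0.12, 0.11, 0.60, 0.09, 0.13, 0.45, 0.10, 0.12, 0.11]
splits_50 = breakpoints(distances, 50)  # → [1, 3, 5, 6, 8]
splits_95 = breakpoints(distances, 95)  # → [3]
```

Raising the percentile keeps only the sharpest shift. Because the cutoff is relative to the document's own distribution of distances, the same setting adapts to both tightly focused and wide-ranging texts.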

The embed_model is OpenAIEmbedding(model="text-embedding-3-small"), initialized once in DocumentProcessor.__init__ and reused across all documents in the job.

Adaptive Sizing: Short vs. Long Content

The parameters above are not fixed; they switch based on estimated document length:

total_tokens = estimated_total_tokens if estimated_total_tokens is not None else len(text) // 4

if total_tokens > settings.long_content_threshold:   # default: 10,000 tokens
    stage1_chunk_size = settings.long_chunk_size              # 2048
    stage1_chunk_overlap = settings.long_chunk_overlap        # 200
    stage2_buffer_size = settings.long_buffer_size            # 3
    stage2_breakpoint_threshold = settings.long_breakpoint_threshold  # 97
else:
    stage1_chunk_size = 1024
    stage1_chunk_overlap = 200
    stage2_buffer_size = 1
    stage2_breakpoint_threshold = 95

The token estimate is len(text) // 4: one token per four characters, a common rule of thumb for English text. At 10,000 tokens the threshold is around 40,000 characters, or roughly 15-20 pages of dense text.

Why larger chunks for long content? Each call to SemanticSplitterNodeParser embeds every sentence in every Stage 1 node. A 100-page textbook at standard settings (chunk_size=1024) produces ~40 Stage 1 nodes, each of which the semantic splitter processes sentence-by-sentence, potentially hundreds of embedding API calls. At long-content settings (chunk_size=2048, buffer_size=3, threshold=97), the Stage 1 pass produces fewer, larger nodes, the semantic pass is less aggressive about splitting, and the total embedding count drops substantially.

The tradeoff is retrieval granularity: larger chunks are coarser, but for long documents the alternative is prohibitive API cost and latency.

All five parameters are configurable via environment variables, so the thresholds can be tuned without a code change.
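For readers who want to reproduce this without a settings library, here is a stdlib-only sketch. The field names and defaults mirror the values quoted above; the environment variable names (the upper-cased field names) and the ChunkingSettings class itself are my assumptions for illustration, not necessarily how the project wires it.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkingSettings:
    long_content_threshold: int = 10_000
    long_chunk_size: int = 2048
    long_chunk_overlap: int = 200
    long_buffer_size: int = 3
    long_breakpoint_threshold: int = 97

    @classmethod
    def from_env(cls, env=os.environ) -> "ChunkingSettings":
        """Read each field from an upper-cased variable, falling back to the default."""
        defaults = cls()
        return cls(**{
            name: int(env.get(name.upper(), getattr(defaults, name)))
            for name in cls.__dataclass_fields__
        })


# A plain dict stands in for the process environment here
settings = ChunkingSettings.from_env({"LONG_CONTENT_THRESHOLD": "20000"})
```

Passing the mapping in explicitly keeps the function easy to test; in production the default os.environ is used.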

The Fallback: Structural Splitting Only

If no OpenAI API key is provided, or if embedding model initialization fails, the semantic stage is skipped:

if self.embed_model is None:
    logger.warning("No embedding model provided, skipping semantic splitting")
    # Fallback to sentence splitter results
    semantic_nodes = initial_nodes
else:
    semantic_splitter = SemanticSplitterNodeParser(...)
    semantic_nodes = semantic_splitter.get_nodes_from_documents(...)

Stage 1 output is used as-is. The chunks are structurally clean (paragraph-respecting, size-bounded) but not semantically optimized. For development, testing, or cost-sensitive environments where embedding costs matter more than retrieval quality, this is a usable fallback.

Chunk Enrichment: Section Titles

After chunking, each TextChunk gets a section_title, a short label that tells the Q&A generator what the chunk is about. This improves Q&A quality: a chunk labeled “The Krebs Cycle” produces more focused questions than unlabeled prose.

Title assignment happens in append_section_title_to_chunks, with two priority levels:

Priority 1: structural heading extraction. _extract_structural_heading scans each node’s metadata and content for heading signals:

# 1. Check node metadata
if 'header' in node.metadata:
    return node.metadata['header']
if 'section' in node.metadata:
    return node.metadata['section']

# 2. Heuristics on first line
first_line = lines[0].strip()
if 3 < len(first_line) < 100:
    # Numbered section: "3.1 The Krebs Cycle", "Chapter 5"
    if re.match(r'^(\d+\.)*\d+\s+', first_line) or \
       re.match(r'^(Chapter|Section|Part)\s+\d+', first_line, re.IGNORECASE):
        return first_line
    # Standard academic keywords
    if re.match(r'^(Introduction|Conclusion|Abstract|Methods?|Results?|Discussion|...)\s*$',
                first_line, re.IGNORECASE):
        return first_line.strip()
    # Title case, ≤10 words
    if first_line.istitle() and len(first_line.split()) <= 10:
        return first_line
    # All caps, 3-10 words
    if first_line.isupper() and 3 <= len(first_line.split()) <= 10:
        return first_line

The title metadata key from LlamaIndex is intentionally filtered: if it equals "document" (the placeholder passed during wrapping), or if it looks like a filename or file path, it’s discarded.
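A filter along those lines might look like the sketch below. The helper name usable_title and the exact filename pattern are assumptions on my part; only the three rejection rules (placeholder, filename, path) come from the description above.

```python
import re

PLACEHOLDER_TITLE = "document"
# Assumed pattern: a short extension at the end of the string, e.g. "notes.pdf"
FILENAME_RE = re.compile(r"\.[A-Za-z0-9]{2,5}$")


def usable_title(title):
    """Return the cleaned title if it looks like a real heading, else None."""
    cleaned = (title or "").strip()
    if not cleaned:
        return None
    if cleaned.lower() == PLACEHOLDER_TITLE:
        return None  # placeholder passed during wrapping
    if "/" in cleaned or "\\" in cleaned or FILENAME_RE.search(cleaned):
        return None  # looks like a path or a filename
    return cleaned
```

These are heuristics with known blind spots: a legitimate heading such as "Input/Output" or "Version 2.10" would be rejected by this version, so the rules are worth tuning against real documents.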

Priority 2: LLM-generated title. When no structural heading is found, generate_chunk_title_with_llm calls GPT-3.5-turbo with the first 500 characters of the chunk:

response = self.openai_client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a concise summarizer. Generate only the title, nothing else."},
        {"role": "user",   "content": f"Generate a concise title (maximum 10 words)...\n\nText: {truncated_content}"}
    ],
    max_tokens=30,
    temperature=0.3,
    timeout=10.0
)

max_tokens=30 bounds the response. temperature=0.3 keeps the title consistent across runs, though not strictly deterministic. The first 500 characters are enough to capture the chunk’s topic without sending the full chunk, which would be wasteful for long chunks and isn’t needed for a title.

If the LLM returns a title longer than 15 words (the model occasionally ignores the 10-word instruction), it’s truncated to 10 words. If the LLM call fails after two retries, section_title is set to None and a logger.error is emitted.
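The post-processing and retry behaviour can be sketched independently of the OpenAI client. Both helpers below are hypothetical names, and I am assuming "two retries" means one initial attempt plus two more; the generate argument is any zero-argument callable that returns a raw title.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def clamp_title(title: str, trigger_words: int = 15, keep_words: int = 10) -> str:
    """Truncate to keep_words words, but only when the title exceeds trigger_words."""
    words = title.split()
    if len(words) > trigger_words:
        return " ".join(words[:keep_words])
    return title.strip()


def title_with_retries(generate: Callable[[], str], retries: int = 2) -> Optional[str]:
    """Call generate() once plus up to `retries` more times; return None if every attempt fails."""
    for attempt in range(retries + 1):
        try:
            return clamp_title(generate())
        except Exception as exc:
            logger.warning("title attempt %d failed: %s", attempt + 1, exc)
    logger.error("title generation failed after %d retries", retries)
    return None
```

Returning None instead of raising keeps a single bad chunk from failing the whole document; downstream code just has to treat section_title as optional, which the data model already does.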

The `TextChunk` Data Model

Every chunk coming out of the pipeline is a Pydantic model:

class TextChunk(BaseModel):
    content: str
    chunk_index: int
    page_number: Optional[int] = None
    section_title: Optional[str] = None
    token_count: Optional[int] = None
    document_id: int = 0
    document_path: str = ""
    filename: str = ""
    description: str = ""

token_count is the len(content) // 4 estimate computed in convert_chunks_to_text_chunks. document_id, document_path, filename, and description are populated by the Celery task after the chunk is returned from chunk_document; the processor itself doesn’t know about the database record, only the content.

The Complete Pipeline: `chunk_document`

The public entry point is chunk_document, which orchestrates the full sequence:

def chunk_document(self, document_path: str, original_filename: str) -> list[TextChunk]:
    # 1. Download from MinIO
    content, content_type = self.download_document(document_path)

    # 2. Extract text (PDF / DOCX / XLSX)
    file_ext = original_filename.lower().split('.')[-1]
    if file_ext == 'pdf':
        text, page_count = self.extract_text_from_pdf(content)
    elif file_ext == 'docx':
        text = self.extract_text_from_docx(content)
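    elif file_ext == 'xlsx':
        text = self.extract_text_from_xlsx(content)
    else:
        # Fail fast: otherwise `text` would be undefined in step 3
        raise ValueError(f"Unsupported file type: {file_ext!r}")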

    # 3. Two-stage semantic chunking
    semantic_chunks, structural_headings = self.semantic_chunk_text(text, document_title=original_filename)

    # 4. Wrap in TextChunk objects (adds token_count)
    chunks = self.convert_chunks_to_text_chunks(semantic_chunks)

    # 5. Enrich with section titles (structural heading → LLM fallback)
    self.append_section_title_to_chunks(chunks, structural_headings)

    return chunks

page_count from the PDF extractor is not currently propagated into TextChunk.page_number; that field is populated separately when the Celery task has per-page data available. For weblinks, semantic_chunk_text is called directly (bypassing chunk_document) with a pre-computed token estimate passed as estimated_total_tokens to avoid a redundant len(text) // 4 computation.

What I’d Do Differently

Cache the structural heading extraction result from Stage 1 into Stage 2. The current pipeline runs _extract_structural_heading on Stage 2 output, that is, on nodes that the semantic splitter may have merged or split relative to Stage 1 nodes. Headings that appeared at the start of a Stage 1 node may no longer appear at the start of the corresponding Stage 2 node. Passing heading metadata through the node pipeline (rather than re-extracting from content) would be more reliable.

Use a token counter instead of len(text) // 4. The character-to-token ratio varies significantly across languages and content types: code, Chinese text, and LaTeX all have different ratios. tiktoken with the cl100k_base encoding would give exact counts for GPT and embedding models at negligible cost.
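A possible shape for that change, sketched with a fallback so the pipeline still runs where tiktoken is not installed or its encoding file cannot be loaded. make_token_counter is a hypothetical helper, not existing project code.

```python
def make_token_counter(encoding_name: str = "cl100k_base"):
    """Return a str -> int function: exact with tiktoken, estimated otherwise."""
    try:
        import tiktoken
        encoding = tiktoken.get_encoding(encoding_name)
    except Exception:
        # tiktoken missing, or the encoding could not be loaded: keep the old estimate
        return lambda text: len(text) // 4
    # disallowed_special=() treats strings like "<|endoftext|>" in user documents as plain text
    return lambda text: len(encoding.encode(text, disallowed_special=()))


count_tokens = make_token_counter()
n = count_tokens("Fixed-size chunking splits text at arbitrary token boundaries.")
```

Building the counter once and reusing it matters, because loading the encoding is the expensive part; counting a string afterwards is cheap.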

Batch the LLM title calls. append_section_title_to_chunks calls generate_chunk_title_with_llm one chunk at a time in a loop. For a document with 40 chunks needing LLM titles, that’s 40 sequential API calls. A single prompt with all chunk previews, or a batch of parallel async calls, would reduce wall-clock time substantially.
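The parallel variant could look roughly like this. It is a sketch under assumptions: generate_titles is a hypothetical helper, fake_title stands in for a real async LLM call (for example one made through the OpenAI SDK's AsyncOpenAI client), and the concurrency limit of 5 is arbitrary.

```python
import asyncio


async def generate_titles(previews, generate_title, max_concurrency=5):
    """Run generate_title over all previews concurrently, preserving input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(preview):
        async with semaphore:  # cap in-flight requests to stay under rate limits
            try:
                return await generate_title(preview)
            except Exception:
                return None  # mirror the None-on-failure behaviour of the sequential code

    return await asyncio.gather(*(one(p) for p in previews))


async def fake_title(preview: str) -> str:
    # Stand-in for a real async LLM call
    await asyncio.sleep(0.01)
    return preview.split(".")[0][:40]


titles = asyncio.run(generate_titles(
    [
        "The Krebs cycle oxidises acetyl-CoA. It runs in the matrix.",
        "Light reactions occur in thylakoids.",
    ],
    fake_title,
))
```

asyncio.gather returns results in the order the coroutines were passed, so titles[i] still lines up with chunk i without any extra bookkeeping.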

Propagate page_number from the PDF extractor. PyMuPDF’s block-based extraction processes the document page by page. The page number is available during extraction but not carried into TextChunk. For Q&A generation, knowing the source page is useful for generating citations and for debugging retrieval quality.
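One way to carry the page through, sketched without PyMuPDF: record the character offset at which each page starts when the pages are joined, then map a chunk's start offset back to a page with a binary search. Both helpers are hypothetical, and the approach assumes each chunk's start offset in the joined text is known or can be recovered.

```python
from bisect import bisect_right


def join_pages(pages: list[str], sep: str = "\n\n") -> tuple[str, list[int]]:
    """Concatenate page texts and record the character offset where each page starts."""
    starts, offset = [], 0
    for page in pages:
        starts.append(offset)
        offset += len(page) + len(sep)
    return sep.join(pages), starts


def page_for_offset(starts: list[int], char_offset: int) -> int:
    """Return the 1-based page number whose text contains char_offset."""
    return bisect_right(starts, char_offset)


pages = ["Page one text.", "Page two text.", "Page three."]
text, starts = join_pages(pages)          # starts → [0, 16, 32]
chunk_start = text.find("Page two")       # → 16
page_number = page_for_offset(starts, chunk_start)  # → 2
```

A chunk that spans a page boundary gets the page it starts on here; storing a start and end page would be a natural extension.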

The two-stage approach costs one embedding pass per document at index time: the semantic stage embeds every sentence in every Stage 1 node. For a 50-page document on short-content settings, that’s on the order of a few hundred embedding vectors. The payoff is chunks that respect both document structure and semantic boundaries, which translates directly to fewer garbage retrievals when a user’s flashcard session asks the RAG pipeline for context.

The full implementation is part of LongTermMemory, an AI study platform built on FastAPI, LlamaIndex, Qdrant, and Laravel 12.
