Git Performance Optimization for Large Repositories

Practical techniques for optimizing Git performance in large repositories, including shallow clones, sparse checkout, fsmonitor, commit graphs, and CI/CD-specific optimizations.

Overview

Git was designed for the Linux kernel, one of the largest open-source projects on the planet, but that does not mean every Git operation scales gracefully as your repository grows. When your repo accumulates tens of thousands of commits, hundreds of contributors, and gigabytes of tracked files, routine operations like git status, git fetch, and git log can grind to a halt. This article covers the practical techniques I have used over the past decade to keep Git fast in repositories that most developers would consider unwieldy, from monorepos serving dozens of microservices to legacy codebases with fifteen years of history.

Prerequisites

  • Git 2.38 or later (many features discussed require recent versions)
  • Basic familiarity with Git internals (objects, packfiles, refs)
  • A Unix-like shell or Git Bash on Windows
  • Node.js 18+ for the automation scripts
  • Optional: access to a CI/CD platform (GitHub Actions, GitLab CI, or Jenkins)

Diagnosing Slow Git Operations

Before optimizing anything, you need to measure. Git ships with built-in tracing that tells you exactly where time is being spent. Set the GIT_TRACE family of environment variables to get detailed timing breakdowns.

# Enable performance tracing
GIT_TRACE=1 GIT_TRACE_PERFORMANCE=1 git status

# Output looks like this:
# 14:23:01.234567 trace.c:234           performance: 0.001234567 s: git command: git status
# 14:23:01.234890 read-cache.c:2345     performance: 0.045123000 s: read cache .git/index
# 14:23:01.280013 preload-index.c:123   performance: 0.312456000 s: preload index
# 14:23:01.592469 diff-lib.c:567        performance: 0.089234000 s: diff-files
# 14:23:01.681703 dir.c:2890            performance: 0.234567000 s: read directory

In a large repository, the trace output typically shows read directory and preload index dominating the wall-clock time. Those are exactly the operations most of the optimizations below target.

For a more structured approach, use GIT_TRACE2_PERF:

# Write structured performance data to a file
GIT_TRACE2_PERF=/tmp/git-perf.log git fetch origin

# Parse the performance log
grep "\.total" /tmp/git-perf.log | sort -t'|' -k4 -rn | head -20

You can also write a quick Node.js script to benchmark Git operations programmatically:

var execSync = require("child_process").execSync;

var commands = [
    "git status",
    "git log --oneline -100",
    "git diff HEAD~10",
    "git branch -a",
    "git rev-list --count HEAD"
];

commands.forEach(function(cmd) {
    var start = process.hrtime.bigint();
    try {
        execSync(cmd, { cwd: process.argv[2] || ".", stdio: "pipe" });
    } catch (e) {
        // Command may fail in some repos, that is fine
    }
    var end = process.hrtime.bigint();
    var ms = Number(end - start) / 1e6;
    console.log(cmd.padEnd(30) + " " + ms.toFixed(1) + " ms");
});

Run it against a repository:

$ node benchmark-git.js /path/to/large-repo
git status                     1243.7 ms
git log --oneline -100         89.2 ms
git diff HEAD~10               312.4 ms
git branch -a                  45.1 ms
git rev-list --count HEAD      2341.8 ms

If git status takes more than 200 ms or git rev-list --count HEAD takes more than a second, you have optimization opportunities.
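
One-off timings like the script above are noisy. For repeatable numbers, a dedicated benchmarking tool helps; a quick option, if you have the third-party hyperfine tool installed, is to let it take the median of several warmed-up runs:

# Median-of-several-runs timing with warmup, much more stable than a single run
hyperfine --warmup 3 'git status' 'git rev-list --count HEAD'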


Shallow Clones and Partial Clones

The most immediate way to speed up Git is to simply download less data. Git supports two strategies for this: shallow clones and partial clones.

Shallow Clones

A shallow clone truncates history at a specified depth. You get the working tree and recent commits, but not the full DAG.

# Clone only the last 10 commits
git clone --depth 10 https://github.com/your-org/large-repo.git

# Check the size difference
du -sh large-repo/.git
# 45M    large-repo/.git

# Compare to a full clone
git clone https://github.com/your-org/large-repo.git large-repo-full
du -sh large-repo-full/.git
# 2.1G   large-repo-full/.git

Shallow clones are excellent for CI/CD, but they have limitations. You cannot run git log beyond the depth boundary, and some merge operations require fetching additional history. You can "unshallow" a clone when needed:

# Fetch remaining history when you need it
git fetch --unshallow
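
For CI jobs that build one specific commit, you can go a step further than --depth and fetch exactly that commit by hash, provided the server permits it (GitHub does; actions/checkout relies on this pattern internally). A sketch, where $COMMIT_SHA is whatever your CI system provides:

# Fetch a single commit into a fresh repository
git init build && cd build
git remote add origin https://github.com/your-org/large-repo.git
git fetch --depth 1 origin "$COMMIT_SHA"
git checkout --detach FETCH_HEAD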

Partial Clones

Partial clones, introduced in Git 2.22, are more sophisticated: the client asks the server to omit certain objects (typically blobs), and fetches them on demand when they are first accessed.

# Clone without downloading file contents (blobless clone)
git clone --filter=blob:none https://github.com/your-org/large-repo.git
# Clone time: ~15 seconds instead of ~8 minutes

# Clone without trees either (treeless clone - fastest initial clone)
git clone --filter=tree:0 https://github.com/your-org/large-repo.git
# Clone time: ~5 seconds

# Check what was actually downloaded
git rev-list --objects --all --missing=print | grep "^?" | wc -l
# 142857  (number of missing objects that will be fetched on demand)

The blobless clone is my recommendation for most development workflows. You get full commit history for git log and git blame, but file contents are fetched lazily. The first git checkout of a branch will trigger a fetch for any blobs you do not already have, but subsequent operations are fast because blobs get cached locally.
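
You can watch the lazy fetching happen by counting missing objects before and after a checkout (feature-branch here is a stand-in for any branch you have not touched yet):

# The missing-object count shrinks as blobs are faulted in
git rev-list --objects --all --missing=print | grep -c "^?"
git checkout feature-branch
git rev-list --objects --all --missing=print | grep -c "^?"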

# Configure a partial clone after the fact
git config remote.origin.promisor true
git config remote.origin.partialclonefilter "blob:none"

Sparse Checkout for Monorepos

If your repository is a monorepo with dozens of services, you probably do not need every directory checked out on every developer's machine. Sparse checkout lets you select which paths materialize in the working tree.

# Initialize sparse checkout (cone mode is faster)
git sparse-checkout init --cone

# Only check out specific directories
git sparse-checkout set services/api services/auth packages/shared

# Verify what is checked out
git sparse-checkout list
# services/api
# services/auth
# packages/shared

# Check the working tree size
find . -type f -not -path "./.git/*" | wc -l
# 1247  (instead of 45,000+ in the full repo)

Cone mode restricts patterns to directory-level matching, which is significantly faster than the older pattern-based sparse checkout because Git can skip entire subtrees without evaluating individual file patterns.
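
The sparse set is not fixed at clone time; you can grow or shrink it later without re-cloning (services/billing below is a placeholder directory):

# Add another directory to the sparse set
git sparse-checkout add services/billing

# Restore the full working tree when you need everything
git sparse-checkout disable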

Combine sparse checkout with partial clones for the fastest possible monorepo workflow:

# The ultimate fast monorepo clone
git clone \
  --filter=blob:none \
  --sparse \
  https://github.com/your-org/monorepo.git

cd monorepo
git sparse-checkout set services/my-service packages/shared

This gives you a working checkout in seconds instead of minutes, with only the files you actually need on disk.


Git GC and Pack Optimization

Git stores objects in two formats: loose objects (individual files in .git/objects) and packfiles (compressed archives). Over time, loose objects accumulate and packfiles become suboptimal. Running git gc consolidates and compresses everything.

# Check the current state of your repository
git count-objects -vH
# count: 12345
# size: 234.56 MiB
# in-pack: 567890
# packs: 47
# size-pack: 1.23 GiB
# prune-packable: 890
# garbage: 0
# size-garbage: 0 bytes

# Run aggressive garbage collection
git gc --aggressive --prune=now

# Check again
git count-objects -vH
# count: 0
# size: 0 bytes
# in-pack: 580235
# packs: 1
# size-pack: 987.65 MiB
# prune-packable: 0
# garbage: 0
# size-garbage: 0 bytes

The --aggressive flag tells Git to spend more time optimizing the delta chain, resulting in smaller packfiles. Use it sparingly since it is CPU-intensive, but it is worth running periodically on large repos.

Tune pack settings for large repositories:

# Increase pack window for better delta compression
git config pack.window 25
git config pack.depth 50

# Use multiple threads for packing
git config pack.threads 0  # 0 = auto-detect CPU count

# Increase the pack size limit to avoid excessive pack splitting
git config pack.packSizeLimit 2g

# Enable bitmap indexes for faster counting
git repack -adb

The bitmap index is particularly impactful. It pre-computes reachability information so that operations like git rev-list --count and git clone (server-side) become dramatically faster.
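
A rough spot check that the bitmap is paying off: git rev-list only consults bitmaps when asked, so you can compare the two code paths directly.

# With a bitmap present, this should return almost instantly
time git rev-list --count --use-bitmap-index HEAD

# The ordinary commit-walking path, for comparison
time git rev-list --count HEAD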


Fsmonitor and Filesystem Cache

The fsmonitor feature integrates Git with a filesystem watcher (either a hook such as Facebook's Watchman, or the FSMonitor daemon built into Git 2.37 and later) so that git status does not have to stat every file in the working tree.

# Enable the built-in FSMonitor daemon (Git 2.37+)
git config core.fsmonitor true

# Or use Facebook's Watchman
git config core.fsmonitor "$HOME/.git-watchman-hook.sh"

# Enable the untracked cache alongside fsmonitor
git config core.untrackedcache true

# Benchmark the difference
time git status  # Before: 1.2s
time git status  # After:  0.08s

The impact is enormous on repositories with tens of thousands of files. Instead of calling lstat() on every tracked file, Git asks the filesystem watcher "what changed since the last time I checked?" and only examines those files.

On Windows specifically, the built-in FSMonitor daemon is a game-changer. Windows filesystem operations are notoriously slower than Linux, and fsmonitor eliminates most of them:

# Windows-specific: enable the built-in fsmonitor daemon
git config core.fsmonitor true
git fsmonitor--daemon start

# Verify it is running
git fsmonitor--daemon status
# fsmonitor-daemon is watching "C:/dev/large-repo"

Commit Graph and Multi-Pack Index

The commit-graph file is a pre-computed data structure that accelerates commit traversal. Operations like git log, git merge-base, and git branch --contains benefit dramatically.

# Generate the commit graph
git commit-graph write --reachable --changed-paths

# Enable it globally
git config core.commitGraph true
git config fetch.writeCommitGraph true

# Benchmark git log before and after
time git log --oneline -1000 > /dev/null
# Before: 0.89s
# After:  0.12s

# The changed-paths Bloom filter accelerates path-limited log
time git log --oneline -- packages/api/src/server.js
# Before: 4.2s
# After:  0.3s

The --changed-paths flag generates Bloom filters that allow Git to skip commits that definitely did not touch a given path. This makes git log -- <path> up to 10x faster in large repositories.

The multi-pack index (MIDX) is the packfile equivalent. Instead of searching through dozens of packfiles, Git consults a single index:

# Write a multi-pack index
git multi-pack-index write

# Enable bitmap support for the MIDX
git multi-pack-index write --bitmap

# Repack using the MIDX for better performance
git multi-pack-index repack --batch-size=512m
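
After writing or repacking, it is worth sanity-checking the result and letting Git drop packs the MIDX has made redundant:

# Verify the MIDX, then expire packs it no longer references
git multi-pack-index verify
git multi-pack-index expire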

.gitattributes for Large Files

Tracking large binary files directly in Git bloats the repository and slows down every operation. Use Git LFS for binaries and configure .gitattributes to mark files appropriately.

# Install Git LFS
git lfs install

# Track binary file types
git lfs track "*.png"
git lfs track "*.jpg"
git lfs track "*.zip"
git lfs track "*.woff2"
git lfs track "*.sqlite"

Your .gitattributes file should also mark files so that Git does not try to diff or merge them as if they were text:

# Binary files - do not diff
*.png binary
*.jpg binary
*.gif binary
*.ico binary
*.woff binary
*.woff2 binary
*.ttf binary
*.eot binary

# Large generated files - do not diff
package-lock.json -diff linguist-generated
yarn.lock -diff linguist-generated

# Git LFS tracking
*.png filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.sqlite filter=lfs diff=lfs merge=lfs -text

# Ensure consistent line endings for text files
*.js text eol=lf
*.json text eol=lf
*.pug text eol=lf
*.css text eol=lf
*.md text eol=lf

Marking package-lock.json as -diff is a small but meaningful optimization. In a large Node.js project, that file can be 50,000+ lines, and generating diffs for it on every commit is wasteful.
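
The attribute only suppresses the diff by default. On the rare occasion you do need to inspect a lockfile change, you should be able to override it for a single command:

# Force a text diff despite the -diff attribute
git diff --text HEAD~1 -- package-lock.json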


Git Maintenance and Scheduled Optimization

Git 2.29 introduced git maintenance, a built-in task scheduler that runs optimization tasks automatically so you never have to think about git gc again.

# Register the current repository for maintenance
git maintenance register

# Start the maintenance scheduler
git maintenance start

# Check the configured tasks
git config --get-regexp maintenance

# maintenance.auto=false
# maintenance.strategy=incremental

The incremental strategy runs these tasks on a schedule:

Task                 Frequency   What It Does
commit-graph         Hourly      Updates the commit-graph file
prefetch             Hourly      Fetches from remotes in the background
loose-objects        Daily       Packs loose objects
incremental-repack   Daily       Consolidates packfiles
pack-refs            Daily       Packs loose refs

You can customize the schedule:

# Run gc weekly instead of letting it trigger automatically
git config gc.auto 0
git config maintenance.gc.schedule weekly

# Customize which tasks run
git config maintenance.commit-graph.enabled true
git config maintenance.prefetch.enabled true
git config maintenance.loose-objects.enabled true
git config maintenance.incremental-repack.enabled true

For a Node.js project with frequent dependency updates, I recommend also running a periodic repack:

# Add to crontab or Task Scheduler
# Weekly full repack at 3am Sunday (--geometric conflicts with -a, so use one or the other)
0 3 * * 0 cd /path/to/repo && git repack -adb

Reducing Clone Time in CI/CD Pipelines

CI/CD pipelines clone your repository on every run. For large repos, this can add minutes to every build. Here are the techniques I use to minimize clone time.

GitHub Actions

# .github/workflows/build.yml
name: Build
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Shallow clone with fetch-depth
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1  # Only the latest commit

      # Alternative for monorepos: sparse checkout instead of the step above
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1
          sparse-checkout: |
            services/api
            packages/shared
          sparse-checkout-cone-mode: true

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'
          cache-dependency-path: services/api/package-lock.json

      - run: npm ci --prefix services/api
      - run: npm test --prefix services/api

GitLab CI

# .gitlab-ci.yml
variables:
  GIT_DEPTH: 1
  GIT_STRATEGY: fetch  # Reuse existing clone if available

build:
  stage: build
  script:
    - npm ci
    - npm run build
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - node_modules/

Jenkins

// Jenkinsfile
pipeline {
    agent any
    options {
        // Skip the implicit checkout so we control clone depth ourselves
        skipDefaultCheckout(true)
    }
    stages {
        stage('Checkout') {
            steps {
                // Shallow, sparse checkout
                checkout([$class: 'GitSCM',
                    branches: [[name: '*/main']],
                    extensions: [
                        [$class: 'CloneOption', depth: 1, shallow: true],
                        [$class: 'SparseCheckoutPaths',
                            sparseCheckoutPaths: [
                                [$class: 'SparseCheckoutPath', path: 'services/api/'],
                                [$class: 'SparseCheckoutPath', path: 'packages/shared/']
                            ]
                        ]
                    ],
                    userRemoteConfigs: [[url: env.GIT_URL]]
                ])
            }
        }
        stage('Build') {
            steps {
                sh 'npm ci'
                sh 'npm run build'
            }
        }
    }
}

Using a Reference Clone

If you run many CI jobs on the same machine (or persistent runners), maintain a mirror and use --reference:

# One-time setup: create a bare mirror
git clone --mirror https://github.com/your-org/large-repo.git /opt/git-mirrors/large-repo.git

# Update the mirror periodically (cron every 5 minutes)
cd /opt/git-mirrors/large-repo.git && git remote update

# Clone using the reference (objects are borrowed via the alternates mechanism)
git clone --reference /opt/git-mirrors/large-repo.git \
    --dissociate \
    https://github.com/your-org/large-repo.git /workspace/build

The --reference flag tells Git to use objects from the mirror instead of downloading them. The --dissociate flag copies the referenced objects so the clone is self-contained. This can reduce clone time from minutes to seconds.
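
On persistent runners, something like this idempotent wrapper keeps the mirror fresh without a separate cron job (the paths and the $WORKSPACE variable are illustrative):

#!/usr/bin/env sh
MIRROR=/opt/git-mirrors/large-repo.git
URL=https://github.com/your-org/large-repo.git

# Create the mirror on first use, update it on every later run
if [ ! -d "$MIRROR" ]; then
    git clone --mirror "$URL" "$MIRROR"
else
    git -C "$MIRROR" remote update --prune
fi

# Reference clone into the job workspace
git clone --reference "$MIRROR" --dissociate "$URL" "$WORKSPACE/build"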


Measuring Git Performance with GIT_TRACE

I mentioned GIT_TRACE at the beginning, but it deserves a deeper look. There are several trace categories, each revealing different aspects of Git's behavior.

# Trace all Git commands
GIT_TRACE=1 git fetch

# Trace packfile negotiation (useful for slow fetches)
GIT_TRACE_PACKET=1 git fetch origin main

# Trace object access patterns
GIT_TRACE_PACK_ACCESS=1 git log --oneline -50

# Trace setup time (config loading, repo discovery)
GIT_TRACE_SETUP=1 git status

# Combine multiple traces
GIT_TRACE=1 \
GIT_TRACE_PERFORMANCE=1 \
GIT_TRACE_PACK_ACCESS=/tmp/pack-trace.log \
git status

For automated performance monitoring, write the traces to files and parse them:

var execSync = require("child_process").execSync;
var fs = require("fs");
var path = require("path");

function measureGitPerformance(repoPath, commands) {
    var results = {};
    var traceFile = path.join(repoPath, ".git", "perf-trace.log");

    commands.forEach(function(cmd) {
        var env = Object.assign({}, process.env, {
            GIT_TRACE2_PERF: traceFile
        });

        // Clear previous trace
        try { fs.unlinkSync(traceFile); } catch (e) { /* ignore */ }

        var start = Date.now();
        try {
            execSync(cmd, {
                cwd: repoPath,
                env: env,
                stdio: "pipe",
                timeout: 30000
            });
        } catch (e) {
            // Some commands may fail, record the timing anyway
        }
        var elapsed = Date.now() - start;

        // Parse the trace file for detailed breakdown
        var breakdown = {};
        try {
            var trace = fs.readFileSync(traceFile, "utf8");
            var lines = trace.split("\n");
            lines.forEach(function(line) {
                var match = line.match(/\|(\w+)\|.*\|(\d+\.\d+)\|/);
                if (match) {
                    var category = match[1];
                    var seconds = parseFloat(match[2]);
                    breakdown[category] = (breakdown[category] || 0) + seconds;
                }
            });
        } catch (e) {
            // Trace file may not exist
        }

        results[cmd] = {
            totalMs: elapsed,
            breakdown: breakdown
        };
    });

    return results;
}

var repoPath = process.argv[2] || ".";
var perf = measureGitPerformance(repoPath, [
    "git status",
    "git log --oneline -100",
    "git diff HEAD~5",
    "git branch -a --contains HEAD"
]);

console.log(JSON.stringify(perf, null, 2));

Run it and you get output like:

$ node measure-git.js /path/to/large-repo
{
  "git status": {
    "totalMs": 1247,
    "breakdown": {
      "read_cache": 0.045,
      "preload_index": 0.312,
      "diff_files": 0.089,
      "read_directory": 0.734
    }
  },
  "git log --oneline -100": {
    "totalMs": 91,
    "breakdown": {
      "commit_graph": 0.003,
      "traverse_commits": 0.078
    }
  }
}

Complete Working Example

Here is a comprehensive Git performance optimization script that diagnoses issues, applies configuration, and sets up scheduled maintenance for a large Node.js monorepo. Save this as optimize-git.js in your repository root.

#!/usr/bin/env node

var execSync = require("child_process").execSync;
var fs = require("fs");
var path = require("path");
var os = require("os");

// ============================================================
// Git Performance Optimization Script
// For large Node.js monorepos
// ============================================================

var REPO_PATH = process.argv[2] || process.cwd();
var VERBOSE = process.argv.indexOf("--verbose") !== -1;
var DRY_RUN = process.argv.indexOf("--dry-run") !== -1;

function log(msg) {
    console.log("[git-optimize] " + msg);
}

function warn(msg) {
    console.log("[git-optimize] WARNING: " + msg);
}

function exec(cmd, opts) {
    opts = opts || {};
    var execOpts = {
        cwd: opts.cwd || REPO_PATH,
        stdio: VERBOSE ? "inherit" : "pipe",
        timeout: opts.timeout || 60000,
        encoding: "utf8"
    };

    if (DRY_RUN && !opts.readOnly) {
        log("  DRY RUN: " + cmd);
        return "";
    }

    try {
        // stdio "inherit" makes execSync return null, so guard before toString()
        var out = execSync(cmd, execOpts);
        return out ? out.toString().trim() : "";
    } catch (e) {
        if (opts.ignoreError) {
            return e.stdout ? e.stdout.toString().trim() : "";
        }
        throw e;
    }
}

// ============================================================
// Step 1: Diagnose Current State
// ============================================================

function diagnose() {
    log("=== DIAGNOSIS ===");

    // Git version
    var version = exec("git --version", { readOnly: true });
    log("Git version: " + version);

    var versionMatch = version.match(/(\d+)\.(\d+)/);
    var major = parseInt(versionMatch[1], 10);
    var minor = parseInt(versionMatch[2], 10);

    if (major < 2 || (major === 2 && minor < 38)) {
        warn("Git version is below 2.38. Some optimizations require newer Git.");
    }

    // Repository size
    var objects = exec("git count-objects -vH", { readOnly: true });
    log("Object statistics:\n" + objects);

    // Number of tracked files
    var fileCount = exec("git ls-files | wc -l", { readOnly: true }).trim();
    log("Tracked files: " + fileCount);

    // Number of commits
    var commitCount = exec("git rev-list --count HEAD", { readOnly: true });
    log("Total commits: " + commitCount);

    // Number of branches
    var branchCount = exec("git branch -a | wc -l", { readOnly: true }).trim();
    log("Total branches (local + remote): " + branchCount);

    // Number of pack files
    var packDir = path.join(REPO_PATH, ".git", "objects", "pack");
    var packCount = 0;
    try {
        var packFiles = fs.readdirSync(packDir).filter(function(f) {
            return f.endsWith(".pack");
        });
        packCount = packFiles.length;
    } catch (e) {
        // Not a standard repo layout
    }
    log("Pack files: " + packCount);

    // Benchmark key operations
    log("\n=== BENCHMARKS ===");
    var benchmarks = [
        { cmd: "git status", label: "git status" },
        { cmd: "git log --oneline -100", label: "git log (100 commits)" },
        { cmd: "git diff HEAD~1", label: "git diff (last commit)" },
        { cmd: "git branch --contains HEAD", label: "git branch --contains" }
    ];

    var timings = {};
    benchmarks.forEach(function(b) {
        var start = Date.now();
        exec(b.cmd, { readOnly: true, ignoreError: true });
        var elapsed = Date.now() - start;
        timings[b.label] = elapsed;
        log("  " + b.label.padEnd(30) + elapsed + " ms");
    });

    // Issue detection
    log("\n=== ISSUES DETECTED ===");
    var issues = [];

    if (timings["git status"] > 500) {
        issues.push("git status is slow (" + timings["git status"] + "ms). Enable fsmonitor.");
    }

    if (packCount > 10) {
        issues.push("Too many pack files (" + packCount + "). Run git repack.");
    }

    if (parseInt(fileCount, 10) > 10000) {
        issues.push("Large number of tracked files (" + fileCount + "). Consider sparse checkout.");
    }

    var commitGraphExists = fs.existsSync(
        path.join(REPO_PATH, ".git", "objects", "info", "commit-graph")
    ) || fs.existsSync(
        path.join(REPO_PATH, ".git", "objects", "info", "commit-graphs")
    );

    if (!commitGraphExists) {
        issues.push("No commit-graph file found. Generating one will speed up log operations.");
    }

    if (issues.length === 0) {
        log("  No major issues detected.");
    } else {
        issues.forEach(function(issue) {
            log("  - " + issue);
        });
    }

    return {
        fileCount: parseInt(fileCount, 10),
        commitCount: parseInt(commitCount, 10),
        packCount: packCount,
        timings: timings,
        issues: issues,
        gitMajor: major,
        gitMinor: minor,
        hasCommitGraph: commitGraphExists
    };
}

// ============================================================
// Step 2: Apply Optimizations
// ============================================================

function optimize(diagnosis) {
    log("\n=== APPLYING OPTIMIZATIONS ===");

    // Core settings
    log("Configuring core Git settings...");
    exec("git config core.preloadindex true");
    exec("git config core.fscache true");
    exec("git config core.untrackedcache true");

    // Enable fsmonitor if Git supports it
    if (diagnosis.gitMajor > 2 || (diagnosis.gitMajor === 2 && diagnosis.gitMinor >= 37)) {
        log("Enabling built-in FSMonitor daemon...");
        exec("git config core.fsmonitor true");
    } else {
        log("Skipping FSMonitor (requires Git 2.37+).");
    }

    // Commit graph
    if (!diagnosis.hasCommitGraph) {
        log("Writing commit-graph with Bloom filters...");
        exec("git commit-graph write --reachable --changed-paths", { timeout: 300000 });
    }
    exec("git config core.commitGraph true");
    exec("git config fetch.writeCommitGraph true");
    exec("git config gc.writeCommitGraph true");

    // Pack optimization
    log("Configuring pack settings...");
    exec("git config pack.threads 0");
    exec("git config pack.window 25");
    exec("git config pack.depth 50");
    exec("git config pack.useSparse true");

    // Fetch optimization
    log("Configuring fetch settings...");
    exec("git config fetch.parallel 4");
    exec("git config fetch.prune true");
    exec("git config fetch.pruneTags true");

    // Index optimization
    exec("git config index.version 4");
    exec("git config feature.manyFiles true");

    // Repack if too many pack files
    if (diagnosis.packCount > 10) {
        log("Repacking objects (this may take a while)...");
        exec("git repack -adb --geometric=2", { timeout: 600000 });
        log("Writing multi-pack index...");
        exec("git multi-pack-index write", { ignoreError: true });
    }

    // GC settings
    exec("git config gc.auto 256");
    exec("git config gc.autoPackLimit 10");

    log("Optimizations applied successfully.");
}

// ============================================================
// Step 3: Setup Maintenance Schedule
// ============================================================

function setupMaintenance(diagnosis) {
    log("\n=== SETTING UP MAINTENANCE ===");

    if (diagnosis.gitMajor < 2 || (diagnosis.gitMajor === 2 && diagnosis.gitMinor < 29)) {
        warn("git maintenance requires Git 2.29+. Skipping.");
        return;
    }

    exec("git maintenance register", { ignoreError: true });

    // Configure incremental strategy
    exec("git config maintenance.strategy incremental");
    exec("git config maintenance.auto false");

    // Enable specific tasks
    exec("git config maintenance.commit-graph.enabled true");
    exec("git config maintenance.commit-graph.schedule hourly");
    exec("git config maintenance.prefetch.enabled true");
    exec("git config maintenance.prefetch.schedule hourly");
    exec("git config maintenance.loose-objects.enabled true");
    exec("git config maintenance.loose-objects.schedule daily");
    exec("git config maintenance.incremental-repack.enabled true");
    exec("git config maintenance.incremental-repack.schedule daily");
    exec("git config maintenance.pack-refs.enabled true");

    // Start the scheduler
    exec("git maintenance start", { ignoreError: true });

    log("Maintenance scheduler configured and started.");
}

// ============================================================
// Step 4: Generate .gitattributes recommendations
// ============================================================

function checkGitattributes() {
    log("\n=== .GITATTRIBUTES CHECK ===");

    var gitattributesPath = path.join(REPO_PATH, ".gitattributes");
    var hasGitattributes = fs.existsSync(gitattributesPath);

    if (!hasGitattributes) {
        log("No .gitattributes file found. Creating recommended configuration...");
        var content = [
            "# Auto-detect text files and normalize line endings",
            "* text=auto",
            "",
            "# JavaScript and web files",
            "*.js text eol=lf",
            "*.jsx text eol=lf",
            "*.ts text eol=lf",
            "*.tsx text eol=lf",
            "*.json text eol=lf",
            "*.css text eol=lf",
            "*.scss text eol=lf",
            "*.html text eol=lf",
            "*.pug text eol=lf",
            "*.md text eol=lf",
            "*.yml text eol=lf",
            "*.yaml text eol=lf",
            "",
            "# Prevent noisy diffs for generated files",
            "package-lock.json -diff linguist-generated",
            "yarn.lock -diff linguist-generated",
            "",
            "# Binary files",
            "*.png binary",
            "*.jpg binary",
            "*.jpeg binary",
            "*.gif binary",
            "*.ico binary",
            "*.woff binary",
            "*.woff2 binary",
            "*.ttf binary",
            "*.eot binary",
            "*.zip binary",
            "*.gz binary",
            "*.tar binary",
            ""
        ].join("\n");

        if (!DRY_RUN) {
            fs.writeFileSync(gitattributesPath, content);
            log("Created .gitattributes with recommended settings.");
        } else {
            log("DRY RUN: Would create .gitattributes");
        }
    } else {
        var existing = fs.readFileSync(gitattributesPath, "utf8");
        if (existing.indexOf("package-lock.json") === -1) {
            log("Consider adding: package-lock.json -diff linguist-generated");
        }
        if (existing.indexOf("binary") === -1) {
            log("Consider adding binary file markers for images and fonts.");
        }
        log(".gitattributes exists and looks reasonable.");
    }
}

// ============================================================
// Step 5: Post-optimization benchmark
// ============================================================

function postBenchmark() {
    log("\n=== POST-OPTIMIZATION BENCHMARKS ===");

    var benchmarks = [
        { cmd: "git status", label: "git status" },
        { cmd: "git log --oneline -100", label: "git log (100 commits)" },
        { cmd: "git diff HEAD~1", label: "git diff (last commit)" }
    ];

    benchmarks.forEach(function(b) {
        var start = Date.now();
        exec(b.cmd, { readOnly: true, ignoreError: true });
        var elapsed = Date.now() - start;
        log("  " + b.label.padEnd(30) + elapsed + " ms");
    });
}

// ============================================================
// Main
// ============================================================

function main() {
    log("Git Performance Optimization Tool");
    log("Repository: " + REPO_PATH);
    log("Mode: " + (DRY_RUN ? "DRY RUN" : "LIVE"));
    log("");

    // Verify we are in a git repository
    try {
        exec("git rev-parse --is-inside-work-tree", { readOnly: true });
    } catch (e) {
        console.error("Error: " + REPO_PATH + " is not a Git repository.");
        process.exit(1);
    }

    var diagnosis = diagnose();
    optimize(diagnosis);
    setupMaintenance(diagnosis);
    checkGitattributes();
    postBenchmark();

    log("\n=== DONE ===");
    log("Run with --verbose for detailed output.");
    log("Run with --dry-run to preview changes without applying them.");
}

main();

Run it:

# Preview changes without applying
node optimize-git.js /path/to/large-repo --dry-run

# Apply all optimizations
node optimize-git.js /path/to/large-repo

# Apply with detailed output
node optimize-git.js /path/to/large-repo --verbose

Expected output for a large repository:

[git-optimize] Git Performance Optimization Tool
[git-optimize] Repository: /home/dev/monorepo
[git-optimize] Mode: LIVE

[git-optimize] === DIAGNOSIS ===
[git-optimize] Git version: git version 2.43.0
[git-optimize] Tracked files: 47832
[git-optimize] Total commits: 28491
[git-optimize] Total branches (local + remote): 142
[git-optimize] Pack files: 23

[git-optimize] === BENCHMARKS ===
[git-optimize]   git status                    1247 ms
[git-optimize]   git log (100 commits)         89 ms
[git-optimize]   git diff (last commit)        45 ms
[git-optimize]   git branch --contains         3421 ms

[git-optimize] === ISSUES DETECTED ===
[git-optimize]   - git status is slow (1247ms). Enable fsmonitor.
[git-optimize]   - Too many pack files (23). Run git repack.
[git-optimize]   - Large number of tracked files (47832). Consider sparse checkout.
[git-optimize]   - No commit-graph file found.

[git-optimize] === APPLYING OPTIMIZATIONS ===
[git-optimize] Configuring core Git settings...
[git-optimize] Enabling built-in FSMonitor daemon...
[git-optimize] Writing commit-graph with Bloom filters...
[git-optimize] Configuring pack settings...
[git-optimize] Configuring fetch settings...
[git-optimize] Repacking objects (this may take a while)...
[git-optimize] Writing multi-pack index...
[git-optimize] Optimizations applied successfully.

[git-optimize] === POST-OPTIMIZATION BENCHMARKS ===
[git-optimize]   git status                    87 ms
[git-optimize]   git log (100 commits)         12 ms
[git-optimize]   git diff (last commit)        23 ms

[git-optimize] === DONE ===

Common Issues and Troubleshooting

1. Shallow Clone Breaks Git Operations

fatal: shallow file has changed since we read it

This happens when a background process (like git maintenance or IDE integrations) modifies the shallow boundary while another operation is running. The fix is to unshallow or increase the depth:

# Fix by fetching more history
git fetch --deepen=100

# Or unshallow entirely
git fetch --unshallow

# Prevent by disabling maintenance on shallow clones
git config maintenance.auto false

2. FSMonitor Daemon Crashes or Becomes Unresponsive

warning: could not open directory '.git/fsmonitor--daemon': No such file or directory
error: fsmonitor--daemon is not running

The FSMonitor daemon can crash if the repository is on a network drive or if the filesystem does not support inotify (Linux) or FSEvents (macOS). Fix it by restarting or falling back to the untracked cache:

# Restart the daemon
git fsmonitor--daemon stop
git fsmonitor--daemon start

# If it keeps crashing, disable it and use untracked cache instead
git config --unset core.fsmonitor
git config core.untrackedcache true

# On Linux, check inotify limits
cat /proc/sys/fs/inotify/max_user_watches
# If too low:
echo 524288 | sudo tee /proc/sys/fs/inotify/max_user_watches
echo 524288 | sudo tee /proc/sys/fs/inotify/max_user_instances

3. Commit-Graph Becomes Corrupted

error: commit-graph requires overflow generation data but has none
warning: commit-graph has generation number zero for commit <hash>

Corruption can happen after a crash during git commit-graph write. The fix is straightforward:

# Remove the corrupted commit-graph
rm -f .git/objects/info/commit-graph
rm -rf .git/objects/info/commit-graphs

# Regenerate it
git commit-graph write --reachable --changed-paths

# Verify integrity
git commit-graph verify

4. Partial Clone Fetch Storms

remote: Enumerating objects: 12345, done.
Receiving objects: 100% (12345/12345), 234.56 MiB | 1.23 MiB/s

When you check out a branch that touches many files in a partial clone, Git has to fetch every missing blob, and that burst of on-demand fetching can be slower than having downloaded everything up front. Mitigate this by pre-fetching:

# Pre-fetch missing blobs for a branch before switching to it
git fetch origin feature-branch
git rev-list --objects --missing=print HEAD..origin/feature-branch | \
    grep "^?" | cut -c2- | \
    git cat-file --batch > /dev/null

# Or configure a size-based filter so small blobs are fetched eagerly
git config remote.origin.partialclonefilter "blob:limit=1m"

5. Multi-Pack Index Errors After Manual Repack

error: multi-pack-index file exists, but failed to parse
fatal: multi-pack-index required, but not found

If you run git repack manually while a multi-pack index exists, the MIDX can become stale. Always regenerate it after repacking:

# Remove stale MIDX
rm -f .git/objects/pack/multi-pack-index*

# Repack and regenerate
git repack -adb
git multi-pack-index write --bitmap

Best Practices

  • Measure before optimizing. Run GIT_TRACE_PERFORMANCE=1 git status before changing any configuration. A 50ms git status does not need fsmonitor. Focus your effort where the numbers tell you to.

  • Use partial clones for development, shallow clones for CI. Partial clones preserve full history for git log and git blame while keeping the initial clone fast. Shallow clones are better for CI because you rarely need history beyond the current commit.

  • Enable the commit-graph with Bloom filters on every repository. There is virtually no downside and it makes git log -- <path> dramatically faster. Set fetch.writeCommitGraph=true globally so it stays updated automatically.

  • Mark package-lock.json as -diff in .gitattributes. In a Node.js project, this single line eliminates the most common source of noisy, slow diffs. The file is machine-generated and should not be reviewed line-by-line anyway.

  • Set up git maintenance and forget about manual git gc. The incremental strategy handles commit-graph updates, prefetch, loose object packing, and repacking automatically. It runs in the background and has negligible performance impact.

  • Use sparse checkout in monorepos with more than 5 services. If each developer only works on one or two services, there is no reason to have 50,000 files in their working tree. Sparse checkout with cone mode keeps git status fast and reduces disk usage.

  • Keep your Git version current. Nearly every major Git release in the past three years has included performance improvements. Git 2.38 alone brought a 30% improvement in git status on Windows. If you are stuck on Git 2.25, you are leaving performance on the table.

  • Avoid tracking large binary files directly. Use Git LFS or store binaries in an external artifact repository. A single 100MB binary in your history will slow down every clone forever, even if you delete it in a later commit.

  • Configure index.version 4 for large repositories. Version 4 of the index format uses path compression, reducing index file size by 30-50% for repositories with deep directory structures. This directly speeds up every operation that reads the index; a snippet for upgrading an existing index in place follows this list.

  • Profile CI pipeline clone times quarterly. Clone time creeps up as repositories grow. Set a performance budget (for example, "clone must complete in under 30 seconds") and monitor it. When you exceed the budget, apply the techniques from this article.
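
To apply the index.version recommendation without waiting for Git's next index rewrite, upgrade in place and confirm the result from the file header; the version number lives in bytes 5 through 8 of .git/index.

# Header before: "DIRC" signature, then format version 2
xxd -l 8 .git/index
# 00000000: 4449 5243 0000 0002                      DIRC....

# Convert the on-disk index to version 4 immediately
git update-index --index-version 4

xxd -l 8 .git/index
# 00000000: 4449 5243 0000 0004                      DIRC....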

