How to Break a single MD file into multiple docs

# How to Extract Individual Files from COMBINED_ALL_DOCUMENTS_FINAL.md

## Overview

The file `COMBINED_ALL_DOCUMENTS_FINAL.md` contains all 11 markdown documents combined into one file with clear delimiters. This guide shows you how to extract them back into separate files.

## Delimiter Format

Each file is wrapped with clear delimiters:

“`

========== FILE_START: filename.md ==========

[file content here]

========== FILE_END: filename.md ==========

“`

## Method 1: Using PowerShell (Windows)

“`powershell

# Navigate to the av1 directory

cd <file location>

# Run this script to extract all files

$content = Get-Content “your_combainer .md” -Raw

$files = @()

$pattern = ‘========== FILE_START: (.+?) ==========\s*\n([\s\S]*?)\n========== FILE_END: \1 ==========’

$matches = [regex]::Matches($content, $pattern)

foreach ($match in $matches) {

    $filename = $match.Groups[1].Value

    $filecontent = $match.Groups[2].Value

    $filecontent | Out-File -FilePath $filename -Encoding UTF8

    Write-Host “Extracted: $filename”

}

Write-Host “Extraction complete! $($matches.Count) files extracted.”

“`

## Method 2: Using Python

“`python

import re

# Read the combined file

with open(‘your_file.md’, ‘r’, encoding=’utf-8′) as f:

    content = f.read()

# Extract each file

pattern = r’========== FILE_START: (.+?) ==========\s*\n([\s\S]*?)\n========== FILE_END: \1 ==========’

matches = re.findall(pattern, content)

for filename, filecontent in matches:

    with open(filename, ‘w’, encoding=’utf-8′) as f:

        f.write(filecontent)

    print(f’Extracted: {filename}’)

print(f’Extraction complete! {len(matches)} files extracted.’)

“`

## Method 3: Using Bash/Linux

“`bash

#!/bin/bash

# Read the combined file and extract each section

awk ‘

/^========== FILE_START:/ {

    filename = $3;

    getline;  # skip blank line

    output = 1;

    next;

}

/^========== FILE_END:/ {

    output = 0;

    close(filename);

    print “Extracted: ” filename > “/dev/stderr”;

    next;

}

output {

    print > filename;

}

‘ COMBINED_ALL_DOCUMENTS_FINAL.md

“`

## Method 4: Manual Extraction

If you prefer to extract files manually:

1. Open `COMBINED.md` in your text editor

2. Search for `========== FILE_START: filename.md ==========`

3. Copy everything between the START and END delimiters

4. Paste into a new file with the appropriate filename

5. Repeat for each file

## Verification

After extraction, verify the files:

**PowerShell:**

“`powershell

Get-ChildItem *.md | Select-Object Name, Length | Sort-Object Name

“`

**Bash:**

“`bash

wc -l *.md | sort -n

“`

**Expected Output:**

– x markdown files extracted

– Total lines should match the original files (approximately 3,600+ lines combined)

## Quick Extraction Script (Windows)

Save this as `extract_files.ps1` in the av1 directory:

“`powershell

$combined = Get-Content “COMBINED_ALL_DOCUMENTS_FINAL.md” -Raw

$pattern = ‘(?s)========== FILE_START: (.+?) ==========\r?\n(.+?)\r?\n========== FILE_END: \1 ==========’

[regex]::Matches($combined, $pattern) | ForEach-Object {

    $filename = $_.Groups[1].Value

    $content = $_.Groups[2].Value

    Set-Content -Path $filename -Value $content -Encoding UTF8

    Write-Host “✓ Extracted: $filename” -ForegroundColor Green

}

“`

Then run: `.\extract_files.ps1`

## Notes

– All files are UTF-8 encoded

– Line endings are preserved from the original files

– The combined file maintains all original formatting and content

– File delimiters are on their own lines for clean extraction

## Support

If you encounter issues:

1. Verify the delimiter format is intact

2. Check file encoding (should be UTF-8)

3. Ensure no special characters in filenames

4. Verify line ending consistency

## File Statistics

**Total combined file size:** ~3,600 lines

**Number of documents:** 11

**Format:** Markdown (.md)

**Encoding:** UTF-8

0 Comments

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

©2025 Luke Angel

Log in with your credentials

Forgot your details?

%d bloggers like this: