Parsing WordPress Export Files in the Browser
Looking for expert solutions in AI, Web Applications, APIs, or blockchain development?
Request a Free ConsultationParsing WordPress Export Files in the Browser: A Stylish JavaScript Tool with Clear Guidance
WordPress’s XML export (WXR) is a fantastic way to back up or analyze your site’s content, but transforming that XML into a CSV file often feels like a chore reserved for server-side wizards. What if you could do it right in your browser with a tool that’s both powerful and visually appealing? In this post, I’ll walk you through a JavaScript solution that parses WordPress XML exports, generates a CSV with URLs, titles, types, and HubSpot module detection, and features a sleek design with step-by-step instructions. I’ve also shared it on GitHub for you to try out!
Why a Browser-Based Tool?
Server-side scripts like Python or Node.js are great for heavy lifting, but a browser-based approach offers distinct perks:
- Zero Setup: Open an HTML file, and you’re ready—no dependencies needed.
- User-Friendly: A polished UI with clear directions welcomes users of all levels.
- Portability: Share it as a single file or host it anywhere.
This tool delivers:
- Parsing of WordPress XML exports.
- Extraction of URLs, titles, and types for published posts and pages.
- Detection of HubSpot modules in content.
- A downloadable CSV with the results.
The Code
Here’s the full solution, styled with CSS to look sharp and intuitive:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>WordPress XML Parser</title>
<style>
body {
font-family: 'Arial', sans-serif;
background-color: #f4f7fa;
display: flex;
justify-content: center;
align-items: center;
min-height: 100vh;
margin: 0;
color: #333;
}
.container {
background: white;
padding: 2rem;
border-radius: 10px;
box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
width: 100%;
max-width: 500px;
text-align: center;
}
.container li {
text-align: left;
margin-left: 20px;
}
.container p {
font-weight: bold;
font-size: 1.2rem;
}
h1 {
font-size: 1.8rem;
margin-bottom: 1.5rem;
color: #2c3e50;
}
.file-input-wrapper {
position: relative;
margin-bottom: 1.5rem;
}
.file-input-wrapper input[type="file"] {
display: none;
}
.file-input-wrapper label {
display: inline-block;
padding: 0.75rem 1.5rem;
background-color: #3498db;
color: white;
border-radius: 5px;
cursor: pointer;
transition: background-color 0.3s;
}
.file-input-wrapper label:hover {
background-color: #2980b9;
}
button {
padding: 0.75rem 1.5rem;
background-color: #2ecc71;
color: white;
border: none;
border-radius: 5px;
cursor: pointer;
font-size: 1rem;
transition: background-color 0.3s;
}
button:hover {
background-color: #27ae60;
}
button:disabled {
background-color: #bdc3c7;
cursor: not-allowed;
}
#status {
margin-top: 1rem;
font-size: 0.9rem;
color: #7f8c8d;
}
.error {
color: #e74c3c;
}
</style>
</head>
<body>
<div class="container">
<h1>WordPress XML Parser</h1>
<div class="instructions">
<p>Extract URLs from your WordPress export file with ease:</p>
<ul>
<li>Export your site’s content from WordPress (Tools → Export → All Content).</li>
<li>Click "Choose XML File" to upload your .xml file.</li>
<li>Click "Parse XML" to process the file.</li>
<li>Download a CSV with URLs, titles, types (post/page), and HubSpot module detection.</li>
</ul>
</div>
<div class="file-input-wrapper">
<input type="file" id="xmlFile" accept=".xml">
<label for="xmlFile">Choose XML File</label>
</div>
<button onclick="parseXML()" id="parseButton" disabled>Parse XML</button>
<div id="status">Select a file to begin</div>
</div>
<script>
const fileInput = document.getElementById('xmlFile');
const parseButton = document.getElementById('parseButton');
const status = document.getElementById('status');
fileInput.addEventListener('change', () => {
if (fileInput.files.length > 0) {
parseButton.disabled = false;
status.textContent = `File selected: ${fileInput.files[0].name}`;
} else {
parseButton.disabled = true;
status.textContent = 'Select a file to begin';
}
});
function arrayToCSV(arr) {
return arr.map(item => `"${item.replace(/"/g, '""')}"`).join(',');
}
const hubspotPattern = /{%\s*module_block\s+module\s+["“][^"”]+["”]\s*%}.*?{%\s*end_module_block\s*%}/;
function parseXML() {
const file = fileInput.files[0];
if (!file) {
status.textContent = 'Please select an XML file';
status.classList.add('error');
return;
}
status.textContent = 'Parsing...';
status.classList.remove('error');
parseButton.disabled = true;
const reader = new FileReader();
reader.onload = function(e) {
const parser = new DOMParser();
const xmlDoc = parser.parseFromString(e.target.result, 'text/xml');
const parserError = xmlDoc.getElementsByTagName('parsererror')[0];
if (parserError) {
status.textContent = 'Error: Invalid XML file';
status.classList.add('error');
parseButton.disabled = false;
return;
}
const items = xmlDoc.getElementsByTagName('item');
const csvRows = ['URL,Title,Type,Has_HubSpot_Module'];
for (let item of items) {
const postType = item.getElementsByTagNameNS('http://wordpress.org/export/1.2/', 'post_type')[0]?.textContent;
const status = item.getElementsByTagNameNS('http://wordpress.org/export/1.2/', 'status')[0]?.textContent;
const url = item.getElementsByTagName('link')[0]?.textContent;
const title = item.getElementsByTagName('title')[0]?.textContent || '';
const content = item.getElementsByTagNameNS('http://purl.org/rss/1.0/modules/content/', 'encoded')[0]?.textContent || '';
if (status === 'publish' && (postType === 'post' || postType === 'page')) {
const hasHubspot = hubspotPattern.test(content) ? 'Yes' : 'No';
csvRows.push(arrayToCSV([url, title, postType, hasHubspot]));
}
}
const csvContent = csvRows.join('\n');
const blob = new Blob([csvContent], { type: 'text/csv' });
const link = document.createElement('a');
link.href = URL.createObjectURL(blob);
link.download = 'wordpress_urls.csv';
link.click();
status.textContent = 'CSV downloaded successfully!';
parseButton.disabled = false;
};
reader.onerror = function() {
status.textContent = 'Error reading file';
status.classList.add('error');
parseButton.disabled = false;
};
reader.readAsText(file);
}
</script>
</body>
</html>
How It Works
- UI Design:
- A crisp white container with rounded corners and a subtle shadow sits against a light gray background, centered on the page.
- The
WordPress XML Parser
title leads into a bolded instruction paragraph (1.2rem, bold) followed by a left-aligned list with a 20px margin for readability. - A blue
Choose XML File
button (custom file input) and a greenParse XML
button (disabled until a file is selected) offer intuitive controls with hover effects. - A status area provides feedback like “File selected: export.xml” or “Parsing…”.
- Instructions:
- The bolded intro, “Extract URLs from your WordPress export file with ease:”, sets the stage, followed by a neatly aligned list of four steps, guiding users from export to download.
- CSV Formatting:
- The
arrayToCSV
function escapes quotes and joins fields, ensuring a valid CSV structure.
- HubSpot Detection:
- A regex (
hubspotPattern
) identifies HubSpot modules (e.g.,{% module_block module "widget_id" %}
), marking them in the output.
- XML Processing:
FileReader
loads the XML, andDOMParser
turns it into a DOM object.- Loops through
<item>
elements, filtering for publishedpost
orpage
types, and grabs URLs, titles, and content. - Handles errors like invalid XML or file read failures.
- CSV Download:
- Builds CSV rows, wraps them in a
Blob
, and triggers a download with a dynamic<a>
element.
Example Output
For an export item like:
<item>
<title>Sample Blog Post</title>
<link>https://example.com/blog/sample-blog-post/</link>
<wp:post_type>post</wp:post_type>
<wp:status>publish</wp:status>
<content:encoded><![CDATA[No HubSpot here...]]></content:encoded>
</item>
The CSV becomes:
URL,Title,Type,Has_HubSpot_Module
"https://example.com/blog/sample-blog-post/","Sample Blog Post","post","No"
Try It Out
I’ve open-sourced this on GitHub: https://github.com/grizzlypeaksoftware/WordpressXMLParser. Clone it, open index.html
in your browser, and follow the on-page instructions. For a live demo, check Demo: Wordpress XML Parser
Pros and Cons
Pros:
- Stylish, modern design with bolded guidance for clarity.
- No dependencies—runs on pure browser APIs.
- Open-source and easy to customize.
Cons:
- Large XML files might push browser memory limits.
- Error handling is basic—there’s room to expand.
Enhancements to Consider
- Loading Animation: Add a spinner for parsing feedback.
- Filters: Offer checkboxes for post types or statuses.
- Styling: Experiment with colors or a dark mode.
Conclusion
This WordPress XML parser combines functionality with a chic, user-centric design, making it a breeze to extract site data from an export file. The bold instructions and aligned steps ensure anyone can use it, while the clean aesthetic keeps it inviting. Grab it from GitHub, test it out, and let me know your thoughts—or share your own tweaks—in the comments!

Retrieval Augmented Generation with Node.js: A Practical Guide to Building LLM Based Applications
"Unlock the power of AI-driven applications with RAG techniques in Node.js, from foundational concepts to advanced implementations of Large Language Models."
Get the Kindle Edition
Designing Solutions Architecture for Enterprise Integration: A Comprehensive Guide
"This comprehensive guide dives into enterprise integration complexities, offering actionable insights for scalable, robust solutions. Align strategies with business goals and future-proof your digital infrastructure."
Get the Kindle EditionWe create solutions using APIs and AI to advance financial security in the world. If you need help in your organization, contact us!
Cutting-Edge Software Solutions for a Smarter Tomorrow
Grizzly Peak Software specializes in building AI-driven applications, custom APIs, and advanced chatbot automations. We also provide expert solutions in web3, cryptocurrency, and blockchain development. With years of experience, we deliver impactful innovations for the finance and banking industry.
- AI-Powered Applications
- Chatbot Automation
- Web3 Integrations
- Smart Contract Development
- API Development and Architecture
Ready to bring cutting-edge technology to your business? Let us help you lead the way.
Request a Consultation Now