What Regex Removes HTML Tags?
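To strip HTML tags, use /<[^>]*>/g, but prefer a real parser (such as DOMParser) in production. The regex matches a <, then any run of characters other than >, then the closing >, and each match is replaced with an empty string. It works for simple, well-formed markup (including nested tags, as the examples show) but breaks on attribute values containing >, comments, script content, and malformed HTML.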
Quick Regex Approach
JavaScript
const html = '<p>Hello <b>world</b></p>';
const text = html.replace(/<[^>]*>/g, '');
// "Hello world"
Python
import re
html = '<p>Hello <b>world</b></p>'
text = re.sub(r'<[^>]*>', '', html)
# 'Hello world'
Better: Use a Parser
JavaScript (DOMParser)
function stripHTML(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return doc.body.textContent || '';
}
stripHTML('<p>Hello <b>world</b></p>');
// "Hello world"
Python (BeautifulSoup)
from bs4 import BeautifulSoup
soup = BeautifulSoup('<p>Hello</p>', 'html.parser')
soup.get_text()
# 'Hello'
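If adding a third-party dependency is not an option, Python's standard-library html.parser module can do similar work. The following is a minimal sketch, not a definitive implementation: TextExtractor and strip_html are hypothetical names chosen for illustration, and dropping the contents of script and style elements is an assumption about what "text" should mean here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes; skips <script>/<style> content (an assumption)."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into text.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Comments are routed to handle_comment, so they never reach here.
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(html):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(strip_html('<p>Hello <b>world</b></p>'))     # Hello world
print(strip_html('<div data-x="a>b">text</div>'))  # text
```

Because the parser tokenizes the markup instead of pattern-matching it, a > inside a quoted attribute value is treated as part of the attribute rather than the end of the tag.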
Why Regex Fails on HTML
- Attributes with >: <div data-x="a>b">
- Comments: <!-- <tag> -->
- Script content: <script>if (a>b)</script>
- Security: malformed HTML can bypass regex-based sanitization (XSS)
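The first three failure cases can be reproduced directly with the regex from the quick approach. This is a small demonstration sketch; strip_tags_regex is a hypothetical helper name, and the surrounding text ("text", "visible", "done") was added to the inputs so the leftovers are easy to see.

```python
import re

TAG_RE = re.compile(r'<[^>]*>')


def strip_tags_regex(html):
    """Naive tag stripping with the regex from the quick approach."""
    return TAG_RE.sub('', html)


# Attribute value containing '>': the match ends too early.
print(repr(strip_tags_regex('<div data-x="a>b">text</div>')))
# 'b">text'

# Comment containing a tag: the tail of the comment is left behind.
print(repr(strip_tags_regex('<!-- <tag> -->visible')))
# ' -->visible'

# Script element: the tags are removed but the code stays in the output.
print(repr(strip_tags_regex('<script>if (a>b)</script>done')))
# 'if (a>b)done'
```

In each case the regex stops at the first > it finds, because it has no notion of quoting, comments, or element content.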
Try It Yourself
Test regex patterns with our Regex Tester or encode HTML with our HTML Encoder.