Strip HTML tags, URLs, emails, emojis, English letters, digits, invisible characters, duplicate lines and more from any Assamese text. Pick exactly what you want removed and keep your Assamese (অসমীয়া) content perfectly clean — all in your browser.
The cleaner applies your selected removal rules in a fixed order: first the structural strippers (HTML, URLs, emails, emojis, invisible and control characters), then the optional aggressive removers (punctuation, English letters, digits, keep-only-Assamese), and finally the whitespace and line-level rules (blank lines, duplicates, spaces, trim, single-line). Running the whitespace rules last means that gaps left behind by removed tags, links or emojis are tidied up in the same run. Every rule is a lightweight JavaScript transformation built on regular expressions, so the whole pipeline typically finishes in milliseconds even on long documents, and your input never leaves your browser.
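The fixed ordering can be sketched as a simple fold over an ordered list of rules. This is a minimal JavaScript sketch, not the tool's source: the rule names (`html`, `spaces` and so on) and the `runPipeline` helper are hypothetical, and only two toy rules are wired in to show why whitespace cleanup belongs at the end.

```javascript
// Hypothetical rule names: the tool's real option identifiers are not documented.
const ORDER = [
  // 1. Structural strippers
  "html", "urls", "emails", "emojis", "invisible", "control",
  // 2. Optional aggressive removers
  "punctuation", "english", "digits", "assameseOnly",
  // 3. Whitespace and line-level rules
  "blankLines", "duplicates", "spaces", "trim", "singleLine",
];

// Apply every selected rule exactly once, always in ORDER,
// no matter in which order the user ticked the options.
function runPipeline(text, rules, selected) {
  return ORDER.filter((name) => selected.has(name) && name in rules)
    .reduce((acc, name) => rules[name](acc), text);
}

// Two toy rules, just enough to show why the order matters.
const rules = {
  html: (t) => t.replace(/<[^>]*>/g, ""),
  spaces: (t) => t.replace(/[ \t]{2,}/g, " "),
};

// "spaces" is listed first in the Set, but still runs after "html".
console.log(runPipeline("<b>অসম</b> <br> ভাষা", rules, new Set(["spaces", "html"])));
// → "অসম ভাষা"
```

If `spaces` ran before `html`, the double space left where `<br>` used to be would survive; running the whitespace rules last avoids that.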
Removes everything between < and > — useful when you copy text from a webpage or rich editor and end up with stray <p>, <br>, <span> or attributes mixed into your Assamese.
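A minimal sketch of tag stripping, assuming a plain regular expression that deletes everything from `<` to the next `>` (the tool's exact pattern is not published; `stripHtml` is a hypothetical name).

```javascript
// Delete every "<...>" span, i.e. opening tags, closing tags and attributes.
function stripHtml(text) {
  return text.replace(/<[^>]*>/g, "");
}

console.log(stripHtml('<p class="x">অসমীয়া <span>ভাষা</span></p><br>'));
// → "অসমীয়া ভাষা"
```

The trade-off of a regex this simple: it also deletes non-tag text such as `x < 5 and y > 2`, and it leaves HTML entities like `&nbsp;` undecoded.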
Strips links with a scheme (http://example.com, https://assam.gov.in/page), www-prefixed links (www.something.com) and bare domain-style links from your text.
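One way to express the three link shapes as a single regular expression. This is a sketch under assumptions: `stripUrls` is a hypothetical helper, and the bare-domain branch only recognizes a short, illustrative list of top-level domains, since the tool does not document how it detects bare domains.

```javascript
// Branch 1: http://, https:// or www. followed by any non-space run.
// Branch 2 (assumed): bare domains ending in a few illustrative TLDs,
// skipped when glued to "@" so email addresses are left for the email rule.
const URL_RE =
  /\b(?:https?:\/\/|www\.)\S+|(?<![@\w.-])[\w-]+(?:\.[\w-]+)*\.(?:com|org|net|in|gov|edu)\b(?:\/\S*)?/gi;

function stripUrls(text) {
  return text.replace(URL_RE, "");
}

console.log(stripUrls("A https://assam.gov.in/page B www.something.com C example.org D"));
// → "A  B  C  D"
```

The negative lookbehind `(?<![@\w.-])` keeps the bare-domain branch from eating the domain part of an address like `name@example.com`, which matters because links are removed before emails. Note that `\S+` also swallows any punctuation glued to the end of a link.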
Strips email addresses in the standard name@example.com form from anywhere in the text.
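A common, deliberately loose email pattern, shown as a sketch (`stripEmails` and the exact character classes are assumptions, not the tool's published rule). It matches ASCII addresses only, which is what the name@example.com form implies.

```javascript
// Assumed pattern: local part, "@", domain labels, dot, alphabetic TLD.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function stripEmails(text) {
  return text.replace(EMAIL_RE, "");
}

console.log(stripEmails("যোগাযোগ: name@example.com"));
// → "যোগাযোগ: "
```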
Strips emoji and pictograph Unicode ranges (😀 🎉 🇮🇳 ✅ ⭐ etc.) while keeping your Assamese characters perfectly intact.
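A sketch of emoji removal built on Unicode property escapes, assuming a JavaScript engine that supports `\p{Extended_Pictographic}` (recent browsers and Node.js versions do). `stripEmojis` is a hypothetical name, and the pattern is one reasonable choice rather than the tool's own: it removes pictographs, flag pairs (regional indicators), the variation selector U+FE0F, skin-tone modifiers and ZWJ-joined sequences, but it only removes a zero-width joiner when it sits between two pictographs, so Bengali-script text that uses ZWJ is left alone.

```javascript
// One emoji "cluster": a pictograph or regional indicator, optional VS16 /
// skin-tone modifiers, then any number of ZWJ + pictograph continuations.
const EMOJI_RE =
  /(?:\p{Extended_Pictographic}|[\u{1F1E6}-\u{1F1FF}])(?:\uFE0F|[\u{1F3FB}-\u{1F3FF}])*(?:\u200D\p{Extended_Pictographic}(?:\uFE0F|[\u{1F3FB}-\u{1F3FF}])*)*/gu;

function stripEmojis(text) {
  return text.replace(EMOJI_RE, "");
}

console.log(stripEmojis("অসম\u{1F600} ভাষা\u{1F389}")); // → "অসম ভাষা"
```

Keycap sequences such as 1️⃣ are not covered by this sketch, and `Extended_Pictographic` also includes symbols like © and ™, so those disappear too.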
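Removes the Byte Order Mark (BOM, U+FEFF), zero-width space (U+200B), zero-width non-joiner (ZWNJ, U+200C), zero-width joiner (ZWJ, U+200D), soft hyphen (U+00AD) and similar invisible characters. These often sneak in when copying from messaging apps and break searches and word counts. Note that ZWJ and ZWNJ are sometimes used deliberately in Bengali-script text to control how consonant conjuncts render, so removing them can change the appearance of a few words.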
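A minimal sketch of the invisible-character rule, assuming the characters named above plus the word joiner U+2060 (an illustrative addition). Because ZWJ and ZWNJ can control conjunct rendering in Bengali-script text, the hypothetical `keepJoiners` option shows one way to spare them; `stripInvisible` is likewise a hypothetical name.

```javascript
// Invisible characters removed by this sketch:
// U+FEFF BOM, U+200B zero-width space, U+200C ZWNJ, U+200D ZWJ,
// U+00AD soft hyphen, U+2060 word joiner (the last one is an assumption).
function stripInvisible(text, { keepJoiners = false } = {}) {
  const re = keepJoiners
    ? /[\uFEFF\u200B\u00AD\u2060]/g // spare ZWNJ and ZWJ
    : /[\uFEFF\u200B-\u200D\u00AD\u2060]/g;
  return text.replace(re, "");
}

console.log(stripInvisible("\uFEFFঅসম\u200B ভাষা\u00AD")); // → "অসম ভাষা"
```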
Strips non-printable ASCII control characters (NUL, BEL, VT, FF, etc.) while preserving line breaks and tabs.
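A sketch of the control-character rule as a single character class (`stripControl` is a hypothetical name; including DEL, U+007F, is an assumption). The class skips tab (U+0009), line feed (U+000A) and carriage return (U+000D), so line breaks and tabs survive.

```javascript
// U+0000–U+0008, VT (U+000B), FF (U+000C), U+000E–U+001F and DEL (U+007F).
const CONTROL_RE = /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g;

function stripControl(text) {
  return text.replace(CONTROL_RE, "");
}

console.log(JSON.stringify(stripControl("a\x00b\x07c\td\ne"))); // → "abc\td\ne"
```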
Strips common Latin-script punctuation (. , ; : ! ? " ' ( ) [ ] { } – —) and Assamese punctuation (। ॥). Use it only when you want bare words, for example to build a word list.
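A sketch of the punctuation rule using exactly the characters listed above, with the en dash, em dash, danda and double danda written as `\u` escapes (`stripPunctuation` is a hypothetical name).

```javascript
// . , ; : ! ? " ' ( ) [ ] { }  plus  – (U+2013), — (U+2014), । (U+0964), ॥ (U+0965)
const PUNCT_RE = /[.,;:!?"'()\[\]{}\u2013\u2014\u0964\u0965]/g;

function stripPunctuation(text) {
  return text.replace(PUNCT_RE, "");
}

console.log(stripPunctuation("অসম, ভাষা! (নতুন) কথা।")); // → "অসম ভাষা নতুন কথা"
```

Hyphens are not in the list, so hyphenated compounds stay intact in this sketch.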
Strips every A–Z and a–z. Assamese characters and digits are preserved.
Strips both English digits 0–9 and Assamese digits ০–৯ in one pass.
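Both rules reduce to one character class each. The sketch below (hypothetical helper names) spells out the Assamese digit range U+09E6–U+09EF explicitly, because JavaScript's `\d` only ever matches ASCII 0–9.

```javascript
// A–Z and a–z only; Bengali-block characters are untouched.
const stripEnglish = (text) => text.replace(/[A-Za-z]/g, "");
// ASCII 0–9 plus Assamese digits ০–৯ (U+09E6–U+09EF).
const stripDigits = (text) => text.replace(/[0-9\u09E6-\u09EF]/g, "");

console.log(stripEnglish("অসম Assam ২০২৪")); // → "অসম  ২০২৪"
console.log(stripDigits("বছৰ 2024 / ২০২৪")); // → "বছৰ  / "
```

The Assamese-specific letters ৰ (U+09F0) and ৱ (U+09F1) sit just after the digit range, so the explicit range leaves them untouched.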
The most aggressive rule: it discards everything except characters in the Bengali-script Unicode block (U+0980–U+09FF, the block Assamese uses) and spaces. Perfect for extracting clean Assamese from mixed-language pages.
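A sketch of keep-only-Assamese as a negated character class over the Bengali block U+0980–U+09FF. Two assumptions here: `keepOnlyAssamese` is a hypothetical name, and line breaks are kept alongside spaces so that lines do not run together, even though the rule description only mentions spaces.

```javascript
// Remove every UTF-16 code unit that is not in the Bengali block, a space or
// a line feed (keeping "\n" is an assumption). Emoji surrogate pairs fall
// outside the range, so both halves are removed.
function keepOnlyAssamese(text) {
  return text.replace(/[^\u0980-\u09FF \n]/g, "");
}

console.log(JSON.stringify(keepOnlyAssamese("অসম Assam, ২০২৪!\nভাষা।")));
// → "অসম  ২০২৪\nভাষা"
```

Because the danda (।) and double danda (॥) are encoded in the Devanagari block (U+0964, U+0965), a strict range like this one removes them as well.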
Drops empty lines and lines that contain only spaces or tabs.
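A line-based sketch, assuming the text is split on LF or CRLF and rejoined with LF (`removeBlankLines` is a hypothetical name). A line survives only if it contains at least one character other than a space or tab.

```javascript
function removeBlankLines(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => /[^ \t]/.test(line)) // keep lines with real content
    .join("\n");
}

console.log(JSON.stringify(removeBlankLines("অসম\n\n   \n\t\nভাষা"))); // → "অসম\nভাষা"
```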
Keeps only the first occurrence of each line; drops subsequent identical ones. Order is preserved.
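Order-preserving deduplication is a one-pass filter with a `Set` of lines already seen. In this sketch (hypothetical `removeDuplicateLines`), lines must match exactly, so two lines that differ only in trailing spaces count as different.

```javascript
function removeDuplicateLines(text) {
  const seen = new Set();
  return text
    .split(/\r?\n/)
    .filter((line) => {
      if (seen.has(line)) return false; // later copy: drop it
      seen.add(line); // first occurrence: remember and keep it
      return true;
    })
    .join("\n");
}

console.log(JSON.stringify(removeDuplicateLines("ক\nখ\nক\nগ\nখ"))); // → "ক\nখ\nগ"
```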
Multiple spaces or tabs in a row collapse to a single space.
Removes leading and trailing whitespace from every line.
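Sketches of the two whitespace rules (hypothetical names). `collapseSpaces` only touches runs of two or more spaces or tabs, which is one reading of "multiple spaces or tabs in a row"; a lone tab is left as it is. `trimLines` relies on `String.prototype.trim`, which also strips other Unicode whitespace such as the no-break space.

```javascript
// Collapse runs of two or more spaces/tabs into one space.
const collapseSpaces = (text) => text.replace(/[ \t]{2,}/g, " ");

// Trim each line; LF and CRLF input both come back joined with LF.
const trimLines = (text) =>
  text.split(/\r?\n/).map((line) => line.trim()).join("\n");

console.log(collapseSpaces("অসম   \t ভাষা")); // → "অসম ভাষা"
console.log(JSON.stringify(trimLines("  অসম  \n\tভাষা "))); // → "অসম\nভাষা"
```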
Replaces every line break with a single space — useful for turning a paragraph block into one continuous line.
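A one-line sketch (`toSingleLine` is a hypothetical name) that treats CRLF, CR and LF as line breaks and replaces each with a single space.

```javascript
// "\r\n" is tried first, so a Windows line break becomes one space, not two.
const toSingleLine = (text) => text.replace(/\r\n|\r|\n/g, " ");

console.log(toSingleLine("অসম\nভাষা\r\nসাহিত্য")); // → "অসম ভাষা সাহিত্য"
```

An empty line turns into two consecutive spaces, so this rule works best together with blank-line removal.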
Strip HTML tags, links and invisible characters after copying from WhatsApp, Facebook or news sites, then paste the clean text straight into your CMS.
Remove invisible Unicode and stray English letters from manuscript drafts before sending to a typesetter or printer.
Pre-process Assamese corpora: strip emojis, URLs and duplicates before feeding the text to a tokenizer or model.
Clean quoted source text for essays and citations — keep only the Assamese, drop everything else.
The Formatter fixes and normalizes — spacing, punctuation, smart quotes, Unicode normalization, digit conversion. The Cleaner removes — HTML, URLs, emojis, English, digits, blank lines, duplicates, invisible characters. Use Cleaner first to strip junk, then Formatter to polish.
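No. Assamese letters and vowel signs (Bengali-script characters) are never altered. The cleaner only removes what you explicitly tick: the digit rule strips Assamese digits ০–৯ and the punctuation rule strips । and ॥, but only when you enable them.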
Yes. It keeps only Bengali-script Unicode and spaces, so English words, ASCII digits, emojis and Latin punctuation all disappear. If you want to preserve sentence structure, leave it off and use the milder rules instead.
Yes. The interface is fully responsive and works in any modern mobile browser.
No. Every rule runs locally in your browser. Your text never leaves your device.