Pull only Assamese (অসমীয়া) characters out of any mixed-language document. Strip English text, Arabic numerals, symbols, and HTML in one click, perfect for cleaning bilingual files before publishing or translation.
The tool filters on the Unicode Bengali block (U+0980 to U+09FF), the range the Unicode Standard assigns to the Bengali–Assamese script, including the Assamese-specific letters ৰ (U+09F0) and ৱ (U+09F1). Anything outside that range is treated as foreign content and removed. Because Bengali shares the same block, any Bengali text in the input is kept as well.
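The range check described above can be sketched in a few lines. The tool itself runs as JavaScript in the browser, so this Python version is only an illustration of the idea, not the tool's actual code; the function name `extract_assamese` and the choice to join the kept runs with single spaces are assumptions.

```python
import re

# Assumption: "Assamese" means any code point in U+0980..U+09FF,
# the range quoted in the text above.
ASSAMESE_RUN = re.compile(r"[\u0980-\u09FF]+")

def extract_assamese(text: str) -> str:
    """Return the in-range runs of `text`, joined by single spaces."""
    return " ".join(ASSAMESE_RUN.findall(text))

sample = "<p>Hello অসম, welcome! নমস্কাৰ 123</p>"
print(extract_assamese(sample))  # অসম নমস্কাৰ
```

Matching whole runs rather than single characters keeps conjuncts and vowel signs attached to their base letters, since the hasanta and matras sit in the same range.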
Strip English commentary or source notes from a draft so you can review the Assamese text in isolation, or generate a glossary of unique Assamese terms.
Extract Assamese vocabulary from bilingual textbooks or papers for study lists, flashcards, or vocabulary frequency analysis.
Pre-clean training data, scraped HTML, or chat logs before feeding them into an Assamese language model or text-to-speech engine.
Pull Assamese captions out of mixed social-media posts, or remove stray emojis, English words, and URLs from a paragraph before publishing.
| Category | Example | Default Behaviour |
|---|---|---|
| Assamese letters (vowels, consonants, conjuncts) | অ আ ক খ ক্ষ্ম | KEPT |
| Vowel signs / matras | া ি ী ু ো | KEPT |
| Assamese numerals | ০ ১ ২ ৩ | KEPT (toggle off if needed) |
| Assamese punctuation (danda and double danda, U+0964 and U+0965, which sit outside U+0980 to U+09FF) | । ॥ | OPTIONAL (off by default) |
| English letters | A–Z, a–z | REMOVED |
| Arabic numerals | 0 1 2 3 | REMOVED |
| Latin punctuation & symbols | . , ! ? @ # & | REMOVED |
| HTML tags & emojis | `<p>` 🙂 🇮🇳 | REMOVED |
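The defaults in the table can be expressed as a per-character rule. This is a minimal sketch under stated assumptions: `keep_char`, `filter_text`, and the two flag names are hypothetical, and the real tool may structure its options differently. Note that this character-level version also drops whitespace, since spaces are outside the range.

```python
def keep_char(ch: str, keep_numerals: bool = True,
              keep_punctuation: bool = False) -> bool:
    """Decide whether one character survives, following the table's defaults."""
    cp = ord(ch)
    if cp in (0x0964, 0x0965):      # danda and double danda (Devanagari block)
        return keep_punctuation
    if 0x09E6 <= cp <= 0x09EF:      # digits ০ to ৯
        return keep_numerals
    return 0x0980 <= cp <= 0x09FF   # letters, vowel signs, hasanta, and so on

def filter_text(text: str, keep_numerals: bool = True,
                keep_punctuation: bool = False) -> str:
    """Apply keep_char to every character of `text`."""
    return "".join(
        ch for ch in text if keep_char(ch, keep_numerals, keep_punctuation)
    )

print(filter_text("ক১।a1."))                         # ক১
print(filter_text("ক১।a1.", keep_punctuation=True))  # ক১।
print(filter_text("ক১।a1.", keep_numerals=False))    # ক
```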
It scans the text you paste and keeps only characters in the Unicode Bengali block (U+0980 to U+09FF), which covers the Bengali–Assamese script. Everything else (English letters, Arabic numerals, Latin punctuation, emojis, HTML tags) is removed automatically.
No. The extraction runs entirely in your browser using JavaScript. Your text never leaves your device, which makes it safe for confidential documents.
Yes. Paste any bilingual or multilingual document and the tool will return only the Assamese portion. You can choose to preserve sentence and paragraph structure, or collect just the Assamese words as a clean list.
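The two output modes mentioned here could look roughly like the following. This is a hedged sketch, not the tool's implementation: `keep_structure` and `word_list` are hypothetical names, and treating a "word" as an unbroken run of in-range characters is an assumption.

```python
import re

# Anything that is neither in U+0980..U+09FF nor whitespace counts as foreign.
NON_ASSAMESE = re.compile(r"[^\u0980-\u09FF\s]+")
ASSAMESE_WORD = re.compile(r"[\u0980-\u09FF]+")

def keep_structure(text: str) -> str:
    """Remove foreign characters line by line, keeping line and paragraph breaks."""
    cleaned = []
    for line in text.split("\n"):
        stripped = NON_ASSAMESE.sub(" ", line)
        cleaned.append(" ".join(stripped.split()))  # collapse leftover spaces
    return "\n".join(cleaned)

def word_list(text: str) -> list:
    """Unique in-range words, in the order they first appear."""
    return list(dict.fromkeys(ASSAMESE_WORD.findall(text)))

doc = "Title: অসম\n\nmy home আমাৰ ঘৰ, অসম"
print(keep_structure(doc))  # অসম, a blank line, then আমাৰ ঘৰ অসম
print(word_list(doc))       # ['অসম', 'আমাৰ', 'ঘৰ']
```

Processing each line separately is what preserves blank lines between paragraphs; the word list discards layout entirely and deduplicates instead.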
Yes. Assamese numerals ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯ (U+09E6 to U+09EF) are part of the Unicode Bengali block and are kept by default. You can toggle them off if you want only letters.
Yes, it is 100% free with no sign-up, no download, and no character limit. Use it as often as you need.