# Text Segmentation API
> Count and split text the way people actually read it, using Unicode-correct segmentation. The count endpoint returns the number of grapheme clusters (user-perceived characters), so a family emoji such as the four-person family counts as 1 rather than its 7 code points, and an accented letter counts as 1. It also returns words, sentences, code points, UTF-16 code units (the naive string length, which over-counts) and UTF-8 byte length. Character-limit fields, tweet/SMS counters and validation need these numbers so the count agrees with what the user sees. The segment endpoint splits text into grapheme, word or sentence segments; word segments are flagged as word-like or as punctuation and whitespace. Segmentation is locale-aware, so Japanese, Chinese and Thai word boundaries come out right. Everything is computed locally with no network calls. This is a Unicode text segmenter, distinct from the Unicode codepoint database (unicode), the case/text-utilities toolkit (text) and string similarity (similarity). It depends on no upstream API key and uses no cache.

## Authentication
All requests require your oanor API key in the `x-oanor-key` header. Get one at https://www.oanor.com/developer/keys.

```bash
curl -H "x-oanor-key: oanor_live_…" "https://api.oanor.com/segmenter-api/..."
```
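For clients that do not shell out to curl, the same authenticated request can be sketched in Python with only the standard library. The helper name `build_request` and the placeholder key are illustrative assumptions, not part of the API; the base URL and header name are taken from the curl examples in this document.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base URL as used by the curl examples in this document.
BASE_URL = "https://api.oanor.com/segmenter-api"


def build_request(path, params, api_key):
    """Build (but do not send) an authenticated GET request.

    `build_request` is a hypothetical helper for illustration only.
    """
    # urlencode percent-encodes values as UTF-8 and turns spaces into '+'.
    query = urlencode(params)
    url = f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"
    return Request(url, headers={"x-oanor-key": api_key}, method="GET")


req = build_request(
    "/v1/count",
    {"text": "Hello \U0001F44B caf\u00e9", "locale": "en"},
    "YOUR_KEY",  # placeholder, replace with a real key
)
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. Building the request separately keeps the encoding and header logic easy to check without network access.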

## Pricing
- **Free** (Free) — 2,020 calls/Mo, 2 req/s
- **Starter** ($6/Mo) — 38,000 calls/Mo, 8 req/s
- **Pro** ($20/Mo) — 204,000 calls/Mo, 20 req/s
- **Mega** ($52/Mo) — 790,000 calls/Mo, 50 req/s
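
A client can stay under a plan's per-second limit by spacing its own calls. The following is a minimal sketch of such a throttle; the `Throttle` class is a hypothetical helper, and how the service itself enforces or reports rate limits is not described in this document.

```python
import time


class Throttle:
    """Space calls so that at most `rate` of them start per second.

    Hypothetical client-side helper; not part of the API.
    """

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate  # minimum gap between calls, in seconds
        self.clock = clock          # injectable for testing
        self.sleep = sleep          # injectable for testing
        self.next_allowed = 0.0     # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self.clock()
        delay = self.next_allowed - now
        if delay > 0:
            self.sleep(delay)
            now += delay
        self.next_allowed = now + self.interval
        return max(delay, 0.0)


free_plan = Throttle(rate=2)  # 2 req/s, as listed for the Free plan
free_plan.wait()              # call this before each request
```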

## Endpoints

### Segmentation

#### `GET /v1/count` — Emoji-safe character/word/sentence counts

**Parameters:**
- `text` (query, required, string) — The text to count. Example: `Hello 👋 café`
- `locale` (query, optional, string) — BCP 47 locale tag, used for word and sentence counting. Example: `en`

**Example:**
```bash
curl -H "x-oanor-key: $KEY" \
  "https://api.oanor.com/segmenter-api/v1/count?text=Hello+%F0%9F%91%8B+caf%C3%A9&locale=en"
```

**Response:**
```json
{
    "data": {
        "words": 2,
        "locale": "en",
        "graphemes": 12,
        "sentences": 1,
        "bytes_utf8": 16,
        "code_units": 13,
        "code_points": 12
    },
    "meta": {
        "timestamp": "2026-06-01T23:40:40.762Z",
        "request_id": "ae97a5a8-1b82-4a5e-83cd-fcf2cd15156d"
    },
    "status": "ok",
    "message": "Counted",
    "success": true
}
```
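
The three encoding-level counts in this response can be reproduced locally, which shows why they differ from one another. The sketch below covers only `code_points`, `code_units` and `bytes_utf8`; grapheme, word and sentence counts need real Unicode segmentation and are not attempted here. The function name `naive_counts` is an illustrative assumption.

```python
def naive_counts(text):
    """Return the encoding-level lengths of `text` (no segmentation)."""
    return {
        # Python strings are sequences of code points.
        "code_points": len(text),
        # Each UTF-16 code unit is 2 bytes; astral characters take two units.
        "code_units": len(text.encode("utf-16-le")) // 2,
        "bytes_utf8": len(text.encode("utf-8")),
    }


# The example text from the request above, with a precomposed e-acute.
sample = "Hello \U0001F44B caf\u00e9"
print(naive_counts(sample))
# {'code_points': 12, 'code_units': 13, 'bytes_utf8': 16}

# Four-person family emoji: four emoji joined by three zero-width joiners.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"
print(naive_counts(family))
# {'code_points': 7, 'code_units': 11, 'bytes_utf8': 25}
```

The family emoji is a single grapheme cluster, yet none of these naive lengths is 1. That gap is what the `graphemes` field is meant to close.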

#### `GET /v1/segment` — Split into segments

**Parameters:**
- `text` (query, required, string) — The text to segment. Example: `The quick brown fox.`
- `granularity` (query, optional, string) — One of `grapheme`, `word` or `sentence`. Example: `word`
- `locale` (query, optional, string) — BCP 47 locale tag. Example: `en`

**Example:**
```bash
curl -H "x-oanor-key: $KEY" \
  "https://api.oanor.com/segmenter-api/v1/segment?text=The+quick+brown+fox.&granularity=word&locale=en"
```

**Response:**
```json
{
    "data": {
        "count": 8,
        "locale": "en",
        "segments": [
            {
                "index": 0,
                "segment": "The",
                "is_word_like": true
            },
            {
                "index": 3,
                "segment": " ",
                "is_word_like": false
            },
            {
                "index": 4,
                "segment": "quick",
                "is_word_like": true
            },
            {
                "index": 9,
                "segment": " ",
                "is_word_like": false
            },
            {
                "index": 10,
                "segment": "brown",
                "is_word_like": true
            },
            {
                "index": 15,
                "segment": " ",
                "is_word_like": false
            },
            {
                "index": 16,
                "segment": "fox",
                "is_word_like": true
            },
            {
                "index": 19,
                "segment": ".",
                "is_word_like": false
            }
        ],
        "granularity": "word"
    },
    "meta": {
        "timestamp": "2026-06-01T23:40:40.866Z",
        "request_id": "07a3203e-f0d0-41f0-bf6a-e43b10184456"
    },
    "status": "ok",
    "message": "Segmented",
    "success": true
}
```
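
A typical use of this response is to keep only the word-like segments, or to check that the segments rebuild the input. The sketch below works on a payload shaped like the response above (abbreviated to the fields it uses); the helper names are illustrative assumptions. The unit of `index` is not stated in this document, so the code only uses it for ordering.

```python
# Payload shaped like the /v1/segment response above.
payload = {
    "data": {
        "granularity": "word",
        "segments": [
            {"index": 0, "segment": "The", "is_word_like": True},
            {"index": 3, "segment": " ", "is_word_like": False},
            {"index": 4, "segment": "quick", "is_word_like": True},
            {"index": 9, "segment": " ", "is_word_like": False},
            {"index": 10, "segment": "brown", "is_word_like": True},
            {"index": 15, "segment": " ", "is_word_like": False},
            {"index": 16, "segment": "fox", "is_word_like": True},
            {"index": 19, "segment": ".", "is_word_like": False},
        ],
    }
}


def words(payload):
    """Return only the word-like segments, in response order."""
    return [s["segment"] for s in payload["data"]["segments"] if s["is_word_like"]]


def reconstruct(payload):
    """Join all segments, ordered by `index`, back into the original text."""
    ordered = sorted(payload["data"]["segments"], key=lambda s: s["index"])
    return "".join(s["segment"] for s in ordered)


print(words(payload))        # ['The', 'quick', 'brown', 'fox']
print(reconstruct(payload))  # The quick brown fox.
```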

### Meta

#### `GET /v1/meta` — Granularities

**Example:**
```bash
curl -H "x-oanor-key: $KEY" \
  "https://api.oanor.com/segmenter-api/v1/meta"
```

**Response:**
```json
{
    "data": {
        "note": "Count and split text the way people actually read it, using Unicode-correct segmentation (Intl.Segmenter). /v1/count?text=... returns the number of grapheme clusters (the real, user-perceived characters — so '👨‍👩‍👧' counts as 1, not 7, and 'café' as 4), words and sentences, alongside code points, UTF-16 code units (the naive string length) and UTF-8 byte length — perfect for character-limit fields, tweets/SMS counters and validation that must agree with what users see. /v1/segment splits text into grapheme, word or sentence segments (word segments are flagged word-like vs punctuation/space), locale-aware so CJK and Thai word boundaries are correct. Everything is computed locally with no network calls. A Unicode text segmenter — distinct from the Unicode codepoint database (unicode), case/text utilities (text) and string similarity (similarity). No key, no cache.",
        "endpoints": [
            "/v1/count",
            "/v1/segment",
            "/v1/meta"
        ],
        "granularities": [
            "grapheme",
            "word",
            "sentence"
        ]
    },
    "meta": {
        "timestamp": "2026-06-01T23:40:40.963Z",
        "request_id": "1a4e11fc-8f16-4cd0-a1f3-a3ecf4ebecfa"
    },
    "status": "ok",
    "message": "Meta retrieved",
    "success": true
}
```
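
The `granularities` list can be used to validate user input before calling `/v1/segment`. This is a minimal sketch over a payload shaped like the response above; `validate_granularity` is a hypothetical helper, and how the API itself responds to an unsupported value is not covered in this document.

```python
# Only the part of the /v1/meta response that this sketch needs.
meta_payload = {"data": {"granularities": ["grapheme", "word", "sentence"]}}


def validate_granularity(value, meta_payload):
    """Return `value` if the meta response lists it, else raise ValueError."""
    allowed = meta_payload["data"]["granularities"]
    if value not in allowed:
        raise ValueError(f"granularity must be one of {allowed}, got {value!r}")
    return value


print(validate_granularity("word", meta_payload))  # word
```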


---
Marketplace page: https://www.oanor.com/api/segmenter-api
OpenAPI spec: https://www.oanor.com/api/segmenter-api/openapi.json
