#text-mining

1 API with this tag

N-gram API

Generate n-grams from text, with frequency counts. The ngrams endpoint breaks text into contiguous sequences of n tokens and returns each distinct n-gram with the number of times it occurs, ranked by frequency. Word n-grams (unigrams, bigrams, trigrams and beyond) suit phrase and collocation analysis; character n-grams (shingles) suit fuzzy matching, language detection and indexing. The range endpoint produces every size from a minimum to a maximum in a single call (for example 1–3 grams), which is useful for building feature vectors.

Options: word or character mode, whether to lower-case first, and a top-N limit that keeps only the most frequent n-grams. Word tokenization is Unicode-aware and keeps internal apostrophes and hyphens (don't, well-known) as single tokens.

Everything is computed locally and deterministically, with no key and no third-party service, so it is fast and private, and nothing is stored. Typical uses are text mining and NLP feature extraction, language modelling and autocomplete, search indexing and shingling, plagiarism and similarity detection, and keyword and collocation analysis.

Live. 3 endpoints. This API produces n-grams and counts only; for extractive summaries and keywords use a summarize API, and for grapheme/character counting use a text-segmentation API.
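To make the behaviour described above concrete, here is a minimal local sketch in Python of n-gram counting with a word or character mode, optional lower-casing, a top-N limit and a min-to-max range. It is an illustration only, not the API's implementation: the function names (`tokenize`, `ngrams`, `ngram_range`), the parameter names, the tokenization regex and the alphabetical tie-break are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed tokenization rule: runs of Unicode letters/digits, optionally joined
# by internal apostrophes or hyphens (so "don't" and "well-known" stay whole).
_WORD = re.compile(r"[^\W_]+(?:['’\-][^\W_]+)*")


def tokenize(text, mode="word", lowercase=True):
    """Split text into word tokens or single characters."""
    if lowercase:
        text = text.lower()
    return _WORD.findall(text) if mode == "word" else list(text)


def ngrams(text, n, mode="word", lowercase=True, top=None):
    """Return (ngram, count) pairs ranked by descending frequency."""
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = tokenize(text, mode, lowercase)
    joiner = " " if mode == "word" else ""
    counts = Counter(
        joiner.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    # Ties are broken alphabetically (an assumption) to keep output deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if top is None else ranked[:top]


def ngram_range(text, n_min, n_max, **options):
    """Return a dict mapping each size n in [n_min, n_max] to its ranked n-grams."""
    return {n: ngrams(text, n, **options) for n in range(n_min, n_max + 1)}
```

For example, `ngrams("The cat sat. The cat ran.", 2, top=1)` yields the single most frequent word bigram, and `ngram_range(text, 1, 3)` collects unigrams, bigrams and trigrams in one call, which can then be flattened into a feature vector.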

api.oanor.com/ngram-api

