All functions

tokenize_characters() tokenize_words() tokenize_sentences() tokenize_lines() tokenize_paragraphs() tokenize_regex() tokenize_tweets()

Basic tokenizers

chunk_text()

Chunk text into smaller segments

mobydick

The text of Moby Dick

tokenize_ngrams() tokenize_skip_ngrams()

N-gram tokenizers

tokenize_ptb()

Penn Treebank tokenizer

tokenize_character_shingles()

Character shingle tokenizers

tokenize_word_stems()

Word stem tokenizer

tokenizers

Tokenizers

count_words() count_characters() count_sentences()

Count words, sentences, characters
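
Example usage

As a rough illustration of how a few of the functions indexed above fit together, the sketch below tokenizes a short string into words, sentences, and word bigrams, then counts words and sentences. It assumes the tokenizers package is installed and that default arguments are used (words are lowercased and punctuation is stripped by default); the sample text and variable names are made up for this example.

```r
# Assumes the package is installed, e.g. install.packages("tokenizers")
library(tokenizers)

# Hypothetical sample text, not taken from the package
text <- "The quick brown fox jumps. It lands on the lazy dog."

# Each tokenizer returns a list with one element per input document
words     <- tokenize_words(text)        # lowercased, punctuation stripped
sentences <- tokenize_sentences(text)    # one string per sentence
bigrams   <- tokenize_ngrams(text, n = 2) # word bigrams

# The count_* helpers return one integer per input document
n_words     <- count_words(text)
n_sentences <- count_sentences(text)

print(words[[1]])
print(sentences[[1]])
print(bigrams[[1]])
```

Because the tokenizers return a list, `words[[1]]` extracts the tokens for the first (here, only) document; passing a character vector of several documents yields one list element per document.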