
Curious, I found an epub, converted it to plain text, and dumped it into the Qwen3 tokenizer. It came out to 359,088 tokens end to end.

Using the GPT-4 tokenizer (cl100k_base) yields 349,371 tokens.
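For anyone who wants to reproduce the counts, here's a minimal sketch using the tiktoken and transformers libraries. The Qwen checkpoint name is an assumption; any Qwen3 variant should ship the same tokenizer:

    # pip install tiktoken transformers
    import tiktoken
    from transformers import AutoTokenizer

    text = open("book.txt", encoding="utf-8").read()

    # GPT-4 tokenizer (cl100k_base) via tiktoken;
    # disallowed_special=() avoids errors if the text happens
    # to contain literal special-token strings
    enc = tiktoken.get_encoding("cl100k_base")
    print("cl100k_base:", len(enc.encode(text, disallowed_special=())))

    # Qwen3 tokenizer via Hugging Face; "Qwen/Qwen3-8B" is an
    # assumed checkpoint name
    tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
    print("qwen3:", len(tok.encode(text, add_special_tokens=False)))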

Recent Google and Anthropic models don't ship local tokenizers; ridiculously, you have to call their APIs just to count tokens, so no exact numbers for those.
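At least the counting can be done without paying for a generation call. A sketch against Anthropic's token-counting endpoint, assuming the anthropic Python SDK with an API key in the environment; the model name is an assumption:

    # pip install anthropic  (needs ANTHROPIC_API_KEY set)
    import anthropic

    client = anthropic.Anthropic()
    text = open("book.txt", encoding="utf-8").read()

    # count_tokens is a metadata call, not a generation request
    resp = client.messages.count_tokens(
        model="claude-3-5-sonnet-20241022",  # assumed model name
        messages=[{"role": "user", "content": text}],
    )
    print("anthropic:", resp.input_tokens)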

Just thought that was interesting.


