As ChemDataExtractor processes documents, it adds each unique word it encounters to the Lexicon as a Lexeme. Each Lexeme stores a set of features for that word, so those features do not have to be recalculated for every occurrence of the word.

You can access the Lexeme for a token through the token's lex property.

>>> from chemdataextractor.doc import Sentence
>>> s = Sentence('Sulphur and Oxygen.')
>>> s.tokens[0]
Token('Sulphur', 0, 7)
>>> s.tokens[0].lex.normalized
>>> s.tokens[0].lex.is_hyphenated
>>> s.tokens[0].lex.cluster
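
To illustrate why storing features per unique word avoids repeated work, here is a minimal, self-contained sketch of the caching idea. It is not ChemDataExtractor's implementation: the names SimpleLexicon and SimpleLexeme, the computed counter, and the particular features chosen (including the "contains a hyphen" definition of is_hyphenated) are assumptions made purely for illustration.

```python
# Illustrative sketch only: SimpleLexicon and SimpleLexeme are hypothetical
# names, not ChemDataExtractor classes.
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleLexeme:
    """Features computed once for a unique word."""
    text: str
    lower: str
    is_hyphenated: bool  # assumed here to mean "contains a hyphen"
    length: int


class SimpleLexicon:
    """Maps each unique word to its cached SimpleLexeme."""

    def __init__(self):
        self._lexemes = {}
        self.computed = 0  # how many times features were actually calculated

    def __getitem__(self, word):
        lexeme = self._lexemes.get(word)
        if lexeme is None:
            # First occurrence of this word: compute its features and store them.
            lexeme = SimpleLexeme(
                text=word,
                lower=word.lower(),
                is_hyphenated="-" in word,
                length=len(word),
            )
            self._lexemes[word] = lexeme
            self.computed += 1
        return lexeme

    def __len__(self):
        return len(self._lexemes)


lexicon = SimpleLexicon()
words = "Sulphur and Oxygen and Sulphur".split()
lexemes = [lexicon[w] for w in words]
```

In this sketch, five tokens produce only three stored entries, and the repeated words share the same cached object, so their features are calculated once rather than once per occurrence.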