#8 open
adimit

Convert HTML entities within RelevantTexts

Reported by adimit | July 23rd, 2008 @ 04:09 AM | in 0.2

Currently, HTML entities such as &auml; (ä) are left in the text as-is. Taggers and tokenizers might choke on them, so we should convert them to Unicode before passing the text to the tokenization process.
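A minimal sketch of the proposed conversion, assuming Python and its standard html module. The ticket does not say which language or library the project uses, so the function names decode_entities and prepare_for_tokenizer are hypothetical and only illustrate the idea of decoding entities before tokenization.

```python
import html


def decode_entities(text: str) -> str:
    """Replace named and numeric HTML character references with Unicode."""
    # html.unescape handles named (&auml;), decimal (&#228;) and
    # hexadecimal (&#xE4;) references in a single pass.
    return html.unescape(text)


def prepare_for_tokenizer(relevant_texts):
    """Hypothetical pre-processing step: decode every text before tokenizing."""
    return [decode_entities(text) for text in relevant_texts]


if __name__ == "__main__":
    print(prepare_for_tokenizer(["M&auml;dchen", "caf&#233;"]))
```

The decoding is done in a single pass, so a doubly escaped reference such as &amp;auml; becomes the literal text &auml; instead of being decoded twice.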

A platform for aiding second language learners through texts acquired from the Internet.
