Removing some lines from a big text file ?

Closed
pierre1 - Mar 7, 2010 at 05:56 AM
Richard.Williams Posts 25 Registration date Saturday November 7, 2009 Status Member Last seen July 18, 2012 - Mar 11, 2010 at 09:28 AM
Hello guys, I have a big text file (50,000 lines). Each line contains a name, for example:

Michael johnson
derrek pierre
joséph francois
saphire moore

I need to remove all the lines that contain international characters like "é" or "à", and keep only the lines written in the plain English alphabet. Can you please suggest the simplest way to do this?

Thanks a lot for your help.

1 response

Richard.Williams Posts 25 Registration date Saturday November 7, 2009 Status Member Last seen July 18, 2012 14
Mar 11, 2010 at 09:28 AM
This script will remove all lines that contain any international character. It keeps a line only if every character in it comes from an approved list: a-z, A-Z, 0-9, space, tab, comma, colon, semicolon.




# Script IntlNames.txt
var str file, content, line, approvedlist
# Approved list of characters. Enclose this list between ^(#...)^.
set $approvedlist = "^(#a>zA>Z0>9 \,\:\;)^"
# Get file contents into a variable.
cat $file > $content
# Go thru lines one by one.
lex "1" $content > $line
while ($line <> "")
do
    # Does this line contain any character outside our "approved" list of characters ?
    if ( { chen -r $approvedlist $line } == 0)
        # No unapproved chars - this is a "good" line.
        echo $line
    endif
    # Get the next line
    lex "1" $content > $line
done





Save the script as C:/Scripts/IntlNames.txt, start biterscripting, and enter this command:


script "C:/Scripts/IntlNames.txt" file("C:/testfile.txt") > "C:/testfile.txt"



This will remove all lines with international characters from the file C:/testfile.txt. 50,000 lines is a pretty small file, so this should run quickly.
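If biterscripting is not available, the same approved-list idea can be sketched in Python. This is only an illustration, not the script above: the function names `keep_approved_lines` and `filter_file` are made up for this example, the file is assumed to be UTF-8 encoded, and the result is written to a separate output file rather than over the input.

```python
import re

# Approved characters, mirroring the list in the answer:
# a-z, A-Z, 0-9, space, tab, comma, colon, semicolon.
APPROVED_LINE = re.compile(r"[a-zA-Z0-9 \t,:;]*")


def keep_approved_lines(lines):
    """Return only the lines made up entirely of approved characters."""
    return [
        line for line in lines
        # Ignore the line ending when checking the characters.
        if APPROVED_LINE.fullmatch(line.rstrip("\r\n"))
    ]


def filter_file(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path, dropping lines with unapproved characters.

    The encoding is an assumption; change it if the file uses another one.
    """
    with open(src_path, encoding=encoding) as src:
        kept = keep_approved_lines(src)
    with open(dst_path, "w", encoding=encoding) as dst:
        dst.writelines(kept)


if __name__ == "__main__":
    sample = [
        "Michael johnson\n",
        "derrek pierre\n",
        "joséph francois\n",
        "saphire moore\n",
    ]
    # Prints the three names that contain no accented characters.
    print("".join(keep_approved_lines(sample)), end="")
```

To run it on a real file, something like `filter_file("C:/testfile.txt", "C:/testfile_clean.txt")` would do (the output path is just an example). Unlike the biterscripting loop, this sketch also keeps blank lines.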