Received Sun, 19 Dec 2004 22:04:14 PHT
Linux - howto convert iso-8859-1 charset html files into utf-8 charset files
the solution:
iconv --from-code=ISO-8859-1 --to-code=UTF-8 ./oldfile.htm > ./newfile.html
using the bash shell on Linux - of course !
windows users have their own solutions - hopefully - else they are
invited to join the Open Heart - Open Mind - Open Source community ...
in the internationalization of the Internet, one of the major issues is
the character set display of foreign languages - a nightmare for me and
most likely for quite a few other publishers / webmasters. until a while
ago my own computer world was intact and all clean iso-8859-1.
then the file system of SuSE Linux Professional was converted to utf-8
and i thought it was about time to convert my own laptop - the origin of
all publishing ... and all my tools and editors to UTF-8
all ?
many of my chapters were created during the past 4 years on rental
PCs or various PCs with completely undefined, mixed or messed up character
set definitions and configurations ..... on win machines in various
Internet cafes here in the Philippines ...
so i converted to utf-8
did i convert to utf-8 ?
first of all i changed the meta tag from
charset=iso-8859-1 to charset=utf-8
and all the characters by hand to the new charset - easy for fr, es, de, en ... but
converting ru, bg or it was a nightmare ...
an earlier attempt to use a correct automated procedure failed for a
forgotten reason.
the result is that most of the online files in German are now
messed up and need to be corrected one by one over these past and coming
weeks and maybe months
using the bash shell command
iconv --from-code=ISO-8859-1 --to-code=UTF-8 ./oldfile.htm > ./newfile.html
if new to iconv - then see
man iconv
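that shell method is the official version ..
official - but not yet final ...
because it only does half the job:
i have to write the converted text into another NEW file and then rename it
(redirecting the output onto the input file itself would empty that file
before iconv gets to read it)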
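the "write to a NEW file, then rename" step can be wrapped in a small bash function. this is only a sketch under a few assumptions: the function name convert_latin1_to_utf8 is made up for illustration, GNU sed and GNU coreutils are assumed (normal on Linux), and the sed line only handles the simple charset=iso-8859-1 spelling of the meta tag shown above.

```shell
#!/usr/bin/env bash
# hypothetical helper (the name is not part of iconv): convert one
# ISO-8859-1 file to UTF-8 through a temporary file, then rename the
# temporary file over the original.
convert_latin1_to_utf8() {
    local src="$1"
    local tmp

    # temp file next to the source, so the final mv is a plain rename
    tmp="$(mktemp "${src}.XXXXXX")" || return 1

    # 1. convert into the NEW file
    # 2. update the charset in the meta tag (GNU sed: -i and the I flag)
    # 3. copy the permissions of the original (GNU chmod --reference),
    #    because mktemp creates the file readable by the owner only
    if iconv --from-code=ISO-8859-1 --to-code=UTF-8 "$src" > "$tmp" \
        && sed -i 's/charset=iso-8859-1/charset=utf-8/I' "$tmp" \
        && chmod --reference="$src" "$tmp"; then
        mv -- "$tmp" "$src"
    else
        # something failed: leave the original file untouched
        rm -f -- "$tmp"
        return 1
    fi
}

# usage (keep a backup copy first):
#   cp ./oldfile.htm ./oldfile.htm.bak
#   convert_latin1_to_utf8 ./oldfile.htm
```

the original file is only replaced when every step succeeded, so a failed run leaves the old file as it was.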
one more reminder ...
ONLY convert files that really are in the old charset to the new charset
- if you convert a file that was already in the new charset
or that you converted manually before
or that has text in the new charset inserted in between old text
--- then you may get something worse ... neither UTF-8 nor ISO-8859-1 ...
hence make sure your OLD file IS in the OLD charset before running the tool !!
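one way to check a file before converting is to ask iconv to read it as UTF-8: if that works without error, the file is already valid UTF-8 (or plain ASCII) and should not be converted again. a minimal sketch, assuming the iconv from glibc as found on Linux; the function name is_valid_utf8 is made up for illustration.

```shell
#!/usr/bin/env bash
# hypothetical helper: returns success (0) when the file can be read
# as UTF-8 from start to end, failure otherwise.
is_valid_utf8() {
    iconv --from-code=UTF-8 --to-code=UTF-8 "$1" > /dev/null 2>&1
}

# usage:
#   if is_valid_utf8 ./oldfile.htm; then
#       echo "already UTF-8 (or plain ASCII) - do not convert again"
#   else
#       iconv --from-code=ISO-8859-1 --to-code=UTF-8 ./oldfile.htm > ./newfile.html
#   fi
```

this check cannot tell a clean ISO-8859-1 file from a mixed one: a file with UTF-8 text pasted in between old text also fails the test, and converting it would damage the parts that were already UTF-8. such mixed files still need to be looked at by hand.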
of course
love and bliss
hans






