Open big text files >2GB

Update 26.04.2010 - please read the comments after the blog post. Some users could not confirm my findings, so please take their comments into consideration! Thank you!

Today I had a challenge. We have a 2GB TYPO3 database that was supposed to be utf8, but due to a misconfiguration all international characters were stored incorrectly (with a latin1 collation). Here is exactly what our problem was:

$TYPO3_CONF_VARS['SYS']['setDBinit'] = 'SET NAMES utf8;';
Without the above MySQL statement your TYPO3 UTF-8 setup will probably
work, but chances are that each international character is stored in
the database as two separate latin-1 chars (not using MySQL's own
UTF-8 handling). If you check your db using phpMyAdmin and find all
umlauts being shown as two weird characters, then this might be the
case. If this happens to you, you CANNOT add the above
statement any more. Your output will be broken.
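
To make the "two weird characters" effect concrete, here is a minimal Python sketch of the principle. It is only an illustration, not how MySQL works internally: the `repair` helper is a hypothetical name, and MySQL's latin1 is close to but not identical to Python's `latin-1` codec.

```python
# Illustration only: how one UTF-8 character turns into two latin-1 characters.
original = "ü"

# UTF-8 stores "ü" as two bytes: 0xC3 0xBC.
utf8_bytes = original.encode("utf-8")

# If those two bytes are read as latin-1, each byte becomes its own character.
garbled = utf8_bytes.decode("latin-1")


def repair(text: str) -> str:
    """Undo one round of the misinterpretation (hypothetical helper)."""
    return text.encode("latin-1").decode("utf-8")


print(len(garbled))     # 2
print(repair(garbled))  # ü
```

The same reasoning explains why the statement cannot simply be added later: data that is already stored as two characters would then be sent out as two characters.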

Somehow we had to get the data back into the right format. After testing with small tables I found that dumping the data with the default charset latin1 produced an ugly SQL file (the umlauts were not readable), but importing that dump on my localhost with SET NAMES utf8 corrected the problem. After importing it locally I could export the data again as utf8 - everything looked fine, and the import on the server also worked.

So, the solution to the problem was found. Now I had to dump the database, make the necessary change in the file, import it locally, export it, upload it to the server and import it there :D

The dump produced a 2GB file and I had to edit it :). This is where the real problems started.

Most editors for Windows are not built to handle large files. My beloved IDE NetBeans warned me that I would most probably get an out-of-memory exception (well, I got it :)).
PSPad couldn't open the file.
couldn't make it either.
Then I found 010 Editor - I finally managed to open the file (opening was fast enough, under 2 minutes). I made the necessary change and saved it, which took around 10 minutes. I was so happy! Everything seemed to go well, so I tried to import the file - bam!!! Error on the first line - what the ... I tried importing the same file without my changes - everything worked.
So I started looking for another editor. TextPad looked promising: specially designed to open large files. OK, 25MB - opened immediately. Well, great! Let me try it with 2GB - "Disk full while accessing" - oh no! Not again! I couldn't find any information on why this occurs, so I had to look further.

And finally! I found it! JujuEdit did the job! The file opened in under 1 second! Saving was fast enough. The import was successful. If you want to open enormous files on Windows, use JujuEdit :)

I also downloaded the Windows version of Emacs :) but couldn't figure out how to install it :)

P.S. Later it occurred to me that I could use vim in the Linux console to edit the file :D - I did and it worked :), but that doesn't matter.
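
Another route that avoids opening the file in an editor at all is a small script that streams the dump line by line. The following is only a sketch under assumptions: the exact change made to the dump is not spelled out in this post, so the `SET NAMES` strings and the file names in the usage comment are placeholders, and `rewrite_dump` is a hypothetical helper name.

```python
def rewrite_dump(src_path, dst_path, old, new):
    """Copy src_path to dst_path, replacing the first occurrence of `old` with `new`.

    Everything is handled as bytes, so the data is never decoded or re-encoded.
    Only one line is held in memory at a time, so memory use is bounded by the
    longest line in the dump rather than by the file size.
    Returns True if a replacement was made.
    """
    replaced = False
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            if not replaced and old in line:
                line = line.replace(old, new, 1)
                replaced = True
            dst.write(line)
    return replaced


# Example call (assumed file names and strings, adjust to your own dump):
# rewrite_dump("dump.sql", "dump_fixed.sql", b"SET NAMES latin1", b"SET NAMES utf8")
```

One caveat: SQL dumps can contain very long INSERT lines, so "one line at a time" may still mean many megabytes for a single line, but that is far less than loading 2GB at once.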

If you use Windows, JujuEdit is the software to use.
