The Problem
Occasionally, I come across a text file written on a Windows box that shows extra garbage characters. Most often, the displayed text looks like this:
/*^M
 * @(#)MyApplication.java 2.0 01 April 2005^M
 *^M
 * Copyright (c) 2003-2005 Werner Randelshofer^M
 * Staldenmattweg 2, Immensee, CH-6405, Switzerland.^M
 * This software is in the public domain.^M
 */^M
^M
Or, even worse, as a single line, like this:
/*^M * @(#)MyApplication.java 2.0 01 April 2005^M *^M * Copyright (c) 2003-2005 Werner Randelshofer^M * Staldenmattweg 2, Immensee, CH-6405, Switzerland.^M * This software is in the public domain.^M */^M^M
Either way, the file is annoying to read, if not unusable.
Brief Explanation
The primary cause of the problem is that the Unix and DOS (Windows) conventions encode a ‘newline’ differently. The difference is long-standing, dating back to the days when printers were the primary ‘display’.
The Windows convention uses two ASCII characters: ‘carriage-return’ (^M, which meant to send the printer head back to the beginning of the line) followed by ‘line-feed’ (^J, which meant to roll the printer paper up one line). Unix selected just one of those characters (‘line-feed’) to do the same thing.
Depending on the encoding, platform, and application, these symbols usually appear as:
^M^J
or
^M
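To see which convention a file actually uses, it can help to count the raw bytes instead of trusting what an editor displays. The following is a minimal Python sketch, not part of the original workflow; the function name count_line_endings and the sample data are made up for illustration.

```python
def count_line_endings(data: bytes) -> dict:
    """Count each newline convention found in raw file bytes."""
    crlf = data.count(b"\r\n")           # ^M^J pairs (Windows)
    # Lone characters: all occurrences minus those inside a ^M^J pair.
    cr_only = data.count(b"\r") - crlf   # bare ^M
    lf_only = data.count(b"\n") - crlf   # bare ^J (Unix)
    return {"CRLF": crlf, "CR": cr_only, "LF": lf_only}


# Hypothetical sample: a short comment header saved with Windows line endings.
sample = b"/*\r\n * header\r\n */\r\n"
print(count_line_endings(sample))  # {'CRLF': 3, 'CR': 0, 'LF': 0}
```

A file that reports more than one non-zero count has mixed line endings, which is often why it looks garbled.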
The Solution Using Emacs
On most Unix platforms, commands such as dos2unix and unix2dos can be used to convert a text file from Windows to Unix format or vice versa. However, sometimes a file can get so garbled that even these tools do not work. Regardless, it is nice to know how to fix this in Emacs.
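For files with mixed line endings, a small script can normalize everything in one pass. This is a rough Python sketch of the idea, not a reimplementation of dos2unix; the name normalize_newlines and its newline parameter are assumptions chosen for this example.

```python
import re

# Match ^M^J first so that a pair is treated as one line break, not two.
_ANY_NEWLINE = re.compile(rb"\r\n|\r|\n")


def normalize_newlines(data: bytes, newline: bytes = b"\n") -> bytes:
    """Rewrite every line break in data to the given newline sequence."""
    # A function is used as the replacement so the bytes are inserted literally.
    return _ANY_NEWLINE.sub(lambda match: newline, data)


mixed = b"one\r\ntwo\rthree\n"
print(normalize_newlines(mixed))            # b'one\ntwo\nthree\n'
print(normalize_newlines(mixed, b"\r\n"))   # b'one\r\ntwo\r\nthree\r\n'
```

Passing b"\r\n" as the second argument goes in the other direction, toward the Windows convention.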
The easiest way to fix the second case in Emacs is:
- Place the cursor on the first part of the strange character, the caret (^).
- Press C-SPC (Control + Space) to begin marking.
- Move to the right one character. (You’ll notice that it jumps an extra character. That is because ^M is really one ASCII character.)
- Press C-w to remove (kill) the text.
- Immediately press C-y to yank the text back.
- Jump to the top of the document (Esc-< or M-<).
- Replace all occurrences:
  - Type M-x replace-string and press ‘Enter’.
  - Press C-y to paste in the text to be replaced, then press ‘Enter’.
  - Press C-q, C-j to replace with a ‘quoted’ ^J, which is the Unix newline (or C-q, C-m, C-q, C-j for Windows).
  - Press ‘Enter’ to replace all occurrences.
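The steps above amount to a literal search-and-replace of the ^M character. For comparison, here is a hedged Python sketch of the same operation on a file on disk; the helper name replace_cr_in_file is hypothetical, and the sketch assumes the second case, where the file contains bare ^M characters and no ^M^J pairs.

```python
from pathlib import Path


def replace_cr_in_file(path: str, replacement: bytes = b"\n") -> int:
    """Replace every ^M (carriage-return) byte in the file, in place.

    Returns the number of ^M bytes that were replaced.
    """
    target = Path(path)
    # Read and write raw bytes so Python does not translate newlines itself.
    data = target.read_bytes()
    count = data.count(b"\r")
    target.write_bytes(data.replace(b"\r", replacement))
    return count
```

As with the Emacs steps, passing b"\r\n" as the replacement produces Windows line endings instead. On a file that already contains ^M^J pairs, this literal replacement would leave doubled line breaks, so it is worth checking the file first and keeping a backup copy.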
A little experimentation will be necessary to adapt to other cases. You can read more here:
http://lists.freebsd.org/pipermail/freebsd-questions/2006-October/134422.html