Nov 10 2008
 

The Problem

Occasionally, I find a text file written on a Windows box that appears to contain extra garbage characters.  Most often, the text looks like this:

/*^M
 * @(#)MyApplication.java  2.0  01 April 2005^M
 *^M
 * Copyright (c) 2003-2005 Werner Randelshofer^M
 * Staldenmattweg 2, Immensee, CH-6405, Switzerland.^M
 * This software is in the public domain.^M
 */^M^M

Or, even worse, as a single line, like this:

/*^M * @(#)MyApplication.java  2.0  01 April 2005^M *^M * Copyright (c) 2003-2005 Werner Randelshofer^M * Staldenmattweg 2, Immensee, CH-6405, Switzerland.^M * This software is in the public domain.^M */^M^M

Either way, this is annoying, if not unusable.

Brief Explanation

The primary cause of the problem is that the Unix and DOS (Windows) conventions encode ‘newline’ differently.  The difference is long-standing, dating back to the days when printers were the primary ‘display’.

The Windows convention uses two ASCII characters: ‘carriage-return’ (which meant to send the printer head back to the beginning of the line) followed by ‘line-feed’ (which meant to roll the printer paper up one line).  Unix selected just one of those characters (‘line-feed’) to do the same thing.

In caret notation, carriage-return is written ^M and line-feed is written ^J, so these symbols usually appear as:

^M^J

Or,

^M

Depending on the encoding, platform, and application.
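To make the notation concrete, here is a minimal Python sketch of how the two control characters map to their caret names.  The helper name caret is my own invention for illustration; it is not part of any library.

```python
# Caret notation: a control character is shown as '^' followed by the
# character whose ASCII code is 64 higher (13 -> 'M', 10 -> 'J').
def caret(byte: int) -> str:
    """Return caret notation for an ASCII control byte (hypothetical helper)."""
    return "^" + chr(byte + 64)

CR = 0x0D  # carriage-return, written '\r' in Python
LF = 0x0A  # line-feed, written '\n' in Python

windows_line = b"hello\r\n"  # Windows convention: CR followed by LF
unix_line = b"hello\n"       # Unix convention: LF only

print(caret(CR), caret(LF))  # ^M ^J
```

Seen this way, a stray ^M on a Unix box is simply the carriage-return half of a Windows line ending.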

The Solution Using Emacs

On most Unix platforms, commands such as dos2unix and unix2dos can be used to convert a text file from Windows to Unix format or vice versa.  However, sometimes a file can get so garbled that even these tools do not work.  Regardless, it is nice to know how to fix this in Emacs.

The easiest way to fix the second case (everything on a single line) in Emacs is:

  1. Place the cursor on the first part of the strange character, the caret (^).
  2. Press C-SPC (Control + Space) to begin marking.
  3. Move to the right one character.  (You’ll notice that the cursor skips over both the ^ and the M.  That is because ^M is really one ASCII character.)
  4. Press C-w to kill (cut) the text.
  5. Immediately, press C-y to yank the text back.
  6. Jump to the top of the document (Esc < or M-<).
  7. Replace all occurrences:
    1. M-x replace-string, then press ‘Enter’.
    2. Press C-y to paste in the text to be replaced, then press ‘Enter’.
    3. Press C-q C-j to replace with a ‘quoted’ ^J, which is the Unix newline (or C-q C-m C-q C-j for Windows).
    4. Press ‘Enter’ to replace all occurrences.
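For comparison, the same replacement can be sketched outside Emacs.  This is only a rough Python illustration, assuming the file is small enough to hold in memory; the function name to_unix_newlines is hypothetical.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert Windows (CR LF) and lone CR line endings to Unix LF."""
    # Handle CR LF pairs first so each pair becomes exactly one newline,
    # then turn any carriage-returns that are left into newlines too.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

# The single-line case from above, shortened: every ^M is a lone CR.
garbled = b"/*\r * line\r */\r\r"
print(to_unix_newlines(garbled))  # b'/*\n * line\n */\n\n'
```

Replacing the pairs before the lone carriage-returns matters: doing it the other way round would turn every Windows line ending into two newlines.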

A little experimentation will be necessary to adapt to other cases.  You can read more here:

http://lists.freebsd.org/pipermail/freebsd-questions/2006-October/134422.html
