Download Transcoder source tarball v0.0

(transcoder transcoder) is a Guile library that converts between
character sets.  It follows the R6RS idea of transcoders quite
closely.  The following procedures and syntactic forms work as
described in R6RS.

	  make-transcoder
	  native-transcoder
	  transcoder-codec
	  transcoder-eol-style
	  transcoder-error-handling-mode
	  latin-1-codec
	  utf-8-codec
	  utf-16-codec
	  eol-style
	  native-eol-style
	  error-handling-mode
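
As a minimal sketch of how these fit together, assuming the module is
installed and that each form behaves as R6RS specifies, a transcoder
can be built from a codec alone or from a codec plus an explicit
eol-style and error-handling mode.  The accessors then read the
components back.  The names tc-default and tc-strict are illustrative
only.

```scheme
(use-modules (transcoder transcoder))

;; Codec only.  R6RS says the eol-style then defaults to the native
;; style and the error-handling mode defaults to replace; this sketch
;; assumes the library follows that.
(define tc-default (make-transcoder (utf-8-codec)))

;; All three components given explicitly.
(define tc-strict
  (make-transcoder (latin-1-codec)
                   (eol-style lf)
                   (error-handling-mode raise)))

;; Read the components back with the accessors.
(write (transcoder-eol-style tc-strict))
(newline)
(write (transcoder-error-handling-mode tc-strict))
(newline)
```

In R6RS, eol-style and error-handling-mode are syntax rather than
procedures, so their arguments (lf, raise) are written unquoted.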

There are four additional functions.

u32vector->locale-string vector transcoder
locale-string->u32vector string transcoder

    Given TRANSCODER, a transcoder object as returned by the procedure
    make-transcoder, u32vector->locale-string converts a vector of
    codepoints into an encoded string, and locale-string->u32vector
    converts an encoded string into a vector of codepoints.

read-codepoint port transcoder
write-codepoint codepoint port transcoder

    Given an open PORT and a TRANSCODER object as returned by
    make-transcoder, read-codepoint reads one codepoint from the port
    and returns it as an integer, or returns the end-of-file object
    when the input is exhausted.  write-codepoint writes CODEPOINT,
    also an integer, to the port.

Here is an example that converts between encoded strings and codepoints:

(use-modules (transcoder transcoder))
(setlocale LC_ALL "")

(define tc (make-transcoder (utf-8-codec)))

(write (locale-string->u32vector "a b c" tc))
(newline)
;; Prints #u32(97 32 98 32 99)


(write (locale-string->u32vector "á ñ ö" tc))
(newline)
;; Prints #u32(225 32 241 32 246)

;; Here are variations on the letter A
(display (u32vector->locale-string #u32(#xC1 #xE1 #x103 #x1ce #xc2) tc))
(newline)
;; Prints ÁáăǎÂ

Here is an example that scans a file for non-ASCII characters:

(define tc (make-transcoder (utf-8-codec)))

(define iport (open-file "demo.scm" "r"))

(define oport (open-output-string))

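;; Copy every non-ASCII codepoint from the file into the string port,
;; then display what was collected.
(let loop ((cp (read-codepoint iport tc)))
  (if (not (eof-object? cp))
      (begin
        ;; ASCII covers 0 through 127, so anything above 127 is non-ASCII.
        (if (> cp 127)
            (write-codepoint cp oport tc))
        (loop (read-codepoint iport tc)))
      (begin
        (close-port iport)
        (display (get-output-string oport)))))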
(newline)