Skip to content

Commit 610d522

Browse files
committed
os: fix handling of Windows Unicode console input and ^Z
Go 1.5 worked with Unicode console input but not ^Z. Go 1.6 did not work with Unicode console input but did handle one ^Z case. Go 1.7 did not work with Unicode console input but did handle one ^Z case. The intent of this CL is for Go 1.8 to work with Unicode console input and also handle all ^Z cases. Here's a simple test program for reading from the console. It prints a "> " prompt, calls read, prints what it gets, and repeats. package main import ( "fmt" "os" ) func main() { p := make([]byte, 100) fmt.Printf("> ") for { n, err := os.Stdin.Read(p) fmt.Printf("[%d %q %v]\n> ", n, p[:n], err) } } On Unix, typing a ^D produces a break in the input stream. If the ^D is at the beginning of a line, then the 0 bytes returned appear as an io.EOF: $ go run /tmp/x.go > hello [6 "hello\n" <nil>] > hello^D[5 "hello" <nil>] > ^D[0 "" EOF] > ^D[0 "" EOF] > hello^Dworld [5 "hello" <nil>] > [6 "world\n" <nil>] > On Windows, the EOF character is ^Z, not ^D, and there has been a long-standing problem that in Go programs, ^Z on Windows does not behave in the expected way, namely like ^D on Unix. Instead, the ^Z come through as literal ^Z characters: C:\>c:\go1.5.4\bin\go run x.go > ^Z [3 "\x1a\r\n" <nil>] > hello^Zworld [13 "hello\x1aworld\r\n" <nil>] > CL 4310 attempted to fix this bug, then known as #6303, by changing the use of ReadConsole to ReadFile. This CL was released as part of Go 1.6 and did fix the case of a ^Z by itself, but not as part of a larger input: C:\>c:\go1.6.3\bin\go run x.go > ^Z [0 "" EOF] > hello^Zworld [13 "hello\x1aworld\r\n" <nil>] > So the fix was incomplete. Worse, the fix broke Unicode console input. ReadFile does not handle Unicode console input correctly. To handle Unicode correctly, programs must use ReadConsole. Early versions of Go used ReadFile to read the console, leading to incorrect Unicode handling, which was filed as #4760 and fixed in CL 7312053, which switched to ReadConsole and was released as part of Go 1.1 and still worked as of Go 1.5: C:\>c:\go1.5.4\bin\go run x.go > hello [7 "hello\r\n" <nil>] > hello world™ [16 "hello world™\r\n" <nil>] > But in Go 1.6: C:\>c:\go1.6.3\bin\go run x.go > hello [7 "hello\r\n" <nil>] > hello world™ [0 "" EOF] > That is, changing back to ReadFile in Go 1.6 reintroduced #4760, which has been refiled as #17097. (We have no automated test for this because we don't know how to simulate console input in a test: it appears that one must actually type at a keyboard to use the real APIs. This CL at least adds a comment warning not to reintroduce ReadFile again.) CL 29493 attempted to fix #17097, but it was not a complete fix: the hello world™ example above still fails, as does Shift-JIS input, which was filed as #17939. CL 29493 also broke ^Z handling, which was filed as #17427. This CL attempts the never before successfully performed trick of simultaneously fixing Unicode console input and ^Z handling. It changes the console input to use ReadConsole again, as in Go 1.5, which seemed to work for all known Unicode input. Then it adds explicit handling of ^Z in the input stream. (In the case where standard input is a redirected file, ^Z processing should not happen, and it does not, because this code path is only invoked when standard input is the console.) With this CL: C:\>go run x.go > hello [7 "hello\r\n" <nil>] > hello world™ [16 "hello world™\r\n" <nil>] > ^Z [0 "" EOF] > [2 "\r\n" <nil>] > hello^Zworld [5 "hello" <nil>] > [0 "" EOF] > [7 "world\r\n" <nil>] This almost matches Unix: $ go run /tmp/x.go > hello [6 "hello\n" <nil>] > hello world™ [15 "hello world™\n" <nil>] > ^D [0 "" EOF] > [1 "\n" <nil>] > hello^Dworld [5 "hello" <nil>] > [6 "world\n" <nil>] > The difference is in the handling of hello^Dworld / hello^Zworld. On Unix, hello^Dworld terminates the read of hello but does not result in a zero-length read between reading hello and world. This is dictated by the tty driver, not any special Go code. On Windows, in this CL, hello^Zworld inserts a zero length read result between hello and world, which is treated as an interior EOF. This is implemented by the Go code in this CL, but it matches the handling of ^Z on the console in other programs: C:\>copy con x.txt hello^Zworld 1 file(s) copied. C:\>type x.txt hello C:\> A natural question is how to test all this. As noted above, we don't know how to write automated tests using the actual Windows console. CL 29493 introduced the idea of substituting a different syscall.ReadFile implementation for testing; this CL continues that idea but substituting for syscall.ReadConsole instead. To avoid the regression of putting ReadFile back, this CL adds a comment warning against that. Fixes #17427. Fixes #17939. Change-Id: Ibaabd0ceb2d7af501d44ac66d53f64aba3944142 Reviewed-on: https://go-review.googlesource.com/33451 Run-TryBot: Russ Cox <[email protected]> Reviewed-by: Quentin Smith <[email protected]> Reviewed-by: Alex Brainman <[email protected]> TryBot-Result: Gobot Gobot <[email protected]>
1 parent 8a2c34e commit 610d522

File tree

3 files changed

+131
-171
lines changed

3 files changed

+131
-171
lines changed

src/os/export_windows_test.go

Lines changed: 3 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -7,9 +7,7 @@ package os
77
// Export for testing.
88

99
var (
10-
NewConsoleFile = newConsoleFile
11-
GetCPP = &getCP
12-
ReadFileP = &readFile
13-
ResetGetConsoleCPAndReadFileFuncs = resetGetConsoleCPAndReadFileFuncs
14-
FixLongPath = fixLongPath
10+
FixLongPath = fixLongPath
11+
NewConsoleFile = newConsoleFile
12+
ReadConsoleFunc = &readConsole
1513
)

src/os/file_windows.go

Lines changed: 65 additions & 87 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,6 @@
55
package os
66

77
import (
8-
"errors"
98
"internal/syscall/windows"
109
"io"
1110
"runtime"
@@ -27,9 +26,11 @@ type file struct {
2726
l sync.Mutex // used to implement windows pread/pwrite
2827

2928
// only for console io
30-
isConsole bool
31-
lastbits []byte // first few bytes of the last incomplete rune in last write
32-
readbuf []byte // last few bytes of the last read that did not fit in the user buffer
29+
isConsole bool
30+
lastbits []byte // first few bytes of the last incomplete rune in last write
31+
readuint16 []uint16 // buffer to hold uint16s obtained with ReadConsole
32+
readbyte []byte // buffer to hold decoding of readuint16 from utf16 to utf8
33+
readbyteOffset int // readbyte[readOffset:] is yet to be consumed with file.Read
3334
}
3435

3536
// Fd returns the Windows handle referencing the open file.
@@ -53,7 +54,6 @@ func newFile(h syscall.Handle, name string) *File {
5354
func newConsoleFile(h syscall.Handle, name string) *File {
5455
f := newFile(h, name)
5556
f.isConsole = true
56-
f.readbuf = make([]byte, 0, 4)
5757
return f
5858
}
5959

@@ -203,101 +203,79 @@ func (file *file) close() error {
203203
return err
204204
}
205205

206-
var (
207-
// These variables are used for testing readConsole.
208-
getCP = windows.GetConsoleCP
209-
readFile = syscall.ReadFile
210-
)
206+
var readConsole = syscall.ReadConsole // changed for testing
211207

212-
func resetGetConsoleCPAndReadFileFuncs() {
213-
getCP = windows.GetConsoleCP
214-
readFile = syscall.ReadFile
215-
}
208+
// readConsole reads utf16 characters from console File,
209+
// encodes them into utf8 and stores them in buffer b.
210+
// It returns the number of utf8 bytes read and an error, if any.
211+
func (f *File) readConsole(b []byte) (n int, err error) {
212+
if len(b) == 0 {
213+
return 0, nil
214+
}
216215

217-
// copyReadConsoleBuffer copies data stored in f.readbuf into buf.
218-
// It adjusts f.readbuf accordingly and returns number of bytes copied.
219-
func (f *File) copyReadConsoleBuffer(buf []byte) (n int, err error) {
220-
n = copy(buf, f.readbuf)
221-
newsize := copy(f.readbuf, f.readbuf[n:])
222-
f.readbuf = f.readbuf[:newsize]
223-
return n, nil
224-
}
216+
if f.readuint16 == nil {
217+
// Note: syscall.ReadConsole fails for very large buffers.
218+
// The limit is somewhere around (but not exactly) 16384.
219+
// Stay well below.
220+
f.readuint16 = make([]uint16, 0, 10000)
221+
f.readbyte = make([]byte, 0, 4*cap(f.readuint16))
222+
}
225223

226-
// readOneUTF16FromConsole reads single character from console,
227-
// converts it into utf16 and return it to the caller.
228-
func (f *File) readOneUTF16FromConsole() (uint16, error) {
229-
var buf [1]byte
230-
mbytes := make([]byte, 0, 4)
231-
cp := getCP()
232-
for {
233-
var nmb uint32
234-
err := readFile(f.fd, buf[:], &nmb, nil)
235-
if err != nil {
236-
return 0, err
224+
for f.readbyteOffset >= len(f.readbyte) {
225+
n := cap(f.readuint16) - len(f.readuint16)
226+
if n > len(b) {
227+
n = len(b)
237228
}
238-
if nmb == 0 {
239-
continue
240-
}
241-
mbytes = append(mbytes, buf[0])
242-
243-
// Convert from 8-bit console encoding to UTF16.
244-
// MultiByteToWideChar defaults to Unicode NFC form, which is the expected one.
245-
nwc, err := windows.MultiByteToWideChar(cp, windows.MB_ERR_INVALID_CHARS, &mbytes[0], int32(len(mbytes)), nil, 0)
229+
var nw uint32
230+
err := readConsole(f.fd, &f.readuint16[:len(f.readuint16)+1][len(f.readuint16)], uint32(n), &nw, nil)
246231
if err != nil {
247-
if err == windows.ERROR_NO_UNICODE_TRANSLATION {
248-
continue
249-
}
250232
return 0, err
251233
}
252-
if nwc != 1 {
253-
return 0, errors.New("MultiByteToWideChar returns " + itoa(int(nwc)) + " characters, but only 1 expected")
234+
uint16s := f.readuint16[:len(f.readuint16)+int(nw)]
235+
f.readuint16 = f.readuint16[:0]
236+
buf := f.readbyte[:0]
237+
for i := 0; i < len(uint16s); i++ {
238+
r := rune(uint16s[i])
239+
if utf16.IsSurrogate(r) {
240+
if i+1 == len(uint16s) {
241+
if nw > 0 {
242+
// Save half surrogate pair for next time.
243+
f.readuint16 = f.readuint16[:1]
244+
f.readuint16[0] = uint16(r)
245+
break
246+
}
247+
r = utf8.RuneError
248+
} else {
249+
r = utf16.DecodeRune(r, rune(uint16s[i+1]))
250+
if r != utf8.RuneError {
251+
i++
252+
}
253+
}
254+
}
255+
n := utf8.EncodeRune(buf[len(buf):cap(buf)], r)
256+
buf = buf[:len(buf)+n]
254257
}
255-
var wchars [1]uint16
256-
nwc, err = windows.MultiByteToWideChar(cp, windows.MB_ERR_INVALID_CHARS, &mbytes[0], int32(len(mbytes)), &wchars[0], nwc)
257-
if err != nil {
258-
return 0, err
258+
f.readbyte = buf
259+
f.readbyteOffset = 0
260+
if nw == 0 {
261+
break
259262
}
260-
return wchars[0], nil
261263
}
262-
}
263264

264-
// readConsole reads utf16 characters from console File,
265-
// encodes them into utf8 and stores them in buffer buf.
266-
// It returns the number of utf8 bytes read and an error, if any.
267-
func (f *File) readConsole(buf []byte) (n int, err error) {
268-
if len(buf) == 0 {
269-
return 0, nil
270-
}
271-
if len(f.readbuf) > 0 {
272-
return f.copyReadConsoleBuffer(buf)
273-
}
274-
wchar, err := f.readOneUTF16FromConsole()
275-
if err != nil {
276-
return 0, err
277-
}
278-
r := rune(wchar)
279-
if utf16.IsSurrogate(r) {
280-
wchar, err := f.readOneUTF16FromConsole()
281-
if err != nil {
282-
return 0, err
283-
}
284-
r = utf16.DecodeRune(r, rune(wchar))
285-
}
286-
if nr := utf8.RuneLen(r); nr > len(buf) {
287-
start := len(f.readbuf)
288-
for ; nr > 0; nr-- {
289-
f.readbuf = append(f.readbuf, 0)
265+
src := f.readbyte[f.readbyteOffset:]
266+
var i int
267+
for i = 0; i < len(src) && i < len(b); i++ {
268+
x := src[i]
269+
if x == 0x1A { // Ctrl-Z
270+
if i == 0 {
271+
f.readbyteOffset++
272+
}
273+
break
290274
}
291-
utf8.EncodeRune(f.readbuf[start:cap(f.readbuf)], r)
292-
} else {
293-
utf8.EncodeRune(buf, r)
294-
buf = buf[nr:]
295-
n += nr
296-
}
297-
if n > 0 {
298-
return n, nil
275+
b[i] = x
299276
}
300-
return f.copyReadConsoleBuffer(buf)
277+
f.readbyteOffset += i
278+
return i, nil
301279
}
302280

303281
// read reads up to len(b) bytes from the File.

src/os/os_windows_test.go

Lines changed: 63 additions & 79 deletions
Original file line numberDiff line numberDiff line change
@@ -5,19 +5,21 @@
55
package os_test
66

77
import (
8-
"bytes"
9-
"encoding/hex"
8+
"fmt"
109
"internal/syscall/windows"
1110
"internal/testenv"
11+
"io"
1212
"io/ioutil"
1313
"os"
1414
osexec "os/exec"
1515
"path/filepath"
16+
"reflect"
1617
"runtime"
1718
"sort"
1819
"strings"
1920
"syscall"
2021
"testing"
22+
"unicode/utf16"
2123
"unsafe"
2224
)
2325

@@ -641,88 +643,70 @@ func TestStatSymlinkLoop(t *testing.T) {
641643
}
642644

643645
func TestReadStdin(t *testing.T) {
644-
defer os.ResetGetConsoleCPAndReadFileFuncs()
646+
old := *os.ReadConsoleFunc
647+
defer func() {
648+
*os.ReadConsoleFunc = old
649+
}()
645650

646651
testConsole := os.NewConsoleFile(syscall.Stdin, "test")
647652

648-
var (
649-
hiraganaA_CP932 = []byte{0x82, 0xa0}
650-
hiraganaA_UTF8 = "\u3042"
651-
652-
tests = []struct {
653-
cp uint32
654-
input []byte
655-
output string // always utf8
656-
}{
657-
{
658-
cp: 437,
659-
input: []byte("abc"),
660-
output: "abc",
661-
},
662-
{
663-
cp: 850,
664-
input: []byte{0x84, 0x94, 0x81},
665-
output: "äöü",
666-
},
667-
{
668-
cp: 932,
669-
input: hiraganaA_CP932,
670-
output: hiraganaA_UTF8,
671-
},
672-
{
673-
cp: 932,
674-
input: bytes.Repeat(hiraganaA_CP932, 2),
675-
output: strings.Repeat(hiraganaA_UTF8, 2),
676-
},
677-
{
678-
cp: 932,
679-
input: append(bytes.Repeat(hiraganaA_CP932, 3), '.'),
680-
output: strings.Repeat(hiraganaA_UTF8, 3) + ".",
681-
},
682-
{
683-
cp: 932,
684-
input: append(append([]byte("hello"), hiraganaA_CP932...), []byte("world")...),
685-
output: "hello" + hiraganaA_UTF8 + "world",
686-
},
687-
{
688-
cp: 932,
689-
input: append(append([]byte("hello"), bytes.Repeat(hiraganaA_CP932, 5)...), []byte("world")...),
690-
output: "hello" + strings.Repeat(hiraganaA_UTF8, 5) + "world",
691-
},
692-
}
693-
)
694-
for _, consoleReadBufSize := range []int{1, 2, 3, 4, 5, 8, 10, 16, 20, 50, 100} {
695-
for _, readFileBufSize := range []int{1, 2, 3, 10, 16, 100, 1000} {
696-
nextTest:
697-
for ti, test := range tests {
698-
input := bytes.NewBuffer(test.input)
699-
*os.ReadFileP = func(h syscall.Handle, buf []byte, done *uint32, o *syscall.Overlapped) error {
700-
if len(buf) > readFileBufSize {
701-
buf = buf[:readFileBufSize]
653+
var tests = []string{
654+
"abc",
655+
"äöü",
656+
"\u3042",
657+
"“hi”™",
658+
"hello\x1aworld",
659+
"\U0001F648\U0001F649\U0001F64A",
660+
}
661+
662+
for _, consoleSize := range []int{1, 2, 3, 10, 16, 100, 1000} {
663+
for _, readSize := range []int{1, 2, 3, 4, 5, 8, 10, 16, 20, 50, 100} {
664+
for _, s := range tests {
665+
t.Run(fmt.Sprintf("c%d/r%d/%s", consoleSize, readSize, s), func(t *testing.T) {
666+
s16 := utf16.Encode([]rune(s))
667+
*os.ReadConsoleFunc = func(h syscall.Handle, buf *uint16, toread uint32, read *uint32, inputControl *byte) error {
668+
if inputControl != nil {
669+
t.Fatalf("inputControl not nil")
670+
}
671+
n := int(toread)
672+
if n > consoleSize {
673+
n = consoleSize
674+
}
675+
n = copy((*[10000]uint16)(unsafe.Pointer(buf))[:n], s16)
676+
s16 = s16[n:]
677+
*read = uint32(n)
678+
t.Logf("read %d -> %d", toread, *read)
679+
return nil
702680
}
703-
n, err := input.Read(buf)
704-
*done = uint32(n)
705-
return err
706-
}
707-
*os.GetCPP = func() uint32 {
708-
return test.cp
709-
}
710-
var bigbuf []byte
711-
for len(bigbuf) < len([]byte(test.output)) {
712-
buf := make([]byte, consoleReadBufSize)
713-
n, err := testConsole.Read(buf)
714-
if err != nil {
715-
t.Errorf("test=%d bufsizes=%d,%d: read failed: %v", ti, consoleReadBufSize, readFileBufSize, err)
716-
continue nextTest
681+
682+
var all []string
683+
var buf []byte
684+
chunk := make([]byte, readSize)
685+
for {
686+
n, err := testConsole.Read(chunk)
687+
buf = append(buf, chunk[:n]...)
688+
if err == io.EOF {
689+
all = append(all, string(buf))
690+
if len(all) >= 5 {
691+
break
692+
}
693+
buf = buf[:0]
694+
} else if err != nil {
695+
t.Fatalf("reading %q: error: %v", s, err)
696+
}
697+
if len(buf) >= 2000 {
698+
t.Fatalf("reading %q: stuck in loop: %q", s, buf)
699+
}
700+
}
701+
702+
want := strings.Split(s, "\x1a")
703+
for len(want) < 5 {
704+
want = append(want, "")
705+
}
706+
if !reflect.DeepEqual(all, want) {
707+
t.Errorf("reading %q:\nhave %x\nwant %x", s, all, want)
717708
}
718-
bigbuf = append(bigbuf, buf[:n]...)
719-
}
720-
have := hex.Dump(bigbuf)
721-
expected := hex.Dump([]byte(test.output))
722-
if have != expected {
723-
t.Errorf("test=%d bufsizes=%d,%d: %q expected, but %q received", ti, consoleReadBufSize, readFileBufSize, expected, have)
724-
continue nextTest
725-
}
709+
})
726710
}
727711
}
728712
}

0 commit comments

Comments
 (0)