"invalid continuation byte" error - UTF-8 and OS code page, see solution #451

Closed
Vic-Lau opened this issue Mar 16, 2023 · 42 comments
@Vic-Lau

Vic-Lau commented Mar 16, 2023

Hello @kensoh, I am Chinese, and I think there is a problem with how TagUI for Python handles Chinese characters. For example:

r.type('//*[@name="q"]', '撒') # Google search input test; this causes 'invalid continuation byte'.

and

r.type(r'D:\input.png', '中文') # Chrome input test using a PNG snapshot; nothing happens and the script hangs.

Mr. kensoh, can you give me some advice? I really need your help! Thank you so much!

@Vic-Lau Vic-Lau changed the title Chinese characters problems. Chinese character set problems. Mar 17, 2023
@kensoh
Member

kensoh commented Mar 19, 2023

Hi @Vic-Lau, this package was not built or tested with international languages, but the following works for me on a Mac.

The Google search box is correctly populated with the 3 Chinese characters. Can you try it and tell me your OS?

PS - I'm Chinese too! 🙌

# -*- coding: utf-8 -*-
import rpa as r
r.init()
r.debug(True)
r.url('https://www.google.com')
r.type('//*[@name="q"]', '撒')
r.type('//*[@name="q"]', '中文')
r.close()

@kensoh kensoh changed the title Chinese character set problems. Chinese character set problems - pending replication and user to try sample Mar 19, 2023
@kensoh kensoh added the query label Mar 19, 2023
@Vic-Lau
Author

Vic-Lau commented Mar 20, 2023

Hi @kensoh, thank you for your reply. My OS is Windows 10, and I ran all the tests on different machines.

First problem: some Chinese characters trigger this error, for example '撒':

r.type('//*[@name="q"]', '撒')

# Debug Info:
[RPA][1] - https://www.google.com
[RPA][1] - listening for inputs
[RPA][2] - exist_result = exist('//*[@name="q"]').toString()
[RPA][2] - listening for inputs
[RPA][3] - dump exist_result to rpa_python.txt
[RPA][3] - listening for inputs
[RPA][ERROR] - 'utf-8' codec can't decode bytes in position 34-35: invalid continuation byte
[RPA][4] - listening for inputs

Second problem: this is a bit different from what you understood. I took a snapshot of Chrome's address bar showing "about:blank" and saved it as input.png to test visual_automation=True. It has some problems, as the code below shows:

# -*- coding: utf-8 -*-

import rpa as r


def test():
    r.init(visual_automation = True, chrome_browser=True)
    # r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '[clear]') # It's not working.
    # r.type(r'D:\work\tagui-python\tagui_scripts\input.png', 'aaa') # It's good.
    r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '中文') # It's not working.
    r.wait(10)
    r.close()
    
if __name__ == '__main__':
    test()

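Mr. kensoh, please give me some advice. Best regards!

PS - Haha, so you are Chinese too~ Then you can surely read this paragraph. The TagUI open-source project is really great. I am trying out TagUI and plan to use it in depth to build an RPA platform for handling complex, repetitive business tasks. I tried many similar open-source tools before I came across TagUI. Your technical professionalism, your patience in answering questions, and your passion for RPA are what made me choose TagUI. Among Chinese programmers, the highest title for an individual's achievement is "大神" (roughly "guru"). So, guru kensoh, I am very glad to be in touch with you. Thanks again.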

@kensoh
Member

kensoh commented Mar 29, 2023

Hi @Vic-Lau thanks for your detailed reply!

I don't have a Windows computer of my own, but I just got hold of a Windows 11 PC.

For the first problem you mentioned, on my PC it works from Python interactive mode:

>>> import rpa as r
>>> r.init()
True
>>> r.debug(True)
True
>>> r.url('https://www.google.com')
[RPA][1] - https://www.google.com
[RPA][1] - listening for inputs
True
>>> r.type('//*[@name="q"]', '撒')
[RPA][2] - exist_result = exist('//*[@name="q"]').toString()
[RPA][2] - listening for inputs
[RPA][3] - dump exist_result to rpa_python.txt
[RPA][3] - listening for inputs
[RPA][4] - type //*[@name="q"] as 
[RPA][4] - listening for inputs
True
>>> r.type('//*[@name="q"]', '中文')
[RPA][5] - exist_result = exist('//*[@name="q"]').toString()
[RPA][5] - listening for inputs
[RPA][6] - dump exist_result to rpa_python.txt
[RPA][6] - listening for inputs
[RPA][7] - type //*[@name="q"] as 中文
[RPA][7] - listening for inputs
True

image

And it also works from running the Python script directly:

# -*- coding: utf-8 -*-
import rpa as r
r.init()
r.debug(True)
r.url('https://www.google.com')
r.type('//*[@name="q"]', '撒')
r.type('//*[@name="q"]', '中文')
r.close()

My Windows 11 is set to English, and I didn't do any special configuration or language setup to run the above successfully.

Are you able to try on another computer (a personal computer or a friend's/colleague's computer) to see if it works?

I'll reply to the second problem you reported in the next message.

@kensoh
Member

kensoh commented Mar 30, 2023

For the second problem, below are my comments:

# [clear] does not work with visual automation; it only works with normal web automation, because a backend command is sent to Chrome to empty the text field
r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '[clear]') # It's not working.

# this is a working use case and yes, it should work
r.type(r'D:\work\tagui-python\tagui_scripts\input.png', 'aaa') # It's good.

# the SikuliX engine used by the rpa package does not support typing international characters
r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '中文') # It's not working.

# try r.clipboard('中文') followed by r.keyboard('[ctrl]v') after clicking on the text box visually
r.clipboard('中文')
r.click(r'D:\work\tagui-python\tagui_scripts\input.png')
r.keyboard('[ctrl]v')

@kensoh
Member

kensoh commented Mar 30, 2023

(Typing Chinese reply on my mobile phone, because my PC can't type in Chinese)

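Of course I can read it. I'm very glad this open-source project has earned your approval! And thank you very, very much for your praise!

Do you know Yiwen (轶文)? He is actively promoting TagUI, the original version of this project. You can contact him through the website below. Perhaps there will be a chance to work together.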

http://www.tagui.com.cn/

@Vic-Lau
Author

Vic-Lau commented Mar 30, 2023

Hi @kensoh, thank you for your reply. I've tried this on 4 different computers, running the script with different tools and with the encoding set to UTF-8. The difference between us is that my Windows 10 is set to Chinese. Is it possible that this is the cause of the problem?

@Vic-Lau
Author

Vic-Lau commented Mar 30, 2023

Hi @kensoh, thank you for your reply. I see! I have tested it, and clipboard() works! By the way, if I want the equivalent of [clear] in visual automation, I have to replace it with [ctrl]a + [backspace], right? So I should think more in terms of keyboard operations in the future.

@Vic-Lau
Author

Vic-Lau commented Mar 30, 2023

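Hello guru Kensoh, thank you for replying in Chinese. I don't know Yiwen yet, but if there really is a chance to work together, I will definitely contact him. Haha, so your Chinese is this good too. Impressive!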

@kensoh
Member

kensoh commented Apr 1, 2023

Hi @Vic-Lau,

  1. Can you right-click and download google.txt below to your computer, rename it to google.py (I cannot attach a .py file here), and run python google.py? From the data points so far, my best guess is an encoding issue. Maybe on Chinese Windows computers the default encoding is not UTF-8 but some other encoding. The Python rpa package tries to process text using the standard UTF-8 encoding, so you run into the errors you reported. The file below uses UTF-8 encoding. If it works, try choosing UTF-8 as the encoding when saving your own Python scripts and see if that works.

google.txt

  2. Yes, you can use r.keyboard('[ctrl]a') and then r.keyboard('[delete]') or r.keyboard('[backspace]'). In the future, if there is strong user demand, I can see if the package can automatically convert r.keyboard('[clear]') to this workaround. It isn't easy to implement accurately, because it needs to work on Windows, Mac and Linux, and also when [clear] is used as part of the string in the parameter rather than as the whole parameter. So I'm not adding this auto-conversion for now.

  3. I'll type from my phone for the last part of my reply :)
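
The keyboard workaround confirmed above, together with the clipboard tip from earlier in the thread, can be sketched as a small helper. This is only an illustration: `clear_and_paste` is a hypothetical name that is not part of the rpa package, the `r` argument stands for the imported rpa module, and the `[ctrl]` modifier assumes Windows or Linux.

```python
def clear_and_paste(r, image_path, text):
    """Empty a visually located text box, then paste text into it.

    r is expected to be the imported rpa module (import rpa as r).
    clear_and_paste is a hypothetical helper, not part of the rpa package.
    """
    r.click(image_path)        # focus the text box found by its image
    r.keyboard('[ctrl]a')      # select all existing text
    r.keyboard('[backspace]')  # delete the selection (stands in for [clear])
    r.clipboard(text)          # put the text on the OS clipboard
    r.keyboard('[ctrl]v')      # paste, which avoids typing non-ASCII keys
```

With a session started by r.init(visual_automation=True), a call might look like clear_and_paste(r, r'D:\work\tagui-python\tagui_scripts\input.png', '中文').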

@kensoh
Member

kensoh commented Apr 1, 2023

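I'm glad that I know Chinese. I think it is a very beautiful and magnificent language, even though it is harder to learn than other languages, haha.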

@kensoh
Member

kensoh commented Apr 1, 2023

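As for the title of guru (大神), I am of course honored, but I don't dare accept it. The world is big, and there is always a higher mountain. I'm just doing what I can to do something meaningful.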

@Vic-Lau
Author

Vic-Lau commented Apr 3, 2023

Hi @kensoh, thank you for your reply. I'm sure I have set UTF-8, but I still get the error:

('中文' is no problem, but '撒' is)

[screenshot: decode error]

I think it is a problem with Chinese Windows 10. I can ignore this error with:

decode('utf-8', 'ignore') # ignore error.

But I don't think that's a good idea. Is it possible to solve it by changing the global character set to GBK, for example by passing the character set as a method parameter?

@Vic-Lau
Author

Vic-Lau commented Apr 3, 2023

Mr. kensoh, what you do is not only meaningful but also outstanding, so there is no need to be modest, haha. Respect.


@kensoh
Member

kensoh commented Apr 9, 2023

Hi @Vic-Lau, I just got time to look at the GitHub issues. It's the kids' school holidays now, so I'm busier with childcare 😅

Do you know which encoding your Windows OS uses? If you do, you can edit the tagui.py file and replace all occurrences of utf-8 with the name of that encoding. The location of the tagui.py file can be found with import rpa as r; print(r.__file__). After modifying the file, start a new Python session to test.

I will also ask my friend in China, who is familiar with the TagUI engine, to test whether he has the issue. This package has a lot of users from China, but no one has reported this UTF-8 encoding issue before, so I need to understand more about the problem in order to find the best solution.
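
For anyone unsure which encoding their Windows setup uses, a minimal sketch like the one below prints what Python itself reports. The function name `describe_encodings` is made up for this example; the calls it makes are standard-library ones.

```python
import locale
import os
import sys

def describe_encodings():
    """Collect the encodings Python reports for this environment."""
    return {
        # the locale's preferred encoding, used by default for text files
        'preferred': locale.getpreferredencoding(False),
        # encoding of the terminal attached to stdin/stdout, or None when
        # the stream is not connected to a terminal
        'stdin_device': os.device_encoding(0),
        'stdout_device': os.device_encoding(1),
        # encoding of the interpreter's own stdout wrapper
        'sys_stdout': getattr(sys.stdout, 'encoding', None),
    }

for name, value in describe_encodings().items():
    print(name, '=', value)
```

On a Chinese-language Windows install, one would expect values such as cp936 or gbk here, but that is an expectation to verify on the machine in question rather than a guarantee.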

@kensoh
Member

kensoh commented Apr 9, 2023

Hi @kangyiwen, good day to you! I hope you have had time to start touching Python. If you have, could I kindly ask whether you have any issues using r.type() from this package with Chinese characters? @Vic-Lau in this issue has problems with some Chinese characters, but I can't replicate the problem on my Mac and Windows PC.

This package has a lot of users from China, but no one has reported this UTF-8 encoding issue before. I'm trying to understand more about the problem reported in this GitHub issue in order to find the best solution.

@kangyiwen

kangyiwen commented Apr 9, 2023 via email

@Vic-Lau
Author

Vic-Lau commented Apr 10, 2023

Hi @kensoh, best wishes to your family 😊! Thank you for your reply. I changed the encoding to GBK, but an error was reported:
[screenshot: error1]

So I debugged the problem. First, I confirmed that the UTF-8 value of '撒' is '\xe6\x92\x92':

if __name__ == '__main__':
    encoded = '撒'.encode('utf-8')
    print(encoded)  # the UTF-8 value of '撒' is b'\xe6\x92\x92'

Then, when I watched the value of 'input_variable', I found that the UTF-8 value '\xe6\x92\x92' had been changed to '\xe6\x92?': the last '\x92' was dropped and a '?' was added.
[screenshot: error2]

So I executed this code and got the same error:

# -*- coding: utf-8 -*-
import tagui as tagui

    
if __name__ == '__main__':
    tagui._py23_decode(b'[RPA][4] - type //*[@name="q"] as \xe6\x92?\r\n')

error3

So is it possible that there is a problem with a substring or replace operation during the conversion?
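
One possible reading of this finding, offered as a hypothesis rather than a confirmed diagnosis: if some layer between the TagUI engine and Python treats the UTF-8 bytes as GBK text (code page 936) and writes them back out, the first two bytes form a valid GBK character while the third byte is left dangling and gets replaced by '?'. The sketch below reproduces exactly the bytes reported above under that assumption; the variable names are illustrative only.

```python
# UTF-8 bytes of the character from the report
utf8_bytes = '撒'.encode('utf-8')            # b'\xe6\x92\x92'

# Assumption: a layer decodes these bytes as GBK and re-encodes them as GBK.
# b'\xe6\x92' is a valid GBK pair; the trailing b'\x92' has no partner.
as_gbk_text = utf8_bytes.decode('gbk', errors='replace')
mangled = as_gbk_text.encode('gbk', errors='replace')
print(mangled)

# Decoding the damaged log line as UTF-8 then fails like in the report
line = b'[RPA][4] - type //*[@name="q"] as ' + mangled
try:
    line.decode('utf-8')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc
print(decode_error)
```

A string with an odd number of leftover bytes is what trips this up, which would fit the observation that only some characters are affected.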

@kensoh
Member

kensoh commented Apr 10, 2023

Hi @Vic-Lau thank you very much! :)

In your screenshot the decode() function is still using 'utf-8'. Can you try searching for 'utf-8' in tagui.py, replacing it with 'gbk', and reloading to see if that works? I think there are 7 occurrences that need to be replaced. The others are comments, I think.

@Vic-Lau
Author

Vic-Lau commented Apr 11, 2023

Hi @kensoh, the first screenshot is the result after replacing everything with GBK; it doesn't work. The other screenshots are from debugging with UTF-8. I thought the problem might not be related to the character set, so I changed back to UTF-8 and started debugging.

By the way, do you mind if I add you as a friend on Facebook / WeChat / email?

@kensoh
Member

kensoh commented Apr 16, 2023

Hi @Vic-Lau,

I've checked that the decode() error comes from the line below, which reads the live output of the TagUI engine.

global _process; return _py23_decode(_process.stdout.readline())

From your finding above, the UTF-8 value '\xe6\x92\x92' was changed to '\xe6\x92?' along the way.

You've changed the encode/decode calls in tagui.py to use gbk, but the error is still there.

Can you try the code below in Python interactive mode? It works on my Windows PC.

>>> a = '撒'
>>> a.encode('utf-8')
b'\xe6\x92\x92'
>>> a.encode('gbk')
b'\xc8\xf6'
>>> a.encode('utf-8').decode('utf-8')
'撒'
>>> a.encode('gbk').decode('gbk')
'撒'

If it doesn't work on yours, there may be something specific to your Python environment and Windows.

If it works on yours, then the only other possible cause I can think of is that TagUI live mode runs through Python's subprocess, and the default character encoding there is somehow not compatible with the Python environment, causing encoding issues even after you switch to 'gbk' for encoding/decoding.

Can you try the following code in interactive mode and share your findings? It shows the code page used by subprocess.

>>> import os
>>> os.device_encoding(0)
'cp437'
>>> os.device_encoding(1)
'cp437'

It might be necessary to use an encoding compatible with the code page above in encode/decode, instead of gbk, or to change the encoding used by subprocess with a workaround like https://discuss.python.org/t/choosing-correct-encoding-for-subprocess-popen/3388/3
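
As a rough sketch of the second option, Python's subprocess module accepts `encoding` and `errors` arguments, so the parent can decode a child's output with a fixed encoding regardless of the active code page. The child process here is a stand-in written for this example and is not how the rpa package launches TagUI.

```python
import subprocess
import sys

# Child process that writes the UTF-8 bytes of '撒' to stdout, standing in
# for an engine whose output is UTF-8 (an assumption for this sketch).
child_code = r"import sys; sys.stdout.buffer.write(b'\xe6\x92\x92\n')"

result = subprocess.run(
    [sys.executable, '-c', child_code],
    stdout=subprocess.PIPE,
    encoding='utf-8',   # decode the pipe as UTF-8, whatever the code page
    errors='replace',   # damaged bytes become U+FFFD instead of raising
)
text = result.stdout.strip()
print(ascii(text))
```

This only helps if the bytes arriving on the pipe are still valid UTF-8; it does not repair bytes that were already altered before they reached Python.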

But what is very puzzling is that this package has a large number of users from China, and I don't understand why this issue wasn't reported by other users before. Was it because no one ran into the affected characters, because of differences in Windows environments, or because no one bothered to raise the issue? Knowing this can help to find a better solution than hardcoding locally.

Sure, you can add me on Facebook! I don't have a WeChat account.

@Vic-Lau
Author

Vic-Lau commented Apr 17, 2023

Hi @kensoh, maybe my description was wrong. When I changed tagui.py to gbk, tagui.py stopped working. So the change of the UTF-8 value '\xe6\x92\x92' to '\xe6\x92?' is what tagui.py produces when using utf-8. I still think it is possible that there is a problem with a substring or replace operation during the conversion.

It works on my Windows PC too:

>>> a = '撒'
>>> a.encode('utf-8')
b'\xe6\x92\x92'
>>> a.encode('gbk')
b'\xc8\xf6'
>>> a.encode('utf-8').decode('utf-8')
'撒'
>>> a.encode('gbk').decode('gbk')
'撒'

test1

device_encoding is gbk:

test2

I have tested on many Windows 10 machines and with several versions of Python (3.8.0, 3.8.1, 3.11.2). I believe this error, [RPA][ERROR] - 'utf-8' codec can't decode bytes in position 34-35: invalid continuation byte, has always existed. I think no one raised the issue because the error may not affect the final type() result.

PS: Mr. Kensoh, I have already sent you a friend request on Facebook. Can you accept it? Thanks.

@kensoh
Member

kensoh commented Apr 22, 2023

With your testing above and your original error messages below, it seems that the problem happens when the Python process tries to read the output from the subprocess running the TagUI engine.

[RPA][3] - listening for inputs
[RPA][ERROR] - 'utf-8' codec can't decode bytes in position 34-35: invalid continuation byte

I think the next step is to try the following to change the code page to 437, like mine, to see if the error still happens. I will try switching my own code page to see if I can replicate the problem. But I don't have a Windows laptop, so it is hard for me to debug in the spare moments I get when I'm out. So it will take longer to debug this.

https://discuss.python.org/t/choosing-correct-encoding-for-subprocess-popen/3388/3

@Vic-Lau
Author

Vic-Lau commented Apr 22, 2023

OK, I will continue testing, Thank you very much.

@lf1981

lf1981 commented Apr 22, 2023

Same here: whenever Chinese is involved, the program freezes.
image

@lf1981

lf1981 commented Apr 22, 2023

But if I use the clipboard method to paste Chinese, it works smoothly.
image

@lf1981

lf1981 commented Apr 22, 2023

But the google.txt code runs smoothly for me, which is great (Windows 10).
Thanks!

@Vic-Lau
Author

Vic-Lau commented Apr 22, 2023

Kensoh: The SikuliX engine used by rpa package does not support typing international characters.

Check this: #451 (comment)


@lf1981

lf1981 commented Apr 22, 2023

OKOK,Thx

@kensoh
Member

kensoh commented Apr 23, 2023

OK @Vic-Lau, this would be difficult for you to test, because the subprocess call may need to be modified too. But any clues will be helpful. When I get time and a Windows PC, I will test whether the hypothesized problem and solution hold.

@Vic-Lau
Author

Vic-Lau commented Apr 24, 2023

OK @kensoh Thank you very much.


@kensoh kensoh changed the title Chinese character set problems - pending replication and user to try sample Chinese character set problems - pending replication of problem and solution May 20, 2023
@kensoh kensoh changed the title Chinese character set problems - pending replication of problem and solution Chinese character set problems - pending problem replication and solution May 20, 2023
@kensoh
Member

kensoh commented May 21, 2023

@Vic-Lau I tried changing the code page to 936, but it still works:

C:\Users\kenso>chcp 936
Active code page: 936

C:\Users\kenso>python
Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep  5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.device_encoding(0)
'cp936'
>>> a = '撒'
>>> a.encode('utf-8')
b'\xe6\x92\x92'
>>> a.encode('gbk')
b'\xc8\xf6'
>>> a.encode('utf-8').decode('utf-8')
'撒'
>>> a.encode('gbk').decode('gbk')
'撒'
>>>

Next I will try running the rpa package code with this code page to see if there is any error.

@kensoh
Member

kensoh commented May 21, 2023

@Vic-Lau I can replicate the issue with code page 936:

C:\Users\kenso\Desktop>chcp
Active code page: 936

C:\Users\kenso\Desktop>python google.py
[RPA][1] - https://www.google.com
[RPA][1] - listening for inputs
[RPA][2] - exist_result = exist('//*[@name="q"]').toString()
[RPA][2] - listening for inputs
[RPA][3] - dump exist_result to rpa_python.txt
[RPA][3] - listening for inputs
[RPA][ERROR] - 'utf-8' codec can't decode bytes in position 34-35: invalid continuation byte
[RPA][4] - listening for inputs
[RPA][5] - exist_result = exist('//*[@name="q"]').toString()
[RPA][5] - listening for inputs
[RPA][6] - dump exist_result to rpa_python.txt
[RPA][6] - listening for inputs
[RPA][7] - type //*[@name="q"] as 中文
[RPA][7] - listening for inputs

@kensoh
Member

kensoh commented May 21, 2023

My best guess now is that code page 936 and the utf-8 encoding used by default in the rpa package aren't 100% compatible. I'll check further.

@kensoh kensoh changed the title Chinese character set problems - pending problem replication and solution Chinese character set problems - pending problem confirmation and solution May 21, 2023
@kensoh kensoh added bug and removed query labels May 21, 2023
@kensoh
Member

kensoh commented May 21, 2023

@Vic-Lau, can you try running the following from the command prompt, then run the python command on google.py to see if it works?

chcp 437

The above changes the code page to US and should work with UTF-8.

Another thing to try is to change the utf-8 header in the google.py file to see if it works with your default code page.

I'm exploring different solutions to see which is best. The other solution is to add an option for the rpa package to change its default encoding, but that will take more time to create.

@kensoh
Member

kensoh commented May 21, 2023

By header I mean the following, in your case of Chinese Windows OS:

# -*- coding: gbk -*-
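
As a rough illustration of what this header controls (and not a claim about why it resolves the error in this thread), Python reads the bytes of a source file using the encoding named in the declaration. The sketch below compiles GBK-encoded source from memory; the variable names and the '<gbk-source>' label are made up for the example.

```python
# Source bytes as an editor would save them when the file encoding is GBK
source_bytes = "# -*- coding: gbk -*-\ntext = '撒'\n".encode('gbk')

# compile() honours the coding declaration when it is given bytes, so the
# GBK bytes b'\xc8\xf6' are read back as the character '撒'
namespace = {}
exec(compile(source_bytes, '<gbk-source>', 'exec'), namespace)

print(ascii(namespace['text']))
```

The declaration has to match the encoding the file was actually saved in; a gbk header on a file saved as UTF-8 would misread the string literals.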

@kensoh
Member

kensoh commented May 21, 2023

Try the 2 possible solutions separately not at the same time.

Possible solution 1, chcp 437 from command prompt
Possible solution 2, change header in .py file
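
Before trying solution 1, it can help to confirm which code page the console is actually using. The sketch below is one way to check from Python, assuming a Windows console; `console_output_code_page` is a hypothetical helper name, while GetConsoleOutputCP is the Win32 call it wraps.

```python
import sys

def console_output_code_page():
    """Return the console output code page on Windows, else None."""
    if sys.platform != 'win32':
        return None
    import ctypes
    # GetConsoleOutputCP returns 0 when the process has no console attached
    return ctypes.windll.kernel32.GetConsoleOutputCP() or None

code_page = console_output_code_page()
if code_page == 936:
    print('Console uses code page 936; consider running chcp 437 first.')
else:
    print('Console code page:', code_page)
```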

@Vic-Lau
Author

Vic-Lau commented May 22, 2023

Hi @kensoh, thank you for your reply. I've tried both solutions, and both were successful!!! 👍 I think solution 1 is better, though. So I think this issue can be closed. Thanks again.

By the way, here is how to make chcp 437 the default, for others who have the same problem:

1. Press "win + r" and type "regedit".
2. Find "\HKEY_CURRENT_USER\Software\Microsoft\Command Processor".
3. Create a value named "autorun" with the data "chcp 437" and save. Enjoy!
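
The same registry change could be scripted with Python's standard winreg module. The sketch below is an assumption-laden illustration: `set_cmd_autorun` is a hypothetical helper, it stores the value as a REG_SZ string (the steps above do not name the value type), and it defaults to a dry run so nothing is written unless dry_run=False is passed on Windows.

```python
import sys

KEY_PATH = r'Software\Microsoft\Command Processor'

def set_cmd_autorun(command='chcp 437', dry_run=True):
    """Write the AutoRun value under HKEY_CURRENT_USER, or just describe it."""
    planned = ('HKEY_CURRENT_USER', KEY_PATH, 'AutoRun', command)
    if dry_run or sys.platform != 'win32':
        return planned  # nothing is written
    import winreg
    # Assumption: a plain string value (REG_SZ) is what cmd.exe expects here
    with winreg.CreateKey(winreg.HKEY_CURRENT_USER, KEY_PATH) as key:
        winreg.SetValueEx(key, 'AutoRun', 0, winreg.REG_SZ, command)
    return planned

print(set_cmd_autorun())  # dry run: shows what would be written
```

Note that an AutoRun command runs for every new Command Prompt of that user, so it affects other console programs too.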

@kensoh kensoh added query and removed bug labels May 27, 2023
@kensoh kensoh changed the title Chinese character set problems - pending problem confirmation and solution "invalid continuation byte" error - UTF-8 and OS default code page, see solution May 28, 2023
@kensoh kensoh changed the title "invalid continuation byte" error - UTF-8 and OS default code page, see solution "invalid continuation byte" error - UTF-8 and OS code page, see solution May 28, 2023
kensoh added a commit that referenced this issue May 28, 2023
@kensoh
Member

kensoh commented May 28, 2023

Thanks @Vic-Lau !! Updated readme with these tips:

image

@qeq66

qeq66 commented Nov 18, 2024

Mine doesn't work either. It cannot run at all.

# -*- coding: utf-8 -*-
import rpa as r
r.init()
r.debug(True)
r.url('https://www.google.com')
r.type('//*[@name="q"]', '撒')
r.type('//*[@name="q"]', '中文')
r.close()

Output:


[RPA][ERROR] - 'utf-8' codec can't decode bytes in position 70-71: invalid continuation byte
[RPA][ERROR] - use init() before using url()
[RPA][ERROR] - use init() before using type()
[RPA][ERROR] - use init() before using type()
[RPA][ERROR] - use init() before using close()

@kensoh
Member

kensoh commented Nov 18, 2024

See above, either do

# -*- coding: gbk -*- 

Or change code page with
chcp 437

@qeq66

qeq66 commented Nov 19, 2024

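Adding the encoding declaration at the top of my own code still doesn't work. Does chcp 437 need to be changed from the command line? I couldn't find a suitable way to change it.
For now I'm using the following approach: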

def _py23_decode(input_variable = None):
    """function for python 2 and 3 str-byte compatibility handling"""
    if input_variable is None: return None
    elif _python2_env(): return input_variable
    else: return input_variable.decode('utf-8', errors='ignore')

With errors='ignore' added, it runs for now. I'm not sure whether there are other problems.
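
To make the trade-off of this workaround concrete, the sketch below decodes a damaged line like the one reported earlier in the thread with three different error handlers. It only demonstrates standard bytes.decode() behaviour; the sample bytes are taken from the earlier debugging comment.

```python
damaged = b'type //*[@name="q"] as \xe6\x92?'

# Strict decoding (the default) raises, as in the original error
try:
    damaged.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='ignore' silently drops the bytes that cannot be decoded
ignored = damaged.decode('utf-8', errors='ignore')

# errors='replace' keeps a visible U+FFFD marker where data was lost
replaced = damaged.decode('utf-8', errors='replace')

print(strict_failed, ascii(ignored), ascii(replaced))
```

Either handler stops the exception, but the original character is not recovered: the decoded log text no longer contains '撒'.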

@kensoh
Member

kensoh commented Nov 19, 2024

Yes, chcp 437 is run from the command line.

Thanks for sharing your solution! Interesting, because so far other users with this issue could solve it either with chcp or by changing the Python file header. I will look out for more reports from other users.
