"invalid continuation byte" error - UTF-8 and OS code page, see solution #451
Hi @Vic-Lau, this package was not built or tested with international languages, but the following works for me on a Mac, and the Google search box is correctly populated with the 3 Chinese characters. Can you try it and tell me your OS? PS - I'm Chinese too! 🙌
# -*- coding: utf-8 -*-
import rpa as r
r.init()
r.debug(True)
r.url('https://www.google.com')
r.type('//*[@name="q"]', '撒')
r.type('//*[@name="q"]', '中文')
r.close() |
Hi @kensoh, thank you for your reply. My OS is Windows 10, and the tests were done on several different machines. First problem: some Chinese characters trigger this error, for example '撒':
r.type('//*[@name="q"]', '撒')
# Debug Info:
[RPA][1] - https://www.google.com
[RPA][1] - listening for inputs
[RPA][2] - exist_result = exist('//*[@name="q"]').toString()
[RPA][2] - listening for inputs
[RPA][3] - dump exist_result to rpa_python.txt
[RPA][3] - listening for inputs
[RPA][ERROR] - 'utf-8' codec can't decode bytes in position 34-35: invalid continuation byte
[RPA][4] - listening for inputs
Second problem: this is a bit different from what you understood. I captured the Chrome address bar showing "about:blank" as input.png to test visual_automation=True, and it has some problems. Here is the code:
# -*- coding: utf-8 -*-
import rpa as r
def test():
    r.init(visual_automation = True, chrome_browser=True)
    # r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '[clear]') # It's not working.
    # r.type(r'D:\work\tagui-python\tagui_scripts\input.png', 'aaa') # It's good.
    r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '中文') # It's not working.
    r.wait(10)
    r.close()
if __name__ == '__main__':
    test()
Mr. kensoh, please give me some advice. Best regards! PS - Haha, so you are Chinese too, then you will surely be able to read this. The TagUI open-source project is really great. I am trying out TagUI and plan to use it in depth to build an RPA platform designed for complex, repetitive business processes. I tried many similar open-source tools before I came across TagUI. Your technical professionalism, your patience in answering questions, and your passion for RPA are why I chose TagUI. Among Chinese programmers, the highest title a person can earn is "大神" (roughly, "great master"). So, 大神 kensoh, it is a pleasure to talk with you. Thanks again. |
Hi @Vic-Lau thanks for your detailed reply! I don't have a Windows computer of my own, but I just got hold of a Windows 11 PC. For the first problem you mentioned, it works on that PC both from Python interactive mode:
>>> import rpa as r
>>> r.init()
True
>>> r.debug(True)
True
>>> r.url('https://www.google.com')
[RPA][1] - https://www.google.com
[RPA][1] - listening for inputs
True
>>> r.type('//*[@name="q"]', '撒')
[RPA][2] - exist_result = exist('//*[@name="q"]').toString()
[RPA][2] - listening for inputs
[RPA][3] - dump exist_result to rpa_python.txt
[RPA][3] - listening for inputs
[RPA][4] - type //*[@name="q"] as 撒
[RPA][4] - listening for inputs
True
>>> r.type('//*[@name="q"]', '中文')
[RPA][5] - exist_result = exist('//*[@name="q"]').toString()
[RPA][5] - listening for inputs
[RPA][6] - dump exist_result to rpa_python.txt
[RPA][6] - listening for inputs
[RPA][7] - type //*[@name="q"] as 中文
[RPA][7] - listening for inputs
True
And it also works when running the Python script directly:
# -*- coding: utf-8 -*-
import rpa as r
r.init()
r.debug(True)
r.url('https://www.google.com')
r.type('//*[@name="q"]', '撒')
r.type('//*[@name="q"]', '中文')
r.close()
My Windows 11 is set to English, and I didn't do any special configuration or language setup to run the above successfully. Are you able to try on another computer (your own, or a friend's or colleague's) to see whether it works? I'll reply to the second problem you reported in the next message. |
For the second problem, below are my comments:
# [clear] does not work with visual automation; it only works with normal web automation, because a backend command is sent to Chrome to empty the text field
r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '[clear]') # It's not working.
# this is a working use case and yes it should work
r.type(r'D:\work\tagui-python\tagui_scripts\input.png', 'aaa') # It's good.
# the SikuliX engine used by rpa package does not support typing international characters
r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '中文') # It's not working.
# try using r.clipboard('中文') and then r.keyboard('[ctrl]v') after clicking on the text box visually
r.clipboard('中文')
r.click(r'D:\work\tagui-python\tagui_scripts\input.png')
r.keyboard('[ctrl]v') |
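(Typing my reply in Chinese on my mobile phone, because my PC can't type Chinese) Of course I can read it. I'm glad this open-source project has your approval, and thank you very, very much for your kind words! Do you know Yiwen? He is actively promoting TagUI, the original version of this project. You can contact him at the following link. Perhaps there will be a chance to work together. |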
Hi @kensoh, thank you for your reply. I've tried this on 4 different computers, running the script with different tools and with utf-8 set each time. The difference between us is that my Windows 10 is set to Chinese. Could this be the cause of the problem? |
Hi @kensoh, thank you for your reply. I see. I have tested it, and clipboard() works! By the way, if I want [clear] in visual automation, I have to replace it with |
Hello Kensoh, thank you for replying in Chinese. I don't know Yiwen yet, but if there really is a chance to work together, I will definitely contact him. Haha, so your Chinese is this good too, impressive! |
Hi @Vic-Lau,
|
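I'm glad that I know Chinese. I think it is a beautiful and magnificent language, even though it is harder to learn than other languages, haha. |
As for the title "大神" (great master), I am of course honoured, but I don't deserve it. The world is big, and there is always a higher mountain. I am just doing what I can to do something meaningful. |
Hi @kensoh, thank you for your reply. I'm sure I have set utf-8, but I still get the error ('中文' is no problem, but '撒' fails). I think it is a problem with the Chinese edition of Windows 10. I can ignore this error using this method:
But I don't think that's a good idea. Is it possible to solve it by changing the global character set to GBK? For example, by passing the character set as a method parameter.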
|
Mr. kensoh, what you do is not only meaningful but also excellent, so there is no need to be modest, haha. My respects.
|
Hi @Vic-Lau, I just got time to look at the GitHub issues. It's the school holidays now, so I'm busier with childcare 😅 Do you know which encoding your Windows OS uses? If you do, you can edit the tagui.py file and replace all occurrences of 'utf-8' with it. I will also ask my friend in China, who is familiar with the TagUI engine, to test whether he has the issue. There are a lot of users in China for this package, but no one has reported this utf-8 encoding issue before, so I need to understand more about the problem in order to find the best solution. |
Hi @kangyiwen, good day to you! I hope you have had time to start on Python. If you have, could I kindly ask whether you have any issues using r.type() in this package with Chinese characters? @Vic-Lau has problems with some Chinese characters in this issue, but I can't replicate the problem on my Mac or Windows PC. There are a lot of users in China for this package, but no one has reported this utf-8 encoding issue before, so I'm trying to understand more about the problem in order to find the best solution for it. |
Sorry, I haven't started using Python yet, so I can't offer any advice on this problem.
|
Hi @kensoh, best wishes to your family 😊! Thank you for your reply. I changed the encoding to GBK, but an error was reported: So I debugged this problem. First, I confirmed that the utf-8 value of '撒' is '\xe6\x92\x92':
if __name__ == '__main__':
    encoded = '撒'.encode('utf-8')
    print(encoded)  # the utf-8 value of '撒' is b'\xe6\x92\x92'
Then, when I watched the value of 'input_variable', I found this: the utf-8 value of '撒', '\xe6\x92\x92', had been changed to '\xe6\x92?'. The final '\x92' was dropped and a '?' was added in its place. So I executed this code and got the same error:
# -*- coding: utf-8 -*-
import tagui as tagui
if __name__ == '__main__':
    tagui._py23_decode(b'[RPA][4] - type //*[@name="q"] as \xe6\x92?\r\n')
So is it possible that there is a problem with a substring or replace operation during the conversion?
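The failure above can be reproduced without TagUI installed. The following is a minimal sketch that assumes only the byte string quoted in this comment and uses the standard bytes.decode() call. The attributes of the raised UnicodeDecodeError show which bytes are rejected and why.

```python
# The line as captured from the engine output (copied from the comment above).
line = b'[RPA][4] - type //*[@name="q"] as \xe6\x92?\r\n'

try:
    line.decode('utf-8')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# 0xE6 opens a 3-byte UTF-8 sequence, so the next two bytes must both be
# continuation bytes (0x80-0xBF). 0x92 qualifies, but '?' (0x3F) does not.
print(error.reason)                 # invalid continuation byte
print(error.start, error.end)       # 34 36, reported as "position 34-35" in the log
print(line[error.start:error.end])  # the two bytes that could not be completed
```

The reported position matches the original debug log, which suggests the damaged bytes are already present in the engine output before Python tries to decode them.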
|
Hi @Vic-Lau thank you very much! :) In your screenshot the decode() function is still using 'utf-8'. Can you try searching for 'utf-8' in tagui.py, replacing it with 'gbk', and reloading to see if that works? I think there are 7 occurrences that need to be replaced. The others are comments, I think. |
Hi @kensoh, the first screenshot is the result with everything replaced by GBK, and it does not work. The other screenshots are from debugging with utf-8. I thought the problem might not be related to the character set, so I changed back to utf-8 and started debugging. By the way, do you mind if I add you as a friend on Facebook / WeChat / email?
|
Hi @Vic-Lau, I've checked that the decode() error comes from the line below, when trying to read the live output of the TagUI engine. Line 130 in 2f0691e
From your finding above, the encode() changed the output from '\xe6\x92\x92' to '\xe6\x92?'. You've changed the encode/decode code in tagui.py to use gbk, but the error is still there. Can you try the code below in Python interactive mode? It works on my Windows PC.
>>> a = '撒'
>>> a.encode('utf-8')
b'\xe6\x92\x92'
>>> a.encode('gbk')
b'\xc8\xf6'
>>> a.encode('utf-8').decode('utf-8')
'撒'
>>> a.encode('gbk').decode('gbk')
'撒'
If it doesn't work on yours, there may be something specific to your Python environment and Windows. If it does work, then the only other possible cause I can see is that TagUI live mode, which runs through Python's subprocess, has a default character encoding that is somehow not compatible with the Python environment, causing encoding issues even after you switch to 'gbk' for encoding/decoding. Could you try the following code in interactive mode and share what you find? It shows the code page of the console, which is what subprocess uses.
>>> import os
>>> os.device_encoding(0)
'cp437'
>>> os.device_encoding(1)
'cp437'
It might be necessary to use an encoding compatible with the code page above in encode/decode, instead of gbk, or to change the encoding used by subprocess with a workaround like https://discuss.python.org/t/choosing-correct-encoding-for-subprocess-popen/3388/3 What is very puzzling is that this package has a large number of users in China, and I don't understand why the issue hasn't been reported by other users before. Was it because no one ran into characters that trigger it, because of differences in Windows environments, or because no one bothered to raise it? Knowing this would help to find a better solution than hardcoding locally. Sure, you can add me on Facebook! I don't have a WeChat account |
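The idea of controlling the encoding used with subprocess can be sketched as follows: capture the child's output as raw bytes and decode it explicitly, so the result does not depend on the console code page. This is only an illustration. The child process here is a stand-in Python one-liner (an assumption, not the TagUI engine), and the rpa package reads engine output in its own way.

```python
import subprocess
import sys

# Stand-in child process (an assumption for illustration, not the TagUI engine):
# it writes the UTF-8 bytes of '撒' straight to stdout, bypassing text encoding.
child_code = r"import sys; sys.stdout.buffer.write(b'type as \xe6\x92\x92\n')"

# Capture raw bytes so the parent, not the platform default, picks the codec.
result = subprocess.run(
    [sys.executable, '-c', child_code],
    stdout=subprocess.PIPE,
    check=True,
)

raw = result.stdout
# errors='replace' keeps the reader alive if the bytes arrive damaged.
text = raw.decode('utf-8', errors='replace')

print(raw)
# ascii() keeps this print safe on any console code page.
print(ascii(text))
```

Explicit decoding only helps when the bytes reach Python intact. If they are altered before Python reads them, as the '\xe6\x92?' finding suggests, the character cannot be recovered at this stage.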
Hi @kensoh, maybe my description was unclear. When I changed tagui.py to gbk, tagui.py stopped working. The code below works on my Windows PC too:
>>> a = '撒'
>>> a.encode('utf-8')
b'\xe6\x92\x92'
>>> a.encode('gbk')
b'\xc8\xf6'
>>> a.encode('utf-8').decode('utf-8')
'撒'
>>> a.encode('gbk').decode('gbk')
'撒'
device_encoding is gbk: I have tested on many Windows 10 machines and with several versions of Python (3.8.0, 3.8.1, 3.11.2). I believe this error PS: Mr. Kensoh, I have sent you a friend request on Facebook. Could you accept it? Thanks. |
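When comparing machines as in the exchange above, it can help to collect every encoding that Python sees in one place. The following is a small sketch using only standard library calls. The helper names device_encoding and encoding_report are hypothetical and are not part of the rpa package.

```python
import locale
import os
import sys


def device_encoding(fd):
    """Return the terminal encoding for fd, or None if fd is not a terminal."""
    try:
        return os.device_encoding(fd)
    except OSError:
        # The descriptor may be closed, for example under some service runners.
        return None


def encoding_report():
    """Collect the encodings this Python process sees, for a bug report."""
    return {
        'stdin device': device_encoding(0),
        'stdout device': device_encoding(1),
        'locale preferred': locale.getpreferredencoding(False),
        'sys.stdout': getattr(sys.stdout, 'encoding', None),
        'filesystem': sys.getfilesystemencoding(),
    }


for name, value in encoding_report().items():
    print(f'{name}: {value}')
```

The device entries are None when output is redirected to a file or pipe, so the report is most useful when run from the same command prompt that launches the script.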
With your testing above and your original error message below, it seems that the problem happens when the Python process tries to read the output from the subprocess running the TagUI engine.
[RPA][3] - listening for inputs
I think the next step is to try the following to change the code page to 437, as on my PC, to see if the error still happens. I will try switching mine to see if I can replicate the problem. But I don't have a Windows laptop, so it is hard for me to debug in the spare moments I get when I'm out, and it will take longer to debug this. https://discuss.python.org/t/choosing-correct-encoding-for-subprocess-popen/3388/3 |
OK, I will continue testing, Thank you very much.
|
But I can use the code google.txt to pass smoothly, it's great, (windows 10) |
Kensoh: The SikuliX engine used by rpa package does not support typing international characters. Check this: #451 (comment)
|
OKOK,Thx |
Ok @Vic-Lau, this would be difficult for you to test because the subprocess call may need to be modified too. But any clues will be helpful. When I get time and a Windows PC, I will test whether the hypothesis and the proposed fix hold up. |
@Vic-Lau I tried changing the code page to 936, but it still works:
Will next try to run the rpa package code with this code page to see if there is any error. |
@Vic-Lau I can replicate the issue with code page 936:
|
Best guess now is code page 936 and the utf-8 encoding used by default in rpa package isn't 100% compatible. To check more. |
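One way the two encodings could clash is sketched below. This is a hypothesis for illustration only, not something confirmed in this thread: it assumes that somewhere in the pipeline the UTF-8 bytes are read as code page 936 (GBK), and that an unpaired byte is replaced with '?' before the text is written back out. Under those assumptions, plain Python codecs reproduce the damaged bytes reported earlier.

```python
utf8_bytes = '撒'.encode('utf-8')  # three bytes: 0xE6 0x92 0x92

# Assumption: these bytes get interpreted as code page 936 (GBK). GBK is a
# double-byte encoding, so 0xE6 0x92 pair up into one unrelated character
# and the last 0x92 is left without a partner.
misread = utf8_bytes.decode('gbk', errors='replace')

# Assumption: the unpaired byte is substituted with '?' before the text is
# written back out in the same code page.
rewritten = misread.replace('\ufffd', '?').encode('gbk')

print(rewritten)

# The rewritten bytes are no longer valid UTF-8.
try:
    rewritten.decode('utf-8')
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason
print(reason)
```

If this is what happens, only strings whose UTF-8 form leaves an unpaired byte under GBK would be affected, which might explain why some characters fail while others pass.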
@Vic-Lau, can you try running the following from the command prompt, then run the python command on google.py to see if it works?
chcp 437
The above changes the code page to US, which should work with UTF-8. Another thing to try is changing the utf-8 header in the google.py file to see if it works in your default code page. I'm exploring different solutions to see which is best. The other solution is an option for the rpa package to change the default encoding, but that will take more time to create. |
By header I mean the following, in your case of Chinese Windows OS:
|
Try the 2 possible solutions separately, not at the same time. Possible solution 1, chcp 437 from command prompt |
Hi @kensoh, thank you for your reply. I've tried both solutions, and both were successful!!! 👍 I think solution 1 is better, so I think this issue can be closed. Thanks again. By the way, here is a way to make chcp 437 the default, for others who have the same problem:
|
Thanks @Vic-Lau !! Updated readme with these tips: |
Mine doesn't work either. It cannot run at all.
It returns
|
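See above: either add the header
# -*- coding: gbk -*-
or change the code page with |
Adding the encoding declaration at the top of my own code does not work either. Does chcp 437 need to be changed on the command line? I have not found a suitable way to change it.
After adding errors='ignore' it runs for now. I am not sure whether there are other problems. |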
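For anyone weighing the errors='ignore' workaround, here is a short sketch of what the standard decode error handlers do with the damaged bytes reported earlier in this thread. The sample byte string is taken from that report, and where the handler is applied inside tagui.py is not shown here.

```python
# Damaged engine output, as reported earlier in this thread.
damaged = b'type as \xe6\x92?'

results = {}
for handler in ('ignore', 'replace', 'backslashreplace'):
    results[handler] = damaged.decode('utf-8', errors=handler)
    # ascii() keeps the print safe on any console code page.
    print(handler, ascii(results[handler]))
```

With 'ignore' the undecodable bytes are dropped silently, so the script keeps running but the original character is lost rather than recovered. 'replace' and 'backslashreplace' also keep the script running while leaving a visible trace of the damage.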
Yes, chcp 437 is run from the command line. Thanks for sharing your solution! Interesting, because so far other users with this issue could solve it with either chcp or by changing the Python file header. I will look out for more reports from other users. |
Hello @kensoh, I am Chinese. I think there is a problem with how TagUI for Python handles Chinese characters. For example:
r.type('//*[@name="q"]', '撒') # typing into the Google search input; this causes 'invalid continuation byte'.
and
r.type(r'D:\input.png', '中文') # typing into Chrome using an input image; nothing happens and the script hangs.
Mr. kensoh, can you give me some advice? I really need your help! Thank you so much!