"invalid continuation byte" error - UTF-8 and OS code page, see solution #451

Vic-Lau · 2023-03-16T02:37:10Z

Hello @kensoh, I'm Chinese, and I think there is a problem with how TagUI for Python handles Chinese characters. For example:

r.type('//*[@name="q"]', '撒') # typing into the Google search box; this causes an 'invalid continuation byte' error

and

r.type(r'D:\input.png', '中文') # typing into a Chrome input box identified by input.png; nothing happens and the script hangs

Mr. kensoh, can you give me some advice? I really need your help! Thank you so much!

kensoh · 2023-03-19T16:18:30Z

Hi @Vic-Lau, this package was not built or tested for international languages, but the following works for me on a Mac.

The Google search box is correctly populated with the 3 Chinese characters. Can you try it and tell me which OS you're using?

PS - I'm Chinese too! 🙌

# -*- coding: utf-8 -*-
import rpa as r
r.init()
r.debug(True)
r.url('https://www.google.com')
r.type('//*[@name="q"]', '撒')
r.type('//*[@name="q"]', '中文')
r.close()

Vic-Lau · 2023-03-20T03:17:20Z

Hi @kensoh, thank you for your reply. My OS is Windows 10, and I ran all the tests on several different machines.

First problem: some Chinese characters trigger this error, for example '撒':

r.type('//*[@name="q"]', '撒')

# Debug Info:
[RPA][1] - https://www.google.com
[RPA][1] - listening for inputs
[RPA][2] - exist_result = exist('//*[@name="q"]').toString()
[RPA][2] - listening for inputs
[RPA][3] - dump exist_result to rpa_python.txt
[RPA][3] - listening for inputs
[RPA][ERROR] - 'utf-8' codec can't decode bytes in position 34-35: invalid continuation byte
[RPA][4] - listening for inputs

Second problem: it's a bit different from what you understood. I took a screenshot of Chrome's address bar showing "about:blank", saved it as input.png, and used it to test visual_automation=True. It has some problems; here is the code:

# -*- coding: utf-8 -*-

import rpa as r


def test():
    r.init(visual_automation = True, chrome_browser=True)
    # r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '[clear]') # It's not working.
    # r.type(r'D:\work\tagui-python\tagui_scripts\input.png', 'aaa') # It's good.
    r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '中文') # It's not working.
    r.wait(10)
    r.close()
    
if __name__ == '__main__':
    test()

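Mr. kensoh, please give me some advice. Best regards!

PS - Haha, so you're Chinese too! Then you can definitely read this. TagUI is a truly great open-source project. I'm trying out TagUI and plan to use it extensively to build an RPA platform designed for complex, repetitive business processes. I tried many similar open-source tools before I came across TagUI. Your technical expertise, your patience in answering questions and your passion for RPA are what made me choose TagUI. Among Chinese programmers, the highest title of accomplishment is "大神" (literally "great god", i.e. a master). So, 大神 kensoh, I'm very glad to be able to talk with you. Thanks again.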

kensoh · 2023-03-29T23:59:13Z

Hi @Vic-Lau thanks for your detailed reply!

I don't have a Windows computer, but I just got hold of a Windows 11 PC.

For the first problem you mentioned, it works on my PC both in Python interactive mode:

>>> import rpa as r
>>> r.init()
True
>>> r.debug(True)
True
>>> r.url('https://www.google.com')
[RPA][1] - https://www.google.com
[RPA][1] - listening for inputs
True
>>> r.type('//*[@name="q"]', '撒')
[RPA][2] - exist_result = exist('//*[@name="q"]').toString()
[RPA][2] - listening for inputs
[RPA][3] - dump exist_result to rpa_python.txt
[RPA][3] - listening for inputs
[RPA][4] - type //*[@name="q"] as 撒
[RPA][4] - listening for inputs
True
>>> r.type('//*[@name="q"]', '中文')
[RPA][5] - exist_result = exist('//*[@name="q"]').toString()
[RPA][5] - listening for inputs
[RPA][6] - dump exist_result to rpa_python.txt
[RPA][6] - listening for inputs
[RPA][7] - type //*[@name="q"] as 中文
[RPA][7] - listening for inputs
True

And it also works from running the Python script directly:

# -*- coding: utf-8 -*-
import rpa as r
r.init()
r.debug(True)
r.url('https://www.google.com')
r.type('//*[@name="q"]', '撒')
r.type('//*[@name="q"]', '中文')
r.close()

My Windows 11 is in English, and I didn't do any special configuration or language setup to get the above to run.

Could you try on another computer (your personal computer or a friend's or colleague's) to see whether it works?

I'll reply to your second problem in the next message.

kensoh · 2023-03-30T00:17:18Z

For the second problem, below are my comments:

    # [clear] does not work with visual automation. It only works with normal web automation, because a backend command is sent to Chrome to empty the text field.
    r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '[clear]') # It's not working.

    # This is a valid use case, and yes, it should work.
    r.type(r'D:\work\tagui-python\tagui_scripts\input.png', 'aaa') # It's good.

    # The SikuliX engine used by the rpa package does not support typing international characters.
    r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '中文') # It's not working.

    # Workaround: put the text on the clipboard, click the text box visually, then paste with [ctrl]v.
    r.clipboard('中文')
    r.click(r'D:\work\tagui-python\tagui_scripts\input.png')
    r.keyboard('[ctrl]v')
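The clipboard workaround above can be wrapped in a small reusable helper. This is a minimal sketch, not part of the rpa package: `paste_text` is a hypothetical name, and it takes the imported `rpa` module as an argument. It only calls `clipboard()`, `click()` and `keyboard()` in the same order as the snippet above; choosing `[cmd]v` on macOS is an assumption about the right paste shortcut there.

```python
import sys


def paste_text(rpa, target, text):
    """Hypothetical helper: enter text that SikuliX cannot type, via the clipboard.

    rpa    -- the imported rpa module (import rpa as r)
    target -- image path of the text box to click, e.g. input.png
    text   -- the text to enter, e.g. '中文'
    """
    # Assumption: macOS pastes with Cmd+V, Windows/Linux with Ctrl+V.
    paste_keys = '[cmd]v' if sys.platform == 'darwin' else '[ctrl]v'
    rpa.clipboard(text)              # put the text on the system clipboard
    rpa.click(target)                # focus the text box visually
    return rpa.keyboard(paste_keys)  # paste into the focused box
```

After r.init(visual_automation=True), usage would be paste_text(r, r'D:\work\tagui-python\tagui_scripts\input.png', '中文').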

kensoh · 2023-03-30T00:29:06Z

(Typing Chinese reply on my mobile phone, because my PC can't type in Chinese)

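Of course I can read it! I'm very glad this open-source project has earned your approval, and thank you very, very much for your kind words!

Do you know 轶文 (Yiwen)? He is actively promoting TagUI, the original version of this project. You can contact him via the website below. Maybe there is a chance for you two to work together.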

http://www.tagui.com.cn/

Vic-Lau · 2023-03-30T00:46:51Z

Hi @kensoh, thank you for your reply. I've tried this on 4 different computers, running the script with different tools and with utf-8 set. The difference between us is that my Windows 10 is in Chinese. Could that be the cause of this problem?

Vic-Lau · 2023-03-30T01:07:13Z

Hi @kensoh, thank you for your reply. I see! I've tested it, and clipboard() works! By the way, if I want to do [clear] with visual automation, I have to replace it with [ctrl]a + [backspace], right? So I should think more about keyboard operations in the future.

Vic-Lau · 2023-03-30T01:11:57Z

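Hi 大神 Kensoh, thank you for replying in Chinese. I don't know 轶文 yet, but if there really is a chance to collaborate, I will definitely contact him. Haha, so your Chinese is this good too, impressive!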

kensoh · 2023-04-01T13:05:13Z

Hi @Vic-Lau,

Can you right-click and download google.txt below to your computer, rename it to google.py (I can't attach .py files here), and run python google.py? From the data points so far, my best guess is an encoding issue. Perhaps on Chinese Windows computers the default encoding is not UTF-8 but some other encoding, so when the Python rpa package processes the text using the standard UTF-8 encoding, you run into the errors you reported. The file below uses UTF-8 encoding. If it works, try choosing UTF-8 as the encoding when saving your own Python scripts and see if that helps.

google.txt
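To illustrate the guess above: the same character has different bytes in UTF-8 and in GBK (the encoding behind code page 936 on Chinese Windows), so bytes produced in one encoding fail when decoded as the other. A minimal stdlib-only sketch; the byte values match the interactive session later in this thread.

```python
text = '撒'

utf8_bytes = text.encode('utf-8')  # b'\xe6\x92\x92'
gbk_bytes = text.encode('gbk')     # b'\xc8\xf6'

# Decoding with the matching codec round-trips correctly.
assert utf8_bytes.decode('utf-8') == text
assert gbk_bytes.decode('gbk') == text

# Decoding GBK bytes as UTF-8 fails: 0xC8 starts a 2-byte UTF-8 sequence,
# but 0xF6 is not a continuation byte (those are 0x80-0xBF).
try:
    gbk_bytes.decode('utf-8')
except UnicodeDecodeError as err:
    print(err.reason)  # invalid continuation byte
```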

Yes, you can use r.keyboard('[ctrl]a') followed by r.keyboard('[delete]') or r.keyboard('[backspace]'). In the future, if there is strong user demand, I can look into having the package automatically change r.keyboard('[clear]') to this workaround. It isn't easy to implement accurately, because it needs to work on Windows, Mac and Linux, and also when [clear] appears as part of a longer string rather than as the whole parameter. So I'm not adding this auto-conversion for now.
I'll type the last part of my reply from my phone :)
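The auto-conversion described above could look roughly like the following. It is a hypothetical sketch, not part of the rpa package: `expand_clear` splits a keyboard() parameter on [clear] and emits the [ctrl]a + [backspace] steps for each occurrence, so [clear] also works in the middle of a string. Using [cmd]a on macOS is an assumption about the right select-all shortcut there.

```python
import sys


def expand_clear(keys, platform=None):
    """Hypothetical: split a keyboard() parameter on '[clear]' into steps.

    Each '[clear]' becomes select-all followed by backspace, matching the
    manual workaround r.keyboard('[ctrl]a') then r.keyboard('[backspace]').
    """
    platform = platform or sys.platform
    # Assumption: macOS uses Cmd+A for select-all.
    select_all = '[cmd]a' if platform == 'darwin' else '[ctrl]a'
    steps = []
    for i, chunk in enumerate(keys.split('[clear]')):
        if i > 0:  # a '[clear]' marker preceded this chunk
            steps += [select_all, '[backspace]']
        if chunk:
            steps.append(chunk)
    return steps
```

Each returned step would then be passed to r.keyboard() in order, for example: for step in expand_clear('[clear]hello'): r.keyboard(step).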

kensoh · 2023-04-01T13:14:48Z

I'm glad I can speak Chinese. I think it's a beautiful and magnificent language, though harder to learn than other languages, haha.

kensoh · 2023-04-01T13:18:25Z

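As for being called 大神, I'm of course honoured, but I don't deserve it. The world is vast, and there is always a higher mountain beyond this one. I'm just doing meaningful things as best I can.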

Vic-Lau · 2023-04-03T01:30:18Z

Hi @kensoh, thank you for your reply. I'm sure I have set utf-8, but I still get the error:

('中文' is fine, but '撒' fails ...)

I think it's a problem with Chinese Windows 10. I can suppress the error like this:

decode('utf-8', 'ignore') # ignore decoding errors

But I don't think that's a good idea. Could it be solved by switching the global character set to GBK, for example by passing the character set as a method parameter?

Vic-Lau · 2023-04-03T01:42:08Z

Mr. kensoh, what you do is not only meaningful but also outstanding, so don't be so modest, haha. My respects.

kensoh · 2023-04-09T13:28:05Z

Hi @Vic-Lau, I just got time to look at the GitHub issues. It's the kids' school holidays now, so I'm busier with childcare 😅

Do you know which encoding your Windows OS uses? If you do, you can edit the tagui.py file and replace every occurrence of utf-8 with that encoding. You can find the location of tagui.py with import rpa as r; print(r.__file__). After modifying the file, start a new Python session to test.

I will also ask a friend in China who is familiar with the TagUI engine to test whether he has the issue. There are a lot of users of this package in China, but no one has reported this utf-8 encoding issue before, so I need to understand more about the problem to find the best solution.
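As a starting point for the first question, Python's standard library reports several related encodings. A minimal sketch (stdlib only; `describe_encodings` is a hypothetical name). Which of these values matters for the rpa package is an assumption to check, not something established here; on Chinese Windows the locale-preferred encoding is typically cp936 (GBK), while the console values follow the current chcp code page.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the encodings Python sees on this machine (stdlib only)."""
    return {
        # Default for text files and text-mode pipes (outside Python's UTF-8 mode).
        'locale_preferred': locale.getpreferredencoding(False),
        # Console code page of stdin/stdout; None if not attached to a terminal.
        'stdin_device': os.device_encoding(0),
        'stdout_device': os.device_encoding(1),
        # Encoding used for file names.
        'filesystem': sys.getfilesystemencoding(),
    }


if __name__ == '__main__':
    for name, value in describe_encodings().items():
        print(f'{name}: {value}')
```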

kensoh · 2023-04-09T13:30:05Z

Hi @kangyiwen, good day to you! I hope you've had time to start using Python. If you have, could I kindly ask whether you've had issues using r.type() in this package with Chinese characters? @Vic-Lau in this issue has problems with some Chinese characters, but I can't replicate the problem on my Mac or Windows PC.

There are a lot of users of this package in China, but no one has reported this utf-8 encoding issue before, so I'm trying to understand more about the problem to find the best solution for what is reported in this GitHub issue.

kangyiwen · 2023-04-09T14:14:59Z

Sorry, I haven't started using Python yet, so I can't give any advice on this problem.

Vic-Lau · 2023-04-10T09:02:29Z

Hi @kensoh, best wishes to your family😊! Thank you for your reply. I changed it to GBK, but an error was reported:

So I debugged the problem. First, I got the UTF-8 bytes of '撒', which are '\xe6\x92\x92':

if __name__ == '__main__':
    data = '撒'.encode('utf-8')
    print(data)  # b'\xe6\x92\x92'

Then, when I watched the 'input_variable' value, I found that the UTF-8 bytes '\xe6\x92\x92' had been changed to '\xe6\x92?': the final '\x92' is dropped and a '?' appears in its place.

So I ran this code and got the same error:

# -*- coding: utf-8 -*-
import tagui


if __name__ == '__main__':
    tagui._py23_decode(b'[RPA][4] - type //*[@name="q"] as \xe6\x92?\r\n')

So is it possible that a substring or replace step goes wrong during the conversion?
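The observation above can be reproduced with the standard library. Decoding the mangled line as UTF-8 gives exactly the "bytes in position 34-35: invalid continuation byte" error from the logs: 0xE6 0x92 is a valid start of a 3-byte sequence, but '?' (0x3F) cannot continue it. The second half is only a hypothesis about where the '?' comes from, not confirmed in this thread: if the UTF-8 bytes pass through a GBK (code page 936) decode and re-encode somewhere between the TagUI engine and Python, 0xE6 0x92 is read as one GBK character, the lone third byte 0x92 becomes an incomplete sequence, and it comes back out as '?'.

```python
# The line from the debugging above, after '撒' was mangled to b'\xe6\x92?'.
line = b'[RPA][4] - type //*[@name="q"] as \xe6\x92?\r\n'

try:
    line.decode('utf-8')
except UnicodeDecodeError as err:
    # 0xE6 0x92 starts a 3-byte UTF-8 sequence, but '?' (0x3F) cannot continue it.
    print(err)  # 'utf-8' codec can't decode bytes in position 34-35: invalid continuation byte

# Hypothesis only: UTF-8 bytes misread as GBK, then written back as GBK.
original = '撒'.encode('utf-8')                     # b'\xe6\x92\x92'
misread = original.decode('gbk', errors='replace')  # 0xE6 0x92 -> one GBK character, lone 0x92 -> U+FFFD
mangled = misread.encode('gbk', errors='replace')   # U+FFFD has no GBK form -> b'?'
print(mangled)  # b'\xe6\x92?'
```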

kensoh · 2023-04-10T23:18:07Z

Hi @Vic-Lau, thank you very much! :)

In your screenshot, the decode() function is still using 'utf-8'. Can you try searching for 'utf-8' in tagui.py, replacing it with 'gbk', and reloading to see if that works? I think there are 7 occurrences that need to be replaced; the others are comments, I think.

Vic-Lau · 2023-04-11T03:27:50Z

Hi @kensoh, the first screenshot is the result after replacing everything with GBK, and it doesn't work. The other screenshots are from debugging with utf-8. I thought the problem might not be related to the character set, so I changed it back to utf-8 and started debugging.

By the way, do you mind if I add you as a friend on Facebook / WeChat / email?

kensoh · 2023-04-16T23:14:16Z

Hi @Vic-Lau,

I've checked that the decode() error comes from the line below, which reads the live output of the TagUI engine.

RPA-Python/tagui.py, line 130 in 2f0691e:

global _process; return _py23_decode(_process.stdout.readline())

From your finding above, the UTF-8 value '\xe6\x92\x92' had already been changed to '\xe6\x92?' in the output that was read.

You've changed the encode/decode code in tagui.py to use gbk, but the error is still there.

Can you try the code below in Python interactive mode? It works on my Windows PC.

>>> a = '撒'
>>> a.encode('utf-8')
b'\xe6\x92\x92'
>>> a.encode('gbk')
b'\xc8\xf6'
>>> a.encode('utf-8').decode('utf-8')
'撒'
>>> a.encode('gbk').decode('gbk')
'撒'

If it doesn't work on yours, there may be something specific to your Python environment and Windows.

If it does work, then the only other possible cause is TagUI live mode, which runs through Python's subprocess: its default character encoding may somehow be incompatible with the Python environment, causing encoding issues even though you have already switched to 'gbk' encoding/decoding.

Can you try the following code in interactive mode and share what you find? It shows the console code page, which the subprocess inherits.

>>> import os
>>> os.device_encoding(0)
'cp437'
>>> os.device_encoding(1)
'cp437'

It may be necessary to use an encoding compatible with the code page above in encode/decode, instead of gbk, or to change the encoding used by subprocess with a workaround like https://discuss.python.org/t/choosing-correct-encoding-for-subprocess-popen/3388/3
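Along the lines of that link's topic (choosing the encoding for subprocess.Popen), the idea is to tell subprocess which encoding the child process writes, rather than decoding with a fixed 'utf-8'. A minimal sketch under stated assumptions: the "child" here is just another Python interpreter writing GBK bytes, standing in for the TagUI engine, and `read_lines` is a hypothetical name. In the real package the encoding would have to match whatever code page the engine actually writes with (for example cp936), which this thread has not pinned down.

```python
import subprocess
import sys

# Stand-in for the TagUI engine: a child process that writes GBK-encoded output.
# ('\u6492' is 撒; the escape keeps the command line itself ASCII-only.)
CHILD = "import sys; sys.stdout.buffer.write('type as \\u6492\\n'.encode('gbk'))"


def read_lines(cmd, encoding):
    """Run cmd and decode its stdout line by line with the given encoding."""
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        encoding=encoding,  # text mode: bytes are decoded with this codec
        errors='replace',   # show U+FFFD for bad bytes instead of raising
    ) as proc:
        return [line.rstrip('\n') for line in proc.stdout]


if __name__ == '__main__':
    cmd = [sys.executable, '-c', CHILD]
    # Decoding with the encoding the child really used gives the right text.
    assert read_lines(cmd, 'gbk') == ['type as \u6492']
    # Decoding the same bytes as UTF-8 only yields replacement characters.
    assert '\ufffd' in read_lines(cmd, 'utf-8')[0]
```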

What is very puzzling, though, is that this package has a large number of users in China, and I don't understand why no one has reported this issue before. Was it because no one ran into the affected characters, because of differences between Windows environments, or because no one bothered to raise it? Knowing this would help find a better solution than a local hard-coded change.

Sure, you can add me on Facebook! I don't have a WeChat account.

Vic-Lau · 2023-04-17T09:16:49Z

Hi @kensoh, maybe my description was misleading. When I changed tagui.py to gbk, tagui.py didn't work at all. So the change from the UTF-8 value '\xe6\x92\x92' to '\xe6\x92?' is what tagui.py produces when it uses utf-8. I still think a substring or replace step may go wrong during the conversion.

It works on my Windows PC too:

>>> a = '撒'
>>> a.encode('utf-8')
b'\xe6\x92\x92'
>>> a.encode('gbk')
b'\xc8\xf6'
>>> a.encode('utf-8').decode('utf-8')
'撒'
>>> a.encode('gbk').decode('gbk')
'撒'

device_encoding returns gbk:

I have tested on many Windows 10 machines with several Python versions (3.8.0, 3.8.1, 3.11.2). I believe the error [RPA][ERROR] - 'utf-8' codec can't decode bytes in position 34-35: invalid continuation byte has always been there. Maybe no one raised the issue because the error doesn't affect the final type() result.

PS: Mr. Kensoh, I've sent you a Facebook friend request. Could you accept it? Thanks.

kensoh · 2023-04-22T14:17:20Z

With your testing above and your original error messages below, it seems the problem happens when the Python process reads the output of the subprocess running the TagUI engine.

[RPA][3] - listening for inputs
[RPA][ERROR] - 'utf-8' codec can't decode bytes in position 34-35: invalid continuation byte

I think the next step is to try the approach below and change the code page to 437 (what mine uses) to see if the problem still happens. I'll also try switching my own code page to see if I can replicate the problem. But I don't have a Windows laptop, so it's hard for me to debug when I have some spare time while I'm out, and this will take longer.

https://discuss.python.org/t/choosing-correct-encoding-for-subprocess-popen/3388/3

Vic-Lau · 2023-04-22T14:38:20Z

OK, I will continue testing, Thank you very much.

lf1981 · 2023-04-22T15:27:52Z

Same here: whenever Chinese text comes up, the program freezes.

lf1981 · 2023-04-22T15:50:47Z

But if I use the clipboard method to paste Chinese, it works fine.

lf1981 · 2023-04-22T15:56:33Z

And the code in google.txt runs fine for me too, great (Windows 10)!
Thanks!

Vic-Lau · 2023-04-22T15:57:25Z

kensoh: "The SikuliX engine used by the rpa package does not support typing international characters."

Check this: #451 (comment)

lf1981 · 2023-04-22T16:01:47Z

OK, OK, thanks!

kensoh · 2023-04-23T10:36:03Z

OK @Vic-Lau, this would be difficult for you to test, because the subprocess call may need to be modified too. But any clues will help. When I have time and a Windows PC, I'll test whether the hypothesized cause and solution hold.

Vic-Lau · 2023-04-24T06:34:29Z

OK @kensoh Thank you very much.

kensoh · 2023-05-21T11:44:09Z

@Vic-Lau I tried changing the code page to 936, but it still works:

C:\Users\kenso>chcp 936
Active code page: 936

C:\Users\kenso>python
Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep  5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.device_encoding(0)
'cp936'
>>> a = '撒'
>>> a.encode('utf-8')
b'\xe6\x92\x92'
>>> a.encode('gbk')
b'\xc8\xf6'
>>> a.encode('utf-8').decode('utf-8')
'撒'
>>> a.encode('gbk').decode('gbk')
'撒'
>>>

Next I'll try running the rpa package code with this code page to see if there is any error.

kensoh · 2023-05-21T11:47:13Z

@Vic-Lau I can replicate the issue with code page 936:

C:\Users\kenso\Desktop>chcp
Active code page: 936

C:\Users\kenso\Desktop>python google.py
[RPA][1] - https://www.google.com
[RPA][1] - listening for inputs
[RPA][2] - exist_result = exist('//*[@name="q"]').toString()
[RPA][2] - listening for inputs
[RPA][3] - dump exist_result to rpa_python.txt
[RPA][3] - listening for inputs
[RPA][ERROR] - 'utf-8' codec can't decode bytes in position 34-35: invalid continuation byte
[RPA][4] - listening for inputs
[RPA][5] - exist_result = exist('//*[@name="q"]').toString()
[RPA][5] - listening for inputs
[RPA][6] - dump exist_result to rpa_python.txt
[RPA][6] - listening for inputs
[RPA][7] - type //*[@name="q"] as 中文
[RPA][7] - listening for inputs

kensoh · 2023-05-21T11:48:31Z

My best guess now is that code page 936 and the utf-8 encoding used by default in the rpa package aren't 100% compatible. I'll check further.

kensoh · 2023-05-21T12:37:03Z

@Vic-Lau, can you run the following from the command prompt, and then run python google.py to see if it works?

chcp 437

This changes the code page to 437 (US), which should work with UTF-8.

Another thing to try is changing the utf-8 header in google.py, to see if it then works with your default code page.

I'm exploring different solutions to see which is best. Another solution is an option in the rpa package to change the default encoding, but that will take more time to build.

kensoh · 2023-05-21T12:37:47Z

By header I mean the following, in your case of Chinese Windows OS:

# -*- coding: gbk -*-

kensoh · 2023-05-21T12:39:13Z

Try the 2 possible solutions separately, not at the same time.

Possible solution 1: run chcp 437 from the command prompt
Possible solution 2: change the header in the .py file
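For solution 2, the "# -*- coding: ... -*-" header (PEP 263) tells Python how to decode the bytes of the .py file itself, so it should match the encoding the file is actually saved in. A minimal stdlib sketch of how Python reads that header, using tokenize.detect_encoding; `source_encoding` is a hypothetical name. As far as Python itself is concerned, the header only affects how the source file is decoded; exactly why that helps here is not established in this thread.

```python
import io
import tokenize


def source_encoding(source_bytes):
    """Return the encoding Python would use to decode these .py file bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


gbk_source = "# -*- coding: gbk -*-\nprint('撒')\n".encode('gbk')
utf8_source = "# -*- coding: utf-8 -*-\nprint('撒')\n".encode('utf-8')
no_header = "print('撒')\n".encode('utf-8')

print(source_encoding(gbk_source))   # gbk
print(source_encoding(utf8_source))  # utf-8
print(source_encoding(no_header))    # utf-8, the Python 3 default
```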

Vic-Lau · 2023-05-22T01:41:21Z

Hi @kensoh, thank you for your reply. I've tried both solutions, and both work!!! 👍 I think solution 1 is better, so I think this issue can be closed. Thanks again.

By the way, here is how to make chcp 437 the default, for others who have the same problem:

1. Press "Win + R" and type "regedit".
2. Go to "HKEY_CURRENT_USER\Software\Microsoft\Command Processor".
3. Create a string value named "autorun" with the data "chcp 437", and save. Enjoy ~
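The same registry change can be scripted with the standard winreg module (Windows only). A minimal sketch; `set_cmd_autorun` is a hypothetical helper. Two caveats: AutoRun runs for every cmd.exe session, including ones other programs start (unless they pass /D), and writing the value replaces any existing AutoRun command. The "> nul" suffix only hides the "Active code page: 437" message.

```python
import sys

# Per-user registry key that cmd.exe reads its AutoRun command from.
COMMAND_PROCESSOR_KEY = r'Software\Microsoft\Command Processor'


def set_cmd_autorun(command='chcp 437 > nul'):
    r"""Hypothetical helper: set HKEY_CURRENT_USER\Software\Microsoft\Command Processor\AutoRun.

    Overwrites any existing AutoRun value. Windows only.
    """
    if sys.platform != 'win32':
        raise OSError('cmd.exe AutoRun is a Windows-only setting')
    import winreg  # standard library, but only available on Windows

    with winreg.CreateKeyEx(winreg.HKEY_CURRENT_USER, COMMAND_PROCESSOR_KEY,
                            0, winreg.KEY_WRITE) as key:
        winreg.SetValueEx(key, 'AutoRun', 0, winreg.REG_SZ, command)
```

On Windows, call set_cmd_autorun() once, then open a new Command Prompt and run chcp; it should report 437.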

kensoh · 2023-05-28T01:57:29Z

Thanks @Vic-Lau !! Updated readme with these tips:

qeq66 · 2024-11-18T08:36:47Z

Mine doesn't work either. It doesn't run at all.

# -*- coding: utf-8 -*-
import rpa as r
r.init()
r.debug(True)
r.url('https://www.google.com')
r.type('//*[@name="q"]', '撒')
r.type('//*[@name="q"]', '中文')
r.close()

Output:


[RPA][ERROR] - 'utf-8' codec can't decode bytes in position 70-71: invalid continuation byte
[RPA][ERROR] - use init() before using url()
[RPA][ERROR] - use init() before using type()
[RPA][ERROR] - use init() before using type()
[RPA][ERROR] - use init() before using close()

kensoh · 2024-11-18T18:43:45Z

See above, either use

# -*- coding: gbk -*-

or change the code page with

chcp 437

qeq66 · 2024-11-19T02:18:06Z

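Adding the encoding declaration at the top of my own code doesn't make it run either. Does chcp 437 have to be changed from the command line? I couldn't find a suitable way to change it.
For now, I'm using the following approach in tagui.py: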

def _py23_decode(input_variable = None):
    """function for python 2 and 3 str-byte compatibility handling"""
    if input_variable is None: return None
    elif _python2_env(): return input_variable
    else: return input_variable.decode('utf-8', errors='ignore')

Adding errors='ignore' makes it run for now. I'm not sure whether it causes other problems.
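errors='ignore' does let the script run, but it silently drops any bytes it cannot decode, so a character like '撒' can vanish from the output. An alternative sketch, not what tagui.py does: try strict UTF-8 first, then fall back to a second encoding, assumed here to be the OS's preferred encoding (cp936/GBK on Chinese Windows), and mark anything still undecodable with U+FFFD. `decode_with_fallback` is a hypothetical name. It cannot repair bytes that were already mangled before reaching Python (like b'\xe6\x92?' earlier in this issue), so it treats the symptom rather than the code page mismatch that the chcp and header fixes address.

```python
import locale


def decode_with_fallback(data, fallback=None):
    """Hypothetical alternative to errors='ignore' for decoding engine output.

    Tries strict UTF-8 first; if that fails, decodes with `fallback`
    (default: the OS's preferred encoding) and marks leftovers with U+FFFD.
    """
    if data is None:
        return None
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError:
        encoding = fallback or locale.getpreferredencoding(False)
        return data.decode(encoding, errors='replace')


# errors='ignore' silently drops the undecodable bytes: the character is lost.
assert b'as \xe6\x92?'.decode('utf-8', errors='ignore') == 'as ?'

# UTF-8 input is unchanged; GBK input ('撒' as GBK bytes) is recovered.
assert decode_with_fallback('撒'.encode('utf-8')) == '撒'
assert decode_with_fallback(b'\xc8\xf6', fallback='gbk') == '撒'
```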

kensoh · 2024-11-19T18:44:46Z

Yes, chcp 437 is run from the command line.

Thanks for sharing your solution! Interesting, because so far other users with this issue solved it either with chcp or by changing the Python file header. I'll look out for more reports from other users.

Vic-Lau changed the title ~~Chinese characters problems.~~ Chinese character set problems. Mar 17, 2023

kensoh changed the title ~~Chinese character set problems.~~ Chinese character set problems - pending replication and user to try sample Mar 19, 2023

kensoh added the query label Mar 19, 2023

kensoh changed the title ~~Chinese character set problems - pending replication and user to try sample~~ Chinese character set problems - pending replication of problem and solution May 20, 2023

kensoh changed the title ~~Chinese character set problems - pending replication of problem and solution~~ Chinese character set problems - pending problem replication and solution May 20, 2023

kensoh changed the title ~~Chinese character set problems - pending problem replication and solution~~ Chinese character set problems - pending problem confirmation and solution May 21, 2023

kensoh added bug and removed query labels May 21, 2023

kensoh added query and removed bug labels May 27, 2023

kensoh changed the title ~~Chinese character set problems - pending problem confirmation and solution~~ "invalid continuation byte" error - UTF-8 and OS default code page, see solution May 28, 2023

kensoh changed the title ~~"invalid continuation byte" error - UTF-8 and OS default code page, see solution~~ "invalid continuation byte" error - UTF-8 and OS code page, see solution May 28, 2023

kensoh added a commit that referenced this issue May 28, 2023

#451 - readme tip on non-UTF-8 OS

f8f266c

kensoh closed this as completed May 28, 2023

kensoh mentioned this issue Jul 4, 2023

frame() failed for iframe in different domain - pending replication #476

Closed