Description
Issue Details
The PowerShellForGitHub.log
is written with different character encoding depending upon whether PowerShell 5 or PowerShell 7 are used. PowerShell 5 encoding is UTF-16 LE
and PowerShell 7 is UTF-8
. This causes an issue if the PowerShellForGithub
module is used on both PowerShell 5 and PowerShell 7 on the same computer, and corrupted entries in the log will occur.
Steps to reproduce the issue
$PowerShellForGitHubLogPath = "$([Environment]::GetFolderPath('MyDocuments'))\PowerShellForGitHub.log"
Remove-Item -Path $PowerShellForGitHubLogPath
pwsh.exe -Command { Get-GitHubRepository -OwnerName 'Octocat' | Out-Null }
powershell.exe -Command { Get-GitHubRepository -OwnerName 'Octocat' | Out-Null }
Get-Content -Path $PowerShellForGitHubLogPath
Review
Verbose logs showing the problem
2020-06-14 12:18:45 : Simon : VERBOSE : [0.14.0] Executing: Get-GitHubRepository -OwnerName "Octocat"
2020-06-14 12:18:45 : Simon : VERBOSE : Restoring Access Token from file. This value can be cleared in the future by calling Clear-GitHubAuthentication.
2020-06-14 12:18:45 : Simon : VERBOSE : Getting repos for Octocat
2020-06-14 12:18:45 : Simon : VERBOSE : Accessing [Get] https://github.com/api/users/Octocat/repos? [Timeout = 0)]
2020-06-14 12:18:46 : Simon : VERBOSE : Telemetry has been disabled via configuration. Skipping reporting event [Get-GitHubRepository].
2 0 2 0 - 0 6 - 1 4 1 2 : 1 8 : 4 8 : S i m o n : V E R B O S E : [ 0 . 1 4 . 0 ] E x e c u t i n g : G e t - G i t H u b R e p o s i t o r y - O w n e r N a m e " O c t o c a t "
2 0 2 0 - 0 6 - 1 4 1 2 : 1 8 : 4 8 : S i m o n : V E R B O S E : R e s t o r i n g A c c e s s T o k e n f r o m f i l e . T h i s v a l u e c a n b e c l e a r e d i n t h e f u t u r e b y c a l l i n g C l e a r - G i t H u b A u
t h e n t i c a t i o n .
2 0 2 0 - 0 6 - 1 4 1 2 : 1 8 : 4 8 : S i m o n : V E R B O S E : G e t t i n g r e p o s f o r O c t o c a t
2 0 2 0 - 0 6 - 1 4 1 2 : 1 8 : 4 8 : S i m o n : V E R B O S E : A c c e s s i n g [ G e t ] h t t p s : / / a p i . g i t h u b . c o m / u s e r s / O c t o c a t / r e p o s ? [ T i m e o u t = 0 ) ]
2 0 2 0 - 0 6 - 1 4 1 2 : 1 8 : 4 9 : S i m o n : V E R B O S E : T e l e m e t r y h a s b e e n d i s a b l e d v i a c o n f i g u r a t i o n . S k i p p i n g r e p o r t i n g e v e n t [ G e t - G i t H u b R e p o s i t o r y ] .
Suggested solution to the issue
Add -Encoding ascii
to the Out-File
cmdlet call in the Write-Log
function of the Helpers.ps1
module. This will force the log output to be UTF-8 without BOM on all PowerShell versions.
Note, on PowerShell 5 and below, any current log will have been written using UTF-16 LE, so this log would need to be dealt with when making this change (backed up, converted, deleted ?)
Requested Assignment
- I'm just reporting this problem, but don't want to fix it.
Operating System
OsName : Microsoft Windows 10 Pro
OsOperatingSystemSKU : 48
OsArchitecture : 64-bit
WindowsVersion : 1903
WindowsBuildLabEx : 18362.1.amd64fre.19h1_release.190318-1202
OsLanguage : en-GB
OsMuiLanguages : {en-GB, en-US}
PowerShell Version
PowerShell 5
Name Value
---- -----
PSVersion 5.1.18362.752
PSEdition Desktop
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0...}
BuildVersion 10.0.18362.752
CLRVersion 4.0.30319.42000
WSManStackVersion 3.0
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
PowerShell 7
Name Value
---- -----
PSVersion 7.0.1
PSEdition Core
GitCommitId 7.0.1
OS Microsoft Windows 10.0.18362
Platform Win32NT
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
WSManStackVersion 3.0
Module Version
Running: 0.14.0
Installed: 0.14.0
Activity
HowardWolosky commentedon Jun 15, 2020
Thanks for catching this!
I think I had noticed it in passing and was wondering what was going on here, but not enough to look into it any further.
Given that Unicode characters can be used with the GitHub API, it makes more sense to use
utf8
vsansi
, but otherwise I agree that it was an oversight that an explicit encoding format wasn't specified in that function.giuseppecampanelli commentedon Jun 16, 2020
Is this up for grabs?
HowardWolosky commentedon Jun 16, 2020
Go for it. To be clear, have it use
utf8
.HowardWolosky commentedon Jun 16, 2020
Actually, play with it a bit. I'm still seeing oddities using
utf8
in PS5. You might want to tryunicode
instead. Do some testing with the different encoding formats across both PS5 and PS7 to make sure that logging from commandline usage as well as from Pester all output the same without any unexpected characters.giuseppecampanelli commentedon Jun 16, 2020
Thanks and will do!
giuseppecampanelli commentedon Jun 18, 2020
@HowardWolosky I've tested both
Unicode
andUTF8
encodings and both seem to solve the issue. The issue was caused due to the fact that Out-File in PS5 has a default encoding ofUnicode
and Out-File in PS7 has a default encoding ofUTF8NoBOM
so they were mismatched.I couldn't notice any issues with either, what were the oddities you noticed with
UTF8
? I tested outputs from both executing the command directly and from Pester.HowardWolosky commentedon Jun 21, 2020
It looks like the issue I saw was related to what @X-Guardian mentioned in his initial post:
When I left my existing file as-is bit changed it to append using UTF-8, this is what I would find at the end of the file when new content got added after the change:
If I deleted the file entirely, the problem went away when it got re-created.
I think what we need to do:
Add a new method to Helpers.ps1. Something like
Repair-LogFile
.@(239, 187, 191)
(e.g.EF. BB. BF
) then it's the wrong encoding.Then I think a new script file needs to be added to the module (something like
OnModuleLoad.ps1
) and then have that added toScriptsToProcess
in the manifest. HaveOnModuleLoad.ps1
callRepaid-LogFile
. Then, the log check and migration will happen immediately upon being loaded.Make sense?
X-Guardian commentedon Jun 21, 2020
How about just converting the file, it is then transparent to the user? Do the same byte check then read the file and write back out with the required encoding if required.
I wouldn't use
-Encoding utf8
as that add a BOM to the file.-Encoding ascii
writes UTF8 without a BOM.giuseppecampanelli commentedon Jun 21, 2020
@X-Guardian wouldn't ASCII be an issue since the GitHub API can use Unicode characters?
HowardWolosky commentedon Jun 21, 2020
Generally speaking, I like that idea. I think a potential problem with it (albeit a one-time only problem) is if the file is substantially large (because it's never been pruned), it could take a while to read in and read out, which will silently block the user's console window with no clear explanation.
According to this article, when writing the file back out, we should use
StreamWriter
to achieve the fastest performance (tip of the hat to StackOverflow for that link.Similarly, it appears that when trying to read the whole file in, you should probably use
[System.IO.File]::ReadLines()
.I've just finished reading through lots of opinions on the BOM vs no BOM argument. I see the merit of avoiding the BOM, but I see no way of doing so while sticking strictly with PowerShell given the encoding options that v4/5 have (as opposed to 7 which does have a no BOM option).
Luckily, it looks like our preferred way of writing, based on that provided article, does provide a
utf8
option that excludes the BOM....Given this, I think the path forward is to have the
Repair-LogFile
method as I previously described. It should detect the first two bytes of the file. If they are@(255, 254)
, that means that it's theUTF-16 LE
encoding from PS4/5. That's when you would read in the file with[System.IO.File]::ReadLines()
and then write the whole content out again withStreamWriter
.Then you would also update
Write-Log
to useStreamWriter.WriteLine()
to append the log file entry moving forward, as opposed toOut-File -Append
. That way, we can ensure uniform encoding across all PowerShell versions moving forward.@X-Guardian -- are you seeing issues with this plan?