Skip to content

JLM tests incorrectly assuming connected process has terminated #402

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lumpfish opened this issue Feb 3, 2021 · 5 comments
Closed

JLM tests incorrectly assuming connected process has terminated #402

lumpfish opened this issue Feb 3, 2021 · 5 comments
Assignees
Labels
Milestone

Comments

@lumpfish
Copy link
Contributor

lumpfish commented Feb 3, 2021

The JLM tests fail intermittently with output similar to:

STF 03:09:36.371 - +------ Step 5 - Wait for the processes to complete
STF 03:09:36.371 - | Wait for processes to meet expectations
STF 03:09:36.371 - |   Processes: [LT1, CL1]
STF 03:09:36.371 - |
STF 03:09:36.371 - Monitoring processes: CL1 LT1
CL1 j> 2021/02/02 03:09:37.055 ServerURL=service:jmx:rmi:///jndi/rmi://localhost:1234/jmxrmi
CL1 j> 2021/02/02 03:09:37.370 Attempting to connect
CL1 j> 2021/02/02 03:09:37.953 Monitored VM not ready at Feb 2, 2021, 3:09:37 AM (attempt 1, elapsed 517ms).
CL1 j> 2021/02/02 03:09:37.954 Waiting 5 secs and trying again...
CL1 j> 2021/02/02 03:09:42.955 Attempting to connect
CL1 j> 2021/02/02 03:09:45.364 Connection established!
CL1 j> 2021/02/02 03:09:45.622 Starting to write data
CL1 ...j> 2021/02/02 03:10:26.326 Exiting as JVM we are connected to has finished
CL1 stderr Exception in thread "main" java.lang.AssertionError: Exiting as JVM we are connected to has finished
CL1 stderr 	at org.junit.Assert.fail(Assert.java:88)
CL1 stderr 	at net.adoptopenjdk.test.jlm.remote.ThreadProfiler.getStatsViaProxy(ThreadProfiler.java:157)
CL1 stderr 	at net.adoptopenjdk.test.jlm.remote.ThreadProfiler.main(ThreadProfiler.java:97)
STF 03:10:26.656 - **FAILED** Process CL1 ended with exit code (1) and not the expected exit code/s (0)
STF 03:10:26.657 - Monitoring Report Summary:
STF 03:10:26.658 -   o Process CL1 ended with exit code (1) and not the expected exit code/s (0)
STF 03:10:26.658 -   o Process LT1 is still running as expected
STF 03:10:26.665 - Killing processes: CL1 LT1
STF 03:10:26.665 -   o Process CL1 pid 22062 is not running
STF 03:10:26.665 -   o Process LT1 pid 22061 stop()
STF 03:10:36.668 -   o Process LT1 pid 22061 terminate()
STF 03:10:37.669 -   o Process LT1 pid 22061 killed

We see the message Exiting as JVM we are connected to has finished, but later Process LT1 is still running as expected (LT1 is the connected JVM process). The stdout / stderr for the process does not contain any error messages - everything looks normal up to the point the process is killed.

The test code where the failure message is output is:

		} catch (UndeclaredThrowableException ue) {
			// If the exception was caused by a Connect or Unmarshal Exception
			// assume the monitored JVM has finished.
			Throwable cause = ue.getCause();
			Class<ConnectException> connectExcept = ConnectException.class;
			Class<UnmarshalException> unmarshalExcept = UnmarshalException.class;

			if (connectExcept.isInstance(cause) || unmarshalExcept.isInstance(cause)) {
				this.closeCSVFile();
				Message.logOut("Exiting as JVM we are connected to has finished");t th
				Assert.fail("Exiting as JVM we are connected to has finished");
			} else {
				Message.logOut(ue.getMessage());
				ue.printStackTrace();
				Assert.fail(ue.getMessage());
			}

So it looks like either a ConnectException or UnmarshalException was thrown, and this is interpreted as meaning the connected process is no longer running, which is incorrect in this case.

The code should be more explicit about which exception is thrown and output a stack trace to aid investigation.

@lumpfish
Copy link
Contributor Author

lumpfish commented Feb 4, 2021

Created PR #403.

Output at the time the test fails now provides full details of the exception thrown - from a test run of the PR:

11:08:28  STF 11:08:28.432 - +------ Step 5 - Wait for the processes to complete
11:08:28  STF 11:08:28.432 - | Wait for processes to meet expectations
11:08:28  STF 11:08:28.432 - |   Processes: [LT1, CL1]
11:08:28  STF 11:08:28.432 - |
11:08:28  STF 11:08:28.432 - Monitoring processes: CL1 LT1
11:08:29  CL1 j> 2021/02/04 11:08:29.110 ServerURL=service:jmx:rmi:///jndi/rmi://localhost:1234/jmxrmi
11:08:29  CL1 j> 2021/02/04 11:08:29.412 Attempting to connect

11:08:30  CL1 j> 2021/02/04 11:08:30.024 Monitored VM not ready at Feb 4, 2021, 11:08:30 AM (attempt 1, elapsed 532ms).
11:08:30  CL1 j> 2021/02/04 11:08:30.025 Waiting 5 secs and trying again...

11:08:35  CL1 j> 2021/02/04 11:08:35.025 Attempting to connect

11:08:37  CL1 j> 2021/02/04 11:08:37.355 Connection established!
11:08:37  CL1 j> 2021/02/04 11:08:37.624 Starting to write data

11:08:39  CL1 j> 2021/02/04 11:08:38.318 Exiting as UnmarshalException thrown receiving data from the connected JVM.  This may mean the JVM we are connected to has finished. null
11:08:39  CL1 stderr java.lang.reflect.UndeclaredThrowableException
11:08:39  CL1 stderr 	at jdk.proxy1/jdk.proxy1.$Proxy0.getSystemProperties(Unknown Source)
11:08:39  CL1 stderr 	at net.adoptopenjdk.test.jlm.resources.EnvironmentData.writeData(EnvironmentData.java:120)
11:08:39  CL1 stderr 	at net.adoptopenjdk.test.jlm.remote.ThreadProfiler.getStatsViaProxy(ThreadProfiler.java:128)
11:08:39  CL1 stderr 	at net.adoptopenjdk.test.jlm.remote.ThreadProfiler.main(ThreadProfiler.java:97)
11:08:39  CL1 stderr Caused by: java.rmi.UnmarshalException: error unmarshalling return; nested exception is: 
11:08:39  CL1 stderr 	javax.net.ssl.SSLException: Tag mismatch!
11:08:39  CL1 stderr 	at java.rmi/sun.rmi.server.UnicastRef.invoke(UnicastRef.java:197)
11:08:39  CL1 stderr 	at jdk.remoteref/jdk.jmx.remote.internal.rmi.PRef.invoke(Unknown Source)
11:08:39  CL1 stderr 	at java.management.rmi/javax.management.remote.rmi.RMIConnectionImpl_Stub.getAttribute(RMIConnectionImpl_Stub.java:279)
11:08:39  CL1 stderr 	at java.management.rmi/javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.getAttribute(RMIConnector.java:904)
11:08:39  CL1 stderr 	at java.management/com.sun.jmx.mbeanserver.MXBeanProxy$GetHandler.invoke(MXBeanProxy.java:122)
11:08:39  CL1 stderr 	at java.management/com.sun.jmx.mbeanserver.MXBeanProxy.invoke(MXBeanProxy.java:167)
11:08:39  CL1 stderr 	at java.management/javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:258)
11:08:39  CL1 stderr 	... 4 more
11:08:39  CL1 stderr Caused by: javax.net.ssl.SSLException: Tag mismatch!
11:08:39  CL1 stderr 	at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:133)
11:08:39  CL1 stderr 	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:369)
11:08:39  CL1 stderr 	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:312)
11:08:39  CL1 stderr 	at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:307)
11:08:39  CL1 stderr 	at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:122)
11:08:39  CL1 stderr 	at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1500)
11:08:39  CL1 stderr 	at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1465)
11:08:39  CL1 stderr 	at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1047)
11:08:39  CL1 stderr 	at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:244)
11:08:39  CL1 stderr 	at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:263)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2950)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:3277)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:3287)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1763)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:580)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:490)
11:08:39  CL1 stderr 	at java.base/java.util.HashMap.readObject(HashMap.java:1551)
11:08:39  CL1 stderr 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
11:08:39  CL1 stderr 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
11:08:39  CL1 stderr 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
11:08:39  CL1 stderr 	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1224)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2506)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2335)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1812)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream$FieldValues.<init>(ObjectInputStream.java:2684)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:748)
11:08:39  CL1 stderr 	at java.management/javax.management.openmbean.TabularDataSupport.readObject(TabularDataSupport.java:925)
11:08:39  CL1 stderr 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
11:08:39  CL1 stderr 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
11:08:39  CL1 stderr 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
11:08:39  CL1 stderr 	at java.base/java.lang.reflect.Method.invoke(Method.java:567)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1224)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2506)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2335)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1812)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:580)
11:08:39  CL1 stderr 	at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:490)
11:08:39  CL1 stderr 	at java.rmi/sun.rmi.server.UnicastRef.unmarshalValue(UnicastRef.java:327)
11:08:39  CL1 stderr 	at java.rmi/sun.rmi.server.UnicastRef.invoke(UnicastRef.java:177)
11:08:39  CL1 stderr 	... 10 more
11:08:39  CL1 stderr Caused by: javax.crypto.AEADBadTagException: Tag mismatch!
11:08:39  CL1 stderr 	at java.base/com.sun.crypto.provider.NativeGaloisCounterMode.decryptFinal(NativeGaloisCounterMode.java:454)
11:08:39  CL1 stderr 	at java.base/com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1183)
11:08:39  CL1 stderr 	at java.base/com.sun.crypto.provider.CipherCore.fillOutputBuffer(CipherCore.java:1120)
11:08:39  CL1 stderr 	at java.base/com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:1007)
11:08:39  CL1 stderr 	at java.base/com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:501)
11:08:39  CL1 stderr 	at java.base/javax.crypto.CipherSpi.bufferCrypt(CipherSpi.java:799)
11:08:39  CL1 stderr 	at java.base/javax.crypto.CipherSpi.engineDoFinal(CipherSpi.java:735)
11:08:39  CL1 stderr 	at java.base/com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:673)
11:08:39  CL1 stderr 	at java.base/javax.crypto.Cipher.doFinal(Cipher.java:2500)
11:08:39  CL1 stderr 	at java.base/sun.security.ssl.SSLCipher$T13GcmReadCipherGenerator$GcmReadCipher.decrypt(SSLCipher.java:1930)
11:08:39  CL1 stderr 	at java.base/sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:260)
11:08:39  CL1 stderr 	at java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:181)
11:08:39  CL1 stderr 	at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:110)
11:08:39  CL1 stderr 	... 45 more
11:08:39  CL1 stderr Exception in thread "main" java.lang.AssertionError: Exiting as UnmarshalException thrown receiving data from the connected JVM.  This may mean the JVM we are connected to has finished. null
11:08:39  CL1 stderr 	at org.junit.Assert.fail(Assert.java:88)
11:08:39  CL1 stderr 	at net.adoptopenjdk.test.jlm.remote.ThreadProfiler.getStatsViaProxy(ThreadProfiler.java:164)
11:08:39  CL1 stderr 	at net.adoptopenjdk.test.jlm.remote.ThreadProfiler.main(ThreadProfiler.java:97)
11:08:39  STF 11:08:38.403 - **FAILED** Process CL1 ended with exit code (1) and not the expected exit code/s (0)
11:08:39  STF 11:08:38.404 - Monitoring Report Summary:
11:08:39  STF 11:08:38.405 -   o Process CL1 ended with exit code (1) and not the expected exit code/s (0)
11:08:39  STF 11:08:38.405 -   o Process LT1 is still running as expected
11:08:39  STF 11:08:38.412 - Killing processes: CL1 LT1
11:08:39  STF 11:08:38.412 -   o Process CL1 pid 22405 is not running
11:08:39  STF 11:08:38.412 -   o Process LT1 pid 22404 stop()

11:08:48  STF 11:08:48.414 -   o Process LT1 pid 22404 terminate()
11:08:49  STF 11:08:49.415 -   o Process LT1 pid 22404 killed
11:08:49  **FAILED** at step 5 (Wait for the processes to complete). Expected return value=0 Actual=1 at /home/jenkins/workspace/Grinder/openjdk-tests/TKG/../TKG/output_16124345967425/TestJlmRemoteThreadAuth_0/20210204-110821-TestJlmRemoteThreadAuth/execute.pl line 176.
11:08:49  STF 11:08:49.476 - **FAILED** execute script failed. Expected return value=0 Actual=1

@lumpfish
Copy link
Contributor Author

lumpfish commented Feb 4, 2021

This should help provide information to resolve eclipse-openj9/openj9#7397, eclipse-openj9/openj9#9046

@sxa
Copy link
Member

sxa commented Feb 9, 2021

@lumpfish Which host was that on?

@lumpfish
Copy link
Contributor Author

lumpfish commented Feb 9, 2021

@sxa - sorry, I didn't note which machine, I can only say it was aarch64_linux. Just getting the extra diagnostics was all I was concentrating on at the time.
adoptium/STF#103 is now merged, so when it reoccurs we will now get this additional information.

@lumpfish
Copy link
Contributor Author

lumpfish commented Feb 9, 2021

Fixed by #403

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants