uaNode.getComponents causes SubscriptionTimeout
February 5, 2021
17:16, EET
ivfa
Member
Members
Forum Posts: 10
Member Since:
October 21, 2019

Hello Support,
we have a problem with an upgraded customer system when calling uaNode.getComponents().

The status when the problem occurs is:
– connection is established
– a subscription exists and data changes for subscribed MonitoredItems are received.

We then try to browse the OPC server recursively in a separate thread to check for new devices configured on the server. At one point we call uaNode.getComponents() to get all “hasComponent” references of the uaNode in question.
On the old server this process took around 5-10s; on the new server it runs for around 30s until I receive a Timeout Event in the SubscriptionAliveHandler.
I’m not sure whether the getComponents call causes something in the server to block the keep-alive process, or whether that is even possible, but somehow a timeout is triggered.
Both systems, old and upgraded, run side by side on different ports and, from what I can see, the server structure is identical. The uaNode in question exists and has 12 “hasComponent” references to the same node types on both servers.
Old and new server are configured with the same timeout settings.
Could this problem be caused by the upgraded server SDK or newer OPC versions? The old server is at least 5 years old and our client implementation is also from around that time, so the SDK used in the client is 2.3.2 and the old server should also be somewhere in the 2.x range.

I would be grateful if someone has an idea or a pointer on what to look for (on the client and server side).

Kind regards,
Florin

February 8, 2021
11:43, EET
Bjarne Boström
Moderator
Moderators
Forum Posts: 1026
Member Since:
April 3, 2012

Hi,

Your question is missing the new SDK version, but I assume at least 4.3.0.

That is odd. There have been (in my opinion) some significant changes in OPC UA and our SDK, but it should be the other way around in most cases, i.e. it should be faster now. The number of Types in the base specification information model has sky-rocketed (mostly because all PubSub-related types are also there, but the OPC Foundation has also added a number of other things to the base model), which may affect some things in general.

Connecting in 4.x can take a bit longer in some scenarios, since we Read/Browse all Types upon connection. Though that can also be faster, since it is done in bulk. Or at the very least, upon touching the very first UaNode instance, it won’t need to read types one by one recursively for that node anymore (since those are cached). So the end result should be faster.

In 4.3.0 (https://downloads.prosysopc.com/opcua/Prosys_OPC_UA_SDK_for_Java_4_Release_Notes.html#version-4-3-0) we changed getComponents to use AddressSpace.getNodes(…); previously it basically did a sequential Read/Browse for each component.

It should receive the same data, just with fewer Read/Browse calls, since it will do all of them (respecting OperationLimits) in bulk. So each Request by itself is large, but there are fewer of them.

Anyway, for 12 components, the 5-10 seconds in the old implementation sounds like a lot. Is there a large latency to the server? And/or does it run on a machine with limited resources? In the old implementation the latency would be multiplied by the number of components x 3 (since we needed 2 Read + 1 Browse per component), but in the new one it should be pretty much 1-2 Read + Browse in total for all of them.
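To put rough numbers on the latency argument above, here is a minimal sketch of the arithmetic. The round-trip time and the bulk request count are assumptions picked for illustration, and the class and method names are made up for this example; only the "12 components" and "2 Read + 1 Browse per component" figures come from the thread.

```java
// Back-of-the-envelope estimate only; it ignores server processing time and request size.
final class RoundTripEstimate {

    /** Old behaviour as described above: 2 Reads + 1 Browse per component, sent one after another. */
    static long sequentialMillis(int components, long roundTripMillis) {
        return (long) components * 3 * roundTripMillis;
    }

    /** New behaviour as described above: a small, fixed number of bulk requests in total. */
    static long bulkMillis(int totalRequests, long roundTripMillis) {
        return (long) totalRequests * roundTripMillis;
    }

    public static void main(String[] args) {
        int components = 12;         // from the original post
        long roundTripMillis = 150;  // ASSUMPTION: latency of one request/response pair
        int bulkRequests = 3;        // ASSUMPTION: e.g. 2 Reads + 1 Browse for all components

        System.out.println("sequential: " + sequentialMillis(components, roundTripMillis) + " ms");
        System.out.println("bulk:       " + bulkMillis(bulkRequests, roundTripMillis) + " ms");
    }
}
```

Read the other way around: if 36 sequential requests took 5-10 s, each request would have needed roughly 140-280 ms, which is why the latency and machine resources are worth checking.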

The server side should be able to process Requests concurrently, assuming it is run on a JVM that has typical Thread scheduling (e.g. if run on some special JVM, it may behave differently). So it should not block keep-alives. Though, all the messages are read in a dedicated ReadThread, which reads the raw binary messages from the socket. In 4.x, their decoding is done in the same blocking-work-pool which before would just notify the upper layers of the message (so there is _some_ difference, though it probably shouldn’t affect this).

Some options to test/check:

1. Can you call the AddressSpace.getNodes(NodeId…) version with all the Component NodeIds and check if that by itself is slow? If yes, can you then do the same sequentially using AddressSpace.getNode instead and compare? If you can attach a profiler tool to the server, does its resource consumption (CPU + RAM + GC intervals, if any) behave differently? (E.g. I guess in theory it is possible that processing a single large request is slower than multiple smaller ones that add up to the same size.)

2. Is there any option to check with https://www.prosysopc.com/blog/opc-ua-wireshark/ whether the problem is us not sending a PublishRequest from the client, or the server not responding in time with a PublishResponse?

3. Check the logs on both sides for any WARN or ERROR lines?

4. Try to make a reproducible situation using our SampleConsoleServer (+ AddressSpace.getNodes(…)) that we could use to reproduce the situation here?

5. The same as 1+4, but try via the static InternalAddressSpaceAccessHelper.internalGetNodesWithNodeIds(boolean parallelCalls, AddressSpace addressSpace, Set nodeIds), passing false for parallelCalls, the address space, and a set of the same NodeIds, and then check getBrowseCalls(), getBrowseNextCalls() and getReadCalls() on the InternalGetNodesResults output. Note that this API is internal, only exists in some of the latest versions, and will probably cease to exist in some future version (preferably something like it will be public/non-internal in 5.x). But that is what is called internally now. I would expect to see low single digits (0-2, maybe a bit more depending on things) for the number of requests; if there are more, that might be an indication of something. You can also try passing true for the parallelCalls option, though in practice it was slower in our tests (so we currently use false internally).
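For the bulk vs. sequential comparison in option 1, a small timing harness along these lines might help. This is only a sketch: it does not use the SDK at all, the two fetchers in main are placeholders that you would replace with your own calls to AddressSpace.getNodes and AddressSpace.getNode, and the class name, method names and node id strings are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class FetchTiming {

    /** Runs the action once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Fetches the ids one by one, the way a loop over a single-node call would. */
    static <I, N> List<N> fetchSequentially(List<I> ids, Function<I, N> single) {
        List<N> result = new ArrayList<>();
        for (I id : ids) {
            result.add(single.apply(id));
        }
        return result;
    }

    public static void main(String[] args) {
        // PLACEHOLDER ids; use the NodeIds of the 12 components instead.
        List<String> ids = List.of("ns=2;s=Comp1", "ns=2;s=Comp2", "ns=2;s=Comp3");

        // PLACEHOLDER fetchers; swap in the real single-node and bulk calls.
        Function<String, String> single = id -> "node:" + id;
        Function<List<String>, List<String>> bulk = list -> fetchSequentially(list, single);

        long bulkMs = timeMillis(() -> bulk.apply(ids));
        long sequentialMs = timeMillis(() -> fetchSequentially(ids, single));
        System.out.println("bulk: " + bulkMs + " ms, sequential: " + sequentialMs + " ms");
    }
}
```

Since nodes may be cached on the client after the first fetch, it is probably fairest to run each variant against a fresh connection rather than back to back.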

February 8, 2021
19:12, EET
ivfa
Member
Members
Forum Posts: 10
Member Since:
October 21, 2019

Thank you very much for the detailed answer, I’ll try to clarify some things:
the client SDK is 2.3.2, the server SDK is > 4.3.
I think we will have to try what you are proposing in 1., 2. and 4., since the only warn/error we have is the subscription timeout log when SubscriptionAliveListener.onTimeout() is called.

Thanks & kind regards

February 9, 2021
9:14, EET
Bjarne Boström
Moderator
Moderators
Forum Posts: 1026
Member Since:
April 3, 2012

I guess one more option with that:

6. SubscriptionAliveListener gained a new method in newer versions, onLifetimeTimeout; you could check whether that is also called. The onTimeout is “keepalive not received in MaxKeepAlive time + a margin”, but the subscription should still exist on the server. The onLifetimeTimeout is “keepalive not received in LifeTimeCount + a margin”, which means the server side would/should have deleted it at this point.

Neither should be called (by the SDK) if everything is working normally, but if only onTimeout is called, then the situation was recovered (though you might lose data, depending on how big a queue was made for the monitored items). On some rare occasions this could also happen if the latency to the server is high and the margins have not been increased (if the server triggers the keepalive basically at the last moment, it will still take _some_ time for it to travel over the wire to the client side).

If onLifetimeTimeout is called, then currently you must re-create the subscription yourself manually (but this typically indicates some big problem anyway). The situation is a bit clunky, though, since if a connection break lasts long enough for us to go into reconnect mode, then we would also try to make the subscription again once we recover from the connection break.

If you check the log timestamps for both methods (assuming both are called) and they are very close to each other, it could mean that the PublishTask Thread for UaClient, which is a worker thread per UaClient, was not able to run properly (or that you chose a LifeTimeCount close to the MaxKeepAlive). Though that would not cause the timeout, just delay its reporting, since the last alive status is updated when we receive a PublishResponse (assuming the status was good), and we then send one more request immediately.
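The two thresholds described in this reply can be sketched as plain arithmetic. In OPC UA the keep-alive and lifetime counts are multiples of the publishing interval, so the deadlines follow from that. The margin value and all the numbers in main are assumptions for illustration (how the SDK actually computes its margin is not stated in this thread), and the class, enum and method names are made up for the example.

```java
// Sketch of the two client-side deadlines; not the SDK's actual implementation.
final class SubscriptionTimeouts {

    enum State { ALIVE, KEEPALIVE_TIMEOUT, LIFETIME_TIMEOUT }

    /** Silence longer than this would correspond to onTimeout. */
    static long keepAliveDeadlineMillis(long publishingIntervalMillis, long maxKeepAliveCount, long marginMillis) {
        return publishingIntervalMillis * maxKeepAliveCount + marginMillis;
    }

    /** Silence longer than this would correspond to onLifetimeTimeout. */
    static long lifetimeDeadlineMillis(long publishingIntervalMillis, long lifetimeCount, long marginMillis) {
        return publishingIntervalMillis * lifetimeCount + marginMillis;
    }

    /** Classifies how long the client has gone without any PublishResponse. */
    static State classify(long silenceMillis, long keepAliveDeadline, long lifetimeDeadline) {
        if (silenceMillis > lifetimeDeadline) {
            return State.LIFETIME_TIMEOUT;
        }
        if (silenceMillis > keepAliveDeadline) {
            return State.KEEPALIVE_TIMEOUT;
        }
        return State.ALIVE;
    }

    public static void main(String[] args) {
        // ASSUMED settings: 1 s publishing interval, keep-alive count 10, lifetime count 30, 2 s margin.
        long keepAlive = keepAliveDeadlineMillis(1000, 10, 2000);
        long lifetime = lifetimeDeadlineMillis(1000, 30, 2000);

        System.out.println("keep-alive deadline: " + keepAlive + " ms");
        System.out.println("lifetime deadline:   " + lifetime + " ms");
        System.out.println("after 30 s of silence: " + classify(30_000, keepAlive, lifetime));
    }
}
```

Plugging in your real publishing interval and counts shows how far apart the two callbacks should be in the logs, which helps when judging whether the timestamps are "very close to each other".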
