Always On Availability groups resolving state after failover - Remote harden of transaction 'GhostCleanupTask' failed

















My explanation is partly opinion based, since it relies on my own findings. If more information is needed, I am happy to add it.



The setup



We have 4 SQL Server 2016 instances, and all 4 underlying servers are in a single WSFC (a strange setup in my eyes, since the Always On AGs are paired up 2 by 2, but that is a different question).



[screenshot of the cluster setup]



4 nodes + Fileshare witness



Version: 13.0.5026.0 (all nodes)



The instances that are having the failover issues are instances 1 & 2. They have 3 AGs between them: AG1 is primary on instance 1 and is the one having the issue; AG2 and AG3 are primary on instance 2.
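For reference, which instance currently holds the primary role for each AG can be confirmed from the standard replica-state DMVs; a minimal sketch, not specific to this setup:

    -- Current role and health of every replica in every availability group.
    -- Some columns (e.g. operational_state_desc) are only populated for the local replica.
    SELECT ag.name            AS ag_name,
           ar.replica_server_name,
           ars.role_desc,
           ars.operational_state_desc,
           ars.connected_state_desc,
           ars.synchronization_health_desc
    FROM sys.dm_hadr_availability_replica_states AS ars
    JOIN sys.availability_replicas AS ar
        ON ar.replica_id = ars.replica_id
    JOIN sys.availability_groups AS ag
        ON ag.group_id = ars.group_id
    ORDER BY ag.name, ars.role_desc;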



Instances 3 and 4 have their own two separate AGs.



It does not seem like any successful failover has happened since the AG was recreated a couple of months ago, and the servers have Windows updates pending as well (we will be doing the reboots in a maintenance window).



AG1's settings



[screenshot of AG1's settings]



The week before



Last week, after checking for long-running queries (none) and the synchronization state (green across the board), we initiated a manual failover without data loss on AG1 from instance1 (primary) to instance2 (secondary).
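For reference, a planned failover without data loss is issued either through the SSMS wizard or with T-SQL along these lines (a minimal sketch, using the AG name from this post):

    -- Run on the secondary replica that should become the new primary (instance2).
    -- Only valid while the databases are SYNCHRONIZED on that replica; this is not
    -- a forced failover (no FORCE_FAILOVER_ALLOW_DATA_LOSS).
    ALTER AVAILABILITY GROUP [AG1] FAILOVER;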



The failover failed, leaving both the former primary (instance1) and the former secondary (instance2) in a temporary resolving state. Two databases on the former primary even showed Reverting / In Recovery for a few minutes. The following message was logged many times in the error log:



[screenshot of the recurring error log message]



The databases were not accessible during the outage, which took about 10 minutes in total.



Before failing over I checked the following (a rough sketch of these checks is shown after the list):




  • Currently running queries (no long-running queries)

  • DBCC OPENTRAN

  • Always On dashboard (green across the board)
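
A minimal sketch of the first two checks in T-SQL (the dashboard check itself is done in SSMS):

    -- 1. Any long-running requests on the primary?
    SELECT session_id, status, command, wait_type, total_elapsed_time
    FROM sys.dm_exec_requests
    WHERE session_id > 50
    ORDER BY total_elapsed_time DESC;

    -- 2. Oldest active transaction in the current database (run in each AG database).
    DBCC OPENTRAN;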


We found a post by Microsoft recommending a trace flag, 3923. We enabled it on both instances after the outage.
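Enabling a trace flag globally is done with DBCC TRACEON; a minimal sketch (the flag number comes from the Microsoft post; DBCC TRACEON does not survive a service restart, so a -T3923 startup parameter would be needed to persist it):

    -- Enable trace flag 3923 globally on both instances.
    DBCC TRACEON (3923, -1);

    -- Verify that it is active.
    DBCC TRACESTATUS (3923);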



Take #2, today



We tried the manual failover without data loss again to see if the issues were resolved.



Spoiler alert: they were not.






Remote harden of transaction 'GhostCleanupTask' (ID 0x00000000141b9d97 0000:000cad62) started at Feb 19 2019 11:17AM in database 'DB1' at LSN (102:23518:2) failed.

Remote harden of transaction 'GhostCleanupTask' (ID 0x00000000141b9d97 0000:000cad62) started at Feb 19 2019 11:17AM in database 'DB1' at LSN (102:23518:2) failed.

Remote harden of transaction 'GhostCleanupTask' (ID 0x00000000141b9da2 0000:078d645c) started at Feb 19 2019 11:17AM in database 'DB2' at LSN (5909:54307:137) failed.

Remote harden of transaction 'GhostCleanupTask' (ID 0x00000000141b9dee 0000:0017e40a) started at Feb 19 2019 11:17AM in database 'DB1' at LSN (164:21769:2) failed.

Remote harden of transaction 'GhostCleanupTask' (ID 0x00000000141ba1a8 0000:0024e912) started at Feb 19 2019 11:17AM in database 'DB3' at LSN (1246:7285:2) failed.


Edit



There are also remote harden errors for other transactions: FTEPersistIndex, IFTSAutoNested, and INSERT.



Remote harden of transaction 'FTEPersistIndex' (ID 0x00000000141b98b5 0000:078d6459) started at Feb 19 2019 11:16AM in database 'DB2' at LSN (5909:54266:104) failed.

Remote harden of transaction 'IFTSAutoNested' (ID 0x00000000141b98bf 0000:1d20f419) started at Feb 19 2019 11:16AM in database 'DB4' at LSN (93341:183352:3) failed.

Remote harden of transaction 'INSERT' (ID 0x00000000141b989c 0000:1d20f418) started at Feb 19 2019 11:16AM in database 'DB4' at LSN (93341:183346:17) failed.


We canceled the failover and it failed back to the former primary.



The actual question(s)




  1. The main question: am I right to assume that the failover fails because GhostCleanupTask and the other tasks cannot harden the transaction log on the remote replica, or is that a consequence of the failure rather than the cause?

  2. If it is the cause, what could be done to resolve this?




Something I am looking into, since the problem does not seem to be caused by any long-running queries rolling back: could the secondary databases be in an inconsistent state? They do show up as Synchronized, however.
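A minimal sketch of a query that can confirm the per-database state beyond what the dashboard shows (standard DMVs, nothing specific to this setup):

    -- Per-database synchronization state, queue sizes and suspend reason.
    SELECT ag.name                         AS ag_name,
           ar.replica_server_name,
           DB_NAME(drs.database_id)        AS database_name,
           drs.synchronization_state_desc,
           drs.synchronization_health_desc,
           drs.is_suspended,
           drs.suspend_reason_desc,
           drs.log_send_queue_size,
           drs.redo_queue_size
    FROM sys.dm_hadr_database_replica_states AS drs
    JOIN sys.availability_replicas AS ar
        ON ar.replica_id = drs.replica_id
    JOIN sys.availability_groups AS ag
        ON ag.group_id = drs.group_id
    ORDER BY ag.name, ar.replica_server_name, database_name;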



Extra info



The only interesting message in the cluster log from last week appeared to be this one:



 mscs::TopologyPersister::TryGetNetworkPrivateProperties: (2)' because of 'OpenSubKey failed.'


Failover cluster events




Cluster resource 'AG1' of type 'SQL Server Availability Group' in
clustered role 'AG1' failed.



Based on the failure policies for the resource and role, the cluster
service may try to bring the resource online on this node or move the
group to another node of the cluster and then restart it. Check the
resource and group state using Failover Cluster Manager or the
Get-ClusterResource Windows PowerShell cmdlet.



The Cluster service failed to bring clustered role 'AG1'
completely online or offline. One or more resources may be in a failed
state. This may impact the availability of the clustered role.




Event log secondary server



First message




Windows cannot load the extensible counter DLL
InterceptCountersManager. The first four bytes (DWORD) of the Data
section contains the Windows error code.




Second message




The configuration information of the performance library
"perf-MSSQLSERVER-sqlctr13.2.5026.0.dll" for the "MSSQLSERVER" service
does not match the trusted performance library information stored in
the registry. The functions in this library will not be treated as
trusted.




SQL Server Log



A big part of the SQL Server log from when the failover happened is here.



The state keeps switching between synchronized and not synchronized during the resolving period:




DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint:
1B55DB6F-066D-442B-B97E-E32D6990756A:1



DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint:
1B55DB6F-066D-442B-B97E-E32D6990756A:4



DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint:
1B55DB6F-066D-442B-B97E-E32D6990756A:4



DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint:
1B55DB6F-066D-442B-B97E-E32D6990756A:1



DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint:
1B55DB6F-066D-442B-B97E-E32D6990756A:1



DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint:
1B55DB6F-066D-442B-B97E-E32D6990756A:4



DbMgrPartnerCommitPolicy::SetSyncAndRecoveryPoint:
1B55DB6F-066D-442B-B97E-E32D6990756A:1
...











sql-server sql-server-2016 availability-groups failover

asked 5 hours ago by Randi Vertongen, edited 31 mins ago


Comments

  • Those two look unrelated to me - I get those regularly on our secondary that fails over fine. Is that the only log in the cluster? – George.Palacios, 4 hours ago

  • If you can repro this, I would recommend opening a support incident with Microsoft. – Tony Hinkle, 4 hours ago

  • Is this the Bug which resurfaced – Shanky, 4 hours ago

  • I'd tend to agree with @TonyHinkle in that case - if there are no error messages there's not a lot you can do short of raising a support case. I assume your cluster validation reports are all okay? – George.Palacios, 4 hours ago

  • I had a different kind of problem with AGs before; we tried everything but it did not solve the issue. We rebooted our servers one by one (first the secondary servers, then the primary node) and, unexpectedly, that resolved the issue on its own. – Shashank Tiwari, 3 hours ago