Resolving OSPF MTU problems with SROS

OSPF is a popular interior gateway protocol, and in many deployments it “just works”; however, care must be taken even in simple setups. One issue that comes up from time to time involves the maximum transmission unit (MTU). The test network is a three-router topology in which I only have direct control of one Nokia SROS-based router.

OSPF MTU Test Topology

TL;DR: if an OSPF neighbor is stuck in ExchStart, you need to increase your MTU; if it is stuck in Exchange, you need to decrease your MTU. Keep reading to see how to identify and resolve MTU issues on Nokia routers running SROS.

Below is the configuration of SR (the router under our administrative control):
[php]configure
    system
        name "SR"
    exit
    card 1
        card-type iom3-xp-b
        mda 1
            mda-type m5-1gb-sfp-b
            no shutdown
        exit
        no shutdown
    exit
    port 1/1/1
        ethernet
            mode access
        exit
        no shutdown
    exit
    port 1/1/2
        ethernet
            mode access
        exit
        no shutdown
    exit
#--------------------------------------------------
echo "Router (Network Side) Configuration"
#--------------------------------------------------
    router Base
        interface "system"
            address 1.1.1.1/32
            no shutdown
        exit
#--------------------------------------------------
echo "OSPFv2 Configuration"
#--------------------------------------------------
        ospf 0
            area 0.0.0.0
                interface "system"
                    no shutdown
                exit
            exit
            no shutdown
        exit
    exit

#--------------------------------------------------
echo "Service Configuration"
#--------------------------------------------------
    service
        customer 1 create
            description "Default customer"
        exit
        ies 100 customer 1 create
            description "PEER1"
            interface "PEER1" create
                address 10.1.2.1/27
                sap 1/1/1 create
                exit
            exit
            no shutdown
        exit
        ies 200 customer 1 create
            description "PEER2"
            interface "PEER2" create
                address 10.1.3.1/27
                sap 1/1/2 create
                exit
            exit
            no shutdown
        exit
    exit
#--------------------------------------------------
echo "Router (Service Side) Configuration"
#--------------------------------------------------
    router
        ospf 0
            area 0.0.0.1
                interface "PEER1"
                    no shutdown
                exit
                interface "PEER2"
                    no shutdown
                exit
            exit
            no shutdown
        exit
    exit
exit all
[/php]

One thing to note is that the peer routers are attached to an Internet Enhanced Service (IES) and are not part of the OSPF backbone area. From a stored-configuration perspective there is a distinction between core network and customer configurations, but from a protocol perspective things are the same. IES interfaces are bound to Service Access Points (SAPs), which requires the underlying port to be changed from the default mode of network; in this case we are using access, although hybrid is an option as well.

As this post is about resolving issues, things are obviously not working as straightforwardly as expected.
[php]A:SR# show router route-table

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags] Type Proto Age Pref
Next Hop[Interface Name] Metric
——————————————————————————-
1.1.1.1/32 Local Local 00h40m51s 0
system 0
10.1.2.0/27 Local Local 00h33m32s 0
PEER1 0
10.1.3.0/27 Local Local 00h34m02s 0
PEER2 0
——————————————————————————-
No. of Routes: 3
Flags: n = Number of times nexthop is repeated
B = BGP backup route available
L = LFA nexthop available
S = Sticky ECMP requested
===============================================================================[/php]
So nothing from OSPF is in the routing table. While it’s possible (but unlikely) that our peers aren’t advertising anything, an alternative explanation is a connectivity issue, so we’ll ping each peer router first:
[php]A:SR# ping 10.1.2.2 count 3
PING 10.1.2.2 56 data bytes
64 bytes from 10.1.2.2: icmp_seq=1 ttl=64 time=1.28ms.
64 bytes from 10.1.2.2: icmp_seq=2 ttl=64 time=1.15ms.
64 bytes from 10.1.2.2: icmp_seq=3 ttl=64 time=1.08ms.

—- 10.1.2.2 PING Statistics —-
3 packets transmitted, 3 packets received, 0.00% packet loss
round-trip min = 1.08ms, avg = 1.17ms, max = 1.28ms, stddev = 0.081ms
A:SR# ping 10.1.3.3 count 3
PING 10.1.3.3 56 data bytes
64 bytes from 10.1.3.3: icmp_seq=1 ttl=64 time=1.51ms.
64 bytes from 10.1.3.3: icmp_seq=2 ttl=64 time=1.31ms.
64 bytes from 10.1.3.3: icmp_seq=3 ttl=64 time=1.24ms.

—- 10.1.3.3 PING Statistics —-
3 packets transmitted, 3 packets received, 0.00% packet loss
round-trip min = 1.24ms, avg = 1.35ms, max = 1.51ms, stddev = 0.116ms[/php]Okay, so IP connectivity is established; let’s check the OSPF interface state:
[php highlight=”10″]A:SR# show router ospf interface

===============================================================================
Rtr Base OSPFv2 Instance 0 Interfaces
===============================================================================
If Name Area Id Designated Rtr Bkup Desig Rtr Adm Oper
——————————————————————————-
system 0.0.0.0 1.1.1.1 0.0.0.0 Up DR
PEER1 0.0.0.1 1.1.1.1 100.100.100.100 Up DR
PEER2 0.0.0.1 1.1.1.1 0.0.0.0 Up DR
——————————————————————————-
No. of OSPF Interfaces: 3
===============================================================================[/php]
At first glance PEER1 seems okay, but PEER2 doesn’t have a BDR. Since we are using the default OSPF interface type (broadcast), we would expect to see both a DR and a BDR, so let’s get some more details:
[php highlight=”42,44”]A:SR# show router ospf interface “PEER2” detail

===============================================================================
Rtr Base OSPFv2 Instance 0 Interface “PEER2” (detail)
===============================================================================
——————————————————————————-
Configuration
——————————————————————————-
IP Address : 10.1.3.1
Area Id : 0.0.0.1 Priority : 1
Hello Intrvl : 10 sec Rtr Dead Intrvl : 40 sec
Retrans Intrvl : 5 sec Poll Intrvl : 120 sec
Cfg Metric : 0 Advert Subnet : True
Transit Delay : 1 Cfg IF Type : None
Passive : False Cfg MTU : 0
LSA-filter-out : None Adv Rtr Capab : Yes
LFA : Include LFA NH Template :
RIB-priority : None
Auth Type : None
——————————————————————————-
State
——————————————————————————-
Admin Status : Enabled Oper State : Designated Rtr
Designated Rtr : 1.1.1.1 Backup Desig Rtr : 0.0.0.0
IF Type : Broadcast Network Type : Stub
Oper MTU : 1500 Last Enabled : 06/02/2017 01:44:23
Oper Metric : 100 Bfd Enabled : No
Te Metric : 100 Te State : Down
Admin Groups : None
Ldp Sync : outOfService Ldp Sync Wait : Disabled
Ldp Timer State : Disabled Ldp Tm Left : 0
——————————————————————————-
Statistics
——————————————————————————-
Nbr Count : 0 If Events : 2
Tot Rx Packets : 0 Tot Tx Packets : 76
Rx Hellos : 0 Tx Hellos : 76
Rx DBDs : 0 Tx DBDs : 0
Rx LSRs : 0 Tx LSRs : 0
Rx LSUs : 0 Tx LSUs : 0
Rx LS Acks : 0 Tx LS Acks : 0
Retransmits : 0 Discards : 78
Bad Networks : 0 Bad Virt Links : 0
Bad Areas : 78 Bad Dest Addrs : 0
Bad Auth Types : 0 Auth Failures : 0
Bad Neighbors : 0 Bad Pkt Types : 0
Bad Lengths : 0 Bad Hello Int. : 0
Bad Dead Int. : 0 Bad Options : 0
Bad Versions : 0 Bad Checksums : 0
LSA Count : 0 LSA Checksum : 0x0
===============================================================================[/php]Okay, we can see that the Discards counter matches the Bad Areas counter, which means that PEER2 doesn’t believe it is part of OSPF Area 1.
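When there are many counters to eyeball, a small script can pull out the ones that are non-zero. The following is a minimal sketch in Python, assuming the output of the detail command has been saved as text; the function name `nonzero_error_counters` and the sample text are illustrative only and are not part of any Nokia tooling.

```python
import re

# Matches "Discards", "Auth Failures" or any "Bad ..." counter followed by
# its integer value in the two-column statistics section.
COUNTER_RE = re.compile(r"(Bad [A-Za-z. ]+?|Discards|Auth Failures)\s*:\s*(\d+)")


def nonzero_error_counters(show_output: str) -> dict:
    """Return the error counters whose value is greater than zero."""
    counters = {}
    for name, value in COUNTER_RE.findall(show_output):
        if int(value) > 0:
            counters[name.strip()] = int(value)
    return counters


# A few lines copied from the statistics section shown above.
SAMPLE = """\
Retransmits : 0 Discards : 78
Bad Networks : 0 Bad Virt Links : 0
Bad Areas : 78 Bad Dest Addrs : 0
Bad Auth Types : 0 Auth Failures : 0
"""

print(nonzero_error_counters(SAMPLE))
```

With the sample above, only Discards and Bad Areas are reported, which is the same conclusion reached by reading the output manually.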

Log-id 99 is automatically configured on Nokia SROS devices to capture a number of event messages; however, finding something specific in it can be a bit overwhelming. Fortunately, the output can be reduced by specifying the application (OSPF) and a string that should appear in the log message itself (PEER2):
[php]A:SR# show log log-id 99 application OSPF message PEER2

===============================================================================
Event Log 99
===============================================================================
Description : Default System Log
Memory Log contents [size=500 next event=944 (wrapped)]

941 2017/06/02 02:10:15.55 UTC WARNING: OSPF #2043 Base VR: 1 OSPFv2 (0)
"LCL_RTR_ID 1.1.1.1: Conflicting configuration areaMismatch on interface PEER2 from 10.1.3.3 in hello"[/php]So we have identified a problem, an OSPF area mismatch, and we need to overcome it, remembering that we can’t configure PEER2 (the person who manages it is on a training course and cannot be contacted, while your project manager wants solutions, not problems).
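Messages like the one above follow a regular pattern, so when there are many of them the interesting fields can be pulled out with a regular expression. This is a minimal sketch in Python; the pattern is inferred only from the log lines shown in this post, and `parse_conflict` is a hypothetical helper name.

```python
import re
from typing import Optional

# Pattern for the "Conflicting configuration" text seen in log 99.
CONFLICT_RE = re.compile(
    r"Conflicting configuration (?P<reason>\w+) "
    r"on interface (?P<interface>\S+) "
    r"from (?P<peer>\d+\.\d+\.\d+\.\d+) "
    r"in (?P<packet>\w+)"
)


def parse_conflict(message: str) -> Optional[dict]:
    """Extract reason, interface, peer address and packet type, or None."""
    match = CONFLICT_RE.search(message)
    return match.groupdict() if match else None


line = ('"LCL_RTR_ID 1.1.1.1: Conflicting configuration areaMismatch '
        'on interface PEER2 from 10.1.3.3 in hello"')
print(parse_conflict(line))
```

The packet type that falls out of the message (here `hello`) is the same hint used to decide which debug to enable.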

This is where show and debug commands can help identify and resolve issues. SROS is quite powerful with its debugging tools, and while they can be used in production, it is always best to narrow down what you are attempting to collect. First we need to create a debug log if one doesn’t already exist. For this example I’m just logging to a circular memory buffer, but it could go to SNMP, syslog or a file if necessary.
[php]A:SR# configure log log-id 10
*A:SR>config>log>log-id$ from debug-trace
*A:SR>config>log>log-id$ to memory
*A:SR>config>log>log-id$ no shutdown
*A:SR>config>log>log-id$ back
*A:SR>config>log# info
----------------------------------------------
        log-id 10
            from debug-trace
            to memory
            no shutdown
        exit
----------------------------------------------[/php]Now to set up the debug: we know the problem is on interface PEER2, and the log message kindly told us the packet type (hello).
[php]*A:SR>config>log# /debug router ospf packet hello "PEER2"
*A:SR>config>log# show debug
debug
    router "Base"
        ospf
            packet hello "PEER2"
        exit
    exit
exit[/php]Router Base is the router’s global routing instance; the debug can reference other services (e.g. a VPRN) if necessary by changing the router. After a few seconds (OSPF hello packets arrive every 10 seconds or so) we can look in log 10 to see what was received.
[php highlight=”15,24,32”]*A:SR>config>log# show log log-id 10

===============================================================================
Event Log 10
===============================================================================
Description : (Not Specified)
Memory Log contents [size=100 next event=10 (not wrapped)]

9 2017/06/02 02:19:27.10 UTC MINOR: DEBUG #2001 Base OSPFv2
“OSPFv2: PKT

>> Outgoing OSPF packet on I/F PEER2 area 0.0.0.1
OSPF Version : 2
Router Id : 1.1.1.1
Area Id : 0.0.0.1
Checksum : ecb9
Auth Type : Null
Auth Key : 00 00 00 00 00 00 00 00
Packet Type : HELLO
Packet Length : 44 ”

8 2017/06/02 02:19:26.55 UTC MINOR: DEBUG #2001 Base OSPFv2
“OSPFv2: PKT DROPPED
area mismatch”

7 2017/06/02 02:19:26.54 UTC MINOR: DEBUG #2001 Base OSPFv2
“OSPFv2: PKT

>> Incoming OSPF packet on I/F PEER2 area 0.0.0.2
OSPF Version : 2
Router Id : 200.200.200.200
Area Id : 0.0.0.2
Checksum : 5d27
Auth Type : Null
Auth Key : 00 00 00 00 00 00 00 00
Packet Type : HELLO
Packet Length : 44 "[/php]SR is configured with PEER2 in Area 1, but it should be in Area 2; let’s fix that.
[php]*A:SR>config>log# /configure router ospf
*A:SR>config>router>ospf# info
----------------------------------------------
            area 0.0.0.0
                interface "system"
                    no shutdown
                exit
            exit
            area 0.0.0.1
                interface "PEER1"
                    no shutdown
                exit
                interface "PEER2"
                    no shutdown
                exit
            exit
            no shutdown
----------------------------------------------
*A:SR>config>router>ospf# area 1 interface "PEER2" shutdown
*A:SR>config>router>ospf# area 1 no interface "PEER2"
*A:SR>config>router>ospf# area 2 interface "PEER2" no shutdown
*A:SR>config>router>ospf# info
----------------------------------------------
            area 0.0.0.0
                interface "system"
                    no shutdown
                exit
            exit
            area 0.0.0.1
                interface "PEER1"
                    no shutdown
                exit
            exit
            area 0.0.0.2
                interface "PEER2"
                    no shutdown
                exit
            exit
            no shutdown
----------------------------------------------[/php]Now let’s see whether that fixes the problem.
[php]*A:SR>config>router>ospf# show router ospf interface

===============================================================================
Rtr Base OSPFv2 Instance 0 Interfaces
===============================================================================
If Name Area Id Designated Rtr Bkup Desig Rtr Adm Oper
——————————————————————————-
system 0.0.0.0 1.1.1.1 0.0.0.0 Up DR
PEER1 0.0.0.1 1.1.1.1 100.100.100.100 Up DR
PEER2 0.0.0.2 200.200.200.200 1.1.1.1 Up BDR
——————————————————————————-
No. of OSPF Interfaces: 3
===============================================================================[/php]Yes, we can now see both the DR and BDR for both OSPF peers, but before we move on we should stop the debug activity:
[php]*A:SR>config>router>ospf# /debug router no ospf
*A:SR>config>router>ospf# show debug
debug
exit[/php]Now let’s see if OSPF routing exchange is occurring.
[php]*A:SR>config>router>ospf# show router route-table

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags] Type Proto Age Pref
Next Hop[Interface Name] Metric
——————————————————————————-
1.1.1.1/32 Local Local 01h15m46s 0
system 0
10.1.2.0/27 Local Local 01h08m27s 0
PEER1 0
10.1.3.0/27 Local Local 01h08m57s 0
PEER2 0
——————————————————————————-
No. of Routes: 3
Flags: n = Number of times nexthop is repeated
B = BGP backup route available
L = LFA nexthop available
S = Sticky ECMP requested
===============================================================================[/php]Well, that isn’t fixed yet (which should be no surprise, as this post is about MTU issues), so let’s move on to the next phase and examine the state of our OSPF neighbors:
[php highlight=”9,11″]*A:SR>config>router>ospf# show router ospf neighbor

===============================================================================
Rtr Base OSPFv2 Instance 0 Neighbors
===============================================================================
Interface-Name Rtr Id State Pri RetxQ TTL
Area-Id
——————————————————————————-
PEER1 100.100.100.100 ExchStart 1 0 34
0.0.0.1
PEER2 200.200.200.200 Exchange 1 0 32
0.0.0.2
——————————————————————————-
No. of Neighbors: 2
===============================================================================[/php]A neighbor that is stuck in ExchStart or Exchange is a hallmark of OSPF MTU-related problems.
Let’s start working on PEER1.[php highlight=”16,23”]*A:SR>config>router>ospf# show router ospf neighbor "PEER1" detail

===============================================================================
Rtr Base OSPFv2 Instance 0 Neighbors for Interface “PEER1” (detail)
===============================================================================
——————————————————————————-
Neighbor : 10.1.2.2
——————————————————————————-
——————————————————————————-
Neighbor Rtr Id : 100.100.100.100 Interface: PEER1
——————————————————————————-
Neighbor IP Addr : 10.1.2.2
Local IF IP Addr : 10.1.2.1
Area Id : 0.0.0.1
Designated Rtr : 1.1.1.1 Backup Desig Rtr : 100.100.100.100
Neighbor State : ExchStart Priority : 1
Retrans Q Length : 0 Options : – E – – – – – —
Events : 1068 Last Event Time : 06/02/2017 02:59:34
Up Time : 0d 01:11:00 Time Before Dead : 38 sec
GR Helper : Not Helping GR Helper Age : 0 sec
GR Exit Reason : None GR Restart Reason: Unknown (0)
Bad Nbr States : 0 LSA Inst fails : 0
Bad Seq Nums : 0 Bad MTUs : 1066
Bad Packets : 0 LSA not in LSDB : 0
Option Mismatches: 0 Nbr Duplicates : 0
Num Restarts : 0 Last Restart at : Never
===============================================================================[/php]There are quite a few Bad MTUs being reported. While some vendors have an option to ignore the OSPF MTU check, MTU mismatches can have a number of implications within the core once you consider the various tunnel options, so that option is not provided here.

Before we start to change things, let’s see if our trusty log 99 says anything about this:[php]*A:SR>config>router>ospf# show log log-id 99 application OSPF message PEER1

===============================================================================
Event Log 99
===============================================================================
Description : Default System Log
Memory Log contents [size=500 next event=1473 (wrapped)]

1472 2017/06/02 02:40:17.67 UTC WARNING: OSPF #2043 Base VR: 1 OSPFv2 (0)
"LCL_RTR_ID 1.1.1.1: Conflicting configuration mtuMismatch on interface PEER1 from 10.1.2.2 in dbDescript"[/php]We can use another debug to determine what the actual MTU should be (as before with the area mismatch, log 99 gave us a hint as to the packet type we should be investigating):
[php]*A:SR>config>router>ospf# /debug router ospf packet dbdescr ingress "PEER1"[/php]
Clear the log and see what we are receiving:
[php highlight=”27”]*A:SR>config>router>ospf# /clear log 10
*A:SR>config>router>ospf# /show log log-id 10

===============================================================================
Event Log 10
===============================================================================
Description : (Not Specified)
Memory Log contents [size=100 next event=3 (not wrapped)]

2 2017/06/02 02:55:17.67 UTC MINOR: DEBUG #2001 Base OSPFv2
“OSPFv2: PKT DROPPED
MTU mismatch”

1 2017/06/02 02:55:17.67 UTC MINOR: DEBUG #2001 Base OSPFv2
“OSPFv2: PKT

>> Incoming OSPF packet on I/F PEER1 area 0.0.0.1
OSPF Version : 2
Router Id : 100.100.100.100
Area Id : 0.0.0.1
Checksum : e35a
Auth Type : Null
Auth Key : 00 00 00 00 00 00 00 00
Packet Type : DB_DESC
Packet Length : 32

Interface MTU : 1504
Options : 000042
Flags : 7 INIT MORE MAST
Sequence Num : 2514
“[/php]
Okay, so PEER1 requires an MTU of 1504; let’s modify that within the OSPF configuration:[php]*A:SR>config>router>ospf# info
———————————————-
area 0.0.0.0
interface “system”
no shutdown
exit
exit
area 0.0.0.1
interface “PEER1”
no shutdown
exit
exit
area 0.0.0.2
interface “PEER2”
no shutdown
exit
exit
no shutdown
———————————————-
*A:SR>config>router>ospf# area 1 interface "PEER1" mtu 1504[/php]When applying a configuration, it is good to verify that things are working as expected:
[php highlight=”2,3”]*A:SR>config>router>ospf# show router ospf interface "PEER1" detail | match MTU
Passive : False Cfg MTU : 1504
Oper MTU : 1500 Last Enabled : 06/02/2017 01:44:23[/php]Although we configured the MTU to be 1504, the operational MTU is 1500. This is because the IP MTU is 1500, so OSPF can’t be given a larger MTU on this interface:
[php]*A:SR>config>router>ospf# show router interface "PEER1" detail | match MTU
IP MTU : (default)
IP Oper MTU : 1500[/php]
When Ethernet ports are configured in access mode and left at the default encapsulation (null), the Ethernet port MTU is 1514 bytes. This supports a 1500-byte IP MTU plus 14 bytes of Ethernet header; the FCS is not included in MTU calculations.
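The relationship between the port MTU and the IP MTU is simple arithmetic. As a quick sketch in Python (the function names are made up for illustration, and the optional `vlan_tags` parameter is an assumption that simply adds the standard 4 bytes per 802.1Q tag; it is not needed for the null encapsulation used in this post):

```python
ETHERNET_HEADER = 14  # destination MAC + source MAC + EtherType; FCS excluded
VLAN_TAG = 4          # each 802.1Q tag adds 4 bytes (assumption, unused for null encap)


def port_mtu_for_ip_mtu(ip_mtu: int, vlan_tags: int = 0) -> int:
    """Port MTU needed to carry an IP packet of ip_mtu bytes."""
    return ip_mtu + ETHERNET_HEADER + VLAN_TAG * vlan_tags


def ip_mtu_for_port_mtu(port_mtu: int, vlan_tags: int = 0) -> int:
    """Largest IP packet that fits in a frame of port_mtu bytes."""
    return port_mtu - ETHERNET_HEADER - VLAN_TAG * vlan_tags


print(port_mtu_for_ip_mtu(1500))  # 1514, the default seen on the access port
print(port_mtu_for_ip_mtu(1504))  # 1518, what a 1504-byte IP MTU requires
```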
[php]*A:SR>config>router>ospf# show port 1/1/1 | match MTU
Physical Link : Yes MTU : 1514[/php]To get a 1504-byte IP MTU, we can just add 4 bytes to the port Ethernet MTU:[php]*A:SR>config>router>ospf# /configure port 1/1/1 ethernet mtu 1518
*A:SR>config>router>ospf# show router interface "PEER1" detail | match MTU
IP MTU : (default)
IP Oper MTU : 1504
*A:SR>config>router>ospf# show router ospf interface "PEER1" detail | match MTU
Passive : False Cfg MTU : 1504
Oper MTU : 1504 Last Enabled : 06/02/2017 01:44:23[/php]This should mean that the OSPF neighbor will now perform the database exchange and enter the Full state.[php]*A:SR>config>router>ospf# show router ospf neighbor "PEER1"

===============================================================================
Rtr Base OSPFv2 Instance 0 Neighbors for Interface “PEER1″
===============================================================================
Interface-Name Rtr Id State Pri RetxQ TTL
Area-Id
——————————————————————————-
PEER1 100.100.100.100 Full 1 0 34
0.0.0.1
——————————————————————————-
No. of Neighbors: 1
===============================================================================
*A:SR>config>router>ospf# show router route-table

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags] Type Proto Age Pref
Next Hop[Interface Name] Metric
——————————————————————————-
1.1.1.1/32 Local Local 02h10m36s 0
system 0
10.1.2.0/27 Local Local 02h03m16s 0
PEER1 0
10.1.3.0/27 Local Local 02h03m46s 0
PEER2 0
100.100.100.100/32 Remote OSPF 00h03m18s 10
10.1.2.2 100
——————————————————————————-
No. of Routes: 4
Flags: n = Number of times nexthop is repeated
B = BGP backup route available
L = LFA nexthop available
S = Sticky ECMP requested
===============================================================================[/php]The OSPF issue with PEER1 appears to have been resolved, so back to PEER2.

[php highlight=”16,23”]*A:SR>config>router>ospf# show router ospf neighbor "PEER2" detail

===============================================================================
Rtr Base OSPFv2 Instance 0 Neighbors for Interface “PEER2″ (detail)
===============================================================================
——————————————————————————-
Neighbor : 10.1.3.3
——————————————————————————-
——————————————————————————-
Neighbor Rtr Id : 200.200.200.200 Interface: PEER2
——————————————————————————-
Neighbor IP Addr : 10.1.3.3
Local IF IP Addr : 10.1.3.1
Area Id : 0.0.0.2
Designated Rtr : 200.200.200.200 Backup Desig Rtr : 1.1.1.1
Neighbor State : Exchange Priority : 1
Retrans Q Length : 3 Options : – E – – – – O —
Events : 3 Last Event Time : 06/02/2017 02:22:36
Up Time : 0d 01:01:10 Time Before Dead : 37 sec
GR Helper : Not Helping GR Helper Age : 0 sec
GR Exit Reason : None GR Restart Reason: Unknown (0)
Bad Nbr States : 0 LSA Inst fails : 0
Bad Seq Nums : 0 Bad MTUs : 0
Bad Packets : 0 LSA not in LSDB : 0
Option Mismatches: 0 Nbr Duplicates : 917
Num Restarts : 0 Last Restart at : Never
===============================================================================[/php]There are no Bad MTUs being reported here; all we can see is that we are forever in the Exchange state. Let’s check log 99 to see if anything at all related to PEER2 is present:
[php linenumbers=”True”]*A:SR>config>router>ospf# show log log-id 99 message PEER2

===============================================================================
Event Log 99
===============================================================================
Description : Default System Log
Memory Log contents [size=500 next event=2033 (wrapped)][/php]There is nothing present (the older events have wrapped around, since only the last 500 events are kept).
What I have found is that when an OSPF neighbor is stuck in ExchStart, your router is the one whose MTU is too small, whereas when a neighbor is stuck in Exchange, your router is the one whose MTU is too big for its peer.
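This lines up with the check described in RFC 2328 (section 10.6): a router rejects a Database Description packet whose Interface MTU field is larger than what it can receive on that interface without fragmenting. Below is a minimal sketch in Python of that rule and of the stuck state you would expect to see locally. The function names are hypothetical, and the predicted states reflect the behaviour observed in this post rather than a guarantee for every implementation.

```python
def accepts_dbd(local_mtu: int, neighbor_mtu: int) -> bool:
    """RFC 2328 style check: reject a DBD that advertises a larger MTU than ours."""
    return neighbor_mtu <= local_mtu


def expected_local_state(local_mtu: int, neighbor_mtu: int) -> str:
    """Neighbor state expected on the local router, as observed in this post."""
    if not accepts_dbd(local_mtu, neighbor_mtu):
        # We drop the neighbor's DBDs (Bad MTUs increments), so we stay in ExchStart.
        return "ExchStart"
    if not accepts_dbd(neighbor_mtu, local_mtu):
        # We accept the neighbor's DBDs, but the neighbor drops ours.
        return "Exchange"
    return "Full"


print(expected_local_state(1500, 1504))  # PEER1 case: ExchStart
print(expected_local_state(1500, 1496))  # PEER2 case: Exchange
print(expected_local_state(1504, 1504))  # matching MTUs: Full
```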
To work out what the smaller MTU should be, we’ll send ping packets of various lengths to find the biggest unfragmented packet that can be sent to PEER2. Note: the size specified on a ping is the ICMP payload size, so for IPv4 we need to account for the 20-byte IP header and the 8-byte ICMP header. An IP interface with an IP MTU of 1500 would therefore pass a ping with a payload size of 1472 but fail at 1473.
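The header arithmetic is easy to get wrong by a few bytes, so here it is as a small Python sketch. The helper names are illustrative, and the 20-byte IPv4 header assumes no IP options are present.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(ip_mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return ip_mtu - IPV4_HEADER - ICMP_HEADER


def ip_packet_size(payload: int) -> int:
    """Total IPv4 packet size produced by a ping with the given payload."""
    return payload + ICMP_HEADER + IPV4_HEADER


print(max_ping_payload(1500))  # 1472
print(max_ping_payload(1504))  # 1476
print(ip_packet_size(1473))    # 1501, one byte over a 1500-byte IP MTU
```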
We can test this concept on a known quantity: PEER1 has an IP MTU of 1504, so a ping payload of 1476 should get through but 1477 should fail. Make sure to set the DF bit!
[php]*A:SR>config>router>ospf# ping 10.1.2.2 size 1476 do-not-fragment count 3
PING 10.1.2.2 1476 data bytes
1484 bytes from 10.1.2.2: icmp_seq=1 ttl=64 time=1.34ms.
1484 bytes from 10.1.2.2: icmp_seq=2 ttl=64 time=1.27ms.
1484 bytes from 10.1.2.2: icmp_seq=3 ttl=64 time=1.16ms.

—- 10.1.2.2 PING Statistics —-
3 packets transmitted, 3 packets received, 0.00% packet loss
round-trip min = 1.16ms, avg = 1.26ms, max = 1.34ms, stddev = 0.072ms
*A:SR>config>router>ospf# ping 10.1.2.2 size 1477 do-not-fragment count 3
PING 10.1.2.2 1477 data bytes

—- 10.1.2.2 PING Statistics —-
3 packets transmitted, 3 packets bounced, 0 packets received, 100% packet loss[/php]This works as expected, so the concept appears sound.

[php]*A:SR>config>router>ospf# show router interface "PEER2" detail | match MTU
IP MTU : (default)
IP Oper MTU : 1500[/php]We know that we have a ceiling of 1500, and we know the peer’s MTU must be lower than this. Just to be certain, we’ll try a 1500-byte IP packet anyway:[php]*A:SR>config>router>ospf# ping 10.1.3.3 size 1472 do-not-fragment count 3
PING 10.1.3.3 1472 data bytes
Request timed out. icmp_seq=1.
Request timed out. icmp_seq=2.
Request timed out. icmp_seq=3.

—- 10.1.3.3 PING Statistics —-
3 packets transmitted, 0 packets received, 100% packet loss[/php]Unsurprisingly, the peer MTU is less than 1500 bytes; let’s try a slightly smaller payload:[php]*A:SR>config>router>ospf# ping 10.1.3.3 size 1462 do-not-fragment count 3
PING 10.1.3.3 1462 data bytes
1470 bytes from 10.1.3.3: icmp_seq=1 ttl=64 time=1.44ms.
1470 bytes from 10.1.3.3: icmp_seq=2 ttl=64 time=1.36ms.
1470 bytes from 10.1.3.3: icmp_seq=3 ttl=64 time=1.34ms.

—- 10.1.3.3 PING Statistics —-
3 packets transmitted, 3 packets received, 0.00% packet loss
round-trip min = 1.34ms, avg = 1.38ms, max = 1.44ms, stddev = 0.042ms[/php]Okay, time to divide and conquer to determine the largest payload that gets through:[php]*A:SR>config>router>ospf# ping 10.1.3.3 size 1467 do-not-fragment count 3
PING 10.1.3.3 1467 data bytes
1475 bytes from 10.1.3.3: icmp_seq=1 ttl=64 time=1.24ms.
1475 bytes from 10.1.3.3: icmp_seq=2 ttl=64 time=1.30ms.
1475 bytes from 10.1.3.3: icmp_seq=3 ttl=64 time=2.16ms.

—- 10.1.3.3 PING Statistics —-
3 packets transmitted, 3 packets received, 0.00% packet loss
round-trip min = 1.24ms, avg = 1.57ms, max = 2.16ms, stddev = 0.417ms
*A:SR>config>router>ospf# ping 10.1.3.3 size 1469 do-not-fragment count 3
PING 10.1.3.3 1469 data bytes
Request timed out. icmp_seq=1.
Request timed out. icmp_seq=2.
Request timed out. icmp_seq=3.

—- 10.1.3.3 PING Statistics —-
3 packets transmitted, 0 packets received, 100% packet loss
*A:SR>config>router>ospf# ping 10.1.3.3 size 1468 do-not-fragment count 3
PING 10.1.3.3 1468 data bytes
1476 bytes from 10.1.3.3: icmp_seq=1 ttl=64 time=1.19ms.
1476 bytes from 10.1.3.3: icmp_seq=2 ttl=64 time=1.30ms.
1476 bytes from 10.1.3.3: icmp_seq=3 ttl=64 time=1.26ms.

—- 10.1.3.3 PING Statistics —-
3 packets transmitted, 3 packets received, 0.00% packet loss
round-trip min = 1.19ms, avg = 1.25ms, max = 1.30ms, stddev = 0.043ms[/php]An ICMP payload of 1468 corresponds to an IP packet of 1496 bytes (1468 + 8 + 20), so let’s adjust the OSPF MTU to 1496 and see if that results in a full adjacency.
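The manual divide-and-conquer search used for PEER2 is a binary search, and could be scripted. Here is a minimal sketch in Python, assuming a hypothetical `probe(size)` callable that returns True when a do-not-fragment ping with that payload succeeds. A stand-in `fake_probe` is used here so the example runs without a router.

```python
from typing import Callable


def largest_passing_payload(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Binary search for the largest size where probe(size) is True.

    Assumes probe(low) is True, probe(high) is False, and that every size
    up to the limit passes while every size above it fails.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if probe(mid):
            low = mid
        else:
            high = mid
    return low


# Stand-in for a real DF ping: pretend the path carries IP packets up to 1496 bytes.
def fake_probe(payload: int, path_ip_mtu: int = 1496) -> bool:
    return payload + 8 + 20 <= path_ip_mtu


# Start from the known-good (1462) and known-bad (1472) payloads.
best = largest_passing_payload(fake_probe, low=1462, high=1472)
print(best, best + 28)
```

Starting from the same known-good and known-bad sizes, the search probes 1467, 1469 and 1468, the same sequence as the manual pings.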
[php]*A:SR>config>router>ospf# area 2 interface "PEER2" mtu 1496
*A:SR>config>router>ospf# show router ospf interface "PEER2" detail | match MTU
Passive : False Cfg MTU : 1496
Oper MTU : 1496 Last Enabled : 06/02/2017 02:22:36
*A:SR>config>router>ospf# show router ospf neighbor “PEER2”

===============================================================================
Rtr Base OSPFv2 Instance 0 Neighbors for Interface “PEER2”
===============================================================================
Interface-Name Rtr Id State Pri RetxQ TTL
Area-Id
——————————————————————————-
PEER2 200.200.200.200 Full 1 0 35
0.0.0.2
——————————————————————————-
No. of Neighbors: 1
===============================================================================[/php]The adjacency is up; let’s see what routes we have learnt:
[php]*A:SR>config>router>ospf# show router route-table

===============================================================================
Route Table (Router: Base)
===============================================================================
Dest Prefix[Flags] Type Proto Age Pref
Next Hop[Interface Name] Metric
——————————————————————————-
1.1.1.1/32 Local Local 02h40m46s 0
system 0
10.1.2.0/27 Local Local 02h33m27s 0
PEER1 0
10.1.3.0/27 Local Local 02h33m56s 0
PEER2 0
100.100.100.100/32 Remote OSPF 00h33m29s 10
10.1.2.2 100
200.200.200.200/32 Remote OSPF 00h01m27s 10
10.1.3.3 100
——————————————————————————-
No. of Routes: 5
Flags: n = Number of times nexthop is repeated
B = BGP backup route available
L = LFA nexthop available
S = Sticky ECMP requested
===============================================================================[/php]
We have now learnt routes from PEER1 and PEER2; time for a quick data plane verification:
[php]*A:SR>config>router>ospf# ping 100.100.100.100 source 1.1.1.1 count 1
PING 100.100.100.100 56 data bytes
64 bytes from 100.100.100.100: icmp_seq=1 ttl=64 time=1.38ms.

—- 100.100.100.100 PING Statistics —-
1 packet transmitted, 1 packet received, 0.00% packet loss
round-trip min = 1.38ms, avg = 1.38ms, max = 1.38ms, stddev = 0.000ms
*A:SR>config>router>ospf# ping 200.200.200.200 source 1.1.1.1 count 1
PING 200.200.200.200 56 data bytes
64 bytes from 200.200.200.200: icmp_seq=1 ttl=64 time=1.14ms.

—- 200.200.200.200 PING Statistics —-
1 packet transmitted, 1 packet received, 0.00% packet loss
round-trip min = 1.14ms, avg = 1.14ms, max = 1.14ms, stddev = 0.000ms
[/php] We now have successful routing exchange and data plane reachability.

The case of Nokia Virtual Service Router and the non-unique Chassis MAC Address

I’m playing with eve-ng and decided to work on a Layer 2 scenario. A few problems with my emulation environment came up that needed a way forward, which resulted in this rambling tale…

SROS 12.0R6 5 Router Topology

R1, R2 and R3 will be the MPLS core with VPLS configured, while R4 and R5 will be Layer 3 CE devices that talk to each other over the VPLS.

The CE devices are pretty straightforward, so we’ll get those up first.

R4 is a single-ended configuration, with interface R5 on port 1/1/1 having the IP address 192.168.1.4/27:
[python linenumbers=”false” tab=”R4 CE Config”]
configure
system
name "R4"
exit
card 1
card-type iom3-xp-b
mda 1
mda-type m5-1gb-sfp-b
no shutdown
exit
no shutdown
exit
port 1/1/1
ethernet
exit
no shutdown
exit
router
interface “R5”
address 192.168.1.4/27
port 1/1/1
no shutdown
exit
interface “system”
no shutdown
exit
exit
exit all
[/python]

R5 is a little more complex: it uses a LAG, with interface R4 on lag-1 (ports 1/1/1 and 1/1/2) having the IP address 192.168.1.5/27:
[python linenumbers=”false” tab=”R5 CE Config”]
configure
system
name “R5”
exit
card 1
card-type iom3-xp-b
mda 1
mda-type m5-1gb-sfp-b
no shutdown
exit
no shutdown
exit
port 1/1/1
ethernet
autonegotiate limited
exit
no shutdown
exit
port 1/1/2
ethernet
autonegotiate limited
exit
no shutdown
exit
lag 1
port 1/1/1
port 1/1/2
lacp active administrative-key 32768
no shutdown
exit
router
interface “R4”
address 192.168.1.5/27
port lag-1
no shutdown
exit
interface “system”
no shutdown
exit
exit
exit all
[/python]
Multi-speed Ethernet interfaces associated with a LAG must have autonegotiate set to limited to control the bundle member speed, so that all bundle members operate at the same speed.

Now to develop the MPLS core configuration on R1, R2 and R3. This is quite straightforward: we are just going to use OSPF and LDP on the directly connected interfaces:

[codegroup]
[python linenumbers=”false” tab=”R1 Core Base Config”]
configure
system
name “R1”
exit
card 1
card-type iom3-xp-b
mda 1
mda-type m5-1gb-sfp-b
no shutdown
exit
no shutdown
exit
port 1/1/1
ethernet
exit
no shutdown
exit
port 1/1/2
ethernet
exit
no shutdown
exit
port 1/1/3
shutdown
ethernet
exit
exit
router
interface “R2”
address 10.1.2.1/27
port 1/1/1
no shutdown
exit
interface “R3”
address 10.1.3.1/27
port 1/1/2
no shutdown
exit
interface “system”
address 10.10.10.1/32
no shutdown
exit
ospf
area 0.0.0.0
interface “system”
no shutdown
exit
interface “R2”
no shutdown
exit
interface “R3”
no shutdown
exit
exit
exit
ldp
interface-parameters
interface “R2”
exit
interface “R3″
exit
exit
targeted-session
exit
no shutdown
exit
exit
exit all
[/python]
[python linenumbers=”false” tab=”R2 Core Base Config”]
configure
system
name “R2”
exit
card 1
card-type iom3-xp-b
mda 1
mda-type m5-1gb-sfp-b
no shutdown
exit
no shutdown
exit
port 1/1/1
ethernet
exit
no shutdown
exit
port 1/1/2
ethernet
exit
no shutdown
exit
port 1/1/3
shutdown
ethernet
exit
exit
router
interface “R1”
address 10.1.2.2/27
port 1/1/1
no shutdown
exit
interface “R3”
address 10.2.3.2/27
port 1/1/2
no shutdown
exit
interface “system”
address 10.10.10.2/32
no shutdown
exit
ospf
area 0.0.0.0
interface “system”
no shutdown
exit
interface “R1”
no shutdown
exit
interface “R3”
no shutdown
exit
exit
exit
ldp
interface-parameters
interface “R1”
exit
interface “R3″
exit
exit
targeted-session
exit
no shutdown
exit
exit
exit all
[/python]
[python linenumbers=”false” tab=”R3 Core Base Config”]
configure
system
name “R3”
exit
card 1
card-type iom3-xp-b
mda 1
mda-type m5-1gb-sfp-b
no shutdown
exit
no shutdown
exit
port 1/1/1
ethernet
exit
no shutdown
exit
port 1/1/2
ethernet
exit
no shutdown
exit
port 1/1/3
shutdown
ethernet
exit
exit
router
interface “R1”
address 10.1.3.3/27
port 1/1/2
no shutdown
exit
interface “R2”
address 10.2.3.3/27
port 1/1/3
no shutdown
exit
interface “system”
address 10.10.10.3/32
no shutdown
exit
ospf
area 0.0.0.0
interface “system”
no shutdown
exit
interface “R1”
no shutdown
exit
interface “R2”
no shutdown
exit
exit
exit
ldp
interface-parameters
interface “R1”
exit
interface “R2”
exit
exit
targeted-session
exit
no shutdown
exit
exit
exit all
[/python][/codegroup]
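The three core configurations only work if both ends of every link sit in the same subnet. As a minimal sketch (not something the routers do, and not part of the original lab), the address plan above can be sanity-checked offline with Python's standard `ipaddress` module. The `LINKS` table and the `check_links` helper are hypothetical names introduced here; the addresses are copied from the configurations above.

```python
import ipaddress

# (router, interface name) -> address, copied from the R1/R2/R3 configs above.
LINKS = {
    ("R1", "R2"): "10.1.2.1/27",
    ("R2", "R1"): "10.1.2.2/27",
    ("R1", "R3"): "10.1.3.1/27",
    ("R3", "R1"): "10.1.3.3/27",
    ("R2", "R3"): "10.2.3.2/27",
    ("R3", "R2"): "10.2.3.3/27",
}


def check_links(links):
    """Return a list of problems found in the point-to-point address plan."""
    problems = []
    # Visit every router pair once, regardless of which end was listed.
    pairs = sorted({tuple(sorted(key)) for key in links})
    for a, b in pairs:
        addr_a, addr_b = links.get((a, b)), links.get((b, a))
        if addr_a is None or addr_b is None:
            problems.append(f"{a}-{b}: interface configured on one end only")
            continue
        if_a = ipaddress.ip_interface(addr_a)
        if_b = ipaddress.ip_interface(addr_b)
        if if_a.network != if_b.network:
            problems.append(f"{a}-{b}: subnets differ ({if_a.network} vs {if_b.network})")
        elif if_a.ip == if_b.ip:
            problems.append(f"{a}-{b}: duplicate address {if_a.ip}")
    return problems


print(check_links(LINKS) or "address plan is consistent")
```

An empty result only says the addressing is self-consistent; it says nothing about port state or OSPF adjacency, which still need to be checked on the routers.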
The Layer 2 service that we are going to build is a VPLS, using spoke-SDPs that connect each router to its adjacent routers (an alternative would be a full mesh, but I specifically want to test STP operation here):
[codegroup]
[python linenumbers=”false” tab=”R1 SDP to R2 and R3”]
*A:R1>config>service# info
———————————————-
sdp 2 mpls create
far-end 10.10.10.2
ldp
keep-alive
shutdown
exit
no shutdown
exit
sdp 3 mpls create
far-end 10.10.10.3
ldp
keep-alive
shutdown
exit
no shutdown
exit
[/python]
[python linenumbers=”false” tab=”R2 SDP to R1 and R3”]
*A:R2>config>service# info
———————————————-
sdp 1 mpls create
far-end 10.10.10.1
ldp
keep-alive
shutdown
exit
no shutdown
exit
sdp 3 mpls create
far-end 10.10.10.3
ldp
keep-alive
shutdown
exit
no shutdown
exit
[/python]
[python linenumbers=”false” tab=”R3 SDP to R1 and R2”]
*A:R3>config>service# info
———————————————-
sdp 1 mpls create
far-end 10.10.10.1
ldp
keep-alive
shutdown
exit
no shutdown
exit
sdp 2 mpls create
far-end 10.10.10.2
ldp
keep-alive
shutdown
exit
no shutdown
exit
[/python][/codegroup]
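The SDP listings above follow a simple pattern: each router gets one SDP per remote router, numbered after that router, with the remote system address as the far end. A rough sketch of that pattern in Python follows. `SYSTEM_ADDR`, `sdp_plan` and `render_sdps` are hypothetical helper names, and the rendered lines are only the ones already shown in the listings, with indentation added for readability.

```python
# System addresses taken from the core configurations above.
SYSTEM_ADDR = {1: "10.10.10.1", 2: "10.10.10.2", 3: "10.10.10.3"}


def sdp_plan(local_id, system_addr=SYSTEM_ADDR):
    """Return (sdp_id, far_end) pairs for one router.

    One SDP per remote router, numbered after the remote router,
    which is the convention used in the listings above.
    """
    return [(rid, addr) for rid, addr in sorted(system_addr.items()) if rid != local_id]


def render_sdps(local_id):
    """Render the SDP stanzas for one router using only lines shown above."""
    lines = []
    for sdp_id, far_end in sdp_plan(local_id):
        lines += [
            f"sdp {sdp_id} mpls create",
            f"    far-end {far_end}",
            "    ldp",
            "    keep-alive",
            "        shutdown",
            "    exit",
            "    no shutdown",
            "exit",
        ]
    return "\n".join(lines)


print(render_sdps(1))
```

Numbering the SDP after the far-end router is just a lab convention; it makes the later `spoke-sdp 2:100` style bindings easy to read but is not required.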
Verifying the SDPs are up:
[codegroup]
[python linenumbers=”false” tab=”R1 SDP State”]
A:R1# show service sdp

============================================================================
Services: Service Destination Points
============================================================================
SdpId AdmMTU OprMTU Far End Adm Opr Del LSP Sig
—————————————————————————-
2 0 8914 10.10.10.2 Up Up MPLS L TLDP
3 0 8914 10.10.10.3 Up Up MPLS L TLDP
—————————————————————————-
Number of SDPs : 2
—————————————————————————-
Legend: R = RSVP, L = LDP, B = BGP, M = MPLS-TP, n/a = Not Applicable
============================================================================
[/python]
[python linenumbers=”false” tab=”R2 SDP State”]
A:R2# show service sdp

============================================================================
Services: Service Destination Points
============================================================================
SdpId AdmMTU OprMTU Far End Adm Opr Del LSP Sig
—————————————————————————-
1 0 8914 10.10.10.1 Up Up MPLS L TLDP
3 0 8914 10.10.10.3 Up Up MPLS L TLDP
—————————————————————————-
Number of SDPs : 2
—————————————————————————-
Legend: R = RSVP, L = LDP, B = BGP, M = MPLS-TP, n/a = Not Applicable
============================================================================
[/python]
[python linenumbers=”false” tab=”R3 SDP State”]
A:R3# show service sdp

============================================================================
Services: Service Destination Points
============================================================================
SdpId AdmMTU OprMTU Far End Adm Opr Del LSP Sig
—————————————————————————-
1 0 8914 10.10.10.1 Up Up MPLS L TLDP
2 0 8914 10.10.10.2 Up Up MPLS L TLDP
—————————————————————————-
Number of SDPs : 2
—————————————————————————-
Legend: R = RSVP, L = LDP, B = BGP, M = MPLS-TP, n/a = Not Applicable
============================================================================
[/python]
[/codegroup]
With the transport infrastructure in place VPLS 100 without the customer access components can be set up:
[codegroup]
[python linenumbers=”false” tab=”Initial R1 VPLS 100 Config”]
*A:R1>config>service>vpls$ pwc
——————————————————————————-
Present Working Context :
——————————————————————————-

configure
service
vpls “100” customer 1 create
——————————————————————————-
A:R1>config>service>vpls$ info
———————————————-
stp
no shutdown
exit
spoke-sdp 2:100 create
no shutdown
exit
spoke-sdp 3:100 create
no shutdown
exit
no shutdown
[/python]
[python linenumbers=”false” tab=”Initial R2 VPLS 100 Config”]
*A:R2>config>service>vpls$ pwc
——————————————————————————-
Present Working Context :
——————————————————————————-

configure
service
vpls “100” customer 1 create
——————————————————————————-
A:R2>config>service>vpls$ info
———————————————-
stp
no shutdown
exit
spoke-sdp 1:100 create
no shutdown
exit
spoke-sdp 3:100 create
no shutdown
exit
no shutdown
[/python]
[python linenumbers=”false” tab=”Initial R3 VPLS 100 Config”]
*A:R3>config>service>vpls$ pwc
——————————————————————————-
Present Working Context :
——————————————————————————-

configure
service
vpls “100” customer 1 create
——————————————————————————-
A:R3>config>service>vpls$ info
———————————————-
stp
no shutdown
exit
spoke-sdp 1:100 create
no shutdown
exit
spoke-sdp 2:100 create
no shutdown
exit
no shutdown
[/python]
[/codegroup]
Verify that VPLS 100 is up and running:
[codegroup]
[python linenumbers=”false” tab=”R1 VPLS 100 Spoke SDP State”]
*A:R1# show service id 100 base | match Ident post-lines 3
Identifier Type AdmMTU OprMTU Adm Opr
——————————————————————————-
sdp:2:100 S(10.10.10.2) Spok 0 8914 Up Up
sdp:3:100 S(10.10.10.3) Spok 0 8914 Up Up
[/python]
[python linenumbers=”false” tab=”R2 VPLS 100 Spoke SDP State”]
A:R2# show service id 100 base | match Ident post-lines 3
Identifier Type AdmMTU OprMTU Adm Opr
——————————————————————————-
sdp:1:100 S(10.10.10.1) Spok 0 8914 Up Up
sdp:3:100 S(10.10.10.3) Spok 0 8914 Up Up
[/python]
[python linenumbers=”false” tab=”R3 VPLS 100 Spoke SDP State”]
A:R3# show service id 100 base | match Ident post-lines 3
Identifier Type AdmMTU OprMTU Adm Opr
——————————————————————————-
sdp:1:100 S(10.10.10.1) Spok 0 8914 Up Up
sdp:2:100 S(10.10.10.2) Spok 0 8914 Up Up
[/python]
[/codegroup]
Looks good. With three routers each connected to the other two by spoke-SDPs we have a bridging loop, so we need a loop avoidance mechanism. Luckily we enabled STP, so let’s see how STP is behaving:
[codegroup]
[python linenumbers=”false” tab=”R1 VPLS 100 STP State” highlight=”6-7”]
*A:R1# show service id 100 stp

===============================================================================
Stp info, Service 100
===============================================================================
Bridge Id : 80:00.da:00:ff:00:00:01 Top. Change Count : 4
Root Bridge : This Bridge Stp Oper State : Up
Primary Bridge : N/A Topology Change : Inactive
Mode : Rstp Last Top. Change : 0d 00:10:13
Vcp Active Prot. : N/A
Root Port : N/A External RPC : 0

===============================================================================
Stp port info
===============================================================================
Sap/Sdp/PIP Id Oper- Port- Port- Port- Oper- Link- Active
State Role State Num Edge Type Prot.
——————————————————————————-
2:100 Up Designated Forward 2049 True Pt-pt Rstp
3:100 Up Backup Discard 2050 False Pt-pt Rstp
===============================================================================
[/python]
[python linenumbers=”false” tab=”R2 VPLS 100 STP State” highlight=”6-7”]
*A:R2# show service id 100 stp

===============================================================================
Stp info, Service 100
===============================================================================
Bridge Id : 80:00.da:00:ff:00:00:01 Top. Change Count : 3
Root Bridge : This Bridge Stp Oper State : Up
Primary Bridge : N/A Topology Change : Inactive
Mode : Rstp Last Top. Change : 0d 00:10:47
Vcp Active Prot. : N/A
Root Port : N/A External RPC : 0

===============================================================================
Stp port info
===============================================================================
Sap/Sdp/PIP Id Oper- Port- Port- Port- Oper- Link- Active
State Role State Num Edge Type Prot.
——————————————————————————-
1:100 DwnstrmLp Designated Discard 2049 False Pt-pt Rstp
3:100 Up Backup Discard 2050 False Pt-pt Rstp
===============================================================================
[/python]
[python linenumbers=”false” tab=”R3 VPLS 100 STP State” highlight=”6-7”]
*A:R3# show service id 100 stp

===============================================================================
Stp info, Service 100
===============================================================================
Bridge Id : 80:00.da:00:ff:00:00:01 Top. Change Count : 3
Root Bridge : This Bridge Stp Oper State : Up
Primary Bridge : N/A Topology Change : Inactive
Mode : Rstp Last Top. Change : 0d 00:10:54
Vcp Active Prot. : N/A
Root Port : N/A External RPC : 0

===============================================================================
Stp port info
===============================================================================
Sap/Sdp/PIP Id Oper- Port- Port- Port- Oper- Link- Active
State Role State Num Edge Type Prot.
——————————————————————————-
1:100 Up Designated Forward 2048 False Pt-pt Rstp
2:100 Up Designated Forward 2049 False Pt-pt Rstp
===============================================================================
[/python][/codegroup]
This doesn’t seem right: on R2, SDP 1:100 reports a downstream loop (DwnstrmLp) and both of R2’s spoke-SDPs are discarding!

If we look at the highlighted lines in each router’s output we notice that all routers in the VPLS have the same Bridge Id (and each one believes it is the root bridge), which is definitely a bad thing.

For SROS, the Bridge Id is partly derived from the chassis MAC address:
[python linenumbers=”false”]*A:R1# show chassis detail | match MAC
Base MAC address : da:00:ff:00:00:01[/python]
[python linenumbers=”false”]*A:R2# show chassis detail | match MAC
Base MAC address : da:00:ff:00:00:01[/python]
[python linenumbers=”false”]*A:R3# show chassis detail | match MAC
Base MAC address : da:00:ff:00:00:01[/python]
With real hardware the chassis MAC address is unique, so this problem won’t come up. The VSRs in this lab, however, all have the same one.

As an aside, the chassis MAC address is used in a few places besides STP; one is the SNMP engine ID:
[python linenumbers=”false” highlight=”2,4”]*A:R1# show chassis detail | match MAC
Base MAC address : da:00:ff:00:00:01
*A:R1# show system information | match Engine
SNMP Engine ID : 0000197f0000da00ff000001
SNMP Engine Boots : 11[/python]
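To make the relationship between the two highlighted values explicit, here is a small sketch that splits the engine ID shown above into its leading bytes and its last six bytes. It is based only on this one output; `split_engine_id` is a hypothetical helper. Reading the leading `0000197f` as an enterprise number (0x197f is 6527 in decimal) is my assumption about the format, not something the output states.

```python
def split_engine_id(engine_id: str):
    """Split a hex engine ID into (leading bytes as hex, last 6 bytes as a MAC)."""
    raw = bytes.fromhex(engine_id)
    prefix, tail = raw[:-6], raw[-6:]
    mac = ":".join(f"{b:02x}" for b in tail)
    return prefix.hex(), mac


# Values copied from the R1 output above.
prefix, mac = split_engine_id("0000197f0000da00ff000001")
print(prefix, mac)  # 0000197f0000 da:00:ff:00:00:01
```

The trailing six bytes match the base MAC address, which is why identical chassis MACs would also give identical default engine IDs.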

It is possible to set the Engine ID manually in the configuration (I think it would probably be best to do this in production, just in case you end up replacing faulty hardware).

With SROS version 14.0R4 a new option for the boot options file (or bof) was introduced which allows the manual setting of the chassis MAC address (followed by a reboot):
[python linenumbers=”false”]*A:R14# bof system-base-mac 00:11:22:33:44:02
*A:R14# bof save
Writing BOF to cf3:/bof.cfg … OK
Completed.
Writing configuration to cf3:\config.cfg
Saving configuration … OK
Completed.
A:R14# /admin reboot
Are you sure you want to reboot (y/n)? y[/python]
Which is great, but this particular setup is running SROS 12.0R6, where that BOF option doesn’t exist, so an alternative method is required.

For STP we can cast our mind back to remember what the Bridge ID consists of… It’s both the Priority (which by default is 32768) and the Bridge MAC address.

So as a quick and nasty fix, I should be able to change the STP priority in VPLS 100 on R1, R2 and R3 to resolve the STP problem. It also allows me to choose the root bridge deliberately, which is probably a good thing to do.
[python linenumbers=”false” tab= “R1 VPLS 100 STP”]*A:R1# configure service vpls 100 stp priority 4096[/python]
[python linenumbers=”false” tab= “R2 VPLS 100 STP”]*A:R2# configure service vpls 100 stp priority 8192[/python]
[python linenumbers=”false” tab= “R3 VPLS 100 STP”]*A:R3# configure service vpls 100 stp priority 16384[/python]
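Why changing only the priority is enough can be sketched in a few lines of Python. The Bridge Id is the two priority bytes followed by the bridge MAC, so identical MACs with the default priority collide, while distinct priorities give distinct IDs. `bridge_id` is a hypothetical helper that mimics the display format of `show service id 100 stp`; it is an illustration, not how SROS computes the value internally.

```python
def bridge_id(priority: int, mac: str) -> str:
    """Format a bridge ID as two priority bytes, a dot, then the bridge MAC."""
    if not 0 <= priority <= 0xFFFF:
        raise ValueError("priority must fit in 16 bits")
    hi, lo = divmod(priority, 256)
    return f"{hi:02x}:{lo:02x}.{mac.lower()}"


VSR_MAC = "da:00:ff:00:00:01"   # identical on R1, R2 and R3 in this lab
DEFAULT_PRIORITY = 32768

# Default priority everywhere: all three bridge IDs are the same.
before = {r: bridge_id(DEFAULT_PRIORITY, VSR_MAC) for r in ("R1", "R2", "R3")}
# Priorities configured above: the IDs become distinct.
after = {r: bridge_id(p, VSR_MAC) for r, p in (("R1", 4096), ("R2", 8192), ("R3", 16384))}

print(before["R1"])                 # 80:00.da:00:ff:00:00:01
print(len(set(before.values())))    # 1
print(len(set(after.values())))     # 3
```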
Let’s see how things are going now:

[codegroup]
[python linenumbers=”false” tab= “R1 VPLS 100 STP”]*A:R1# show service id 100 stp

===============================================================================
Stp info, Service 100
===============================================================================
Bridge Id : 10:00.da:00:ff:00:00:01 Top. Change Count : 6
Root Bridge : This Bridge Stp Oper State : Up
Primary Bridge : N/A Topology Change : Inactive
Mode : Rstp Last Top. Change : 0d 00:00:35
Vcp Active Prot. : N/A
Root Port : N/A External RPC : 0

===============================================================================
Stp port info
===============================================================================
Sap/Sdp/PIP Id Oper- Port- Port- Port- Oper- Link- Active
State Role State Num Edge Type Prot.
——————————————————————————-
2:100 Up Designated Forward 2049 False Pt-pt Rstp
3:100 Up Designated Forward 2050 False Pt-pt Rstp
===============================================================================[/python]
[python linenumbers=”false” tab= “R2 VPLS 100 STP”]*A:R2# show service id 100 stp

===============================================================================
Stp info, Service 100
===============================================================================
Bridge Id : 20:00.da:00:ff:00:00:01 Top. Change Count : 4
Root Bridge : 10:00.da:00:ff:00:00:01 Stp Oper State : Up
Primary Bridge : N/A Topology Change : Inactive
Mode : Rstp Last Top. Change : 0d 00:01:07
Vcp Active Prot. : N/A
Root Port : 2049 External RPC : 10

===============================================================================
Stp port info
===============================================================================
Sap/Sdp/PIP Id Oper- Port- Port- Port- Oper- Link- Active
State Role State Num Edge Type Prot.
——————————————————————————-
1:100 Up Root Forward 2049 False Pt-pt Rstp
3:100 Up Designated Forward 2050 False Pt-pt Rstp
===============================================================================[/python]
[python linenumbers=”false” tab= “R3 VPLS 100 STP”]*A:R3# show service id 100 stp

===============================================================================
Stp info, Service 100
===============================================================================
Bridge Id : 40:00.da:00:ff:00:00:01 Top. Change Count : 4
Root Bridge : 10:00.da:00:ff:00:00:01 Stp Oper State : Up
Primary Bridge : N/A Topology Change : Inactive
Mode : Rstp Last Top. Change : 0d 00:01:52
Vcp Active Prot. : N/A
Root Port : 2048 External RPC : 10

===============================================================================
Stp port info
===============================================================================
Sap/Sdp/PIP Id Oper- Port- Port- Port- Oper- Link- Active
State Role State Num Edge Type Prot.
——————————————————————————-
1:100 Up Root Forward 2048 False Pt-pt Rstp
2:100 Up Alternate Discard 2049 False Pt-pt Rstp
===============================================================================[/python][/codegroup]
Success, all routers have different bridge IDs and all agree that R1 is the root and only one port is in discarding state.
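The result matches what a back-of-the-envelope model predicts. As a simplified sketch (it ignores path costs and port roles, and the names `BRIDGES`, `LINKS`, `elect_root` and `redundant_links` are hypothetical), the bridge with the lowest (priority, MAC) pair becomes root, and a loop-free tree over three connected bridges can only use two of the three pseudowires:

```python
# Bridge IDs as (priority, MAC) pairs, taken from the output above.
BRIDGES = {
    "R1": (4096, "da:00:ff:00:00:01"),
    "R2": (8192, "da:00:ff:00:00:01"),
    "R3": (16384, "da:00:ff:00:00:01"),
}
# One spoke-SDP pseudowire between each pair of routers.
LINKS = [("R1", "R2"), ("R1", "R3"), ("R2", "R3")]


def elect_root(bridges):
    """Return the bridge with the numerically lowest (priority, MAC)."""
    def sort_key(name):
        priority, mac = bridges[name]
        return (priority, bytes.fromhex(mac.replace(":", "")))
    return min(bridges, key=sort_key)


def redundant_links(bridges, links):
    """Links that must block, assuming the topology is connected.

    A loop-free tree over n bridges uses exactly n - 1 links.
    """
    return len(links) - (len(bridges) - 1)


print(elect_root(BRIDGES))              # R1
print(redundant_links(BRIDGES, LINKS))  # 1
```

One redundant link corresponds to the single Alternate/Discard port seen on R3 (spoke-SDP 2:100).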

Now we will create the CE router attachments (Service Access Points) on the core, starting with R3, which faces R4. By default Ethernet ports are in network mode; to be bound to a service, the port must be in access (or hybrid) mode:
[python linenumbers=”false”]*A:R3# /configure port 1/1/1
*A:R3>config>port# shutdown
*A:R3>config>port# ethernet mode access
*A:R3>config>port# ethernet encap-type null
*A:R3>config>port# no shutdown
*A:R3>config>port# /configure service vpls 100
*A:R3>config>service>vpls# sap 1/1/1 create
*A:R3>config>service>vpls>sap$ show service id 100 base

===============================================================================
Service Basic Information
===============================================================================
Service Id : 100 Vpn Id : 0
Service Type : VPLS
Name : (Not Specified)
Description : (Not Specified)
Customer Id : 1 Creation Origin : manual
Last Status Change: 04/21/2017 13:20:28
Last Mgmt Change : 04/21/2017 13:44:59
Etree Mode : Disabled
Admin State : Up Oper State : Up
MTU : 1514 Def. Mesh VC Id : 100
SAP Count : 1 SDP Bind Count : 2
Snd Flush on Fail : Disabled Host Conn Verify : Disabled
Propagate MacFlush: Disabled Per Svc Hashing : Disabled
Allow IP Intf Bind: Disabled
Def. Gateway IP : None
Def. Gateway MAC : None
Temp Flood Time : Disabled Temp Flood : Inactive
Temp Flood Chg Cnt: 0
VSD Domain :

——————————————————————————-
Service Access & Destination Points
——————————————————————————-
Identifier Type AdmMTU OprMTU Adm Opr
——————————————————————————-
sap:1/1/1 null 1514 1514 Up Up
sdp:1:100 S(10.10.10.1) Spok 0 8914 Up Up
sdp:2:100 S(10.10.10.2) Spok 0 8914 Up Up
===============================================================================[/python]
Now things are going to get a little more complicated on R1 and R2 as we are going to establish a Multi-Chassis LAG towards R5. R5 is unaware of the MC-LAG, it is just talking LACP to R1 and R2 thinking they are just one system. R1 and R2 require synchronisation between each other to set up the Active-Standby LAG.

We’ll start by creating a regular LAG 1 facing R5 on R1 and R2, with a single port in each:
[codegroup]
[python linenumbers=”false” tab=”R1″]*A:R1# /configure port 1/1/3 shutdown
*A:R1# /configure port 1/1/3 ethernet mode access
*A:R1# /configure port 1/1/3 ethernet encap-type null
*A:R1# /configure port 1/1/3 ethernet autonegotiate limited
*A:R1# /configure port 1/1/3 no shutdown
*A:R1# /configure lag 1
*A:R1>config>lag$ mode access
*A:R1>config>lag$ lacp active
*A:R1>config>lag$ port 1/1/3
*A:R1>config>lag$ no shutdown[/python]
[python linenumbers=”false” tab=”R2″]*A:R2# /configure port 1/1/3 shutdown
*A:R2# /configure port 1/1/3 ethernet mode access
*A:R2# /configure port 1/1/3 ethernet encap-type null
*A:R2# /configure port 1/1/3 ethernet autonegotiate limited
*A:R2# /configure port 1/1/3 no shutdown
*A:R2# /configure lag 1
*A:R2>config>lag$ mode access
*A:R2>config>lag$ lacp active
*A:R2>config>lag$ port 1/1/3
*A:R2>config>lag$ no shutdown[/python]
[/codegroup]
Now to set up MC-LAG we need to set up a multi-chassis peering between R1 and R2 (multi-chassis redundancy supports more than just MC-LAG):
[codegroup]
[python linenumbers=”false” tab=”R1 MC Peer with R2″]*A:R1>config>lag# /configure redundancy multi-chassis peer 10.10.10.2 create
*A:R1>config>redundancy>multi-chassis>peer# no shutdown[/python]
[python linenumbers=”false” tab=”R2 MC Peer with R1″]*A:R2>config>lag# /configure redundancy multi-chassis peer 10.10.10.1 create
*A:R2>config>redundancy>multi-chassis>peer# no shutdown[/python]
[/codegroup]
Then we create the MC-LAG itself. The lacp-key, system-id and system-priority must be the same on each router:
[codegroup]
[python linenumbers=”false” tab=”R1 MC-LAG to R5″]*A:R1>config>redundancy>multi-chassis>peer# mc-lag
*A:R1>config>redundancy>mc>peer>mc-lag# lag 1 lacp-key 2468 remote-lag 1 system-id 00:00:be:ef:ca:fe system-priority 1000
*A:R1>config>redundancy>mc>peer>mc-lag# no shutdown[/python]
[python linenumbers=”false” tab=”R2 MC-LAG to R5″]*A:R2>config>redundancy>multi-chassis>peer# mc-lag
*A:R2>config>redundancy>mc>peer>mc-lag# lag 1 lacp-key 2468 remote-lag 1 system-id 00:00:be:ef:ca:fe system-priority 1000
*A:R2>config>redundancy>mc>peer>mc-lag# no shutdown[/python]
[/codegroup]
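Because the downstream device must see R1 and R2 as one LACP partner, a typo in any of these three values on one side is an easy way to break the bundle. A small offline check might look like the sketch below. `MC_LAG` and `mismatches` are hypothetical names, and the values are copied from the `lag 1 ...` lines entered above.

```python
# Parameters from the `lag 1 ...` line entered on each peer above.
MC_LAG = {
    "R1": {"lacp-key": 2468, "system-id": "00:00:be:ef:ca:fe", "system-priority": 1000},
    "R2": {"lacp-key": 2468, "system-id": "00:00:be:ef:ca:fe", "system-priority": 1000},
}


def mismatches(a: dict, b: dict):
    """Return the parameter names whose values differ between the two peers."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


print(mismatches(MC_LAG["R1"], MC_LAG["R2"]) or "MC-LAG parameters match")
```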
Now the MC-LAG should be up and running. First we’ll check the peering:
[python linenumbers=”false”]*A:R1>config>redundancy>mc>peer>mc-lag# show redundancy multi-chassis all

===============================================================================
Multi-Chassis Peers
===============================================================================
Peer IP Peer Admin Client Admin Oper State
Src IP Auth
——————————————————————————-
10.10.10.2 Enabled MC-Sync: — — —
10.10.10.1 None MC-Ring: — — —
MC-Endpt: — — —
MC-Lag: Enabled Enabled —
MC-IPsec: — — Disabled
===============================================================================[/python]
[python linenumbers=”false”]*A:R2>config>redundancy>mc>peer>mc-lag# show redundancy multi-chassis all

===============================================================================
Multi-Chassis Peers
===============================================================================
Peer IP Peer Admin Client Admin Oper State
Src IP Auth
——————————————————————————-
10.10.10.1 Enabled MC-Sync: — — —
10.10.10.2 None MC-Ring: — — —
MC-Endpt: — — —
MC-Lag: Enabled Enabled —
MC-IPsec: — — Disabled
===============================================================================[/python]
Looks promising, let’s check our LAG status:
[codegroup][python linenumbers=”false” tab=”R1 LAG Status”]*A:R1>config>redundancy>mc>peer>mc-lag# show lag

===============================================================================
Lag Data
===============================================================================
Lag-id Adm Opr Weighted Threshold Up-Count MC Act/Stdby
——————————————————————————-
1 up down No 0 0 standby
——————————————————————————-
Total Lag-ids: 1 Single Chassis: 0 MC Act: 0 MC Stdby: 1
===============================================================================[/python]
[python linenumbers=”false” tab=”R2 LAG Status”]*A:R2>config>redundancy>mc>peer>mc-lag# show lag

===============================================================================
Lag Data
===============================================================================
Lag-id Adm Opr Weighted Threshold Up-Count MC Act/Stdby
——————————————————————————-
1 up down No 0 0 standby
——————————————————————————-
Total Lag-ids: 1 Single Chassis: 0 MC Act: 0 MC Stdby: 1
===============================================================================[/python][/codegroup]
Ummm… both routers show the LAG as operationally down and in Multi-Chassis standby.

It turns out that for MC-LAG the base chassis MAC needs to be unique too. While we cannot directly change the base MAC prior to SROS version 14.0R4, there is an alternative method available: if we set the out-of-band management Ethernet IP address in the BOF, this influences the chassis MAC address.
[python linenumbers=”false”]*A:R1>config>lag# show bof
===============================================================================
BOF (Memory)
===============================================================================
primary-image cf3:\timos\both.tim
primary-config cf3:\config.cfg
autonegotiate
duplex full
speed 100
wait 3
persist off
no li-local-save
no li-separate
console-speed 115200
===============================================================================
*A:R1>config>lag# /bof address 192.168.100.1/24
*A:R1>config>lag# /bof save
Writing BOF to cf3:/bof.cfg … OK
Completed.
*A:R1>config>lag# show bof
===============================================================================
BOF (Memory)
===============================================================================
primary-image cf3:\timos\both.tim
primary-config cf3:\config.cfg
address 192.168.100.1/24 active
autonegotiate
duplex full
speed 100
wait 3
persist off
no li-local-save
no li-separate
console-speed 115200
===============================================================================[/python]
Save and reboot:
[python linenumbers=”false”]*A:R1>config>lag# /admin save
Writing configuration to cf3:\config.cfg
Saving configuration … OK
Completed.
A:R1>config>lag# /admin reboot
Are you sure you want to reboot (y/n)? y[/python]
We’ll do the same thing with R2 but give it a different IP so the MAC Addresses should be different:
[python linenumbers=”false”]*A:R2>config>lag# /bof address 192.168.100.2/24
*A:R2>config>lag# /bof save
Writing BOF to cf3:/bof.cfg … OK
Completed.
*A:R2>config>lag# /admin save
Writing configuration to cf3:\config.cfg
Saving configuration … OK
Completed.
A:R2>config>lag# /admin reboot
Are you sure you want to reboot (y/n)? y [/python]
After the reboot we can compare R1 and R2’s base MAC addresses:
[python linenumbers=”false”]A:R1# show chassis detail | match MAC
Base MAC address : c8:01:ff:00:00:00[/python]
[python linenumbers=”false”]A:R2# show chassis detail | match MAC
Base MAC address : c8:02:ff:00:00:00[/python]
Okay they are different now – has it resolved our MC-LAG issue?
[codegroup][python linenumbers=”false” tab=”R1 LAG Port”]A:R1# show lag 1 port

===============================================================================
Lag Port States
LACP Status: e – Enabled, d – Disabled
===============================================================================
Lag-id Port-id Adm Act/Stdby Opr Primary Sub-group Forced Priority
——————————————————————————-
1(e) 1/1/3 up active up yes 1 – 32768
===============================================================================[/python]

[python linenumbers=”false” tab=”R2 LAG Port”]A:R2# show lag 1 port

===============================================================================
Lag Port States
LACP Status: e – Enabled, d – Disabled
===============================================================================
Lag-id Port-id Adm Act/Stdby Opr Primary Sub-group Forced Priority
——————————————————————————-
1(e) 1/1/3 up standby down yes 1 – 32768
===============================================================================[/python]
[python linenumbers=”false” tab=”R5 LAG Port”]A:R5# show lag 1 port

===============================================================================
Lag Port States
LACP Status: e – Enabled, d – Disabled
===============================================================================
Lag-id Port-id Adm Act/Stdby Opr Primary Sub-group Forced Priority
——————————————————————————-
1(e) 1/1/1 up active up yes 1 – 32768
1/1/2 up active down 1 – 32768
===============================================================================[/python][/codegroup]
Yes, R1, R2 and R5 are in alignment. Now let’s put the LAG into VPLS 100 on R1 and R2:
[python linenumbers=”false”]A:R1# /configure service vpls 100 sap lag-1 create[/python]
[python linenumbers=”false”]A:R2# /configure service vpls 100 sap lag-1 create[/python]
Let’s see if R5 can ping R4:
[python linenumbers=”false”]A:R5# ping 192.168.1.4 count 1
PING 192.168.1.4 56 data bytes
64 bytes from 192.168.1.4: icmp_seq=1 ttl=64 time=12.3ms.

—- 192.168.1.4 PING Statistics —-
1 packet transmitted, 1 packet received, 0.00% packet loss
round-trip min = 12.3ms, avg = 12.3ms, max = 12.3ms, stddev = 0.000ms[/python]
Success!

Let’s check the MAC address table (forwarding database) in VPLS 100:
[codegroup]
[python linenumbers=”false” tab=”R1 FDB” highlight=”9,10”]*A:R1>config>service>vpls>sap$ show service id 100 fdb detail

===============================================================================
Forwarding Database, Service 100
===============================================================================
ServId MAC Source-Identifier Type Last Change
Age
——————————————————————————-
100 50:00:00:07:00:01 sdp:3:100 L/0 04/21/17 14:47:33
100 da:00:ff:00:01:42 sap:lag-1 L/0 04/21/17 14:52:57
——————————————————————————-
No. of MAC Entries: 2
——————————————————————————-
Legend: L=Learned O=Oam P=Protected-MAC C=Conditional S=Static
===============================================================================[/python]
[python linenumbers=”false” tab=”R2 FDB” highlight=”9,10”]*A:R2>config>service>vpls>sap$ show service id 100 fdb detail

===============================================================================
Forwarding Database, Service 100
===============================================================================
ServId MAC Source-Identifier Type Last Change
Age
——————————————————————————-
100 50:00:00:07:00:01 sdp:1:100 L/90 04/21/17 14:53:01
100 da:00:ff:00:01:42 sdp:1:100 L/90 04/21/17 14:45:05
——————————————————————————-
No. of MAC Entries: 2
——————————————————————————-
Legend: L=Learned O=Oam P=Protected-MAC C=Conditional S=Static
===============================================================================[/python]
[python linenumbers=”false” tab=”R3 FDB” highlight=”9,10”]*A:R3>config>service>vpls>sap$ show service id 100 fdb detail

===============================================================================
Forwarding Database, Service 100
===============================================================================
ServId MAC Source-Identifier Type Last Change
Age
——————————————————————————-
100 50:00:00:07:00:01 sap:1/1/1 L/0 04/21/17 14:52:42
100 da:00:ff:00:01:42 sdp:1:100 L/0 04/21/17 14:44:46
——————————————————————————-
No. of MAC Entries: 2
——————————————————————————-
Legend: L=Learned O=Oam P=Protected-MAC C=Conditional S=Static
===============================================================================[/python][/codegroup]
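Reading the three tables together tells us which way a frame travels. The sketch below transcribes the FDB entries above and follows them hop by hop until the MAC is found behind a SAP. `FDB` and `trace` are hypothetical names, and the mapping "SDP n points at router Rn" is specific to the SDP numbering used in this lab.

```python
# FDB entries copied from the three outputs above: router -> {MAC: source}.
FDB = {
    "R1": {"50:00:00:07:00:01": "sdp:3:100", "da:00:ff:00:01:42": "sap:lag-1"},
    "R2": {"50:00:00:07:00:01": "sdp:1:100", "da:00:ff:00:01:42": "sdp:1:100"},
    "R3": {"50:00:00:07:00:01": "sap:1/1/1", "da:00:ff:00:01:42": "sdp:1:100"},
}


def trace(start, mac, fdb=FDB):
    """Follow FDB entries router by router until the MAC sits behind a SAP."""
    path, router = [start], start
    while True:
        out = fdb[router][mac]
        if out.startswith("sap:"):
            return path + [out]
        # In this lab SDP n always points at router Rn (see the SDP listings).
        router = "R" + out.split(":")[1]
        if router in path:
            raise RuntimeError(f"forwarding loop via {router}")
        path.append(router)


print(trace("R1", "50:00:00:07:00:01"))  # ['R1', 'R3', 'sap:1/1/1']
print(trace("R3", "da:00:ff:00:01:42"))  # ['R3', 'R1', 'sap:lag-1']
```

So the MAC learned on R3’s SAP 1/1/1 is reached from the LAG side via R1 and spoke-SDP 3:100, while R2 (the standby MC-LAG member) reaches both MACs through R1.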
Now to check the MC-LAG resiliency: we’ll start a continuous ping from R5 to R4 and then shut down port 1/1/3 (the LAG 1 member) on R1:
[python linenumbers=”false”]*A:R1>config>service>vpls>sap$ /configure port 1/1/3 shutdown[/python]
And check whether port 1/1/3 in LAG 1 on R2 goes from standby to active:
[python linenumbers=”false”]*A:R2>config>service>vpls>sap$ show lag 1 port

===============================================================================
Lag Port States
LACP Status: e – Enabled, d – Disabled
===============================================================================
Lag-id Port-id Adm Act/Stdby Opr Primary Sub-group Forced Priority
——————————————————————————-
1(e) 1/1/3 up active up yes 1 – 32768
===============================================================================[/python]
We can see the port has come up on R2. A few packets were lost but the link recovered. We could speed up the convergence time, but I think the general concept has been demonstrated successfully.

The moral of the story here – with Virtual SROS systems, it’s worth ensuring you have a unique chassis MAC address!