Thursday, August 17, 2017

Automating NSX Integration with PCF

Customers using Pivotal Cloud Foundry (PCF) with VMware NSX (the SDN solution) can now automate the creation of NSX Edge Gateway instances with a pre-configured set of subnets and load balancers for the components of various PCF tiles and products. nsx-edge-gen automates the creation of NSX Edge instances pre-configured for PCF, while nsx-ci-pipeline uses a Concourse pipeline to install a core set of PCF products integrated with NSX.

Overview

Pivotal Cloud Foundry is a Cloud Foundry platform for running cloud-native applications on various IaaS providers (such as vSphere, AWS, GCP, Azure and OpenStack).

For customers using VMware NSX as the SDN solution and vSphere as the IaaS, configuring NSX Edge instances for networking and load balancing requires multiple manual steps before the PCF platform is up and running: creating separate networks to logically partition the different PCF products, then creating and configuring NSX Edge instances with these subnets, virtual servers and so on. Creating an NSX Edge can be tricky and time-consuming, and repeating the process for multiple PCF Foundations is laborious even for experienced administrators. The manual steps are detailed in the NSX Edge Cookbook for PCF.

Reference architecture of NSX + PCF

Ref: https://docs.pivotal.io/pivotalcf/1-11/refarch/images/vsphere-overview-arch.png

Details

NSX Edge acts as the gatekeeper to a set of logical switches and load balancers associated with a PCF Foundation (managed by one Ops Mgr and BOSH Director).

The Logical switches are associated with the subnets used by various products/layers:
  • Infrastructure - for BOSH and Ops Mgr, which manage the entire install
  • Deployment - for the main Elastic Runtime Tile, which is Cloud Foundry itself
  • Services - for other supporting tiles that provide services (like MySQL, RabbitMQ, SCS...)
  • Dynamic Services - for those tiles that support the On-Demand Broker model of spinning up new service instances on demand
  • Isolation Segments - for apps that require their own Routers and Diego cells for specialized hardware/routing/isolation


Virtual servers with pools (plus application profiles, rules and monitors) are required to handle load balancing for components like GoRouters, Diego SSH access, MySQL proxy, RabbitMQ proxy and so on.

Additionally, some generic coarse-grained firewall rules can be applied at the NSX Edge instance level to allow or disallow communication between tiles/products (East-West) or in the inbound and outbound directions (North-South).

Performing all these manual steps is tedious and demands care, and repeating them for each foundation means more time and grunt work.


nsx-edge-gen Tool


Automating the NSX Edge Creation

Users can now automate the creation of NSX Edge instances with a set of logical switches, load balancers and pools that conform to a template, using the nsx-edge-gen toolkit. The tool supports the NSX-V flavor of the VMware NSX product (6.2.x versions), not NSX-T, which is to be released later in 2017 or early 2018.

Note: The tool does not install or configure vSphere or NSX Manager itself. It only works against an existing installation, creating and deleting NSX Edge instances on an existing NSX Manager.

To start with, nsx-edge-gen provides a template of logical switches and routed components, as laid out in the reference architecture, which the user can modify either via the command line or a YAML config file.


Distributed Logical Router (DLR)

A Distributed Logical Router (DLR) is a special use case: it lets traffic between the logical switches and subnets flow through the DLR rather than hair-pinning through the NSX Edge. A DLR also allows well above the default maximum of 10 logical switches that can be associated with an NSX Edge instance. An auto-generated OSPF subnet wraps the DLR and connects it to the NSX Edge instance. There is a 1-1-1 mapping between each NSX Edge, OSPF subnet and DLR, and all of these are auto-created. The OSPF and DLR layers add no overhead cost (license or performance).

If the DLR option is disabled, then the standard PCF reference architecture as defined in the NSX cookbook would be used (no DLR or OSPF).

Logical switches


```
logical_switches:
- name: OSPF  
  cidr: 172.16.100.10/24
  primary_ip: 172.16.100.2
- name: Infra  
  cidr: 192.168.10.0/26
  primary_ip: 192.168.10.2
- name: Ert
  cidr: 192.168.20.0/22
  primary_ip: 192.168.20.2
- name: PCF-Tiles
  cidr: 192.168.24.0/22
  primary_ip: 192.168.24.2
- name: Dynamic-Services
  cidr: 192.168.28.0/22
  primary_ip: 192.168.28.2
#- name: IsoZone-01
#  cidr: 192.168.32.0/22
#  primary_ip: 192.168.32.2
#- name: IsoZone-02
#  cidr: 192.168.36.0/22
#  primary_ip: 192.168.36.2
```
Additional logical switches can be added, like additional isolation segments. The subnets can also be tweaked.
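When tweaking subnets or adding isolation segments, it is easy to introduce overlapping ranges. The following is a minimal Python sketch, not part of nsx-edge-gen, that checks a list of switches shaped like the template above. The function name check_switches is made up for illustration, and the script assumes the cidr and primary_ip fields mean what their names suggest.

```python
import ipaddress
from itertools import combinations

# Values copied from the template above. ip_interface() is used because a
# cidr may be written with host bits set (as the OSPF entry is); its
# .network attribute gives the enclosing subnet.
switches = [
    {"name": "OSPF", "cidr": "172.16.100.10/24", "primary_ip": "172.16.100.2"},
    {"name": "Infra", "cidr": "192.168.10.0/26", "primary_ip": "192.168.10.2"},
    {"name": "Ert", "cidr": "192.168.20.0/22", "primary_ip": "192.168.20.2"},
    {"name": "PCF-Tiles", "cidr": "192.168.24.0/22", "primary_ip": "192.168.24.2"},
    {"name": "Dynamic-Services", "cidr": "192.168.28.0/22", "primary_ip": "192.168.28.2"},
]


def check_switches(switches):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    nets = {}
    for sw in switches:
        net = ipaddress.ip_interface(sw["cidr"]).network
        nets[sw["name"]] = net
        # The primary IP must live inside the switch's own subnet.
        if ipaddress.ip_address(sw["primary_ip"]) not in net:
            problems.append(f"{sw['name']}: primary_ip outside {net}")
    # No two switches may share address space.
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} overlaps {b}")
    return problems


print(check_switches(switches))
```

An empty list means the layout is consistent. Any new switch can be appended to the list and re-checked before it goes into the YAML file.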

Routed Components

PCF components like GoRouter, Diego Brain and MySQL Proxy require a load balancer in front for HA and for distributing traffic across multiple instances. A similar requirement exists for the RabbitMQ and MySQL Tiles, among others. Some components, like Operations Manager (Ops Mgr for short), require only a VIP for access.

nsx-edge-gen provides a default set of routed components with associated logical switches, offsets, number of instances for each component etc.

```
routed_components:
- id: OPS 
  name: OPS
  switch: INFRA
  external: true
  useVIP: false
  instances: 1
  offset: 5
  monitor_id: monitor-3
  transport:
    ingress:
      port: '443'
      protocol: https
    egress:
      port: '443'
      protocol: https
      monitor_port: '443'
      url: "/"
  
- id: GO-ROUTER
  name: GO-ROUTER
  switch: ERT
  external: true
  useVIP: true
  instances: 4
  offset: 200
  monitor_id: monitor-4
  transport:
    ingress:
      port: '443'
      protocol: https
    egress:
      port: '80'
      protocol: tcp
      monitor_port: '80'
      # protocol: http
      # monitor_port: '8080'
      # url: "/health"
```

For each component, one can specify whether it needs to be external, the number of VMs hosting it, whether it uses a VIP, the type of monitor (http/tcp/..), and the ingress and egress settings for the load balancer. The sample above starts with Ops Manager, which needs to be exposed externally, does not require a load balancer (it is a single instance) and runs on the Infra subnet. The offset determines the IPs assigned to the VMs from the subnet CIDR.
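To make the role of the offset concrete, here is a small Python sketch. It assumes, and this is only an assumption about the tool's behavior, that the offset is counted from the subnet's network address and that the instances receive consecutive addresses from there. The function name member_ips is made up for illustration.

```python
import ipaddress


def member_ips(cidr, offset, instances):
    """Return `instances` consecutive IPs starting `offset` addresses into the subnet.

    Assumption: the offset is relative to the network address of the CIDR.
    """
    net = ipaddress.ip_interface(cidr).network
    first = net.network_address + offset
    ips = [first + i for i in range(instances)]
    # Refuse to hand out addresses at or beyond the broadcast address.
    if ips and ips[-1] >= net.broadcast_address:
        raise ValueError(f"offset {offset} + {instances} instances does not fit in {net}")
    return [str(ip) for ip in ips]


# OPS on the Infra switch and GO-ROUTER on the Ert switch, per the samples above.
print(member_ips("192.168.10.0/26", offset=5, instances=1))
print(member_ips("192.168.20.0/22", offset=200, instances=4))
```

Under that assumption, Ops Manager would land on 192.168.10.5 and the four GoRouters on 192.168.20.200 through 192.168.20.203.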

The default configuration includes Ops Manager (tagged as OPS), GoRouter (GO-ROUTER), Diego Brain for SSH (DIEGO), TCP Router (TCP-ROUTER), MySQL both as bundled within ERT and as the separate service Tile (as MYSQL-<type>), RabbitMQ (RABBITMQ-TILE) and Iso Segment GoRouters (GO-ROUTER-ISO). Each of these components is associated with one of the logical switches.

The default built-in configuration is good enough for most deployments.

These components are then tied together with NSX load balancers and pools, using Application Rules and Profiles. Profiles specify the ingress and egress protocols for the load balancer (LBR). For instance, GoRouter might let the LBR handle SSL termination and accept only plain HTTP traffic, so it can use the https-http profile, while MySQL would use a pure TCP-style profile. The Application Rules cover HTTP logging, forwarding, adding X-Forwarded-Proto headers and so on.
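As an illustration of how a profile follows from a component's transport settings, the Python sketch below derives a profile label from the ingress and egress protocols. Only the https-http name comes from the text above. The general "ingress-egress" naming rule and the function name profile_name are assumptions made for this example, not the tool's documented behavior.

```python
def profile_name(ingress_protocol, egress_protocol):
    """Derive a profile label from the two protocols.

    Assumed convention: a single name when both sides match,
    otherwise "<ingress>-<egress>" (e.g. SSL terminated at the LBR).
    """
    ingress = ingress_protocol.lower()
    egress = egress_protocol.lower()
    return ingress if ingress == egress else f"{ingress}-{egress}"


# GoRouter with SSL terminated at the load balancer, and a plain TCP component.
print(profile_name("https", "http"))
print(profile_name("tcp", "tcp"))
```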


Generation

nsx-edge-gen requires the user to provide some basic configuration (such as endpoints and credentials for vSphere and NSX Manager, plus the Cluster, Datastore and Datacenter). Other required settings are the name of the edge, the transport zone used for the logical switches, SSL certs for the LBRs (or permission to auto-generate them), the distributed portgroup if DLR is enabled, the uplink IPs for the various components (as VIPs) and so on. Multiple edge instances can also be created from the same template.

Each NSX Edge instance is created with a default set of firewall rules, virtual servers, pools, profiles and so on. For instances that have DLR enabled, an OSPF network acts as the bridge between the NSX Edge and its DLR.

Use the tool with the build, list or delete option to build an NSX Edge instance, list the available instances (along with their logical switches, to verify the parameters) or delete a specified edge instance.

OSPF and DLR



Logical Switches



Firewall Rules




NATs




Virtual Servers





Additionally, the user indicates whether the target BOSH environment supports NSX. If it does (as in PCF 1.11), nsx-edge-gen does not populate the pool members, and NSX Security group association for jobs is used instead. Otherwise, it statically fills in the member IPs based on the offsets and instance counts.
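That decision can be sketched in a few lines of Python. This is an illustration rather than the tool's actual code: the function name pool_members is made up, the offset is assumed to count from the subnet's network address, and placing the RabbitMQ tile on the PCF-Tiles subnet is likewise an assumption.

```python
import ipaddress


def pool_members(cidr, offset, instances, bosh_nsx_enabled):
    """Return the static member IPs for a pool, or an empty list when
    BOSH is expected to attach members through NSX security groups."""
    if bosh_nsx_enabled:
        # Members are left empty; BOSH associates jobs with the pool later.
        return []
    # Assumption: offset is relative to the network address of the subnet.
    base = ipaddress.ip_interface(cidr).network.network_address + offset
    return [str(base + i) for i in range(instances)]


# Hypothetical RabbitMQ tile pool: 5 instances at offset 10 on 192.168.24.0/22.
print(pool_members("192.168.24.0/22", 10, 5, bosh_nsx_enabled=True))
print(pool_members("192.168.24.0/22", 10, 5, bosh_nsx_enabled=False))
```

With the flag on, the pool starts empty. With it off, the same inputs yield 192.168.24.10 through 192.168.24.14.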

Customizing the configuration

Using command line arguments, it's entirely possible to override the subnets, names, offsets, instance counts and so on. Check the sample test script under the test folder of nsx-edge-gen.

```
#!/bin/bash
echo "Use build, list, delete"
echo "Default option: list"
echo ""

RUN_CMD=${1:-list}
CONFIG_NAME=test-nsx
rm -rf $CONFIG_NAME

./nsx-gen/bin/nsxgen -i $CONFIG_NAME init

./nsx-gen/bin/nsxgen -c $CONFIG_NAME  \
  -esg_name_1 edge1 \
  -esg_size_1 compact \
  -esg_cli_user_1 admin \
  -esg_cli_pass_1 'P1v0t4l!P1v0t4l!' \
  -esg_ert_certs_1 Foundation1 \
  -nsxmanager_dportgroup DPortGroupTest \
  -nsxmanager_en_dlr true \
  -nsxmanager_bosh_nsx_enabled true \
  -nsxmanager_tz TestTZ \
  -nsxmanager_tz_clusters 'Cluster1,Cluster2' \
  -esg_ert_certs_config_sysd_1 sys2.test.pivotal.io \
  -esg_ert_certs_config_appd_1 apps3.test.pivotal.io \
  -esg_iso_certs_1_1 iso-1 \
  -esg_iso_certs_config_switch_1_1 IsoZone-1 \
  -esg_iso_certs_config_ou_1_1 Pivotal \
  -esg_iso_certs_config_cc_1_1 US \
  -esg_iso_certs_config_domains_1_1 zone1-app.test.pivotal.io \
  -esg_opsmgr_uplink_ip_1 10.193.99.171 \
  -esg_go_router_uplink_ip_1 10.193.99.172 \
  -esg_diego_brain_uplink_ip_1 10.193.99.173 \
  -esg_tcp_router_uplink_ip_1 10.193.99.174 \
  -esg_mysql_ert_uplink_ip_1 192.168.23.250 \
  -esg_mysql_ert_inst_1 5  \
  -esg_mysql_tile_uplink_ip_1 192.168.27.250 \
  -esg_mysql_tile_inst_1 2  \
  -esg_rabbitmq_tile_uplink_ip_1 192.168.27.251 \
  -esg_rabbitmq_tile_inst_1 5 \
  -esg_rabbitmq_tile_off_1 10 \
  -vcenter_addr vcsa-01.test.pivotal.io \
  -vcenter_user administrator@vsphere.local \
  -vcenter_pass 'passwd!' \
  -vcenter_dc "" \
  -vcenter_ds vsanDatastore \
  -vcenter_cluster Cluster1 \
  -nsxmanager_addr 10.193.99.20 \
  -nsxmanager_user admin \
  -nsxmanager_pass 'passwd!' \
  -nsxmanager_uplink_ip 10.193.99.170 \
  -nsxmanager_uplink_port 'VM Network' \
  -esg_gateway_1 10.193.99.1 \
  -isozone_switch_name_1 IsoZone-1 \
  -isozone_switch_cidr_1 192.168.34.0/22 \
  -isozone_switch_name_2 IsoZone-2 \
  -isozone_switch_cidr_2 192.168.38.0/22 \
  -esg_go_router_isozone_1_uplink_ip_1 10.193.99.181 \
 ....
```
This allows the user to override the configuration through command line arguments (which can in turn be fed from env variables) without rebuilding YAML configurations for each run.
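One way to feed such flags from environment variables is a small wrapper, sketched here in Python. The NSXGEN_ prefix and the function name flags_from_env are hypothetical conventions invented for this example; nsx-edge-gen itself does not define them. The sketch only builds the flag list and leaves the actual nsxgen invocation to the caller.

```python
import os
import shlex

PREFIX = "NSXGEN_"  # hypothetical naming convention, not defined by nsx-edge-gen


def flags_from_env(environ=None, prefix=PREFIX):
    """Turn PREFIX-ed variables into ['-flag', 'value', ...], sorted by name."""
    environ = os.environ if environ is None else environ
    args = []
    for key in sorted(environ):
        if key.startswith(prefix):
            # NSXGEN_ESG_NAME_1=edge1 becomes: -esg_name_1 edge1
            flag = "-" + key[len(prefix):].lower()
            args += [flag, environ[key]]
    return args


# Demo with a plain dict standing in for the real environment.
env = {"NSXGEN_ESG_NAME_1": "edge1", "NSXGEN_ESG_SIZE_1": "compact", "HOME": "/root"}
print(shlex.join(flags_from_env(env)))
```

The resulting list can be appended to the nsxgen command line from the test script. shlex.join (Python 3.8+) is used only to print the flags safely quoted.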

NSX CI Pipeline

The Concourse pipelines built in nsx-ci-pipeline let users drive the creation of the NSX Edge instances and then install Ops Mgr along with ERT and other Pivotal products that leverage the NSX Edge for load balancing and/or security policies.

Users create a parameter file that the pipeline consumes to connect to NSX and create the edge instances, using the nsx-edge-gen tool described previously. Ops Mgr is installed after the edge creation, with its networks auto-populated from the logical switches of the newly created edge instance, followed by the installation of ERT and the other product Tiles.

Users can either stop after installing just Ops Mgr (for vSphere) and ERT, or go all the way and install the MySQL, RabbitMQ, Spring Cloud Services and Isolation Segment tiles.

Just Ops Mgr and ERT:




Full Install (with MySQL, RabbitMQ, Iso Segments):




The pipeline supports both PCF 1.10 and 1.11. The product versions and the NSX connection information can all be specified in the Concourse parameters file.

With the NSX integration in the PCF 1.11 BOSH layer, the pipeline automatically calls Ops Mgr APIs to register the routed components with the pre-created load balancer pools, along with any additional security groups specified for components that require proxying or load balancing (like GoRouters, TCP routers, MySQL proxy, RabbitMQ proxy etc.). This keeps each pool correctly associated with its members as the platform scales its components up or down, rather than relying on a static, predefined set of members. This behavior is not available in 1.10, so there the pools are statically populated with the IP addresses pre-determined by the nsx-edge-gen execution.

Multiple Isolation Segment Tiles can be installed using the add-additional-iso-segment pipeline.

Summary

Using nsx-edge-gen and nsx-ci-pipeline, users of PCF and NSX can automate the manual steps involved in NSX Edge creation and configuration and in PCF installs. This makes it fast and easy to create multiple NSX Edges and PCF foundations that consistently conform to a desired layout.