Virtualized Infrastructure Manager
==================================

Prerequisites
#############

* Access to fabric management nodes
* `Basic services `_ are available and configured

  - dhcpd
  - tftpd
  - httpd
  - coredns
  - registry
  - monitor

Requirements
^^^^^^^^^^^^

**Basic connectivity model using OOB NWs**

* Connectivity between DC Services and Nodes is available

Overview
########

The VIM is responsible for controlling and managing the NFV infrastructure
(NFVI) compute, storage, and network resources, usually within one operator's
infrastructure domain. The VIM is a specific part of the MANO framework, but a
network can contain multiple VIM instances. These instances come in two
general kinds: those that manage multiple types of NFVI resources, and those
specialized to handle a certain type.

The VIM covers the management aspects of virtualized resources, including
information, provisioning, reservation, capacity, performance, and fault
management. This scope of management corresponds to the functionality produced
by the VIM and exposed over the Or-Vi and Vi-Vnfm reference points.

.. image:: _static/vim.png
   :align: center
   :alt: VIM

Stages
^^^^^^

The fabric lifecycle stages assume that the infrastructure is in place and
that the basic VIM services are reachable via OOB by all nodes.

* Introspection
* Node install, including Basic Fabric and Keepalived static pods, on all nodes
* Create initial cluster
* Decommission bootstrap and install the third controller
* Add workers to the cluster
* Deploy fabric services
* Node Update
* Node Decommissioning

Quick Start
###########

If the VIM is remotely located, it is possible to create an L3 VXLAN tunnel
between the VIM and a node directly connected to the fabric.

.. note:: The underlay network can be either IPv4 or IPv6.

Example
-------

On the VIM node:

.. code-block:: sh

   ip link add name vxlan42 type vxlan id 42 dev eth1 remote 2602:807:900e:141:0:1866:dae6:4674 local 2602:807:9000:8::10b dstport 4789
   ip -6 address add fd02:9a04::1/64 dev vxlan42
   ip link set up vxlan42

On the directly connected node:

.. code-block:: sh

   ip link add name vxlan42 type vxlan id 42 dev mgmt0 remote 2602:807:9000:8::10b local 2602:807:900e:141:0:1866:dae6:4674 dstport 4789
   ip -6 address add fd02:9a04::2/64 dev vxlan42
   ip link set up vxlan42

Add a static route to the svc0 network via the VXLAN tunnel:

.. code-block:: sh

   ip -6 route add fd02:9a01::/64 via fd02:9a04::2 dev vxlan42

VIM node
^^^^^^^^

Get the code:

.. code-block:: sh

   mkdir vim && cd vim
   git clone --recurse-submodules -j8 https://gitlab.stroila.ca/slg/ocp/cluster.git
   pushd cluster
   git submodule update --remote
   popd

Install the basic services on the VIM node:

.. code-block:: sh

   pushd cluster
   make clean && make deps
   popd

PXE infrastructure nodes
^^^^^^^^^^^^^^^^^^^^^^^^

Use the BIOS or the iPXE method.

Bootstrap node
^^^^^^^^^^^^^^

Patch etcd after the API server is up:

.. code-block:: sh

   oc patch etcd cluster -p='{"spec": {"unsupportedConfigOverrides": {"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}' --type=merge --kubeconfig /etc/kubernetes/kubeconfig

Master nodes
^^^^^^^^^^^^

Ignite the two masters:

.. code-block:: sh

   curl http://registry.ocp.labs.stroila.ca/cfg/master.ign > /boot/ignition/config.ign
   touch /boot/ignition.firstboot
   reboot

Convert Bootstrap into Master
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Ignite the third master:

.. code-block:: sh

   curl http://registry.ocp.labs.stroila.ca/cfg/master.ign > /boot/ignition/config.ign
   touch /boot/ignition.firstboot
   reboot
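After the reboot, the node pulls its machine config and completes the first
boot. If console or SSH access to the node is available, progress can be
followed via the first-boot systemd units (the same units listed in the
Master section further below); a generic sketch:

.. code-block:: sh

   # On the freshly ignited node: follow the first-boot pull and install
   journalctl -b -f -u machine-config-daemon-pull.service -u machine-config-daemon-firstboot.service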
Approve CSRs
^^^^^^^^^^^^

From the VIM node:

.. code-block:: sh

   oc get csr --kubeconfig /opt/ocp/assets/auth/kubeconfig

For each "Pending" CSR:

.. code-block:: sh

   oc adm certificate approve <csr_name> --kubeconfig /opt/ocp/assets/auth/kubeconfig

Verify the Installation
^^^^^^^^^^^^^^^^^^^^^^^

**Verify the cluster operators**

.. code-block:: sh

   [root@node-f01fafce5ded infra]# oc get co --kubeconfig /opt/ocp/assets/auth/kubeconfig
   NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
   authentication                             4.9.23    True        False         False      159m
   baremetal                                  4.9.23    True        False         False      3h7m
   cloud-controller-manager                   4.9.23    True        False         False      3h23m
   cloud-credential                           4.9.23    True        False         False      4h8m
   cluster-autoscaler                         4.9.23    True        False         False      3h7m
   config-operator                            4.9.23    True        False         False      3h9m
   console                                    4.9.23    True        False         False      162m
   csi-snapshot-controller                    4.9.23    True        False         False      3h8m
   dns                                        4.9.23    True        False         False      3h6m
   etcd                                       4.9.23    True        False         False      3h6m
   image-registry                             4.9.23    True        False         False      171m
   ingress                                    4.9.23    True        False         False      165m
   insights                                   4.9.23    True        False         False      168m
   kube-apiserver                             4.9.23    True        False         False      165m
   kube-controller-manager                    4.9.23    True        False         False      3h6m
   kube-scheduler                             4.9.23    True        False         False      3h6m
   kube-storage-version-migrator              4.9.23    True        False         False      3h9m
   machine-api                                4.9.23    True        False         False      3h7m
   machine-approver                           4.9.23    True        False         False      3h7m
   machine-config                             4.9.23    True        False         False      166m
   marketplace                                4.9.23    True        False         False      3h7m
   monitoring                                 4.9.23    True        False         False      164m
   network                                    4.9.23    True        False         False      3h10m
   node-tuning                                4.9.23    True        False         False      3h7m
   openshift-apiserver                        4.9.23    True        False         False      165m
   openshift-controller-manager               4.9.23    True        False         False      3h6m
   openshift-samples                          4.9.23    True        False         False      166m
   operator-lifecycle-manager                 4.9.23    True        False         False      3h7m
   operator-lifecycle-manager-catalog         4.9.23    True        False         False      3h7m
   operator-lifecycle-manager-packageserver   4.9.23    True        False         False      169m
   service-ca                                 4.9.23    True        False         False      3h9m
   storage                                    4.9.23    True        False         False      3h9m

**Verify the cluster version**

.. code-block:: sh

   [root@node-f01fafce5ded infra]# oc get clusterversion --kubeconfig /opt/ocp/assets/auth/kubeconfig
   NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
   version   4.9.23    True        False         163m    Cluster version is 4.9.23

**Verify the cluster nodes**

.. code-block:: sh

   [root@node-f01fafce5ded infra]# oc get nodes -o wide --kubeconfig /opt/ocp/assets/auth/kubeconfig
   NAME                                              STATUS   ROLES           AGE     VERSION           INTERNAL-IP                   EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
   node-90b11c4c8bf6.montreal-317.slabs.stroila.ca   Ready    master,worker   3h13m   v1.22.3+b93fd35   fd53:f:a:b:0:90b1:1c4c:8bf6   <none>        Red Hat Enterprise Linux CoreOS 49.84.202202230006-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.22.1-21.rhaos4.9.git74a7981.2.el8
   node-b8ca3a6f3d48.montreal-317.slabs.stroila.ca   Ready    master,worker   3h13m   v1.22.3+b93fd35   fd53:f:a:b:0:b8ca:3a6f:3d48   <none>        Red Hat Enterprise Linux CoreOS 49.84.202202230006-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.22.1-21.rhaos4.9.git74a7981.2.el8
   node-b8ca3a6f7eb8.montreal-317.slabs.stroila.ca   Ready    master,worker   3h13m   v1.22.3+b93fd35   fd53:f:a:b:0:b8ca:3a6f:7eb8   <none>        Red Hat Enterprise Linux CoreOS 49.84.202202230006-0 (Ootpa)   4.18.0-305.34.2.el8_4.x86_64   cri-o://1.22.1-21.rhaos4.9.git74a7981.2.el8
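The operator check above is easy to script when driving the installation from
CI. A small convenience loop (a sketch, not part of the VIM tooling), assuming
the same kubeconfig path:

.. code-block:: sh

   # Poll until every cluster operator reports AVAILABLE=True
   export KUBECONFIG=/opt/ocp/assets/auth/kubeconfig
   until [ -z "$(oc get co --no-headers | awk '$3 != "True"')" ]; do
       echo "waiting for cluster operators to become available..."
       sleep 30
   done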
Node Introspection & Installation
#################################

Introspection is a VIM function performed during an in-memory system boot,
using nested ignition that calls the run-coreos-installer service. On success,
the run-coreos-installer service installs CoreOS on disk using a custom
ignition and reports back to the monitoring function via REST. On failure, it
reports the information about the faulty hardware.

.. image:: _static/introspection.png
   :align: center
   :alt: introspection

Hardware discovery/inspection library
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Functions implemented:
----------------------

* getMemory()
* getCpu()
* getBlockStorage()
* getTopology()
* getNetwork()
* getLocalAddresses()
* getPci()
* getDevices()
* getGPU()
* getChassis()
* getBaseboard()
* getProduct()
* getSerialization()

Report example:

.. literalinclude:: invetrory.log
   :linenos:
   :language: text
   :lines: 1-50

Configuration
^^^^^^^^^^^^^

FCCT
----

fcct, the Fedora CoreOS Config Transpiler, is a tool that produces a JSON
Ignition file from a YAML FCC file. Using the FCC file, an FCOS machine can be
told to create users, create filesystems, set up the network, install systemd
units, and more.

Create the FCCT definition file ``automated_install.yaml``:

.. literalinclude:: config/automated_install.yaml
   :linenos:
   :language: yaml

.. note:: The sha512 verification is computed over the original file, not the compressed resource.

#. Pull the container for fcct:

   .. code-block:: sh

      podman pull quay.io/coreos/fcct

#. Run fcct on the FCC file:

   .. code-block:: sh

      podman run -i --rm quay.io/coreos/fcct:latest --pretty --strict < ./docs/source/config/automated_install.yaml > automated_install.ign
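A quick way to confirm that the transpile produced valid JSON with the
expected Ignition spec version (assuming ``jq`` is installed on the VIM node):

.. code-block:: sh

   # Print the Ignition spec version from the generated config
   jq '.ignition.version' automated_install.ign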
Static Pods
^^^^^^^^^^^

Static pods are included in the fabric ignition and are role and hardware
specific.

#. The **keepalived** static pod is deployed on all the controllers in an
   active-active-active configuration that monitors the apiserver.

Node install
^^^^^^^^^^^^

The node installation is triggered via nested ignition upon successful node
introspection.

.. image:: _static/node_install.png
   :align: center
   :alt: installation

Create initial cluster
######################

Select one of the controllers to act as a temporary deployment node.

.. image:: _static/deployment-mon.png
   :align: center
   :alt: deployment

The sequence diagram describes how, and in what order, a group of objects
works together.

.. image:: _static/bootstrap.png
   :align: center
   :alt: bootstrap

The desired state:

.. image:: _static/deployment-mon-f.png
   :align: center
   :alt: deployment

.. note:: The bootstrap stage can be executed in memory; after successful completion, the OS install is executed using a master ignition.

The cluster monitoring can be accessed from the IPv4 network dashboard
service, via socat, to node_exporter running over IPv6.

.. image:: _static/cluster-monitor.png
   :align: center
   :alt: monitoring

Socat
^^^^^

The socat utility is a relay for bidirectional data transfers between two
independent data channels. There are many different types of channels socat
can connect, including:

* Files
* Pipes
* Devices (serial line, pseudo-terminal, etc.)
* Sockets (UNIX, IP4, IP6 - raw, UDP, TCP)
* SSL sockets
* Proxy CONNECT connections
* File descriptors (stdin, etc.)
* The GNU line editor (readline)
* Programs
* Combinations of two of these

There are many ways to use socat effectively. Here are a few examples:

* TCP port forwarder (one-shot or daemon)
* External socksifier
* Tool to attack weak firewalls (security and audit)
* Shell interface to Unix sockets
* IP6 relay
* Redirect TCP-oriented programs to a serial line
* Logically connect serial lines on different computers
* Establish a relatively secure environment (su and chroot) for running
  client or server shell scripts with network connections

Here we use socat to forward between IPv6 and IPv4 networks, temporarily
turning the VIM server into a port forwarder to reach the bootstrap API and
machine-config services.

**Examples**

* Redirect console

  .. code-block:: sh

     socat tcp4-listen:443,fork,reuseaddr tcp6-connect:[fd02:9a01::10]:443 |& logger &

  .. note:: You need to add console-openshift-console.apps.ocp.labs.stroila.ca and oauth-openshift.apps.ts25.labs.stroila.ca to /etc/hosts or DNS.

  .. image:: _static/ksdf-console.png
     :align: center
     :alt: console

* Redirect api and machine-config server

  .. code-block:: sh

     socat tcp6-listen:6443,fork,reuseaddr tcp6-connect:[fd02:9a01::10]:6443 |& logger &
     socat tcp6-listen:22623,fork,reuseaddr tcp6-connect:[fd02:9a01::10]:22623 |& logger &

* Redirect node_exporter from IPv4 to IPv6

  .. code-block:: sh

     socat tcp4-listen:9101,fork,reuseaddr tcp6-connect:[2602:807:900e:141:0:1866:dae6:4674]:9100 |& logger &
     socat tcp4-listen:9102,fork,reuseaddr tcp6-connect:[2602:807:900e:141:0:549f:3509:8eac]:9100 |& logger &
     socat tcp4-listen:9103,fork,reuseaddr tcp6-connect:[2602:807:900e:141:0:90b1:1c26:6aa2]:9100 |& logger &

* Fake local DNS service

  .. code-block:: sh

     socat udp6-recvfrom:53,fork tcp:localhost:5353 |& logger &
     socat tcp6-listen:5353,reuseaddr,fork udp6-sendto:[2602:807:900e:141:0:f01f:afce:5ded]:53 |& logger &

Bootstrap
^^^^^^^^^

.. code-block:: sh

   curl http://registry.ocp.labs.stroila.ca/cfg/bootstrap.ign > /boot/ignition/config.ign
   touch /boot/ignition.firstboot
   reboot

etcd-quorum-guard should not attempt to deploy 3 replicas if
useUnsupportedUnsafeNonHANonProductionUnstableEtcd is set. As soon as the API
is available from the bootstrap node, set
useUnsupportedUnsafeNonHANonProductionUnstableEtcd:

.. code-block:: sh

   oc patch etcd cluster -p='{"spec": {"unsupportedConfigOverrides": {"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}' --type=merge --kubeconfig /etc/kubernetes/kubeconfig

Master
^^^^^^

.. code-block:: sh

   curl http://registry.ocp.labs.stroila.ca/cfg/master.ign > /boot/ignition/config.ign
   touch /boot/ignition.firstboot
   reboot

#. System firstboot followed by reboot

   * machine-config-daemon-pull.service
   * machine-config-daemon-firstboot.service

Master from Bootstrap
---------------------

.. code-block:: sh

   curl -k https://api-int.ocp.labs.stroila.ca:22623/config/master > master.ign

Decommission bootstrap
^^^^^^^^^^^^^^^^^^^^^^

Upon successful deployment of the initial controllers, the bootstrap node
reaches out to the active controllers to fetch the master ignition to be
installed on disk.

Add workers
###########

Once the basic OCP cluster is up and running, the workers are attached to the
cluster.

Deploy fabric services
######################

When the basic OCP deployment is completed, the fabric deployment services
are initiated.

Node Update
###########

Provided that the initial node provisioning installed a basic RHCOS, iPXE
chain loading can be used to trigger node reinstallation using VIM services.

iPXE
^^^^

**Embedding via the command line**

Systems without IPMI support need another method to trigger a fresh install,
especially for the CI/CD use case. This can be achieved by embedding a script
via the command line into an iPXE image loaded by GRUB:

.. code-block:: text

   timeout 10
   default 0

   title iPXE
         kernel (hd0,0)/ipxe.lkrn dhcp && chain http://vimserver/boot.php

This uses a standard version of the iPXE binary ipxe.lkrn. You can change the
embedded script simply by editing the GRUB configuration file, with no need to
rebuild the iPXE binary. This method works only with iPXE binary formats that
support a command line, such as .lkrn.

.. note:: You may need to escape some special characters to allow them to be passed through to iPXE.

**Embedding via an initrd**

You can embed a script by passing an initrd to iPXE. For example, to embed a
script saved as myscript.ipxe into an iPXE image loaded by GRUB:

.. code-block:: text

   timeout 10
   default 0

   title iPXE
         kernel (hd0,0)/ipxe.lkrn
         initrd myscript.ipxe

This uses a standard version of the iPXE binary ipxe.lkrn. The myscript.ipxe
file is a plain iPXE script file; there is no need to use a tool such as
mkinitrd. You can change the embedded script by editing the myscript.ipxe
file, with no need to rebuild the iPXE binary. This method works only with
iPXE binary formats that support an initrd, such as .lkrn.

Node Decommissioning
####################

To remove a node from the cluster, node draining should follow the standard
OCP procedure, as sketched below.
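A minimal sketch of that standard procedure, using generic ``oc`` commands
(the node name is a placeholder):

.. code-block:: sh

   # Mark the node unschedulable, evacuate its workloads, then remove it
   oc adm cordon <node_name> --kubeconfig /opt/ocp/assets/auth/kubeconfig
   oc adm drain <node_name> --ignore-daemonsets --delete-emptydir-data --force --kubeconfig /opt/ocp/assets/auth/kubeconfig
   oc delete node <node_name> --kubeconfig /opt/ocp/assets/auth/kubeconfig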
Reference
#########

* `FCCT Schema `_
* `cluster-etcd-operator `_
* `benchmark-operator `_