Sometimes it is hard to analyse what is happening at the network level inside pods deployed on OpenShift or Kubernetes.

How can you debug and analyse the network traffic to and from your application to solve issues more quickly and effectively? And how can you keep using a familiar tool like Wireshark, as you would on any other host?

We will use tcpdump to capture a so-called PCAP (packet capture) file containing the pod's network traffic. This file can then be loaded into a tool like Wireshark to analyze the traffic, in this case the RESTful communication of a service running in the pod.
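To make the term concrete: a classic pcap file (the format tcpdump writes with -w) starts with a 24-byte global header, followed by a 16-byte record header plus the captured bytes for every packet. The Python sketch below builds such a file in memory using only the standard library. The all-zero frame is a hypothetical placeholder, and the default snaplen of 262144 simply mirrors the capture size tcpdump reports later in this post.

```python
import struct
import time

# Classic libpcap constants (this sketch does not cover pcapng).
PCAP_MAGIC = 0xA1B2C3D4   # microsecond-resolution timestamps
LINKTYPE_ETHERNET = 1     # what tcpdump reports as "link-type EN10MB (Ethernet)"


def pcap_global_header(snaplen: int = 262144, linktype: int = LINKTYPE_ETHERNET) -> bytes:
    """Build the 24-byte global header that starts every classic pcap file."""
    # Fields: magic, version major, version minor, thiszone, sigfigs, snaplen, link type.
    return struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, linktype)


def pcap_record(frame: bytes, ts: float, snaplen: int = 262144) -> bytes:
    """Build one packet record: a 16-byte record header followed by the captured bytes."""
    captured = frame[:snaplen]             # bytes beyond snaplen are not stored
    ts_sec = int(ts)
    ts_usec = int((ts - ts_sec) * 1_000_000)
    # Fields: seconds, microseconds, bytes stored in the file, original length on the wire.
    header = struct.pack("<IIII", ts_sec, ts_usec, len(captured), len(frame))
    return header + captured


if __name__ == "__main__":
    dummy_frame = bytes(60)                # hypothetical, all-zero 60-byte frame
    data = pcap_global_header() + pcap_record(dummy_frame, time.time())
    print(len(data))                       # 24 + 16 + 60 = 100
```

Storing both the stored length and the original length is what lets a capture truncate long packets (via the snaplen) while still telling Wireshark how big they really were.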

tcpdump will run in a sidecar container next to our application container, inside the same pod.

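A sidecar container runs in the same pod as the main service/application container and provides additional functionality to it. Because all containers in a pod share the same network namespace, a tcpdump sidecar sees the same network interfaces, and therefore the same traffic, as the application container.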

Deploying the sidecar

  • Create a new project for testing purposes:
$ oc new-project test-delete-rcarrata
  • Deploy an example application to test with:
$ oc new-app django-psql-example
$ oc get pod
NAME                           READY   STATUS              RESTARTS   AGE
django-psql-example-1-build    0/1     Completed           0          3m4s
django-psql-example-1-deploy   0/1     Completed           0          74s
django-psql-example-1-j4w28    1/1     Running             0          65s
django-psql-example-2-deploy   0/1     ContainerCreating   0          4s
postgresql-1-2q9h7             1/1     Running             0          2m49s
postgresql-1-deploy            0/1     Completed           0          2m57s
  • Export the DeploymentConfig of django-psql-example to a file:
$ oc get dc django-psql-example -o yaml > django-psql-tcpdump.yaml
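  • In the DeploymentConfig, add a container that will run tcpdump. It only runs sleep infinity, which keeps it alive so that you can open a shell in it and start tcpdump on demand: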
- name: tcpdump
  image: corfr/tcpdump
  command:
    - /bin/sleep
    - infinity
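  • For the django app, the sidecar goes into the containers list of the pod template (spec.template.spec.containers in the DeploymentConfig), next to the existing django container. The excerpt below starts at the pod template's spec: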
spec:
  containers:
  - name: tcpdump
    image: corfr/tcpdump
    command:
      - /bin/sleep
      - infinity
  - env:
    - name: DATABASE_SERVICE_NAME
      value: postgresql

This spins up an additional sidecar container in which you can run tcpdump to capture and analyse the packets that the django container receives and sends (remember that the tcpdump and django containers are in the same pod and therefore share its network).
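The same edit can also be scripted instead of done by hand. The Python sketch below is a hypothetical helper, not part of the original workflow: it assumes you exported the DeploymentConfig as JSON (oc get dc django-psql-example -o json > dc.json) so that only the standard library is needed, and it inserts the tcpdump container into spec.template.spec.containers unless a container named tcpdump is already there. The file names dc.json and dc-tcpdump.json are illustrative.

```python
import copy
import json
import sys

# The sidecar definition used in this post: an idle container we can exec into.
TCPDUMP_SIDECAR = {
    "name": "tcpdump",
    "image": "corfr/tcpdump",
    "command": ["/bin/sleep", "infinity"],
}


def add_sidecar(dc: dict, sidecar: dict = TCPDUMP_SIDECAR) -> dict:
    """Add the sidecar to the pod template of a DeploymentConfig, modifying it in place."""
    containers = dc["spec"]["template"]["spec"]["containers"]
    # Do nothing if a container with the same name is already present (idempotent).
    if not any(c.get("name") == sidecar["name"] for c in containers):
        # Insert first, like the YAML excerpt where tcpdump is listed before the django container.
        containers.insert(0, copy.deepcopy(sidecar))
    return dc


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python add_sidecar.py dc.json > dc-tcpdump.json
    with open(sys.argv[1]) as f:
        deployment_config = json.load(f)
    json.dump(add_sidecar(deployment_config), sys.stdout, indent=2)
```

You could then apply the result with oc apply -f dc-tcpdump.json, since oc apply accepts JSON as well as YAML. Because the helper is idempotent, running it twice does not add a second sidecar.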

  • Apply the modified DeploymentConfig:
$ oc apply -f django-psql-tcpdump.yaml
deploymentconfig.apps.openshift.io/django-psql-example configured

$ oc get pod -w
NAME                           READY   STATUS              RESTARTS   AGE
django-psql-example-1-build    0/1     Completed           0          3m24s
django-psql-example-1-deploy   0/1     Completed           0          94s
django-psql-example-2-deploy   1/1     Running             0          24s
django-psql-example-2-gfws6    0/2     ContainerCreating   0          8s
postgresql-1-2q9h7             1/1     Running             0          3m9s
postgresql-1-deploy            0/1     Completed           0          3m17s
django-psql-example-2-gfws6   0/2   ContainerCreating   0     8s
django-psql-example-2-gfws6   1/2   Running   0     10s
django-psql-example-2-gfws6   2/2   Running   0     14s
django-psql-example-2-deploy   0/1   Completed   0     30s
django-psql-example-2-deploy   0/1   Completed   0     30s
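In the watch output, the new pod's READY column goes from 0/2 to 2/2 once both the django and the tcpdump containers are running. If you want to wait for that from a script, the minimal Python sketch below parses the plain-text output of oc get pod, assuming the default column layout shown above (NAME, READY, STATUS, RESTARTS, AGE). Reading containerStatuses from oc get pod -o json would be more robust; the text parsing here only mirrors what you see on screen.

```python
from typing import Dict


def pods_fully_ready(get_pod_output: str) -> Dict[str, bool]:
    """Map each pod name to True when all of its containers are ready (e.g. "2/2")."""
    status: Dict[str, bool] = {}
    for line in get_pod_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row; READY is the second column.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        name, ready = fields[0], fields[1]
        ready_count, total = (int(n) for n in ready.split("/"))
        # With "oc get pod -w", later lines for the same pod overwrite earlier ones.
        status[name] = total > 0 and ready_count == total
    return status


# Lines taken from the watch output above.
SAMPLE = """\
NAME                           READY   STATUS              RESTARTS   AGE
django-psql-example-2-gfws6    0/2     ContainerCreating   0          8s
django-psql-example-2-gfws6   1/2   Running   0     10s
django-psql-example-2-gfws6   2/2   Running   0     14s
django-psql-example-2-deploy   0/1   Completed   0     30s
"""

if __name__ == "__main__":
    print(pods_fully_ready(SAMPLE)["django-psql-example-2-gfws6"])  # True
```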

Capturing and analyzing traffic

With the sidecar deployed and running, we can now start capturing traffic.

  • Open a shell in the tcpdump container and start a capture:
~ $ oc rsh -c tcpdump django-psql-example-2-gfws6
~ $ tcpdump -s 0 -n -w /tmp/example.pcap
tcpdump: eth0: You don't have permission to capture on that device
(socket: Operation not permitted)

What happened? tcpdump cannot open a capture socket on eth0. Under the default restricted SCC (Security Context Constraints), the container runs with an arbitrary non-root UID and therefore lacks the privileges needed to capture packets.
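You can see the missing privilege from inside the container. Opening the raw packet socket that tcpdump captures from requires the CAP_NET_RAW capability (number 13 in linux/capability.h); depending on the options used, tcpdump may also need CAP_NET_ADMIN. Running grep CapEff /proc/self/status in the tcpdump container prints the shell's effective capability set as a hex bitmask. The Python sketch below decodes that value; since the tcpdump image is not assumed to ship Python, paste the mask into it on your workstation. The masks in the example are illustrative, not output from this cluster.

```python
# Capability numbers from linux/capability.h (small subset relevant to networking).
CAP_NET_BIND_SERVICE = 10
CAP_NET_ADMIN = 12
CAP_NET_RAW = 13

CAP_NAMES = {
    CAP_NET_BIND_SERVICE: "CAP_NET_BIND_SERVICE",
    CAP_NET_ADMIN: "CAP_NET_ADMIN",
    CAP_NET_RAW: "CAP_NET_RAW",
}


def network_caps(cap_eff_hex: str) -> list:
    """List the networking capabilities set in a CapEff hex mask from /proc/<pid>/status."""
    mask = int(cap_eff_hex, 16)
    return [name for bit, name in sorted(CAP_NAMES.items()) if mask & (1 << bit)]


def can_open_raw_socket(cap_eff_hex: str) -> bool:
    """True if CAP_NET_RAW, needed for tcpdump's packet socket, is in the effective set."""
    return bool(int(cap_eff_hex, 16) & (1 << CAP_NET_RAW))


if __name__ == "__main__":
    # Hypothetical values: a process with no effective capabilities, and one with NET_RAW only.
    for value in ("0000000000000000", "0000000000002000"):
        print(value, network_caps(value), can_open_raw_socket(value))
```

A non-root process normally has an empty effective set (all zeros), which is consistent with the permission error above.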

  • To work around this, grant the anyuid SCC to the default service account of the namespace. This requires cluster-admin privileges; the command impersonates system:admin with --as:
$ oc adm policy add-scc-to-user anyuid -z default -n `oc project -q` --as=system:admin

IMPORTANT: This is a potential security issue, because every pod that uses the default service account in this namespace can now run as any UID, including root. Be careful: only do this in testing namespaces, or in namespaces controlled by a cluster administrator, and keep in mind that this security protection is disabled there.
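You can check which SCC admitted a given pod: OpenShift records it in the pod's openshift.io/scc annotation. For the pod created before this change it is typically restricted (restricted-v2 on newer releases); for a pod created after the anyuid grant and a new rollout it should read anyuid. The Python sketch below reads that annotation from a saved pod definition; the file and script names are hypothetical.

```python
import json
import sys
from typing import Optional

SCC_ANNOTATION = "openshift.io/scc"


def admitting_scc(pod: dict) -> Optional[str]:
    """Return the SCC recorded on a pod (from `oc get pod <name> -o json`), if any."""
    # metadata or annotations may be missing or null in the JSON.
    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    return annotations.get(SCC_ANNOTATION)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical names): oc get pod <pod-name> -o json > pod.json
    #                             python check_scc.py pod.json
    with open(sys.argv[1]) as f:
        print(admitting_scc(json.load(f)) or "no openshift.io/scc annotation found")
```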

  • Roll out the DeploymentConfig again so that the new pod is admitted with the anyuid SCC:
$ oc rollout latest dc/django-psql-example
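  • The rollout creates a new pod. Find its name with oc get pod, open a shell in its tcpdump container with oc rsh -c tcpdump as before, and run tcpdump (-s 0 captures whole packets, -n skips name resolution and -w writes the raw packets to a file):
$ tcpdump -s 0 -n -w /tmp/example.pcap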

  • From another terminal, generate some requests to the application so that the tcpdump sidecar has traffic to capture:
$ curl django-psql-example-test-delete-rcarrata.apps.ocp4.rcarrata.com -I
HTTP/1.1 200 OK
Server: gunicorn/19.4.5
Date: Thu, 27 Feb 2020 19:43:32 GMT
Content-Type: text/html; charset=utf-8
X-Frame-Options: SAMEORIGIN
Content-Length: 18255
Set-Cookie: 320587f6606431b421a7ed809db87323=ec4dec0bb6e99d5a3aaed6dd165eaa51; path=/; HttpOnly
Cache-control: private
  • Stop tcpdump with Ctrl+C; it then reports how many packets were captured:
$ tcpdump -s 0 -n -w /tmp/example.pcap
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C574 packets captured
574 packets received by filter
0 packets dropped by kernel
  • Copy example.pcap to your local machine (oc cp relies on tar being available in the container):
$ oc cp -c tcpdump django-psql-example-2-gfws6:/tmp/example.pcap example.pcap
  • Open the capture in Wireshark and voilà! You can analyse your network traffic:
$ wireshark example.pcap
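Alongside Wireshark, a quick scripted check of the copied file can be handy, for example to confirm that the number of records matches the "574 packets captured" line tcpdump printed, or to see at a glance which IPv4 peers the pod talked to. The sketch below uses only the Python standard library and assumes the classic pcap format (what tcpdump -w writes), the Ethernet link type reported in the tcpdump output, and untagged frames; it does not handle pcapng, VLAN tags or IPv6, and it reads the whole file into memory, which is fine for small debug captures.

```python
import socket
import struct
import sys
from collections import Counter
from typing import Tuple

LINKTYPE_ETHERNET = 1
ETHERTYPE_IPV4 = 0x0800


def summarize_pcap(data: bytes) -> Tuple[int, Counter]:
    """Count records in a classic pcap capture and tally IPv4 (src, dst) pairs."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (microsecond or nanosecond timestamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")
    # The link type lives in the low 16 bits of the last global-header field.
    linktype = struct.unpack_from(endian + "I", data, 20)[0] & 0xFFFF

    count, pairs, offset = 0, Counter(), 24
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec, incl_len (bytes stored), orig_len.
        incl_len = struct.unpack_from(endian + "IIII", data, offset)[2]
        frame = data[offset + 16 : offset + 16 + incl_len]
        offset += 16 + incl_len
        count += 1
        # Ethernet: 6-byte dst MAC, 6-byte src MAC, 2-byte EtherType, then the IP header.
        if linktype == LINKTYPE_ETHERNET and len(frame) >= 34:
            if struct.unpack_from("!H", frame, 12)[0] == ETHERTYPE_IPV4:
                # IPv4 source and destination addresses are bytes 12-19 of the IP header.
                pairs[(socket.inet_ntoa(frame[26:30]), socket.inet_ntoa(frame[30:34]))] += 1
    return count, pairs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_pcap.py example.pcap
    with open(sys.argv[1], "rb") as f:
        total, conversations = summarize_pcap(f.read())
    print(f"{total} packets")
    for (src, dst), n in conversations.most_common(10):
        print(f"{src} -> {dst}: {n}")
```

For anything beyond a sanity check, Wireshark (or a dedicated packet library) remains the better tool.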

This technique is very useful for debugging connectivity and application issues, whether they involve external systems or interactions with other pods.

NOTE: Opinions expressed in this blog are my own and do not necessarily reflect that of the company I work for.

Happy OpenShifting!