Wednesday 12 July 2023

My journey through BLFS - part 2

Continuing from where I left off in part 1 with a functional X Window System, it's time to aim for something closer to a usable system.

16 May 2022 - GTK+ 2

I cloned the VM to have a checkpoint to restart from in case anything went horribly wrong. At this point there are many possible paths, but I decided to go for LXDE due to its small memory footprint and smaller set of dependencies compared to other, more mainstream desktop environments.

Building GTK+ 2

GTK+ is the centerpiece for all graphics used in LXDE. I started with version 2. As explained in the guide, some libraries need to be rebuilt to enable support that was previously disabled. On that note, I had to rebuild cairo, pango and harfbuzz after the later addition of gobject-introspection.

GTK+ 2 up and running from the demo app.

Quick gotcha: the ISO Codes source code was not available from the proposed location; instead I got it from http://ftp.debian.org/debian/pool/main/i/iso-codes/iso-codes_4.9.0.orig.tar.xz. Not a big issue, but it required a bit of googling around.

I wanted to see if it was possible to build LXDE with GTK+ version 3, so I built GTK+ 3 as well; here is how it looks from the demo app.

17 May 2022

The LSB-Tools link is broken (it gives a 404), so instead of the release page I got it from the GitHub repository tags page: https://github.com/djlucas/LSB-Tools/archive/refs/tags/v0.9.tar.gz

Building apps with GTK+ 3 support.

Building LXSession

./configure \
  --prefix=/usr \
  --enable-gtk3 \
  --enable-buildin-clipboard \
  --enable-buildin-polkit

Building PCManFM

./configure \
  --prefix=/usr \
  --sysconfdir=/etc \
  --with-gtk=3

Building libindicator-12.10.1

It has a bug which is allegedly fixed; however, I could still find the issue in the source code I downloaded from https://launchpad.net/libindicator/+download, as per this comment: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=806470.

CFLAGS=-Wno-error ./configure \
  --prefix=/usr \
  --with-gtk=3 \
  --enable-deprecations=yes

18 May 2022

I tried many different ways to get LXPanel built with GTK+ 3 to work. Long story short, I couldn’t.

Building LXPanel (not possible to install with GTK 3)

./configure \
  --prefix=/usr \
  --with-plugins=-netstat

It required wireless-tools-devel.

Building libwnck, download source from: https://download.gnome.org/sources/libwnck/3.24/libwnck-3.24.1.tar.xz

./configure \
  --prefix=/usr \
  --disable-static

Building keybinder, download source from: https://github.com/kupferlauncher/keybinder/releases/download/keybinder-3.0-v0.3.2/keybinder-3.0-0.3.2.tar.gz

./configure --prefix=/usr

End of LXPanel: at this point all dependencies were met and LXPanel was installed.

Building LXAppearance

./configure \
  --prefix=/usr \
  --sysconfdir=/etc \
  --enable-dbus \
  --enable-gtk3

9 June 2022

I gave up on LXPanel with GTK+ 3; the main reason being that even though it installed correctly in the system, it kept crashing randomly, and this seems to be a known bug. I also built libfm-gtk so that LXPanel has its GTK+ 2 dependency in place and ./configure is happy.
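
For reference, a minimal sketch of the kind of libfm configure invocation, assuming it accepts the same --with-gtk switch used above for PCManFM (the exact flags I used may have differed):

./configure \
  --prefix=/usr \
  --sysconfdir=/etc \
  --with-gtk=2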

Unfortunately (or fortunately) I was very busy during the following months and I couldn't spend time on this. Finally I got some spare time and I went back to it.

3rd March 2023

Installed Lightdm and lightdm-gtk-greeter.

At this point a new problem appeared: the VM integration didn't work until I logged in, which is when the SPICE agent runs. I added a note for future research.

I need to find out how to run spice-vdagent with lightdm or the greeter. Potentially exploring more here: https://wiki.ubuntu.com/LightDM.

Copying the desktop entry to the relevant location autostarts the SPICE agent when you log in to the session.

sudo cp \
  /sources/spice-vdagent-0.21.0/data/spice-vdagent.desktop \
  /etc/xdg/autostart/

I installed Mousepad without any trouble.

Important note: compiling Firefox requires a lot of RAM, no less than 8 GB!

It also requires the notify-send program to complete the installation.

Gtkmm - it wouldn't compile unless I disabled building the documentation in all subprojects:

meson --prefix=/usr --buildtype=release .. \
  -Dmm-common:use-network=true \
  -Dbuild-documentation=false \
  -Datkmm-1.6:build-documentation=false \
  -Dcairomm-1.0:build-documentation=false \
  -Dglibmm-2.4:build-documentation=false \
  -Dlibsigcplusplus-2.0:build-documentation=false \
  -Dpangomm-1.4:build-documentation=false 

Sunday 8 May 2022

My journey through BLFS - part 1

Having successfully finished LFS, I felt inspired to continue the journey, since all I had was a very minimal system. As I used LFS version 11.0-systemd, it only made sense to continue with the same version, BLFS 11.0-systemd.

15 Feb 2022 - SSH & curl

The top priority for me was being able to connect remotely, primarily motivated by the fact that I had been typing every command from the guide by hand since I left the chroot. OpenSSH to the rescue.

Another priority was the ability to download packages to continue building the system with new packages as I followed along, but there was no curl or wget. I could have used some bash magic or a minimal Python script; however, the main issue was the lack of support for downloading files via HTTPS. This is where make-ca, p11-kit, GnuTLS and a few others came in handy.

During all that time I was making use of virtio 9p to share a folder with my host; this simulated copying files manually before being able to use the network to download the rest.
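
For anyone curious, mounting such a 9p share inside the guest boils down to something like this (the mount tag "hostshare" and the mount point are just examples; they are whatever was set in the VM's hardware configuration):

# "hostshare" is the mount tag defined on the host side of the shared folder
mount -t 9p -o trans=virtio,version=9p2000.L hostshare /mnt/host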

6 April 2022 - LLVM compiled - required a lot of RAM

At some point I went down the rabbit hole, probably my mistake for being too ambitious and building packages that I didn't need so early in the process, but hey, that's the learning process. One particular package that took me too deep into dependency hell was texlive-20210325, as I was also building the documentation along with the binaries, to such an extent that I decided to drop it and use the binary option (install-tl-unx), which is also huge in size.

One issue I encountered while building some of those dependencies of dependencies: I was building LLVM when all of a sudden the compilation process was killed due to not enough memory. I set up 4 GB of swap space and tried again; to my surprise, it got killed again, so I ended up creating an 8 GB swap file to complete the compilation.
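
For reference, creating a swap file like that boils down to a handful of commands (the size and path are of course a choice, shown here for 8 GB):

# create an 8 GB file, restrict its permissions, format it as swap and enable it
dd if=/dev/zero of=/swapfile bs=1M count=8192
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile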

Other interesting milestones were:

Linux PAM - it allowed a more streamlined integration with many programs. In my particular use case, I wanted to control how I was using the "su" command without having to type the root password every time and without using sudo. It also made me rebuild the whole of systemd again to include PAM support.
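
One way to get that su behaviour, assuming pam_wheel is the module doing the work, is a fragment along these lines in /etc/pam.d/su (the exact file I ended up with may differ):

# let members of the wheel group use su without being asked for the root password
auth  sufficient  pam_wheel.so trust use_uid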

Polkit - a somewhat ubiquitous dependency for GUI apps. Because it has quite a few dependencies that take some time to build, like the JS engine, several of the next packages benefited from not having to build them again as dependencies.

Having built all this, it was time to start preparing for one of the most daunting parts of the whole of BLFS: the X Window System!

25 April 2022 - X Window System up and running

This is another part where you find yourself building and installing tons of packages without seeing anything encouraging. It's not until the very last moment of truth, when you run startx, that you see the result of all that hard work.

To my surprise, it worked the first time! Well, not without some issues, but I have to say the default values for the Xorg configuration are pretty good. Without proper graphics card drivers and barely any configuration in place, it managed to launch the X server with twm and some insanely big terminal emulators.

It took me some time to get my head around this poor man’s window manager, especially because for some reason the mouse movement was a bit awkward when the VM’s window had a different aspect ratio from the guest resolution.

Some issues I noted to fix immediately after this successful milestone:

  • Mouse cursor trapped in VM’s window
  • Clipboard integration host-guest
  • Mouse movement impacted by VM’s window size

26 April 2022 - Qemu agent integration

All the issues mentioned above are related to the same root cause: not having the "guest drivers" running in the VM. In this case, I needed both the QEMU agent and spice-vdagent running, as well as the actual kernel drivers.

In order to get the SPICE channel working, I had to enable virtio vsockets in the kernel so that the devices /dev/virtio-ports/com.redhat.spice.0 and /dev/virtio-ports/org.qemu.guest_agent.0 were available after setting them up in the VM's hardware configuration. Together with some udev rules provided by the agent, that was all I needed to get the daemons running.
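
These are roughly the kernel options involved, as a sketch (exact symbol names can vary between kernel versions):

# virtio transport
CONFIG_VIRTIO=y
CONFIG_VIRTIO_PCI=y
# virtio serial ports, which expose /dev/virtio-ports/*
CONFIG_VIRTIO_CONSOLE=y
# virtio sockets
CONFIG_VSOCKETS=y
CONFIG_VIRTIO_VSOCKETS=y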

Something I found out, which is not covered in the guide (again, because it is specific to my VM scenario): when building spice-vdagent, I had to add a few extra parameters to make it work:

CFLAGS=-Wno-error ./configure \
  --prefix=/usr \
  --with-init-script=systemd \
  --with-session-info=systemd

For some reason it wasn't recognising that it was being compiled in a running systemd environment, so I provided the flags to force using it, and also stopped warnings being treated as errors.

The second part of the integration, as per the documentation, is a per-X-session process, spice-vdagent, which I had to invoke manually, and when I did, magic happened! The mouse cursor was free as a bird to move all over my screen. Next: figure out how to adjust the resolution and bundle everything together to run automatically when X starts up.

This is the tweaked version of my ".xinitrc" which worked well enough to get by without having any desktop environment to handle all these tasks.

xrandr --output Virtual-0 --mode 1366x768
spice-vdagent
xrdb -load $HOME/.Xresources
xsetroot -solid gray &
xclock -g 50x50-0+0 -bw 0 &
xload -g 50x50-50+0 -bw 0 &
xterm -g 80x24+0+0 &
xterm -g 80x24+700+320 &
twm

To wrap this part up, it's been an incredible learning experience, this time focusing on building packages that are closer to what is used on a daily basis in any Linux system. This is the base for adding a desktop environment, which will be my next task; at this point I'm leaning towards LXDE, but I'll keep a backup copy at this stage in case I'd like to go for a different one later on.

The journey continues.

Wednesday 4 May 2022

My journey through LFS

Building a whole Linux system completely from the ground up is something I had wanted to do for some time, and I finally had all I needed to take the plunge. In this post I'm sharing the highlights of this journey. Also, this is something I'm doing in my spare time, hence the time gaps between milestones.

1st Jan 2022 - Started reading the guide

I decided to go for LFS-11.0-systemd, released in September 2021, as it was the latest stable release at the time and I'm comfortable enough with systemd.

Before even starting with LFS I needed a host system, so I gave the latest Arch ISO at the time a try, which was archlinux-2022.01.01-x86_64.iso. Also, as this was my first attempt at such a venture, I went for a virtual machine using KVM/QEMU, which comes with its own challenges that are not fully covered in the guide.

25 Jan 2022 - Created VM with vanilla Arch as host system

Some initial decisions about my target system:

  • Virtual machine KVM/QEMU
  • 4 CPU (compiling can be CPU intensive)
  • 2 GB RAM (aim for a small system)
  • 60 GB Disk storage (that was too optimistic)
  • UEFI boot (mimic modern systems)
  • Network - bridged (ease of connection via SSH)
  • 1 partition for everything (simplicity)
  • No swap partition, swap file when needed

I learned that a plain vanilla Arch Linux base system is good enough for building a full LFS with no extra packages. Every requirement was met out of the box; that's a good start!

Once I had my VM up and running with all the base configuration, including my SSH key in place so I could connect to it via SSH (which makes my life easier by enabling copy and paste), I made a clone of it just in case I had to go back and start again.

The first few days required a lot of patience and willpower, since you're only compiling and compiling and not seeing much in use. I sometimes had the impression that some of the commands to follow were hacking my way into just getting things compiled despite the environment conditions.

It’s only once you enter the chroot that it starts to feel like you’re getting somewhere.

10 Feb 2022 - chroot getting ready

Working inside the chroot can be frustrating since it's such a minimal system. The LFS developers made a good effort to provide enough tooling to get the job done, but for anything outside that, you'll be faced with the dreaded "-bash: xxxx: command not found".

This was the first time I installed neofetch in the pseudo new system, mostly since I was intrigued by which logo it would display and how it’d find system information when there is barely any system at all.

Because I used UEFI in my target system, I had to jump ahead to BLFS for the boot loader installation and configuration; that break in the flow can be a little disruptive and confusing, but nothing major.
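
The gist of that detour, assuming GRUB built with EFI support, is an install command along these lines (the EFI directory depends on where the EFI System Partition is mounted):

# install GRUB for UEFI into the mounted EFI System Partition
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=LFS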

14 Feb 2022 - First successful boot

There is always a moment that makes or breaks the installation; in this case it was configuring the Linux kernel. As almost expected after such a long build-up preparing and configuring all the packages, I was greeted by a big kernel panic when I booted the system for the first time outside of the chroot.

It couldn't find /dev/vda3 from the kernel command line argument root=/dev/vda3, which happens to be the root partition on the virtual hard drive. Some symbols were not selected (nor indicated in the guide, since this is specific to my setup using a VM), such as CONFIG_VIRTIO_BLK; more information on that can be found on the linux-kvm website.

After adding the missing symbols and rebuilding the kernel a couple of times, following the classic trial and error method, I could see a normal boot screen all the way to the login prompt, which I have to say was lightning fast!

At this point the system had been built, but this was not the end of the journey; there was some extra networking configuration required, again due to the use of a VM, which needed virtio network as well as 9p so I had a mechanism to share files between the host and the guest LFS.
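
For completeness, these are the kinds of virtio-related symbols I'm referring to (a non-exhaustive sketch):

# virtio disk, the missing piece behind the kernel panic
CONFIG_VIRTIO_BLK=y
# virtio network interface
CONFIG_VIRTIO_NET=y
# 9p over virtio, to share files between host and guest
CONFIG_NET_9P=y
CONFIG_NET_9P_VIRTIO=y
CONFIG_9P_FS=y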

One more thing before calling this milestone complete: I got counted, and this is my information:

You have successfully registered!
ID: 29352
Name: Abel Perez Martinez
First LFS Version: 11.0-systemd

Date: 14 Feb 2022

My key takeaways from this process:

  • Arch Linux is minimal but powerful
  • Be patient, compiling takes time
  • Compiling software takes a lot of disk space
  • Use -j4 whenever possible
  • Make a backup of your VM after a significant milestone
  • This is only the beginning, it’s a minimal system

Sunday 17 October 2021

How to fix “failed to parse field [xxxxxx] in document with id [yyyyy]. Preview of field's value: ‘zzzzzzz’”

I came across this issue while chasing the infamous “[ warn] [engine] failed to flush chunk” in fluent-bit connected to Elasticsearch. For some context, I'm using Amazon EKS to run my workloads and I use fluent-bit to parse the logs and push them to Elasticsearch so I can query them later on using Kibana.

The first step in this investigation was to set "Trace_Error On" in the [OUTPUT] section of the fluent-bit configuration, a ConfigMap in this instance.
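
For illustration, the relevant bit of the [OUTPUT] section ends up looking something like this (the host, port and index names here are placeholders, not my actual values):

[OUTPUT]
    Name            es
    Match           *
    Host            <your-elasticsearch-endpoint>
    Port            443
    TLS             On
    Index           fluent-bit
    Trace_Error     On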

What is the problem ?

[2021/10/07 11:03:51] [ warn] [engine] failed to flush chunk '1-1633604630.
197484999.flb', retry in 11 seconds: task_id=29, input=tail.0 > output=es.0 
(out_id=0)
[2021/10/07 11:03:52] [error] [output:es:es.0] error: Output

{
    "create": {
        "_index": "fluent-bit-000135",
        "_type": "_doc",
        "_id": "qQZsWnwBkg-cCPSefqtj",
        "status": 400,
        "error": {
            "type": "mapper_parsing_exception",
            "reason": "failed to parse field [log_processed.Properties.
StatusCode] of type [long] in document with id 'qQZsWnwBkg-cCPSefqtj'. 
Preview of field's value: 'Unauthorized'",
            "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "For input string: \"Unauthorized\""
            }
        }
    }
},

... more logs, removed for brevity ...

This essentially means that the field "log_processed.Properties.StatusCode" was initially mapped (automatically in my case) as "long" and therefore it won't accept the current value "Unauthorized", since it can't be parsed as a long value. It's probably my mistake for not having explicitly mapped it to "text" in the first place; I could have avoided this situation. But I didn't, and here we are.

What is the solution ?

Basically, you need to let Elasticsearch know about the data types in your indexes. A common pattern when working with Elasticsearch and Kibana is to create a state management policy that automatically rolls your data over to a new index and eventually deletes old data so that disk space doesn't run too low. This requires having index templates in place so Elasticsearch knows how to create new indexes when the time comes to roll over.

I already use a policy that deletes indices older than 14 days and rolls over after either 1 day or 1 GB of size.
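
For context, that policy is roughly this shape, assuming the Open Distro ISM API (the policy name and exact thresholds here are illustrative):

PUT _opendistro/_ism/policies/retention_14d
{
  "policy": {
    "description": "Roll over daily or at 1 GB, delete after 14 days",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          { "rollover": { "min_index_age": "1d", "min_size": "1gb" } }
        ],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "14d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [ { "delete": {} } ],
        "transitions": []
      }
    ]
  }
}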

At this point I'm not terribly bothered about existing data; my priority is to keep new data in good shape. Old data would need to be reindexed if you want to rescue it and make it searchable.

What needs to be done is to add a mapping to the index template (assuming you already have one) with the property name and type explicitly declared.

How to do it ?

Luckily, Kibana gives us a good developer console from which we can run commands against the Elasticsearch endpoints.

Use the Kibana Dev Tools Console, which you can find under "https://<your-endpoint>/_plugin/kibana/app/dev_tools#/console".

1 - Update the template

Under "mappings/properties", add a new entry with the property name and type, in this case "log_processed.Properties.StatusCode" of type "text". This will not impact any existing data.

PUT /_index_template/ism_retention_14d
{
  "index_patterns": ["fluent-bit-*"],
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date_nanos"                 
        },
        "log_processed.Properties.StatusCode": {
          "type": "text"                 
        }
      }
    },
    "settings": {
      "index.opendistro.index_state_management.rollover_alias":"fluent-bit",
      "sort.field": [ "@timestamp"],          
      "sort.order": [ "desc"]
    }
  }
}

2 - Verify it's updated

This is only to double-check that the change has been applied with the correct values for new indexes to be created matching the pattern (fluent-bit-* in this case).

GET /_index_template/ism_retention_14d
{
  "index_templates" : [
    {
      "name" : "ism_retention_14d",
      "index_template" : {
        "index_patterns" : [
          "fluent-bit-*"
        ],
        "template" : {
          "settings" : {
            "index" : {
              "opendistro" : {
                "index_state_management" : {
                  "rollover_alias" : "fluent-bit"
                }
              },
              "sort" : {
                "field" : [
                  "@timestamp"
                ],
                "order" : [
                  "desc"
                ]
              }
            }
          },
          "mappings" : {
            "properties" : {
              "@timestamp" : {
                "type" : "date_nanos"
              },
              "log_processed.Properties.StatusCode" : {
                "type" : "text"
              }
            }
          }
        },
        "composed_of" : [ ]
      }
    }
  ]
}

3 - Create a new index (current sequence + 1)

Find out the latest sequence number in this index pattern (fluent-bit-000135 in this case), add 1 and create a new index with that name (fluent-bit-000136 in this case).

PUT /fluent-bit-000136
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "fluent-bit-000136"
}

4 - Verify the mapping is actually in place

Once the index has been created, it should have inherited the mappings from the template updated above.

GET /fluent-bit-000136/_mapping/field/log_processed.Properties.StatusCode
{
  "fluent-bit-000136" : {
    "mappings" : {
      "log_processed.Properties.StatusCode" : {
        "full_name" : "log_processed.Properties.StatusCode",
        "mapping" : {
          "StatusCode" : {
            "type" : "text"
          }
        }
      }
    }
  }
}

5 - Update the alias with "is_write_index"

The new index is now ready to start receiving data. To accomplish that, set the "is_write_index" property to true in the new index and to false in the current target index.

POST /_aliases
{
  "actions" : [
    { "add" : { 
      "index" : "fluent-bit-000135", 
      "alias" : "fluent-bit", 
      "is_write_index": false 
      }
    },
    { "add" : {
      "index" : "fluent-bit-000136", 
      "alias" : "fluent-bit", 
      "is_write_index": true 
      }
    }
  ]
}

6 - Verify the aliases are correct

Verify the "is_write_index" property is set appropriately to each index from the previous step.

GET fluent-bit-000135/_alias
{
  "fluent-bit-000135" : {
    "aliases" : {
      "fluent-bit" : {
        "is_write_index" : false
      }
    }
  }
}

GET fluent-bit-000136/_alias
{
  "fluent-bit-000136" : {
    "aliases" : {
      "fluent-bit" : {
        "is_write_index" : true
      }
    }
  }
}

Conclusion

Following these steps will help you stop the errors that prevent logs from being added to Elasticsearch, which would otherwise leave them missing from your queries and potentially from alerts based on that information.

Having state management policies is crucial for achieving a sensible retention policy as well as avoiding overly large indexes. It also eases the process of updating the template from time to time to keep up with your applications' needs.

Saturday 11 April 2020

AWS HttpApi with Cognito as JWT Authorizer

With the recent release of HttpApi from AWS, I've been playing with it for a bit, and I wanted to see how far I could get it to handle authorization without any logic in my application.

Creating a base code

Starting with a simple base, let's set up the initial scenario, which is no authentication at all. The architecture is the typical HttpApi -> Lambda; in this case the Lambda content is irrelevant, therefore I've just used inline code to test it's working.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  Sample SAM Template using HTTP API and Cognito Authorizer
Resources:

  # Dummy Lambda function 
  HttpApiTestFunction:
    Type: AWS::Serverless::Function
    Properties:
      InlineCode: |
        exports.handler = function(event, context, callback) {
          const response = {
            test: 'Hello HttpApi',
            claims: event.requestContext.authorizer && 
                    event.requestContext.authorizer.jwt.claims
          };
          callback(null, response);
        };
      Handler: index.handler
      Runtime: nodejs12.x
      Timeout: 30
      MemorySize: 256
      Events:
        GetOpen:
          Type: HttpApi
          Properties:
            Path: /test
            Method: GET
            ApiId: !Ref HttpApi
            Auth:
              Authorizer: NONE

  HttpApi:
    Type: AWS::Serverless::HttpApi
    Properties:
      CorsConfiguration: 
        AllowOrigins:
          - "*"

Outputs:
  HttpApiUrl:
    Description: URL of your API endpoint
    Value: !Sub 'https://${HttpApi}.execute-api.${AWS::Region}.${AWS::URLSuffix}/'
  HttpApiId:
    Description: Api id of HttpApi
    Value: !Ref HttpApi

With the outputs of this template we can hit the endpoint and see that in fact, it's accessible. So far, nothing crazy here.

$ curl https://abc1234.execute-api.us-east-1.amazonaws.com/test
{"test":"Hello HttpApi"}

Creating a Cognito UserPool and Client

The claims object is not populated because the request wasn't authenticated since no token was provided, as expected.

Let's create the Cognito UserPool with a very simple configuration assuming lots of default values since they're not relevant for this example.

  ## Add this fragment under Resources:

  # User pool - simple configuration 
  UserPool:
    Type: AWS::Cognito::UserPool
    Properties: 
      AdminCreateUserConfig: 
        AllowAdminCreateUserOnly: false
      AutoVerifiedAttributes: 
        - email
      MfaConfiguration: "OFF"
      Schema: 
        - AttributeDataType: String
          Mutable: true
          Name: name
          Required: true
        - AttributeDataType: String
          Mutable: true
          Name: email
          Required: true
      UsernameAttributes: 
        - email
  
  # User Pool client
  UserPoolClient:
    Type: AWS::Cognito::UserPoolClient
    Properties: 
      ClientName: AspNetAppLambdaClient
      ExplicitAuthFlows: 
        - ALLOW_USER_PASSWORD_AUTH
        - ALLOW_USER_SRP_AUTH
        - ALLOW_REFRESH_TOKEN_AUTH
      GenerateSecret: false
      PreventUserExistenceErrors: ENABLED
      RefreshTokenValidity: 30
      SupportedIdentityProviders: 
        - COGNITO
      UserPoolId: !Ref UserPool

  ## Add this fragment under Outputs:

  UserPoolId:
    Description: UserPool ID
    Value: !Ref UserPool
  UserPoolClientId:
    Description: UserPoolClient ID
    Value: !Ref UserPoolClient

Once we have the Cognito UserPool and a client, we are in a position to start putting things together. But first, there are a few things to clarify:

  • Username is the email and only two fields are required to create a user: name and email.
  • The client defines both ALLOW_USER_PASSWORD_AUTH and ALLOW_USER_SRP_AUTH Auth flows to be used by different client code.
  • No secret is generated for this client, if you intend to use other flows, you'll need to create other clients accordingly.

Adding authorization information

The next step is to add authorization information to the HttpApi.

  ## Replace the HttpApi resource with this one.

  HttpApi:
    Type: AWS::Serverless::HttpApi
    Properties:
      CorsConfiguration: 
        AllowOrigins:
          - "*"
      Auth:
        Authorizers:
          OpenIdAuthorizer:
            IdentitySource: $request.header.Authorization
            JwtConfiguration:
              audience:
                - !Ref UserPoolClient
              issuer: !Sub https://cognito-idp.${AWS::Region}.amazonaws.com/${UserPool}
        DefaultAuthorizer: OpenIdAuthorizer

We've added authorization information to the HttpApi, where the JWT issuer is the Cognito UserPool previously created and the tokens are intended only for that client.

If we test again, nothing changes because the event associated with the lambda function says explicitly "Authorizer: NONE".

To test this, we'll create a new event associated with the same lambda function but this time we'll add some authorization information to it.

        ## Add this fragment at the same level as GetOpen
        ## under Events as part of the function properties

        GetSecure:
          Type: HttpApi
          Properties:
            ApiId: !Ref HttpApi
            Method: GET
            Path: /secure
            Auth:
              Authorizer: OpenIdAuthorizer

If we test the new endpoint /secure, then we'll see the difference.

$ curl -v https://abc1234.execute-api.us-east-1.amazonaws.com/secure

>>>>>> removed for brevity >>>>>>
> GET /secure HTTP/1.1
> Host: abc1234.execute-api.us-east-1.amazonaws.com
> User-Agent: curl/7.52.1
> Accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 401 
< date: Sat, 11 Apr 2020 17:19:50 GMT
< content-length: 26
< www-authenticate: Bearer
< apigw-requestid: K1RYliB1IAMESNA=
< 
* Curl_http_done: called premature == 0
* Connection #0 to host abc1234.execute-api.us-east-1.amazonaws.com left intact

{"message":"Unauthorized"}

At this point we have a new endpoint that requires an access token. Now we need a token, but to get a token, we need a user first.

Fortunately Cognito can provide us with all we need in this case. Let's see how.

Creating a Cognito User

The Cognito CLI provides commands to sign up and verify user accounts.

$ aws cognito-idp sign-up \
  --client-id asdfsdfgsdfgsdfgfghsdf \
  --username abel@example.com \
  --password Test.1234 \
  --user-attributes Name="email",Value="abel@example.com" Name="name",Value="Abel Perez" \
  --profile default \
  --region us-east-1

{
    "UserConfirmed": false, 
    "UserSub": "aaa30358-3c09-44ad-a2ec-5f7fca7yyy16", 
    "CodeDeliveryDetails": {
        "AttributeName": "email", 
        "Destination": "a***@e***.com", 
        "DeliveryMedium": "EMAIL"
    }
}

Once the user is created, it needs to be verified.

$ aws cognito-idp admin-confirm-sign-up \
  --user-pool-id us-east-qewretry \
  --username abel@example.com \
  --profile default \
  --region us-east-1

This command gives no output; to check we are good to go, let's use the admin-get-user command.

$ aws cognito-idp admin-get-user \
  --user-pool-id us-east-qewretry \
  --username abel@example.com \
  --profile default \
  --region us-east-1 \
  --query UserStatus

"CONFIRMED"

We have a confirmed user!

Getting a token for the Cognito User

To obtain an Access Token, we use the Cognito initiate-auth command providing the client, username and password.

$ TOKEN=`aws cognito-idp initiate-auth \
  --client-id asdfsdfgsdfgsdfgfghsdf \
  --auth-flow USER_PASSWORD_AUTH \
  --auth-parameters USERNAME=abel@example.com,PASSWORD=Test.1234 \
  --profile default \
  --region us-east-1 \
  --query AuthenticationResult.AccessToken \
  --output text`

$ echo $TOKEN

With the access token in hand, it's time to test the endpoint with it.

$ curl -H "Authorization:Bearer $TOKEN" https://abc1234.execute-api.us-east-1.amazonaws.com/secure
# some formatting added here
{
    "test": "Hello HttpApi",
    "claims": {
        "auth_time": "1586627310",
        "client_id": "asdfsdfgsdfgsdfgfghsdf",
        "event_id": "94872b9d-e5cc-42f2-8e8f-1f8ad5c6e1fd",
        "exp": "1586630910",
        "iat": "1586627310",
        "iss": "https://cognito-idp.us-east-1.amazonaws.com/us-east-qewretry",
        "jti": "878b2acd-ddbd-4e68-b097-acf834291d09",
        "sub": "cce30358-3c09-44ad-a2ec-5f7fca7dbd16",
        "token_use": "access",
        "username": "cce30358-3c09-44ad-a2ec-5f7fca7dbd16"
    }
}

Voilà! We've accessed the secure endpoint with a valid access token.

What about groups ?

I wanted to know more about possible granular control of the authorization, so I went and created two Cognito Groups, let's say Group1 and Group2. Then I added my newly created user to both groups and repeated the experiment.
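
Creating the groups and adding the user to them is a couple of CLI calls along these lines (shown for Group1, repeated for Group2):

$ aws cognito-idp create-group \
  --user-pool-id us-east-qewretry \
  --group-name Group1 \
  --profile default \
  --region us-east-1

$ aws cognito-idp admin-add-user-to-group \
  --user-pool-id us-east-qewretry \
  --username abel@example.com \
  --group-name Group1 \
  --profile default \
  --region us-east-1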

Once the user was added to the groups, I got a new token and issued the request to the secure endpoint.

$ curl -H "Authorization:Bearer $TOKEN" https://abc1234.execute-api.us-east-1.amazonaws.com/secure
# some formatting added here
{
    "test": "Hello HttpApi",
    "claims": {
        "auth_time": "1586627951",
        "client_id": "2p9k1pfhtsbr17a2fukr5mqiiq",
        "cognito:groups": "[Group2 Group1]",
        "event_id": "c450ae9e-bd4e-4882-b085-5e44f8b4cefd",
        "exp": "1586631551",
        "iat": "1586627951",
        "iss": "https://cognito-idp.us-east-1.amazonaws.com/us-east-qewretry",
        "jti": "51a39fd9-98f9-4359-9214-000ea40b664e",
        "sub": "cce30358-3c09-44ad-a2ec-5f7fca7dbd16",
        "token_use": "access",
        "username": "cce30358-3c09-44ad-a2ec-5f7fca7dbd16"
    }
}

Notice that within the claims object a new one has come up, "cognito:groups", and the value associated with it is "[Group2 Group1]".

This means we could potentially check this claim value to make some decisions in our application logic without having to handle all of the authentication inside the application code base.
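
As a quick illustration, inside the inline Lambda handler from earlier that check could be as simple as this (Group1 is just an example group name):

// the claim arrives as a single string like "[Group2 Group1]"
const groups = event.requestContext.authorizer.jwt.claims['cognito:groups'] || '';
// make a decision based on group membership
const isInGroup1 = groups.includes('Group1');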

This opens the possibility for more exploration within the AWS ecosystem. I hope this has been helpful; the full source code can be found at https://github.com/abelperezok/http-api-cognito-jwt-authorizer.

Monday 23 March 2020

AWS Serverless Web Application Architecture

Recently, I've been exploring ideas about how to put together different AWS services to achieve a totally serverless architecture for web applications. One of the new services is the HTTP API which simplifies the integration with Lambda.

One general principle I want to follow when designing these architectural models is the separation of these main subsystems:

  • Identity service to handle authentication and authorization
  • Static assets traffic being segregated
  • Dynamic page rendering and server side logic
  • Application configuration outside the code

All these components will be publicly accessible via Route 53 DNS record sets pointing to the relevant endpoints.

Other general ideas across all design diagrams below are:

  • Cognito will handle authentication and authorization
  • S3 will store all static assets
  • Lambda will execute server side logic
  • SSM Parameter Store will hold all configuration settings

Architecture variant 1 - HTTP API and CDN publicly exposed

In this first approach, we have the S3 bucket behind CloudFront, which is a common pattern when creating CDN-like structures. CloudFront takes care of all the caching behaviours as well as distributing the cached versions across the Edge locations, so subsequent requests will be served with reduced latency. Also, CloudFront has only one cache behaviour, which is the default, and one origin, which is the S3 bucket.

It's also important to note that the CloudFront distribution has an Alternate Domain Name set to the relevant record set, e.g. media.example.com, and let's not forget about referencing the ACM SSL certificate so we can use the custom URL and not the random one from CloudFront.

From the HTTP API perspective, it has only one integration which is a Lambda integration on the $default route, which means all requests coming from the HTTP endpoint will be directed to the Lambda function in question.

Similar to the case of CloudFront, the HTTP API requires a Custom Domain and Certificate to be able to use a custom url as opposed to the random one given by the API service on creation.

Architecture variant 2 - HTTP API behind CloudFront

In this second approach, we still have the S3 bucket behind CloudFront following the same pattern. However, we've also placed the HTTP API behind CloudFront.

CloudFront becomes the traffic controller in this case, where several cache behaviours can be defined to make the correct decision about where to route each request.

Both record sets (media and webapp) point to the same CloudFront distribution; it's the application logic's responsibility to request all static assets using the appropriate domain name.

Since the HTTP API is behind a CloudFront distribution, I'd suggest setting it up as a Regional endpoint.

Architecture variant 3 - No CloudFront at all

Continuing to play with this idea: what if we don't use a CloudFront distribution at all? I gave it a go and it turns out it's possible to achieve similar results.

We can use two HTTP APIs: one forwarding traffic to S3 for static assets and the other to Lambda as per the usual pattern, each of them with a Custom Domain, and that solves the problem.

But I wanted to push it a little bit further; this time I tried with only one HTTP API, setting several routes, e.g. "/css/*" and "/js/*" integrate with S3 and any other route integrates with Lambda. It's then the application logic's responsibility to request all static assets using the appropriate URL.

Conclusion

These are some ideas I've been experimenting with. The choice of whether or not to include a CloudFront distribution depends on the concrete use case: whether the source of our requests is local or globally diverse, and also whether it is more suitable to have static assets under a subdomain or a virtual directory under the same host name.

Never underestimate the power and flexibility of an API Gateway, especially the new HTTP API, which can front any combination of resources in the back end.

Saturday 13 July 2019

Run ASP.NET Core 2.2 on a Raspberry Pi Zero

The Raspberry Pi Zero (and Zero W) is a cool and cheap piece of technology that can run software anywhere. Being primarily a .NET developer, I wanted to try running the current version of ASP.NET Core on it, which as of now is 2.2.

However, the first problem is that even though .NET Core supports ARM CPUs, it does not support ARM32v6, only v7 and above. After digging a bit, I found out that mono does support that CPU and on top of that, it’s binary compatible with .NET Framework 4.7.

In this post, I'll summarise several hours of trial and error to get it working. If developing on Windows, targeting both netcoreapp2.2 and net472 is easier since chances are we'll have everything installed already. On Linux, however, it's not that easy, and this is where Mono comes in to help: we need its reference assemblies to build the net472 version.

Let’s check the tools we’ll use:

$ dotnet --version
2.2.202
$ docker --version
Docker version 18.09.0, build 4d60db4

Create a new dotnet core MVC web application to get the starting template.

$ dotnet new mvc -o dotnet.mvc

Update the TargetFramework property to TargetFrameworks if we want to actually target both (we could target only net472 if desired). Also, since the Microsoft.AspNetCore.App and Microsoft.AspNetCore.Razor.Design metapackages won't be available on .NET Framework, reference the NuGet packages we'll use directly instead; here is an example with the ones the default template uses.

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    <TargetFrameworks>net472;netcoreapp2.2</TargetFrameworks>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.AspNetCore" Version="2.2.0" />
    <PackageReference Include="Microsoft.AspNetCore.Hosting.WindowsServices" Version="2.2.0" />
    <PackageReference Include="Microsoft.AspNetCore.Mvc" Version="2.2.0" />
    <PackageReference Include="Microsoft.AspNetCore.StaticFiles" Version="2.2.0" />
    <PackageReference Include="Microsoft.AspNetCore.HttpsPolicy" Version="2.2.0" />
    <PackageReference Include="Microsoft.AspNetCore.CookiePolicy" Version="2.2.0" />
    <PackageReference Include="Microsoft.Extensions.Logging.Debug" Version="2.2.0" />
    <PackageReference Include="Microsoft.Extensions.Logging.EventLog" Version="2.2.0" />
    <PackageReference Include="Microsoft.Extensions.Options" Version="2.2.0" />
  </ItemGroup>

</Project>

When targeting net472, we need to use Mono's reference assemblies since they're not part of .NET Core. To do that, we set the environment variable FrameworkPathOverride to the appropriate path, typically something like /usr/lib/mono/4.7.2-api/.

export FrameworkPathOverride=/usr/lib/mono/4.7.2-api/

Then proceed as usual with the dotnet CLI:

  • dotnet restore
  • dotnet build
  • dotnet run

To make the build more consistent across platforms, we'll create a Dockerfile; this will also enable us to build locally without having to install Mono on our local development environments. This is a multi-stage Dockerfile that builds the application and creates the runtime image.

More information on how to create a Dockerfile for .NET Core applications: https://docs.docker.com/engine/examples/dotnetcore/

FROM pomma89/dotnet-mono:dotnet-2-mono-5-sdk AS build-env
WORKDIR /app

ENV FrameworkPathOverride /usr/lib/mono/4.7.2-api/

# Copy csproj and restore as distinct layers
COPY *.csproj ./
RUN dotnet restore

# Copy everything else and build
COPY . ./
RUN dotnet publish -c Release -o out -f net472

# Build runtime image for ARM v5 using mono
FROM arm32v5/mono:5.20
WORKDIR /app
COPY --from=build-env /app/out .
EXPOSE 5000
ENV ASPNETCORE_URLS http://*:5000
ENTRYPOINT [ "mono", "dotnet.mvc.exe" ]

Since we are building with Mono, the resulting executable in this case is the dotnet.mvc.exe file. The environment variable ASPNETCORE_URLS is required in order to listen on any network interface; by default it will only listen on localhost, which is in fact the container itself and not our host. Combined with EXPOSE 5000, it's possible to access the application from outside the container.

The secret ingredient to make it work on a Raspberry Pi Zero (ARM32v6) is the line "FROM arm32v5/mono:5.20", which takes a base Docker image using a compatible CPU architecture.

To run this application locally we use the traditional dotnet run, but since we've specified two target frameworks, the -f parameter is required to specify which one we want to use. In this case, chances are we're using netcoreapp2.2 locally and net472 to build and run on the Raspberry Pi.

$ dotnet run -f netcoreapp2.2

Once we're happy the application works as expected locally, we can build the Docker image locally to test that the whole Docker build process works before deploying to the device. To test this way, we have to modify the second FROM line and remove the "arm32v5/" from the image name, taking the PC version of Mono instead.
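
In other words, for the local test the runtime stage simply starts with:

FROM mono:5.20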

The following sequence shows the docker commands.

Build the image locally

$ docker build -t test .

Either run in interactive mode -i -t or run in detached mode -d

$ docker run --rm -it -p 3000:5000 test
$ docker run --rm -d -p 3000:5000 test

Once it’s running, we can just browse to http://localhost:3000/ and verify it’s up and running.

Now we can go back and modify the second FROM line by putting “arm32v5/” where it was.

# build the image and tag it to my docker hub registry 
$ docker build -t abelperezok/mono-mvc:arm32v5-test .
# don’t forget to log in to docker hub
$ docker login
# push this image to docker hub registry 
$ docker push abelperezok/mono-mvc:arm32v5-test

Once it's uploaded to the registry, we can connect to the Raspberry Pi and run the following docker run command.

# pull and run the container from this image 
$ docker run --rm -d -p 3000:5000 abelperezok/mono-mvc:arm32v5-test

Helpful links

https://www.c-sharpcorner.com/article/running-asp-net-core-2-0-via-mono/

https://stackoverflow.com/questions/44770702/build-nuget-package-on-linux-that-targets-net-framework

https://www.mono-project.com/download/stable/#download-lin-debian

https://andrewlock.net/building-net-framework-asp-net-core-apps-on-linux-using-mono-and-the-net-cli/

https://hub.docker.com/r/pomma89/dotnet-mono/