SCIM in a nutshell

By on February 21st, 2021

System for Cross-domain Identity Management (SCIM) is a standard set of REST API routes you can add to your API, enabling an SSO provider to manage users in your application.

There are two SCIM RFCs. They work together to define a comprehensive SCIM implementation. Despite the breadth, the RFCs are surprisingly readable if you are a software developer with a few years of experience in web tech.

The RFCs are strongly recommended reading. Or, honestly just skip this guide and go straight to the sources:

SCIM Protocol – RFC 7644
SCIM Data Schemas – RFC 7643

Why SCIM? Say you have a customer that’s a large enterprise. Rather than making them manually add and remove users in your app, drifting out of sync with their internal directory, your customer can use SCIM. SCIM handles syncing users from their own directory into your app. Your customer might use Microsoft Active Directory (AD) and set up SCIM so that their AD users are synced into your app. That way, your customer doesn’t have to figure out your user management system when they hire a new employee. And when an employee leaves for a new job, nobody at your customer has to dig into your system to remove them.

SCIM’s “de-provisioning” of users helps your customer avoid a potentially large security hole. If a disgruntled employee leaves the company, they’ll almost certainly get their email disabled. But 3rd party apps, such as yours, could be forgotten. Enter SCIM – apps that support it can automatically remove inactive employees – saving time and reducing risk.

This overview of SCIM is written for a developer implementing SCIM user provisioning and de-provisioning in their SaaS application.

SCIM vs SAML

SCIM is different from Single Sign-On standards like SAML. Both involve a 3rd party identity provider. SAML lets users log into your app with a 3rd party identity provider. SCIM lets a 3rd party identity provider add and remove users in your app.

SCIM Auth

A third party identity provider can authenticate to your SCIM endpoints using a variety of methods, which the RFC leaves out of scope. In practice this will be OAuth, HTTP basic auth, or bearer tokens. Your server must return a WWW-Authenticate header describing the authentication methods available.
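
Here is a minimal sketch of guarding SCIM routes behind a shared bearer token in Express. The route prefix, environment variable, and token handling are assumptions for illustration, not part of the SCIM spec.

const express = require('express');
const app = express();

// Shared secret given to the identity provider out of band (an assumption).
const SCIM_BEARER_TOKEN = process.env.SCIM_BEARER_TOKEN;

app.use('/scim/v2', (req, res, next) => {
  const auth = req.get('Authorization') || '';
  if (auth === `Bearer ${SCIM_BEARER_TOKEN}`) return next();
  // Advertise the supported scheme so the client knows how to authenticate.
  res.set('WWW-Authenticate', 'Bearer realm="scim"');
  res.status(401).json({
    schemas: ['urn:ietf:params:scim:api:messages:2.0:Error'],
    status: '401',
    detail: 'Authentication required'
  });
});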

SCIM API Expected Endpoints

  • /Users, /Users/{id}, /Users/.search (GET, POST, PUT, PATCH, DELETE): manage users. Search is POST only.
  • /Groups, /Groups/{id}, /Groups/.search (GET, POST, PUT, PATCH, DELETE): manage groups. Search is POST only.
  • /Me (GET, POST, PUT, PATCH, DELETE): alias for the User resource of the currently authenticated user.
  • /ServiceProviderConfig (GET): configuration information about the service provider (your API server).
  • /ResourceTypes (GET): list supported resource types.
  • /Schemas (GET): list supported schemas.
  • /Bulk (POST): perform many updates at once, to one or more resources.

MIME type for SCIM

The MIME type for SCIM is application/scim+json, which can be used in the Accept and Content-Type headers. In other words, an identity provider will speak SCIM JSON to your server, and you must speak SCIM JSON back.
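
As a sketch of what that can look like in Express, the snippet below accepts SCIM JSON request bodies and sets the SCIM MIME type on responses. The /scim/v2 prefix is an assumption; body-parser’s type option is what lets one JSON parser handle both content types.

const express = require('express');
const app = express();

// Parse both plain JSON and SCIM JSON request bodies.
app.use(express.json({ type: ['application/scim+json', 'application/json'] }));

// Reply with the SCIM MIME type on every SCIM route.
app.use('/scim/v2', (req, res, next) => {
  res.type('application/scim+json');
  next();
});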

Response Error JSON

In addition to the normal HTTP status code, your server needs to echo the status code in the JSON body, per the RFC, along with the error message schema. Two other fields, detail and scimType, are recommended.

{
  "schemas": ["urn:ietf:params:scim:api:messages:2.0:Error"],
  "status": "409",
  "detail": "this is a user readable message - user already exists",
  "scimType": "uniqueness"
}

scimType applies to 400 and 409 status codes, and may be required in your API’s response.

The scimType values and when to use them:

  • invalidFilter: bad filter syntax on a POST search, or a bad PATCH filter.
  • tooMany: the server doesn’t want to return that many results; processing the request would be too resource intensive.
  • uniqueness: the resource cannot be created or updated because a value that is supposed to be unique already exists on another resource.
  • mutability: the resource cannot be updated because a field being modified is immutable (read only, cannot be changed).
  • invalidSyntax: a search request or bulk request had bad syntax.
  • invalidPath: invalid URL path for SCIM.
  • noTarget: the path for an attribute does not match any results.
  • invalidVers: wrong SCIM protocol version, or a version unsupported by the API.
  • sensitive: the request contained personal or private information in the URI (URL), which is not allowed per the SCIM spec. For example, a server may disallow filtering by name in the URL querystring because that is personal information.
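
A small helper can keep these error responses consistent. This is a sketch; the field names follow RFC 7644’s error schema, and the status-as-string convention mirrors the example above.

// Build and send a SCIM error response (sketch).
function scimError(res, status, detail, scimType) {
  const body = {
    schemas: ['urn:ietf:params:scim:api:messages:2.0:Error'],
    status: String(status), // the RFC represents the status as a JSON string
    detail
  };
  if (scimType) body.scimType = scimType;
  return res.status(status).type('application/scim+json').json(body);
}

// Example: a duplicate user on create
// scimError(res, 409, 'User already exists', 'uniqueness');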

Uniqueness

A resource id need not be globally unique, but it must be unique in combination with the externalId. The externalId is controlled by the identity provider. If you are using globally unique IDs for users, this should not be a problem. However, if you have constraints such as globally unique email addresses, and the SCIM partner expects those to be unique only per externalId, additional work may be required to adjust your SaaS schema.

Creating and Modifying Resources

When the identity provider creates a user or a group, the request may look like the example below.

If the user or group already exists, you must reply with status code 409 and scimType: uniqueness.

Create user request:

{
     "schemas":["urn:ietf:params:scim:schemas:core:2.0:User"],
     "userName":"bjensen",
     "externalId":"bjensen",
     "name":{
       "formatted":"Ms. Barbara J Jensen III",
       "familyName":"Jensen",
       "givenName":"Barbara"
     }
   }

and example response from your server:

{
     "schemas":["urn:ietf:params:scim:schemas:core:2.0:User"],
     "id":"2819c223-7f76-453a-919d-413861904646",
     "meta":{
       "resourceType":"User",
       "created":"2011-08-01T21:32:44.882Z",
       "lastModified":"2011-08-01T21:32:44.882Z",
       "location":
   "https://example.com/v2/Users/2819c223-7f76-453a-919d-413861904646",
       "version":"W\/\"e180ee84f0671b1\""
     },
     "name":{
       "formatted":"Ms. Barbara J Jensen III",
       "familyName":"Jensen",
       "givenName":"Barbara"
     },
     "userName":"bjensen",
     "emails":[
       {
           "value":"bjensen@example.com"
       },
       {
           "value":"babs@jensen.org"
       }
     ]
   }

Creating via POST, and updating via PUT (replace the resource) or PATCH (modify only the passed fields), must return the entire current resource and include an ETag header.

Creates must return 201 on success. If the create happens during a Bulk operation (below), that operation’s entry in the bulk response will carry a 201 status, but the bulk request itself will return a 200.
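
Putting the pieces together, a create-user route might look like the sketch below. findUserByUserName, createUser, and toScimUser are hypothetical application helpers (not library calls), scimError is the helper sketched earlier, and the ETag derivation is just one possible approach.

const crypto = require('crypto');

// Derive a weak ETag from the serialized resource (one possible approach).
function etagFor(resource) {
  const hash = crypto.createHash('sha1').update(JSON.stringify(resource)).digest('hex');
  return `W/"${hash.slice(0, 16)}"`;
}

// `app` is the Express app from the auth sketch above.
app.post('/scim/v2/Users', async (req, res) => {
  const { userName, externalId, name, emails, active } = req.body;

  if (await findUserByUserName(userName)) {
    return scimError(res, 409, 'User already exists', 'uniqueness');
  }

  const user = await createUser({ userName, externalId, name, emails, active });
  const scimUser = toScimUser(user); // includes schemas, id, meta, etc.

  res.set('ETag', scimUser.meta.version || etagFor(scimUser));
  return res.status(201).type('application/scim+json').json(scimUser);
});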

Minimum Required Fields

Even if you are looking to do the bare minimum amount of integration work with SCIM, you must still include the required resource fields from RFC 7643.

The generally required format for any resource is:

{
  "id": "case-sensitive globally unique or unique when combined with externalId",
  "externalId": "case-sensitive set by provisioning client",
  "meta": {
    "resourceType": "User or Group or other resource name",
    "created": "when created on the responding server, format ISO8601-UTC timestamp string is probably best (2008-01-23T04:56:22Z)",
    "lastModified": "last modified by responding server, in same format as created"
    "location": "full URI of this specific resource - as if it were a GET request",
    "version": "the weak (prefixed with W/) ETAG (entity-tag) of this resource, if it were to be fetched individually with a GET request"
  }
}

Minimum Recommended Fields for User Resource

Only schemas and userName are required by the spec (in addition to the general resource properties above). However, omitting all names and email addresses may cause undefined behavior in certain clients or servers. It is therefore strongly recommended to include either displayName or name, plus emails.

The boolean property active is also not required, but for purposes of deprovisioning a user, it may be expected by the identity provider. A user with "active": false should not be allowed to log in.

employeeNumber may be expected if the identity provider implements the “Enterprise User” schema extension urn:ietf:params:scim:schemas:extension:enterprise:2.0:User.

This is the minimum recommended set of fields, in addition to the general resource fields above; it is not identical to what RFC 7643 strictly requires.

{
  "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
  "userName": "unique (may be only by externalId) but case-INSENSITIVE user name for authentication; could be an email address and under SAML SSO should be equal to the NameID",
  "displayName": "name of user suitable for display to users",
  "name": {
    "formatted": "if not passing first, last, and other parts of name",
    "familyName": "aka last name, if not passing formatted",
    "givenName": "aka first name, if not passing formatted",
  },
  "active": true,
  "emails": [{
    "value": "email address, canonicalized according to RFC5321",
    "display": "optional but may indicate the 'primary' or 'work' email, in addition to 'home' or 'other' which may be less desirable"
  }],
  "employeeNumber": "if schemas has 'urn:ietf:params:scim:schemas:extension:enterprise:2.0:User' , alphanumeric value unique to the externalId"
}

Minimum response to /ServiceProviderConfig request

There are some assumptions below about what you have implemented in your server.

Even if a feature is disabled via the supported boolean, RFC 7643 still requires returning additional information about the feature. A few other notes:

  • maxPayloadSize is in bytes, integer (no scientific notation). The example shows 1 MB.
  • etag is required per the SCIM schema RFC for many responses, but can be disabled in ServiceProviderConfig, so we include it as enabled below.
  • authenticationSchemes is not required by the RFC, but the consuming client may require it, because it may not have implemented code to infer the scheme from a WWW-Authenticate header. The fields listed in the example are all required.

Assuming you are doing the minimum necessary and have many features omitted, here is an example:

{
  "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServiceProviderConfig"],
  "patch": { "supported": false },
  "bulk": { "supported": false, "maxOperations": 1, "maxPayloadSize": 1000000 },
  "filter": { "supported": false, "maxResults": 1 },
  "changePassword": { "supported": false },
  "sort": { "supported": false },
  "etag": { "supported": true },
  "authenticationSchemes": [{
    "type": "required - one of: 'oauth', 'oauth2', 'oauthbearertoken', 'httpbasic', 'httpdigest'",
    "name": "required - friendlier name of the value at type, like 'HTTP Basic Auth'",
    "description": "required - any additional information for an implementer to know"
  }]
}

Query Features

Sorting, pagination, attributes, and filters are encouraged, but optional.

Sorting queries may include a sortBy and sortOrder field. Default sorting must be ascending. Client example:
sortBy=userName&sortOrder=descending

Pagination queries include a startIndex integer, the 1-based index where the results must start, and a count integer, the maximum number of results to return. Pagination does not require any locking or cursors to be stored. Clients must handle situations like results being added or removed while they are paginating.
Client example: ?startIndex=1&count=10
Server response example:

{
     "totalResults":100,
     "itemsPerPage":10,
     "startIndex":1,
     "schemas":["urn:ietf:params:scim:api:messages:2.0:ListResponse"],
     "Resources":[{
       ...
     }]
   }
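
Translating the pagination parameters above into a handler might look like the sketch below. queryUsers and toScimUser are hypothetical application helpers, and the defaults are assumptions.

// `app` is the Express app from the earlier sketches.
app.get('/scim/v2/Users', async (req, res) => {
  // SCIM's startIndex is 1-based; default to the first page.
  const startIndex = Math.max(parseInt(req.query.startIndex, 10) || 1, 1);
  const count = req.query.count !== undefined
    ? Math.max(parseInt(req.query.count, 10) || 0, 0)
    : 100; // assumed default page size

  const { total, rows } = await queryUsers({
    offset: startIndex - 1, // convert to a 0-based offset
    limit: count
  });

  res.type('application/scim+json').json({
    schemas: ['urn:ietf:params:scim:api:messages:2.0:ListResponse'],
    totalResults: total,
    itemsPerPage: rows.length,
    startIndex,
    Resources: rows.map(toScimUser)
  });
});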

Attributes let the client specify which fields should be returned, or exclude attributes from the default set. For a request to GET /Users/{id}, the client may include a querystring of ?attributes=displayName to return only the display name. Or, if they do not want any of the name attributes, they can pass ?excludedAttributes=name.

Filtering is encouraged, but optional. It defines a small query syntax of short operators that a client can pass to a resource request. The operators are:

  • eq: equal
  • ne: not equal
  • co: contains
  • sw: starts with
  • ew: ends with
  • pr: present (has a value; is not null or empty)
  • gt: greater than
  • ge: greater than or equal to
  • lt: less than
  • le: less than or equal to
  • and: logical AND linking two or more expressions
  • or: logical OR linking two or more expressions
  • not: logically inverts an expression

Parentheses ( ) are used to group expressions, and square brackets [ ] filter within complex multi-valued attributes (for example, emails[type eq "work"]).

Here are some example filters from RFC 7644:

filter=userName eq "bjensen"

filter=name.familyName co "O'Malley"

filter=userName sw "J"

filter=urn:ietf:params:scim:schemas:core:2.0:User:userName sw "J"

filter=title pr

filter=meta.lastModified gt "2011-05-13T04:42:34Z"

filter=meta.lastModified ge "2011-05-13T04:42:34Z"

filter=meta.lastModified lt "2011-05-13T04:42:34Z"

filter=meta.lastModified le "2011-05-13T04:42:34Z"

filter=title pr and userType eq "Employee"

filter=title pr or userType eq "Intern"

filter=
 schemas eq "urn:ietf:params:scim:schemas:extension:enterprise:2.0:User"

filter=userType eq "Employee" and (emails co "example.com" or
  emails.value co "example.org")

filter=userType ne "Employee" and not (emails co "example.com" or
  emails.value co "example.org")

filter=userType eq "Employee" and (emails.type eq "work")

filter=userType eq "Employee" and emails[type eq "work" and
  value co "@example.com"]

filter=emails[type eq "work" and value co "@example.com"] or
  ims[type eq "xmpp" and value co "@foo.com"]
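
The full filter grammar is a real parser project. If you only want to support the simplest case, a sketch like the one below handles attribute eq "value" and rejects everything else with an invalidFilter error (reusing the scimError helper sketched earlier).

// Parse only the simple `attribute eq "value"` form (sketch).
const SIMPLE_EQ = /^([\w.]+) eq "([^"]*)"$/;

function parseSimpleFilter(filter) {
  const match = SIMPLE_EQ.exec(filter);
  if (!match) return null;
  return { attribute: match[1], value: match[2] };
}

// Inside a GET /Users handler:
// const parsed = parseSimpleFilter(req.query.filter || '');
// if (req.query.filter && !parsed) {
//   return scimError(res, 400, 'Unsupported filter expression', 'invalidFilter');
// }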

Bulk

SCIM bulk operations are used to pass a variety of changes at once. It’s a way of stuffing what would be separate HTTP requests into a JSON array of objects. Each of the Operations has a method (like POST, PUT, etc.) and a path – similar to a normal HTTP request.

Bulk requests will return status code 200 at the HTTP level, but may report different status codes for each of the sub-operations.

Example bulk request:

POST /v2/Bulk
   Host: example.com
   Accept: application/scim+json
   Content-Type: application/scim+json
   Authorization: Bearer h480djs93hd8
   Content-Length: ...

   {
     "schemas": ["urn:ietf:params:scim:api:messages:2.0:BulkRequest"],
     "Operations": [
       {
         "method": "POST",
         "path": "/Groups",
         "bulkId": "qwerty",
         "data": {
           "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
           "displayName": "Group A",
           "members": [
             {
               "type": "Group",
               "value": "bulkId:ytrewq"
             }
           ]
         }
       },
       {
         "method": "POST",
         "path": "/Groups",
         "bulkId": "ytrewq",
         "data": {
           "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
           "displayName": "Group B",
           "members": [
             {
               "type": "Group",
               "value": "bulkId:qwerty"
             }
           ]
         }
       }
     ]
   }

The bulkId is a transient identifier scoped to the bulk request. It lets operations in the same request reference each other’s results (for example, bulkId:qwerty above) before the server has assigned real ids.

The example response to the above bulk request is:

{
     "schemas": ["urn:ietf:params:scim:api:messages:2.0:ListResponse"],
     "totalResults": 2,
     "Resources": [
       {
         "id": "c3a26dd3-27a0-4dec-a2ac-ce211e105f97",
         "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
         "displayName": "Group A",
         "meta": {
           "resourceType": "Group",
           "created": "2011-08-01T18:29:49.793Z",
           "lastModified": "2011-08-01T18:29:51.135Z",
           "location":
   "https://example.com/v2/Groups/c3a26dd3-27a0-4dec-a2ac-ce211e105f97",
           "version": "W\/\"mvwNGaxB5SDq074p\""
         },
         "members": [
           {
             "value": "6c5bb468-14b2-4183-baf2-06d523e03bd3",
             "$ref":
   "https://example.com/v2/Groups/6c5bb468-14b2-4183-baf2-06d523e03bd3",
             "type": "Group"
           }
         ]
       },
       {
         "id": "6c5bb468-14b2-4183-baf2-06d523e03bd3",
         "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"],
         "displayName": "Group B",
         "meta": {
           "resourceType": "Group",
           "created": "2011-08-01T18:29:50.873Z",
           "lastModified": "2011-08-01T18:29:50.873Z",
           "location":
   "https://example.com/v2/Groups/6c5bb468-14b2-4183-baf2-06d523e03bd3",
           "version": "W\/\"wGB85s2QJMjiNnuI\""
         },
         "members": [
           {
             "value": "c3a26dd3-27a0-4dec-a2ac-ce211e105f97",
             "$ref":
   "https://example.com/v2/Groups/c3a26dd3-27a0-4dec-a2ac-ce211e105f97",
             "type": "Group"
           }
         ]
       }
     ]
   }
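
A heavily simplified bulk handler might look like the sketch below: run the operations in order, remember which real id each bulkId resolved to, and report per-operation results. applyOperation is a hypothetical helper, the sketch ignores failOnErrors, circular references, and most error detail, and the response shape follows the RFC’s BulkResponse message.

// `app` is the Express app from the earlier sketches.
app.post('/scim/v2/Bulk', async (req, res) => {
  const bulkIdMap = {}; // bulkId -> real id assigned by the server
  const results = [];

  for (const op of req.body.Operations || []) {
    try {
      // Replace any "bulkId:xyz" references that have already resolved.
      const data = op.data
        ? JSON.parse(JSON.stringify(op.data).replace(/bulkId:(\w+)/g, (m, id) => bulkIdMap[id] || m))
        : undefined;

      const resource = await applyOperation(op.method, op.path, data); // hypothetical
      if (op.bulkId && resource) bulkIdMap[op.bulkId] = resource.id;

      results.push({
        method: op.method,
        bulkId: op.bulkId,
        location: resource && resource.meta.location,
        status: op.method === 'POST' ? '201' : '200'
      });
    } catch (err) {
      results.push({ method: op.method, bulkId: op.bulkId, status: '400' });
    }
  }

  res.status(200).type('application/scim+json').json({
    schemas: ['urn:ietf:params:scim:api:messages:2.0:BulkResponse'],
    Operations: results
  });
});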

Implementing a partial SCIM API

The best way to know which resources and routes are required for a compliant SCIM API is to read RFC 7644. However, there are some shortcuts that may make it possible to implement fewer parts of SCIM without catastrophic results. Only through testing will you know.

Perhaps your SaaS does not have user groups. Perhaps a special user should not be deleted. Perhaps you do not feel like implementing the complex filtering language. Whatever the reason, the following strategies may help for the parts of SCIM you are not implementing:

  • List only the few supported operations of your server on the schemas and/or service provider config responses
  • Return 501 Not Implemented
  • Return 403 Forbidden for routes which should be implemented as part of the spec, but are not (this isn’t recommended, but possible to do)

Private GitHub Issues Have Public Images

By on April 11th, 2018

And you can’t delete uploaded images without contacting support.

  • Go to a private GitHub repository.
  • Create a new issue.
  • In the issue description or a comment, insert an image.
  • Save the description/comment.
  • Now click the image – it will redirect to the raw image URL in your browser.
  • Copy the URL and share it with your friends – it’s public.
  • Now delete the issue or comment with the image.
  • The URL is still public.

I reported this to GitHub support, and while they agree it is probably not optimal, they do not have a plan to change the behavior (as of 4/11/2017).

GitHub documents the behavior here.

Firebase Pros and Cons for a SaaS Company

By on April 6th, 2018

Why Firebase

  • Realtime / streaming updates are pretty easy.
  • Getting a whole app off the ground is fast – auth, email, versioning, hosting, monitoring, devops, uptime.
  • The data structure is JSON which maps perfectly to UI JavaScript.
  • Libraries across programming languages are similar and pretty good.
  • Cheap/free initially.
  • Easy-ish to integrate with analytics and crash monitoring.
  • All you need for the backend of a small mobile app.
  • While schemaless, it has some basic ability to validate data types.
  • It should scale very well with certain kinds of traffic (reads and writes are not on shared objects).
  • Minimal, or no, knowledge of devops/sysadmin is necessary.
  • They now have decent tools to migrate your data out of it.

Why Not Firebase

  • Not a boring established technology.
    • You only get so many innovation tokens.
  • Your entire backend is proprietary (BaaS), owned and run by another company. If they shut down Firebase, you have to rewrite everything on their timeline instead of moving it.
    • This happened with a nearly identical service called Parse.
    • Parse was purchased by Facebook, then shut down. (Google bought Firebase but seems to be investing in it, and its replacement).
    • Google shuts things down all the time.
  • Firebase is somewhat deprecated in favor of Cloud Firestore.
  • Exceptionally expensive at scale compared to REST.
  • Not really possible to expose an API spec (swagger) with cloud functions.
  • Proprietary – complete lock-in:
    • Migrating off means rewriting all backend tests and much backend code. This is more dangerous than “just” rewriting the code and not the tests, because the tests make sure you didn’t mess something up during the migration.
    • Firebase is pretty unique in the way you interact with the APIs and realtime components, making a frontend migration a massive ordeal also.
  • Impossible to develop the app without an internet connection.
  • Security and data validation setup is tricky, and it cannot be unit tested.
    • Security and data validations are strings in a JSON file.
    • Must run security integration tests against a deployed app.
  • Having your main database public is a highly discouraged practice.
    • This may not be fully fair, but it is very easy to misconfigure things and expose data that shouldn’t be.
    • Normally databases listen only on private interfaces, or at least use IP restrictions.
  • You must duplicate any business logic across all systems that interact with Firebase.
    • Strong anti-pattern for architecture and maintenance
  • Cloud functions are good for one-off tasks:
    • good for formatting a PDF invoice and sending it
    • good for processing a batch job
    • good for individual tasks delegated by a message bus
    • bad for a bunch of similar data update routes
    • bad for structuring and testing a large REST API
  • Unit testing Firebase backend functions is way more complicated than for a regular REST API.
  • Querying and aggregating are limited compared to SQL or popular NoSQL databases like MongoDB.
  • Transactions have some odd behavior – they might get “rerun” in the case of conflicts.
  • Database migrations are not supported at all.
    • red flag – basic service development
    • a few band-aid 3rd party open source solutions exist
    • means hand writing a migration framework that works with unit tests
  • Firebase recommends duplicating data because JOINs are unsupported.
    • red flag – architecture
    • not a problem in other NoSQL databases
    • this is a core competency of relational databases (SQL)
  • Integration with outside APIs while maintaining good testing is not as simple as with a regular server.
    • Stripe, for example: you need to expose backend webhook routes and test them fairly well.
  • You are at the mercy of Firebase’s upgrade cycle.
    • If they decide to change or break something and force you to upgrade, you do not have a choice on the timeline, if there is one.
  • Optimized for realtime apps.
    • Only a downside if you don’t have a realtime app.
    • Many of the realtime benefits of Firebase can be had with a plethora of popular open source technologies.
  • Later services you write will likely not be built on Firebase.
    • It reduces the surface area of your tech stack to pick a more boring database technology that will scale a long way and can be used for multiple services.
    • If you built your tech stack on say, Node.js or Go, each service would have similar paradigms.
    • With Firebase, now you have all the Firebase paradigms (coding, testing, build and deploying), plus your second service’s paradigms (coding, testing, build and deploying).

All these complaints are probably acceptable when you have a small mobile app, a basic website or small ecommerce store, or something that will not grow beyond one server.

Building the core of a SaaS on Firebase, though, is not going to work. There are many blog posts about companies who hit a wall with Firebase, and eventually migrated off it. If you are building a SaaS, the question doesn’t seem to be if you will move off Firebase, but when.

 

Related:

https://crisp.chat/blog/why-you-should-never-use-firebase-realtime-database/

https://news.ycombinator.com/item?id=12526432

https://medium.com/@reactsharing.com/5-reasons-to-not-use-firebase-for-a-big-project-81b543c77e8c

https://firebase.googleblog.com/2017/09/why-we-migrated-to-firebase-and-gcp.html

 

VS Code, npm run-scripts, and mocha

By on June 29th, 2016

vscode touts itself as being easy to use for out-of-the-box Node.js debugging – which is true for single scripts – but debugging node-based executables (npm, npm scripts, mocha) takes additional setup depending on your environment.

atom to vscode: faster with integrated debugger

Recently I switched from atom to vscode, after being impressed by a talk at npm conf. I only had two complaints with atom that started to weigh on me – the often sluggish performance, and the lack of solid debugging for Node.js. (There are some 3rd party packages that try, but it is not core to the editor and often breaks.)

vscode is like greased lightning. It’s everything I missed from SublimeText 2, plus solid debugging and JavaScript support.

Setting up vscode for debugging basic node scripts is supported out of the box. The editor even generates working debug defaults at .vscode/launch.json. But if you want to debug npm scripts or other node-based executables, it is not straightforward at first. Turns out it’s pretty easy though.

I exclude .vscode/ in .gitignore because it ends up having settings specific to my environment and workflow.

The trick to running npm scripts or node executables is to use a hardcoded path to npm to launch them. So for example, in package.json:

{
    "scripts": {
        "mocha": "mocha"
    }
}

and in .vscode/launch.json:

npm run mocha: use a hardcoded path to your npm executable. You can obtain it from the terminal with which npm.
{
    "name": "mocha",
    "type": "node",
    "request": "launch",
    "program": "/Users/jeffparrish/.nvm/versions/node/v6.3.1/bin/npm",
    "stopOnEntry": false,
    "args": ["run", "mocha"],
    "cwd": "${workspaceRoot}",
    "env": {
        "NODE_ENV": "test"
    }
}

At first it seemed like it would be possible to do something like:

DOES NOT WORK
{
    "program": "/usr/bin/env npm",
    "args": ["run", "mocha"]
}

or:

DOES NOT WORK
{
    "program": "${workspaceRoot}/node_modules/.bin/mocha",
}

but both failed.

 

Packaging Node.js Apps for Easy Deployment

By on May 12th, 2016

After 5 years in Node land, there’s nothing sadder than taking hours to deploy your new app. It was a breeze to develop and run locally, but throwing it on staging or production can become a beast of a task if you aren’t careful. There are so many guides with complicated deploy and server setup patterns, because Node, npm, and build tools all need to be installed on the server.

Easy deploys and easy rollbacks are my goal here. I usually just don’t want to hassle with a bunch of infrastructure. Docker is probably a nice tool if you have 50 servers, but that isn’t the case for most of us.

What I am not using

nexe

  • wonderful tool for producing small-ish Node executables
  • project maintenance has become questionable and it breaks a lot
  • native modules don’t work, despite what they may hint at

jxcore

  • compile your node app into a binary
  • more of a node.js competitor than node.js tool
  • for me, it broke a lot and feels early days (for example, at the time of writing, their website is unreachable, which is not a surprise)

nw.js (formerly node-webkit)

  • awesome toolset for making desktop node apps
  • build on one platform for another platform
  • smallest archive size is around 80mb
  • does not really work from the terminal
  • you can launch in background mode, but not primary use case

electron and electron ecosystem

  • TBD
  • also desktop-focused

What I use – nar

nar probably stands for “node archive” but who knows.

This tool just works. With it, you can package a Node.js app into an executable archive. It compresses pretty small – I have decent-sized Express APIs that come out around 15–20 MB.

You can actually build it on OSX, specify Linux as the target, and deploy to Linux – and it works.

Deployment workflow using nar

  1. build the executable
  2. scp the executable to the server
  3. (optional) add a startup script
  4. (optional) add logrotate script

Things we don’t have to really do:

  • build on the deployment environment
  • npm install
  • archive the app for transfer
  • install node and npm, build-essentials or whatever, on the app server

I consider nar a huge win for easy deployments because you can deploy to a fresh server from your local machine, like heroku or modulus, but using cheap VPS boxes.

What I Learned in a Two Week Microservice Bug Hunt

By on November 3rd, 2015

After a few years building platforms using (mostly) Node.js microservices, I thought I could troubleshoot problematic situations in minimal time, regardless of coding style, or lack thereof. It turns out – nope.

The two week bug hunt

There was this tiny bug where occasionally, a group-routed phone call displayed as if someone else answered. Should be easy to find. Except call logs touch several services:

  • service for doing the calls and mashing data from the call into call logs
  • internal service for CRUDding the call logs
  • external gateway api for pulling back calls that massages and transforms the data
  • native app display
  • 2 databases

Tripup One: complex local environment setup

We skipped setting things up with docker-compose or another tool that can spin up the whole environment locally in one command. This is a must these days. It would take 7 terminals to fire up the whole environment, plus a few databases and background services – and each service needs its own local config. There would still be phone service dependencies (these would be mocked in an ideal world) and external messaging dependencies (Respoke).

Not being able to spin up the whole environment means you better have excellent logging.

Tripup Two: not enough logging

Aggregated logs are the lifeblood of microservices, especially dockerized or load-balanced ones.

We use the ELK stack for log management. Elasticsearch, Logstash, and Kibana are wonderful tools when they have consumed all server resources and blocked the user interface processing data.

For these particular bugs, there was insufficient logging, and the problems only occurred when all of the microservices talked together. Because we have some special Asterisk hardware and phone number providers, it is a lot of work (if not impossible) to spin up the entire environment locally.

Thus, at first I started by adding a few logs here and there in the service. It was a round of Add Logs – PR – Deploy – Test – Add Logs – PR – Deploy – Test. Eventually I just added a ton of logging, everywhere.

I have this fear that I will add too much logging and it will cause things to go down, or get in the way. With few exceptions, you can’t have too much logging when things break. You can have bad log search. Also, at this point I have decided that the ELK stack will always consume all resources, so you might as well log everything anyway.

Tripup Three: forgotten internal supporting library

There was an internal library, written in the early days of the project, which had:

  • a unique coding style
  • no tests
  • poor commit messages
  • no comments
  • generic naming of variables and methods
  • several basic bugs in unused code paths

As it turned out, none of the bugs in this library were causing problems because those code paths were not in use. Nonetheless, I spent a full day understanding it.

Tripup Four: code generation in functional and unit tests

I am a firm believer, now, that DRY (don’t repeat yourself) has no place in unit tests, and probably not in functional tests either. Here are common things I ran into:

  • test setup has multiple layers of describe() and each has beforeEach()
  • beforeEach() blocks used factories which assigned many values to uuid.v4(), then further manipulated the output
  • layers of generated test values are impossible to debug

It’s best just to be explicit. Use string literals everywhere in unit tests. Minimize or eliminate nested describe()s.
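
As a contrast to layered factories, an explicit mocha test might read like the sketch below. buildCallLog and its module path are hypothetical; the point is that every value in the test is a literal you can see.

const assert = require('assert');
const { buildCallLog } = require('../lib/call-log'); // hypothetical module under test

describe('buildCallLog', function () {
  it('records the agent who actually answered', function () {
    const log = buildCallLog({
      callId: 'call-123',
      answeredBy: 'agent-42',
      group: 'support'
    });

    // Every expectation uses an explicit literal - nothing generated.
    assert.strictEqual(log.answeredBy, 'agent-42');
    assert.strictEqual(log.group, 'support');
  });
});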

Tripup Five: too much logic in one spot

In Node.js land, there’s no reason to have functions with complexity higher than 6 or 7 because adding a utility or library is cheap. It takes little effort to extract things into smaller and smaller functions, and use explicit and specific naming.

We had a ton of logic in Express routes/controllers. This is hard to unit test, because the realistic way to get at that logic is using supertest over mock HTTP. It’s better to make small functions and unit test input-output on those functions.

Conclusion

Eventually, I never found the bug – I found four, after careful refactoring of code and tests to the tune of several thousand SLOC.

The actual bug could have been only a one-line-fix, but finding it took weeks of coding.

Situations like this are often unavoidable. I am sure if I haven’t inflicted similar situations on colleagues in the past, I will in the future. It’s the nature of trade-offs you face when moving fast to test a business idea. The following things might help minimize that, though:

  • agree to a single .eslintrc file and never deviate
  • use a lot of small functions, and unit test them
  • don’t make a separate module without tests
  • be explicit and repeat yourself in tests
  • be able to spin up a local dev environment with minimal commands, or run against a presetup testing environment

Simple Node.js Production Setup

By on August 4th, 2015

Right from your terminal – without a third party service

./deploy www.example.com

There are a lot of articles about how to set up Node.js in production, but they don’t always cover the full thing in an automated, easily deployable way. We will review how to set up a one-line Node.js deploy from your local terminal (OSX or Linux), with very minimal code.

No fancy third party deploy services here – just a little bash and an upstart script.

Deploying Node via simple shell script

The script below will:

  • ensure Node.js is installed
  • make an archive out of your code
  • upload it via ssh to the server
  • log to a file and rotate the logs regularly to prevent filling up the disk
  • set up auto-starting on server reboot
  • set up auto-starting when the app crashes

You can reuse the script again and again to deploy your app.

While this isn’t a silver bullet, it lets you host Node.js apps on an extremely cheap VPS (virtual private server), if you like, without needing too much knowledge of server admin. VPS hosting can be orders of magnitude cheaper than cloud hosting – and faster. You can host a simple Node.js website for a dollar or two per month in many cases – extremely cheap.

Configuration (upstart .conf file) for Node.js app on Ubuntu 14.04

From inside your app directory:

touch myapp.conf # create it with your app name
chmod +x myapp.conf # make it executable

myapp.conf

####
# Edit these to fit your app
####
author "@ruffrey"
env NAME=myapp
env APP_BIN=app.js
####
# End editables
####

description "$NAME"

env NODE_BIN=/usr/bin/node

# Max open files are @ 1024 by default
limit nofile 32768 32768

start on runlevel [23]
stop on shutdown
# Respawn in case of a crash, with default parameters
respawn

script
    APP_DIRECTORY="/opt/$NAME"
    LOG_FILE="/var/log/$NAME.log"
    touch $LOG_FILE
    cd $APP_DIRECTORY
    sudo $NODE_BIN $APP_DIRECTORY/$APP_BIN >> $LOG_FILE 2>&1
end script

post-start script
  # Stanzas run in separate shells, so LOG_FILE must be defined again here.
  LOG_FILE="/var/log/$NAME.log"
  echo "\n---------\napp $NAME post-start event from upstart script\n---------\n" >> $LOG_FILE
end script

Deploy script for hosting on Ubuntu 14.04

From inside your app directory:

touch deploy # create it
chmod +x deploy # make it executable

deploy

#! /bin/sh

# immediately abort if any of these commands fail
set -e

####
# The name of your app goes here and should match the .conf file
####
APPNAME=myapp

LOGIN=$USER@$1
# Double quotes so $APPNAME expands into the real paths
LOG_FILE="/var/log/$APPNAME.log"
LOGROTATE_FILE="/etc/logrotate.d/$APPNAME"
LOGROTATE_CONFIG="'$LOG_FILE' {
    weekly
    rotate 12
    size 10M
    create
    su root jpx
    compress
    delaycompress
    postrotate
        service '$APPNAME' restart > /dev/null
    endscript
}
"

# Make sure all the pre-reqs are installed. if not, install them.
echo 'Checking that the server is setup.'
    echo '\n Build tools \n'
    ssh $LOGIN 'sudo apt-get update; sudo apt-get install -y build-essential'
    # install node
    echo '\n Node.js \n'
    ssh $LOGIN '/usr/bin/node --version || (curl -sL https://deb.nodesource.com/setup_0.12 | sudo bash -; sudo apt-get install -y nodejs)'
    # setup logrotate
    echo '\n log rotation\n'
        ssh $LOGIN "sudo rm -f '$LOGROTATE_FILE'"
        # needs to use tee because echo does not work with sudo
        ssh $LOGIN "echo '$LOGROTATE_CONFIG' | sudo tee --append '$LOGROTATE_FILE'"

echo '\n Ensuring all necessary paths exist on the server.\n'
    ssh $LOGIN " sudo mkdir -p /opt/$APPNAME; sudo chown '$USER' /opt/$APPNAME; \
        sudo mkdir -p /opt/$APPNAME-temp; sudo chown '$USER' /opt/$APPNAME-temp; \
        rm -f /opt/$APPNAME-temp/$APPNAME.tar.gz"


echo '\n Doing some housecleaning \n'
    ssh $LOGIN "rm -f /opt/$APPNAME-temp/$APPNAME.tar.gz;"
    rm -f "$APPNAME.tar.gz"


echo '\n Making the artifact \n'
    tar czf $APPNAME.tar.gz --exclude='node_modules' *
    du -h $APPNAME.tar.gz


echo '\n Copying the artifact \n'
    scp $APPNAME.tar.gz $LOGIN:/opt/$APPNAME-temp

echo '\n Setting up the new artifact on the server \n'
    ssh $LOGIN "cd /opt/$APPNAME; \
        sudo rm -rf *; \
        cp -f '/opt/$APPNAME-temp/$APPNAME.tar.gz' '/opt/$APPNAME'; \
        tar xzf '$APPNAME.tar.gz'; \
        sudo cp -f '$APPNAME.conf' /etc/init;"

echo '\n npm install\n'
    ssh $LOGIN "cd '/opt/$APPNAME'; sudo service '$APPNAME' stop; sudo /usr/bin/npm install --production;"

echo '\n Starting the app\n'
    ssh $LOGIN "sudo service '$APPNAME' restart"

echo 'Done.'
exit 0;

Deploy script usage

./deploy 255.255.255.255
# or
./deploy www.example.com

where the argument is the hostname or IP to deploy it to.

Troubleshooting

Check the upstart logs:

sudo tail /var/log/upstart/myapp.log

If those look ok, check your server logs

tail /var/log/myapp.log

Don’t stay with sudo

You should not run your app with sudo permanently (see the upstart myapp.conf script).

After starting up the app (and listening on a port, if you do that):

process.setgid('users');
process.setuid('someuser');
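
For context, here is a minimal sketch of where those calls might sit, assuming the app binds to a privileged port first. The app module path, group, and user names are placeholders.

var http = require('http');
var app = require('./app'); // hypothetical Express app

var server = http.createServer(app);

server.listen(80, function () {
  // Drop privileges only after the port is bound. Group first, then user.
  process.setgid('users');
  process.setuid('someuser');
  console.log('Listening on port 80 as uid', process.getuid());
});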

Quickstart guide to setting up a static website on Google Cloud Storage

By on July 27th, 2015

Google Cloud Storage is durable file hosting (CDN) with advanced permissions. We’ll review how to set up a static website with a custom domain name, and deploy the files using a small Node.js script.

Cost

Google Cloud Storage is at least as cheap as Amazon S3.

In most cases, it will only cost a few cents per month.

But…HTTPS is not supported

For custom domains, you cannot (yet) use an SSL certificate for HTTPS.

HTTPS is available though when using https://storage.googleapis.com/YOUR-SITE-NAME-HERE/. That is best for backend files, not a public website.

Prerequisites

1. Google Cloud Credentials

Generate new service account credentials by going to the developer console: https://console.developers.google.com.

Click Create a new client ID then select a new Service Account. A JSON file will download. Save it in your project as gcloud.json.

2. Verify site ownership

Google requires you to verify that you own the site in Google Webmaster Tools. There are several ways to do that. If your website is new, most likely you’ll need to create a TXT DNS record with your registrar. Webmaster Tools will guide you through it.

3. Create a special bucket

Files on Google Cloud Storage are grouped into “buckets.” A bucket is just a bunch of files that you want to store together. I think of it like its own drive. You can have folders under a bucket.

The bucket name must be the domain name of your website. So for http://symboliclogic.io, the bucket name would be symboliclogic.io. For http://www.symboliclogic.io, the bucket name would be www.symboliclogic.io.

Be sure to choose Standard storage. The other options are for backups and can take several seconds to be accessible. Standard class storage is fast and suitable for websites.

4. Set the default bucket permissions

You want to make all files public by default. Accomplish this by adding an access rule for allUsers which allows reading.

Do this for the Default bucket permissions, and the Default object permissions.

5. DNS record pointing to your site

After verifying ownership of your site, create a new DNS record that points your domain name to Google Cloud Storage.

It should be a CNAME type DNS record with the content c.storage.googleapis.com.

6. Upload files to the bucket with a Node.js script

First use the tool npm (bundled with Node.js) to install some dependencies into the current directory:

npm install async gcloud glob

Now put the following script at deploy.js then run it from the terminal:

node deploy.js
// Edit these to match your particular local setup.
var PATH_TO_GCLOUD_JSON = __dirname + '/gcloud.json';
var PATH_TO_STATIC_FILES = __dirname + '/build/**/*';
var GCLOUD_PROJECT_ID = 'mywebsite-project';
var GCLOUD_BUCKET_NAME = 'www.mywebsite.com';

// dependencies
var glob = require('glob')
var async = require('async');
var fs = require('fs');
var gcloud = require('gcloud');

var log = console.log;

var gcs = gcloud.storage({
    keyFilename: PATH_TO_GCLOUD_JSON,
    projectId: GCLOUD_PROJECT_ID
});

var bucket = gcs.bucket(GCLOUD_BUCKET_NAME);

glob(PATH_TO_STATIC_FILES, { nodir: true }, function (err, files) {
    if (err) {
        log(err.message, err.stack);
        return;
    }
    log(files.length, 'files to upload.');
    async.mapSeries(files, function (file, cb) {
        var dest = file.replace(__dirname + '/build/', '');
        var gFile = bucket.file(dest);
        log('Uploading', file, 'to', dest);
        bucket.upload(file, {
            destination: gFile
        }, function (err) {
            if (err) { log('ERROR', file, err.message); }
            else { log('Success', file); }
            cb(err);
        });
    }, function (err) {
        if (err) { log(err.message, err.stack); }
        else { log('Successfully uploaded files.'); }
    });
});

You should see each file logged as it uploads.

7. Check your work

Now view your website.

Notes

Google Cloud Storage is a CDN, so it can take 10+ minutes to populate your files across the CDN regions.

You might also want to add a gzip step to reduce file sizes, and set the file metadata to have a header:

Content-Encoding: gzip
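
One way to sketch that gzip step, using the same gcloud client as the deploy script: compress each file to a temporary .gz copy, upload it, then set contentEncoding on the stored object. The exact option and method names can differ between gcloud client versions, so treat this as an outline rather than a drop-in addition.

var zlib = require('zlib');
var fs = require('fs');

function uploadGzipped(bucket, localFile, dest, cb) {
    var gzipped = localFile + '.gz';
    // Compress the file to a temporary .gz copy on disk.
    fs.createReadStream(localFile)
        .pipe(zlib.createGzip())
        .pipe(fs.createWriteStream(gzipped))
        .on('finish', function () {
            bucket.upload(gzipped, { destination: dest }, function (err, file) {
                if (err) { return cb(err); }
                // Tell browsers the stored bytes are gzip-encoded.
                file.setMetadata({ contentEncoding: 'gzip' }, cb);
            });
        });
}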

The Case for Stripe in MVP Apps

By on April 25th, 2014

With so many options available for online payments, I wanted to summarize the reasons why I feel strongly that it makes sense to use Stripe when building minimum viable products.

Focus on your business, not on payments

In a recent startup, we integrated with an internal payment system.

We probably lost 6 man-months of productivity. At times, it felt like we were in the payment business – not in the business we were trying to build.

Focusing on the wrong thing is what kills startups.

Mature, easy, and well documented APIs

Stripe has top-notch APIs, and their documentation and libraries are some of the best I have ever seen for any SaaS. On several projects we’ve used Stripe’s docs as the model, but it’s deceptively difficult to produce something so complete and so simple to digest.

Other providers have solid APIs, but they may not be mature.

Some providers have bad APIs, but are mature.

Easy to find developers

Along with excellent mature APIs comes an army of developers who can help work on your Stripe integrations. For non-technical founders, you will not have a hard time finding people with extensive Stripe experience (like me!).

The best user interface

People can forget that there’s a lot more to a payment provider than the APIs, too.

With Stripe’s UI, you can answer common business questions really easily. Things that you would have to build in your web app. Things that help you assess your startup’s burn rate. Things that help you service customers without wasting time.

  • What was the payment lifecycle?
  • How much has this customer paid me?
  • Do I have customers with the same email but different customerIds?
  • How many customers do I have?
  • When is a customer’s next invoice? When was their last invoice? Did they pay it?
  • Who has expired credit cards?
  • How much revenue did I have in the past week?
  • How much have I paid in credit card fees?
  • How much money do I have in escrow with Stripe?

Try to answer all of these questions with another provider in less than a minute – you can do it with Stripe.

The user interface is so good that you can just give customer service reps limited access. No technical knowledge required.

Easy international charges

Stripe does the currency conversions automatically and you never really have to think about it. I can’t express how much time this saves over other options, and allows your startup to charge internationally much earlier.

Bitcoin

Stripe is a mature payment provider that offers BTC integration. Other providers of Bitcoin billing are not as mature, but they do work pretty well.

Excellent webhook support

For any action on Stripe, you can get a POST webhook event to your server. This is incredibly useful for building all kinds of custom integrations with your CRM, doing additional billing, tracking internal analytics, and more.

Stripe will keep retrying webhooks if your server goes down, until you receive the data and respond with a success code. That saves you from having to implement a message queue (MQ) for payment events.

Recurring payments and trials

Stripe squarely handles this exceedingly complex trap-of-a-feature that plagues many SaaS services. I continue to be impressed how easy they make recurring payments.

Because of the payment lifecycle in Stripe, you can also do advanced billing pretty easily.

Advanced billing and storing metadata

I worked with a startup that had plans with tiers of service credits. So they wanted to:

  • charge a monthly service fee
  • give X credits with the plan
  • track credit usage during the month
  • bill all customers on the 1st of the month
  • calculate any credit overage and add that to the invoice before the customer was charged

This whole billing lifecycle took only about 25 hours to implement from start to finish. Doing the same thing on other providers isn’t really possible, or requires hacky workarounds. With Stripe, it was a natural part of the recurring billing lifecycle – that means we could produce clean code without dangerous hacks. Trust me, you don’t want hacky asynchronous checks in your billing system.

Every “object” in Stripe – plan, charge, customer, etc. – can have stored metadata in the form of a simple JSON object hash. We just stored the plan limits inside the Stripe plan.metadata – so all the plan data was in one place. Then we used webhook events to add additional line items for plan overages. Stripe gives you a chance to update an invoice generated by recurring billing, before they actually charge the customer.
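
A rough sketch of that overage step, using the standard stripe Node library inside an Express webhook route: when the invoice.created event arrives, look up usage, and attach an invoice item for any overage before Stripe charges the customer. getCreditUsage, the metadata key, and the pricing are hypothetical.

var stripe = require('stripe')(process.env.STRIPE_SECRET_KEY);

// Assumes an Express `app` with JSON body parsing.
app.post('/webhooks/stripe', function (req, res) {
  var event = req.body;
  if (event.type !== 'invoice.created') { return res.status(200).end(); }

  var invoice = event.data.object;
  getCreditUsage(invoice.customer, function (err, usage) {
    if (err) { return res.status(500).end(); } // Stripe retries on non-2xx

    var included = parseInt(invoice.lines.data[0].plan.metadata.includedCredits, 10);
    var overage = Math.max(usage - included, 0);
    if (!overage) { return res.status(200).end(); }

    stripe.invoiceItems.create({
      customer: invoice.customer,
      invoice: invoice.id,
      amount: overage * 50, // e.g. 50 cents per extra credit
      currency: 'usd',
      description: overage + ' extra credits'
    }, function (err) {
      res.status(err ? 500 : 200).end();
    });
  });
});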

Where it falls short

Bank account charges. There are really limited options on the internet for charging bank accounts directly, and Stripe is not one of them. eCheck payments kind of suck anyway, because you typically have to have the customer verify their details with micro-deposits first.

3rd party transfers. Stripe did away with their transfers API and is pushing Stripe Connect. Stripe Connect is an excellent service, but it’s not as easy as it used to be – a simple ACH transfer by providing the routing number and account number. I miss those days.

Fees (maybe). The fees are about average. However, in my opinion they more than justify the time savings during development, the UI tools, and the customer service (yes, it’s pretty good, I have used it).