network game development
A hobo server running on Go
I picked the cheapest server in this universe, at $5 per month, and deployed a game prototype made in Go
I wanted to find out how many connections a bargain-basement game server running on Go can handle. In my case, it was a machine with a single-core 2 GHz CPU, 512 MB of RAM and a rather slow HDD.
If you are not familiar with Go, it is worth checking out the basics first. It's a programming language developed at Google that compiles to native binaries. Aside from a comfortable syntax, Go demonstrates decent speed; here are some benchmarks:
https://hashrocket.com/blog/posts/websocket-shooto...
https://benchmarksgame.alioth.debian.org/u64q/go.h...
http://eric.themoritzfamily.com/websocket-demo-results-v2.html

Go's key feature is goroutines. Goroutines are lightweight functions that run concurrently and that the Go runtime distributes across OS threads. If those threads run on different cores, goroutines work in parallel. Otherwise, Go's rather sophisticated scheduler interleaves them so that they still run concurrently. Here's a popular Rob Pike presentation about concurrency and parallelism: https://blog.golang.org/concurrency-is-not-parallelism
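To make the goroutine idea concrete, here is a minimal sketch that is not taken from the game server. The function name squareAll is made up for illustration. It starts one goroutine per input and waits for all of them with sync.WaitGroup.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll computes the square of every input in its own goroutine.
// The Go scheduler decides which OS thread runs each goroutine.
// (Hypothetical helper, for illustration only.)
func squareAll(inputs []int) []int {
	results := make([]int, len(inputs))
	var wg sync.WaitGroup
	for i, n := range inputs {
		wg.Add(1)
		go func(i, n int) {
			defer wg.Done()
			results[i] = n * n // each goroutine writes only its own slot
		}(i, n)
	}
	wg.Wait() // block until every goroutine has finished
	return results
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4}))
}
```

On a single-core machine like the hobo server these goroutines are interleaved rather than run in parallel, but the code stays the same.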
The challenge was to create a fast and scalable game server based on WebSockets. As of now, the server broadcasts the game state at 60 FPS, that is, roughly every 17 ms.
The game prototype is a 2D arcade game that uses free JRPG-style tiles: players can walk around, shoot and collide with walls.
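A fixed-rate broadcast like this can be sketched with time.Ticker. This is only an assumption about how such a loop might look, not the server's actual code. The names runTicks and frameInterval are hypothetical, and the broadcast callback stands in for writing to the sockets.

```go
package main

import (
	"fmt"
	"time"
)

// frameInterval is roughly 1/60 of a second, as in the article.
const frameInterval = 17 * time.Millisecond

// runTicks calls broadcast once per interval until `ticks` broadcasts
// have been made. (Hypothetical helper; a real server would loop until
// shutdown instead of counting ticks.)
func runTicks(interval time.Duration, ticks int, broadcast func(tick int)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for tick := 0; tick < ticks; tick++ {
		<-ticker.C // wait for the next frame boundary
		broadcast(tick)
	}
}

func main() {
	sent := 0
	runTicks(frameInterval, 5, func(tick int) { sent++ })
	fmt.Println("broadcasts sent:", sent)
}
```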

Download current version

After installing Go, you need to set the GOPATH environment variable.
Here are the commands to launch the stress test (keep in mind that you need Node.js and npm installed):
cd stresstest
npm install
node main
Building an online game means dealing with a lot of issues:
server setup
scalability
game economy
game design
parsing and serialization on client and server
replication algorithm
client-side prediction (right now it's just lerp)
physics on server and client
the client itself, and cross-platform support
microservices, authentication, etc.
In this article, I will tell you about the physics engine and server performance.
The game server is divided into 'rooms'. The demo includes hundreds of these rooms, each one holding a separate level and being calculated independently. A room can be thought of as a kind of sub-server.
Any player inside a room receives the state of all the other players in it. This really simplifies data filtering for the clients. For a large world you can still use a grid for filtering, while open-world titles like Planetside demand a dynamically adjusted graph, because whether another player's state is relevant depends on visibility rather than on mutual distance alone.
Physics Engine
I haven't used any of the physics engines ported to Go, because there are very few implementations and they are rather excessive for my needs. Still, you can use the Go version of Chipmunk. In my case, a smart collision detection function that returns the collision depth and vector was a must. Once you know the vector, you can iteratively push objects apart, just like most physics engines do.
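For the simplest case, two circles, such a function can be sketched as follows. This is an illustration under my own assumptions, not the code from collision2d or from the game. The Circle type and overlapVector are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// Circle is a hypothetical shape type used only for this sketch.
type Circle struct {
	X, Y, R float64
}

// overlapVector reports whether a and b intersect. When they do, it
// returns a vector that points from a towards b and whose length is
// the penetration depth, so moving b by it separates the circles.
func overlapVector(a, b Circle) (ox, oy float64, hit bool) {
	dx, dy := b.X-a.X, b.Y-a.Y
	dist := math.Hypot(dx, dy)
	depth := a.R + b.R - dist
	if depth <= 0 {
		return 0, 0, false // not touching
	}
	if dist == 0 {
		// Centres coincide, so there is no direction; pick the X axis.
		return depth, 0, true
	}
	return dx / dist * depth, dy / dist * depth, true
}

func main() {
	a := Circle{X: 0, Y: 0, R: 1}
	b := Circle{X: 1.5, Y: 0, R: 1}
	ox, oy, hit := overlapVector(a, b)
	fmt.Println(ox, oy, hit)
}
```

Applying a fraction of this vector on each iteration, with opposite signs for the two objects, is the "push apart" step.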

In massive physics frameworks, every collision creates big chunks of data: vectors, arrays and so on. Such an approach is slow.

So this is my engine in its entirety, except for the collision function:
// collideTwo tests a and b for overlap and, if they intersect, pushes them
// apart along the overlap vector scaled by delta. Walls are never moved.
// It reports whether the two objects collided.
func collideTwo(a *DynamicObject, b *DynamicObject, delta float64) bool {
	var collided bool

	var response collision2d.Response
	aType := a.shape.shapeType
	bType := b.shape.shapeType
	var overlapX float64
	var overlapY float64
	var v *Vec2 // non-nil when the fast circle/box test finds an overlap
	if aType == SBOX && bType == SBOX {
		collided, response = collision2d.TestPolygonPolygon(a.shape.boxPolygon, b.shape.boxPolygon)
		overlapX = response.OverlapV.X
		overlapY = response.OverlapV.Y
	} else if aType == SCIRCLE && bType == SCIRCLE {
		collided, response = collision2d.TestCircleCircle(a.shape.circle, b.shape.circle)
		overlapX = response.OverlapV.X
		overlapY = response.OverlapV.Y
	} else if aType == SCIRCLE && bType == SBOX {
		v = fastCircleAABB2(&b.shape.boxPolygon, &a.shape.circle, &b.shape.box)
		if v != nil {
			overlapX = v[0]
			overlapY = v[1]
		}
	} else if aType == SBOX && bType == SCIRCLE {
		v = fastCircleAABB2(&a.shape.boxPolygon, &b.shape.circle, &a.shape.box)
		if v != nil {
			overlapX = v[0]
			overlapY = v[1]
		}
	}

	if collided || v != nil {
		// Fall back to a fixed push when the overlap is NaN.
		var dx, dy float64
		if math.IsNaN(overlapX) {
			dx = -10 * delta
		} else {
			dx = overlapX * delta
		}

		if math.IsNaN(overlapY) {
			dy = 10 * delta
		} else {
			dy = overlapY * delta
		}

		// Move the two objects in opposite directions; walls stay put.
		if !a.Type.HasFlag(typeWall) {
			a.POS[0] -= dx
			a.POS[1] -= dy
		}

		if !b.Type.HasFlag(typeWall) {
			b.POS[0] += dx
			b.POS[1] += dy
		}
	}

	return v != nil || collided
}


// iterate runs one pass of collision resolution over walls and actors.
// Collision callbacks fire only on the first iteration.
func iterate(af *ActorF, delta float64, iteration int) {
	// Sync every shape with its actor's current position.
	for _, a := range af.actors {
		a.shape.setPos(&a.POS)
	}

	// Walls against actors.
	for _, w := range af.walls {
		for _, a := range af.actors {
			if shouldCollide(w, a) {
				res := collideTwo(w, a, delta)
				if res && iteration == 0 {
					//w.onCollide(a)
					a.onCollide(w)
				}
			}
		}
	}

	// Actors against actors; starting at ai+1 skips the object itself.
	totalActors := len(af.actors)
	for ai, a := range af.actors {
		for i := ai + 1; i < totalActors; i++ {
			b := af.actors[i]
			if shouldCollide(a, b) {
				res := collideTwo(a, b, delta)
				if res && iteration == 0 {
					b.onCollide(a)
					a.onCollide(b)
				}
			}
		}
	}
}

// shouldCollide reports whether either object is configured to collide with
// the other's type, skipping objects owned by an ignored player.
func shouldCollide(a *DynamicObject, b *DynamicObject) bool {
	aHitsB := a.shape.collideTypes.HasFlag(b.Type) &&
		(b.domain == nil || a.shape.ignoreId != b.domain.player.ID)
	bHitsA := b.shape.collideTypes.HasFlag(a.Type) &&
		(a.domain == nil || b.shape.ignoreId != a.domain.player.ID)
	return aHitsB || bHitsA
}
Of course, if you want objects to collide accurately, you need to recalculate each object's velocity constantly. I don't want to change a player's velocity in a network game, because I need exactly the same engine running on the client side, and changing a player's speed outside of their own movement may cause unexpected behavior.

A rather useful optimization for this engine is to keep a grid and store all the objects in its cells. Then you don't need to check every object against every other object in the loop; you only check an object against the ones in its own cell and the neighboring cells.
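A minimal sketch of such a grid follows. It is not part of the engine above. The Grid type and its methods are hypothetical, and the sketch assumes every object is smaller than one cell, so that checking the 3x3 block of cells around a position is enough.

```go
package main

import (
	"fmt"
	"math"
)

// cellKey identifies one grid cell by its integer coordinates.
type cellKey struct{ cx, cy int }

// Grid is a uniform spatial hash: object IDs are bucketed by the cell
// that contains their position. (Hypothetical type for illustration.)
type Grid struct {
	cellSize float64
	cells    map[cellKey][]int
}

func NewGrid(cellSize float64) *Grid {
	return &Grid{cellSize: cellSize, cells: make(map[cellKey][]int)}
}

func (g *Grid) keyFor(x, y float64) cellKey {
	return cellKey{
		cx: int(math.Floor(x / g.cellSize)),
		cy: int(math.Floor(y / g.cellSize)),
	}
}

// Insert stores an object ID in the cell containing (x, y).
func (g *Grid) Insert(id int, x, y float64) {
	k := g.keyFor(x, y)
	g.cells[k] = append(g.cells[k], id)
}

// Nearby returns the IDs stored in the cell containing (x, y) and in
// its eight neighbours; only these need a precise collision test.
func (g *Grid) Nearby(x, y float64) []int {
	k := g.keyFor(x, y)
	var ids []int
	for dx := -1; dx <= 1; dx++ {
		for dy := -1; dy <= 1; dy++ {
			ids = append(ids, g.cells[cellKey{cx: k.cx + dx, cy: k.cy + dy}]...)
		}
	}
	return ids
}

func main() {
	g := NewGrid(100)
	g.Insert(1, 10, 10)
	g.Insert(2, 150, 20)
	g.Insert(3, 900, 900) // far away, never tested against object 1
	fmt.Println(g.Nearby(10, 10))
}
```

In a real loop the grid would be rebuilt or updated every step after positions change, and Nearby would replace the inner loop over all actors.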

For collision detection I took github.com/Tarliton/collision2d, forked it and optimized a few methods.
Server stack
pprof - profiler
websocket - a fork of gorilla websocket for fasthttp
govendor - vendoring tool
fasthttp + fasthttp router

Fasthttp is an alternative to net/http and, according to its author, it should perform approximately 10 times faster.
In our case that's not critical, because we don't have a lot of players/connections, but we do send tons of data through the sockets. If you know the C10K problem, I must point out that in real-time gamedev it shows up long before 10,000 connections are established.

The initial pick was gin-gonic (HTTP framework) + melody (WebSocket), but that stack turned out to be slow. Actually, apart from convenient parallelism, Go's speed largely comes from the absence of frameworks. If you want fast code, you ought to implement the specific piece you need yourself. Libraries like gin-gonic create many wrappers with error checks, and with several middlewares in the chain every request has to pass through all of them.
For stress testing, I used a Node.js client that creates player connections. The simulated players can walk and shoot. It's a very simple client that takes data from the server, parses it, and sends it back with slight changes (for movement and bullet creation). Ironically, this Node.js client required far more powerful hardware than our humble hobo server. That's V8 performance for you: I saw a setInterval with a 40 ms delay fire after 50 ms, 80 ms and so on. Both server and client sat in one private network with a 0.5 ms ping, so I could test heavy loads without a bandwidth limit.
'use strict';
var Parser = require('../front/cocos/assets/scripts/Parser.js'); // parses a string into a JS object
var W3CWebSocket = require('websocket').w3cwebsocket;
var http = require('http');
var clients = [];

const host = "10.10.0.2:5050";

var packets = 0;

// report and reset the packet counter once per second
setInterval(function report(){
    console.log(packets + ' packets per second');
    packets = 0;
}, 1000);

function createClient(room) {
    // connect to the given (randomly chosen) room
    var client = new W3CWebSocket('ws://' + host + '/ws/' + room, null, 'http://localhost/');
    client.clientID = 0;

    clients.push(client)
    console.log('added client to room: ' + room + '; total: ' + clients.length);

    client.onerror = function(e) {
        console.log('Connection Error');
    };

    client.onopen = function() {
        console.log('WebSocket Client Connected');
    };

    client.onclose = function() {
        console.log('echo-protocol Client Closed');
    };
  
    // every second the player shoots in a random direction and changes movement velocity
    setInterval(function changeDir() {
        if (!client.me) return;
        var v = 300;
        var angle = Math.random()* Math.PI*2;
        client.VX = Math.cos(angle) * v;
        client.VY = Math.sin(angle) * v;

        client.BVX = Math.cos(angle + Math.PI / 2) * v * 2;
        client.BVY = Math.sin(angle + Math.PI / 2) * v * 2;

        client.bullet = {
            ID: -1,
            CID: client.clientID,
            POS: client.me.POS,
            V: [client.BVX, client.BVY],
            TYPE: 4,
            OWNER: client.me.OWNER
        }
        client.clientID++;
    }, 1000);

    client.onmessage = function(e) {
        if (typeof e.data === 'string') {
            if (!client.playerId) {
                client.playerId = e.data;
            } else {
                client.data = Parser.parse(e.data);
                client.me = null
                for (var i = 0 ;i < client.data.objectsToSync.length; ++i) {
                    if (client.data.objectsToSync[i].ID == client.playerId) {
                        client.me = client.data.objectsToSync[i];
                    }
                }

                if (client.me && client.VX && client.VY) {
                    client.me.V[0] = client.VX;
                    client.me.V[1] = client.VY;
                }
            }
        }
    };
}


var CLIENTS_COUNT = 150;
var NUM_ROOMS = 100;
for (var i = 0; i < CLIENTS_COUNT; ++i) {
    var room = Math.floor(Math.random() * (NUM_ROOMS));
    setTimeout(
        createClient.bind(this, room)
    , 200*i)
}

console.time("interval");
setInterval(function send(){
    console.timeEnd("interval")
    console.time("interval")
    var cl = clients.length;
    for (var i = 0; i < cl; ++i) {
        var client = clients[i];
        if (client.me) {
            var str = Parser.serialize(client.me);

            if (client.bullet) {
                str = str + ';'+ Parser.serialize(client.bullet);
                client.bullet = null;
            }
            if (client.readyState == 1) {
                client.send(str);
                packets++;
            }
        }
    }
}, 40);


process.stdin.resume();
process.on('SIGINT', function () {
    console.log('aborted all clients');

    for (var i = 0; i < clients.length; ++i)
        clients[i].close();

    setTimeout(function() {
        process.exit (0);
    })
});
This is what a 'bot'-filled level looks like:
Benchmark
Results
Unexpectedly, our tiny hobo server is capable of handling 250 concurrent players. That amounts to some 14k write operations and 6k read operations per second, with 100 rooms running simultaneously, full of physical objects, bullets and players. Obviously, a 30 FPS broadcast would increase the server capacity to 400+ players with the same number of I/O operations. Speaking of RAM, an empty server consumes 7 MB per process, and under load this grows to 15 MB, so the total is 15 MB * 6 processes = 90 MB. A ridiculous number, isn't it? Anyway, this is the simplest problem of all; the much harder one is building a scalable architecture. Right now it runs 8 times faster on an 8-core setup because I have no thread locks. The only lock I use guards the main loop when connections are inserted or removed.
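The read and write figures can be sanity-checked with simple arithmetic. The sketch below assumes one outgoing message per player per 17 ms broadcast tick and one incoming message per player every 40 ms, which is the send interval of the stress client. The helper messagesPerSecond is a hypothetical name.

```go
package main

import "fmt"

// messagesPerSecond estimates how many messages flow in one direction
// per second when every player handles one message per interval.
// (Hypothetical helper for a back-of-the-envelope check.)
func messagesPerSecond(players int, intervalMs float64) float64 {
	return float64(players) * 1000 / intervalMs
}

func main() {
	// Server -> clients: one broadcast per player every 17 ms.
	fmt.Printf("writes/s: %.0f\n", messagesPerSecond(250, 17))
	// Clients -> server: one update per player every 40 ms.
	fmt.Printf("reads/s:  %.0f\n", messagesPerSecond(250, 40))
}
```

Under these assumptions the estimate comes out at roughly 14.7k writes and 6.25k reads per second, in line with the measured numbers.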

I will delve into the architecture later. To cut a long story short, it is based on several layers of goroutines, buffered channels and atomic operations from sync/atomic.
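As a rough illustration of those three building blocks working together, and not the server's real architecture, the sketch below lets a few worker goroutines drain a buffered channel and count handled messages with sync/atomic instead of a mutex. The name processInputs and the buffer size of 64 are assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// processInputs starts `workers` goroutines that drain a buffered
// channel of incoming messages and count them with an atomic counter.
// (Hypothetical function; message handling itself is omitted.)
func processInputs(messages []string, workers int) int64 {
	inbox := make(chan string, 64) // buffered, so senders rarely block
	var handled int64
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range inbox {
				atomic.AddInt64(&handled, 1) // lock-free increment
			}
		}()
	}

	for _, m := range messages {
		inbox <- m
	}
	close(inbox) // lets the workers' range loops finish
	wg.Wait()
	return atomic.LoadInt64(&handled)
}

func main() {
	fmt.Println(processInputs([]string{"move", "shoot", "move"}, 2))
}
```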

For the project deployment I used Docker. This is what the Dockerfile looks like:
FROM golang

ENV GOPATH /glng/
ENV GOBIN /glng/

ADD . /glng/

RUN go install github.com/dearcj/golangproj

ENTRYPOINT ["/glng/golangproj", "-port", "5050"]
EXPOSE 5050
We can launch it with a command like this:
docker run --publish 5050:5050 --name game --rm --net=host game
You need the --net=host flag because of this and this.

Moreover, there are several nuances in configuring Linux for a WebSocket server. We also have to make sure that the VPS does not limit the number of packets.

For vendoring I used govendor.
# initialize govendor
govendor init
govendor add +external

go get github.com/username/library

# after each package install we need to fetch the library into the vendor directory
govendor fetch github.com/username/library
The client was built in Cocos Creator. It's a Chinese Unity clone capable of producing a great HTML5 version of the game, as well as Windows and mobile versions. The main shortcoming of Cocos Creator is its small community, so it takes quite some effort and persistence to get your questions answered. The only game level in this project so far was made in Tiled.
Summing it all up
Go's libraries, documentation and package manager leave a dismal impression: yes, they are that scarce. It may seem like there is a whole bunch of them, but frankly speaking, it all pales dramatically compared to the 400k packages on npm. Another thing I don't understand is the frequent type casting. Go has int, int32 and int64, and the built-in libraries use all of them inconsistently, which is why you need to convert types so often.

Multicore development is the essence of fun in Go. You just build the project with the -race flag, and then you can observe all the race conditions and non-atomic operations at run time.
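Here is a small sketch of the kind of code the race detector is useful for. The Score type is hypothetical and not from the game. As written, the counter is guarded by a mutex. If the Lock and Unlock calls in Add were removed, running the program with go run -race would report a data race on the value field.

```go
package main

import (
	"fmt"
	"sync"
)

// Score is a hypothetical shared counter updated from many goroutines.
type Score struct {
	mu    sync.Mutex
	value int
}

// Add increments the score. Without the lock, concurrent calls would
// be a data race that the race detector flags at run time.
func (s *Score) Add(n int) {
	s.mu.Lock()
	s.value += n
	s.mu.Unlock()
}

// Value returns the current score under the same lock.
func (s *Score) Value() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.value
}

// addConcurrently calls Add(1) from the given number of goroutines.
func addConcurrently(s *Score, goroutines int) {
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Add(1)
		}()
	}
	wg.Wait()
}

func main() {
	s := &Score{}
	addConcurrently(s, 100)
	fmt.Println(s.Value())
}
```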

In terms of development, Go is a fast-paced language that doesn't eat up too much time. The entire game prototype took me 10 days to make. You can build scalable applications and your code will still be high-level. I also had an eye on Erlang, but the immutable data approach that is common in functional programming is no good for gamedev. Gamedev is all about data mutation: vectors, matrices, structures and so on. In Go, you can either bother with atomic operations and shared memory, or use channels to synchronize goroutines. Both will be fast enough.

This text was adapted and translated from this article (originally in Russian).