bencarpenter.co.uk

Homebrew with multiple users

2014-03-10T20:06:00+00:00

I’m a great fan of homebrew, the package manager for OS X that gives access to a raft of useful tools. However, it’s designed for a single user, really, which is OK for a developer’s laptop but less cool for a shared computer or one that uses network accounts. For these cases, here’s a solution.

I make the assumption here that you have the developer tools installed and know your way around the OS X Terminal/iTerm/etc.

Create a separate user account for homebrew, for example brew. Give it a home directory of /usr/local, make the UID less than 500 (such as 401) and hide this user from the graphical logins. The following script does that lot for you.

#!/bin/bash
dscl . create /Users/brew
dscl . create /Users/brew RealName "Homebrew"
dscl . create /Users/brew Password '*' 
dscl . create /Users/brew UniqueID 401 
dscl . create /Users/brew PrimaryGroupID 80
dscl . create /Users/brew UserShell /bin/bash
dscl . create /Users/brew NFSHomeDirectory /usr/local
cp -R /System/Library/User\ Template/English.lproj /usr/local
dseditgroup -o edit -a brew -t user staff
chown -R brew:staff /usr/local

To hide the new user, run this as root:

defaults write /Library/Preferences/com.apple.loginwindow Hide500Users -bool YES

An eventful commute, one April mornin’

2012-04-04T19:25:00+01:00

Wow, now this was an eventful commute… Snow in April anyone? This was the non-view from a place called ‘Surprise View’, complete with Land Rover Discovery at a funny angle in the wrong lane and taking a while to get it straight again! I’d already picked up a friend who works for another business in Hathersage after his RS4 had become a little too friendly with a snowbank.

Most of us made it to the office one way or another and sampled a bit of extreme commuting that made up for the lack of snow through the rest of the winter. It was just a bit of a bummer that I’d taken the winter tyres off the car a couple of weeks back.

Alas, the snowploughs got out throughout the day and various parts were remarkably clear this evening, if you discount the 10’ snow piles in places (no, that’s not a typo and I’m not kidding; we’re talking big piles of snow here)!

Awk for Apache / Nginx logs

2012-02-15T11:10:00+00:00

Apache, Nginx and the like log every request your web server processes, unless you’ve configured them not to. Whilst statistics packages such as Awstats, Webalizer, Google Analytics and friends provide a useful overview there’s nothing that beats the raw data for being able to answer your own questions. Here, I introduce an awk script you can use to get your own analysis going.

There’s a useful article on The Art of Web, that covers the basics of awk and gives some good ideas on what you might do with it: Analysing Apache log files. However, if the URL contains spaces, or the user id field is wrapped by double-quotes, the status code is no longer $9… I ran into this problem for a small number of requests, but wanted to include them.

The following awk script handles the above cases and filters the log into a number of other (compressed) files, one for each HTTP status code and request method. This makes it much easier to then analyse just the 403 errors for example and by using tabs to separate the filtered format of the files, we can make it much easier to get accurate results from subsequent awk runs.

You’re welcome to use this script too. Download it: filter.awk or fork it on github: lapsedtheorist/awk-for-apache-nginx-logs.

If you have a directory full of logs with date-base filenames and want to investigate everything that happened in January, for example, this may be of use:

for f in /path/to/logs/access.log-201201*; do \
    if [ "${f:(-3)}" = ".gz" ]; then gunzip -c $f; else cat $f; fi; \
done | awk -f filter.awk

The above deals with both gzipped and plain text files at the same time. Do update the path and the filename to suit your server, of course.

##
#	Web server log file analysis & filtering
#
#	v1.2; Oct 2012
#	Ben Carpenter
#	http://www.bencarpenter.co.uk/awk-for-apache-nginx-logs
#
#	This awk script processes lines from a log format that matches the
#	'combined' log often used by the Apache and Nginx web servers. If your log
#	file format is different, amend accordingly, but for reference this is the
#	combined format this script expects by default:
#
#		%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
#
#		%h		Remote host
#		%l		Remote logname (ignored)
#		%u		Remote user (ignored)
#		%t		Date and time of the request
#		%r		First line of the request, typically "GET /something HTTP/1.1"
#		%>s		Status
#		%b		Size of response in bytes
#
#	It tries to be efficient on resources, so there's minimal progress messages
#	and no system commands in the main loop other than writing to a file based
#	on the status code. The output files are written in a simplified
#	tab-separated format, error corrected for some strange things like spaces
#	in URLs and double quotes for the userid. This revised format is easier to
#	pass reliably through other awk scripts when filtering for specific data,
#	etc. The file format is:
#
#		IP, Date/Time, Method, URL, Status, Size, Referer, User Agent
#
#	You should be able to send a large (>1GB) amount of log data through this
#	script quite comfortably. This works well for me, but usual clauses apply
#	(use it at your own risk, etc.). Bug reports and suggestions for
#	improvements are very welcome
##
BEGIN {
	FS="( \"|\" )"
	intro="Processing..."
	printf "%s", intro
}

{
	split($1, a, " ")
	ip=a[1]
	# It seems some browsers/bots set the 'user' part to the blank string,
	# double quoted, which is therefore something that can foul our detection
	# for the status code, unless we explicitly look for it
	if($2!="") {
		datetime=a[4]" "a[5]
		request=$2
		referer=$4
		useragent=$6
		split($3, c, " ")
		code=c[1]
		size=c[2]
	} else {
		split($3, b, " ")
		datetime=b[2]" "b[3]
		request=$4
		referer=$6
		useragent=$8
		split($5, c, " ")
		code=c[1]
		size=c[2]
	}
	total=NR
	if(match(code, /^[0-9]+$/)==0) {
		# This status code, whatever it is, isn't a number so let's set it to
		# UNKNOWN so it's obvious in the analysis that this is a dud
		code="UNKNOWN"
	}
	statuses[code]++

	# Analyse the request
	n=split(request, detail, " ")
	method=detail[1]
	if(match(method, /^[A-Z]+$/)==0) {
		# This request method, whatever it is, doesn't 'look like' a request
		# method, so let's set it to UNKNOWN so it's obvious in the analysis
		# that this is a dud
		method="UNKNOWN"
	}
	methods[method]++

	# We want the URL, but we need to handle the case where the URL contains
	# one or more space characters, even though they shouldn't be there
	url=""
	for(i=2; i<n; i++) {
		url=(url" "detail[i])
	}
	url=substr(url, 2)

	# Create and add to a file for each status code
	file="http-status-"code".log"
	printf "%s\t%s\t%s\t%s\t%d\t%d\t%s\t%s\n", \
		ip, datetime, method, url, code, size, referer, useragent > file

	# Create and add to a file for each request method
	file="http-request-"method".log"
	printf "%s\t%s\t%s\t%s\t%d\t%d\t%s\t%s\n", \
		ip, datetime, method, url, code, size, referer, useragent > file
}

END {
	for(l=0; l<length(intro); l++) {
		printf "\b"
	}
	printf "%d requests filtered\n", \
		total

	# Write out some useful summary data
	printf "\n%-8s\t%11s\t%6s\t%s\n", \
			"status", "occurrences", "%", "output\tfile"
	for(code in statuses) {
		printf "%-8d\t%11d\t%6.2f\t", \
			code, statuses[code], (100*statuses[code]/total)
		# Close and compress each file, because they can be large
		file="http-status-"code".log"
		close(file)
		system("gzip -f "file)
		system("du -sh "file".gz")
	}
	printf "\n%-8s\t%11s\t%6s\t%s\n", \
		"method", "occurrences", "%", "output\tfile"
	for(method in methods) {
		printf "%-8s\t%11d\t%6.2f\t", \
			method, methods[method], (100*methods[method]/total)
		# Close and compress each file, because they can be large
		file="http-request-"method".log"
		close(file)
		system("gzip -f "file)
		system("du -sh "file".gz")
	}

	printf "\n"
}

Jigsaw - a stop-motion animation

2012-01-02T11:47:00+00:00

One thousand pieces… Eighty seconds… Tower Bridge pieces itself into view before your very eyes. Of course, this reveals the method I used to make the puzzle but really - who makes the sky first?!?

If you’d rather, you can view this as a 720p HD movie on Vimeo.com

Soundtrack: a clip from Action!!! by Talamasca.

Landscapes with Paul Hill & Karen Frenkel

2011-08-08T21:01:00+01:00

Last weekend I had the pleasure of a day picking up landscape photographic tips from the esteemed photographers Paul Hill and Karen Frenkel. There were quite a lot of us; I don’t know how they managed it but the advice they gave was great. Here’s a brief story on improving a picture with a little thought.

We were based for the morning at the top of Padley Gorge in the Peak District, photographing whatever took our fancy. As a guide, Paul had suggested looking for marks and by that meaning pathways as well as other evidence of visitors to the landscape. This bridge is a relatively recent addition so the wood is still quite clean; this first photo was a thought for the bridge being a gateway to the landscape, all be it across a rather small stream. There are problems with this picture, not least the vantage point that makes it more of a gateway to a set of woodlands. Behind me, at the time of shooting, was a wide open landscape and a half-decent view but the sky was a totally uniform grey stratus cloud. The corresponding photo from the other side of the bridge would be about a third sky, hence not on.

Not particularly satisfied, I turned to walk away in search of a better subject to find Paul Hill heading my way to take a look. I explained the above and after a couple of minutes we were stood the other side of the bridge, the side I had assumed to be not useful.

Yes, the view I thought I might take from here wasn’t going to work, but how about from a higher perspective?

So, yes, a reasonable photo but nothing special. Paul’s insight came next: just because the view of the subject is tall doesn’t mean the photo has to be portrait… Once this preconception was thrown out, we have quite literally a whole different view.

Now this, to me, is not an improvement: there’s a fussiness about the trees on the left hand side that seem a bit of an invasion and the balance of the image is lost. There’s a lot of sky in the water and this is very even either side of the bridge, pushing against the far right of the frame. Paul agreed, but demonstrating on his compact, a few tweaks to the framing does improve things.

Raising the camera brings some helpful reflections into the water on the right and exaggerates the length of the bridge. The trees on the left aren’t quite arranged in the frame properly, but a simple re-frame again corrects this and here we have a result I’m finally proud of.

Without Paul’s help, I’d never thought of this shot, let alone worked it through to the result shown above. For the curious all these were taken hand-held and using whatever rocks and such were around to stand as high as possible.

Happy with my photo, Paul and I started experimenting with other angles of view including standing on the bridge and with my interest clearly raised Paul’s work was done for the time being and he left me to it. Now then, methinks, there must be something to be had from just using the rungs…

I like the reflections of the water and the abstract nature of the picture - it’s still a bridge and whilst staring downwards is interesting there’s not as much impact as there could be.

There’s a geometry to the bridge though and a randomness to the water. Standing on the top rungs astride the bridge with the camera as high as possible, I worked for a while to get the lines straight and just the right amount of bridge and reflections. Of course, looking directly downwards, the sky doesn’t matter and in fact the matt grey of it actually helps with the reflections. The effort paid off - I really like this photo and impressed Paul with the results too: I must have learnt something!

Standing on the top rungs of the bridge makes it possible to exclude half and lose most of the background in blur. The texture of the second rung really comes through but for my mind, it’s not as strong as the first.

Relatively Popular: a plugin for Habari

2011-05-17T19:10:00+01:00

Blog popularity: it’s nice to know what’s popular and what’s not with visitors on a blog and a summary of the most popular entries helps those who’ve not visited before. Thing is, there’s a problem… most blog popularity measures are based on visits for all time, so what if you write a new article?

How it works

Page views are tracked over the last month and the most popular rises to the top of the list. If one article becomes more or less popular, it’ll move in the list. Sounds obvious really, but most popularity trackers collect data for all time, so if an article was really popular last year then it’d still be at the top even if no-one was actually reading it.

My plugin works differently–by storing sequential numbers of page views over consecutive time periods, the ‘old ones’ can be thrown away and the popularity measure is thus always current.

You can have it too

If you run Habari, you’ll find this plugin in the Habari Extras repo under the name relativelypopular. This plugin works well with Habari 0.7–0.9. Once installed, you can configure the number of time periods it tracks for and the length of one period.

The menu can be shown in a location of your choosing by adding the Relatively Popular Posts block from your theme configuration screen.

Once you’ve let it run for a few days, why not show a mini-graph in your theme by using this:

$RelativelyPopular->sparkline( $content );

You could make it look a bit like this, for instance:

You can find the code for this at github.com/habari-extras/relativelypopular; it’s not actively maintained any more since I moved away from Habari for this blog, but is linked for general interest.

Column/grid width & arrangements tool

2011-03-17T11:00:00+00:00

This page contains a plugin you might like to use if you need to work out an arrangement of columns for a given width, such as for a website design. Select the number of columns and drag the box edge to choose the total width.

The plugin shows all the available arrangements for column widths greater than twice the separation, with the width, height and separation indicated on each one. For the discussion and maths behind this, see my next post: Arranging columns…

Now with added goodness: Aspect ratio selection for common image formats

Arranging columns for a web page theme

2011-03-17T09:39:00+00:00

Yesterday, I published the first version of a tool to calculate and visualise website column widths which simplifies the process of splitting a content area into a number of equally spaced equal width columns. Yes, there are plenty of tools out there that will take a given column width and spacing and tell you the overall width, but there aren’t many that you can give the total width and the number of columns you’re looking for and you get to see the available column width options.

… which is where my tool kicks off from: a problem a friend had where four columns needed to fit in a given width. The total width was non-negotiable, but the width of a given column could be varied. Punching numbers in to a calculator gave a couple of results but it’s a pain to have to do that every time. Surely a bit of javascript on a web page could sort this?

Basically, we approach the problem as follows: For n columns in a width w, the width of the column c and inter-column spacing s are represented by the following, where a solution is valid if c and s are natural numbers and c > s.

As it turns out, the solutions to such a problem call on an amazing array of areas of mathematics: integer programming, relaxation methods and the extended Euclidean algorithm because the mathematical representation of this problem is a linear Diophantine equation in two variables.

For what it’s worth, the arrangements tool does none of the above because the set of numbers involved in the solutions is sufficiently small the whole thing can be done iteratively; it is realistic to compute and store the full set of solutions, which is nice.

Modem mode with the DG834N: beware the factory defaults

2011-02-20T14:20:00+00:00

We’ve recently moved premises at work, giving us the opportunity for dual broadband for bandwidth sharing and failover. Whilst the DrayTek Vigor 2920n is an excellent device, it needs a couple of ADSL modems running in modem-only mode (sometimes known as bridge mode) to be used with ADSL connections. As I had a Netgear DG834G and a DG834N available, I figured that a factory reset followed by device mode setup from the hidden page, then plug it all in would just work. Nope.

The DrayTek is a wireless router with a dual-WAN (Wide Area Network, i.e. internet) capability designed for WAN connections over ethernet. Cable internet satisfies this, but ADSL can too given a modem in bridge mode as a PPPoE to PPPoA converter.

The Netgear DG834G (used as part of my old setup wireless repeating for range extending) is a v3, that can be converted to modem-only mode from a hidden page: 192.168.0.1/mode.htm or similar. I factory reset it first (stick a WD40 tube in the back for 10 seconds), wired it up over ethernet to the DrayTek’s WAN1 and the phone cable in the ADSL socket of the DG834G. After adding the username/password combo for the ISP, I sat back for a few moments, let the devices talk to each other and rejoiced: connection established! Now for the DG834N into WAN2; rinse and repeat.

…And it doesn’t work.

For whatever reason there’s no connection even though all the lights on the DG834N display the same information as the DG834G. I swapped the G and N around - the G works on whichever ISP I ask it to; the N works on neither. Both were factory reset again, reconfigured in modem mode again but the N still refused to play ball.

As it turns out, my assumption that the factory reset would have the effect of setting all the important values the same on both devices (they are both from Netgear, and both DG834’s…) was wrong, in the only place it mattered: The VPI and VCI settings.

There’s not a lot of settings available from a DG834 in modem mode, but the defaults for VPI/VCI are 0/38 for the DG834G and 8/35 for the DG834N. Any decent ISP should be able to supply you with the values they require, if you ask; or look it up from their documentation. I’m in the UK and both my ISPs required 0/38. Once set, the DG834N plays ball again!

Whoop whoop, and damn the factory defaults - they have their place but always check what has actually changed.

Javascript: Path to the current script

2011-01-23T11:54:00+00:00

If–within a javascript file–you wish to find the path to this file, how’s it done?

Javascript provides helpful objects like self.location that can be used to find the URL of the page running the script but I was completely unable to find an object representing the path to the current script. This is useful when creating a plugin for distribution to other users in which there is no control over the folder the user will put the plugin in. It’s a lot easier to say to people “put all these files in the same folder” than it is to say “put all these files in a folder /scripts/foo/bar/” because your user might not have that much control.

Assume we have a file script.js that needs to load some other resources flash.swf and image.png. The file script.js is in the directory foo/bar appearing therefore at the URL http://www.example.com/foo/bar/script.js. Also, we assume this script is loaded via a from an HTML file that could be anywhere in the website. We can use the following function to extract the /foo/bar/ from the above script statement:

var scriptPath = function () {
    var scripts = document.getElementsByTagName('SCRIPT');
    var path = '';
    if(scripts && scripts.length>0) {
        for(var i in scripts) {
            if(scripts[i].src && scripts[i].src.match(/\/script\.js$/)) {
                path = scripts[i].src.replace(/(.*)\/script\.js$/, '$1');
                break;
            }
        }
    }
    return path;
};

This function basically scans the DOM tree for all the script tags and looks for the one with the same name as the script we’re currently executing. This has to be hard-coded, unfortunately, because if Javascript provided an object containing the name of the current script then it’d probably provide an object containing the current script’s path and this whole code would be redundant.

To use this reliably, ensure the script file has a unique name, such as .init.js or similar. Then, within the file .init.js, you can load the other resources using scriptPath()+'' such as scriptPath()+'/flash.swf' or scriptPath()+'/image.png'.

Because the above function scans the DOM every time it’s called, it’s sensible for performance reasons to store the result in a variable if it’s going to be called more than a couple of times.

Incidentally, script.aculo.us uses a variant of the above to load other resources but because their codebase depends on prototype they can write less code for this function. If your code depends on a library (jquery, mootools, prototype, etc…) then use what it provides; mine doesn’t, so I needed the above library independent method.

UPDATE March 2014: Thanks to Shawn who has been in touch to suggest an improvement to the above that allows for the query strings caching services typically add to the URLs such as ?v=3.8.1 or similar. In that case, the above will fail because the filename of the javascript file is expected to be the last component of the URL. To solve this problem you could use match(/\/script\.js($|\?.*$)/) and replace(/(.*)\/script\.js($|\?.*$)/, '$1') which makes the query string optional.