How to Test Package Main in Go

Simeon Visser

June 05, 2014 01:00

There are a few things to get right when trying to test the main package in Go. This is a result of package naming rules and where files need to be located. My attempts, based on habits from testing in other languages (such as Python), led me to error messages such as:

cannot find package "mypackage" in any of: /first/path/ (from $GOROOT), /second/path/ (from $GOPATH)
can't load package: package mypackage: found packages main (mypackage.go) and mypackage (mypackage_test.go) in /project/path
./mypackage_test.go:4: import "path/to/mypackage" while compiling that package (import cycle)

However, it should be possible to resolve all these errors by following this checklist:

An executable package must have the name main (so package main).
The name of the code file can be as desired, so example.go in this example.
The name of the test file should have _test.go at the end, so example_test.go.
The test file should be in the same directory as the code file.
The package name of the test file must also be package main.
There is no need to import the code in the test file as everything is in the same package.

You don't necessarily need to have these files in a subdirectory of your project; you can have all files in the root directory of the project.

On Modularity in Software

Simeon Visser

May 24, 2014 20:00

In 2009 I started working on a crossword editor written in Python called Palabra (Spanish for "word"). It was a desktop application developed for Ubuntu Linux that allowed anyone to construct a crossword grid, fill in words and enter the clues for each entry. It also included various operations on the grid, such as shifting cells, decorating a cell and viewing grid statistics.

To create a crossword the application needs to provide all those things: a visual editor displaying the grid, the ability to load and manipulate lists of words and so on. Implementing everything in the same codebase seemed like a sensible approach at the time. But looking back at the code five years later it has become apparent that there is a much better approach to constructing the same application: by splitting the functionality into multiple modules that can be released and developed independently. This approach is useful for very large software projects but it also applies to seemingly small applications such as a crossword editor.

For example, the application needs to have the ability to read and write the crossword to a file format. There were at the time (and still are today) various file formats in use in the crossword community that a crossword editor could support. These include the .puz file format, the .jpz file format and the .ipuz file format.

It's possible to include the read / write functionality for multiple file formats as part of the application's code. After all it makes little sense to use a crossword editor if you can't save your work. But this functionality can be reused in other applications or libraries without needing the full code of the crossword editor. One can imagine a Python library for .puz files, a Python library for .jpz files and a Python library for .ipuz files which can be maintained independently. The Python library for the .ipuz file format didn't exist yet so I have started to work on that (entry on PyPI). These modules can then be used by the Palabra crossword editor.

Taking this one step further we can now explore how a monolithic crossword editor can be split up into multiple modules:

A module for reading / writing / validating .puz files.
A module for reading / writing / validating .jpz files.
A module for reading / writing / validating .xpf files.
A module for reading / writing / validating .ipuz files.
A module with a canonical data structure for a crossword grid plus common operations on the grid.
A module for filling a crossword grid with words, either manually, semi-assisted or fully-automated.
A module for loading and processing word lists (including fast searches, such as "give me all five-letter words with an S in the fourth position").

Each of these modules can be useful for other applications or purposes. It also means that the desktop crossword editor needs to do no more than expose the functionality of the above modules to the user. For example, it is possible to create a web-based crossword editor by reusing all the Python modules and by reimplementing only the logic of the editor.
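The kind of search mentioned for the word list module can be sketched in a few lines of Python. This is only an illustration: the function name find_words, the pattern syntax (a "." for an unknown letter) and the tiny word list are assumptions made for this example and are not Palabra's actual API.

```python
def find_words(words, pattern):
    """Return the words that match a pattern such as '...S.'.

    A '.' matches any letter; any other character must match exactly
    (case-insensitively). The name and pattern syntax are hypothetical.
    """
    pattern = pattern.upper()
    matches = []
    for word in words:
        candidate = word.upper()
        if len(candidate) != len(pattern):
            continue
        if all(p == "." or p == c for p, c in zip(pattern, candidate)):
            matches.append(word)
    return matches


# Made-up word list, purely for illustration.
WORDS = ["boost", "toast", "crisp", "grids", "clue", "words", "puzzle"]

# All five-letter words with an S in the fourth position.
print(find_words(WORDS, "...S."))
```

A linear scan like this is fine for a small list. A module aiming for fast searches would probably index the words by length and by letter position first, but that is a design choice beyond this sketch.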

Participating in the Dodentocht

Simeon Visser

April 12, 2014 00:05

One year ago, to the day, I announced that I would be participating in the Four Days Marches Nijmegen 2013 which took place around Nijmegen, the Netherlands. Participants walk 30, 40 or 50 kilometers each day for four consecutive days. The distance that you should walk depends on your age and gender. In my case it was 50 kilometers each day making it a nice total of 200 kilometers.

This year I'm going to do something similar yet also very different. A well-known walking event in Belgium is the Dodentocht (Dutch for "Death march") which involves walking 100 kilometers in 24 hours. It takes place around Bornem, near Antwerpen. This event is similar in the sense that it's a long distance but the nature of the challenge is very different.

It is probably going to be brutal but at the same time I have a better idea of what to expect after completing last year's event. As part of my preparation last year I had walked several walks of around 30 kilometers in distance which is decent but still far short of 50 kilometers. Surprisingly it's not the 50 kilometers on the first day that wears you down but the 50 kilometers on the second day. This is also the day where most people drop out.

Another lesson learned is that I didn't pace myself properly on the first day (a result of not knowing the distance involved). This meant that I finished early that day which allowed for a lot of time to eat and recover. However, it led to two unnecessary blisters which I had to endure for the next three days. Rookie mistake so I'm clearly going to pace myself better this time. At an estimated five kilometers per hour that makes for 20 hours of walking and 4 hours of breaks and food.

It's safe to say that the difficult part of this upcoming event will be in the second half when fatigue, tired legs and painful feet set in. Fortunately a warm meal is provided en route (if you include it in your registration) and more food and drinks will be available along the way. I'm going to prepare properly and it should once again be a fun event.

As an unrelated update I'm happy to say that I finally made the leap to an entry-level DSLR camera. After talking about it and dabbling a bit I now have a proper camera to learn the art of photography better: a Nikon D3200. You can already see the first results at my 500px profile.

Notes On Amazon RDS Replication Lag

Simeon Visser

November 13, 2013 21:30

What is replication lag?

If your system runs on Amazon Relational Database Service (RDS) you may have opted to configure one or more replicas for your main MySQL database(s). This means you have a master RDS instance and at least one slave RDS instance which receives updates from the master. This process is called replication.

Replication ensures that changes made on the master database also happen on the slave after some period of time. For a variety of reasons this period of time can increase. For example, a long-running query or erroneous query can cause replication to slow down or stop entirely. This results in replication lag: changes made on your main database aren't showing up on the slave replica because the replica is lagging behind.

Being informed when replication lag occurs

Amazon provides monitoring functionality that can alert you when replication lag becomes too high. One of the properties of a slave instance is Seconds_Behind_Master which can be viewed by executing the query show slave status; on the slave.

This field contains the replication lag as measured in seconds and it is usually a small number, such as zero, but it can increase as the lag goes up. When the number becomes too high you can configure Amazon's CloudWatch alert system to send you an e-mail.

This works fine as long as replication occurs normally. There are issues which can cause replication to stop entirely. In that case Seconds_Behind_Master could become NULL and you won't receive an e-mail (as that e-mail is only sent when the delay behind master exceeds a preconfigured threshold).
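The gap described above, where a threshold alert never fires because the value is NULL, can be illustrated with a small Python check. This is a sketch under assumptions: the status is taken to be a dict holding one row of show slave status output with SQL NULL represented as None, and the function name and the 300-second threshold are made up for this example.

```python
def replication_alert(status, max_lag_seconds=300):
    """Return a warning string for a replica's status, or None if all is well.

    `status` is assumed to be a dict holding one row of SHOW SLAVE STATUS
    output, with SQL NULL represented as None. The threshold is arbitrary.
    """
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # NULL: replication is not running, so a plain threshold
        # comparison on the number would never fire.
        return "replication appears to have stopped (lag is NULL)"
    if lag > max_lag_seconds:
        return "replication lag is %d seconds" % lag
    return None


print(replication_alert({"Seconds_Behind_Master": 0}))
print(replication_alert({"Seconds_Behind_Master": 900}))
print(replication_alert({"Seconds_Behind_Master": None}))
```

The point of the sketch is that the NULL case is handled before the comparison, so a stopped replica is reported rather than silently ignored.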

Resolving replication issues

When there is a replication issue the output of show slave status; is quite useful in debugging and resolving it.

You need to review the values of:

Slave_SQL_Running
Last_Error
Last_SQL_Error

When a particular SQL query failed on the slave it could be that execution of queries in general has stopped. This is indicated by Slave_SQL_Running having the value No.

In that case you'll either need to:

Remedy the error by fixing the issue that caused the SQL query to fail.
Decide to resume replication by letting the slave ignore that error.

The former situation can be tricky as it requires you to figure out what data or query is problematic based on the values of Last_Error and Last_SQL_Error. These fields may provide enough information to determine any incorrect records but this is not always the case.

In the latter case you would execute the following command on the slave:

CALL mysql.rds_skip_repl_error;

You should only run this command when you've determined that skipping the SQL query won't lead to inconsistent data or incorrect data on the slave (or, at least, that this is allowed to occur by skipping that particular SQL query).
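The review step described above can be sketched in Python as well. It is only an illustration: the status dict is assumed to have been built from the output of show slave status;, the function name is hypothetical and the error text in the example is invented.

```python
def describe_sql_thread(status):
    """Summarise whether the replica's SQL thread is running.

    `status` is assumed to be a dict built from SHOW SLAVE STATUS output.
    """
    if status.get("Slave_SQL_Running") != "No":
        return "SQL thread is running"
    # Prefer the SQL-specific error field and fall back to the general one.
    error = (
        status.get("Last_SQL_Error")
        or status.get("Last_Error")
        or "no error text recorded"
    )
    return "SQL thread stopped: " + error


# Invented example row; the error message is made up for illustration.
example = {
    "Slave_SQL_Running": "No",
    "Last_Error": "",
    "Last_SQL_Error": "example error text",
}
print(describe_sql_thread(example))
```

A summary like this only tells you where to look. Whether to fix the data or skip the error remains a decision for a human.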

How To Walk 200km in Four Days

Simeon Visser

August 10, 2013 23:34

After preparing for the Four Days Marches Nijmegen 2013 it makes sense to share some observations after completing the event a few weeks ago. The short story is that walking 200km in four days can be done: it's a fun event to participate in but it can also be painful at times.

The event combines friendly people, typical Dutch music and marching military forces into a fun spectacle with its ups and downs along the way. The blisters haven't fully healed yet but I made it to the finish line so here's the story.

This year's edition was characterised by hot weather which implies a lot of sweating and drinking water to stay hydrated. Fortunately this is easier to live with than a day of heavy rain which will pretty much guarantee blisters. In my case it didn't make much difference as I had six blisters by the end of the event. I learned that blister plasters can work like magic and they allowed me to continue walking.

Preparation

To discourage people from signing up willy-nilly, the organizers of the event are keen to emphasize that this is not a casual walk but an event that requires serious preparation.

I didn't really prepare for fifty kilometers of walking as that is about ten hours (excluding breaks) of walking and it doesn't make sense to sacrifice so much time when you can learn to adapt during the event itself. You would also be walking that distance alone as few people are crazy enough to join you. The longest distance I walked in preparation was about thirty to thirty-five kilometers which I could do without pain and injuries so I thought I was ready to go.

On the first day you discover how far fifty kilometers is and on the other three days you walk the same distance by pacing yourself better. I made the classic beginner's mistake of walking too fast on the first day only to realize that I now had two blisters which I'd have to endure on the remaining days.

Logistics

An unexpected challenge of the event is not just the walking but also the logistics surrounding it. You have to get to and from the starting line and you have to eat, shower and sleep all within 24 hours. There is plenty of time to complete the walk each day but any time spent walking can't be spent sleeping.

It would be great if you could get dropped off at the starting line, walk fifty kilometers, and then get picked up afterwards. Alas, that isn't the case and this combination of factors means the fifty kilometers on the second and third day weigh on you differently than on the first day. I have to say that walking fifty kilometers isn't that difficult but it's a different story when it's done on four consecutive days.

There is plenty of food to purchase along the way and there are also people handing out free fruit and other stuff, such as candy and liquorice. You can bring your own food but I've also seen people having their breakfast at the first possible break (e.g., a bakery with extended opening hours).

The event

A person I talked to during the event, who had completed it numerous times, summed it up quite nicely. There are two events simultaneously taking place in the city of Nijmegen: one event is the party in the bars and streets of the city and the other event is created by the walkers and spectators alongside the road. The walk itself is their party and the spectators enjoy themselves too by watching and encouraging the walkers.

I can wholeheartedly recommend taking part in the event and I can see why some people make this event one of their annual holidays.

To be fair, at my age the distance of fifty kilometers is mandatory and in practice it's much easier to prepare for forty or thirty kilometers. Having experienced the event this year I'm not sure it makes sense to do it again next year with so many other events out there. On the other hand, it has been said that the itch to participate tends to occur next year when the application form is available online again. So, we'll see what happens.

On a final note, if you're going to participate in the Four Days Marches Nijmegen and you have any questions just let me know on Twitter.

How Does 'from __future__ import ...' Work in Python?

Simeon Visser

May 27, 2013 00:18

A few days ago I learned that from __future__ import barry_as_FLUFL allows you to use the <> operator again for inequality in Python 3.3:

>>> 3 <> 4
  File "<stdin>", line 1
    3 <> 4
       ^
SyntaxError: invalid syntax
>>> from __future__ import barry_as_FLUFL
>>> 3 <> 4
True

PEP 401 details the history behind this import.

Those familiar with Python will know that the __future__ module is used to make functionality available in the current version of Python even though it will only be officially introduced in a future version.

For example, from __future__ import with_statement allows you to use the with statement in Python 2.5 but it is part of the language as of Python 2.6.

The syntax from module import function generally means that the function from the specified module is made available in the current scope and that it can be called.

But the earlier examples demonstrate that importing can also make new operators or keywords available. This poses the question of how this module actually works as it is somehow different from a normal import: Python does not allow anyone to implement new operators or keywords so how can we import them from a seemingly normal module __future__?

How does it work?

Let's have a look at the source of the __future__ module. It turns out that anything you can import from __future__ has been hardcoded into the language implementation. Each import is specified using a _Feature object that records the versions in which the new feature is available (using the import and officially without the import) and also a special compiler flag.

Calling repr() on the imported object also shows this:

>>> repr(barry_as_FLUFL)
"_Feature((3, 1, 0, 'alpha', 2), (3, 9, 0, 'alpha', 0), 262144)"

Each of these compiler flags is a constant that is also stored in compile.h. This won't tell us much as it merely defines the available imports from __future__.
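To see that the compiler flag is what actually matters, the flag can be passed to the built-in compile() function directly. This is a minimal sketch, assuming a CPython 3 interpreter that still ships barry_as_FLUFL. The version tuples and the flag value it prints differ between Python releases, so none are shown here.

```python
import __future__

feature = __future__.barry_as_FLUFL
print(feature.getOptionalRelease())   # release from which the import works
print(feature.getMandatoryRelease())  # release in which it would become the default
print(feature.compiler_flag)          # integer flag handed to the compiler

source = "result = 3 <> 4"

# Without the flag the <> operator is rejected at compile time.
try:
    compile(source, "<sketch>", "exec")
    plain_compile_ok = True
except SyntaxError:
    plain_compile_ok = False

# With the flag the same source compiles and can be executed.
namespace = {}
exec(compile(source, "<sketch>", "exec", flags=feature.compiler_flag), namespace)
print(plain_compile_ok, namespace["result"])
```

No import statement appears in the compiled source at all, which suggests that the from __future__ import line is essentially a way of switching on such a flag for the current file.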

So let's look at the actual code that analyses the code for future language features, which is in future.c. Most importantly this file defines a function called PyFuture_FromAST which analyses the code and builds a PyFutureFeatures object that records which imported functionality from __future__ is needed.

This is not a normal module

We can now see why, although similar in syntax, the __future__ module behaves differently from normal imports.

As the new operators and keywords need to be recognized when parsing the Python source code it is necessary for Python to be aware of the 'futuristic imports' at a lower level than at the level of regular imports.

The abbreviation AST that we saw in the name of PyFuture_FromAST refers to Abstract Syntax Tree and this is precisely the level at which Python needs to know which operators and keywords are available: a source file is analysed, converted into an Abstract Syntax Tree and this data structure is then converted into bytecode which can be executed.
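Both points can be observed from Python itself with the standard ast module and the built-in compile() function. The snippet below is a small sketch: it shows that a __future__ import is an ordinary node in the syntax tree, and that the compiler nonetheless treats it specially by rejecting one that does not appear at the top of the file.

```python
import ast

# In the tree, the statement is a regular ImportFrom node.
tree = ast.parse("from __future__ import division\nx = 7 / 2\n")
first = tree.body[0]
print(type(first).__name__, first.module, [alias.name for alias in first.names])

# A __future__ import that is not at the top is rejected by the compiler
# before any of the code runs.
misplaced = "x = 1\nfrom __future__ import division\n"
try:
    compile(misplaced, "<sketch>", "exec")
    rejected = False
except SyntaxError as exc:
    rejected = True
    print("SyntaxError:", exc.msg)
```

A normal import in that position would compile without complaint, which is one visible difference between __future__ and other modules.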

I think this sums up why importing from __future__ is different from other modules. One can also envision a language where operators and keywords can be defined in the language itself and then a __future__ module would import those as any other function or object.

But Python is not such a language. As a result the new operators or keywords are baked into the implementation and they can be made available using a special __future__ module.

List Comprehensions Are For Lists

Simeon Visser

May 08, 2013 12:44

List comprehensions in Python are a great way of expressing a list but, as the name suggests, they are for lists and not purely for iterating over an iterable and calling some method.

If you find yourself writing:

[obj.some_method() for obj in my_objects]

then you actually intended to write:

for obj in my_objects:
    obj.some_method()

I know, the first is a one-liner which feels exotic compared to an old-school for loop.

But the second version expresses what you want to do: call a method on each object and ignore the return value of that method. In the first version you're constructing a list, storing the return values and then forgetting about the list altogether because you're not assigning to a variable.

In CPython 3.3 both cases produce similar bytecode but a list is still constructed unnecessarily. Given that Python is designed for readability the second one expresses better what you're doing. If you do need to store the return values in a list then you can rewrite the code later.
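The difference is easy to see in a small, self-contained example. The Task class below is hypothetical and only exists to have a method that is called for its side effect.

```python
class Task:
    """Hypothetical object whose method is called for its side effect."""

    def __init__(self):
        self.done = False

    def some_method(self):
        self.done = True  # side effect; implicitly returns None


my_objects = [Task(), Task(), Task()]

# The comprehension calls the method but also builds a list of return values.
throwaway = [obj.some_method() for obj in my_objects]
print(throwaway)

# The loop has the same side effect and builds nothing.
for obj in my_objects:
    obj.some_method()
print(all(obj.done for obj in my_objects))
```

The list built by the comprehension contains nothing but None values, one per object, which is exactly the work the for loop avoids.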

Similarly, the same argument applies to the following construct:

map(lambda obj: obj.some_method(), my_objects)

If you need the constructed list of return values then you can rewrite it as a list comprehension. If you don't need the list then you can rewrite it as a for loop.
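There is an extra reason to avoid map for side effects in Python 3: map returns a lazy iterator there, so nothing is called until the result is consumed. The sketch below uses a hypothetical record function to make the calls visible.

```python
calls = []


def record(value):
    """Hypothetical function with a side effect: it remembers its argument."""
    calls.append(value)
    return value * 2


lazy = map(record, [1, 2, 3])
calls_before = list(calls)  # snapshot: nothing has been called yet in Python 3
results = list(lazy)        # consuming the iterator triggers the calls
print(calls_before, calls, results)
```

So in Python 3 a bare map(...) line that is never consumed does not call the method at all, whereas the for loop always does.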

My First Coursera Course

Simeon Visser

May 03, 2013 18:08

With Coursera and other online learning platforms gaining in popularity I decided to give it a try as well with the course Science, Technology, and Society in China I: Basic Concepts by Naubahar Sharif of the Hong Kong University of Science and Technology.

Why this course?

It is a very accessible course which defines and explains the concepts in a clear way. It builds well upon my background knowledge of the course Philosophy of Computer Science that I followed back in the day at an actual university.

Prior to the course I was familiar with the notions of falsification and paradigms (i.e., what constitutes science and the methods of science) and I had a general understanding of technology. But it turns out a lot more can be said about the relationship between technology and science within society and how that leads to innovations.

I picked this course because it would give me a better insight into technology and innovation as viewed within a society, in particular a non-Western society such as China. It also makes sense to study how innovations come about as there are many developing societies that want to improve their ability to innovate.

It is not uncommon to hear of new startups in Silicon Valley that strive to "disrupt their market" or to refer to product features as "innovations" when that is mostly a debatable opinion. In this light it is interesting to learn that scholars have actually studied what innovations are, how they occur and why doing the same thing in another country is also considered innovative.

On following an online course

Within hours of the start of the course there were already people asking in the forums for clarification about the first assignment. These questions could have easily been answered if you had actually viewed the lectures. At first I thought this was part of typical internet drama: students unwilling to do the work necessary to complete a course.

Based on the question's formulation that is still the most likely explanation. However, it occurred to me that, unlike in most Western countries, some people may not have an adequate internet connection to view the lecture videos. For me this course is mostly intellectual curiosity but for others it makes sense to take this course to improve their business relationships with China by having a better understanding of their society.

If it takes an evening to download each lecture then it makes sense to ask for the correct video in the course forums right away (i.e., the lecture video that explained the concepts mentioned in the assignment). Nonetheless I was happy to see the bar being raised for the second and third assignments to make sure decent efforts are put into completing the course.

Summary

I have just completed my peer evaluations which ticks the final box towards completing the course. It's difficult to assess your own submissions, especially when it's far more subjective than assignments given in most of the exact sciences, but I think I'll pass the course.

Perhaps the most striking example of innovation in this context is that online higher-education courses make university-level lectures freely available to anyone worldwide. Speaking in academic terms, the question of whether that is a "disruptive innovation" is left as an exercise for the reader.

Why Wikidata is Great for Wikipedia

Simeon Visser

April 30, 2013 21:57

A recent Hacker News post made the Wikidata project more widely known in the tech community. It also created some confusion as the relevance of the project was not widely understood. I think Wikidata will easily be among the best projects for Wikipedia in 2013 and the future so it's good to explain why that is.

Wikidata solves a long-standing issue

First of all, it solves a long-standing issue that plagued many Wikipedia articles. Wikipedia's goal is to make knowledge widely available in many languages and many articles are indeed available in multiple languages.

Next to each article is a list of links to articles about the same subject in a different language. These links are manually added by specifying the language code and the title of the article. For example, if I want to include a sidebar link to the Wikipedia article in the English language I would include:

[[en:Wikipedia]]

where en indicates the language code of the project. However, these so-called interwiki links need to be added to every language. So every article stores interwiki links to all other articles about that subject.

This is fine as long as 1) nothing needs to change and 2) each article is unambiguously about the same subject. And this is where the problems begin: often an article would be renamed and then all interwiki links on all other Wikipedia projects would need to be updated.

This was not manually feasible so people developed bots to do this. Many bots would regularly update articles across all languages purely to keep interwiki links in sync.

With bots doing the hard work, all should be good, right? Not really, because human error often introduced links to articles about a closely related but not identical subject. For example, the article New York might describe the city on some Wikipedia projects while it discusses the state on others.

Bots don't understand this so they'll gladly review an article's interwiki links and rampantly copy across any missing links to other articles. This ensures interwiki links are the same in each language but it also means errors are propagated. Even worse, it requires human intervention to fix any incorrect links. If a Wikipedia contributor in some exotic language introduced a mistake in the interwiki links it would often propagate to other projects until it was manually fixed, everywhere.

Wikidata solves this by having one page per subject and by storing the interwiki links there. So instead of letting articles link to all other languages they now fetch the interwiki links from their corresponding Wikidata page. This means interwiki links can be updated in one place rather than in all places.
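The idea of one shared page per subject can be modelled with a small Python data structure. This is a hypothetical, simplified sketch: the item identifiers, the function name and the titles are made up for illustration and do not reflect how Wikidata actually stores or serves its data.

```python
# Hypothetical, simplified model: one item per subject, keyed by an
# invented identifier, holding the article title in each language.
items = {
    "item-1": {"en": "New York City", "nl": "New York (stad)"},
    "item-2": {"en": "New York (state)", "nl": "New York (staat)"},
}


def interwiki_links(item_id, own_language):
    """Return the links an article shows, read from the shared item."""
    titles = items[item_id]
    return {lang: title for lang, title in titles.items() if lang != own_language}


# Renaming an article is a single update in one place...
items["item-1"]["en"] = "New York"
# ...and every other language picks it up on the next lookup.
print(interwiki_links("item-1", "nl"))
```

Because the city and the state are separate items, a link can only point to the wrong subject if the article is attached to the wrong item, and that mistake is fixed in one place as well.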

Long-term benefits of Wikidata

Now that we can have a unique page per subject we can build upon that foundation. The Wikidata page stores the interwiki links but it can also store facts about the subject. This makes the Wikidata project even more useful to Wikipedia.

For example, for cities it could store the number of inhabitants per census and for famous individuals it could store the date of birth, date of death and other relevant facts.

This makes it easy to:

Generate the information box at the right side of many Wikipedia articles.
Produce tables with relevant information.
Generate specific lists (e.g., all cities that had one million inhabitants in 1950).

That last point is worth elaborating upon: Wikipedia's articles are categorised so it's currently easy to find people born in a particular year but it's not so easy to find, e.g., all cities with a mayor born in 1950.
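Once facts are stored as structured data, such a question becomes a simple join between two sets of records. The sketch below uses invented placeholder data and a hypothetical function name. It says nothing about Wikidata's real data model or query interface.

```python
# Invented facts for illustration only; none of this is real data.
cities = [
    {"name": "City A", "mayor": "Mayor X"},
    {"name": "City B", "mayor": "Mayor Y"},
    {"name": "City C", "mayor": "Mayor Z"},
]
people = {
    "Mayor X": {"born": 1950},
    "Mayor Y": {"born": 1962},
    "Mayor Z": {"born": 1950},
}


def cities_with_mayor_born_in(year):
    """Connect the two kinds of facts: city -> mayor -> year of birth."""
    return [city["name"] for city in cities if people[city["mayor"]]["born"] == year]


print(cities_with_mayor_born_in(1950))
```

With categories alone a reader has to make this connection by hand, article by article.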

In a way the information is currently on Wikipedia but it's not so easy to find as it requires the reader to manually connect these facts together. The existence of Wikidata allows Wikipedia to turn into Semantic Wikipedia and that makes knowledge even more freely available than the project already does.

Not DoingSomething Yet

Simeon Visser

April 26, 2013 22:50

Being new to London I thought it would be a good idea to try various approaches to meeting new people here. I'm not into online dating but I thought the concept of DoingSomething.co.uk would be a fruitful way of meeting new people. The site places emphasis on the activity rather than the person which makes for a different experience.

Great idea but the execution and implementation of the idea can certainly be improved. After signing up and clicking around on the website it quickly became apparent that, technically, there's still some work to do.

It's not possible to change your gender, address and date of birth after signing up. If someone moves to a different address or, indeed, changes gender, then surely that should be possible right? And date of birth could be incorrectly filled in at signup.
The "How to Use" page says that you can't send messages unless you're subscribed (i.e., becoming a paid member). Well, I don't have a subscription but I was able to send messages to other people.
If you don't fill in any message and click "Send" it will send the placeholder text as the message (i.e., "Write your message here").
If you double-click on the "Favourite" button, the person is added twice to your list of favourite people.
If you double-click on the "Send message" button and then close the message popup dialog the website still has the coloured background overlay.
If you "Favourite" a person and then click "Remove favourite", the counter at the top still goes up so it shows you have two favourites, although you actually have none.
I can't seem to make the unread count for "Sounds good" go down, even after archiving those messages from those users. Even after refreshing and archiving it remains the same.
On the "100 Must Do Dates" page it shows the "Join Now" button even though you're logged in.
If you use the "Reset Password" feature then your new password isn't checked for minimum length or other constraints.

What surprises me is that the company has been in business since 2011 and presumably has paying members yet a rather cursory glance reveals these technical issues. I'd say these issues are not hard to find so the end-user experience of others must be affected by them as well. It also doesn't build the trust needed to actually become a paying member of the site.

On the other hand these issues may have been fixed by the time you read this and knowing the technical issues is always the first step towards fixing them.