I went to TechFestival in September and gave a talk at the Algorithmic Sovereignty summit. The talk touches on how Scuttlebutt can be used to regain some sovereignty over how data is shared and bring transparency to the algorithms that run on the data. The whole summit was a wonderful experience with lots of interesting people and discussions. When talking about these issues, most people talk about the negative aspects of current systems such as Facebook, but rarely about solutions, so I was really happy to give a more positive, solarpunk story. The problems are really hard, as they intersect human nature and technology, but they are increasingly important in today's society.
Since January I have been working on a grant to add a benchmarking CI framework to scuttlebot, to better understand its behaviour and make it easier to find regressions as the implementation matures and changes. Furthermore, I have been spending time understanding the system and finding things that can be improved performance-wise.
The grant is almost at an end, so I’ll use this blog to summarize what has been achieved.
- Built a CI framework and visualization tool for the bench-ssb repo. I used the nci framework for this and have been quite happy with the result.
- Found and fixed a memory leak that made level-post really slow over time.
- Made several improvements to flume to improve performance. Some interesting links related to this: charwise and json performance related to buffers in node.
I have kept a dev diary for the duration of the grant. It has been very interesting to really get to understand a relatively large, real-world distributed system.
I wrote a small plugin for ssb that allows it to read through posts to discover dat links and seed them automatically. The cool thing is that you can tell it to only seed for people you follow. Besides the normal use case of a user only sharing friends' links, this also makes it really easy to set up a pub that automatically seeds content shared by its members. This, combined with something like Beaker Browser, makes it easy to create content and share it in a totally distributed fashion.
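The link-discovery step can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: it assumes a dat link is `dat://` followed by a 64-character hex key, and the `{ author, text }` post shape and both function names are hypothetical simplifications.

```javascript
// Hypothetical sketch, not the plugin's real implementation.
// Assumption: a dat link is "dat://" followed by a 64-character hex key.
const DAT_LINK = /dat:\/\/[0-9a-f]{64}/gi;

// Return the unique dat links found in a piece of text.
function findDatLinks(text) {
  return Array.from(new Set(text.match(DAT_LINK) || []));
}

// Collect links only from posts whose author is in the `followed` set.
// `posts` is assumed to be an array of { author, text } objects.
function linksToSeed(posts, followed) {
  const links = new Set();
  for (const post of posts) {
    if (!followed.has(post.author)) continue; // skip people we do not follow
    for (const link of findDatLinks(post.text)) links.add(link);
  }
  return Array.from(links);
}
```

The resulting list would then be handed to whatever does the actual seeding, which is left out here.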
I got myself an early Christmas present: an Acer Chromebook 14. I wanted a dedicated linux development machine and always wanted to try out Chrome OS to see what it was like. So far the overall experience has been good. The laptop is excellent for the price, the screen is good, keyboard is fine and best of all, the battery life is really good.
I installed debian stable (jessie) in a chroot on the machine using crouton. There I can run node, scuttlebot and tor, and use emacs graphically using xiwi for editing. The best part about crouton is that you can do a full chroot backup and store that somewhere safe. You can even have multiple chroots running at the same time for testing. Crouton seems a little rough; for example, I ran into this weird node bug. But overall I like the separation. I might end up installing debian directly at some point, but for now this suits me fine.
My latest contribution to the project is tor support. This means that the gossiping works in hostile network environments and you get location transparency for free. Tor seems like a perfect fit for scuttlebot and allows censorship-resistant distributed systems to be built. The following is a small guide for setting up a scuttlebot node (or pub in their terminology) on tor.
Configure tor to expose a hidden service on external port 8888, forwarding to port 8008 locally (/etc/tor/torrc):
HiddenServicePort 8888 127.0.0.1:8008
Then setup sbot to use the onion address:
sbot server --host <YOURONIONADDR>.onion --port 8008
Please note that scuttlebot will make connections to non-tor nodes, so if you don’t want to expose your pub's IP and only want it to communicate over tor, you need to pass it a --tor-only flag as well. The latest version (9.4) is needed for this to work.
You can join my existing pub with the following invite code:
sbot invite.accept 355ij5sv346bpih2.onion:8888:@lbocEWqF2Fg6WMYLgmfYvqJlMfL7hiqVAV6ANjHWNw8=.ed25519~t7UhT15nIaDoWbdTZHVg4HJ1VHcmtl/FOplSyQfn03E=
In the spirit of holidays and sharing is caring, an update to my other post with some more interesting organizations to support:
2) Mozilla. I can’t think of another program I have spent so much time using over the last many years. Supporting them is important so that they can remain independent and make the web a better place.
3) Randers Regnskov. This is a Danish one, but a rather important one I think. They pay for schooling for an Indian tribe in Ecuador, and in return the families will not sell their rain forest land to big international companies. What a great idea, and really worth spreading.
Some years ago I became an apostate by leaving the church and became a “real” atheist. In Denmark the church and the state are still quite heavily intertwined. Luckily, the biggest portion of the money the church receives is from believers who pay about 1% of their annual salary through a special church tax. Many of these never go to church, except for Christmas, when they get married, or when they have children and need to get them baptized. What would happen if they spent that money on charity instead?
I have been thinking about ways to spend some of the money I don’t have to pay the church on altruistic purposes. So far I have donated to three different causes that I think make the world a better place.
We need to nurture information in the public domain and make sure that it stays there and is available to anyone with an internet connection, free of charge.
2) Khan Academy
I have always been interested in how we can improve the education system, and now being a father certainly helps me appreciate any progress being made in this area. Salman Khan tells their story in this TED talk much better than I can.
There is a wealth of different organizations trying to help the poor in need, but I always find them lacking in that they seem to focus on individuals rather than looking at the bigger picture. When I watched Charmian Gooch talk about global corruption, I instantly felt that this was finally something worth supporting and something I could see really making a difference in the long run.
I was looking into some performance problems with knockout yesterday and started digging into the code. In particular, I was wondering why it would take their mapping plugin several seconds to wrap an array of 500 objects. Profiling led me to look deeper into the code, and the following code struck me as very strange:
Basically this is a hash implementation of only put and get operations using two arrays. I don’t think I need to tell you how bad the performance of this will be once you start adding lots of items 🙂
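To make the pattern concrete, here is a rough sketch of a two-array map. It illustrates the general technique only and is not knockout's actual code; the class name `TwoArrayMap` is made up for this example.

```javascript
// Illustrative sketch only, not the mapping plugin's real code.
// Keys and values live in two parallel arrays; position i in `keys`
// corresponds to position i in `values`.
class TwoArrayMap {
  constructor() {
    this.keys = [];
    this.values = [];
  }

  put(key, value) {
    const i = this.keys.indexOf(key); // linear scan over every stored key
    if (i === -1) {
      this.keys.push(key);
      this.values.push(value);
    } else {
      this.values[i] = value;
    }
  }

  get(key) {
    const i = this.keys.indexOf(key); // another linear scan
    return i === -1 ? undefined : this.values[i];
  }
}
```

Every put and get scans the key array, so inserting n items costs on the order of n² comparisons in total.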
The reason why they didn’t just use an ordinary object to store their values was that their keys were objects and not just simple strings or integers. I started wondering what other people did about this problem. There must be a library for this. And lo and behold, there was.
Then I started thinking: why not just jsonize the object and then store that as the key in a plain js object? Both the serialization and the lookup should be really fast, since every new browser nowadays has a very fast JSON implementation.
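A minimal sketch of that idea, assuming the keys are plain data objects that `JSON.stringify` can serialize (the class name `JsonKeyMap` is made up for this example):

```javascript
// Sketch of the JSON-as-key idea; the name JsonKeyMap is hypothetical.
class JsonKeyMap {
  constructor() {
    // No prototype, so keys like "toString" cannot collide with built-ins.
    this.store = Object.create(null);
  }

  put(key, value) {
    this.store[JSON.stringify(key)] = value;
  }

  get(key) {
    return this.store[JSON.stringify(key)];
  }
}
```

Note that this compares keys by content rather than by identity: two distinct objects with the same properties in the same order map to the same entry, objects whose properties were added in a different order do not, and objects with circular references cannot be serialized at all.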
With the 3 implementations in hand I did a simple benchmark: generate 10,000 objects, store them in the hashtable with the object as the key, and then get all the objects out again.
Using Firefox 12: naive double array: 1370ms, json: 77ms, jshashtable: 170ms. Using Chrome 19 and IE9 the performance gap is very similar. Chrome is of course a tad faster.
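The benchmark procedure can be sketched as below. This is a reconstruction under assumptions, not the code linked from this post: `makeObjects`, `timePutGet` and `mapTable` are hypothetical names, and the table under test is a stand-in built on the built-in `Map`, which was not one of the three implementations measured above.

```javascript
// Hypothetical benchmark harness; not the original benchmark code.

// Generate n small, distinct objects to use as keys.
function makeObjects(n) {
  const objects = [];
  for (let i = 0; i < n; i++) objects.push({ id: i, name: 'item' + i });
  return objects;
}

// Store every key, read every key back, and report elapsed time.
// `table` is any object exposing put(key, value) and get(key).
function timePutGet(table, keys) {
  const start = Date.now();
  keys.forEach((key, i) => table.put(key, i));
  let hits = 0;
  keys.forEach((key, i) => {
    if (table.get(key) === i) hits++;
  });
  return { ms: Date.now() - start, hits: hits };
}

// Stand-in table so the sketch runs on its own.
function mapTable() {
  const m = new Map();
  return { put: (k, v) => m.set(k, v), get: (k) => m.get(k) };
}

const result = timePutGet(mapTable(), makeObjects(10000));
console.log(result.ms + 'ms, ' + result.hits + ' hits');
```

Any implementation with the same `put`/`get` shape can be passed to `timePutGet` in place of the stand-in. Counting hits also acts as a sanity check that every stored value was read back correctly.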
Update: reran the benchmark using suggestions from Tim.
Code is available here.
For some applications, the ability to trace the different states during the lifetime of data can be very important, especially when it comes to debugging. This is mostly relevant for data stored in a database, but could potentially also be interesting for in-memory data structures. Luckily, many databases today support this as change data capture. I would add to the article that capturing the user as part of the change can be very effective.
I have recently been involved in the development of two systems where this pattern has been employed to great success: one where it was implemented by hand, and one where CDC in SQL Server was used. The hand-implemented solution had some special requirements that meant we did it by hand, and it also doesn’t tie us to a particular db. While not free at all, once it is up and running, it provides an invaluable tool for reasoning about the data.
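As a rough in-memory illustration of the pattern (not the code from either system mentioned above), every write can append a change record that includes the user who made it. The class name `AuditedStore` and the record fields are assumptions made for this sketch.

```javascript
// Hypothetical sketch of hand-rolled change capture; all names are made up.
class AuditedStore {
  constructor() {
    this.current = new Map(); // latest value per id
    this.history = [];        // append-only list of change records
  }

  // Record who changed what, and when, before applying the change.
  set(id, value, user) {
    this.history.push({
      id: id,
      before: this.current.get(id), // undefined for a first write
      after: value,
      user: user,
      at: new Date().toISOString()
    });
    this.current.set(id, value);
  }

  // All recorded changes for one id, oldest first.
  historyOf(id) {
    return this.history.filter((change) => change.id === id);
  }
}
```

In a database-backed version the history would typically live in a separate table written in the same transaction as the change itself.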
I was very pleasantly surprised when datomic was announced with a time model as one of the core concepts. I really think that in this age where storage is so cheap, we can’t afford to treat data as mutable.
I really love the kickstarter concept, flattr and in general other great free projects, such as wikipedia and wikileaks. Then there are smaller projects such as The Project Hate MCMXCIX trying to fund the recording of their latest album. Or the interesting, but maybe a bit too low-budget, TV series Pioneer One. Or the excellent free android mod cyanogen.
I donated to the recording of Obscura’s demo collection this month, what did you donate to?