## Sunday, August 26, 2012

### Primes part 2: A segmented sieve

TL;DR: The sum of all primes <= 1,000,000,000,000 is 18,435,588,552,550,705,911,377

This post is a followup to Writing an efficient Sieve of Eratosthenes

A while back I wrote a post detailing a memory-efficient Sieve of Eratosthenes. The algorithm I used took advantage of lazy evaluation and a sliding array to reduce the RAM requirement to a fraction of what a 'vanilla' implementation would require, at the cost of a non-trivial increase in CPU time. The resulting code ran at approximately 25% the throughput of the vanilla implementation, and maxed out at 2 billion candidates.

While researching that post, I noted that the most efficient generators at present use either the Sieve of Atkin or a 'segmented sieve'. As an excuse to play with Go[1], a couple weeks ago I decided to implement a segmented Sieve of Eratosthenes. This post details my results.

## Implementation

Language: Go
Time complexity:
O(n∙log(log(n)))
Space complexity: O(√n)
Candidates in 1 sec: ~50,000,000

Gist (expand/contract)

The algorithm proceeds as follows:

1. Calculate primes up to √max via a vanilla array sieve
2. Slice up segments of about √max candidates for checking
3. To check a range,
1. For each prime p from 1., find the first multiple within the range that's >= p2
2. Cross off every multiple from there to the end of the range
4. Merge the results from the processed segments

You'll note that other than splitting the full candiate set into segments, this is the standard Sieve of Eratosthenes. Hence, it's the segmented Sieve of Eratosthenes.

In my Go version this is implemented by starting segments as individual goroutines that output to their own channels. A single worker goroutine is responsible for marshaling the results from these channels to a single channel read by the main thread. This architecture was chosen simply because it fits well with the Go way of doing things, but it also has the side-effect of providing some amount of free parallelism.

## Results

The very first run of this variant was faster than the most optimized version from my previous post. It runs at about 65% the speed of a vanilla implementation, making it about 2.5x as efficient as the previous lazy implementations, with a lower memory footprint. As always, a better algorithm is worth more than any amount of low level code tuning :). I should point out that in the current implementation I also implemented a bit array rather than simply using an array of bools. This reduced the memory footprint somewhat, but did not appear to have any significant impact in either direction on CPU time required, and so could reasonably be dropped to shorten the code.

With all primes needing to be marshaled back to the main thread parallelism maxes out below 2x linear. If all we care about is an aggregate value computed from the primes (the sum in this case), rather than the primes themselves in order, we can achieve >4x parallelism simply by adding more processes. This is also more efficient in general, and allows >300,000,000 primes to be processed in one second[2].

The net result is an implementation that can sieve 50M candidates in one second on one core or sum 300M on four; sum primes up to one trillion in an hour; or sum primes in a range of one billion (10^9) in the region of one quintillion (10^18) in under a minute. I'm happy with that...for now.

## Sunday, August 19, 2012

### The algorithms of memory

The human brain has the best storage system on the block in a lot of ways. It’s notably lossy and doesn’t have the easiest API to work with, but in terms of flexibility it’s second to none. Computer scientists have been trying to model and mimic its features for a lot of years, but we haven’t got it cracked quite yet. Part of the challenge lies in the sheer variety of access styles that human memory allows. I don’t think we even have them all cataloged yet, much less working together in one system.

I’ve been trying over the last couple days to list all the different patterns I can see myself using. I’ve also tried to pull out systems I know of that do the same for comparison, although I can’t find direct equivalents in all cases. Those without an existing equivalent are probably the most interesting - would mimicking these patterns be useful in the future?

NameMind exampleSystem exampleCharacterized by
Cache hitFacts immediately available for use with no delayIn-memory data storageSynchronous; low latency
Long term storageFacts that take a while to look up. “Um...his name... was.... Let's move on - it'll come to me in a minute.”Lookups from disk or tapeAsynchronous (notify on delivery); high latency
Information requestsAll the associations that come up when you think of a topic. “New Orleans” brings to mind...Web searchWeb of relationships; can be explored further in any direction
Background processingComing up with answers/solutions while you sleep or otherwise aren’t explicitly working on the problemUploading a video to YouTube - when you check again all the different formats and qualities will be availableProcessing item while in storage; queued work requests; separated from foreground tasks
Full ScanLife flashing before your eyes (wiki)Processing ordered events during crash recoveryOrdered; sequential scan of all items
Guided randomnessBrainstorming, free association games, mad libs, improv?Random item output or random exploration of web; subject to limited constraints
Unsolicited triggered remindersBeing reminded of facts/stories/events by things you see or hear? [1]
Unsolicited notifications; loosely related to recent input
Unsolicited untriggered remindersMemories that come to mind with no discernible trigger, e.g. past regrets?Unsolicited notifications; no relation to recent input; may be randomly derived
State affectingMemories that change your mood or attitude. E.g. remembering a birthday makes you happy; remembering specific near miss makes you cautious.? [2]State changes triggered by the contents of the information retrieved
Expectations
(suggested by e-socrates on Reddit)
"We constantly predict from memory what to expect, then compare that to experience"?Continuous short-term predictions; characterizing events as unexpected or surprising

## Footnotes

• [1] Google Now is trying for this to some extent. Online advertising partially fits, however it is not bubbling up details you already know - rather, it’s feeding you new data.
• [2] There are some trigger-based examples of this in security, e.g. logging of accesses to documents, but they don’t really change the state of the system itself (they trigger work in others instead).