Inspired by randomness 6

Posted by Jonas Elfström Thu, 27 Oct 2011 21:19:00 GMT

Inspired by Randomly not so random I decided to play around with the subject.

Both Java and C# uses a linear congruential generator as their pseudorandom number generators. I found this at MSDN

The current implementation of the Random class is based on a modified version of Donald E. Knuth's subtractive random number generator algorithm."

but I couldn't find any information on what values Microsoft uses for the m, a, and c parameters in their LGC implementation.

The German Federal Office for Information Security (BSI) has established four criteria for quality of deterministic random number generators. They are summarized here: K1 — A sequence of random numbers with a low probability of containing identical consecutive elements.

From the pseudorandom number generator article on Wikipedia. Let's see what we can do about that.

1
2
3
4
5
var random = new Random(116793166);
for (int i = 0; i < 9; i++)
{
    Console.Write(random.Next(10) + " ");
}


Outputs:

1 1 1 1 1 1 1 1 1

Did I just break the K1 criteria above? You know I didn't but you might be interested in how I found the seed number. It's easy but also quite computational intensive, that's why I used Parallel.For.

1
2
3
4
5
6
7
8
9
10
11
12
Parallel.For(0, int.MaxValue, i =>
{
    var rnd = new Random(i);
    bool allSame = true;
    for (int j = 0; j < 9; j++)
    {
        allSame = allSame && rnd.Next(10) == 1;
        if (!allSame) break;
    }
    if (allSame)
        Console.WriteLine(i); 
});


It took a couple of minutes for the number 116793166 to show up in my console.

To keep on playing I needed a random string generator. Mine looks like this.

1
2
3
4
5
6
7
8
9
private const string _chars = "abcdefghijklmnopqrstuvwxyz";

private static string RandomString(int size, Random rng)
{
    var buffer = new char[size];
    for (int i = 0; i < size; i++)
        buffer[i] = _chars[rng.Next(_chars.Length)];
    return new string(buffer);
}


If the random generators were the same it looks like it should give the same result as the Java one but RandomString(5, new Random(-229985452)) returns tfsld. Either my RandomString method works different than the Java one or it's the case that Java and .NET has slightly different settings of their LGCs.

Mathematically there's an infinite amount of inputs that results in a returned hello, for instance 1382472294, but here we are limited by the size of our integers.

I found the seed 1382472294 with the following little method and loop

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
private static bool CompareToRandomString(string str, Random rng) 
{ 
    for (int i = 0; i < str.Length; i++)
    {
        if (_chars[rng.Next(_chars.Length)] != str[i])
            return false;
    }
    return true;
}

...

const string lookingFor = "hello";

Parallel.For(0, int.MaxValue, i =>
{
    var rnd = new Random(i);
    if (CompareToRandomString(lookingFor, rnd))
        Console.WriteLine(i);
});


I leave it up to you to implement a break out of that loop.

Ruby does not use a LGC but a Mersenne twister instead. It's still only pseudorandom so there's no problem finding patters you like and being able to repeat them.

1
2
srand(2570940381)
9.times { print "%d " % rand(10) }

Outputs:
1 2 3 4 5 6 7 8 9

1
2
3
charset=('a'..'z').to_a
srand(124931)
puts (0...4).map{ charset.to_a[rand(charset.size)] }.join

Outputs:
ruby

Lazy evaluation is no friend of mutable state 1

Posted by Jonas Elfström Sat, 01 Jan 2011 16:08:00 GMT

A couple of days ago I accidentally landed on a page about implementing merge sort in C#. I thought it would be a nice exercise to try to implement that as a generic method and so I did.

I also wanted to learn more about the characteristics of IEnumerable so I used IEnumerable<T> instead of List<T>. That choice got me in trouble and I opted for help on Stack Overflow.

People said it was sorting correctly but Jon Skeet also asked if I tested it correctly and that I did not. I digged deeper into the problem and extended the question on SO. I had a hunch that it was the mutable state of List<T> and the lazy evaluation of IEnumerable that was the problem but I couldn't quite figure out how.

Along came Kobi and his answer finally made me understand why a .Sort() of the list messed up the result of my sorting.

I then changed the implementation to be fully lazy evaluated and now it looks like this.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
public class MergeSort<T>
{
    public IEnumerable<T> Sort(IEnumerable<T> arr)
    {
        if (arr.Count() <= 1) return arr;

        int middle = arr.Count() / 2;
        var left = arr.Take(middle);
        var right = arr.Skip(middle);
        return Merge(Sort(left), Sort(right));
    }

    private static IEnumerable<T> Merge(IEnumerable<T> left, IEnumerable<T> right)
    {
        IEnumerable<T> arrSorted = Enumerable.Empty<T>();

        while (left.Count() > 0 && right.Count() > 0)
        {
            if (Comparer<T>.Default.Compare(left.First(), right.First()) < 0)
            {
                arrSorted=arrSorted.Concat(left.Take(1));
                left = left.Skip(1);
            }
            else
            {
                arrSorted=arrSorted.Concat(right.Take(1));
                right = right.Skip(1);
            }
        }

        return arrSorted.Concat(left).Concat(right);
    }
}


Please be aware that this is but an exercise and not very efficient.

Now to the problems that you can encounter with the above and an explanation to the title of the post. Let's MergeSort a simple List<int> followed by a call to the built-in .Sort() on List<T>.

1
2
3
4
5
6
var ints = new List<int> { 2, 3, 1 };
var mergeSortInt = new MergeSort<int>();
var sortedInts = mergeSortInt.Sort(ints);
// sortedInts.ToList() is {1, 2, 3}
ints.Sort();
// sortedInts.ToList() is {3, 1, 2}


So what's going on here? As far as I can tell it's something like this. sortedInts isn't evaluated until the first call to MoveNext() (or Tolist() or any of those). Before that it only has lazy pointers to the original enumerable values. ints.Sort() messes up the beauty of lazy evaluation by changing the underlying data structure. It can do that because List<T> is mutable (writeable).

But why {3, 1, 2} after ints.Sort()? The original sequence was { 2, 3, 1} and that is what the MergeSort sorted, not by creating a new sequence but only by lazy pointers.

At first the MergeSort sorts like this

but after the source list has been sorted/changed it will do this

What can you do to stop this to happening to you? The boring way is to never use lazy evaluation but that is kind of hard if you happen to use LINQ (and you should, it's great). Another way is to use immutable data structures and that is how functional languages tackles this problem. In C# we have the ReadOnlyCollection and it obviously has no Sort() method.

A nice feature of the MergeSort above is that it has no problems with an immutable collection.

1
2
3
4
5
var rints = new ReadOnlyCollection<int>(ints);
var sortedRints = mergeSortInt.Sort(rints);
// sortedRints.ToList() is {1, 2, 3}
ints.Sort();
// sortedRints.ToList() is {3, 1, 2}


What?! How could an immutable collection get messed up like this? It turns out that ReadOnlyCollection<T> is only a facade for mutable collections and that is what bit us here. You have to pass a copy of the list. Example follows.

1
2
3
var rints = new ReadOnlyCollection<int>(new List<int>(ints));
var sortedRints = mergeSortInt.Sort(rints);
ints.Sort();


That also works for List<T>.

1
2
3
4
5
6
var ints = new List<int> { 2, 3, 1 };
var mergeSortInt = new MergeSort<int>();
var sortedInts = mergeSortInt.Sort(new List<int>(ints));
// sortedInts.ToList() is {1, 2, 3}
ints.Sort();
// sortedInts.ToList() is {1, 2, 3}


Finally I would like to send a big thank you to Kobi for sorting out my problems with the original solution.

C# implicit string conversion 3

Posted by Jonas Elfström Wed, 03 Feb 2010 16:32:00 GMT

I know how it works and I think I can see why but I'm still not very fond of how eager C# is to perform implicit string conversion.

Contrived example:

1
2
string s = -42 + '+' + "+" + -0.1 / -0.1 + "=" + (7 ^ 5) + 
      " is " + true + " and not " + AddressFamily.Unknown;

s will be set to "1+1=2 is True and not Unknown"

The answer is in white text above, select the text to see it.

A more real problem is something like this


string str = 1 + 2 + "!=" + 1 + 2;

str will be set to "3!=12".

Edit 2010-02-08
This wouldn't be much of a problem if all objects in .NET always returned a decent string representation of their current state/value with ToString() but that's not the case. Instead "The default implementation returns the fully qualified name of the type of the Object.".
I don't like the inconsistency. It's way too late now but I think it would have been much better if only objects that really produces a human readable output of the data in the object should implement ToString(). If you want the name of the type of the Object there should be another way.

Finding primes in parallel 18

Posted by Jonas Elfström Thu, 14 Jan 2010 21:55:00 GMT

Justin Etheredge has been blogging about his challenge to find prime numbers with LINQ. He later used AsParallel() (coming in .NET 4) to speed things up and then followed that up with a post about using The Sieve Of Eratosthenes.

As you can see in the comments of those posts I tried to speed the Sieve of Eratosthenes up by using Parallel.For in the inner loop. I also tried AsParallel() in the LINQ expression but it made no difference in either case. At most it got 5% faster. I'm not sure but it could be that because SoE is very memory intense we could have a scaling issue and maybe also memory bandwidth exhaustion. This is mere speculation.

I then searched for other algorithms and found The Sieve of Atkin. It uses less memory than SoE so I thought I'd give it a try.

I set the limit to 20,000,000 and then benchmarked it. It timed in on 2.48s so actually worse than the 2.2s that SoE took. Not good! Then I added Parallel.For in the loop that did most of the work and lo and behold, it scaled! I have two cores in my machine (T7200@2.0GHz) and the average runtime went down to 1.26s. That's almost linear and surprisingly good! If you happen have a quad core (or more) and feel like trying it out then please contact me. It would be interesting to see if it scales further.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
static List<int> FindPrimesBySieveOfAtkins(int max)
{
   //  var isPrime = new BitArray((int)max+1, false); 
   //  Can't use BitArray because of threading issues.

    var isPrime = new bool[max + 1];
    var sqrt = (int)Math.Sqrt(max);

    Parallel.For(1, sqrt, x =>
    {
        var xx = x * x;
        for (int y = 1; y <= sqrt; y++)
        {
            var yy = y * y;
            var n = 4 * xx + yy;
            if (n <= max && (n % 12 == 1 || n % 12 == 5))
                isPrime[n] ^= true;

            n = 3 * xx + yy;
            if (n <= max && n % 12 == 7)
                isPrime[n] ^= true;

            n = 3 * xx - yy;
            if (x > y && n <= max && n % 12 == 11)
                isPrime[n] ^= true;
        }
    });

    var primes = new List<int>() { 2, 3 };
    for (int n = 5; n <= sqrt; n++)
    {
        if (isPrime[n])
        {
            primes.Add(n);
            int nn = n * n;
            for (int k = nn; k <= max; k += nn)
                isPrime[k] = false;
        }
    }

    for (int n = sqrt + 1; n <= max; n++)
        if (isPrime[n])
            primes.Add(n);

    return primes;
}

This code needs C# 4.0 to compile.

Edit 2010-12-14

Dommer found out that the BitArray implementation had some serious threading issues. I had my worries about the non thread safe characteristics of BitArray but I thought that the isPrime[n] ^= true; was an atomic operation and that it didn't matter in what order bit bits was flipped would make it possible to use anyway. Not so. Changed it to a boolean array and that seems to rock the boat but of course at a much higher memory cost.

Edit 2010-01-20

Indications are that this does in fact not scale very good on a quad core. It's even worse, it seems it scales good on my old T7200 but not on a dual core E6320. I don't know why but of course the shared state of the isPrime BitArray is a huge problem and maybe it could be that differences in CPU architecture (FSB speed, caches and so on) in the E6320 is an explanation. Average execution time on the E6320 was 1290ms in a single thread and 1064ms in two.

If you want to try this in an older version of C# than 4.0 then check out this post.

A reader asked how I timed the executions. Here's how.

1
2
3
4
5
6
7
8
9
10
11
12
13
var steps = new List<long>();
var watch = new Stopwatch();

for (int i = 0; i < 10; i++) 
{
    watch.Reset();
    watch.Start();
    var primes = FindPrimesBySieveOfAtkins(20000000);
    watch.Stop();
    Console.WriteLine(watch.ElapsedMilliseconds.ToString());
    steps.Add(watch.ElapsedMilliseconds);
}
Console.WriteLine("Average: " + steps.Average().ToString());



Edit 2010-10-24

Tom's code from the comment below

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Numerics; 
using System.Text; 
using System.Threading.Tasks;

namespace Calculate_Primes 
{ 
    class Program 
    { 
        private const int _NUMBER_OF_DIGITS = 100;

        static void Main(string[] args)
        {
            BigInteger floor = BigInteger.Parse("1" + string.Empty.PadLeft(_NUMBER_OF_DIGITS - 1, '0'));
            BigInteger ceiling = BigInteger.Parse(string.Empty.PadLeft(_NUMBER_OF_DIGITS, '9'));

            Console.WindowWidth = 150;

            //var primes = Enumerable.Range(floor, ceiling).Where(n => Enumerable.Range(1, n).Where(m => (n / m) * m == n).Count() == 2);

            Console.Clear();
            _calculatePrimes(floor, ceiling, "C:\\100 digit primes.txt");

            Console.Clear();
            _calculatePrimes(floor, ceiling, "C:\\300 digit primes.txt");
        }

        static IEnumerable<BigInteger> Range(BigInteger fromInclusive, BigInteger toInclusive)
        {
            for (BigInteger i = fromInclusive; i <= toInclusive; i++)
                yield return i;
        }

        static void ParallelFor(BigInteger fromInclusive, BigInteger toInclusive, Action<BigInteger> body)
        {
            Parallel.ForEach(Range(fromInclusive, toInclusive), body);
        } 

        static void _calculatePrimes(BigInteger floor, BigInteger ceiling, string resultsFileFilepath)
        {
            using (System.IO.FileStream fs = new System.IO.FileStream(resultsFileFilepath, System.IO.FileMode.Create)) { }

            using (System.IO.StreamWriter sw = new System.IO.StreamWriter(resultsFileFilepath))
            {
                ParallelFor(floor, ceiling, i =>
                    {
                        if (_isPrime(i))
                        {
                            lock (sw)
                            {
                                sw.Write(i.ToString() + System.Environment.NewLine);
                                sw.Flush();
                            }
                        }
                    });
            }
        }

        static bool _isPrime(BigInteger number)
        {
            bool returnValue = true;

            Console.WriteLine("Checking {0} for primality.", number.ToString());

            if ((number < 2) || (number > 2 && number.IsEven) || (number > 2 && number.IsPowerOfTwo))
                returnValue = false;
            else
                for (BigInteger i = 2; i * i <= number; i++)
                {
                    if (number % i == 0)
                        returnValue = false;
                }

            if(returnValue)
                Console.WriteLine("         {0} IS prime.", number.ToString());
            else
                Console.WriteLine("         {0} IS NOT prime.", number.ToString());

            return returnValue;
        }
    }
}

Infinite ranges in C# 2

Posted by Jonas Elfström Tue, 20 Oct 2009 18:41:00 GMT

I recently learned that C# is compliant with the IEEE 754-1985 for floating point arithmetics. That wasn't a big surprise but that division by zero is defined as Infinity in it was! It actually kind of bothers me that I didn't know this.

In mathematics division by zero is undefined for real numbers but I guess Infinity is a more pragmatic result. Or as a friend put it "IEEE stands for Institute of Electrical and Electronics Engineers not Institute of Mathematics"

1
2
3
4
double n = 1.0;
n = n / 0;
if (n > 636413622384679305)
  System.Console.WriteLine("Yes it certainly is!");


This C# code does not throw an exception, it simply leaves n defined as Infinity and writes a line to the console.

Ruby is also IEEE 754-1985 compliant. It even lets you define infinite ranges.

1
2
3
4
5
6
Infinity=1.0/0
=>Infinity
(1..Infinity).include?(162259276829213363391578010288127)
=> true
(7..Infinity).step(7).take(3).inject(&:+) # 7+14+21
=> 42


I can't say I see very much use of this but it brings a kind of completeness to the handling of infinities. Unfortunately it seems we don't get that in C# out of the box because Enumerable.Range takes <int>,<int> as parameters and there's no Infinity definition for int. That's unless someone wrote a generic Range class. Turns out none other than Jon Skeet did in his MiscUtil. Download MiscUtil and then by using MiscUtil.Collections; you can:

1
2
3
4
5
6
double n = 1.0;
var infinity = n / 0;
var r = new Range<double>(0, infinity);
if (r.Contains(4711))
  System.Console.WriteLine("Yes it certainly does!");
var sum = r.Step(7.0).Take(3).Sum();


And guess what, it works like a charm! 4711 is part of positive infinity and sum is 42.0 and all is good.

Edit

There's also a couple of predefined constants. Thanks to Eric for pointing that out.

1
2
var r = new Range<double>(7,  System.Double.PositiveInfinity);
var sum = r.Step(7.0).Take(3).Sum();


Older posts: 1 2 3