G95 is a stable, production Fortran 95 compiler available for multiple CPU architectures and operating systems. Work on innovations and optimizations continues. Parts of the F2003 standard have been implemented in g95.

Main website: http://www.g95.org


August 17

Dan Nagle reported a problem where C_LOC returned the wrong thing when given an argument of type C_PTR. Fixed.


August 16

My old email address, andyv@firstinter.net, is no more. I was planning on letting it lapse the next time I got a bill, but I think they decided to get rid of me instead of letting me renew. The address had too much spam overhead anyhow.


June 17

Mat Cross sent in a valgrind issue (reading some free'd memory) inside the I/O library. Fixed.


June 16

Doug Cox has built some new windows builds.

Jens Bischoff sent in an old IBM fortran manual that describes some of the algorithms they used for the intrinsic functions. Good stuff.

John Harper sent in a correction to the manual-- the -fzero option only zeroes scalars.

Jürgen Reuter and Elizabeth Wong sent in a problem on OSX where initialized module variables were not being handled correctly. Turns out different versions of OSX handle these things differently.


June 15

John Reid sent in a problem with writing output from a running coarray program-- the output has to be unbuffered when writing to a pipe because otherwise everything gets written at the wrong time.

A little more work on erfc() today. Got it up to x=50 now, then had a pause because the last couple of digits ended up being wrong. Scratched my head for a while, then did the calculation side by side between the x87 assembler and the arbitrary precision python. The numerator and denominator of the rational approximation were fine, and the quotient was fine out to an ulp. The problem turned out to be calculating exponentials on the x87.

I'm not talking about run-of-the-mill exponentials, I'm talking about calculating exp(-1000). The way this is calculated within the x87 is to split the argument into its integer and fractional parts, calculate 2^x on the fractional part, then scale by the integer part. For exp of -1000, the top twelve bits are the integer part. After the separation, the bottom twelve bits of the fraction are garbage, leaving the bottom twelve bits of the exponential as garbage too. So it's unfixable without a lot of extra effort.
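Here's a quick python demonstration of the same effect at double precision rather than on the x87 itself (exp(-500) instead of exp(-1000), since -1000 underflows a double; the Decimal module only supplies a high precision reference value):

  import math
  from decimal import Decimal, getcontext

  getcontext().prec = 50
  x = -500.0
  ref = Decimal(x).exp()                 # high precision reference value

  # Mimic the x87-style reduction in ordinary doubles: exp(x) = 2^i * 2^f.
  y = x / math.log(2.0)                  # about -721.35
  i = math.floor(y)                      # the integer part eats ~10 bits of y...
  f = y - i                              # ...and those bits are missing from f
  approx = math.ldexp(2.0 ** f, i)

  print(float((Decimal(approx) - ref) / ref))       # around 1e-14: the low bits are garbage
  print(float((Decimal(math.exp(x)) - ref) / ref))  # libm does much better, ~1e-16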


June 14

Michael Pock pointed out that g95 is too strict when complaining about complex constants with spaces between the sign and the real constant part. Fixed.

I've got the test suite going again. I had to update the sources in some cases because of errors that g95 now detects, and in one case I threw out an old version of the HDF library that calls C functions in different ways, leading to all kinds of errors. I'm sure the current version uses the C interop features.

Evangelos Evangelou sent in a crash on array returns that has been fixed. This was one of those stare-at-it-for-an-hour, change-half-a-dozen-lines-and-it-works bugs. I'm glad I have the test suite working again.


June 10

More work on erf/erfc(). I happened to run across a web page the other day that mentioned in passing that implementing erf/erfc is "nontrivial", which makes me feel better considering all the time I've put into it so far. I am up to |x| = 5. At x > 6.75, erf(x) is indistinguishable from one, and at about x > 110, erfc(x) < tiny(0_10).

The divergent series gets better for larger x, so hopefully only a few more intervals will be required. I am not so sure how good my approximations are, so I am going to set up some sort of automatic testing at some point. The work I put into reading and writing floating point numbers accurately is paying off.


June 9

I was thinking a little today about yesterday's post on Brent's method for exponentials. Part of the reason I wanted to share this is that it is so illustrative of how numerical analysts really work. Sure, if you're calculating an exponential like this, you can always solve the problem by throwing more bits at it, but that isn't what Richard Brent did. By fiddling a little with the math, he found an equivalent representation of the exponential that involves the exponential power series with a value guaranteed not to have much precision loss, some squarings that don't involve much precision loss, and a multiplication involving shifts with no precision loss at all. Beautiful.

A simpler example would be if you're writing a program involving navigation, where you'll likely end up needing (1-cos(w)) for w close to zero. That causes a problem because cos(w) approaches one itself, leading to a huge loss of precision. You too can be a Richard Brent and remember that cosine near zero is 1 - w^2/2! + w^4/4! - ..., making the troublesome 1-cos(w) = w^2/2! - w^4/4! + ..., which is fast and easy to calculate for small w. Your users never notice that your program doesn't freak out when they enter coordinates near the north pole, because of the care you've taken.
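A throwaway python version of that rewrite, just to show the difference (the 1e-2 cutoff and the three series terms are arbitrary choices that happen to be plenty for double precision):

  import math

  def one_minus_cos(w):
      # For small w, sum w^2/2! - w^4/4! + w^6/6! instead of subtracting
      # two nearly equal numbers.
      if abs(w) > 1e-2:
          return 1.0 - math.cos(w)
      w2 = w * w
      return w2 / 2.0 - w2 * w2 / 24.0 + w2 * w2 * w2 / 720.0

  w = 1e-8
  print(1.0 - math.cos(w))       # 0.0 -- every significant bit cancelled
  print(one_minus_cos(w))        # 5e-17, good to full precision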

Michael Richmond sent in another INTENT(OUT) regression that has been fixed.

Several people have reported problems with using g95 on the Snow Leopard version of OSX. Alison Boeckmann has volunteered to help me with getting a build going.

John Reid pointed out some inaccuracies about coarrays on the website that have been fixed.

Doug Cox has built some new windows and debian builds.

Matthew Halfant sent in a crash on real do-loops that has been fixed.


June 8

Fixed another 158-warning about INTENT(OUT) dummies not being set. Collapsed some duplicated code into a common subroutine. It looks like some of the remaining issues are in fact problems with the suite and not g95.

Got more time in on erf(x)/erfc(x). Turns out the way to calculate erfc(x) at large x is that divergent series I was sneering at the other day. The trick is to recognize that it only works at large x and to quit while you're ahead. So from zero to one, erfc(x) is just 1-erf(x), where erf(x) is the power series. From one to ten, erfc(x) is the continued fraction, and from ten on up it's the divergent series. The divergent series starts working at about nine, and the continued fraction gets painful at ten. The two agree perfectly from nine to ten.
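For what it's worth, here is a python toy of the quit-while-you're-ahead rule (the coefficients follow the standard asymptotic expansion for erfc; this is an illustration, not the library code):

  import math

  def erfc_asymptotic(x):
      # erfc(x) ~ exp(-x^2)/(x sqrt(pi)) * (1 - 1/(2x^2) + 1*3/(2x^2)^2 - ...)
      # Add terms only while they keep shrinking, then stop.
      term, total, k = 1.0, 1.0, 1
      while True:
          nxt = term * -(2 * k - 1) / (2.0 * x * x)
          if abs(nxt) >= abs(term):      # the terms started growing: truncate
              break
          total += nxt
          term = nxt
          k += 1
      return math.exp(-x * x) / (x * math.sqrt(math.pi)) * total

  for x in (10.0, 15.0):
      print(erfc_asymptotic(x), math.erfc(x))   # agree to near machine precision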

I also had to spend some time fixing the calculation of the exponential to use Brent's method instead of the usual power series, which starts losing bits for negative arguments less than around minus ten or so. To calculate erfc(100), we need exp(-100*100) = exp(-10000). Might seem excessive, but kind 10 and 16 numbers go down to 10^{-4700}.

Brent's method (one of his lesser-known ones) works like this: let r = 2^{-k} (x - n log 2),

where x is the argument whose exponential we want and n is the number of bits of desired precision. Pick an integer k such that 0 < r < log 2. This is done by computing the difference, then pulling the floating point number apart into mantissa and exponent to figure out what k and r are. Solving for x, we have x = r 2^k + n log 2,

and exponentiating both sides, exp(x) = 2^n exp(r)^{2^k}.

Since 0 < r < log 2, r is amenable to the power series. Raising exp(r) to the 2^k-th power involves squaring k times. Multiplying by 2^n involves shifting, which in floating point land is just adding n to the exponent. The beauty of this method is that none of the additions or subtractions lose bits.
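Here is a rough python sketch of the reduction in ordinary doubles, just to make the flow concrete. It takes some liberties: n is chosen as floor(x/log 2) so the remainder lands in [0, log 2), and k and the number of series terms are picked by eye rather than derived from the target precision, so this is the shape of the idea and not the library routine:

  import math

  def brent_exp(x, k=5, terms=12):
      n = math.floor(x / math.log(2.0))
      r = (x - n * math.log(2.0)) / 2.0 ** k      # 0 <= r < log(2)/2^k: tiny
      s = sum(r ** j / math.factorial(j) for j in range(terms))   # exp(r) series
      for _ in range(k):                          # exp(r)^(2^k): square k times
          s *= s
      return math.ldexp(s, n)                     # times 2^n: just bump the exponent

  for x in (-3.7, 0.25, 12.0):
      print(brent_exp(x), math.exp(x))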


June 7

Fixed a spurious warning for an INTENT(OUT) dummy parameter not being set.

Michael Richmond and Frank Muldoon reported a regression in dummy procedures that has been fixed.

Damn... this keyboard is sooooo smooth.


June 6

I have a new keyboard for my laptop. The old one has been getting worse for years. The 'd' key came unstuck a while back and I glued it back on, not very well. I had ordered a new keyboard from Singapore, but it turned out to be for a slightly different model. The risers were shorter than the old one's for some reason, and the existing screws wouldn't reach.

The new one languished until yesterday when the ENTER key got stuck. This happens every now and then, and blowing or shaking usually removes whatever foreign matter is in the way. But half an hour later, no luck. So I took a look at the new one again. Today I went off to the hardware store, and I bought a pair of longer screws and nylon spacers. The spacers are scotch-taped to the too-short risers and the longer screws work fine. The new keyboard is solidly in place, and it's like typing on velvet.

I think that is about the last moving part in the laptop that has been replaced.


June 5

I realized that the glibc implementation of erf() isn't as good as I thought. For larger x's, you calculate erf(x) by computing erf(x) = 1-erfc(x). There is an asymptotic expansion of erfc(x) that looks like:

erfc(x) \approx exp(-x^2) / (x \sqrt\pi) * (1 + PowerSeries(1/x^2))

The thing is, the series diverges. For all x, no less. The first couple of terms in the polynomial help, but then you get the divergence problem. The glibc implementation actually uses:

erfc(x) \approx exp(-x^2 + R(1/x^2)) / (x \sqrt\pi)

where R is a rational approximation. Putting the polynomial in the exponential replaces a multiplication with an addition, but there isn't a particular reason for using a rational approximation in 1/x^2. I tried a rational approximation in x, and it worked a lot better.

I also figured out why the mysterious "#ifdef DO_NOT_USE_THIS" blocks are there. These comment out the straight Horner's rule evaluation; the new code has a set of linked partial Horner's rules. I originally thought there was some numerical reason for this, but re-enabling the code didn't produce any significant differences. Turns out that the code was optimized for vector processors, and the partial Horner's rules have the potential for allowing a little parallelization to happen. Doesn't affect me for the kind=10 reals in x86 assembler, or in the quad precision, where even a lowly addition is many processor instructions.
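Here's a toy illustration of the difference, using a truncated exp series rather than the actual glibc coefficients: splitting the polynomial into even and odd halves gives two shorter Horner chains in x^2 that a vector machine can run at the same time.

  def horner(c, x):
      # Straight Horner's rule: c[0] + c[1]*x + c[2]*x^2 + ...
      acc = 0.0
      for a in reversed(c):
          acc = acc * x + a
      return acc

  def horner_split(c, x):
      # Partial Horner: two independent chains in x^2, recombined at the end.
      x2 = x * x
      return horner(c[0::2], x2) + x * horner(c[1::2], x2)

  coeffs = [1.0, 1.0, 0.5, 1.0 / 6, 1.0 / 24, 1.0 / 120]   # exp(x), truncated
  print(horner(coeffs, 0.3), horner_split(coeffs, 0.3))     # same value either way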


June 3

Got the continued fractions for erf() and erfc() going. The continued fraction is a little faster for erf(), but it turns out that having a way to calculate erfc() is the real benefit here. Subtracting 1-erf(x) doesn't count, because of the insane precision required.

These functions are real pains in the butt. The problem is that the erf(x) power series is an alternating series where the numerator of the terms is a power of x and the denominator is a factorial. The factorials eventually dominate the powers, but for larger x's that can take a while. In the meantime, the partial sums get huge, meaning that catastrophic cancellation is required to get the ultimate sum back to one. This means that in order to get 64 or 112 bits of some value of erf(x), many more bits of precision than that are required.
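To put a rough number on it, the largest term of the series compared with the O(1) answer tells you about how many bits the cancellation burns (a quick python check, nothing rigorous):

  from math import factorial

  x = 5.0
  terms = [x ** (2 * k + 1) / (factorial(k) * (2 * k + 1)) for k in range(60)]
  print(max(terms))    # about 6e8 at x=5, so roughly 30 bits of the sum cancel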

Using the continued fraction to calculate erfc(x) at large x's similarly requires a vast number of bits of precision, but not as many. It looks like this is going to be so slow for large x that I'm going to have to use a compiled arbitrary precision library instead of my hacked python version.

I got the new test program running on ox. I had to adjust ox's fan settings, too. There are several fan settings you can pick in the BIOS, and I had it on the slowest/quietest. The motherboard will automatically go to the next highest setting when the CPUs get hot, but for some reason it only goes up by one setting. About a minute into the run, I got the beeping that indicates overheating. Starting from the next highest setting, the fan can bump up one more level, which appears to work.

I ended up with a 12x speedup instead of the 20x that I had hoped for, but the total elapsed time is short enough to keep testing interactive. The granularity is one CPU per test directory, which worked fine once I put the largest directories first. The other way of doing this, per-file granularity, would be much more complicated since some files have to be compiled to create required modules. After a little fiddling, it looks like all of that complication would only save about twenty seconds or so, which isn't worth it.
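The driver boils down to something like this (a sketch with made-up directory names and a stand-in run_tests.sh script, not the actual couple hundred lines):

  from multiprocessing import Pool
  import subprocess, time

  DIRS = ["lapack", "nist", "hdf", "misc"]        # largest directories first

  def run_directory(d):
      # One test directory per worker process; run_tests.sh stands in for
      # whatever compiles and runs everything in that directory.
      t0 = time.time()
      rc = subprocess.call(["sh", "run_tests.sh", d])
      return d, rc, time.time() - t0

  if __name__ == "__main__":
      with Pool(processes=4) as pool:             # one worker per core on ox
          for d, rc, dt in pool.imap_unordered(run_directory, DIRS):
              print("%-10s %-4s %6.1fs" % (d, "ok" if rc == 0 else "FAIL", dt))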


June 2

In the ongoing project for computing erf(), it turns out the best way to compute erf() on certain intervals is as 1-erfc(). There are a couple of common series for computing erfc(), but none of them is practical, as far as I can see-- they call for evaluating factorials of half-integers. So ok, I wrote the factorials in terms of gamma functions, pulled out the factor of \Gamma(1/2) = \sqrt{\pi}, and ended up with a series that had coefficients in terms of Gaussian "chooses", which led right back to half-integer factorials.

This isn't for evaluating erfc() within the library, this is for evaluating erfc() in order to fit a rational approximation to it. After casting around for another way, I hit on evaluating the upper and lower gamma functions (which lead straight to erf() and erfc()) via continued fractions. I'd sort of wondered for years how you actually evaluate a continued fraction, and it wasn't long before I got Lentz's algorithm working.

In this method, you pass a generating function to the Lentz evaluation subroutine. The subroutine calls the generating function with an iteration number, and the generating function returns a pair of numbers: the partial numerator and the term added to the corresponding denominator. The Lentz evaluator keeps asking for more terms until the fraction converges. Slick, but a little complicated when it comes to writing the generating function.
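A compact python version of the idea. The generating functions here produce the standard continued fraction for erfc, as a stand-in for whatever is actually being fit; the tolerance and iteration cap are arbitrary:

  import math

  def lentz(a, b, tol=1e-15, tiny=1e-30, max_iter=300):
      # Evaluate b(0) + a(1)/(b(1) + a(2)/(b(2) + ...)) with the modified
      # Lentz algorithm; a(j) and b(j) are the generating functions.
      f = b(0) or tiny
      C, D = f, 0.0
      for j in range(1, max_iter):
          D = b(j) + a(j) * D
          if D == 0.0:
              D = tiny
          C = b(j) + a(j) / C
          if C == 0.0:
              C = tiny
          D = 1.0 / D
          delta = C * D
          f *= delta
          if abs(delta - 1.0) < tol:
              return f
      raise RuntimeError("continued fraction did not converge")

  def erfc_cf(x):
      # erfc(x) = exp(-x^2)/sqrt(pi) * 1/(x + (1/2)/(x + 1/(x + (3/2)/(x + ...))))
      a = lambda j: 1.0 if j == 1 else (j - 1) / 2.0
      b = lambda j: 0.0 if j == 0 else x
      return math.exp(-x * x) / math.sqrt(math.pi) * lentz(a, b)

  print(erfc_cf(4.0), math.erfc(4.0))   # agree to ~15 digits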

Several bugs that people have been reporting are regressions against earlier versions of g95. Regressions are always something I have to watch out for. For a long time now, I haven't been very good about running the test suite. After giving the matter some thought, I realized that this is because it takes way too long to run.

Right now, the suite runs on the laptop that I use for g95, a 1.3G Pentium-III. The plan is to move it to ox, my quad core Xeon. I've seen a speedup by a factor of five for the python program I've been using for the floating point library work, and with ox's four stomachs, I'm hoping for a factor of twenty.

I've written about 200 lines to make the testing program multicore, in the same way as the build script. Now I have to debug it, and it's always a pain to debug child processes where the standard descriptors have been redirected.


May 31

Got Remez's algorithm working. Turns out the 'e' in the prior post is not an input, it is an output. You take one more x and solve for e as part of the linear system. This makes a lot more sense. You get an e out that gives you a ballpark idea of how good the approximation is. The method isn't perfect in the sense that if pushed too hard, the rational approximation will get an extra oscillation at the edge, far outside the desired tolerance. You have to check the final approximation, but it looks like the basic algorithm will work fine.

I've got three intervals working for kind=10 erf(). There are a few more left to go.

There is a neat trick buried in glibc that I've appropriated for this work. Viewed as an integer, real numbers have the sign bit as the most significant bit, followed by a biased exponent, followed by the most significant bits of the mantissa. If you load the top word of a real number as an integer, you can do integer compares to determine whether a number is a not-a-number or an infinity, or even compare against numbers that have 16-bit mantissas.

This works because the exponent is biased-- i.e. a stored exponent of zero really means a value of -e_max. This was meant to simplify calculations in hardware, but there is nothing to stop us from doing the same thing in software. You can't compare for equality without comparing all the low bits of the mantissa, but you can get a sort of "lower bound" comparison, which works fine for figuring out which interval to use.

For kind=16 reals, since comparisons like this are vastly faster than polynomial calculations, I suspect that I will end up using a large number of crude approximations instead of a small number of high order polynomials.
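Here is the trick in python for ordinary doubles (the breakpoints are made up, and the real thing works on the top 16 or 32 bits in assembler rather than doing a struct round trip):

  import struct

  def top_bits(x):
      # Reinterpret a double as an unsigned 64-bit integer.  For positive
      # numbers (and inf/nan, which come out larger still) the integer
      # ordering matches the floating point ordering, so picking an interval
      # is just integer compares.
      return struct.unpack("<Q", struct.pack("<d", x))[0]

  CUTS = [top_bits(c) for c in (1.0, 2.5, 6.75)]    # made-up interval breakpoints

  def pick_interval(x):
      i = top_bits(x)
      return sum(i >= c for c in CUTS)              # 0, 1, 2 or 3

  print(pick_interval(0.3), pick_interval(3.0), pick_interval(100.0))   # 0 2 3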


May 24

Tony Mannucci pointed out that the debug symbols were still present in the libf95.a builds. These aren't a lot of use to the end user, and take up about a megabyte, so I added a line in the build script to strip the symbols.

Doug Cox has built some new windows builds. Arjan van Dijk had a problem with library paths on windows that Doug has fixed.


May 23

Out sick for the last couple of days. Feeling better now.

I have continued to make sporadic progress on the extended and quad precision transcendental functions. I have Remez's algorithm almost working. I think I'm having problems with poles, though.

A lot of transcendental functions are implemented by rational approximations over various intervals of the functions in question. These look something like:

         p0 + p1 x + p2 x^2 + ... 
  R(x) = ------------------------ \approx f(x)
         q0 + q1 x + q2 x^2 + ...
which can be tuned to the function being approximated-- in my library q0 is always one, and you can specify up front which p's and q's are zero. I can get a straight polynomial approximation by not using any q's at all.

If you pick a set of as many x's as you have unknown p's and q's, in the interval that you are approximating the function over, you get a system of linear equations that you can solve for the p's and q's.
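In python with numpy, that fit-through-the-points step looks something like this (a toy with exp on [0,1] and a (2,2) rational; the real thing is done in much higher precision, and the minimax business comes on top of it):

  import numpy as np

  def fit_rational(f, xs, n, m):
      # Interpolate f by R(x) = (p0 + ... + pn x^n)/(1 + q1 x + ... + qm x^m)
      # at n+m+1 points.  Multiplying through by the denominator makes the
      # system linear in the p's and q's.
      xs = np.asarray(xs, dtype=float)
      fx = np.array([f(x) for x in xs])
      P = np.vander(xs, n + 1, increasing=True)           # columns 1, x, ..., x^n
      Q = np.vander(xs, m + 1, increasing=True)[:, 1:]    # columns x, ..., x^m
      coef = np.linalg.solve(np.hstack([P, -fx[:, None] * Q]), fx)
      return coef[:n + 1], np.concatenate(([1.0], coef[n + 1:]))   # p's, q's

  p, q = fit_rational(np.exp, np.linspace(0.0, 1.0, 5), 2, 2)
  x = 0.37
  print(np.polyval(p[::-1], x) / np.polyval(q[::-1], x), np.exp(x))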

Where do the x's come from? Since this is a library for lots of people to use, we want the best approximation possible and a standard way to go is to minimize the maximum error, the minimax solution.

By construction, the approximation is equal to f(x) at the x's that you pick, and R(x) oscillates between too big and too small in the middle. So you pick a tolerance, e, smaller than the precision of the target numbers and fit R(x) to f(x1)+e, f(x2)-e, f(x3)+e, ... over the interval in question.

There is a theorem that says that you've got the minimax solution when the values of the maximum deviations (|R(x) - f(x)|) are all the same. If the deviation is always less than e, then you've got something that will approximate the target function to precision e.

Remez's algorithm picks an initial set of x's, fits to get the p's and q's and finds the x's of maximum deviation. These points are bracketed by the original x's, so something like a golden search (Brent's method) can be started right away.

Once you've found where the maximum deviations are, use these for the new set of x's. Repeat. The x's are supposed to converge quadratically, which isn't happening for me yet.

Once I get this machinery in place, the work will be much more of a turn-the-crank operation: find the best approximation, implement it, repeat.


May 19

Jacques Lefrere pointed out that argv[0] was being excluded from the GET_COMMAND intrinsic. Fixed.


May 14

John Harper pointed out a typo in the manual (SECNDS() is REAL, not INTEGER). Fixed.


May 13

I've been stealing a little time here and there to work on implementing the transcendental functions for kind=10 and kind=16 reals. I've looked at the glibc sources and am confident that I can adapt many of the methods for both kind=10 and kind=16.

One required component of doing something like this is high precision arithmetic. After all, suppose you have your brand new subroutine for calculating Bessel functions for kind=10 reals. How do you test such a thing? Why, higher precision than kind=10, using slow but simple power series that you'd never use for the library functions.

I've taken the time to write an arbitrary precision floating point library in python. Like the one in C buried inside g95, it uses big integers as its basis. There is no particular need for speed, so it's better to write something that is easy to read and maintain. The basics-- addition, subtraction, multiplication, division and comparison-- are done. I spent a bit of today writing a subroutine that takes one of these numbers, rounds it to a 63-bit mantissa and generates an assembler initialization expression for the number, i.e. something like

  .byte 0xD3,0x66,0x14,0x30,0xA7,0xD6,0xEF,0x6B,0x7B,0x3F ;; 3.1415900001111E-40

that can easily be pasted into larger code. After I get this going, I'll end up creating programs for each of the transcendentals that need to be done, for both implementation and testing.
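A stripped-down python version of that last step, starting from an ordinary double rather than one of the arbitrary precision numbers (zero and denormals are ignored; the 80-bit layout is 64 mantissa bits with an explicit integer bit, then a 15-bit biased exponent and the sign):

  import math

  def x87_bytes(x):
      # Pack a (nonzero, normal) float into the 10-byte x87 extended format
      # and emit an assembler initializer for it.
      sign = 0x8000 if x < 0 else 0
      m, e = math.frexp(abs(x))            # x = m * 2**e with 0.5 <= m < 1
      mant = int(m * (1 << 64))            # 64-bit mantissa, integer bit set
      exp = (e - 1) + 16383                # bias, shifted for frexp's range
      raw = mant.to_bytes(8, "little") + (sign | exp).to_bytes(2, "little")
      return ".byte " + ",".join("0x%02X" % b for b in raw) + " ;; %r" % x

  print(x87_bytes(math.pi))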


May 12

Jürgen Reuter sent in a problem with the FLUSH statement in IF-clauses that has been fixed.

John Harper pointed out that the intrinsic SECNDS extension was not supported for kind=10 reals. Added that.

I've finished implementing SMP coarrays for OSX. Both x86 and ppc versions are supported. They work the same way as with linux, just add a "--g95 images=10" when running your program to run it with ten images.


May 7

Took my own advice, deleted everything and started over. Ran right into the same problem, though. After some investigation, the problem turned out to be some apple hackery trying to make C string functions safe. Disabled that, and things worked fine. Got everything else going shortly after that, until the "indirect jump without *" problem. Inserted the * in the right place and now it looks like it works.


May 6

Finished the ppc/osx version, although the old version is archived just in case. OSX always requires exceptions, special cases and outright hacks to get things working right. I got a good start on x86/osx, but have run into a snag that causes mysterious compiler errors. Not totally sure where the problem is, but I suspect that the way to handle this one is to delete what I've done and start over.


May 5

Still on the wonderful OSX experience.


May 4

Working on getting the OSX build going again.


April 28

John Harper sent in a typo in the manual that has been fixed. He's also pointed out the need for a serious revision.

Doug Cox has built some new windows and Debian builds.

Reinhold Bader sent in a weird crash involving the array form of THIS_IMAGE() that has been fixed. The weird corner case was the single image case.


April 27

Jun Chin Ang sent in a question on real loop variables, and in preparing a reply, I discovered a crash on real loop variables, having to do with the recent fix to loop variable types. Fixed now. Don't use real loops, though. Really.

Doug Cox has built some new windows and Debian builds.

I guess I've completed the desktop upgrade-- I finally put its skin back on, put it back into its cubbyhole and have resumed regular backups.


April 26

Stefan Birner, I wrote you back, but your ISP seems to have decided (over the weekend!) that my ISP are bad dudes. No big message, I just wanted to encourage your efforts.

Reinhold Bader sent in a problem with the THIS_IMAGE() intrinsic in single image mode that has been fixed.


April 22

Doug Cox has built some new windows builds. The problem with the crash in the options processor looks like it has been fixed.

John Reid reported that his SMP coarray problem went away. Reinhold had success on ia64 and x86-64. I'm very happy. A hard debug session that produces results is always very satisfying.

I spent some time porting some of the recent fixes to SMP coarrays to the nascent windows version. The code is similar in some ways, but some fixes just don't apply because of the wildly different approaches that have to be used.


April 21

Reinhold sent in a coarray crash on ia64 that has been fixed. This was a nasty bug to find. Reinhold's institution is in Germany, and the connection from here (Arizona) is about 10k/sec. The debug line numbers are way off somehow, reducing the debugging to print statements. But after a couple dozen tries, I found it. We're hoping this fix will fix another bug reported by John Reid.

Larry Wetzel, George Madzsar, Xavier Leoncini and most of the gg95 newsgroup have seen a problem with the windows versions: a crash in the options processor. Doug and I have been fiddling with this for a few days now. I caused it when I tried to remove a deprecated option (-I-). Although half of the effects of -I- are doable with -iquote, that isn't the property of -I- that I was relying on. I think I've got a workaround for the problem, fingers crossed. Doug seems to have fixed the problem manually, and new builds are up.


April 20

Doug Cox has built some new debian builds. We've been working on a crash in the options handling introduced the other day when I finally got rid of the -I- in the builds without really remembering why it was there. I think it was for the windows build, but the option is deprecated in favor of -iquote, and it's time to move on.

John Harper sent in a new problem with IEEE_CLASS derived types that has been fixed.

I got a neat new script working for the first time. I was working on another bug (that eluded me today). The bug apparently only bites on 64-bit machines, so I wanted to use ox for testing. Problem was, I wasn't actually at home. I'd run into this problem the other day, so I had my wake-on-lan script ready. The way this works is that you open a socket and send a broadcast packet that contains the MAC address of the machine you want to wake up, in a special format. The target machine is mostly asleep, but when the network interface reads the packet with its own MAC address, it wakes the rest of the machine up. The reverse command is, of course, "poweroff".
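The whole script is basically this (the MAC and broadcast addresses here are placeholders):

  import socket

  def wake(mac, broadcast="192.168.1.255"):
      # The WoL "magic packet" is six 0xFF bytes followed by the target's MAC
      # address repeated sixteen times, sent as a UDP broadcast.
      payload = bytes.fromhex("FF" * 6 + mac.replace(":", "") * 16)
      s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
      s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
      s.sendto(payload, (broadcast, 9))    # port 9 (discard) by convention
      s.close()

  wake("00:11:22:33:44:55")                # placeholder MAC for ox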


April 19

John Reid and Reinhold Bader pointed out that J3 renamed the coarray ALL STOP statement to ERROR STOP at the most recent meeting. I held off on making this change, but J3 has since taken a vote that freezes the whole draft more than it was. I've gone ahead and made the switch. The changes extend internally from the compiler to the libraries. The ALL STOP statement will continue to be accepted as a synonym.

John Harper and Nicholas Peshek reported problems with the x86/linux version, specifically a problem not finding glibc-2.7. It turned out that the x86 build was building against the libraries on my desktop system. It didn't suffice to compile against an old library; the newer compiler generated a dependence on the newer glibc. So I ended up compiling an older version of gcc and building what was essentially a cross compile environment for the older library.

The new build environment is the old environment. Instead of copying headers and libraries from the old system, I ended up copying files from the old backups, since the old opensuse 10 is on a toasted disk. I really hate pointless upgrades, and hate to force others to upgrade when unnecessary.

The most notable thing about the old compiler is how fast it is. Gcc continues to bloat, and with bloat comes sloth. No one notices this because machines are faster.


April 18

Reinhold Bader and John Reid sent in a problem with allocatable coarrays, which turned out to be two separate problems.

Cuneyt Sert sent in a broken link on the blog that has been fixed.

Michael Richmond caught some debug code that had been left in from Reinhold and John's bug from the previous night. This was a tough bug to track, since the problem reared its head only after enough images were enabled. The debug code is gone.

Michael also sent in a regression in the CALL statement caused by recent fixes.

Doug Cox has built some new windows builds. The 32-bit debian build is broken at the moment. We are investigating.


April 17

John Harper sent in a subtle problem with the IEEE_ARITHMETIC module. The IEEE modules have a series of intrinsic derived types for various kinds of numbers, etc. The .eq. and .ne. operators are defined for some of these derived types, and you can now USE these operators like user-defined operators.


April 16

Yuri Sohor sent in an illegal coarray code-- he had a derived type that he was assigning across images. On a regular code, this causes allocations to be copied. On a coarray code, this would have been hideous to implement. Fixed. It turns out to be legal to pass around derived types that contain pointers, but the pointers are considered to be undefined on different images.

I've replaced the reporting of signal numbers on a crash with signal names for the more popular signals.

Michael Richmond sent in a regression involving subroutine calls. The problem was the recent fix to calling host associated procedure pointers. Got it all fixed now, I hope.


April 15

Reinhold Bader sent in a bug with the deallocation of allocatable coarrays that has been fixed. This was the same bug I was working on when my hard disk started its death-rattle. It's actually quite a good thing that a disk starts making noises before it fails. Also fixed a problem with the allocation of coarrays in derived types.


April 14

Predictably, today was mostly spent working on getting g95 building on the laptop. I've removed the -I- directive in the build-- I can no longer recall why it was there, and it seemed to mess up finding the system include files. The real struggle was getting the library to build. I'd have thought this would be easier because the configuration scripts are more straightforward. But of course the versions of autoconf have changed and subtle changes were apparently required. I'm finally back to development.


April 13

A long struggle today to get the wireless card on the laptop going. My laptop has an internal wireless card that has never worked, even under windows. Turned out that was interfering with the pcmcia card. The windows side downloaded another set of updates, bringing me back up to SP3. It's a weird thing to apply the updates and see that there are... 71 of them.

The upshot is that the laptop is ready to host g95 again. I like it this way because I can work on g95 even without a connection to the network. If I do have a network connection, I connect to my desktop via the vpn to access mail.

I'm also going to use the opportunity to migrate some things back to the desktop, like my lilypond files. The laptop has the worst display, and is consequently the worst place to do music engraving. I'm toying with the idea of not even installing X windows.


April 12

Been sick all weekend. To add insult to injury, I've been back in upgrade hell, this time with the new disk. This time was worse, because I use the laptop as a dual boot system with windows and linux. Spent yesterday (in my reduced capacity) trying to install windows. The laptop came with OEM installation disks for windows. Because the disks are OEM, they weren't particularly polished, and the partitioning part didn't work. Because I have the arch linux disk, I could easily look at and set the partitions. The OEM installer would install, finish, reboot the system... then lock up on reboot.

I tried and tried and tried. Half an hour to go through the whole 3-CD install, and I must have done it a dozen times. The key was the realization that the installation software expected an NTFS partition and wouldn't even mark it bootable after it ran. Ergo, the OEM software expected the partition to already be in a particular state, more than just having a bootable flag. I ended up installing an even older windows 2000, to the point where it would boot itself, then wiped it with the OEM XP. It worked-- win2k must have installed a part of the bootstrap loader that the OEM installer wasn't.

Of course, once windows is going, it's time to get unix going. The reason for that order is that unix can live on other partitions, and its installer takes care of snarfing up the windows boot code and doing the dual boot thing.

The Arch Linux installation went smoother than usual, the third time being the charm. Arch does have this problem with downloads stalling and being downright passive about retrying them. After the nth try, I got the basic install done. The nasty surprise was on the first boot. The kernel came up, the first couple of items in the initialization scripts ran, then the screen went blank and stayed that way.

Now this was a fine pickle. The only way to go from here was to boot from the CD, mount the drive, edit the startup scripts and reboot. The CD wasn't real reliable on the laptop, making the whole process vastly more irritating than it would normally have been. Thought about giving up, but the CD worked fine, if the laptop saw it. Eventually tracked the problem to something in the udev daemon. After some head-scratching, I guessed that udev was forcing some buggy module to load.

After guessing wrong with some manufacturer-specific modules, it hit me-- if the screen is going blank, the problem must be in a screen driver. But how to list modules on a system without a screen? Why, over the network of course. Small problem, though: the base install doesn't include things like sshd, and you can't see what you're typing, so you can't log in remotely!

So, how to fix that? Turned out I had a copy of the telnetd source code. It's small, about 2k lines of C. Compiled it on my desktop system and scp'ed it up to the g95 website. Logged into the laptop blindly, carefully typed a wget command, then a chmod to make it executable, then ran it. Tried telnet from the desktop and, whattaya know, it worked the first time. Login as root, permission denied. Ah. Gotta love these modern security measures. The only way around that one was a user account. Had to create a user account on the laptop, typing blind again.

With a working telnetd, I finally got to do an lsmod, and it turned out that there was indeed a display driver, i915, being loaded. I blacklisted the module and things worked a lot better. It was downhill after that, just the regular installation of things that weren't there.

After some success on unix, I moved back to windows, basically an XP with SP1 and IE6. Upgrading windows went smoother-- you just let the updater tell you about the critical security flaws, download, install and reboot. Repeated that four times before it got to SP2. That went in, but it doesn't want to move on to SP3 for some reason. While I was staring at IE6, wondering how to upgrade it without getting pwned, I decided to go with Google Chrome. Very nice. I've got skype, putty and wireless networking, and am on my way.

Not bad for an otherwise wasted day. I've been more lucid for about five hours now and hopefully got this cold licked.


April 11

Got all of the data off of my laptop. Even if I'd lost it, the difference between what was there and the most recent backup would have been trivial to redo. I have a new disk for the laptop, but have been battling a cold the whole weekend.


April 10

Doug Cox has built some new windows builds.

I've added support for floor() and ceiling() for kind=16 reals.

No build today. I was working away on g95 when I noticed that the hard drive on my laptop (where g95 lives) had started making bad noises. Checked the logs, and read errors are starting to happen. Time for a new drive. So I've shut down for the night to cool it off. First thing tomorrow, I'll power it up and copy everything off of it. Same procedure as the usual backup, except I'll skip compression to get everything off as fast as possible.

The laptop and drive are 7-8 years old, so it's not that unexpected. I actually have a new keyboard on the way, since several of the keys don't work so well.


April 8

John Harper pointed out that NINT() was not implemented for kind=16 reals. Got that done now. Also noticed some internal rounding problems with kind=16 reals that have been fixed.

John Reid sent in a SMP coarray regression that has been fixed.

Reinhold Bader sent in a crash on an illegal (coarray) program that has been fixed.

Christian Speckner sent in a problem with procedure pointers in internal procedures that has been fixed.


April 7

Doug Cox has built some new windows builds.

John Harper sent in a problem with Bessel functions being generic that has been fixed.

John McFarland sent in another problem with abstract interfaces. This time, the problem was that the abstract attribute wasn't being propagated through modules. This necessitated a change in the module version-- the new modules produced by g95 won't work with previous versions.


April 6

Doug Cox has built some new windows builds.

John McFarland sent in another unique regression with abstract procedure pointers that has been fixed. My fix for John's original problem was a little too broad, but we're getting there.

Michael Richmond helped clear up a lot of confusion in the various coarray packages. If you compile a coarray program with the g95 in the g95-cocon-* packages, you get a network version, meant to run under the coarray console, which builds the network.

The regular downloads have the SMP versions, which are accessed by the --g95 images=x option. Running the network version with the --g95 images=x option will now print a helpful warning message. On top of all that, the SMP version was not being compiled into the x86-linux build. The regular builds with the SMP version are: x86-linux, x86_64/linux, ia64/linux and alpha/linux.

Reinhold Bader continued the SMP coarray shakedown with a problem with allocatable coarrays in single-image mode and some addressing weirdness on IA64. Both fixed.


April 5

John McFarland sent in a regression with abstract procedure pointers that has been fixed.

Michael Richmond and I were getting pretty confused about version numbers, until we realized that the version numbers were not propagating into the library version number. Got that fixed and added a new documentation line describing the new '--g95 images=' option.

Martien Hulsen sent in a problem with MOVE_ALLOC() that has been fixed.


April 4

John Harper sent in a bug with NINT(), which returned an incorrect result when converting a kind=10 real on x86-64 to a kind=8 integer. The problem was ABI conventions-- x86-64 returns kind=8 integers in the rax register, not in the edx:eax pair as on x86. Fixed the analogous problems with FLOOR() and CEILING().

John McFarland sent a crash on an illegal procedure pointer declaration that has been fixed. John also sent a correct program that was giving spurious errors for a procedure pointer assignment, also fixed.

Doug Cox built some new windows builds for MinGW and Cygwin. Doug is discontinuing the builds against gcc 4.0, for compatibility reasons having to do with windows 98.

Piet Wertelaers sent in a problem with TRANSFER() that has been fixed.

I've fixed the web page on the sourceforge site. Normally this is kept automatically in sync with www.g95.org, but my desktop upgrade caused a change in the ssh key. Sourceforge now has the new key.

John Reid sent in another bug with SMP coarrays that has been fixed. This one had to do with passing coarrays though dummy arguments.

Johan Klassen sent in a gift-- US $80.00, for beer. Thanks Johan! It won't all go for beer-- Four Peaks has delicious beer bread that I am partial to.


April 1

Pedro Lopes and Delbert Franz reported that the new Debian package didn't work. I had to change the upload script, and some status messages were intermingled with the archive itself. Fixed now.

Reinhold Bader and I were discussing a bug that seems to have vanished. Reinhold did supply several libraries that have enabled me to build the coarray console on IA64 again. The gateway machine at Reinhold's institute is a 16-core IA64. There are rules against using it for computation, but it was 0-dark-thirty, and it was cool to run the monte carlo pi calculation on all 16 cores.

Doug Cox has built some new Debian packages.

I've also decided that the 'g95' name is getting too antiquated. From now on, it will be known as 'h96'.


March 31

John Reid sent in another pair of problems with SMP coarrays that have been fixed.

Reinhold Bader sent in a problem with allocatable coarray components that has been fixed.

Jimmy James pointed out (from a conversation on gg95) that the trip counts for kind=8 do-loops were limited to default integers. Loops with kind=8 variables now have kind=8 trip counts; this is not general-- loops with kind=1 iterator variables will still have default integer trip counts. Doug Cox built some new Debian packages.


March 30

Michael Richmond sent in the necessary files to update the alpha build from Debian 4.0 to 5.0. Things went smoothly. Since it was linux, it was easy to add support for the SMP coarrays.

Reinhold Bader sent in a missing startfile that let me complete the IA64 build.


March 29

John Reid (all hail the convener) sent in a problem with the SMP coarray implementation. Writes by separate images to standard out can't overlap the records that they write, so there needs to be a mutex on standard output. The network version multiplexes the output in a different way, making a lock unnecessary in that case. John also sent another problem with allocatable coarrays that has been fixed.

Elliott Estrine pointed out that the new linux build was against glibc-2.7, which causes a problem for people on older systems. I've switched the linux build back to glibc-2.6.

Jürgen Reuter sent in a nasty bug involving initialization of recursive structures with multidimensional arrays that has been fixed.


March 28

Nick Yas'ko sent in a problem with bad USE statements that has been fixed.

John Harper sent in a problem with the solaris build-- the solaris linker complains about unaligned debugging relocations. Fixed by compiling without debugging symbols.

John lives in New Zealand-- the new version of pine that I'm now using lists the date on his mails (correctly) as 'tomorrow'.


March 26

Martien Hulsen, Maarten Becker and Michael Richmond reported a problem with the x86-64 being unable to find the system startfiles like crt1.o, etc. This was an oversight on my part-- the x86-64 build is compiled using my own x86-64 system which runs a new Arch Linux distribution. This distribution is a new-style system which does not have support for legacy 32-bit programs, so the startfiles are in /usr/lib. I've fixed g95 to search /usr/lib64 before /usr/lib when searching for startfiles so that things will work on older systems.


March 25

Jacques Lefrère pointed out a nasty problem with the download directories that has been fixed.


March 24

Firmly stuck in mail hell today. I thought I had the mail set up, but when I download a letter, it gets appended to the mailbox in a different format than the other letters-- the 'new' format involves the list of the servers that the mail has passed through, and that doesn't quite jibe with the mbox format. I thought I'd lost the last couple of letters, but they just appear glued onto the last message that pine thinks is there. No clue on this one for now.

Update-- I found formail... Mail is just a hideous hideous problem. You get to be an expert at it when you get your system going, then forget everything you learned and have to re-learn it when you upgrade your system...


March 23

Finished work on the new build system.

I've unfortunately deluded myself about the state of my mail setup. I neglected my spam filter of choice, popfile. I can send, but can't quite get mail correctly yet.


March 22

Lots of new stuff.

New build is up.

New version of g95, now 0.93.

I've finished the 2.0 version of the build system. It does all that the previous one did and much more. It's doubled in size, from a thousand lines of python to just under two thousand. The old one built by forking a subprocess for each build, logging into remote machines, building g95 and uploading the result. This capability is still intact, but I am now relying much more on local cross compiles.

A "cross compiler" is a compiler that runs locally on one machine, and compiles a program meant to run on another machine, the 'target' machine. Cross compilers are tricky to build, and require a lot of insight. At the end, even a relatively easy build was taking about three to four hours to complete. I now have nearly a dozen of these, and they build g95 just fine, except for a couple systems I don't quite have access to. A few systems remain as remote builds for now.

It's impressive to watch a full build taking place. The complete build takes about the same time as it did before, the slowdown not coming from a particularly slow legacy machine, but from the sheer volume of simultaneous processes running. The load average peaks around eight, memory usage increases by about a hundred and fifty megs or so. By modern standards, cheap.

New targets include freebsd7 for x86-64 and opensolaris for x86.

A minor sub-innovation I'm pleased with is in the creation of the tarballs themselves. The tar format is quite simple, and instead of putting everything into a temporary directory structure and running 'tar', the structure of the tarball is described by a data structure, which is traversed to create the tarball. Compiling up a tarball, if you will. This approach avoids a lot of unnecessary copying of files all over the place, lets me set the 'owner' of the files to 'g95' without actually having a 'g95' user on the system, makes all the timestamps identical and creates symbolic links out of nothingness. The gzip process is still done with 'gzip', though.
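In python terms that amounts to building TarInfo records on the fly instead of staging files on disk; a cut-down sketch with a made-up file list (the real script presumably walks the build products):

  import io, tarfile, time

  FILES = {"g95-install/bin/g95": b"...binary...",         # made-up contents
           "g95-install/doc/README": b"read me\n"}
  LINKS = {"bin/g95": "../g95-install/bin/g95"}            # made-up symlink

  stamp = int(time.mktime((2009, 6, 1, 0, 0, 0, 0, 0, 0)))  # identical timestamps

  with tarfile.open("g95.tar", "w") as tar:                # gzip happens afterwards
      for name, data in FILES.items():
          info = tarfile.TarInfo(name)
          info.size = len(data)
          info.mtime = stamp
          info.uname = info.gname = "g95"                  # owner without a g95 account
          info.mode = 0o755 if "/bin/" in name else 0o644
          tar.addfile(info, io.BytesIO(data))
      for name, target in LINKS.items():
          info = tarfile.TarInfo(name)
          info.type = tarfile.SYMTYPE
          info.linkname = target                           # a symlink out of nothing
          info.mtime = stamp
          info.uname = info.gname = "g95"
          tar.addfile(info)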

The new build system also integrates another new feature-- SMP-only coarray support. This version won't operate over networks, but it is free for use by anyone and is in the new version. There is currently a limit of 32 images hardcoded into the library, but I would be happy to compile another version if anyone is using some monster system. I will probably remove the limit sometime soon-- it just makes the first version easier.

SMP coarrays are currently available on x86/linux and x86-64/linux. Other unixes shouldn't be that difficult. I am planning a similar version for windows machines. I started with a copy of the unix library and have started work on the port, although there are so many changes that calling it a 'rewrite' is probably more accurate. I've spent quite some time poring through msdn and have figured out how to do everything. It's pretty much the same-- the coarrays are stored in shared memory, and pipes and mutexes will be used for the various synchronization primitives. The original process spawns all of the images, and so on.

This also required a change in how g95-compiled programs handle the command line, since that is the easiest way to specify the number of images you want. As it was, the special command line argument --g95 caused the fortran runtime settings to be printed instead of running your program. Now, --g95 signals the start of arguments that you want to pass to the runtime library. The special argument -- indicates that all following arguments should be passed to your program. If a --g95 is given and no additional arguments are given, then the runtime variable settings are printed.

For example:


./a.out --g95

./a.out --g95 images=10

./a.out --g95 images=10 -- a b

./a.out a b --g95 images=10

./a.out a --g95 images=10 -- b

./a.out -- --g95
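My reading of those rules, as a few lines of python (this is a guess at the behavior from the description above, not the library source):

  def split_args(argv):
      prog, runtime = [], []
      dest = prog
      it = iter(argv[1:])
      for arg in it:
          if arg == "--g95":
              dest = runtime           # following arguments go to the runtime
          elif arg == "--":
              prog.extend(it)          # everything left belongs to the program
          else:
              dest.append(arg)
      return prog, runtime

  print(split_args(["a.out", "a", "--g95", "images=10", "--", "b"]))
  # (['a', 'b'], ['images=10'])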

I was running the monte carlo calculation of pi in the compendium on ox, my quad core xeon system. Things were going just fine for about ten minutes, until I heard that european two-tone police siren noise and saw the flashing red LED that ox puts on when it is getting a little too hot. I had set the fan on the lowest setting for noise abatement reasons, and the fan does go to the next level when things get hot, but with all four cores going at once, it needed the third level. I also took some time to install the newfangled CPU temperature monitors.

My system upgrade still isn't quite done. The old disk is mounted on the new disk. My home directory had become quite flat over the years (i.e. lots of files in it), and I've been working on making it more tree-like.


March 2

Download, list, install, remove, edit config files, search the web for docs, read docs, give up, start again. Oh man. My life for the last week. This is why these things get put off. The system is coming together, though. I'm using Arch Linux, and have been pretty happy with it. The basic installation gets you to a bare-bones system where you get to install the individual packages. I've never been very happy with Suse on my laptop (where g95 currently lives). It's more of a kitchen-sink distribution. Arch, on the other hand, is a million pieces, but they come together surprisingly well.

Your desktop system should fit you like a soft leather glove, not a steel gauntlet. The old system was more like a dried-out leather glove that has become too stiff to be useful.

The video system had a nasty problem that took a long time to find. If a character had a blue foreground, there was this weird line through the letter, always at about the eighth row from the top. After much head scratching, I found out that there is an 'underline line' VGA register. Apparently this was getting set to eight somewhere; after I set it to 31, the problem went away.

The wonderful console system I described in the last post doesn't work. Although the monitor reports being in 1280x768, it displays only about 148 columns (i.e. 1184 dots). After much head-scratching, I remembered what happened last time-- after running in that mode for a while, the monitor eventually figures out that there is a better way of displaying things.

I've got ssh. I've got compilers. I've got X. I've got printer support. The mpage program is gone, but a2ps replaces it. My custom vpn is going. Gpm is working, after I re-enabled cross-console cutting and pasting. I have enlightenment (a window manager), but haven't figured out how to configure some keyboard shortcuts that old-time irix users would recognize. I've got firefox-- the mouse wheel now works (yaay!), which it didn't with the ancient RedHat 6/Mozilla. The old system has netscape 3 running on it, and it works amazingly well, rendering pages that the old mozilla would lock up or crash on. But the new firefox renders them all. I've got DRI going. I've got editors-- vi, nano... and emacs after a struggle.

The mail system is in place. Under the new system, I'll send and receive mail from g95.org, which was another reason for the upgrade. The old system looked like:

Outbound: Me -> pine -> qmail -> ssh tunnel -> g95.org ISP -> You

Inbound: You -> firstinter.net -> fetchmail -> inbox -> pine -> Me

I had qmail built, installed and was dreading the configuration when I realized that I didn't need it. I think I was using it mainly to buffer outbound mail if the connection went down. A couple years ago, that went badly when the ssh tunnel went down instead, and I didn't notice it for a week until my outbound mail finally started bouncing. The new mail system looks like:

Outbound: Me -> alpine -> ssh tunnel -> g95.org ISP -> You

Inbound: You -> g95.org ISP -> fetchmail (via ssh tunnel) -> inbox -> alpine -> me

The reason for the ssh tunnel is that g95.org's ISP (bluehost) doesn't let just any schmoe connect to port 25 to relay their mail. This has to happen locally as far as they are concerned. The tunnel on the inbound leg isn't strictly necessary (although IMAP sends passwords in cleartext), but the tunnel is already there and a little encryption never hurts.

The pine program is gone, but alpine is its new incarnation. Arch is pretty secure by default because it runs no daemons, not even inetd. Nevertheless, I have firewall rules because I don't want ssh's port exposed to the nasty internet-- I've got ssh also listening on a nonstandard port as well as the usual 22 for internal requests.

There are things to do, but the end is in sight. I need TeX, which is now 'TeX Live' instead of the venerable teTeX. I'm really looking forward to having lilypond on the desktop with the huge screen instead of just on the laptop with the worst screen. Redirecting X doesn't work well enough.

More important-- g95's new build system is taking shape. It'll be a major extension of the old one. I intend to use the infrastructure I'm building for that for other projects as well. One of Joel Spolsky's 'Joel Tests' is whether you can build your software with a single command or not. That used to be the case with g95, but it's slowly drifted away from that, and it is time to bring that cur to heel. Down boy!


February 19

More upgrade hell. The basic system boots now. It's an Arch Linux system-- modern but lightweight. OK package manager too. I've been running it on ox (my quad core xeon-- it has four stomachs) for a while now. One thing I really like about it is that the total boot time is about eight seconds. Why am I refurbishing a P3 with a third the clock speed of the quad core xeon? I have other plans for ox.

I spent most of the time today getting the video mode right. This is more important than it sounds. If you're going to be staring at a screen for a long time, it's really important to have it be readable. I have a really sweet 25-inch monitor that I use which can support 1920x1280, and in X, that is what it runs at. If you tell linux to use the framebuffer in the console modes, you can get a 200x75 text mode display that is very crisp, but just a little too tiny for lots of work. The real disadvantage of the framebuffer is that scrolling takes a visible amount of time.

On ox, I had a program that spit out about a screenful of data per second, and the slowdown when the output hit the bottom of the screen was just painful to watch. So I continue to do what I've done for a long time-- on boot, I set the dot clock of the video card to something well above the standard text modes without putting the card into graphics mode. In a 1280x768 mode, an 8x12 character cell gives a 160x64 screen that is fast and readable.

There is a program called SVGATextMode that would program various SVGA chipsets, but it is not much used any more. The documentation provided by video chipset vendors, along with the code examples in X-windows, is usually enough for me to write a little code that manipulates the extended registers.

The big problem was finding a modeline that works. And it's not enough just to find the modeline; you also have to find an intermediate mode that the monitor (mine, anyhow) will accept before moving up to the real one. But I found them. After that, some tweaks-- turn the default rendering to green on black, set a decent timeout interval and force the cursor to be a solid blue block.

My favorite thing to do with this setup: when I'm editing something and want to check someplace else briefly, I can split the screen vertically (in emacs), go to the other place, move the cursor to the other window and do what I need to do. When done, I get rid of the other display. Emacs will let you link the two windows together as if you were editing a single area, but I haven't found that particularly useful.


February 18

Deep in upgrade hell. My desktop system, which I've used for more than a decade, is a red hat six system. You read that right. I can't use it for much, since one small upgrade leads to half a dozen others. The other problem it has is that the six gigabyte disk I use is always full. The plan now is to take a new disk, install a modern system on it, and move all the old files onto the new drive, like a hermit crab finding a new and bigger shell. The 'new' disk is 30G, so that ought to hold me for a while.


February 17

Alexei Averchenko sent in a problem with -fone-error that has been fixed. Warnings were being promoted to errors if this flag was given. No build yet, I'm revamping my desktop system.


February 10

I've been working down the mails.

Michael Parkinson found a bug on windows where the >> redirection is trashing the file instead of appending. He sent a patch that seeks to the end, but I can't help thinking that we're seeking to the beginning of the file somehow. We're investigating further.


February 1

I'm changing my email. The firstinter.net address gets way too much spam. Click on the link at the top left and prove your humanity to get the new one.


January 18

Cleared up a bunch of spam + emails today.


January 17

Some anonymous person sent a $20.00 Four Peaks gift certificate. Thanks so much-- I appreciate this more than you know.


December 21

Richard Otero sent a pound of some really high-grade Scharffen Berger chocolates. Thanks!


December 20

Only in my alternate reality is 'shortly' three months... Patrick sent some extremely yummy chocolates, all the way from Switzerland.

James Beard sent a gift certificate for Four Peaks, my favorite brew pub. That'll have to wait a little more.

Thanks guys!