AmiWest 2016 (October 7 to 9) is fast approaching and so is Amiga DevCon 2016 (October 6 and 7). The location is Sacramento, California, USA. Please note, as always, the DevCon is a 100% volunteer effort as is the AmiWest show itself.
Some information about the DevCon from the AmigaOS Documentation Wiki:
- A detailed tour of AmigaOS 4 with tutorials. We will be going through the Amiga Future programming articles by Michael Christoph. If you’d like to get a jump on them see the Amiga Future Programming Articles.
- The anatomy of an Amiga disk device driver. We will be dissecting the p5020sata.device which is a SATA device driver used in the upcoming AmigaOne X5000. This isn’t just any custom made driver. It is a melding of Linux libata and the traditional AmigaOS trackdisk device drivers.
- Hans de Ruiter will be presenting his tutorials on Warp3D Nova. For anyone that wants to program for Warp3D Nova be sure to obtain/borrow/steal a card for the DevCon. See the Warp3D Nova press release for assistance on choosing a card.
- Personal Projects. You are encouraged to bring your own personal projects to the DevCon where we, as a group, may be able to help you out.
See you there!
The 68K assembler version of the console had a peculiar architecture. When the console was opened, it allocated display memory 50% bigger than the size of the window (in those days the console had to be passed the address of an existing window when you opened it). That memory was divided up into a character map, Width by Height bytes, and a similar sized array of character attributes. The addressing of the array was really strange and I think it might have been translated into 68K assembler from BCPL.
The new console had to include:
1) A ReAction window and gadgets;
2) History and display scrolling;
3) Multiple windows (tabbed);
5) Prefs Settings.
As I mentioned last time, the fixed array size made it impossible to add history features to the architecture. Also, any operation such as an insertion, deletion, window resize or refresh was implemented by unpacking the whole character map and repacking it again. It was designed for a particular architecture and was not adaptable. We decided to use a linked list for the display rows, so that rows could be added or deleted at any time without having to repack a large array. The rows were then allocated and returned to a memory Pool for speed. The independence of the rows made it easy to keep a few pointers to the list, such as “History Start”, “Display Start”, “Display End” and “History End”. Most list operations were simplified by this approach and scrolling speed seemed to be adequate.
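The row list described above can be sketched roughly as follows. This is only an illustration: the structure and function names (`Row`, `RowList`, `AppendRow`, `ScrollDown`, `FreeRows`) are hypothetical and not taken from the console sources, and plain `calloc`/`free` stand in for the memory pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical row node: one row of text, linked to its neighbours. */
struct Row {
    struct Row *prev;
    struct Row *next;
    char       *text;   /* row contents; unused in this sketch */
};

/* Hypothetical markers into the list, named after the pointers in the text. */
struct RowList {
    struct Row *historyStart;  /* oldest row still kept */
    struct Row *displayStart;  /* first row visible in the window */
    struct Row *displayEnd;    /* last row visible in the window */
    struct Row *historyEnd;    /* newest row */
    size_t      count;         /* number of rows in the list */
};

/* Append an empty row at the newest end. Returns NULL if allocation fails. */
struct Row *AppendRow(struct RowList *list)
{
    assert(list != NULL);
    struct Row *row = calloc(1, sizeof *row);  /* stand-in for a pool allocation */
    if (row == NULL)
        return NULL;
    row->prev = list->historyEnd;
    if (list->historyEnd != NULL) {
        list->historyEnd->next = row;
    } else {
        /* First row: every marker starts out pointing at it. */
        list->historyStart = row;
        list->displayStart = row;
        list->displayEnd = row;
    }
    list->historyEnd = row;
    list->count++;
    return row;
}

/* Move the visible region one row towards the newest row.
   Returns 1 if it moved, 0 if it was already at the end. */
int ScrollDown(struct RowList *list)
{
    assert(list != NULL);
    if (list->displayEnd == NULL || list->displayEnd->next == NULL)
        return 0;
    list->displayStart = list->displayStart->next;
    list->displayEnd = list->displayEnd->next;
    return 1;
}

/* Release every row and reset the markers. */
void FreeRows(struct RowList *list)
{
    assert(list != NULL);
    struct Row *row = list->historyStart;
    while (row != NULL) {
        struct Row *next = row->next;
        free(row);
        row = next;
    }
    list->historyStart = list->displayStart = NULL;
    list->displayEnd = list->historyEnd = NULL;
    list->count = 0;
}
```

Scrolling only moves two pointers, and adding a row never touches the existing ones. That independence is what the fixed character map could not offer.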
The ReAction Window and Gadgets
The old console window was opened by the con-handler and its address passed to the console device. That meant that one program (the con-handler) opened and “owned” the window, but another (the console device) performed all the work on it. That had to change if we were ever going to support memory protection. In any case, it made sense that the program that used the window, opened and closed it also. One of the first changes was that the console device would from now on open and maintain its own window (of course, it still had to accept the address of an existing window passed to it by an old program).
Once the console could open its own window, it could make it a ReAction window with Iconify, Scroll and ClickTab gadgets. If it was passed the address of an existing window, then none of the “extras” could be added since the calling program might have added its own extras, so only “new-style” windows have all the extra gadgets. Now you no longer have to close the Shell when you change Workbench GUI settings!
Most of the gadgets can be independently enabled or disabled by options in the Shell tooltypes.
History and Display Scrolling
One feature of the old design was that a long row could overlap the edge of the display window and “wrap around” onto the next row. You’ve all seen what happens when you make a Shell window narrower – the long rows split into two or more and the text occupies more rows on the screen. To implement this feature with the linked list, we ended up creating “multiple rows” that consisted of two or more rows of text on the screen, but linked, each one to the others on each side. What is more, the new console display had to be able to scroll text on and off top and bottom of the window. The window might even have been resized since the old text was scrolled off the edge, so we had to accommodate changes in window width as old rows of text were scrolled back into view. A lot of work has gone into the treatment of “long lines”!
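As a small illustration of why resizing matters, the number of screen rows a long line needs depends only on its length and the current window width. The helper below is a sketch with a hypothetical name (`RowsForLine`); it assumes a fixed-width font and ignores everything else the real console has to track.

```c
#include <assert.h>
#include <stddef.h>

/* Number of screen rows occupied by a logical line of `length` characters
   in a window that is `width` columns wide. An empty line still takes
   one row. */
size_t RowsForLine(size_t length, size_t width)
{
    assert(width > 0);
    if (length == 0)
        return 1;
    return (length + width - 1) / width;  /* round up */
}
```

A 100-character line takes two rows at 80 columns but three at 40, so a line scrolled off at one width may need a different number of linked rows when it scrolls back in.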
The current history, consisting of the contents of the linked list of lines, is stored in RAM. In order to save memory, we allow old history to be written out to a disk file once the number of lines of stored text reaches a predetermined limit. You can specify how many lines of text you want to keep in RAM, and what to do once that overflows (e.g. keep it in a disk file or discard it). You can also elect to save the whole Shell session to a file if you wish.
Scrolling the history is easy: you can use the mouse wheel, the scroll bar gadget or built-in keystrokes. If you have selected to write old history to backup files when the current buffer is full, you can even scroll back through the history that has been written to the file. As you do, that history is read back into the current buffer, making it exceed the size limit temporarily. If you like, you can display the number of lines of history in the window title.
Multiple Windows (Tabbed)
Adding a ClickTab gadget to the window makes it possible to switch the display between any one of two or more actual Shells. All Shells share the same window geometry and gadgets, but each has its own independent history and text attributes. You could have one Shell displayed in black on white, another in white on black and others in any combinations you like. When you switch from one Shell to another, the new one is displayed in its own colours and style. You can choose the text to be shown on each tab (you might like to have the current directory, for instance).
From one Shell, you can open a new one by selecting a menu item or a built-in keystroke. The new Shell is a clone of the previous one, that is, it “has” the same current directory, local variables, and so on. However, it is opened using the same “Shell-Startup” file as all the other Shells, so if your Shell-Startup file includes a line like “cd RAM:”, all new Shells will open with RAM: as the current directory.
You can close any Shell individually, or even all at once.
The menu accumulates all the key short-cuts (apart from those for command-line editing) and lets you select or edit settings as you wish. You can also call up the console Help file from the window. The Help file contains all the user documentation for the Shell, the con-handler and the console device. Most of the menu operations can also be performed by built-in keystrokes.
There is a Preferences Editor for the console settings. These are the default settings for the console and can be changed by the tooltypes in the Shell icon or the program running in the Shell window. You can also call the prefs editor from within the console window and “Use” the changes temporarily if you wish.
You can select the font for the console text (fixed-width fonts only). You can also choose from a plain old “block” cursor, an underline or a vertical bar. You can make the cursor blink if you like.
The “old” console was pretty restricted in its choice of text and background colours. You could use the ANSI escape sequences to set colours, but you could only choose from the system Pens. In the new console, we have introduced a palette of colours from which you can choose your text foreground and background colours. There are four palettes available to the user – you can select your preferred palette from the Preferences editor. The four palettes are the old System Pens, an ANSI set of primary colours, an ANSI set of “faint” colours and a “user” set which you can choose to your own preferences.
The new console supports a few new text attributes, like italics, strike-through and character blink. Italics are spaced out slightly so that they don’t overlap with non-italic characters. The console does not attempt to support proportional fonts.
Name completion is strictly a function of the con-handler, not the console, but it appears in the console window. You can now choose to have name-completion choices displayed in a “popup” window close to where you are typing. You can then choose from the displayed list. If you choose to see completion “in-line”, you can choose if or when you want the system to beep at you (if there are no matches, multiple matches, or “this is the last match”).
Another con-handler setting in the console prefs is the size of the command input buffer. The old console was fixed at 1024 characters, but now you can leave it at 16 kB or set it to anything up to 1 MB.
You can also change the default Tab spacing (Horizontal Tabs, not ClickTab) from its default eight spaces to two, or eleven, or any value between two and sixteen.
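For illustration, here is how the column of the next horizontal tab stop follows from the spacing setting. The function name `NextTabStop` is hypothetical, and zero-based columns are an assumption of this sketch rather than something the console documentation states.

```c
#include <assert.h>

/* Column of the next tab stop to the right of `column`, with tab stops
   every `spacing` columns. Columns are zero-based in this sketch. */
unsigned NextTabStop(unsigned column, unsigned spacing)
{
    assert(spacing >= 2 && spacing <= 16);  /* range given in the text */
    return (column / spacing + 1) * spacing;
}
```

With the default spacing of eight, a Tab typed in column 0 or column 7 lands in column 8; with a spacing of eleven, a Tab in column 10 lands in column 11.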
Compatibility with Older Programs
The new console works in either its “legacy” style or its “new” style, depending on how it is opened. While testing it, we have discovered several older applications that depended on unspecified console behaviour or even, in some cases, bugs in the old console.
We have tried to keep the new console compatible with the old, but inevitably there will be programs “out there” that will behave differently. Let us know and we will see what we can do.
New applications can use the newer console features and make the most of them.
All the above description is user information, of course. The AmigaOS SDK contains the full documentation of the new console API. The AmigaOS Documentation Wiki has also been updated with new information about the improved Shell.
Well, that’s it for now. I hope you enjoy the benefits of the new Shell.
It was late in 2003 when I received my first AmigaOne-XE. At that time, like many others, I had to be content to use Linux, and I waited impatiently for the day when I could run AmigaOS on my new machine. In the meantime I built the Boing Ball “case” for it. [The XE board failed in 2014 after eleven years of faithful service. These days the Boing Case is occupied by the AmigaOne X5000]
Early in 2004 I was invited to become a beta tester and had the good fortune to be able to run the 68K version of AmigaOS 4 on my A4000. At that stage most of the old AmigaOS software had been modified (where necessary) to compile with GCC, and the developers were mainly using cross-compilers running under Linux or Windows. Many of the system components, originally written in 68K assembler, had already been rewritten in C or translated to C. Only a couple of months later, the first PPC versions of the system components were released to beta testers and 68K versions were dropped, one by one. Only a handful of third-party components remained in 68K, those for which we had binary licenses only (e.g. ARexx).
During a discussion on the beta tester mailing list one day, one of the developers mentioned that only the console device remained to be translated/rewritten in C. Foolishly I waved my hand in the air and said something like “I’d like to have a go at that!”. Well, I had been an assembly-language programmer for probably thirty years and a C programmer for probably twenty, so I thought it might be fun.
You can imagine my excitement when, some weeks later, an archive arrived in my inbox. It was the assembler sources for the console device, all 26 source files and about half a dozen header files. Where was I going to start? I couldn’t just translate the whole thing in one pass, then sit down and attempt to debug it. I had to be able to switch back and forth between assembler originals and C translations, or I would never be able to even make it run, let alone debug it.
I decided to use SAS/C, the 68K compiler/assembler package that I had been using for years. Like many other packages, it allows you to mix C and assembler modules freely in the build. I could translate each assembler source file in turn, and with some stubs, call the assembler functions from C or the C functions from assembler. That way I would be able to switch individual modules back and forth between C and assembler at any time. I started by compiling the source on my trusty old A4000, since the A1-XE was by now the beta test machine and the A4000 was the stable development platform. Later, the native AmigaOS 4 version of GCC was released and I was able to compile and build on the A1-XE (it was a bit faster than the A4000, even with its 68060/50 processor!).
Well, the translating job took about nine months. I took each assembler source file and commented out the assembler statements, writing in the C version underneath. Of course, this approach made each source file huge, but it would prove to be a boon later when I was debugging, since I could see the original side by side with my translation. During that time I had to break up some of the assembler source files into smaller modules, just to keep them a reasonable size. I finished up with some 30 C source files.
Here is an extract of part of a translation:
// lea cd_RastPort(a6),a1
// move.b cu_Mask(a2),rp_Mask(a1)
// moveq #0,d0
// move.b d6,d0
// LINKGFX SetDrMd
You can see that the original five lines of assembler have become two lines of C.
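The C lines themselves are not reproduced in the extract above. Purely as an illustration, a translation of those five instructions might look something like the sketch below. Everything in it is an assumption: the structures are minimal stand-ins containing only the fields the assembler touches, `SetDrMd` is a local stub rather than the graphics.library call, and the function name `ApplyUnitDrawState` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for the AmigaOS structures (assumed layout). */
struct RastPort   { uint8_t Mask; uint8_t DrawMode; };
struct ConsoleDev { struct RastPort cd_RastPort; };  /* addressed via a6 */
struct ConUnit    { uint8_t cu_Mask; };              /* addressed via a2 */

/* Local stub standing in for the graphics.library call of the same name. */
static void SetDrMd(struct RastPort *rp, uint32_t mode)
{
    rp->DrawMode = (uint8_t)mode;
}

/* Hypothetical translation; `mode` plays the role of register d6. */
void ApplyUnitDrawState(struct ConsoleDev *cd, const struct ConUnit *unit,
                        uint8_t mode)
{
    assert(cd != NULL && unit != NULL);
    cd->cd_RastPort.Mask = unit->cu_Mask;  /* lea + move.b             */
    SetDrMd(&cd->cd_RastPort, mode);       /* moveq + move.b + LINKGFX */
}
```

The two statements in the function body correspond to the "two lines of C": the register shuffling that sets up the call disappears into the argument list.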
The makefile was huge, with half of it commented out at any one time. The SAS/C suite produced 68K code, but from a mixture of C and assembler sources. At first it was all compiled on the A4000, but later I swung everything over to the A1-XE and made that my new development platform.
As the development progressed, there were fewer assembler sources and more C sources, until the first pure-C version was released to the beta testers in May 2005. That version was still compiled by SAS/C and distributed as a 68K binary. There were several more months of debugging and testing, until the pure C sources were compiled into native PPC for the first time in August 2005. At about the same time, the C sources were stripped of the original assembler code and we had a working console device, written in C and compiled in pure PPC. We tested it to destruction and cleared the bugs before it was finally released to the world. Most of the bugs were inconsistencies with the old 68K version, caused by mis-translation or errors in understanding. It had taken a year to translate and debug, but it was a great satisfaction.
The original 68K assembler version of the console occupied 16,212 bytes. Compare that with the last 68K C version (V50.26) which was 42,564 bytes! It just shows how squashed the assembler version was (but that C version did have some debug code as well).
The translated console had several drawbacks. Because the original had been written in assembler, it was very difficult to read and maintain and there were a number of known “features” that were just too hard to fix. Also, the assembler version had been written to minimise code size, rather than to optimise speed or code legibility. There were many tricks employed within the assembler code to save a few bytes of space in the Kickstart ROM. For instance, it was common to arrange several byte-sized variables adjacently in the data area so that four variables could be picked up at once with a long word fetch. Not recognising these tricks was the cause of many a bug in the early days! Also, a lot of the ANSI code had not been implemented in order to save space.
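The long-word trick can be illustrated in C. The sketch below uses hypothetical names (`Flags`, `FetchLong`) and builds the value by hand in big-endian order, which is what a 68K long word fetch of four adjacent bytes would produce, so that it behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Four hypothetical byte-sized variables laid out adjacently. */
struct Flags {
    uint8_t a;
    uint8_t b;
    uint8_t c;
    uint8_t d;
};

/* The value a 68K long word fetch of the four bytes would yield:
   the first byte ends up in the most significant position. */
uint32_t FetchLong(const struct Flags *f)
{
    assert(f != 0);
    return ((uint32_t)f->a << 24) | ((uint32_t)f->b << 16) |
           ((uint32_t)f->c << 8)  |  (uint32_t)f->d;
}
```

Code that tests or copies all four variables through one fetch silently depends on their order and adjacency. A translation that reorders the fields, or treats them as unrelated variables, breaks that code without any warning from the compiler.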
Not the least of the drawbacks was that the memory allocated for the display was fixed when the window was opened, and lines of text that scrolled off the top or bottom of the display were scrubbed clean and re-introduced at the other end. It was just not possible to save the scrolled lines to keep the display history. To add a scroll bar and history would need a big rethink of the architecture.
Next time: the Great Console Rewrite.
This is a significant update to the SDK which will require a lot of developer support to use properly. New articles have already begun to appear on the AmigaOS Development Wiki like Intuition’s new Menu Class article.
We are also planning tutorials and a workshop at the Amiga DevCon 2015 which is before the AmiWest 2015 show in Sacramento, California, USA starting October 15th.
AmigaOS Development Team Lead
The AmigaOS Documentation Wiki (http://wiki.amigaos.net) is moving to a brand new server. The new server is quicker and also enables important upgrades to the underlying software driving the wiki.
It always takes a few days for the DNS propagation to complete so the URL may not work in your region yet. In the meantime, you can use http://126.96.36.199/wiki to access the wiki. You can monitor the DNS propagation progress using services like DNS Checker.
Amiga DevCon 2015
Moving and upgrading the AmigaOS Documentation Wiki is just one small step towards Amiga DevCon 2015 which is taking place in Sacramento, California, USA on October 15 and 16 before the AmiWest 2015 show. With the release of AmigaOS 4.1 Final Edition there is a need for an updated AmigaOS SDK. The SDK includes some important AmigaOS API improvements and these, among other topics, will be covered at the DevCon.
Don’t worry if you can’t make the Amiga DevCon in person. All the presentation materials will be available on the AmigaOS Documentation Wiki in the Tutorials section.
AmigaOS Development Team Lead
Symmetric Multi-Processing (SMP) has been on the wishlist of AmigaOS users for quite some time now, and while progress has been made, we’re still not there yet.
To explain the progress, let us first look at the concept, and then point to where we are in the whole.
Concept: Threads, Cores and Processors
When looking at SMP support, we need to take actual processor technology into account. Older implementations used actual physical processors for SMP, that is, one processor could execute one instruction stream, and to achieve parallel execution, you would be able to plug in more processors. Usually, and not surprisingly, this approach is rather limited in the number of processors that can communicate with each other (although there were massively parallel machines that used a complex interconnection network for communication).
Later on, chip manufacturers added additional so-called “cores” to one physical processor.
A very recent development is the ability of such individual “cores” to execute more than one instruction stream in parallel. We call those instruction streams threads. This technology is used in such CPUs as the Intel Core i7 (where it is named hyper-threading), or the Freescale e6500 core, which is used on the T-series CPUs from Freescale (up to the T4240, which has 12 physical cores with two threads each).
How to schedule tasks in SMP configurations
When looking at how to schedule Exec tasks and processes on an SMP system, we need to look at how much overhead is involved with scheduling. Currently, in single-CPU environments, Exec periodically interrupts execution and evaluates whether it should pick another task to run. Doing that with high frequency, say, 20 or 30 times per second, creates the illusion of multiple tasks running in parallel.
The evaluation whether a new task should run or not is done based on the task’s priority, or depending on whether the task has something important to do. Of course, this takes time as well, and if the time taken for this evaluation becomes too long, the machine will take more time evaluating what to run than running what it is supposed to.
This gets worse if more execution units are in the system: If the evaluation would also need to ask other CPU cores whether they want to run something, the time for this processing would rise tremendously, to a point where adding more CPUs to the system would actually slow it down.
Therefore, the scheduling of Exec tasks and processes will remain something a single CPU will do for itself, even in multi-core. This will ensure a reasonable time is spent on this task.
So, how do multiple cores, threads and/or CPUs come into play?
We can easily monitor how loaded a core is by checking how many of the tasks we have ready actually got time to run. This is called the “load” of the CPU. If it spends a lot of time waiting for tasks to become available, the load is low. If a lot of tasks are waiting for their turn to run, the load is high.
In a multi-core system, some cores will have high load, and some low. To balance this, the system will look at the load of the individual core and determine whether there is a need to balance out the load among the CPUs or not. This balancing is triggered by the individual schedulers when they notice that their current workload is too big for them to handle. In this case, the overloaded core will migrate some of its tasks to other cores until it can again handle its workload.
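A very rough sketch of that balancing step is shown below. It is not the AmigaOS algorithm: the names (`Balance`, `NUM_CORES`), the use of a simple ready-task count as the "load", and the fixed `limit` threshold are all assumptions made for illustration.

```c
#include <assert.h>

#define NUM_CORES 4  /* assumed core count for this sketch */

/* Migrate ready tasks away from core `from`, one at a time, to whichever
   other core currently has the fewest ready tasks. Stops when `from` is at
   or below `limit`, or when moving a task would no longer help.
   Returns the number of tasks migrated. */
unsigned Balance(unsigned load[NUM_CORES], unsigned from, unsigned limit)
{
    unsigned moved = 0;
    assert(from < NUM_CORES);
    while (load[from] > limit) {
        unsigned target = from;
        for (unsigned i = 0; i < NUM_CORES; i++) {
            if (i == from)
                continue;
            if (target == from || load[i] < load[target])
                target = i;
        }
        /* Give up if the best target would end up as loaded as we are. */
        if (target == from || load[target] + 1 >= load[from])
            break;
        load[from]--;
        load[target]++;
        moved++;
    }
    return moved;
}
```

Note that the decision is taken by the overloaded core itself, from its own point of view. No central scheduler inspects every core on every tick.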
In the previous section, we talked about balancing. Let’s take a closer look at this. A task running on one core is usually using the core’s resources. An important resource is caches: Level 1 caches (L1) contain data from memory very close to the CPU’s instruction units, so accessing this data is instantaneous. A level 2 cache (L2) is something that is below the Level 1 cache, is usually bigger, and access to it is slightly slower than accessing the level 1 cache. Some systems even have higher level caches below the L2 cache.
In a typical multi-core system, the L1 is exclusive to one core, while the L2 is shared among many cores. However, as in the scheduling example, more cores accessing one L2 means more overhead in communication with the L2 because some core might be waiting for another core to finish accessing the L2. The more cores access the same L2, the more likely such a stall is. Therefore, some processors with a high number of cores have multiple L2 with groups of cores sharing that L2 while another group shares another L2. We call those groups “clusters”. As an example, the T4240 has 12 cores, grouped in three clusters of four cores each, with each cluster sharing their own L2 and each core having a separate L1. In addition, there are cores that can run multiple threads at once. These threads even share the L1.
What we see here is a hierarchy of execution units. Clearly, moving a task from one thread to another thread on the same core is a rather cheap operation: the migrated task will not suffer from misses in the L1 since it still uses the same one. On the other hand, migrating from one core to another one means that the new core will not have access to the L1 of the previous core, and thus, migrating to this core will come at a slight initial performance cost. Similarly, migrating across to a new cluster will mean the task will lose the benefit of both the L1 and the L2, resulting in an even larger initial performance cost.
As you can see, moving a task to another execution unit will have different degrees of performance penalties. Note, these are only “initial performance” penalties, since the caches will gather the necessary data over time so that after some time, the caches will be fully available again to the task.
This hierarchy is in essence the hierarchy of “scheduling domains”. Scheduling domains define a cost associated with moving a task from one core to another. When balancing the load, the system will strive to minimize the cost of movement to ensure minimal performance loss. Migrating inside the current scheduling domain will always be cheaper than migrating to some core outside of the current scheduling domain; however, if all cores in the task’s scheduling domain are overloaded, the higher cost will need to be paid.
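The cost hierarchy can be sketched as a simple comparison of where two execution units sit. The types and names below (`ExecUnit`, `CostOfMove`, `CheapestTarget`) and the numeric cost values are hypothetical; only their ordering reflects the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical address of an execution unit: cluster, core within the
   cluster, hardware thread within the core. */
struct ExecUnit {
    unsigned cluster;
    unsigned core;
    unsigned thread;
};

/* Relative migration costs; only the ordering matters. */
enum MigrationCost {
    COST_SAME_UNIT     = 0,  /* no move at all */
    COST_SAME_CORE     = 1,  /* other thread, L1 still shared */
    COST_SAME_CLUSTER  = 2,  /* other core, L1 lost, L2 still shared */
    COST_OTHER_CLUSTER = 3   /* L1 and L2 both lost */
};

enum MigrationCost CostOfMove(struct ExecUnit from, struct ExecUnit to)
{
    if (from.cluster != to.cluster)
        return COST_OTHER_CLUSTER;
    if (from.core != to.core)
        return COST_SAME_CLUSTER;
    if (from.thread != to.thread)
        return COST_SAME_CORE;
    return COST_SAME_UNIT;
}

/* Index of the cheapest candidate to migrate to (first one wins on ties). */
size_t CheapestTarget(struct ExecUnit from,
                      const struct ExecUnit *candidates, size_t n)
{
    assert(candidates != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (CostOfMove(from, candidates[i]) < CostOfMove(from, candidates[best]))
            best = i;
    }
    return best;
}
```

A balancer built on this would only consider candidates that actually have spare capacity, which is how the "higher cost will need to be paid" case arises.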
Pitfalls: The dreaded Forbid
Adding this functionality to AmigaOS is not without problems, naturally. There is one central part of the OS that has been around since its very beginning, and has been widely misused and misunderstood: The Exec function Forbid (and its counterpart, Permit). The problem with these functions is both semantic and practical. It has been documented as disabling task switching, and as forcing the system to become single threaded.
If you look at this, these are one and the same on a single core system, but something completely different in SMP. Forbidding task switching in an SMP system will have a number of threads running in parallel. No CPU core will switch tasks, since it is forbidden to do so, but the system is not running single threaded at all.
This misunderstanding has led to a lot of misuse. Forbid has been used to basically protect critical data structures from tampering by other threads. Of course, this will work in a single core system: Prohibit task switching, and you can be sure that no one else will get to access that data. In an SMP situation, however, this does not stop anyone from accessing it.
So what’s the solution to this? Keeping in mind that Forbid is mostly used to protect critical sections of code and data, the SMP enabled kernel will treat it just like that: Any core issuing a Forbid call will, simply put, write its number into a field somewhere in the system. It will do this “atomically”, meaning that the memory is only modified if no one else is competing for it. If it succeeds, it can proceed into the critical section. On the other hand, if there is already some other core in the critical section, or the write did not succeed, it will repeat this process until it succeeds, basically stalling the competing core until it is successful. This ensures that only one core at any time can enter into a Forbid state, and any subsequent core that wants to forbid will have to wait until its turn.
Of course, this is a simplified description of the process. Some other things are necessary to ensure fairness and equal distribution of access, to prevent one core from hogging this “lock” for too long, and so on, but the basic process works like this.
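The basic lockout can be sketched with a C11 atomic compare-and-swap. This is an illustration of the idea only, not the kernel's code: the names (`forbidOwner`, `TryForbid`, `SketchForbid`, `SketchPermit`) are hypothetical, and the sketch ignores nesting of Forbid calls, fairness and everything else mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_OWNER (-1)

/* The "field somewhere in the system": number of the core that currently
   holds the Forbid state, or NO_OWNER if nobody does. */
static atomic_int forbidOwner = NO_OWNER;

/* One atomic attempt to claim the field. Returns 1 on success, 0 if
   another core already holds it. */
int TryForbid(int core)
{
    int expected = NO_OWNER;
    return atomic_compare_exchange_strong(&forbidOwner, &expected, core);
}

/* Repeat the attempt until it succeeds, stalling the calling core. */
void SketchForbid(int core)
{
    while (!TryForbid(core)) {
        /* spin: some other core is inside its critical section */
    }
}

/* Release the field so that a waiting core can claim it. */
void SketchPermit(int core)
{
    assert(atomic_load(&forbidOwner) == core);  /* only the owner may release */
    atomic_store(&forbidOwner, NO_OWNER);
}
```

The compare-and-swap only writes the core number if the field still holds `NO_OWNER`, which is the "only modified if no one else is competing for it" condition from the description.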
Where are we now?
The development of SMP support has been separated into several distinct steps. The first step was to rewrite the scheduler in C for easier accessibility. In the very end, this step might be reversed again, rewriting the then SMP capable scheduler back into assembly language. The second, more fundamental step was to decouple the scheduler from its current data structures. As you might know, ExecBase contains a number of lists for tasks that are ready or waiting for a signal.
This has now been achieved. The current development build uses a scheduler that no longer uses the original AmigaOS data structures, but a structure that is replicated for each core.
The next step is to have each core in the development system (currently, the X1000) run the scheduler. Test code will then start tasks on the different cores and see how they behave. We have already experimented with this and the results look promising. The tests basically showed that the lockout mechanism for Forbid works as planned.
As a final step, the balancing will be introduced, which then finalizes the first implementation of SMP support in AmigaOS.
There are several possibilities to choose from once the first implementation is done. First of all, the scheduling algorithm is still the same as the one used by current Exec, a priority modified Round-Robin. Naturally, this is an algorithm that can be improved upon. There are several other implementations that come to mind, like the O(1) scheduler, multilevel feedback queues or even the Brain Fuck Scheduler.
Also, the balancing algorithms are candidates for improvement. The system might record scheduling data and compute typical user profiles that can be pre-selected, like, for example, the ability to determine when to balance, and how aggressively balancing is carried out.
We are extremely pleased to be able to host the 3rd incarnation of the AmiWest Programming Seminar. This is a unique event in the AmiVerse, to bring together Amiga developers, programmers and users to learn from each other and create apps.
In previous years we have taken a more formal approach with seminars and fixed topics. Given the wide range of programming experience on the part of participants, we will try a more free-form approach this year.
To read more and get more information, check out this link:
Here’s the sign up form:
AmiWest Programming Jam Organizer