Two Wrongs

The Expressive Ada 2012 Challenge

The Expressive Ada 2012 Challenge

I like the idea of the Expressive C++17 Challenge. I have an issue with how the problem is way, way underspecified11 I have provided my interpretation of the requirements in Appendix B: Program Requirements. You can probably understand why I find the specification annoying. What happens if the input file simply does not exist? What happens if the number of columns differ between lines? Not to mention that this ascii delimited data format is described all over the place as “csv”. It’s not., but let’s not rain on our own parade!

The twist of the challenge is that the program should use as many new C++17 features as reasonable, and it should not use any library other than the standard library. We got a C++ solution. We also have a Rust solution as well as a D solution. However, the latter two are not as faithful to the original problem as one may have hoped, because they do not focus on using new features. That said, the D solution is probably my favourite so far because the accompanying article was so educational.

With this article, we even have an Ada solution! I focused on two things while pulling this code together a while back:

  1. Using as many new features from Ada 2012 as reasonable22 This is in line with the original C++17 challenge, but a departure from the Rust and D solutions which just use language features that are good in general, and not specifically those that appeared in the most recent version of the language., and
  2. Handle infinite streaming files with good performance even on many fields,

Before looking at the code, let’s just quickly check out which Ada features I used that are worth mentioning. The numbers in parentheses indicate which version of the language33 In 1983, the first version of the language – Ada 83 – was designed. In 1995, it was upgraded with (among other things) better support for object-oriented programming and got the name Ada 95. This got further incremental improvements in 2005, and, unsurprisingly, this version of the language is called Ada 2005. Finally, a fairly large revision of the language was made in 2012, giving us the eponymous Ada 2012. that introduced them.

I would like to go into each of these in depth, but I don’t have nearly enough time to do that. What I’ll do is highlight some of the more interesting parts of the code and then you can check out the full program in appendix A, should you want to.

Contracts and Invariants (2012)

What is probably one of the greatest additions in Ada 2012 is the inclusion of various assertion types in the language standard. Assertions are one of the best ways to ensure your code is doing what you want it to do. Whenever you find yourself thinking, “and here, on line 84, the keys_pressed set should include the return key” – stop! – and encode that thought as an assertion. Above line 84, create a new line and write

assert keys_pressed.contains(xk_return);

By taking the things you think you know about the code, and spelling them out literally in the source file, you not only discover whether or not you were right – but you also help the next person to read the code, who will not have to retrace your steps to gain your insight.

Ada 2012 makes this stuff really convenient. Beside the basic Assert pragma, we can also specify, among others,

  • Pre(condition), checked each time before its subprogram44 Subprogram is a fancy way of saying “function or procedure” while still retaining the ability to do whole sentences in only one breath. is called;
  • Post(condition), checked each time before its subprogram returns;
  • Loop invariants, checked on each iteration of its loop55 I was just informed by sparre in #ada that the Loop_Invariant pragma is actually not part of Ada 2012 – it is borrowed from spark 2014 and my Ada compiler converts it to a regular assertion. No matter, we can simply replace it with a regular assertion ourselves!;
  • Type_Invariant, checked each time a value may have changed66 At this point the performance-seeking reader has probably fell off their chair. Obviously, these checks are disabled in hot paths of the release build, and it is trivial to do so..

These let us encode the “hidden truths” of our code in a way that lets us detect bugs very early – namely exactly at the point where something went wrong.

Please note that you don’t have to think hard to add assertions to your code. Think of them as insanity checks, because for almost any variable, I can come up with a value that will make you go, “Haha, that’d be insane!77 Examples: The number of keys pressed simultaneously on a standard 105-key keyboard will never exceed 500. The number of dots in an email address will never exceed the number of characters in the address. When you change a dollar bill for cents, you will never get more than 100 coins. These are things that should make you go “duh”, and these are also exactly the things that can delay a bug hunt by hours.” Assert that shit. You’re not necessarily trying to verify that your code is sane, you just want to make sure it’s not insane.

Type Invariant

In the Ada code I wrote for this, the type representing a row of fields is called Row. The Row type stores the start and end string indices for each field in that row as two parallel arrays called First and Last88 So to get the contents of the third field, you have to slice the string on (Row.First (3) .. Row.Last (3))..

type Row
  (Source : not null access constant String;
   Columns : Positive)
is tagged record
   First : Index_Array (1 .. Columns) := (others => 1);
   Last : Index_Array (1 .. Columns) := (others => 1);
end record with
   Type_Invariant =>
     (for all I in 1 .. Row.Columns =>
         Row.First (I) <= Row.Last (I)
         and Row.First (I) in Row.Source'Range
         and Row.Last (I) in Row.Source'Range);

We see that Row is a tagged record, which means it is allowed to tap into the object-oriented features introduced in Ada 95. I don’t use any of those features99 Besides using the Object.Method (Arguments) syntax we also get by virtue of being tagged but in my experience, defaulting to tagged types comes with nearly no drawbacks, and gives a lot of flexibility later on when the program grows.

The type declaration states that each value of this type references the input string it is based on, which we’ll look closer at in the next section about non-null constant references. The type also looks like it takes the expected number of columns as an argument, and we’ll look at that when we get to dynamic bounds and discriminated types.

I meant to write about the type invariant, but I realise I probably don’t even have to explain that. It uses a quantified expression to say that all fields should start and end inside the string which contains them, and all fields must start before they can end.

Loop Invariant

Similarly, the loop invariant in the code is probably self-explainatory. The assertion at the top of the loop body is the invariant that must be true on each iteration.1010 In an earlier version of this article, the code here said pragma Loop_Invariant (First < Line'Last) because I mistakenly believed that pragma was part of the 2012 standard. I was informed that it only exists in the spark 2014 standard (a subset of Ada aimed at high reliability), and it worked for me because my Ada compiler supports it by converting it to a regular assertion.

return This : Row (Line, Column_Count) do
Scan:
   for I in 1 .. Column_Count loop
      pragma Assert (First < Line'Last);

      This.First (I) := First;

   Find_Last:
      declare
         Next : constant Natural := Strings.Index
           (Line.all, FS, First, Ada.Strings.Inside);
      begin
         This.Last (I) :=
           (if Next > 0
               then Next - 1
               else Line'Last);
      end Find_Last;

      First := This.Last (I) + 2;
   end loop Scan;
end return;

This loop invariant is of the insanity check kind. When we enter an iteration of the loop, it means we are going to scan a field of the string. At entry, First stores the index of the first character of the field. If this is outside the string, we have clearly gone too far – or messed up some arithmetic. It’s not a sophisticated invariant, but it did actually reveal a bug at one point.

Oh, and we’re also looking at two other features that are new in Ada 2012: extended return statements and if expressions. There are technical reasons extended return statements have to exist1111 There are some types of resources we are not meant to create copies of – they are only supposed to have one owner at any time. In languages like C, you just have to pinky promise that you won’t do anything naughty. Ada has language support for expressing non-copyability with what are called limited types. They way non-copyability is enforced in practise is, among other things, by forbidding assignment of such types. So if you want to return a value of this type, and do some initialisation first, you get the option of turning the return statement into a full block of code, and inside that block of code, the normal restrictions on the variable are lifted. but I have come to really like them for code organisation reasons. It’s a neat way to not litter the namespace with a variable you just want to initialise and then return.

Non-null Constant References (2012)

Ada does a lot to help out with memory management while staying very low-level. In languages like C and Ada, there are primarily two1212 Okay, three, actually, but we’ll look at the third in the next section modes of dealing with acquiring memory for your values. You either let the compiler do it automatically for you, or you have to do it manually.1313 I’m tempted to work in the words “stack” and “heap” into this description but I’ll resist. In my uneducated mind, the specifics around how the allocations happen is an implementation detail. Doing it manually is obviously dangerous. If you dispute that, please recall that thousands of really hardcore, skilled C wizards get it wrong over and over and over again. That should be a powerful argument.

However, I don’t blame these powerful wizards. The C compiler is really limited in its ability to do memory management automatically, so they have to do it manually. A lot. Of course they’ll get it wrong. The Ada compiler does a slightly better job of helping the programmer with automatic allocation and keeping track of memory. Some of this is stuff we’ll see when we get to parametrised types, but we’ll start gently here with just talking about referencing existing objects.

Parameter Modes

One big reason pointers are everywhere in C code is that pointers are the only way to let a subprogram modify one of the caller’s variables. Without pointers, all subprograms get their own copies of the data – which is not always what we want. Ada has what are called parameter modes, which means you can specify whether the function body should be able to read from a parameter, write to it – or both. The compiler handles the reference boilerplate for you.1414 This makes things much more clear than it may at first seem: it makes it so that all parameters can be passed by reference, because either they are read only and then the callee can’t modify them anyway, or they are writable and then the caller expects the callee to modify them.

Access Types

The closest thing Ada has to pointers are called access types. These access types are really cleverly designed. Someone wrote that “in Ada it is syntactically invalid to leak pointers to memory that is about to become invalid.” That’s a C programmer’s wet dream. That’s the kind of power you can get when you pay great minds a lot of money to solve hard problems.

There are several rules surrounding access types to make this work and I’m not going to give a comprehensive guide, but these are the basics: To reference an object, you have to create an access type for that type of object, and then declare a variable of that access type. This shouldn’t be particularly surprising. Assming we have a boat:

type Boat is record
   Captain : Person;
   Propulsion : Sails_Or_Motor;
end record;

My_Boat : aliased Boat :=
  (Captain => Me,
   Propulsion => Sails);

The aliased indicator is required in order to get a reference from a variable in the first place.1515 The reason for this is elementary: in order to get the address of an object, the object must have an address. In order to have an address, the object must reside in primary memory. The compiler tries its very hardest to avoid storing objects in primary memory, because that is slow compared to storing them in cpu registers – or even optimising them away entirely. If we want the compiler to not do this, we make it an aliased declaration, which guarantees it will have a memory address. We can then let others access our boat by creating

type Boat_Access is access Boat;

Family_Key : Boat_Access :=
   My_Boat'Access;

This creates an access called Family_Key which, well, accesses my boat. Anyone who gets hold of the Family_Key value can do stuff to my boat, including replace me as the captain.

But! Big but.1616 And I cannot lie. While access types are free to reference any global object, they can only reference a stack object if the stack object is declared in the same scope as the access type itself. In addition, two different types declared to be access Boat are actually incompatible. If you want to copy a reference, you have to use the same access type that was originally used to create the reference. That way, references can only ever point to live stack objects.1717 This is a tricky part of Ada when you are learning it. Despite what masochists say, memory management is hard. In C, you get weird security flaws. In Ada, you have to really fight to get the compiler to accept your code. If you are struggling with learning this, my recommendation is to try different ways to intentionally create a pointer to invalid memory. You’ll know exactly what the problem is, and then the Ada compiler will complain – and bam! – you now have an understanding for exactly what the compiler complains about when it gives you that error message.

We can also create access types that serve as read-only views of the object they reference, limit access types to never contain null (i.e. always reference some object), create access types that can only reference objects allocated a certain way, or limit the total amount of memory available for an access type, or …

In our case, we parametrise the Row type on an access to the source string:

type Row
  (Source : not null access constant String;
   Columns : Positive)
is tagged private;

The Source parameter accepts any non-null read-only access-to-string type. Then when we create objects of this type, the actual reference we pass in comes from a string read from the input file:

Raw : aliased constant String := Get_Line (Input.File);
Header : constant CSV.Row := CSV.Parse (Raw'Access);

How can this possibly be safe, if Row accepts any access type? Isn’t there a way for Row to misplace the pointer, in a way that it will still linger when the referenced Raw goes out of scope? No, because in order for Row to be able to actually store the pointer somewhere, Row would have to be able to declare a variable of a specific access type – one that is declared in the same location as the Raw object. This does not exist, since the access to Raw is passed via a one-time-use-only anonymous access type.

Controlled Types (95)

I mentioned there’s a third way to deal with memory management. In fact, this approach transcends just memory management, and extends into general resource management. In other words, any time you need to do mechanical acquisition of a resource, or make sure you clean up when you are done with an object, this technique works.

In Ada, we call it controlled types. In other circles, it’s better known as raii. It is hard to overstate how important raii is. It does not eliminate resource leaks – far from it – but with raii you get a low-effort mechanism to reduce resource leaks by a lot.

The idea of raii is that whenever an object of a raii controlled type is created, an initialisation procedure is automatically called, and this is responsible for acquiring the resource. When a copy of the resource is created (by assignment), an adjustment procedure is automatically called, responsible for creating a copy and handing over control over it. When an object is finally destroyed, a finalisation procedure is automatically called, taking care of cleaning up. This happens regardless of why the object is destroyed: it could be because an unhandled exception was thrown, or simply because the variable containing the object went out of scope. In a sense, this makes a garbage collector out of the regular scope management the compiler has to do anyway.

In our code, we declare a raii controlled file handle, to automatically close the file even if the program exits abnormally.

type Controlled_File
is new Ada.Finalization.Limited_Controlled
with record
   File : File_Type;
end record;

overriding
procedure Finalize
  (This : in out Controlled_File)
is begin
   if Is_Open (This.File) then
      Close (This.File);
   end if;
end Finalize;

To declare a raii controlled type in Ada, all that needs to be done is inherit from one of the abstract types Controlled or Limited_Controlled1818 The difference between the two is that Limited_Controlled does not have an Adjust method., both in the standard library package Ada.Finalization. The three methods Initialize, Adjust, and Finalize are called automatically at various points when objects are created, assigned or destroyed, and these can be overridden to do custom resource management.

Generic Units (83)

In principle, the generics in Ada work like C++ templates (i.e. they allow you to tell the compiler how to generate concrete code from a generic description, and the compiler will generate code for the types you need). They are different from C++ templates in that they are not a mess. They’re actually well architected.

In this program, I have made the Clamp function a generic unit. That’s slightly overkill, and I probably wouldn’t have done that if I wrote this particular program only for myself. However, in a larger Ada program, this is just the sort of thing you might see.

generic
   type Target is range <>;
function Clamp
  (Element : in Target'Base)
   return Target
with
   Post => (if Element in Target
            then Clamp'Result = Element);

In fact, this declaration is probably as close as I have come to a pure distillation of ada. It’s a very compact display of what Ada is like. We have at our disposal unusually advanced type mechanisms, combined with what are commonly referred to as zero cost abstractions, and we make sure our variables are what we expect them to be.

We say that the function Clamp is generic on the Target range, takes any value in the base of the target range, i.e. the greater parent range, and then returns a value in the Target range. We have an insanity check to make sure that the value returned is the same as the value passed in, if the value passed in already fits into the target range.

The implementation of the function is not very interesting, but you can find it in the code listing in the appendix anyway.

We instantiate this generic function by first making a range for it to clamp values to. Then we ask the compiler to generate a concrete function from the generic description, given the range type.

subtype Byte_Range is Positive range 0 .. 255;

function Byte_Clamp is new Clamp (Byte_Range);

Now Byte_Clamp is a regular concrete function that will take any integer and return the nearest valid value that fits in the range 0–255.

Named Blocks and Lexical Nesting (83)

Note that in the spirit of the challenge, I decided to implement everything in a single Ada file. This is not how code is normally written. Like in other languages, packages are normally defined in separate files with separate package specification, and so on. Ada has a relatively expressive module system, which – contrary to many other languages – supports both different philosophies when it comes to package hierarchies1919 Do parent packages provide the fundamental base blocks, with child packages extending these with high level operations – or do private child packages implement the base blocks, with parent packages pulling these together into higher level operations? Ada lets you do either, even though at first they may appear mutually exclusive..

You can also see that Ada supports arbitrary levels of lexical nesting in subprograms. This is necessary when you want to simplify the body of a subprogram by extracting some of the code into a separate – but individually nonsensical – subprogram.

Ada also has named blocks and loops. You can create a new block of code as such:

Find_Target:
   declare
      Raw : aliased constant String := Get_Line (Input.File);
      Header : constant CSV.Row := CSV.Parse (Raw'Access);
   begin
      Column_Count := Header.Column_Count;
      ------------------------------------------------->8---
      Put_Line (Output.File, Header.Source.all);
   end Find_Target;

This would be just as valid without the name, and the name is nothing but a documentation thing. It does prevent bugs by not letting you accidentally close the wrong block, though. Named loops are more important.

Replace:
   while not End_Of_File (Input.File) loop
   ------------------------------------------------->8---
   end loop Replace;

Why does this matter? Because if we had another loop nested inside Replace, we would be able to break out of the outer loop from inside the inner by saying exit Replace.

Dynamic Bounds and Parametrised Types (83)

We saw earlier some ways in which Ada avoids pointers. There are more ways, though! We’ll look at the Row type again, to see how.

type Row
  (Source : not null access constant String;
   Columns : Positive)
is tagged record
   First : Index_Array (1 .. Columns) := (others => 1);
   Last : Index_Array (1 .. Columns) := (others => 1);
end record -- 

This type is parametrised (or, as Ada people say, discriminated) on the number of fields. This means that – even for the same source – Row (Src, 5) and Row (Src, 6) are not the same. The former indicates a row of five fields, and the latter one of six. This stuff matters in low-level programming, where the compiler needs to know how much memory to reserve on the stack for each object. So far, this is not fundamentally different from C. What happens next, though, is what makes the difference.

Raw : aliased constant String := Get_Line (Input.File);
Header : constant CSV.Row := CSV.Parse (Raw'Access);

Ada lets us declare a variable holding a Row of arbitrary number of columns, as long as it is initialised directly. This appears like dynamic memory management, and it is dynamic for sure, but it is still only reserved space on the stack.

This is in fact why the first assignment to Raw works, too. In Ada, a String (1 .. 6) (i.e. a an array of six characters) is a different and incompatible type type compared to String (1 .. 7) (an array of 7 characters). However, we can declare an almost generic String variable and fill it up right away. C has this for arrays, but Ada has it for arbitrary types.

Another difference is that Ada allows us to do this also for subprogram parameters. In Ada we can have a subprogram receive a String of run-time decided size. In C, such an array would be implicitly converted to a plain pointer, with the information about its size lost.

Reflections

We have now seen some of the more important features I used. There are other things worth discussing, though. This is probably interesting also to people who couldn’t care less about learning Ada.

Code Style

Looking at the code listing, I realise how formal and enterprisey the code looks. I think this is a natural consequence of the task at hand, and not a reflection of my coding style. Even though Ada is a very enterprisey language, you don’t have to write enterprisey code in Ada. In fact, the first solution I sketched was much less formal in its construction.

But then I tried to cram in a bunch of useful Ada 2012 features, and suddenly there was no way to avoid the formality of it all. As examples, I found out that type invariants require encapsulation through private implementations, and generic functions cannot be declared and implemented at the same time.

Performance

Sebastian, in the article he wrote on his D solution, ran a benchmark comparing the performance between the solutions in C++, Rust, and D. I would love to compare all of those to this Ada program, but I’m currently stuck on a bus with no reliable internet connection, so I can’t install the compilers for D and Rust.

What I can do, however, is compare different versions of the Ada program to the final C++ contribution. To begin with, I constructed a large-ish (12 megabytes: 17 columns, and 97164 lines) comma-separated file where the fields were filled with random numbers.

The C++ program, compiled according to the command specified in the challenge, processes this file in 0.95 seconds.2020 Since my laptop died, my fiancée and/or her sister graciously let me borrow an old laptop they don’t use anymore. Hence the not-so-fast program execution. In order not to focus too much on specific numbers, and look instead at general trends, the times in the table below will be given relative to the C++ measurement.

Program description Speedup
C++ winning solution 0 %
Ada w/ heap allocation 19 %
Ada w/ stack allocation 50 %
Ada w/ varying array size 150 %
Ada w/ fixed array size 190 %

The program we have been discussing here is the last one. The results here are mostly what you would expect, so there’s not too much to say about that. The two allocating versions copied the contents of the fields to each field object. Obviously, just referencing the source string directly is faster.

I then tried compiling the last program under two different types of optimisation. First with the bounds checks disabled (-gnatp), which yielded only a modest increase in performance, and then with optimisation on (e.g. -O1), which was surprisingly significant. It didn’t appear to matter much which optimisation level, only that optimisation was requested at all.

Compilation type Speedup
Normal 0 %
Checks disabled 6 %
Optimisation on 38 %

I suspect the reason for the modest improvement with disabled checks is that the compiler will already elide checks which it knows will not be triggered. So ideally, in a program which is 100% correctly written and almost all failure modes are taken care of, the disable checks flag should do nothing to performance.

Multi-tasking

Because one of the places where Ada shines is in multi-tasking, I also threw together a multi-tasked version of the program.

Program Speedup
Ada, sequential 0 %
Ada, multi-tasking -73 %

You are reading that right. Multi-tasking decreased the performance a lot. The reason for that is pretty clear though, just by looking at the output of time:

Measure Time
Real 3.53
User 4.01
System 1.39

There are two ways to read this, but both with the same results.

  1. We could look at the system time, and say, “It’s spending a lot of time in the kernel. I wonder what it does there? Perhaps it is coordinating threads of execution.”
  2. We could also look at the ratio of user to real time. A fully parallelised task on my dual-core-with-hyperthreading-registered-trademark-intel-property machine would show a user time of nearly quadruple the real time. We barely even get 1.3 times the real time, so no real gain from parallelisation there.

The conclusion is, of course, that the problem is sequential enough in its nature that we do not gain performance with multi-tasking. Instead, we lose performance because we need to spend a lot of time coordinating between the tasks.

Calls For Contribution

I’d love to see solutions in C, Go, and Java: all under the premise of using the new features in the latest version of the language. I realised the original challenge may very well have been a sneaky attempt at tricking people into learning the new features in C++17, because I promise you learn a lot by doing this sort of thing. Even if you already think you know most of the new features. Shoehorning them into a problem gets you familiar with their ins and outs.

Either way, I like the idea of a task like this to keep yourself updated on the new stuff. And I would like to know what’s new in particular in Java 9 and 10.

Appendix A: Code Listing

-- Modeled on the Expressive C++17  Challenge, I present:
--
--  The Expressive Ada 2012 Challenge!
--
-- Ada features used (language version in parentheses):
-- + [x] Contracts (2012)
-- + [x] Type invariants (2012)
-- + [x] If expressions (2012)
-- + [x] Quantified expressions (2012)
-- + [x] Expression functions (2012)
-- + [x] Extended use clauses (2012)
-- + [x] Extended return statements (2012)
-- + [x] Non-null constant reference (2005)
-- + [x] Message included in exception (2005)
-- + [x] Controlled types (1995)
-- + [x] Generic units (1983)
-- + [x] Named blocks (1983)
-- + [x] Dynamic bounds (1983)
-- + [x] Discriminated types (1983)
-- + [x] Lexical nesting (1983)

with Ada.Exceptions;
with Ada.Finalization;
with Ada.Strings;
with Ada.Strings.Maps;
with Ada.Strings.Fixed;
with Ada.Text_IO;
with Ada.Command_Line;

procedure Replace is

   -------------------------------------------------------------------
   -- Exceptions
   -------------------------------------------------------------------
   Invalid_Input  : exception;   -- Raised when input file exists but
                                  --|   is empty.

   Invalid_Target : exception;   -- Raised when header line of input
                                  --|   file does not contain
                                  --|   requested target field.


   -------------------------------------------------------------------
   -- Error messages generated by the exceptions above
   -------------------------------------------------------------------
   Input_Missing : constant String := "Input file missing";
   Column_Missing : constant String := "Target field not found";


   -------------------------------------------------------------------
   -- User supplied parameters to customise operation
   -------------------------------------------------------------------
   In_Filename : constant String := Ada.Command_Line.Argument (1);
   Out_Filename : constant String := Ada.Command_Line.Argument (4);
   Target_Field : constant String := Ada.Command_Line.Argument (2);
   Replacement : constant String := Ada.Command_Line.Argument (3);


   -------------------------------------------------------------------
   -- Clamp
   --
   -- Purpose:
   --    Returns the integer within the Target range that is nearest
   --    to the Element argument.
   -- Exceptions:
   --    None.
   -------------------------------------------------------------------
   generic
      type Target is range <>;
   function Clamp
     (Element : in Target'Base)
      return Target
   with
      Post => (if Element in Target then Clamp'Result = Element);

   -------------------------------------------------------------------
   -- Clamp
   -------------------------------------------------------------------
   function Clamp (Element : in Target'Base) return Target
   is (Target'Base'Max (Target'First,
       Target'Base'Min (Target'Last,
                        Element)));


   -------------------------------------------------------------------
   -- CSV
   --
   -- Purpose:
   --    Provides the Row type, which represents an ASCII-comma
   --    delimited record of fields. Has subprograms to parse a
   --    String into Row, and to get the contents of individual
   --    fields in a Row.
   -------------------------------------------------------------------
   package CSV is

      ----------------------------------------------------------------
      -- Row
      --
      -- Purpose:
      --    Represents an ASCII-comma delimited record of
      --    Columns fields, parsed from Source.
      ----------------------------------------------------------------
      type Row
        (Source : not null access constant String;
         Columns : Positive)
      is tagged private;

      ----------------------------------------------------------------
      -- Parse (known number of columns)
      --
      -- Purpose:
      --    Parses Column_Count fields from Line into Row.
      -- Exceptions:
      --    None.
      -- Performance Notes:
      --    O(n) time in length of Line.
      -- Dragons:
      --    When Column_Count exceeds the actual field count, the ret-
      --    urned Row will have a corresponding number of empty fields
      --    at the end.
      ----------------------------------------------------------------
      function Parse
        (Line : not null access constant String;
         Column_Count : in Positive)
         return Row;

      ----------------------------------------------------------------
      -- Parse (unknown number of columns)
      --
      -- Purpose:
      --    Parses an unknown number of fields from Line into Row.
      -- Performance Notes:
      --    O(n²) time in length of Line.
      ----------------------------------------------------------------
      function Parse
        (Line : not null access constant String)
         return Row;

      ----------------------------------------------------------------
      -- Contents (single field)
      --
      -- Purpose:
      --    Returns contents of individual fields of Row.
      -- Exceptions:
      --    Constraint_Error
      --       when Index is larger than the number of fields in Row.
      ----------------------------------------------------------------
      function Contents
        (This : in Row;
         Index : in Positive)
         return String
      with
         Pre => Index in 1 .. This.Column_Count,
         Post =>
            Ada.Strings.Fixed.Index
              (This.Contents (1, This.Column_Count),
               Contents'Result) > 0;

      ----------------------------------------------------------------
      -- Contents (range of fields)
      --
      -- Purpose:
      --    Reconstructs the part of the Source line that contains all
      --    fields between First and Last, including delimiters.
      -- Exceptions:
      --    Constraint_Error
      --       when either First or Last is larger than
      --       the number of fields.
      ----------------------------------------------------------------
      function Contents
        (This : in Row;
         First, Last : in Positive)
         return String
      with
         Pre =>
            (1 <= First and First < Last
               and Last <= This.Column_Count);

      ----------------------------------------------------------------
      -- Column_Count
      --
      -- Purpose:
      --    Return the number of fields in Row. Useful e.g. if Row was
      --    parsed by the Parse function that figures out field count
      --    on its own.
      ----------------------------------------------------------------
      function Column_Count
        (This : in Row)
         return Positive;

   private   -- package CSV

      ----------------------------------------------------------------
      -- Index_Array
      --
      -- Purpose:
      --    Store an arbitrary number of string indices.
      ----------------------------------------------------------------
      type Index_Array
      is array (Positive range <>)
         of Positive;

      ----------------------------------------------------------------
      -- Row
      --
      -- Implementation Notes:
      --    Stores fields as parallel arrays of the string indices of
      --    Source at which each field starts and ends.
      ----------------------------------------------------------------
      type Row
        (Source : not null access constant String;
         Columns : Positive)
      is tagged record
         First : Index_Array (1 .. Columns) := (others => 1);
         Last : Index_Array (1 .. Columns) := (others => 1);
      end record with
         Type_Invariant =>
           (for all I in 1 .. Row.Columns =>
              Row.First (I) <= Row.Last (I)
              and Row.First (I) in Row.Source'Range
              and Row.Last (I) in Row.Source'Range);

   end CSV;

   -------------------------------------------------------------------
   -- CSV
   -------------------------------------------------------------------
   package body CSV is
      package Strings renames Ada.Strings.Fixed;
      use all type Ada.Strings.Maps.Character_Set;

      ----------------------------------------------------------------
      -- FS
      --
      -- Purpose:
      --    The field separator expressed as Character_Set for
      --    compatibility with Ada.Strings.Fixed.
      ----------------------------------------------------------------
      FS : constant Ada.Strings.Maps.Character_Set := To_Set (",");

      ----------------------------------------------------------------
      -- Parse (known column count)
      --
      -- Implementation Notes:
      --    Linear scan of line with Column_Count iterations, and the
      --    starting point of the next iteration found by
      --    Ada.Strings.Fixed.Index.
      ----------------------------------------------------------------

      function Parse
        (Line : not null access constant String;
         Column_Count : in Positive)
         return Row
      is
         First : Positive := 1;
      begin
         return This : Row (Line, Column_Count) do
         Scan:
            for I in 1 .. Column_Count loop
               pragma Assert (First < Line'Last);

               This.First (I) := First;

            Find_Last:
               declare
                  Next : constant Natural := Strings.Index
                    (Line.all, FS, First, Ada.Strings.Inside);
               begin
                  This.Last (I) :=
                    (if Next > 0
                        then Next - 1
                        else Line'Last);
               end Find_Last;

               First := This.Last (I) + 2;
            end loop Scan;
         end return;
      end Parse;


      ----------------------------------------------------------------
      -- Parse (unknown column count)
      --
      -- Implementation Notes:
      --    First counts the number of field separators in Line, to
      --    to determine Column_Count. Then calls Parse with the now
      --    known Column_Count.
      ----------------------------------------------------------------
      function Parse
        (Line : not null access constant String) return Row
      is (Parse (Line, Strings.Count (Line.all, FS) + 1));

      ----------------------------------------------------------------
      -- Contents (single field)
      ----------------------------------------------------------------
      function Contents
        (This : in Row; Index : in Positive) return String
      is (This.Source (This.First (Index) .. This.Last (Index)));

      ----------------------------------------------------------------
      -- Contents (range of fields)
      ----------------------------------------------------------------
      function Contents
        (This : in Row; First, Last : in Positive) return String
      is
         subtype Field_Range is Positive range 1 .. This.Columns;
         function Field_Clamp is new Clamp (Field_Range);

         First_Field : constant Positive := (Field_Clamp (First));
         Last_Field : constant Positive := (Field_Clamp (Last));

         subtype Source_Range is Positive range This.Source'Range;
         function Source_Clamp is new Clamp (Source_Range);
      begin
         return This.Source
           (Source_Clamp (This.First (First_Field) - 1)
            .. Source_Clamp (This.Last (Last_Field) + 1));
      end Contents;

      ----------------------------------------------------------------
      -- Column_Count
      ----------------------------------------------------------------
      function Column_Count (This : in Row) return Positive
      is (This.Columns);

   end CSV;

   use Ada.Text_IO;

   -------------------------------------------------------------------
   -- Controlled_File
   --
   -- Purpose:
   --    Wrapper around File_Type which automatically closes the file
   --    whenever the object is finalized, iff the file was open.
   -------------------------------------------------------------------
   type Controlled_File
   is new Ada.Finalization.Limited_Controlled
   with record
      File : File_Type;
   end record;

   -------------------------------------------------------------------
   -- Finalize
   -------------------------------------------------------------------
   overriding
   procedure Finalize
     (This : in out Controlled_File)
   is begin
      if Is_Open (This.File) then
         Close (This.File);
      end if;
   end Finalize;

   -------------------------------------------------------------------
   -- Local variables of the replacing procedure. Keeps track of input
   -- and output files, as well as the column index and the number
   -- of columns all rows should have.
   -------------------------------------------------------------------
   Input : Controlled_File;
   Output : Controlled_File;
   Target : Positive;
   Column_Count : Positive;

begin   -- procedure Replace

   -- Open the input file and verify it is not empty
   Open (Input.File, In_File, In_Filename);
   if End_Of_File (Input.File) then
      raise Invalid_Input with Input_Missing;
   end if;

   -- Scan the first line of input to find the column index of target.
   -- If found, create output file and write the first line to it.
Find_Target:
    declare
       Raw : aliased constant String := Get_Line (Input.File);
       Header : constant CSV.Row := CSV.Parse (Raw'Access);
    begin
       Target := 1;
       Column_Count := Header.Column_Count;

       for I in 1 .. Column_Count loop
          if Header.Contents (I) = Target_Field then
             Target := I;
          end if;
       end loop;

       if Header.Contents (Target) /= Target_Field then
          raise Invalid_Target with Column_Missing;
       end if;

       Create (Output.File, Out_File, Out_Filename);
       Put_Line (Output.File, Header.Source.all);
    end Find_Target;

    -- For all other lines in the input, copy everything except the
    -- target column to the output, replacing the target column
    -- contents with the replacement string.
Process:
    while not End_Of_File (Input.File) loop
       declare
          Raw : aliased constant String := Get_Line (Input.File);

          Current : constant CSV.Row :=
             CSV.Parse (Raw'Access, Column_Count);

          Result : constant String :=
            Current.Contents (1, Target - 1)
               & Replacement
               & Current.Contents (Target+1, Column_Count);
       begin
          Put_Line (Output.File, Result);
       end;
    end loop Process;

exception   -- procedure Replace

   when Error : Invalid_Input | Invalid_Target =>
      Put_Line
        (Standard_Error,
         Ada.Exceptions.Exception_Message (Error));

end Replace;

Appendix B: Program Requirements

  • The program must read a file, the name of which is given as the first command-line argument to the program.
  • If the input file is empty, an error message stating so must be emitted on a standard output stream, and the program execution must halt, and a new file must not be created.
  • Each line in the input file must be interpreted as a record of related fields, where each ascii comma character2121 0x2c separates two fields.
  • If the second command-line argument does not appear alone in a field on the first line of the file, an error message stating so must be emitted on a standard output stream, and the program execution must halt, and a new file must not be created.
  • If the second command-line argument appears alone in a field on the first line of the file, the index of this field must be interpreted as the target of a replacement operation.
  • The fourth command-line argument is to be interpreted as a file to which output must be written. If this file exists, it must be cleared before new output is written.
  • The first line of the input file must be written verbatim to the output file.
  • All lines except the first should be written to the output file with one modification: the field indicated as the target of the replacement operation must have contents given as the second command-line argument. All other fields must retain their contents from the input file.