Overview

The open-lmake build system automatically determines which pieces of a large workflow need to be remade and issues the commands to remake them.

Our examples show how to build and test C/C++ programs, as they are very common, but you can use open-lmake with any programming language to run any phase of your CI/CD as long as these can be scripted.

Indeed, open-lmake is not limited to building programs. You can use it to describe any task where some files must be (re)computed automatically from others whenever recomputing would lead to a different content. Such situations include dep modifications, but also command modifications, dep list modifications, apparition of an include file that was in the search path before an include file was actually accessed, symbolic link modifications, etc.

Symbolic links and hard links are also supported.

As far as open-lmake is concerned, repositories can be moved, archived and restored. Tools are provided to help user achieve the same level of flexibility.

Open-lmake is designed to be scalable, robust and efficient.

By scalable, we mean that open-lmake can manage millions of files and tens of thousands of CPU hours with no difficulties, so there is never any reason to have any kind of recursive invocation, open-lmake can handle the whole project flat.

By robust, we mean that open-lmake guarantees that if a job is not rerun, then rerunning it would lead to the same content (or a content that is equally legal). This includes automatic capture of so called hidden deps (i.e. deps that are not explicitly stated in the rules, e.g. include files).

We also mean that open-lmake, as any software, it may have bugs. Such bugs can lead to crashes and to pessimistic behavior (a job is rerun while it was not necessary). But special attention has been devoted in its design to ensure that it is never optimistic (a job not being rerun while it should have been). In case of any adverse event (lmake crashes or spurious system reboot), open-lmake automatically recovers potentially corrupted states in a safe way to avoid having to remake the whole project because a few files are corrupted. In extreme cases, there is a lrepair command that can recover all safe parts of a damaged repository. Similarly the ldircache_repair command is able to recover a damaged (or manually manipulated) cache directory.

Note that open-lmake does not only recorver from its own flees, but also a lot of experience is embedded into it to work around system bugs. This includes for example NFS peculiar notion of close-to-open consistency (which does not apply to the directory containing the file) or jobs spuriously disappearing.

By efficient, we mean that jobs are run in parallel, optionally using a batcher such as SGE or slurm, managing limited resources as declared in Lmakefile.py. We also mean that open-lmake makes a lot of effort to determine if it is necessary to run a job (while always staying pessismistic). Such effort includes checksum based modification detection rather than date based, so that if a job is rerun and produces an identical content, subsequent jobs are not rerun. Also, open-lmake embed a build cache whereby jobs can record their results in a cache so that if the same run needs to be carried out by another user, it may barely fetch the result from the cache rather than run the - potentially lengthy - job.

Preparing and running lmake.

To prepare to use open-lmake, you must write a file called Lmakefile.py that describes the relationships between files in your workflow and provides commands for generating them. This is analogous to the Makefile when using make.

When developping a program, typically, the executable file is built from object files, which in turn are built by compiling source files. Then unit tests are run from the executable and input files, and the output is compared to some references. Finally a test suite is a collection of such tests.

Once a suitable Lmakefile.py exists, each time you change anything in the workflow (source files, recipes, ...), this simple shell command:

lmake <my_target>

suffices to perform all necessary steps so that <my_target> is reproduced as if all steps leading to it were carried out although only necessary steps were actually carried out. The lmake program maintains an internal state in the LMAKE directory to decide which files need to be regenerated. For each one of those, it issues the recipes recorded in Lmakefile.py. During job execution, lmake instruments them in order to gather which files are read and written in order to determine hidden deps and whether such actions are legal. These information are recorded in the LMAKE directory.

You can provide command line arguments to lmake to somewhat control this process.

Problems and Bugs

If you have problems with open-lmake or think you've found a bug, please report it to the developers; we cannot promise to do anything but we may well be willing to fix it.

Before reporting a bug, make sure you've actually found a real bug. Carefully reread the documentation and see if it really says you can do what you're trying to do. If it's not clear whether you should be able to do something or not, report that too; it's a bug in the documentation!

Before reporting a bug or trying to fix it yourself, try to isolate it to the smallest possible Lmakefile.py that reproduces the problem. Then send us the Lmakefile.py and the exact results open-lmake gave you, including any error messages. Please don't paraphrase these messages: it's best to cut and paste them into your report. When generating this small Lmakefile.py, be sure to not use any non-free or unusual tools in your recipes: you can almost always emulate what such a tool would do with simple shell commands. Finally, be sure to explain what you expected to occur; this will help us decide whether the problem is in the code or the documentation.

if your problem is non-deterministic, i.e. it shows up once in a while, include the entire content of the LMAKE directory. This directory contains extensive execution traces meant to help developers to track down problems. Make sure, though, to trim it from any sensitive data (with regard to your IP).

Once you have a precise problem, you can report it on github

In addition to the information above, please be careful to include the version number of the open-lmake you are using. You can get this information from the file LMAKE/version. Be sure also to include the type of machine and operating system you are using. One way to obtain this information is by running the command uname -a.

Introduction

To introduce the basic concepts of open-lmake, we will consider a case including:

C/C++ compilation
link edition of executables
tests using test scenari
test suites containing list of test scenari

We will implement such a simple flow with a full decoupling of the flow (as described in Lmakefile.py) and project data (as described by the other source files). In this flow, we will assume that:

for an executable foo.exe, its main is located in foo.c or foo.cc
for a feature foo.o, its interface is in foo.h or foo.hh and it implementation (if any) is in foo.c or foo.cc

With these assumptions, which correspond to usual code organization, the list of objects necessary to link an executable can be derived automatically by analyzing the source files.

The `Lmakefile.py` file

Lmakefile.py is the file that describes the flow. It is analogous to Makefile when using make and is plain python (hence its name). It is composed of 3 parts:

config
sources
rules

In this introduction, we will use the default config and nothing needs to be put in Lmakefile.py.

Open-lmake needs to have an explicit list of the source files. If your code based is managed by git, it will be automatically used to list source files. We will assume we are in this common case for this introduction.

We are left with rules that really describe the flow. Rules are python classes inheriting from lmake.rules.Rule.

So we must import these:

from lmake.rules import Rule

Now, let us start with the compilation rules.

These will look like:

class CompileC(Rule) :
	targets = { 'OBJ' : '{File:.*}.o' }
	deps    = { 'SRC' : '{File}.c'    }
	cmd     = 'gcc -c -o {OBJ} {SRC}'

class CompileCpp(Rule) :
	targets = { 'OBJ' : '{File:.*}.o' }
	deps    = { 'SRC' : '{File}.cc'   }
	cmd     = 'gcc -c -o {OBJ} {SRC}'

Notes:

targets define the stems (here File) with a regular expression (here .*).
deps define the static deps, i.e. the deps that are needed for all jobs of the rule, but upon execution, other deps (such as .h files) may be discovered (and most of the time will).
cmd is an f-string. There is no f-prefix because it is not expanded while reading Lmakefile.py but when a job is executed.
The same is true for deps values, these are f-string (without the f-prefix).
To be selected to build a file, a rule must match the file with one if its targets and its static deps must be buildable. For example, if foo.c exists and foo.cc does not, rule CompileC can be selected to build foo.o and CompileCC cannot.
Hidden deps, on the contrary, may or may not be buildable, this is not a problem for open-lmake. It may or may not be a problem for the command, depending on how it is written.
Note the use of 'buildable' in the previous points and instead of 'exist'. If a file does not exist and is buildable, it will be built as soon as open-lmake learns its need. If a file exists and is not buildable, it will be rm'ed if it was produced by an old job, else it will be declared 'dangling' and this is an error condition. In this latter case, most of the time, it corresponds to a file that must be added to git.

For the link edition, here it is:

for ext in ('c','cc') :
	class ListObjects(Rule) :
		name    = f'list objects from {ext}'
		targets = { 'LST' : '{File:.*}.lst'   }
		deps    = { 'SRC' : f'{{File}}.{ext}' }
		def cmd() :
			# this is a rather complex code that computes the transitive closure of included files
			# including included files of the .c (or .cc) file for each included .h (or .hh) file
			# such details are outside the scope of this document
			...

class Link(Rule) :
	targets = { 'EXE' : '{File:.*}.exe' }
	deps    = { 'LST' : '{File}.lst'    }
	cmd     = 'gcc -o {EXE} $(cat {LST})'

Notes:

You have the full power of python, including loops, conditionals etc.
It is not a problem to have several classes defined with the same name (as ListObjects here). However, to avoid confusion when reporting execution to the user open-lmake refuses to have several rules with the same name. The name of a rule is its name attribute and defaults to the class name.
cmd can be either a string, in which case it is interpreted as an f-string and its expanded is run with bash to execute the job, or a function, in which case it is called to execute the job
rule that generate .lst files does not prevent the existence of such files as sources. In such a case, the rule will not be executed.

Finally, we need tests :

class Test(Rule) :
	target = '{File:.*}.test_out'
	deps   = { 'SCN' : '{File}.scn' }
	cmd    = '{SCN}'

class TestSuiteExpansion(Rule) :
	targets = { 'SCN'       : '{File:.*}.test_dir/{Test*:.*}.scn' }
	deps    = { 'TEST_SUIT' : '{File}.test_suite'                 }
	def cmd() :
		for line in open(TEST_SUIT) :
			name,scn = line.split(None,1)
			open(SCN(name),'w').write(scn)
			os.chmod(SCN(name),0o755)

class TestSuiteReport(Rule) :
	target = '{File:.*}.test_suit_report'
	deps   = { 'TEST_SUIT' : '{File}.test_suite' }
	def cmd() :
		for line in open(TEST_SUIT) :
			name = line.split(1)[0]
			stdout.write(open(f'{File}.test_dir/{name}.test_out'))

Notes:

One can define target rather than targets. It is a single target and is special in that is is fed by the stdout of cmd. Although more rarely used, dep can be used and feeds cmd as its stdin.
In TestSuiteExpansion targets, there is a * after Test. This a so-called 'star-stem'. It means that a single execution of the job generates files with diferent values for this star-stem.
for python cmd (when it is a function), targets, deps and stems are accessible as global variables. Star-stems are not defined (they would be meaningless) and the corresponding targets are functions instead of variables: you must pass the star-stems as arguments.

Special notes on content based up-to-date definition:

Open-lmake detects a file has changed if its content has changed, not its date.
This means that a checksum is computed for all targets generated by all jobs.
This has a marginal impact on performance: it uses xxh as a checksum lib, it is both of excellent quality (though not crypto-robust) and bracingly fast.
This is not only an optimization but has consequences on the flow itself.
Here, we define a test suite as a list of named test scneari. When the test suite is modfiied, all scenari are rebuilt, but only those that have actually changed are rerun.
And this is common case: you often modify the test suite to change a few tests or add a few ones. It would be unacceptable to rerun all tests in such case and would require the flow to be organized in another way. Yet, defining test suites this way is very comfortable.

Execution

The first time you execute the flow, you need to execute all steps :

compile all source files to object files
link object files to the executable
run all tests

Once this has been done, what needs to be executed depends on what has been modified:

What has been modified	What needs to be re-executed	Notes
nothing	nothing
a .c (or .cc) source file	compile said source file, link, run all tests	if .o does not change, nothing is run any further
a .h (or.hh) include file	compile source files including said .h (or .hh), link, run all tests	if no .o change, nothing is run any further
a test suite file	run modified/new tests
a `cmd` in `Lmakefile.py`	run all jobs using corresponding rule and all depending jobs	depending jobs are only executed if file actually changes

Further notes

Use of the `critical` attribute

Some deps can be declared critical. This has no semantic impact but may be important performance wise. Open-lmake computes the actual deps of a job while it executes it while ideally, you would need to know them beforehand. So it considers the known deps (i.e. collected during the last run) as a good approximation of the ones that will be collected during the current run.

The general principle is:

Rebuild known deps.
Execute job.
If new deps appear, rebuild them.
If one of such new deps changes during this rebuild, rerun job.
Iterate until no new deps appear.

Suppose now that you have a dep that contains a list of some files that will become deps (such as the LST dep in the Link rule). If this list changes, it may very well suppress such a file (an object file in the Link case). This means that a file may be uselessly recompiled.

If a critical dep exists, the firt step of the general principle becomes:

rebuild known deps, except those located after a modified critical dep.

The Link rule would be better if written as:

class Link(Rule) :
	targets = { 'EXE' : '{File:.*}.exe'               }
	deps    = { 'LST' : ( '{File}.lst' , 'critical' ) }
	cmd = '''
		ldepend --read $(cat {LST})
		lcheck_deps
		gcc -o {EXE} $(cat {LST})
	'''

The drawback of using the critical attribute is that the job will more often be executed twice. While often such jobs are fast (as the TestSuiteReport), the link phase may be heavier and we would like to avoid executing it twice. The idea here is:

ldepend creates deps.
lcheck_deps checks that deps accumulated up to the calling point are up-to-date.
This guarantees that gcc will discover no new deps.

Use of a base class

Quite often, you want to define a vocabulary of stems. In our example, we have File and Test.

We may define a base class to contain these definitions:

class Base(Rule) :
	stems = {
		'File' : '.*'
	.	'Test' : '.*'
	}

Then all rules can inherit from Base instead of Rule and this vocabulary is defined.

Binary packages

Open-lmake binary packages are available on launchpad.net for the following systems :

ubuntu22.04 (jammy)
ubuntu24.04 (noble)

To install these, execute:

sudo add-apt-repository ppa:cdouady/open-lmake
sudo apt update
sudo apt install open-lmake

Compilation

Requirements

To compile open-lmake, you will need:

c++20
python3.6 or later with developer support (i.e. access to the libpython*.so file)
if using python2, it must be python2.7

The following compilers are known to compile open-lmake:

gcc version 11 and above
clang version 14 and above

The following compilers are known not to compile open-lmake:

gcc version 10 and below

It has been tested with the dockers listed in the docker directory.

Procedure

type make
- this builds all necessary files and some unit tests
- you must invoke git clean -fdx if you modified the Makefile or otherwise if you want a reliable build (make is not open-lmake)
- you may have to invoke git clean -fdx lmake_env* or even git clean -fdx after a git pull
  - lmake_env is a directory which builds lmake under lmake, for test purpose, no guarantee that the resulting package is funtional for now
  - lmake_env-cache is a directory containing cached results from jobs in lmake_env
  - they are not cleaned on purpose before running as this creates variability for testing lmake, but may fail
  - and generally speaking, make is not robust to past history, so a full 'git clean -fdx' may be necessary to get a reliable build
- you can type make LMAKE to just build all necessary files
- you can type make lmake.tar.gz.SUMMARY (built by default) to make a tar ball of the compiled files that you can easily deploy
install
- untar lmake.tar.gz wherever you want and have your $PATH point to the bin directory.
  - the bin sub-directory contains the executables meant to be executed by the user
  - the _bin sub-directory contains the executables necessary for open-lmake to run, but not meant to be directly invoked by the user
    - it also contains some executables to help debugging open-lmake itself.
  - the lib sub-directory contains binary and python files for use by the user
  - the _lib sub-directory contains the binary and python files necessary for open-lmake to run, but not meant for direct use by the user
  - the relative positions of these 4 directories must remain the same, i.e. they must stay in the same directory with the same names.
specialization
- you can specialize the build process to better suit your needs:
- this can be done by setting variables
  - for example, you can run: CXX=/my/g++ make
  - $PYTHON2 can be set to your preferred python2 (defaults to python2 as found in your $PATH). You will be told if it is not supported.
  - $PYTHON can be set to your preferred python3 (defaults to python3 as found in your $PATH). You will be told if it is not supported.
  - $CXX can be set to your preferred C++ compiler (defaults to g++ as found in your $PATH). You will be told if it is not supported.
  - $SLURM_ROOT can be set to the root directory of the slurm installation (by default, slurm/slurm.h will be searched in the standard include path). For example, slurm.h will be found as $SLURM_ROOT/include/slurm/slurm.h
  - $LMAKE_FLAGS can be defined as O[0123]g?d?t?S[AB]P?
    - O[0123] controls the -O option (default: 1 if profiling else 3 )
    - g controls the absence of -g option (default: debug )
    - d controls -DNDEBUG (default: asserts are enabled )
    - T controls -DTRACE (default: traces are disabled )
    - Sa controls the -fsantize=address -fsanitize=undefined options (exclusive with St )
    - St controls the -fsantize=thread option (exclusive with Sa )
    - P controls the -pg option (profiling info is in gmon.out..)
    - C controls the --coverage option (profiling info is in gmon.out..)
  - the -j flag of make is automatically set to the number of processors, you may want to override this, though
- this is true the first time you run make. After that, these values are remembered in the file sys_config.env.
- you can freely modify this file sys_config.env, though. It will be taken into account.
- it is up to you to provide a suitable $LD_LIBRARY_PATH value. it will be transferred as a default value for rules, to the extent it is necessary to provide the lmake semantic

Installation

Open-lmake does not require to be installed. You can run it directly from the build directory. This is the simplest way unless you seek a system-wide installation.

If running under Ubuntu and you have the necessary packages installed (that you can find by inspecting Makefile, the entry DEBIAN_DEPS), you can make a Debian package:

type make DEBIAN
the package is open-lmake_v25.02.7-1_<arch>.deb
you can install it with sudo apt install ./open-lmake_v25.02.7-1_<arch>.deb

Alternatively, you can untar lmake.tar.gz at any place.

Installing system-wide with Debian package will take care of placing both binaries and man pages in the standard directories. Alternatively, you can:

put /path/to/open-lmake/bin in your $PATH

This will simplify the user experience but is not required.

AppArmor

AppArmor is known to need tuning on Ubuntu24.04 only.

In order to implement namespace related features (the lmake_view, repo_view, tmp_view, views and chroot rule attributes), and if your system is configured with AppArmor, adequate rights must be provided.

In that case, do the following :

create a file /etc/apparmor.d/open-lmake with the following content :

	abi <abit/4.0>
	include <tunables/global>
	profile open-lmake /**/{_bin/job_exec,bin/lautodep} flags=(unconfined) {
		userns,
	}

activate it with sudo aa-apparamor_parser -r /etc/apparamor.d/open-lmake

Rationale

Because of its unique features (auto-dep, views, ...), open-lmake is necessarily tighly coupled to the underlying OS. In particular, it was not possible to write it solely on top of Posix.

There are no known road blocks to port it to other OSes (such as Darwin or Windows) although this would require an in depth knowledge of these systems.

The `lmake` module

For use in `Lmakefile.py`

`backends`

The tuple of implemented backends. 'local' is always present.

`autodeps`

The tuple of implemented autodep methods. 'ld_preload' is always present.

`repo_root`

The root dir of the (sub)-repo.

`top_repo_root`

The root dir of the top-level repo.

`version`

This variable holds the native version of open-lmake. It is a tuple formed with a str (the major version) and a int (the minor version).

Upon new releases of open-lmake, the major version is a tag of the form YY.MM providing the year and month of publication if it is not backward compatible. Else the minor version is increased if the interface is modified (i.e. new features are supported). Hence, the check is that major versions must match equal and the actual minor version must be at least the expected minor version.

`user_environ`

When reading Lmakefile.py, the environment is reset to a standard environment and this variable holds a copy of os.environ before it was reset.

This ensures that the environment cannot be used unless explicitly asked.

Variable values actually used in Lmakefile.py are considered as deps for this process and it is rerun if the actual environment is modified in subsequent lmake commands.

`class pdict`

This class is a dict in which attribute accesses are mapped to item accesses. It is very practical to handle configurations.

`multi_strip(txt)`

This function deindents txt as much as possible so as to ease printing code.

`check_version( major , minor=0 )`

This function is used to check that the expected version is compatible with the actual version.

This function must be called right after having imported the lmake module as in the future, it may adapt itself to the required version when this function is called. For example, some default values may be modified and if they are used before this function is called, a wrong (native) value may be provided instead of the correct (adjusted to required version) one.

For use in `cmd()`

`run_cc( *cmd_line , marker='...' , stdin=None )`

This functions ensures that all dirs listed in arguments such as -I or -L exist reliably.

marker is the name of a marker file which is created in include dirs to guarantee there existence.

stdin is the text ot send as the stdin of cmd_line.

`depend( *deps , follow_symlinks=False , verbose=False , read=False , critical=False , essential=False , ignore=False , ignore_error=False , readdir_ok=False , required=True , regexpr=False , no_star=True )`

Declare deps as parallel deps (i.e. no order exist between them).

If follow_symlinks, deps that are symbolic links are followed (and a dep is set on links themselves, independently of the passed flags that apply for the target the links).

Each dep is associated with an access pattern. Accesses are of 3 kinds, regular, link and stat:

Regular means that the file was accessed using C(open,2) or similar, i.e. the job is sensitive to the file content if it is a regular file, but not to the target in case it is a symbolic link.
Link means that the file was accessed using C(readlink,2) or similar, i.e. the job is sensitive to the target if it is a symbolic link, but not to the content in case it is a regular file.
Stat means that the file meta-data were accessed, i.e. the job is sensitive to file existence and type, but not to the content or its target.

If a file have none of these accesses, changing it will not trigger a rebuild, but it is still a dep as in case it is in error, this will prevent the job from being run. Making such dinstinctions is most useful for the automatic processing of symbolic links. For example, if file a/b is opened for reading, and it turns out to be a symbolic link to c, open-lmake will set a dep to a/b as a link, and to a/c as a link (in case it is itself a link) and regular (as it is opened).

By default, passed deps are associated with no access, but are required to be buildable and produced without error unless readdir_ok is true. To simulate a plain access, you need to pass read=True to associate accesses and required=False to allow it not to exist.

If verbose, return a dict with one entry par dep where:

The key is the dep name.
The value is a dict composed of:
- ok: True if the dep is built with no error, False if the dep is built in error, None if the was not built.
- checksum: The checksum computed after the dep (unless ok is None) (cf. xxhsum(1)).
- rule: The rule name of the job that has generated the dep one exists.
- special: The special nature (e.g. src).
- stems: A dict mapping stem names to stem values if the rule is a plain rule.

If read, report an actual read of deps. Default is just to alter associated flags.

If regexpr, pass flags to all deps matching deps interpreted as regexprs, much like the side_deps rule attribute. However, the ignore flag only applies to deps following this call.

For critical, essential, ignore, ignore_error, readdir_ok, required and no_star, set the corresponding flag on all deps:

If critical, create critical deps (cf. note (5)).
If essential, passed deps will appear in the flow shown with a graphical tool.
If ignore_error, ignore the error status of the passed deps.
If readdir_ok, readdir (3) can be called on passed deps without error even if not ignore'ed nor incremental.
If not required, accept that deps be not buildable, as for a normal read access (in such a case, the read may fail, but open-lmake is ok).
If not no_star, accept regexpr based flags (e.g. calls to this function with regexpr=True).
If ignore, deps are ignored altogether, even if further accessed (but previous accesses are kept).

Flags accumulate and are never reset.

Notes:

(1): The same functionality is provided with the ldepend executable.
(2): Flags can be associated to deps on a regexpr (matching on dep name) basis by using the side_deps rule attribute.
(3): If cat a b is executed, open-lmake sees 2 open system calls, to a then to b, exactly the same sequence that if one did cat $(cat a) and a contained b.
Suppose now that b is an error. This is a reason for your job to be in error. But if a is modifed, in the former case, this cannot solve the error while in the latter case, it may if the new content of a points to a file that may successfully be built. Because open-lmake cannot distinguish between the 2 cases, upon a modification of a, the job will be rerun in the hope that b is not accessed any more. Parallel deps prevents this trial.
(4): If a series of files are read in a loop and the loop is written in such a way as to stop on the first error and if the series of file does not depend on the actual content of said files, then it is preferable to pre-access (using B(ldepend)) all files before starting the loop. The reason is that without this precaution, deps will be discovered one by one and may be built serially instead of all of them in parallel.
(5): If a series of dep is directly derived from the content of a file, it may be wise to declare it as critical. When a critical dep is modified, open-lmake forgets about deps reported after it.
Usually, when a file is modified, this has no influence on the list of files that are accessed after it, and open-lmake anticipates this by building these deps speculatively. But in some situations, it is almost certain that there will be an influence and it is preferable not to anticipate. this is what critical deps are made for: in case of modifications, following deps are not built speculatively.

`target( *targets , write=False , regexpr=False , allow=True , essential=False , ignore=False , incremental=False , no_warning=False , source_ok=False , critical=False , ignore_error=False , readdir_ok=False , required=False , no_star=True )`

Declare targets as targets and alter associated flags. Note that the allow argument default value is True.

Also, calling this function does not make targets official targets of the job, i.e. targets are side targets. The official job of a target is the one selected if needing its content, it must be known before any job is run.

If write, report that targets were written to.

If regexpr, pass flags to all targets matching targets interpreted as regexprs, much like the side_targets rule attribute. However, the ignore flag only applies to targets following this call.

For allow, essential, ignore, incremental, no_warning, source_ok and no_star, set the corresponding flag on all targets:

If essential, show when generating user oriented graphs.
If incremental, targets are not unlinked before job execution and read accesses to them are ignored.
If no_warning, no warning is emitted if targets are either uniquified or unlinked while generated by another job.
If ignore, from now on, ignore all reads and writes to targets.
If not allow, do not make targets valid targets.
If not no_star, accept regexpr based flags (e.g. calls to this function with regexpr=True).
If source_ok, accept that targets be sources. Else, writing to a source is an error.

In case passed targets turn out to be deps, the deps flags are also available: critical, ignore_error, readdir_ok and required:

If critical, create critical deps (cf. note (5) of depend).
If ignore_error, ignore the error status of the passed deps.
If readdir_ok, readdir (3) can be called on passed deps without error even if not ignore'ed nor incremental.
If not required, accept that deps be not buildable, as for a normal read access (in such a case, the read may fail, but open-lmake is ok).

Flags accumulate and are never reset.

`check_deps(sync=False)`

Ensure that all previously seen deps are up-to-date. Job will be killed in case some deps are not up-to-date.

If sync, wait for server reply. Return value is False if at least a dep is in error. This is necessary, even without checking return value, to ensure that after this call, the dirs of previous deps actually exist if such deps are not read (such as with lmake.depend).

CAVEAT

If used in conjonction with the kill_sigs attribute with a handler to manage the listed signal(s) (typically by calling signal.signal(...) and without sync=True, and if a process is launched shortly after (typically by calling subprocess.run or os.system), it may be that said process does not see the signal. This is due to a race condition in I(python) when said process is just starting.

This may be annoying if said process was supposed to do some clean up or if it is very long. The solution in this case is to pass sync=True. This has a small cost in the general case where deps are actually up-to-date, but provides a reliable way to kill the job as check_deps will still be running when the signal fires up.

`get_autodep()`

Returns whether autodep is currently active or not.

By default, autodep is active.

`set_autodep(active)`

Set the state of autodep.

`class Autodep`

A context manager that encapsulates set_autodep.

with Autodep(active) :
	<some code>

executes <some code> with autodep active set as instructed.

`encode( file , ctx , val , min_length=1 )`

If a code is associated to val within file file and context ctx, return it. Else a code is created, of length at least min_length, is associated to val and is return. Cf. encode/decode.

file must be a source file.

`decode( file , ctx , code )`

If a val is associated to code within file file and context ctx, return it. Else an exception is raised. Cf. encode/decode.

file must be a source file.

Associations are usually created using encode but not necessarily (they can be created by hand).

`xxhsum_file(file)`

Return a checksum of provided file.

The checksum is :

none if file does not exist, is a dir or a special file
empty-R if file is empty
xxxxxxxxxxxxxxxx-R (where x is a hexa digit) if file is regular and non-empty
xxxxxxxxxxxxxxxx-L if file is a symbolic link

Note : this checksum is not crypto-robust.

Cf. xxhsum(1) for a description of the algorithm.

`xxhsum(text,is_link=False)`

Return a checksum of provided text.

It is a 16-digit hex value with no suffix.

Note : the empty string lead to 0000000000000000 so as to be easily recognizable.

Note : this checksum is not the same as the checksum of a file with same content.

Note : this checksum is not crypto-robust.

Cf. xxhsum(1) for a description of the algorithm.

`report_import(module_name=None,path=None,module_suffixes=None)`

Does necessary reporting when a module has been imported.

module_name is the name of the imported module. This function only handles the last level import, so it must be called at each level in case the module lies in a package.

If not provided or empty, only does the reporting due to the path, but assumes no module is accessed.

The way such reporting is done is by reporting a dep for each local dir in the path, for each module suffix, until the module is found, locally or externally.

path is the path which the imported module is searched in.

If not provided or empty, it defaults to the current value of sys.path.

Unless used with python2, this function allows readdir accesses to local dirs in the path as python does such a readdir.

module_suffixes is the list of suffixes to try that may provide a module, e.g. ('.py','/__init__.py').

If not provided, the default is:

for python2: ('.so','.py','/__init__.so','/__init__.py')
for python3: ( i+s for s in importlib.machinery.all_suffixes() for i in ('','/__init__'))

Note:

It may be wise to specify only the suffixes actually used locally to reduce the number of deps.
External modules are searched with the standard suffixes, even if module_suffixes is provided as there is no reason for the external modules to adhere to local conventions.

The `lmake.sources` module

`manifest_sources(manifest='Manifest')`

This function returns the list of sources found in the manifest file, one per line. Comments are supported as everything following a # itself at start of line or preceded by a space. Leading and trailing white spaces are stripped after comment removal.

`git_sources( recurse=True , ignore_missing_submodules=False )`

This function lists files under git control, recursing to sub-modules if recurse is true and ignore missing such sub-modules if ignore_missing_submodules is true.

The git repo can be an enclosing dir of the open-lmake repo. In that case, sources are adequately set to track git updates.

`auto_sources(**kwds)`

This function tries to find sources by calling manifest_sources and git_sources in turn, untill one succeeds. Arguments are passed as pertinent.

In absence of source declaration, this function is called with no argument to determine the sources.

The `lmake.rules` module

Base rules

`class Rule`

Base class for plain rules.

A class becomes a rule when:

it inherits, directly or indirectly, from Rule
it has a target or targets attribute
it has a cmd attribute

`class AntiRule`

Base class for anti-rules.

A class becomes an anti-rule when:

it inherits, directly or indirectly, from AntiRule
it has a target or targets attribute

An anti-rule has no cmd nor deps.

It applies to a file as soon as it matches one of the targets. In that case, the file is deemed unbuildable.

`class SourceRule`

Base class for source-rules.

A class becomes a source-rule when:

it inherits, directly or indirectly, from SourceRule
it has a target or targets attribute

A source-rule has no cmd nor deps.

It applies to a file as soon as it matches one of the targets. In that case, the file is deemed to be a source.

If such a file is required and does not exist, it is an error condition.

Helper rules

`class Py2Rule(Rule)`, `class Py3Rule(Rule)` and `class PyRule(Rule)`

These classes may be used as base class for rules that execute python code doing imports.

It manages .pyc files. Also, it provides deps to module source files although python may optimize such accesses and miss deps on dynamically generated modules.

If cmd is not a function, and python is called, this last feature is provided if lmake.import_machinery.fix_import is called.

Py2Rule is used for python2, Py3Rule is used for python3. PyRule is an alias for Py3Rule.

`class RustRule(Rule)`

This class may be used as a base class to execute executable written in rust.

Rust uses a special link mechanism which defeats the default ld_audit autodep mechanism. This base class merely sets the autodep method to ld_preload which works around this problem.

`class HomelessRule(Rule)`

This class sets $HOME to $TMPDIR. This is a way to ensure that various tools behave the same way as if they were run for the first time. By default $HOME points to the root of the repo, which permits to put various init code there.

`class TraceRule(Rule)`

This class sets the -x flag for shell rules and manage so that traces are sent to stdout rather than stderr.

This allow to suppress the common idiom:

echo complicated_command
complicated_command

`class DirtyRule(Rule)`

This class may be used to ignore all writes that are not an official target.

By itself, it is a dangerous class and must be used with care. It is meant to be a practical way to do trials without having to work out all the details, but in a finalized workflow, it is better to avoid the usage of this class.

The `lmake.import_machinery` module

`fix_import()`

This should be called before importing any module that may be dynamically generated.

It updates the import mechanism to ensure that a dep is set to the source file when importing a module, even if such source file does not exist (yet). Without fix, when a statement import foo is executed, although foo.py is read if it exists, python does not attempt to access it if it does not exist. This is embarassing if foo.py is dynamically produced as initially, it does not exist and if no attempt is made to access it, there will be no dep on it and it will not be built.

`module_suffixes`

This variable holds the list of suffixes used to generate deps when importing a module and lmake.import_machinery.fix_import() has been called.

It is better to reduce this list to what is really needed in you flow, i.e. the list of suffixes used for generated python modules (modules that are sources are not concerned by this list and deps to them will be accurate in all cases). Reducing this list avoids useless deps.

The default value is the full standard list, e.g. ('/__init__.cpython-312-x86_64-linux-gnu.so','/__init__.abi3.so','/__init__.so','/__init__.py','.cpython-312-x86_64-linux-gnu.so','.abi3.so','.so','.py') for python3.12 running on Linux with a x86_64 processor architecture.

This standard list is constructed as follows:

Call importlib.machinery.all_suffixes().
Reorder to put suffixes ending with .so before those ending in .py to match actual python try order.
Then use 2 copies of this list, the first one being prefixed with /__init__ for each entry (again, python gives priority to packages over leaf modules).

Reasonable values could be:

('/__init__.so','/__init__.py','.so','.py') if compiled packages and modules are used locally.
('/__init__.py','.py') if no modules nor packages are compiled.
('.py',) if no packages is used locally nor modules compiled.

Note that the management of .pyc files is independent.

Writing `Lmakefile.py`

Lmakefile.py contains 3 sections:

config, (some global information)
sources, (the list of sources)
rules, (the list of rules)

When reading Lmakefile.py, open-lmake:

imports Lmakefile
for each section (config, sources, rules):
- if there is a callable with this name, call it
- if there is a sub-module with this name, import it

The advantage of declaring a function or a sub-module for each section is that in case something is modified, only the impacted section is re-read.

The config

The config is determined by setting the variable lmake.config. Because it is predefined with all default values, it is simpler to only define fields. A typical Lmakefile.py will then contain lines such as:

lmake.config.path_max = 500 # default is 400

lib/lmake/config.py can be used as a handy helper as it contains all the fields with a short comment.

The sources

The sources are determined by setting the variable lmake.manifest.

Sources are files that are deemed as intrinsic. They cannot be derived using rules as explained in the following section.

Also, if a file cannot be derived and is not a source, it is deemed unbuildable, even if it actually exists. In this latter case, it will be considered dangling and this is an error condition. The purpose of this restriction is to ensure repeatability : all buildable files can be (possibly indirectly) derived from sources using rules.

lmake.manifest can contain :

Files located in the repo
Dirs (ending with /), in which case:
- The whole subtree underneath the dir are considered sources.
- They may be inside the repo or outside, but cannot contain or lie within system dirs such as /usr, /proc, /etc, etc.
- If outside, they can be relative or absolute.

In both cases, names must be canonical, i.e. contain no empty component nor ., nor .. except initially for relative names outside repo.

The helper functions defined in lib/lmake/sources.py can be used and if nothing is said, auto_sources() is called.

The rules

Rules are described as python class'es inheriting from lmake.Rule, lmake.AntiRule or lmake.SourceRule.

Such classes are either defined directly in Lmakefile.py or you can define a callable or a sub-module called rules that does the same thing when called/imported. For example you can define :

def rules() :
	class MyRule(lmake.Rule) :
		target = 'my_target'
		cmd    = ''

Or the sub-module Lmakefile.rules containing such class definitions.

Inheriting from lmake.Rule is used to define production rules that allows deriving targets from deps.

Inheriting from lmake.AntiRule is (rarely) used to define rules that specify that matching targets cannot be built. Anti-rules only require the targets attribute (or those that translate into it, target) and may usefully have a prio attribute. Other ones are useless and ignored.

Inheriting from lmake.SourceRule may be used to define sources by patterns rather than as a list of files controlled by some sort of source-control (typically git).

Special rules

In addition to user rules defined as described hereinafter, there are a few special rules:

Uphill: Any file depends on its dir in a special way : if its dir is buildable, then the file is not. This is logical : if file foo is buildable (i.e. produced as a regular file or a symbolic link), there is not way file foo/bar can be built. If foo is produced as a regular file, this is the end of the story. If it is produced as a symbolic link (say with foo_real as target), the dependent job will be rerun and it will then depend on foo and foo_real/bar when it opens foo/bar. Note that if the dir applies as the star-target of a rule, then the corresponding job must be run to determine if said dir is, indeed, produced.
Infinite: If walking the deps leads to infinite recursion, when the depth reaches lmake.config.max_dep_depth, this special rule is triggered which generates an error. Also, if a file whose name is longer that lmake.config.path_max considered, it is deemed to be generated by this rule and it is in error. This typically happens if you have a rule that, for example builds {File} from {File}.x. If you try to build foo, open-lmake will try to build foo.x, which needs foo.x.x, which needs foo.x.x.x etc.

Dynamic values

Most attributes can either be data of the described type or a function taking no argument returning the desired value. This allows the value to be dynamically selected depending on the job.

Such functions are evaluated in an environment in which the stems (as well as the stems variable which is a dict containing the stems and the targets (as well as the targets variable) are defined and usable to derive the return value. Also, depending on the attribute, the deps (as well as the deps variable) and the resources (as well as the resources variable) may also be defined. Whether or not these are available depend on when a given attribute is needed. For example, when defining the deps, the deps are obviously not available.

For composite values (dictionaries or sequences), the entire value may be a function or each value can individually be a function (but not the keys). For dictionaries, if the value function returns None, there will be no corresponding entry in the resulting dictionary.

Note that regarding resources available in the function environment, the values are the ones instantiated by the backend.

Inheritance

python's native inheritance mechanism is not ideal to describe a rule as one would like to prepare a base class such as:

provide environment variables
provide some default actions for some files with given pattern
provide some automatic deps
...

As these are described with dict, you would like to inherit dict entries from the base class and not only the dict as a whole. A possibility would have been to use the __prepare__ method of a meta-class to pre-define inherited values of such attributes, but that would defeat the practical possibility to use multiple inheritance by suppressing the diamond rule.

The chosen method has been designed to walk through the MRO at class creation time and:

Define a set of attributes to be handled through combination. This set is defined by the attribute combine, itself being handled by combination.
Combined attribute are handled by updating/appending rather than replacing when walking through MRO in reverse order.
Entries with a value None are suppressed as update never suppress a given entry. Similarly, values inserted in a set prefixed with a '-' remove the corresponding value from the set.

Because this mechanism walks through the MRO, the diamond rule is enforced.

dict's and list's are ordered so that the most specific information appear first, as if classes are searched in MRO.

Combined attributes may only be dict, set and list:

dict's and set's are updated, list's are appended.
dict's and list's are ordered in MRO, base classes being after derived classes.

paths

Some environment variables contain paths, such as $PATH.

When such an entry appears in a rule, its value is searched for occurrences of the special marker ... surrounded by separators (the start and end of the strings are deemed to be separators) And each such occurrence is replaced by the inherited value.

This makes it particularly useful to manage paths as it allows any intermediate base class to add its own entries, before or after the original ones.

For example, to add the dir /mypath after the inherited path, one would define the attribute environ as {'PATH':'...:/mypath'}. To add it before, one would use {'PATH':'/mypath:...'}.

Entries going through this step are provided by the attribute paths, which is a dict with . as keys and as values. The default value is { 'environ.PATH':':' , 'environ.LD_LIBRARY_PATH':':' , 'environ.MANPATH':':' , 'environ.PYTHONPATH':':' }

Config fields

Depending on when each field can be modified, they are said:

Clean : requires a fresh repo to change this value
Static : requires that no lmake is running to change this value
Dynamic : field can be changed any time.

The default value is mentioned in ().

`colors` : Dynamic (reasonably readable)

Open-lmake generate colorized output if it is connected to a terminal (and if it understands the color escape sequences) (cf. video-mode).

This attribute is a pdict with one entry for each symbolic color. Each entry is a 2-tuple of 3-tuple's. The first 3-tuple provides the color in normal video mode (black/white) and the second one the color in reverse video (white/black). Each color is a triplet RGB of values between 0 and 255.

`disk_date_precision` : Static (`0.010`)

This attribute instruct open-lmake to take some margin (expressed in seconds) when it must rely on file dates to decide about event orders. It must account for file date granularity (generally a few ms) and date discrepancy between executing hosts and disk servers (generally a few ms when using NTP).

If too low, there are risks that open-lmake consider that data read by a job are up to date while they have been modified shortly after.
If too high, there is a small impact on performance as open-lmake will consider out of date data that are actually up to date.

The default value should be safe in usual cases and user should hardly need to modify it.

`file_sync` : Static (`'dir'` if non-local backends are used, else `None`)

This attribute specifies how to ensure file synchronization when a file is produced by a host and read by another one.

Possible values are:

'none' or None: the filesystem is deemed reliable an no further protection is needed.
'dir': the enclosing dir (and recursively up-hill) of a file is closed after any write (or creation or removal) and open before any read.
'sync': fsync is called on file after any write.

Recommanded values for known file systems:

File system	`file_sync`
NFS	`'dir'`
CEPH	`None`

The expected performance impact is increasing in this order : None, 'dir', 'sync'. The expected reliability order is the reverse one.

`debug` : Static

When ldebug is used, it consults this dict.

It maps debug keys to modules to import to implement the debug method (cf. ldebug(1)). Values contain the module name optionnaly followed by a human description (that will appear with ldebug -h) separated with spaces.

`heartbeat` : Static (`10`)

Open-lmake has a heartbeat mechanism to ensure a job does not suddenly disappear (for example if killed by the user, or if a remote host reboots). If such an event occurs, the job will be restarted automatically.

This attribute specifies the minimum time between 2 successive checks for a given job. If None (discouraged), the heartbeat mechanism is disabled.

The default value should suit the needs of most users.

If too low, build performance will decrease as heartbeat will take significative resources
If too high, reactivity in case of job loss will decrease

`heartbeat_tick` : Static (`0.1`)

This attribute specifies the minnimum time between 2 successive checks globally for all jobs. If None (discouraged), it is equivalent to 0.

The default value should suit the needs of most users.

If too low, build performance will decrease as heartbeat will take significative resources
If too high, reactivity in case of job loss will decrease

`local_admin_dir` : Clean (-)

This variable contains a dir to be used for open-lmake administration in addition to the LMAKE dir.

It is guaranteed that all such accesses are performed by the host, hence a dir in a locally mounted disk is fine.

If unset, administration by user is simplified (no need to manage an external dir), but there may be a performance impact as network file systems are generally slower than local ones.
If set to a local dir, user has to ensure that lmake and other commands are always launched from the host that has this locaol file system.
If set to network dir, there is no performance gain and only added complexity.

`link_support` : Clean (`'full'`)

Open-lmake fully handle symbolic links (cf. data model).

However, there is an associated cost which may be useless in some situations.

Value	Support level
`'full'`	symbolic links are fully supported
`'file'`	symbolic links are only supported if pointing to files
`'none'`	symbolic links are not supported

`max_dep_depth`: Static (`100`)

The rule selection process is a recursive one. It is subject to infinite recursion and several means are provided to avoid it.

The search stops if the depth of the search reaches the value of this attribute, leading to the selection of a special internal rule called infinite.

If too low, some legal targets may be considered infinite.
If too high, the error message in case of infinite recursion will be more verbose.

`max_error_lines` : Dynamic (`100`)

When a lot of error lines are generated by open-lmake, other than copying the stderr of a job, only the first max_error_lines ones are actually output, followed by a line containing ... if some lines have been suppressed. The purpose is to ease reading.

`network_delay` : Static (`1`)

This attribute provides an approximate upper bound of the time it takes for an event to travel from a host to another.

If too low, there may be spurious lost jobs.
If too high, there may be a loss of reactivity.

The default value should fit most cases.

`nice` : Dynamic (`0`)

This attribute provides the nice value to apply to all jobs. It is a value between 0 and 20 that decreases the priority of jobs (cf. nice(2)).

If available, the autogroup mecanism (cf. sched(7)) is used instead as jobs are launched as sessions.

Note that negative nice values are not supported as these require privileges.

`path_max` : Static (`200`)

The rule selection process is a recursive one. It is subject to infinite recursion and several means are provided to avoid it.

The search stops if any file with a name longer than the value of this attribute, leading to the selection of a special internal rule called infinite.

`sub_repos` : Static (`()`)

This attribute provide the list of sub-repos.

Sub repos are sub-dirs of the repo that are themselves repos, i.e. they have a Lmakefile.py. Inside such sub-repos, the applied flow is the one described in it (cf. Subrepos).

`backends` : Dynamic

This attribute is a pdict with one entry for each active backend (cf. backends).

Each entry is a pdict providing resources. Such resources are backend specific.

`backends.*.interface` : Dynamic (best guess)

When jobs are launched remotely, they must connect to open-lmake when they start and when they complete. The same is true if the job is launche locally but it launches sub-commands remotely (in this case it is the command that needs to connect to the job trampoline). This is done by connecting to a socket open-lmake has opened for listening, which requires that we must have a means to determine an IP address to connect to. The host running open-lmake may have several network interfaces, one of them (typically only one) being usable by such remote hosts. There is no generic way to determine this address, so in general, open-lmake cannot determine it automatically.

This value may be empty (using hostname for addresse look up), given in standard dot notation, as the name of an interface (as shown by ifconfig) or the name of a host (looked up as for ping). In case of ambiguity, local backend will use the loop-back address, remote backends will generate an error message showing the possible choices.

`backends.*.environ` : Dynamic (`{}`)

Environment to pass when launching job in backend. This environment is accessed when the value mentioned in the rule is ....

`backends.local.cpu` : Dynamic (number of phyical CPU's)

This is a normal resource that rules can require (by default, rule require 1 cpu).

`backends.local.mem` : Dynamic (size of physical memory in MB)

This is the pysical memory necessary for jobs. It can be specified as a number or a string representing a number followed by a standard suffix such as k, M or G. Internally, the granularity is forced to MB.

`backends.local.tmp` : Dynamic (`0`)

This is the disk size in the temporary dir necessary for jobs. It can be specified as a number or a string representing a number followed by a standard suffix such as k, M or G. Internally, the granularity is forced to MB.

`caches` : Static

This attribute is a pdict with one entry for each cache.

Caches are named with an arbitrary str and are referenced in rules using this name.

By default, no cache is configured, but an example can be found in lib/lmake/config.py, commented out.

`caches.*.tag` : Static (-)

This attribute specifies the method used by open-lmake to cache values. In the current version, only 2 tags may be used:

none is a fake cache that cache nothing.
dir is a cache working without daemon, data are stored in a dir.

`caches.<dir>.dir` : Static

This attribute specifies the dir in which the cache puts its data.

The dir must pre-exist and contain a file LMAKE/size containing the size the cache may occupy on disk. The size may be suffixed by a unit suffix (k, M, G, T, P or E). These refer to base 1024.

Also, an adequate default ACL (cf. acl(5)) must most probably be set for this dir to give adequate permissions to files created in it. Typically, the command setfacl -m d:g::rw,d:o::r CACHE can be used to set up the dir CACHE.

`caches.<dir>.file_sync` : Static (`'dir'`)

Same meaning as config.file_sync for accesses in the cache.

`caches.<dir>.group` : Static (default group of user)

This attribute specifies the group used when creating entries.

`caches.<dir>.key` : Static (repo root dir/git sha1)

A key used to avoid cache pollution. No more than a single entry can be stored for any job with a given key.

By default, it is made after the absolute root dir of the repo and the current git sha1 if repo is controlled by git.

`collect` : Dynamic

This attributes specifies files and dirs to be ignored (and hence kept) when lcollect is run. Files are specified as in rule targets : with stems and patterns given with a syntax similar to python f-strings.

By default, no files are ignored when lcollect is run.

`collect.stems` : Dynamic

This attributes provides a dict as the stems Rule attribute.

`collect.ignore` : Dynamic

This attributes provides a dict as the targets Rule attribute. However, contrarily to Rules, several targets can be provided as a list/tuple for each key, and no flags can be passed in.

`console` : Dynamic

This is a sub-configuration for all attributes pertaining to the console output of lmake.

`console.date_precision` : Dynamic (`None`)

This attribute specifies the precision (as the number of digit after the second field, for example 3 means we see milli-seconds) with which timestamps are generated on the console output. If None, no timestamp is generated.

`console.has_exec_time` : Dynamic (`True`)

If this attribute is true, execution time is reported each time a job is completed.

`console.history_days` : Dynamic (`7`)

This attribute specifies the number of days the output log history is kept in the LMAKE/outputs dir.

`console.host_len` : Dynamic (`None`)

This attribute specifies the width of the field showing the host that executed or is about to execute the job. If None, the host is not shown. Note that no host is shown for local execution.

`console.show_eta` : Dynamic (`False`)

If this attribute is true, the title shows the ETA of the command, in addition to statistics about number of jobs.

`console.show_ete` : Dynamic (`True`)

If this attribute is true, the title shows the ETE of the command, in addition to statistics about number of jobs.

`trace` : Dynamic

This is a sub-configuration for all attributes pertaining to the optional tracing facility of open-lmake.

For tracing to be active, it must be compiled in (cf. INSTALLATION), which is off by default as performances can be severly degraded.

`trace.size` : Static (`100_000_000`)

While open-lmake runs, it may generate an execution trace recording a lot of internal events meant for debugging purpose.

The trace is handled as a ring buffer, storing only the last events when the size overflows. The larger the trace, the more probable the root cause of a potential problem is still recorded, but the more space it takes on disk.

This attributes contains the maximum size this trace can hold (open-lmake keeps the 5 last traces in case the root cause lies in a previous run).

`trace.n_jobs` : Static (`1000`)

While open-lmake runs, it generates execution traces for all jobs.

This attributes contains the overall number of such traces that are kept.

`trace.channels` : Static (all)

The execution trace @lmake generates is split into channels to better control what to trace.

This attributes contains a list or tuple of the channels to trace.

Rule attributes

Each attribute is characterized by a few flags:

How inheritance is handled:
- None: (ignore values from base classes)
- python: (normal python processing)
- Combine: (Combine inherited values and currently defined one).
The type.
The default value.
Whether it can be defined dynamically from job to job:
- No
- Simple: globals include module globals, user attributes, stems and targets, no file access allowed.
- Full: globals include module globals, user attributes, stems, targets, deps and resources, file accesses become deps.

When targets are allowed in dynamic values, the targets variable is also defined as the dict of the targets. Also, if target was used to redirect stdout, the target variable contains said file name.

Similarly, when deps are allowed in dynamic values, the deps variable is also defined as the dict of the deps. Also, if dep was used to redirect stdin, the dep variable contains said file name.

When a type is mentioned as f-str, it means that although written as plain str, they are dynamically interpreted as python f-strings, as for dynamic values. This is actually a form of dynamic value.

Dynamic attribute execution

If the value of an attribute (other than cmd) is dynamic, it is interpreted within open-lmake, not as a separate process as for the execution of cmd. This means:

Such executions are not parallelized, this has a performance impact.
They are executed within a single python interpreter, this implies restrictions.

Overall, these functions must be kept as simple and fast as possible, in favor of cmd which is meant to carry out heavy computations.

The restrictions are the following:

The following system (or libc) calls are forbidden (trying to execute any of these results in an error):
- changing dir (chdir and the like)
- spawning processes (fork and the like)
- exec (execve and the like)
- modifying the disk (open for writing and the like)
The environment variables cannot be tailored as is the case with cmd execution (there is no environ attribute as there is for cmd).
Modifying the environment variables (via setenv and the like) is forbidden (trying to execute any of these results in an error).
Altering imported modules is forbidden (e.g. it is forbidden to write to sys.path).
- Unfortunately, this is not checked.
- sys.path is made tuple though, so that common calls such as sys.path.append will generate an error.
sys.path is sampled after having read Lmakefile.py (while reading rules) and local dirs are filtered out. There are no means to import local modules.
However, reading local files is ok, as long as sys.modules is not updated.
There is no containers, as for cmd execution (e.g. no repo_view).
execution is performed in the top-level root dir.
- This means that to be used as a sub-repo, all local file accesses must be performed with the sub-repo prefix.
- This prefix can be found in lmake.sub_repo, which contains . for the top-level repo.

Attributes

`name`

Inheritance	Type	Default	Dynamic	Example
None	`str`	`cls.__name__`	No

This attribute specify a name for the rule. This name is used each time open-lmake needs to mention the rule in a message.

All rules must have a unique name. Usually, the default value is fine, but you may need to set a name, for example:

for ext in ('c','cc'):
	class Compile(Rule):
		name    = f'compile {ext}'
		targets = { 'OBJ' : '{File:.*}.o' }
		deps    = { 'SRC' : f'{{File}}.{ext}' }
		cmd     = 'gcc -c -o {OBJ} {SRC}'

`virtual`

Inheritance	Type	Default	Dynamic	Example
None	`bool`	`False`	No	`True`

When this attribute is true, this class is not a rule even if it has the required target & cmd attributes. In that case, it is only used as a base class to define other rules.

`prio`

Inheritance	Type	Default	Dynamic	Example
python	`float`	`0` or `+inf`	No	`1`

Default value is 0 if inheriting from lmake.Rule, else +inf.

This attribute is used to order matching priority. Rules with higher priorities are tried first and if none of them are applicable, rules with lower priorities are then tried (cf. rule selection).

`stems`

Inheritance	Type	Default	Dynamic	Example
Combined	`dict`	`{}`	No	`{'File':r'.*'}`

Stems are regular expressions that represent the variable parts of targets which rules match.

Each entry : define a stem named whose associated regular expression is .

`job_name`

Inheritance	Type	Default	Dynamic	Example
python	`str`	...	No

Default is the first target of the most derived class in the inheritance hierarchy (i.e. the MRO) having a matching target.

This attribute may exceptionally be used for cosmetic purpose. Its syntax is the same as target name (i.e. a target with no option).

When open-lmake needs to include a job in a report, it will use this attribute. If it contains star stems, they will be replaced by *'s in the report.

If defined, this attribute must have the same set of static stems (i.e. stems that do not contain *) as any matching target.

`targets`

Inheritance	Type	Default	Dynamic	Example
Combined	`dict`	`{}`	No	`{ 'OBJ' : '{File}.o' }`

This attribute is used to define the regular expression which targets must match to select this rule (cf. rule selection).

Keys must be python identifiers. Values are list's or tuple's whose first item defines the target regular expression and following items define flags. They may also be a simple str in which case it is as if there were no associated flags.

The regular expression looks like python f-strings. The fixed parts (outside {}) must match exactly. The variable parts, called stems, are composed of:

An optional name. If it exists, it is used to ensure coherence with other targets and the job_name attribute, else coherence is ensured by position. This name is used to find its definition in the stems dict and may also be used in the cmd attribute to refer to the actual content of the corresponding part in the target.
An optional *. If it exists, this target is a star target, meaning that a single job will generate all or some of the targets matching this regular expression. if not named, such stem must be defined.
An optional : followed by a definition (a regular expression). This is an alternative to refering to an entry in the stems dict. Overall, all stems must be defined somewhere (in the stems dict, in a target or in job_name) and if defined several times, definitions must be identical. Also, when defined in a target, a definition must contain balanced {}'s, i.e. there must be as many { as }. If a regular expression requires unbalanced {}, it must be put in a stems entry.

Regular expressions are used with the DOTALL flag, i.e. a . matches any character, including \n.

The flags may be any combination of the following flags, optionally preceded by - to turn it off. Flags may be arbitrarily nested into sub-list's or sub-tuple's.

CamelCase	snake_case	Default	Description
`Essential`	`essential`	Yes	This target will be shown in a future graphic tool to show the workflow, it has no algorithmic effect.
`Incremental`	`incremental`	No	Previous content may be used to produce these targets. In that case, these are not unlinked before execution. However, if targets have non-targets hard links and are not read-only, they are uniquified, i.e. they are copied in place to ensure modification to such targets do not alter other links.
`Optional`	`optional`	No	If this target is not generated, it is not deemed to be produced by the job. Open-lmake will try to find an alternative rule. This is equivalent to being a star target, except that there is no star stem.
`Phony`	`phony`	No	Accept that this target is not generated, this target is deemed generated even not physically on disk. If a star target, do not search for an alternative rule to produce the file.
`SourceOk`	`source_ok`	No	Do not generate an error if target is actually a source
`NoStar`	`no_star`	Yes	Accept regexpr-based flags (e.g. from star `side_deps` or `side_targets`)
`NoWarning`	`no_warning`	No	Warning is not reported if a target is either uniquified or unlinked before job execution while generated by another job.
`Top`	`top`	No	Target pattern is interpreted relative to the root dir of the repo, else it is relative to the `cwd` of the rule.

All targets must have the same set of static stems (i.e. stems with no * in its name).

Matching is done by first trying to match static targets (i.e. which are not star) then star targets. The first match will provide the associated stem definitions and flags.

Unless the top flag is set, the pattern is rooted to the sub-repo if the rule is defined in such a sub-repo. If the top flag is set, the pattern is always rooted at the top-level repo.

`target`

Inheritance	Type	Default	Dynamic	Example
python	`str` or `list` or `tuple`	-	No

This attribute defines an unnamed target. Its syntax is the same as any target entry except that it may not be incremental. Also, such a target may not be a star target.

During execution, cmd stdout will be redirected to this (necessarily unique since it cannot be a star) target.

The top flag cannot be used and the pattern is always rooted to the sub-repo if the rule is defined in such a sub-repo.

`side_targets`

Inheritance	Type	Default	Dynamic	Example
Combined	`dict`	`{}`	No

This attribute is identical to targets except that:

targets listed here do not trigger job execution, i.e. they do not participate to the rule selection process.
it not compulsery to use all static stems as this constraint is only necessary to fully define a job when selected by the rule selection process.

`deps`

Inheritance	Type	Default	Dynamic	Example
Combined	`dict`	`{}`	Simple	`{ 'SRC' : '{File}.c' }`

This attribute defines the static deps. It is a dict which associates python identifiers to files computed from the available environment.

They are f-strings, i.e. their value follow the python f-string syntax and semantic but they are interpreted when open-lmake tries to match the rule (the rule only matches if static deps are buildable, cf. rule selection). Hence they lack the initial f in front of the string.

Alternatively, values can also be list or tuple whose first item is as described above, followed by flags.

The flags may be any combination of the following flags, optionally preceded by - to turn it off. Flags may be arbitrarily nested into sub-list's or sub-tuple's.

CamelCase	snake_case	Default	Description
`Essential`	`essential`	Yes	This dep will be shown in a future graphic tool to show the workflow, it has no algorithmic effect.
`Critical`	`critical`	No	This dep is critical.
`IgnoreError`	`ignore_error`	No	This dep may be in error, job will be launched anyway.
`ReaddirOk`	`readdir_ok`	No	This dep may be read as a dir (using `readdir` (3)) without error.
`Required`	`required`	No	This dep is deemed to be read, even if not actually read by the job.
`NoStar`	`no_star`	Yes	Accept regexpr-based flags (e.g. from star `side_deps` or `side_targets`)
`Top`	`top`	No	Dep pattern is interpreted relative to the top-level repo, else to the local repo (cf. subrepos.

Flag order and dep order are not significative.

`dep`

Inheritance	Type	Default	Dynamic	Example
python	`str` or `list` or `tuple`	-	Simple

This attribute defines an unnamed static dep.

During execution, cmd stdin will be redirected to this dep, else it is /dev/null.

`side_deps`

Inheritance	Type	Default	Dynamic	Example
Combined	`dict`	`{}`	No

This attribute is used to define flags to deps when they are acquired during job execution. It does not declare any dep by itself. Syntactically, it follows the side_targets attribute except that:

Specified flags are dep flags only where side_targets accept both target flags an dep flags.
The flag Ignore or ignore only applies to reads to prevent such accessed files from becoming a dep where for side_targets, this flag prevents files from being deps or targets.
. may be specified as pattern (or pattern may include it as a possible match) which may be necessary when passing the readdir_ok flag.

`chroot_dir`

Inheritance	Type	Default	Dynamic	Example
python	`f-str`	`None`	Full	`'/ubuntu22.04'`

This attribute defines a dir in which jobs will chroot into before execution begins. It must be an absoluted path.

Note that unless the repo_view is set, the repo must be visible under its original name in this chroot environment.

If None, '' or '/', no chroot is performed unless required to manage the tmp_view and repo_view attributes (in which case it is transparent). However, if '/', namespaces are used nonetheless.

`repo_view`

Inheritance	Type	Default	Dynamic	Example
python	`f-str`	`None`	Full	`'/repo'`

This attribute defines a dir in which jobs will see the top-level dir of the repo (the root dir). This is done by using mount -rbind (cf. namespaces).

It must be an absolute path not lying in the temporary dir.

If None or '', no bind mount is performed.

As of now, this attribute must be a top level dir, i.e. '/a' is ok, but '/a/b' is not.

`tmp_view`

Inheritance	Type	Default	Dynamic	Example
python	`f-str`	`None`	Full	`'/tmp'`

This attribute defines the name which the temporary dir available for job execution is mounted on (cf. namespaces).

If None, '' or not specified, this dir is not mounted. Else, it must be an absolute path.

As of now, this attribute must be a top level dir, i.e. '/a' is ok, but '/a/b' is not.

`views`

Inheritance	Type	Default	Dynamic	Example
Combined	`dict`	`{}`	Full

This attribute defines a mapping from logical views to physical dirs.

Accesses to logical views are mapped to their corresponding physical location. Views and physical locations may be dirs or files depending on whether they end with a / or not. Files must be mapped to files and dirs to dirs.

Both logical views and physical locations may be inside or outside the repo, but it is not possible to map an external view to a local location (cf. namespaces).

Physical description may be :

a f-str in which case a bind mount is performed.
a dict with keys upper (a str) and lower (a single str or a list of str) in which case an overlay mount is performed. Key copy_up (a single str or a list of str) may also be used to provide a list of dirs to create in upper or files to copy from lower to upper. Dirs are recognized when they end with /. Such copy_up items are provided relative to the root of the view.

`environ`

Inheritance	Type	Default	Dynamic	Example
Combined	`dict`	...	Full	`{ 'MY_TOOL_ROOT' : '/install/my_tool' }`

This attribute defines environment variables set during job execution.

The content of this attribute is managed as part of the job command, meaning that jobs are rerun upon modification. This is the normal behavior, other means to define environment are there to manage special situations.

The environment in which the open-lmake command is run is ignored so as to favor reproducibility, unless explicitly transported by using value from lmake.user_environ. Hence, it is quite simple to copy some variables from the user environment although this practice is discouraged and should be used with much care.

Except the exception below, the value must be a f-str.

If resulting value is ... (the python ellipsis), the value from the backend environment is used. This is typically used to access some environment variables set by slurm.

If a value contains one of the following strings, they are replaced by their corresponding definitions:

Key	Replacement	Comment
`$LMAKE_ROOT`	The root dir of the open-lmake package	Dont store in targets as this may require cleaning repo if open-lmake installation changes
`$PHYSICAL_REPO_ROOT`	The physical dir of the subrepo	Dont store in targets as this may interact with cached results
`$PHYSICAL_TMPDIR`	The physical dir of the tmp dir	Dont store in targets as this may interact with cached results
`$PHYSICAL_TOP_REPO_ROOT`	The physical dir of the top-level repo	Dont store in targets as this may interact with cached results
`$REPO_ROOT`	The absolute dir of the subrepo as seen by job
`$SEQUENCE_ID`	A unique value for each job execution (at least 1)	This value must be semantically considered as a random value
`$SMALL_ID`	A unique value among simultaneously running jobs (at least 1)	This value must be semantically considered as a random value
`$TMPDIR`	The absolute dir of the tmp dir, as seen by the job
`$TOP_REPO_ROOT`	The absolute dir of the top-level repo, as seen by the job

By default the following environment variables are defined :

Variable	Defined in	Value	comment
`$HOME`	Rule	`$TOP_REPO_ROOT`	See above, isolates tools startup from user specific data
`$HOME`	HomelessRule	`$TMPDIR`	See above, pretend tools are used for the first time
`$PATH`	Rule	The standard path with `$LMAKE_ROOT/bin:` in front
`$PYTHONPATH`	PyRule	`$LMAKE_ROOT/lib`

`environ_resources`

Inheritance	Type	Default	Dynamic	Example
Combined	`dict`	`{}`	Full	`{ 'MY_TOOL_LICENCE' : '12345' }`

This attribute defines environment variables set during job execution.

The content of this attribute is managed as resources, meaning that jobs in error are rerun upon modification, but not jobs that were successfully built.

The values undertake the same substitutions as for the environ attribute described above.

Except the exception below, the value must be a f-str.

If resulting value is ... (the python ellipsis), the value from the backend environment is used. This is typically used to access some environment variables set by slurm.

`environ_ancillary`

Inheritance	Type	Default	Dynamic	Example
Combined	`dict`	`{}`	Full	`{ 'DISPLAY' : ':10' }`

This attribute defines environment variables set during job execution.

The content of this attribute is not managed, meaning that jobs are not rerun upon modification.

The values undertake the same substitutions as for the environ attribute described above.

Except the exception below, the value must be a f-str.

If resulting value is ... (the python ellipsis), the value from the backend environment is used. This is typically used to access some environment variables set by slurm.

By default the following environment variables are defined :

Variable	Defined in	Value	comment
`$UID`	Rule	the user id
`$USER`	Rule	the user login name

`python`

Inheritance	Type	Default	Dynamic	Example
python	`list` or `tuple`	system python	Full	`venv/bin/python3`

This attribute defines the interpreter used to run the cmd if it is a function.

Items must be f-str.

At the end of the supplied executable and arguments, '-c' and the actual script is appended, unless the use_script attribut is set. In the latter case, a file that contains the script is created and its name is passed as the last argument without a preceding -c.

Open-lmake uses python3.6+ to read Lmakefile.py, but that being done, any interpreter can be used to execute cmd. In particular, python2.7 and all revisions of python3 are fully supported.

If simple enough (i.e. if it can be recognized as a static dep), it is made a static dep if it is within the repo.

`shell`

Inheritance	Type	Default	Dynamic	Example
python	`list` or `tuple`	`/bin/bash`	Full	`('/bin/bash','-e')`

This attribute defines the interpreter used to run the cmd if it is a str.

Items must be f-str.

If simple enough (i.e. if it can be recognized as a static dep), it is made a static dep if it is within the repo.

`cmd`

Inheritance	Type	Default	Dynamic	Example
Combined	`f-str`	-	Full	`'gcc -c -o {OBJ} {SRC}'`
Combined	function	-	Full	`def cmd() : subprocess.run(('gcc','-c','-o',OBJ,SRC,check=True))`

if it is a function

In that case, this attribute is called to run the job (cf. job execution). Combined inheritance is a special case for cmd.

If several definitions exist along the MRO, They must all be functions and they are called successively in reverse MRO. The first (i.e. the most basic) one must have no non-defaulted arguments and will be called with no argument. The other ones may have arguments, all but the first having default values. In that case, such function's are called with the result of the previous one as unique argument. Else, if a function has no argument, the result of the previous function is dropped.

Because jobs are executed remotely using the interpreter mentioned in the python attribute and to avoid depending on the whole Lmakefile.py (which would force to rerun all jobs as soon as any rule is modified), these functions and their context are serialized to be transported by value. The serialization process may improve over time but as of today, the following applies:

Basic objects are transported as is : None, ..., bool, int, float, complex, str, bytes.
list, tuple, set and dict are transported by transporting their content. Note that reconvergences (and a fortiori loops) are not handled.
functions are transported as their source accompanied with their context : global accessed variables and default values for arguments.
Imported objects (functions and class'es and generally all objects with a __qualname__ attribute) are transported as an import statement.
Builtin objects are transported spontaneously, without requiring any generated code.

Also, care has been taken to hide this transport by value in backtrace and during debug sessions, so that functions appear to be executed where they were originally defined.

Values are captured according to the normal python semantic, i.e. once the Lmakefile module is fully imported. Care must be taken for variables whose values change during the import process. This typically concerns loop indices. To capture these at definition time and not at the end, such values must be saved somewhere. There are mostly 2 practical possibilities:

Declare an argument with a default value. Such default value is saved when the function is defined.
Define a class attribute. Class attributes are saved when its definition ends, which is before a loop index.

if it is a `f-str`

In that case, this attribute is executed as a shell command to run the job (cf. job execution). Combined inheritance is a special case for cmd.

While walking the MRO, if for a base class cmd is defined as a function and it has a shell attribute, the value of this attribute is used instead. The purpose is that it is impossible to combine str's and functions because they use different paradigms. As a consequence, a base class may want to have 2 implementations, one for subclasses that use python cmd and another for subclasses that use shell cmd. For such a base class, the solution is to define cmd as a function and set its shell attribute to the str version.

If several definitions exist along the MRO, They must all be str's and they are run successively in reverse MRO in the same process. So, it is possible for a first definition to define an environment variable that is used in a subsequent one.

As for other attributes that may be dynamic, cmd is interpreted as an f-string.

`auto_mkdir`

Inheritance	Type	Default	Dynamic	Example
python	`bool`	`False`	Full	`True`

When this attribute has a true value, executing a chdir syscall (e.g. executing cd in bash) will create the target dir if it does not exist.

This is useful for scripts in situations such as:

The script does chdir a.
Then try to read file b from there.
What is expected is to have a dep on a/b which may not exist initially but will be created by some other job.
However, if dir a does not exist, the chdir call fails and the file which is open for reading is b instead of a/b.
As a consequence, no dep is set for a/b and the problem will not be resolved by a further re-execution.
Setting this attribute to true creates dir a on the fly when chdir is called so that it succeeds and the correct dep is set.

`autodep`

Inheritance	Type	Default	Dynamic	Example
python	`f-str`	`'ld_audit'` if supported else `'ld_preload'`	Full	`'ptrace'`

This attribute specifies the method used by autodep to discover hidden deps.

`backend`

Inheritance	Type	Default	Dynamic	Example
python	`f-str`	-	Full	`'slurm'`

This attribute specifies the backend to use to launch jobs.

`cache`

Inheritance	Type	Default	Dynamic	Example
python	`f-str`	-	Simple

This attribute specifies the cache to use for jobs executed by this rule.

When a job is executed, its results are stored in the cache. If space is needed (all caches are constrained in size), any other entry can be replaced. The cache replacement policy (described in its own section, in the config chapter) tries to identify entries that are likely to be useless in the future.

`compression`

Inheritance	Type	Default	Dynamic	Example
python	`int`	`0`	Full	`1`

This attribute specifies the compression level used when caching. It is passed to the zlib library used to compress job targets.

0 means no compression.
9 means maximum compression.

`force`

Inheritance	Type	Default	Dynamic	Example
python	`bool`	`False`	Full	`True`

When this attribute is set to a true value, jobs are always considered out-of-date and are systematically rerun if a target is needed. It is rarely necessary.

`keep_tmp`

Inheritance	Type	Default	Dynamic	Example
python	`bool`	`False`	Full	`True`

When this attribute is set to a true value, the temporary dir is kept after job execution. It can be retreived with lshow -i.

Sucessive executions of the same job overwrite the temporary dir, though, so only the content corresponding to the last execution is available. When this attribute has a false value, the temporary dir is cleaned up at the end of the job execution.

`kill_sigs`

Inheritance	Type	Default	Dynamic	Example
python	`list` or `tuple`	`(signal.SIGKILL,)`	Full

This attribute provides a list of signals to send the job when @lmake decides to kill it.

A job is killed when:

^C is hit if it is not necessary for another running lmake command that has not received a ^C.
When timeout is reached.
When check_deps is called and some deps are out-of-date.

The signals listed in this list are sent in turn, once every second. Longer interval can be obtained by inserting 0's. 0 signals are not sent and anyway, these would have no impact if they were.

If the list is exhausted and the job is still alive, a more agressive method is used. The process group of the job, as well as the process group of any process connected to a stream we are waiting for, are sent SIGKILL signals instead of just the process group of the job. The streams we are waiting for are stderr, and stdout unless the target attribute is used (as opposed to the targets attribute) in which case stdout is redirected to the the target and is not waited for.

Note: some backends, such as slurm, may have other means to manage timeouts. Both mechanisms will be usable.

`max_retries_on_lost`

Inheritance	Type	Default	Dynamic	Example
python	`int`	`1`	No

This attribute provides the number of allowed retries before giving up when a job is lost. For example, a job may be lost because of a remote host being misconfigured, or because the job management process (called job_exec) was manually killed.

In that case, the job is retried, but a maximum number of retry attemps are allowed, after which the job is considered in error.

`max_stderr_len`

Inheritance	Type	Default	Dynamic	Example
python	`int`	`100`	Full	`1`

This attribute defines the maximum number of lines of stderr that will be displayed in the output of lmake. The whole content of stderr stays accessible with the lshow -e command.

`max_submits`

Inheritance	Type	Default	Dynamic	Example
python	`int`	`10`	No

The goal is to protect agains potential infinite loop cases. The default value should be both comfortable (avoid hitting it in normal situations) and practical (avoid too many submissions before stopping).

`readdir_ok`

Inheritance	Type	Default	Dynamic	Example
python	`bool`	`False`	Full	`True`

When this attribute has a false value, reading a local dir that is not ignored nor incremental is considered an error as the list of files in a dir cannot be made stable, i.e. independent of the history (the repo is not constantly maintained without spurious files, nor with all buildable files existing).

If it is true, such reading is allowed and it is the user responsibility to ensure that spurious or missing files have no impact on output, once all deps are up-to-date.

`resources`

Inheritance	Type	Default	Dynamic	Example
Combined	`dict`	`{}`	Full	`{ 'MY_RESOURCE' : '1' }`

This attribute specifies the resources required by a job to run successfully. These may be cpu availability, memory, commercial tool licenses, access to dedicated hardware, ...

Values must f-str.

The syntax is the same as for deps.

After interpretation, the dict is passed to the backend to be used in its scheduling (cf. local backend for the local backend).

`start_delay`

Inheritance	Type	Default	Dynamic	Example
python	`float`	`3`	Full

When this attribute is set to a non-zero value, start lines are only output for jobs that last longer than that many seconds. The consequence is only cosmetic, it has no other impact.

`stderr_ok`

Inheritance	Type	Default	Dynamic	Example
python	`bool`	`False`	Full	`True`

When this attribute has a false value, the simple fact that a job generates a non-empty stderr is an error. If it is true, writing to stderr is allowed and does not produce an error. The lmake output will exhibit a warning, though.

`timeout`

Inheritance	Type	Default	Dynamic	Example
python	`float`	no timeout	Full

When this attribute has a non-zero value, job is killed and a failure is reported if it is not done before that many seconds.

`use_script`

Inheritance	Type	Default	Dynamic	Example
python	`bool`	`False`	Full	`True`

This attribute commands an implementation detail.

If false, jobs are run by launching the interpreter followed by -c and the command text.

If true, jobs are run by creating a temporary file containing the command text, then by launching the interpreter followed by said file name.

If the size of the command text is too large to fit in the command line, this attribute is silently forced to true.

Execution

Handling an access

At first glance, recognizing a target from a dep when a job runs seems pretty easy when the accesses to the disk can be traced : reading a file is a dep, writing to it is a target. And this is what is done informally, but there are a lot of corner cases.

The specification devised hereinafter has been carefully thought to allow open-lmake to run adequate jobs to reach a stable state from any starting point. More specifically, think of the following sequence:

git clean -ffdx
lmake foo
git pull
lmake foo

The second lmake foo command is supposed to do the minimum work to reach the same content of foo as would be obtained with the sequence:

git pull
git clean -ffdx
lmake foo

This what stable state means : the content of foo is independent of the history and only depends on the rules and the content of sources, both being managed through git in this example.

In this specification, dirs are ignored (i.e. the presence or content of a dir has no impact) and symbolic links are similar to regular files whose content is the link itself.

Reading and writing files

The first point is to precisely know what reading and writing mean.

Writing to file foo means:

A system call that writes or initiate writing to foo, e.g. open("foo",O_WRONLY|O_TRUNC) or symlink(...,"foo"), assuming the autodep rule attribute is not set to 'none'.
Unlinking foo, e.g. unlink("foo"), is also deemed to be writing to it.
A call to lmake.target('foo',write=True).
The execution of ltarget -W foo.
Under the condition that these actions are not preceded by a call to lmake.target('foo',ignore=True) or the execution of ltarget -I foo.
Also under the condition that foo does not match a targets or side_targets entry with the Ignore flag set.
Also under the condition that foo lies in the repo (i.e. under the dir containing Lmakefile.py but not in its LMAKE/lmake sub-dir).

Reading file foo means :

A system call that reads or initiate reading foo, e.g. open("foo",O_RDONLY), readlink("foo",...) or stat("foo",...), assuming the autodep rule attribute is not set to 'none'.
Unless the config.link_support attribute is set to 'none', any access (reading or writing) to foo which follows symlinks is an implicit readlink.
Unless the config.link_support attribute is set to 'file' or 'none', any access (reading or writing) to foo, whether it follows symlinks or not, is an implicit readlink of all dirs leading to it.
Note that some system calls can be both a read and a write, e.g. open("foo", O_RDWR) but also rename("foo",...). In that case, the read occurs before the write.
A call to lmake.depend('foo',read=True).
The execution of ldepend -R foo.
Under the condition that these actions are not preceded by a call to lmake.depend('foo',ignore=True) or the execution of ldepend -I foo.
Also under the condition that foo is not listed in deps or matches a side_deps entry, with the Ignore flag set.
Also under the condition that foo lies in the repo (i.e. under the dir containing Lmakefile.py but not in its LMAKE/lmake sub-dir) or in a source dir.

Reading a dir

A dir foo is read when files it contains are listed, which occur when:

A system call that reads dir foo, e.g. getdents.
A libc call that reads dir foo, e.g. readdir or glob (in which its pattern argument requires reading foo).
Under the condition that these actions are not preceded by a call to lmake.target('foo',ignore=True) or the execution of ltarget -I foo.
Also under the condition that neither lmake.target('foo',incremental=True) was called nor ltarget -i foo executed.
Also under the condition that foo does not match a targets or side_targets entry with the Ignore or Incremental flags set.
Also under the condition that foo lies in the repo (i.e. under the dir containing Lmakefile.py but not in its LMAKE/lmake sub-dir) or is the repo root dir.

Although dirs do not exist for open-lmake, reading dir foo is an error unless the ReaddirOk attribute was set on the rule or the ReaddirOk flag is set, which can be done by:

Passing the ReaddirOk flag in the targets, side_targets or side_deps entry.
Calling lmake.depend('foo',readdir_ok=True) or executing ldepend -D foo.
Calling lmake.target('foo',readdir_ok=True) or executing ltarget -D foo.

Note that the lmake.PyRule base class sets the the ReaddirOk flag on dirs mentioned in sys.path when executing python3. This is because python3 optimizes imports by pre-reading these dirs.

Such restrictions ensure the reliability of job execution as the content of a dir is mostly unpredictable as it depends on the past history: files may or may not have been already built, or previously built files that are now non-buildable may still exist.

Ideally, listing a dir would lead to all buildable files (or sub-dirs), but this is not doable in the generic case as such list may be infinite. So open-lmake reverts to letting the user deal with this question, using an opt-in approach so the user cannot miss it.

Note that if such a dir is marked as incremental, the user already has the responsibility of handling its past history and there is no need for an additional flag.

Being a target

A file may be a target from the begining of the job execution, or it may become a target during job execution. In the latter case, it is not a target until the point where it becomes one. A file cannot stop being a target: once it has become a target, this is until the end of the job execution.

A file is a target from the begining of the job execution if it matches a targets or side_targets entry.

A file becomes a target when :

It is written to (with the meaning mentioned above)..
lmake.target or ltarget is called unless the flag source_ok is finally set (i.e. including if set independently).

Being a dep

A file may be a dep from the begining of the job execution, or it may become a dep during job execution.

A file cannot stop being a dep : once it has become a dep, this is until the end of the job execution.

A file is a dep from the begining of the job execution if it listed as a deps in the rule.

A file becomes a dep when it is read (with the meaning mentioned above) while not a target at that time.

Errors

Some cases lead to errors, independently of the user script.

The first case is when there is clash between static declarations. targets, side_targets, side_deps entries may or may not contain star stems. In the latter case, and including the static deps listed in deps, they are static entries. It is an error if the same file is listed several times as a static entry.

The second case is when a file is both a dep and a target. You may have noticed that the definition above does not preclude this case, mostly because a file may start its life as a dep and become a target. This is an error unless either:

The file is finally unlinked (or was never created).
Its source_ok flag has been set and its incremental one is not.
This allows rules that alter sources to choose between handling any previous content without impact on final value (incremental) or compute value based on previous one (in which case it must be a dep). In this latter case, the job will rerun if such dep-target is actually modified.

The third case is when a target was not declared as such. foo can be declared as target by:

Matching a targets or side_targets entry.
Calling lmake.target('foo') (unless allow=False is passed).
Executing ltarget foo in which the -a option is not passed.

A target that is not declared is an error.

Processing a target

Targets are normally erased before the start of the job execution, unless they are sources or flagged as incremental. In case a target is also a dep, it is automatically flagged as incremental, whether it is an error or not.

If a job is run when a not incremental and not source target exists, it is deemed unreliable and is rerun.

Best effort

Open-lmake tries to minimize the execution of jobs, but may sometimes miss a point and execute a job superfluously. This may include erasing a file that has no associated production rule. Unless a file is a dep of no job, open-lmake may rebuild it at any time, even when not strictly necessary.

In the case open-lmake determines that a file may have actually been written manually outside its control, it fears to overwrite a user-generated content. In that case, open-lmake quarantines the file under the LMAKE/quarantine dir with its original name. This quarantine mechanism, which is not necessary for open-lmake processing but is a facility for the user, is best effort. There are cases where open-lmake cannot anticipate such an overwrite.

tmp dir

The physical dir is:

If $TMPDIR is set to empty, there is no tmp dir.
If open-lmake is supposed to keep this dir after job execution, it is a dir under LMAKE/tmp, determined by open-lmake (its precise value is reported by lshow -i).
Else if $TMPDIR is specified in the environment of the job, it is used. Note that it need not be unique as open-lmake will create a unique sub-dir within it.
Else, a dir determined by open-lmake lying in the LMAKE dir.

Unless open-lmake is instructed to keep this dir, it is erased at the end of the job execution.

At execution time:

If $TMPDIR is set to empty, it is suppressed from the environment and if the job uses the default tmp dir (usually /tmp), an error is generated.
Else $TMPDIR is set so that the job can use it to access the tmp dir.

Job execution

Job are executed by calling the provided interpreter (generally python or bash).

When calling the interpreter, the following environment variable are automatically set, in addition to what is mentioned in the environ attribute (and the like). They must remain untouched:

$LD_AUDIT : A variable necessary for autodep when it is set to 'ld_audit'
$LD_PRELOAD : A variable necessary for autodep when it is set to 'ld_preload' or 'ld_preload_jemalloc'
$LMAKE_AUTODEP_ENV : A variable necessary for autodep in all cases
$TMPDIR : The name of a dir which is empty at the start of the job. If the temporary dir is not kept through the use of the keep_tmp attribute or the -t option, this dir is cleaned up at the end of the job execution.

After job execution, a checksum is computed on all generated files, whether they are allowed or not, except ignored targets (those marked with the ignore attribute).

The job is reported ok if all of the following conditions are met:

Job execution (as mentioned below) is successful.
All static targets are generated
All written files are allowed (either appear as target, side target or are dynamically allowed by a call to ltarget or lmake.target)
Nothing is written to stderr, or the stderr_ok attribute is set.

if cmd is a `str`

Because this attribute undergo dynamic evaluation as described in the cmd rule attribute, there is not further specificities.

The job execution is successful (but see above) if the interpreter return code is 0.

if it is a function

In that case, this attribute is called to run the job.

During evaluation, its global dict is populated to contain values referenced in these functions. Values may come from (by order of preference):

The stems, targets, deps, resources, side targets and side deps, as named in their respective dict.
stems, targets, deps, resources that contain their respective whole dict.
if a single target was specified with the target attribute, that target is named target.
if a single dep was specified with the dep attribute, that dep is named dep.
Any attribute defined in the class, or a base class (as for normal python attribute access).
Any value in the module globals.
Any builtin value.
undefined variables are not defined, which is ok as long as they are not accessed (or they are accessed in a try/except block that handle the NameError exception).

Static targets, deps, side targets and side deps are defined as str. Star targets, side targets and side deps are defined as functions taking the star-stems as argument and returning the then fully specified file. Also, in that latter case, the reg_expr attribute is defined as a str ready to be provided to the re module and containing named (if corresponding star-stem is named) groups, one for each star-stem.

The job execution is successful (but see above) if no exception is raised.

Data model

Open-lmake manages 2 kinds of objects: files and jobs.

The reason they are different objects is that jobs may have several targets, so there is no way to identify a file and the job that generates it.

Files

names

Files are identified by their canonical name, as seen from the root of the repo. For example, all these code snippets will access the same file a/b (assume no symbolic links for now):

cat a/b
cat a//b
cat ./a/b
cat a/./b
cat /repo/a/b # assume actual repo is /repo
cat c/../a/b  # assume c is a dir
cd a ; cat b

When spying the job (or when ldepend) is called, all these accesses will be converted to a/b.

Although targets are necessarily inside the repo, deps may be outside as some source dirs may declared outside the repo. For such deps, their name is:

the absolute path if the source dir is declared absolute
the relative path if the source dir is declared relative

Symbolic links

Open-lmake manages the physical view of the repo. This means that symbolic links are genuine files, to the same extent as a regular file and their content is their target. This means if a is a symbolic link to b, the content of a is b, not the content of b.

If a is a symbolic link to b, the code snippet cat a accesses 2 files:

This is what is expected : if either a or b is modified, the stdout of cat a may be modified.

Dirs

Open-lmake manages a flat repo. This means that / is an ordinary character.

As far as open-lmake is concerned, there is no difference between a being a dir and a not existing.

However, because dirs do exist on disk, it is impossible for a and a/b to exist simultaneously (i.e. exist as regular or symbolic link). As a consequence, there is an implicit rule (Uphill) that prevents a/b from being buildable if a is buildable.

Also, because dirs cannot be made up-to-date, scripts reading dirs can hardly be made reliable and repeatable. Such constructs are strongly discouraged:

use of glob.glob in python
use of wildcard in bash

Jobs

Jobs are identified by their rule and stems (excluding star-stems).

They have a list of targets and a list of deps.

Rule selection

When open-lmake needs to ensure that a file is up to date, the first action is to identify which rule, if any, must be used to generate it. This rule selection process works in several steps described below.

A file is deemed buildable if the rule selection process leads to a job that generates the file.

Name length

First, the length of the target name is checked agains lmake.config.path_max. If the target name is longer, then the process stops here and the file is not buildable.

Sources

The second step is to check target agains sources and source dirs.

If the target is listed as a source it is deemed buildable. No execution is associated, though, the file modifications made by the user are tracked instead.

If the target is within a dir listed as a source dir (i.e. appears ending with a / in the manifest), it is deemed buildable if it exists. If it does not exist, it is not buildable. In both cases, the process stops here.

Up-hill dir

The third step is to see if a up-hill dir (i.e. one of the dir along the dir path leading to the file) is (recursively) buildable.

If it is the case, the rule selection process stops here and the file is not buildable.

`AntiRule` and `SourceRule`

The following step is to match the target against AntiRule's and SourceRule's (ordered by their prio attribute, high values are considered first). If one is found, the target is buildable if it matches a SourceRule and is not if it matches an AntiRule.

If it matches a SourceRule and it does not exist, it is still buildable, but has an error condition.

In all cases, as soon as such a match is found, the process stops here.

Plain rules

The rules are split into groups. Each group contains all the rules that share a given prio. Groups are ordered with higher prio first.

The following steps is executed for each group in order, until a rule is found. If none is found, the file declared not buildable.

Match a target

For a given rule, the file is matched against each target in turn. Static targets are tried first in user order, then star targets in user order, and matching stops at the first match. Target order is made of targets and target entries in reversed MRO order (i.e. higher classes in the python class hierarchy are considered first),

If a target matches, the matching defines the value of the static stems (i.e. the stems that appear without a *). Else, the rule does not apply.

Check static deps

The definition of the static stems allow to compute :

The other targets of the rule. Static targets become the associated file, star targets becomes regular expressions in which static stems are expanded.
Static deps by interpreting them as f-strings in which static stems and targets are defined.

Static deps are then analyzed to see if the are (recursively) buildable, and if any is not buildable, the rule does not apply.

Group recap

After these 2 previous steps have been done for the rules of a group, the applicable rules are analyzed the following way:

If no rule apply, next group is analyzed.
If the file matches several rules as a sure target (i.e. a static target and all static deps are sure), the file is deemed buildable, but if required to run, no job will be executed and the file will be in error.
If the file matches some rules as a non-sure target (i.e. a star target or a dep is not sure), the corresponding jobs are run. If no such jobs generate the file, next group is analyzed. If several of them generate the file, the file is buildable and in error.

Backends

Backends are in charge of actually launching jobs when the open-lmake engine has identified that it had to be run. It is also in charge of :

Killing jobs when the open-lmake engine has identified it had to be so.
Scheduling jobs so as to optimize the runtime, based on some indications provided by the open-lmake engine.
Rescheduling jobs when new scheduling indications becomes available.

A backend has to take decisions of 2 kinds:

Is a job eligible for running ? From a dep perspective, the open-lmake engine guarantees it is so. But the job needs some resources to run and these resources may already be busy because of some other jobs already running.
If several jobs are eligible, which one(s) to actually launch.

Each backend is autonomous in its decisions and has its own algorithm to take them. However, generally speaking, they more or less work by following the following principles:

For the first question, the backend maintain a pool of available resources and a job is eligible if its required resources can fit in the pool. When launched, the required resources are subtracted from the pool and when terminated, they are returned to it.
For the second question, each job has an associated pressure provided by the open-lmake engine and the backend actually launches the eligible job with the highest pressure.

The required resources are provided by the open-lmake engine to the backend as a dict which is the one of the job's rule after f-string interpretation.

The pressure is provided in the form of float computed as the accumulated ETE along the critical path to the final targets asked on the lmake command line. To do that, future job ETE have to be estimated. For jobs that have already run, last successful execution time is used. When this information is not available, i.e. when the job has never run successfully, a moving average of the execution times of the jobs sharing the same rule is used as a best guess.

The backend also provides the current ETA of the final targets to allow the backends from different repo to take the best collective decision.

In addition to dedicated resources, all backends manage the following 3 resources:

cpu : The number of threads the job is expected to run in parallel. The backend is expected to reserve enough resources for such a number of threads to run smoothly.
mem : The memory size the job is expected to need to run smoothly. The backend is expected to ensure that such memory is available for the job. Unit must be coherent with the one used in the configuration. It is MB by default.
tmp : The size of necessary temporary disk space. By default temporary disk space is not managed, i.e. $TMPDIR is set (to a freshly created empty empty dir which is cleaned up after execution) with no size limit (other than the physical disk size) but no reservation is made in the backend.

Resource buckets

It may be wise to quantify resources with relatively large steps for resources mem and tmp, especially if these may be computed with a formula.

The reason is linked to the way the backends select jobs. When a backend (actually the local, SGE and slurm backends essentially work the same way) search for the next job to launch, it walks through the available jobs to find the eligible one with the highest priority. When doing that, only jobs with different resources need to be compared as for a given set of resources, they can be pre-ordered by priority. As a consequence, the running time is proportional to the number of different resources. If the mem and tmp needed space is computed from some metrics, it may be very well possible that each job has a different number, leading to a selection process whose time is proportional to the number of waiting jobs, which can be very high (maybe millions).

To help reduce this overhead, one may want to put jobs into buckets with defined values for these resources. This is done by rounding these resources for grouping jobs into buckets.

When job is launche, however, the exact resources are reserved. Rounding is just applied to group jobs into bucket and improve the management of the queues.

backend conversion to local

If a backend cannot be configured because the environment does not allow it (typically missing the SGE or slurm daemons), then:

A Warning message is emmitted at configuration time.
Jobs supposed to start with such backends will be redirected to the local backend.
Resources are mapped on a best-effort basis, and if a resource does not exist or is insufficient in the local backend, job is started so as to be alone on the local host.

Local backend

The local backend launches jobs locally, on the host running the lmake command. There is no cooperation between backends from different repos and the user has to ensure there is no global resource conflict.

This backend is configured by providing entries in the lmake.config.backends.local dict. The key identifies the resource and the value is a int that identifies a quantity.

The local backend is used when either:

The backend attribute is 'local' (which is the default value).
lmake is launched with the the --local option.
The required backend is not supported or not available.

In the two latter cases, required resources are translated into local resources (best effort) and if not possible (e.g. because a resource is not available locally or because special constraints cannot be translated), then only one such job can run at any given time.

Configuration

The configuration provides the available resources :

standard resoureces cpu, mem and tmp
any user defined resource

Each rule whose backend attribute is 'local' provides a resources attribute such that:

The key identifies a resource (which must match a resource in the configuration).
The value (possibly tailored by job through the use of the f-string syntax) is a int or a str that can be interpreted as int.

The variable available to the job as global variables (python case) or environment variables (shell case) contains the actual quantity of resources allocated to this job.

The local backend ensures that the sum of all the resources of the running jobs never overshoots the configured available quantity.

By default, the configuration contains the 2 generic resources: cpu and mem configured respectively as the overall number of available cpus and the overall available memory (in MB).

cpu : The number of cpu as returned by os.wched_getaffinity(0).
mem : The physical memory size as returned by s.sysconf('SC_PHYS_PAGES')*os.sysconf('SC_PAGE_SIZE') in MB.

Each rule has a default resources attribute requiring one CPU.

SGE backend

The SGE backend connects to a SGE daemon to schedule jobs, which allows:

a global scheduling policy (while the local backend only sees jobs in its own repo).
the capability to run jobs on remote hosts (while the local backend only run jobs on the local host).

Command line option

The command line option passed with -b or --backend is ignored.

Configuration

The configuration is composed of:

bin : The dir in which to find SGE executables such as qsub. This entry must be specified.
cell : The cell used by the SGE daemon. This is translated into $SGE_CELL when SGE commands are called. By default, this is automatically determined by the SGE daemon.
cluster : The cluster used by the SGE daemon. This is translated into $SGE_CLUSTER when SGE commands are called. By default, this is automatically determined by the SGE daemon.
default_prio : the priority used to submit jobs to the SGE daemon if none is specified on the lmake command line.
n_max_queued_jobs : open-lmake scatters jobs according to the required resources and only submit a few jobs to SGE for each set of asked resources. This is done to decrease the load of the SGE daemon as open-lmake might have millions of jobs to run and the typical case is that they tend to require only a small set of different resources (helped in this by the limited precision on CPU, memory and temporary disk space requirements). For each given set of resources, only the jobs with highest priorities are submitted to SGE, the other ones are retained by open-lmake so as to limit the number of waiting jobs in slurm queues (the number of running job is not limited, though). This attribute specifies the number of waiting jobs for each set of resources that open-lmake may submit to SGE. If too low, the schedule rate may decrease because by the time taken, when a job finishes, for open-lmake to submit a new job, slurm might have exhausted its waiting queue. If too high, the schedule rate may decrase because of the slurm daemon being overloaded. A reasonable value probably lies in the 10-100 range. Default is 10.
repo_key : This is a string which is add in front of open-lmake job names to make SGE job names. This key is meant to be a short identifier of the repo. By default it is the base name of the repo followed by :. Note that SGE precludes some characters and these are replaced by close looking characters (e.g. ; instead of :).
root : The root dir of the SGE daemon. This is translated into $SGE_ROOT when SGE commands are called. This entry must be specified.
cpu_resource : This is the name of a resource used to require cpu's. For example if specified as cpu_r and the rule of a job contains resources={'cpu':2}, this is translated into -l cpu_r=2 on the qsub command line.
mem_resource : This is the name of a resource used to require memory in MB. For example if specified as mem_r and the rule of a job contains resources={'mem':'10M'}, this is translated into -l mem_r=10 on the qsub command line.
tmp_resource : This is the name of a resource used to require memory temporary disk space in MB. For example if specified as tmp_r and the rule of a job contains resources={'tmp':'100M'}, this is translated into -l tmp_r=100 on the qsub command line.

Resources

The resources rule attributes is composed of :

standard resources cpu, mem and tmp.
hard : qsub options to be used after a -hard option.
soft : qsub options to be used after a -soft option.
any other resource passed to the SGE daemon through the -l qsub option.

Slurm backend

The slurm backend connects to a slurm daemon to schedule jobs, which allows :

a global scheduling policy (while the local backend only sees jobs in its own repo).
the capability to run jobs on remote hosts (while the local backend only run jobs on the local host).

Command line option

The only option that can be passed from command line (-b or --backend) is the priority through the -p options of qsub.

Hence, the command line option must directly contain the priority to pass to qsub.

Configuration

The configuration is composed of :

config : The slurm configuration file to use to contact the slurm controller. By default, /etc/slurm/slurm.conf is used.
init_timeout : Maximum time allowed to init slurm. By default, 10s.
lib_slurm : The slurm dynamic library. If no / appears, $LD_LIBRARY_PATH (as compiled in) and system default lib dirs are searched. By default, libslurm.so is used.
n_max_queued_jobs : open-lmake scatters jobs according to the required resources and only submit a few jobs to slurm for each set of asked resources. This is done to decrease the load of the slurm daemon as open-lmake might have millions of jobs to run and the typical case is that they tend require only a small set of different resources (helped in this by the limited precision on CPU, memory and temporary disk space requirements). for each given set of resources, only the jobs with highest priorities are submitted to slurm, the other ones are retained by open-lmake so as to limit the number of waiting jobs in slurm queues (the number of running job is not limited, though). This attribute specifies the number of waiting jobs for each set of resources that open-lmake may submit to slurm. If too low, the schedule rate may decrease because by the time taken, when a job finishes, for open-lmake to submit a new job, slurm might have exhausted its waiting queue. If too high, the schedule rate may decrase because of the slurm daemon being overloaded. A reasonable value probably lies in the 10-100 range. Default is 10.
repo_key : This is a string which is add in front of open-lmake job names to make slurm job names. This key is meant to be a short identifier of the repo. By default it is the base name of the repo followed by :.
use_nice: open-lmake has and advantage over slurm in terms of knowledge: it knows the deps, the overall jobs necessary to reach the asked target and the history of the time taken by each job. This allows it to anticipate the needs and know, even globally when numerous lmake commands run, in the same repo or on several ones, which jobs should be given which priority. Note that open-lmake cannot leverage the dep capability of slurm as deps are dynamic by nature:
- new deps can appear during job execution, adding new edges to the dep graph,
- jobs can have to rerun, so a dependent job may not be able to start when its dep is done,
- and a job can be steady, so a dependent job may not have to run at all.
The way it works is th following:
- First open-lmake computes and ETA for each lmake command. This ETA is a date, it is absolute, and can be compared between commands running in different repos.
- Then it computes a pressure for each job. The pressure is the time necessary to reach the asked target of the lmake command given the run time for all intermediate jobs (including the considered job).
- The subtraction of the pressure from the ETA gives a reasonable and global estimate of when it is desirable to schedule a job, and hence can be used as a priority.
The way to communicate this information is to set for each job a nice value that represents this priority. Because this may interfere with other jobs submitted by other means, this mechanism is made optional, although it is much better than other scheduling policies based on blind guesses of the futur (such as fair-share, qos, etc.).

There are 2 additional parameters that you can set in the PriorityParams entry of the slurm configuration in the form of param=value, separated by ,:

time_origin: as the communicated priority is a date, we need a reference point. This reference point should be in the past, not too far, to be sure that generated nice values are in the range 0 - 1<<31.
open-lmake sometimes generates dates in the past when it wrongly estimates a very short ETA with a high pressure. Taking a little bit of margin of a few days is more than necessary in all practical cases. Default value is 2023-01-01 00:00:00. Date is given in the format YYYY-MM-DD HH:MM optionally followed by +/-HH:MM to adjust for time zone. This is mostly ISO8601 except the T between date and time replaced by a space, which is more readable and corresponds to mainstream usage.
nice_factor: this is the value that the nice value increases each second. It is a floating point value. If too high, the the nice value may wrap too often. If too low, job scheduling precision may suffer. The default value is 1 which seems to be a good compromise.

Overall, you can ignore these parameters for open-lmake internal needs, the default values work fine. They have been implemented to have means to control interactions with jobs submitted to slurm from outside open-lmake.

Resources

The resources rule attributes is composed of:

standard resources cpu, mem and tmp.
excludes features, gres, licence, nodes, partition, qos, reserv : these are passed as is to the slurm daemon. For heterogeneous jobs, these attribute names may be followed by an index identifying the task (for example gres0, gres1). The absence of index is equivalent to index 0.
any other resource passed to the slurm daemon as licenses if such licenses are declared in the slurm configuration, else as gres.

Command line option

The command line option passed with the -b or --backend option is a space separate list of options. The following table describes supported option, with a description when it does not correspond to the identical option of srun.

Short option	Long option	Description
`-c`	cpus-per-task	cpu resource to use
	mem	mem resource to use
	tmp	tmp resource to use
`-C`	constraint
`-x`	exclude
	gres
`-L`	licenses
`-w`	nodelist
`-p`	partition
`-q`	qos
	reservation
`-h`	help	print usage )

Namespaces

Namespaces are used to isolate jobs. This is used to provide the semantic for the chroot_dir, repo_view, tmp_view and views attributes.

In that case, pid's are also isolated which allow reliable job end : when the top-level process exits, the namespaces are destroyed and no other process can survive. This guarantees that no daemon is left behind, uncontrolled.

Note that this is true even when chroot_dir is '/', which otherwise provides no other effect by itself.

Namespaces can be used in the following situations :

Open-lmake provides a cache mechanism allowing to prevent executing a job which was already executed in the same or another repo. However, some jobs may use and record absolute paths. In that case, the cache will be inefficient as the result in a repo is not identical to the one in another repo. This is current practice, in particular in the EDA tools community (which may be rather heavy and where caching is mostly desirable). Using the repo_view attribute is a good way to work around this obstacle.
Open-lmake tracks all deps inside the reposity and listed source dirs. But it does not track external deps, typically the system (e.g.the /usr dir). However, the chroot_dir attribute is part of the command definition and a job will be considered out of date if its value is modified. Hence, this can be used as a marker representing the whole system to ensure jobs are rerun upon system updates.
some softwares (e.g. EDA tools) are designed to operate on a dir rather than dealing with input files/dirs and output files/dirs. This goes against reentrancy and thus reliability, repeatability, parallelism etc. This problem can be solved with symbolic links if they are allowed. In all cases, it can be solved by using the tmp_view and copying data back and forth between the repo and the tmp dir. Or, more efficient, it can be solved by adequately mapping a logical steady file or dir to a per job physical file or dir (respectively).

Autodep

Autodep is a mechanism through which jobs are spied to automatically detect disk accesses. From this information open-lmake can determine if accesses were within the constraints provided by the rule and can list all deps.

Spying methods

There exists two classes of spying methods. Not all methods are supported on all systems, though.

Spying methods based on `libc` calls

This consists in spying all calls the the libc. Several mechanisms can be used to do so.

All of them consist in diverting the calls to the libc that access files (typically open, but there are about a hundred of them) to a piggy-back code that records the access before handling over to the real libc. They differ in the methods used to divert these calls to the autodep code.

This class of methods is fast as there is no need to switch context at each access. Moreover, the accessed file is first scanned to see if it is a system file (such as in /usr), in which case we can escape very early from the recording mechanism. And in practice, such accesses to system files are by far the most common case.

One of the major drawbacks is that it requires the libc to be dynamically linked. While libc static linking is very uncommon, it does happen.

`$LD_AUDIT`

Modern Linux dynamic linkers implement an auditing mechanism. This works by providing hooks at dynamic link edition time (by setting the environment variable $LD_AUDIT) when a file is loaded, when a symbol is searched, when a reference is bound, etc. In our case, we trap all symbol look up into the libc and interesting calls (i.e. those that access files) are diverted at that time.

However, some linkers do not seem to honor this auditing API. For example, programs compiled by the rust compiler (including rustc itself) could not be made working.

Such auditing code is marginally intrusive in the user code as, while lying in the same address space, it is in a different segment. For example it has its own errno global variable.

If available, this is the default method.

`$LD_PRELOAD`

This method consists in pre-loading our spying library before the libc. Because it is loaded before and contains the same symbols as the libc, these calls from the user application are diverted to our code.

this is a little bit more intrusive (e.g. the errno variable is shared) and this is the default method if $LD_AUDIT is not available.

`$LD_PRELOAD` with `jemalloc`

The use of jemalloc creates a chicken and egg problem at start up. The reason is that the spying code requires the use of malloc at start up, and jemalloc (which is called in lieu of malloc) accesses a configuration file at start up. A special implementation has been devised to handle this case, but is too fragile and complex to make it the default $LD_PRELOAD method.

Spying methods based on system calls

The principle is to use ptrace (the system call used by the strace utility) to spy user code activity.

This is almost non-intrusive. In one case, we have seen a commercial tool reading /proc/self/status to detect such a ptraceing process, and it stopped, thinking it was being reverse engineered. Curiously, it did not detect $LD_PRELOAD...

The major drawback is performance wise: the impact is more significant as there is a context switch at each system call. BPF is used, if available, to decrease the number of useless context switches, but it does not allow to filter out on file name, so it is impossible to have an early ignore of system files.

What to do with accesses

There are 2 questions to solve :

Determine the cwd. Because accesses may be relative to it (and usually are), the spying code must have a precise view of the cwd. This requires to intercept chdir although no access is to be reported.
Symbolic link processing. Open-lmake lies in the physical world (and there is no way it can do anything else) and must be aware of any symbolic link traversal. This includes the ones on the dir path. So the spying code includes a functionality that resembles to realpath, listing all traversed links.

Lying in the physical world means that symbolic links are handled like plain data files, except that there is a special bit that says it is a symbolic link. Its content is its target. For example, after the code sequence:

cd a
cat b

where b is symbolic link to c, 2 deps are recorded:

a/b (a symbolic link), as if it is modified, job must be rerun.
a/c (a plain data file), same reason.

Generally speaking, read a file makes it a dep, writing to it makes it a target. Of course, reading a file that has been written doe not make it a dep any more.

How to report accesses

When a job is run, a wrapper (called job_exec) is launched that launches the user process.

job_exec has several responsibilities, among which :

Prepare the user environment for the user code (environment variables, cwd, namespace if necessary, etc.).
Receive the accesses made by the user code (through a socket) and record them.
Determine what is a dep, what is a target etc.
Report a job digest to the server (the central process managing the dep DAG).

The major idea is that when an access is reported by the user code (in the case of libc call spying), there is no reply from job_exec back to the user code, so no round trip delay is incurred.

Deps order is kept as well

Remember that when a job is run, its deps list is approximative. It is the one of the previous run, which had different file contents. For example, a .c may have changed, including a #include directive. In case there are 2 deps d1 and d2, and d1 was just discovered, it may be out-of date and the job ran with a bad content for d1.

Most of the time, this is harmless, but sometimes, it may happen that d2 is not necessary any more (because old d1 content had #include "d2" and new one does not). In that case, this job must be rerun with the new content od d1, even if d2 is in error, as d2 might disappaer as de dep.

This may only occurs if d2 was accessed after d1 was accessed. If d2 was accessed before d1, it is safe to say the job cannot run because d2 is in error: it will never disappear.

Critical deps

The question of critical deps is a performance only question. Semantically, whether a dep is critical or not has no impact on the content of the files built by open-lmake.

During dep analysis, when a dep (call it dep1) has been used and turns out to be out-of-date, open-lmake must choose between 2 strategies regarding the deps that follow:

One possibility is to anticipate that the modification of dep1 has no impact on the list of following deps. With such an anticipation, open-lmake will keep the following deps, i.e. when ensuring that deps are up-to-date before launching a job, open-lmake will launch all necessary jobs to rebuild all deps in parallel, even if the deps have been explicitly declared parallel.
Another possivility is to anticipate that such a modification of dep1 will drastically change the list of following deps. With such an anticipation, as soone as open-lmake sees a modified dep, it will stop its analysis as the following deps, acquired with an out-of-date content of dep1 is meaningless.

The first strategy is speculative: launch everything you hear about, and we will see later what is useful. The second strategy is conservative: build only what is certain to be required.

Generally speaking, a speculative approach is much better, but there are exceptions.

Typical use of critical deps is when you have a report that is built from the results of tests provided by a list of tests (a test suite).

For example, let's say you have:

2 tests whose reports are built in test1.rpt and test2.rpt by some rather heavy means
a test suite test_suite.lst listing these reports
a rule that builds test_suite.rpts by collating reports listed in test_suite.lst

In such a situation, the rule building test_suite.rpts typically has test_suite.lst as a static dep but the actual reports test1.rpt and test2.rpt are hidden deps, i.e. automatically discovered when building test_suite.rpts.

Suppose now that you make a modification that makes test2.rpt very heavy to generate. Knowing that, you change your test suite so list a lighter test3.rpt instead. The succession of jobs would then be the following:

test1.rpt and test2.rpt are rebuilt as they are out-of-date after your modification.
test_suite.rpts is rebuilt to collate theses reports.
Open-lmake then sees that test3.rpt is needed instead of test2.rpt.
Hence, test3.rpt is (re)built.
test_suite.rpts is finally built from test1.rpt and test3.rpt.

There are 2 losses of performance here:

test2.rpt is unnecessarily rebuilt.
test1.rpt and test3.rpt are rebuilt sequentially.

The problem lies in the fact that test1.rpt and test2.rpt are rebuilt before open-lmake had a chance to re-analyze the test suite showing that the new tests are test1 and test3. Generally speaking, this is a good strategy : such modifications of the dep graph happens rather rarely and speculating that it is pretty stable by building known deps before launching a job is the right option. But here, because collating is very light (something like just executing cat on the reports), it is better to check tests_suilte.lst first, and if it changed, rerun the collation before ensuring (old) tests have run.

This is the purpose of the critical flag. Such a flag can either be passed when declaring static deps in a rule, or dynamically using lmake.depend or ldepend.

The collating rule would look like:

Set the critial flag on test_suite.lst (before or after actually reading it, this has no impact).
Read test_suite.lst.
Call ldepend on the reports listed in test_suite.lst. This is optional, just to generate parallel deps instead of automatic sequential deps (but if done, it must be before actually reading the reports).
Collate reports listed in test_suite.lst.

And the succession of job would be:

test_suite.rpts is rebuilt before analyzing test1.rpt and test2.rpt because test_suite.lst has changed.
Open-lmake sees that test3.rpt is needed instead of test2.rpt.
Hence, test1.rpt and test3.rpt are (re)built in parallel.
test_suite.rpts is finally built from test1.rpt and test3.rpt.

ETA

An ETA estimation is made possible because the execution time for each job is recorded in open-lmake book-keeping after all successful runs (if a job ends in error, it may very well have been much faster and the previous execution time is probably a better estimate than this one). When a job has never run successfully, an ETE is used instead of its actual execution time by taking a moving average of all the jobs of the same rule.

This being given, a precise ETA would require a fake execution of the jobs yet to be run which can take all deps and resources into account. But this is way too expensive, so a faster process must be done, even at the expense of precision.

In all cases, the ETA assumes that no new hidden deps are discovered and that no file is steady so that all jobs currently remaining will actually be executed.

2 approaches can be considered to estimate the time necessary to carry out remaining jobs :

Resources limited : deps are ignored, only resources are considered. Roughly, the time is the division of the quantity of resources necessary by the quantity of resources available. For example, if you need 10 minutes of processing and you have 2 cpus, this will last 10/2=5 minutes.
Deps limited : resources are ignored and only deps are considered. This means you only look at the critical path. For example if you need to run a 2 minutes job followed by a 3 minutes job, and in parallel you must run a 4 minutes job, this will last 2+3=5 minutes.

Open-lmake uses the first approach. For that it measures the parallelism of each job while running and the ETA is computed after the sum of the costs of all waiting and running jobs, the cost being the execution time divided by the observed parallelism. Jobs running for the first time inherit a moving average of the last 100 run jobs of the same rule.

Video mode

If lmake is connected to a terminal, then the terminal foreground and background colors are probed and if the brightness of the background color is less than that of the foreground color, video mode is set to normal, else it is set to reverse.

In that case, lmake output is colored and the (configurable) color set is chosen depending on video mode.

Meta data

The LMAKE dir at the root of the repo contains numerous information that may be handy for the user.

It also contains a lmake dir containing private data for open-lmake's own usage.

LMAKE/environ and LMAKE/manifest can be freely used in jobs and are considered as sources if they are listed in lmake.manifest, which is automatic by default.

`LMAKE/config_deps`, `LMAKE/rules_deps` and `LMAKE/sources_deps`

These files contain a list of files that open-lmake has read to process Lmakefile.py when reading each section (config, rules and sources).

They contain several types of lines, depending on the first char:

#: comment line
*: line contains the open-lmake installation dir
+: line contains an existing file that was read
!: line contains a non-existing file that was accessed

These contents are then used to determine if each section must be refreshed when a new lmake command is run.

`LMAKE/auto_tmp`

This dir contains the default tmp dirs of jobs.

`LMAKE/config`

This file contains a description of the lmake.config dict as it has been understood by open-lmake after having processed Lmakefile.py.

`LMAKE/debug`

This dir contains a sub-dir for each job ldebug was used for. These sub-dirs are named after the job id as displayed by lshow -i.

`LMAKE/environ`

This file contains the list of environment variables actually used in Lmakefile.py in the form of lines containing <key>=<value>.

`LMAKE/last_output`

This file is a symbolic link to the last transcript.

`LMAKE/lmakefile_tmp`

This dir contains the tmp dirs used when reading Lmakefile.py. When several passes are used, each pass has its own tmp dir. These are kept after reading for debugging purpose.

`LMAKE/manifest`

This file contains a description of the sources as it has been understood by open-lmake after having processed Lmakefile.py.

`LMAKE/matching`

This file contains a description of the matching performed by open-lmake when looking for rules to generate a file. It is composed of matching entries.

Each matching entry starts with a header providing a prefix and a suffix. Files matching the prefix and suffix are matched against the rules listed underneath.

The prefix and suffix are given in the form <prefix>*<suffix>' to mimic shell patterns. However, in this case, the *` may match a negative number of characters, i.e. the prefix and the suffix may overlap.

The listed rules are provided as:

Its priority: if a rule at given priority match and generate the file, rules of lower priority are not tried.
Its name.
The target corresponding to this prefix/suffix pair.

`LMAKE/outputs/<date>/<time>`

This file contains a transcript of the lmake command that has been run at <time> on <day>. Such logs are kept for a number of days given in lmake.config.console.history_days.

`LMAKE/quarantine`

This dir contains all files that have been quarantined. A file is quantantined when open-lmake decides it must be unlinked and it contains manual modifications, i.e. modifications made outside the control of open-lmake. In that case, in order to be sure that no user work is lost, the file is quarantined in this dir rather than unlinked.

`LMAKE/rules`

This file contains a description of the rules as they have been understood by open-lmake after having processed Lmakefile.py.

`LMAKE/targets`

This file contains the targets that have been required by lmake commands in chronological order (with duplicates removed).

`LMAKE/tmp`

This dir contains a sub-dir for each job which was run while keeping its tmp dir. These sub-dirs are named after the job id as displayed by lshow -i.

`LMAKE/version`

This file contains a state-recording version of open-lmake. If the recorded version does not match the used version, none of the open-lmake commands can be used.

Commands

Open-lmake is a package containing several commands.

The full documentation of these commands can also be obtained by running man <command>.

command execution

Most commands (ldebug, lforget, lmake, lmark and lshow) do not execute directly but instead connect to a server, or launch one if none already run.

The reason is that although several of these commands can run at the same time (including several times the same one, in particular several lmake), they all must run in the same process to stay coherent.

Among these commands, all of them except lmake run mostly instantaneously. So the serverr mostly exist to be able to run any of these commands while one or several instances of lmake are already running.

The (unique) server is created automatically when necessary and dies as soon as no more needed. So under normal situations, one does not have to even be aware of the existence of such a server.

Although the server has been carefully coded to have a very low start overhead, it may happen in rare circumstances, though, that pre-launching a server (<installation dir>/_bin/lmakeserver) leads to improved performances by avoiding to relaunch a server for each command. In such cases, the server must be run with no argument.

However, if, under a particular cicumstance, the server must be killed, best is to use signal 1 (SIGHUP) or 2 (SIGINT) as this will force the server to smoothly kill all running jobs. Other signals are not managed and will lead to the server dying abruptly, potentially leaving a lot of running jobs. This has no semantic impact as these jobs will be considered out-of-date and will rerun, but may incur a waste of resources.

commands to control build execution

These commands are meant to be run by the user outside jobs. They are:

Command	Short description
`lautodep`	run a script in an execution environmeent while recording accesses
`lcollect`	remove obsolete files and dirs
`ldebug`	run a job in a debug environement
`ldircache_repair`	repair a broken repo
`lforget`	forget history of a job
`lmake`	run necessary jobs to ensure a target is up-to-date
`lmark`	mark a job to alter its behavior w.r.t. `lmake`
`lrepair`	repair a broken repo
`lshow`	show various informations of a job
`xxhsum`	compute a checksum on a file

commands to interact with open-lmake from within jobs

These commands are meant to be run from within a job. They are:

Command	Short description
`lcheck_deps`	check currently seen deps are all up-to-date and kill job if not the case
`ldecode`	retrieve value associated with a code
`ldepend`	generate deps
`lencode`	retrieve/generate a code associated with a value
`lrun_cc`	run a compilation, ensuring include dirs and lib dirs exist
`ltarget`	generate targets

Experimental features

The features described herein are experimental as long as they have not been thoroughly used.

If you plan to use one of those, best is to be in contact with the development team to:

get dedicated support
make necessary evolutions so as to fit your needs as they appear

As a consequence, it is most probable that the specifications will evolve in a non-backward compatible way.

Cache

Several cache mechanisms will be implemented but for now, ony one exists.

DirCache

This cache is based on a shared dir and requires no running daemon.

It must be initialized with a file LMAKE/size containing the overall size the cache is allowed to occupy. The value may end with a unit suffix in k, M, G, T (powers of 1024). For example LMAKE/size can contain 1.5T.

Encoding and decoding

In some situations with heavily parameterized generated files, file names can become very long. Think of the mere compilation of a C++ file foo.c. You may want to specify:

the optimization level through a -O argument
Whether debug checks are enable through the definition of NDEBUG
a trace level through the definition of a macro such as TRACE_LEVEL
whether and how to instrument with -fsanitize
whether some internal data are 32 or 64 bits
whether to use a reference algorithm or an agressively optimized one used in production.
...

You may want to be able to generate any combination so as, for example, compare the output of any 2 of them for validation purpose. You easily end up with an object file with a name such as foo.O3.NDEBUG.TRACE_LEVEL=1.sanitize=address.32.o. Already 50 characters or so. In a real projects, file names can easily be 200, 300, 400 characters long.

As long as the file name, with adequate shorthands such as using TL instead of TRACE_LEVEL fits within a few hundreds of characters, the situation is heavy but manageable. But if you need, say 3000 characters to specify a file, then it becomes completely impractical.

When the configuration can be devised in advance, in a stable way, an efficient alternative is to create a file to contain it, which becomes a configuration name, and just specify the configuration name in the generated file.

In the example above, you may have a file test.opts that contains options for testing and prod.opts that contains options for production. then, your object file is simply named foo.test.o or foo.prod.o.

When it is not, the situation is more complex and you need to automatically generate these configuration files with reasonably short names. A practical and stable way to generate short names is to compute a checksum on the parameters. You then need a way to retrieve the original parameters from the checksum to generate the generated file (the .o file in our example). In doing so, you must account for:

robustness : because such checksums are subject to the birthday paradox, you need either to deal with collisions are provide enough margin (roughly doubling the size) to avoid them.
repeatability : your system must not prevent you from being able to repeat a scenario that was generated some days, weeks, months earlier.
merging : when you invent a name, think that some colleagues working on the same project may also invent names, and they may collide. Tools such as git are there to help you in this process, but your scheme must be git friendly.
performance : you must have a scheme that support as many code/value associations as necessary for your case, without spending most of its time searching for value when given a code.
communication : ideally, you may want to signal a bug to a colleague by just telling him "build that target, and you see the bug". If the target refers to a code, he may need some further steps to create the code/value association, which goes against communication.

One way to deal with this case is to create a central database, with the following pros and cons:

robustness : collisions can easily be dealt with.
repeatability : this is a probleme. When dealing with collisions, some codes change, which change old repo because the database is not itself versioned. This is a serious problem.
merging : no merging.
perfomance : accessing the data in a performant way is easy. Detecting modifications so that open-lmake can take sound decisions may be more challenging.
communication : excellent, the database is shared
installation : you need a server, configure clients to connect to it, etc. it is some work
maintainance : as any central services, you may inadvertently enter wrong data, you need a way to administer it as it has the potential to block the whole team.

The lencode/ldecode commands (or the lmake.encode/lmake.decode fonctions) are there to address this question.

The principle of operation is the following:

There are a certain number of files storing code/value associations. These are sources seen from open-lmake, i.e. they are normally managed by git.
To keep the number of such files to a reasonably low level (say low compared to the overal number of sources), there are contexts, mostly used as a subdivision of files
So, a file provides a certain number of tables (the contexts), each table associating some codes with some values
These tables are stored in files as lines containing triplet : context, code, value
When reading, lencode/ldecode are very flexible. The files may contain garbage lines, duplicates, collisions, they are all ignored. When 2 values are associated with the same code by 2 different lines, a new code is generated by lengthening one of them with further digites of the checksum computed on the value. When 2 codes are associated with the same value by 2 different lines, only one code is retained, the shorter of the 2 (or any if of equal length).
When writing, lencode/ldecode are very rigid. File is generated sorted, with no garbage lines, nor duplicates, or collisions.
When open-lmake starts and read a file, it write it back in its canonical form.
When open-lmake runs, that lencode is used and generate new codes on the fly, additional lines are merely appended to the file.

This has the following properties:

Information is under git. No further server, central database, management, configuration etc.
repeatability is excellent. As long as you do not merge, your are insensitive to external activities. When merging, the probability of collision depends on the length of the used codes, which is under user control. Moreover, the length increasing automatically with collisions maintain the number of such collision to a reasonably low level, even in fully automatic mode.
Merging is very easy : actually one need not even merge. The simple collision file generated by git can be used as is. This makes this tool very git friendly.
Robustness is perfect : collisions are detected and dealt with.
Coherence is perfect : seen from open-lmake, each association is managed as a source. If anything changes (i.e. a new value is associated with an old code or a new code is associated with an old value), the adequate jobs are rerun.
Performance is very good as the content of the file is cached in a performance friendly format by open-lmake. And update to the file is done by a simple append. However, the file is sorted at every lmake command, making the content more rigid and the merge process easier.
Associations files can be editing by hand, so that human friendly codes may be associated to some heavily used values. lencode will only generate codes from checksums, but will handle any code generated externally (manually or otherwise). In case of collision and when open-lmake must suppress one of 2 codes, externally generated codes are given preference as they believed to be more readable. If 2 externally generated codes collide, a numerical suffix is appended or incremented to solve the collision.

Sub-repos

Sub-repos are repos that contain repos, i.e. some Lmakefile.py are present in sub-dirs.

In that situation, it is reasonable to assume that the Lmakefile.py are made to handle building files underneath it.

To support this situation, open-lmake allow you to simply mention such sub-repos, so that:

Targets only match within the sub-repo (and escape is possibly by setting the top flag to the target to provide global rules).
The same applies to deps.
cmd is run from this sub-repo, i.e. its cwd is set accordingly.
The priority of deeper rules are matched first, so that builds in a sub-repo is not pertubated by rules of englobing repo.

Glossary

Acronyms

Acronym	Definition	Comment
CAD	Computer Aided Design
ETA	Estimated Time of Arrival	(from aeronautic jargon) This is the date at which a given event is estimated to occur
ETE	Estimated Time Enroute	(from aeronautic jargon) This is the remaining time necessary to complete a task
LRU	Least Recently Used	A classical cache replacement policy where the entry that was least recently used is discarded when a new one is allocated
MRO	Method Research Order	The inheritance chain from the current class to its most basic base, usually `object`

Abbreviations

Some words are so heavily used in this documentation that abbreviating them greatly improve readability.

Abbreviation	Definition
dep	dependency
dir	directory
repo	repository

Vocabulary in messages

Regularly, open-lmake generate some messages. In order to keep such messages reasonably terse, a dedicated vocabulary is used. This vocabulary is meant to be intuitive, but a full explanation is given here.

Consider

When open-lmake finds an error and has a reasonable suggestion to fix it, it also generates an action likely to be done by the user. Wuch action are generated in a form that can be as directly copy-pasted as possible.

When such action is linked to the source versioning system, git is assumed (which is by far the most commonly used system). In case another source versioning system is used, the user must adapt the suggested action.

Dangling

A dangling file refers to an existing file with no way to generate it (i.e. no rules apply) but which is not controlled by the source versioning system.

Open-lmake considers depending on such a file an error as such dep would go against repeatability, i.e. a git push followed by a git pull in another repo would not transport said file.

Manual

A manual file refers to a file that has been modified outside open-lmake control (i.e. it is not the result of the execution of a job).

Official job

The official job of a target is the one that would be selected to generate it upon execution of the command lmake <target>. If no such job exists, the official state of such target is to not exist.

When open-lmake needs a dep for a job, it ensures its content is its official content (i.e. it is generated by its official job) for the job to be up-to-date.

A target may be generated by other means, such as written by a job as a side_target or allowed to be written by a call to lmake.target or ltarget. This does not make such target officially generated, but actually generated : open-lmake has no means to find out the right job to execute should it need to (re)generate it.

Polluted

A polluted file refers to a file that has been actually generated by a non-official job.

Quarantine

Before executing a job, open-lmake ensures its targets do not exist (unless they are incremental) to ensure repeatable executions. This may lead it to unlink such files if they have been previously generated.

However, if a target is determined to be manual (cf. above), it might contain valuable information that the user would be upset to lose.

In such cases, unless open-lmake can determine that no valuable information is present, instead of unlinking the file, it moves it to the LMAKE/quaranntine dir where it can be retrieved by the user. Open-lmake considers that there is no valuable information in 2 cases:

the file is empty
the file is identical to its previous content (typically, it has been edited, saved, but not modified)

Steady

A job is steady when it has been run, but all targets have been generated identically to the previous run, i.e. the job could have not ben run with no semantic consequences.

This may be an indication that the flow is suboptimal. Consider 2 cases:

a source file is modified, but only comments have been touched. Compilation is run as the source is modified, but the result is identical to its previous content.
a source file includes an include file (e.g. in C) or import a module (e.g. in Python), but does not actually use such include file/module.

In the first case, it is very difficult to devise a flow to avoid such compilation. A possibility would be to split the compilation process into 2 parts, the first one filtering out comments, this has generally adverse consequences (such as line numbers being altered or source file name being difficult to trace).

In the second case, the solution is probably pretty trivial : just suppress the contemplated include/import line.

Uniquify

When hard links are used and open-lmake decides that one of the link must be regenerated, sharing is no more possible.

If such a target is not incremental, it will be unlinked, regenerated and the other link will not be modified. But if such a target is incremental, it is not unlinked but a copy must be done to split the links between those that must be updated by the job execution and those the must not.

This process is called "uniquify".

A class D inherits from B and C in that order.
Both B and C inherit from a class A.
A method m is defined on A and C but not on B.
Then if m is called from an instance of D, C.m will be called and not B.m (which turns out to be A.m).

It is extensively described here.

This feature is a central point that makes python multiple inheritance easy to use and enables the class hierarchy shopping list style.

python computes the MRO in such a way as to enforce the diamond rule.

open-lmake documentation