This is a experimental translator of C (ISO/IEC 9899:2018) programs to EO programs.
Assuming, you are on Ubuntu 22.04+:
$ apt update
$ apt install -y software-properties-common
$ apt-key adv --keyserver keyserver.ubuntu.com --recv-keys F7C91591CC543ECA
$ add-apt-repository 'deb http://c2eo.polystat.org/debian/ c2eo-rep non-free main contrib'
$ apt-get install -y clang
$ apt-get install -y c2eoThen, just run:
$ c2eo <path-to-c-file-name> <eo-file-name>.eoYou can also use yegor256/c2eo image via Docker:
$ docker run -v $(pwd):/eo yegor256/c2eo:<tag> hello.c hello.eoAssuming you have hello.c in the current directory, the hello.eo will be created next to it.
We do not support the utility for other distributions and operating systems yet. However, you can try to build the project from source at your own risk.
Again, we recommend Ubuntu 22.04+ and you will need wget 1.21+, tar 1.30+, git 2.32.+, cmake 3.18+, gcc 11.2.+, g++ 11.2.+, ninja-build 1.10.1+, clang 14.0.0+ and python3 3.10.0+. You will also need requirements for the EO project (Maven 3.3+ and Java 8+)
Then, you need to install GTest 1.12.1+
$ apt install libgtest-dev googletest
$ cd /usr/src/googletest
$ cmake .
$ make
$ lib
$ cp *.a /usr/local/libAfter that, you need to install LLVM/Clang 12.0.1 or you may use an alternative way below this code:
$ wget https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-12.0.1.tar.gz
$ tar -xvf llvmorg-12.0.1.tar.gz
$ mv ./llvm-project-llvmorg-12.0.1 ./llvm-clang
$ cd llvm-clang
$ mkdir build && cd $_
$ cmake --no-warn-unused-cli -DBUILD_SHARED_LIBS:STRING=ON -DLLVM_TARGETS_TO_BUILD:STRING=X86 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=TRUE "-DLLVM_ENABLE_PROJECTS:STRING=clang;compiler-rt" -DCMAKE_BUILD_TYPE:STRING=Debug -DLLVM_OPTIMIZED_TABLEGEN:STRING=ON -DLLVM_USE_SPLIT_DWARF:STRING=ON -DLLVM_USE_LINKER:STRING=gold ../llvm -G Ninja
$ cmake --build . --config Debug --target all -j 10 -- -j1 -l 2
$ cd ../..You may also try our own pre-packaged archive:
$ apt install megatools
$ megadl 'https://mega.nz/#!cZ9WQCqB!z713CuC-GNFQAXIxZwZxI05zOH4FAOpwYHEElgOZflA'
$ tar -xvf llvm-clang.tar.gzIt is assumed that the llvm-clang dir is located in the c2eo dir. If your llvm-clang is in different place, set the path in that line.
Formally speaking, this is where the preparation can be completed. However, in order to fully work with the project, testing and executing the translated code, you need to study the EO compiler project and fulfill its necessary requirements. After that, it will be possible to proceed with further steps.
All sources files of transpiler are located in project/src/transpiler. The transpiler's work begins with the code from the source file project/src/transpiler/main.cpp. Аfter making changes in these files, we will need to rebuild the executable file c2eo. To do this, you need to go to the project dir. For the first time, create the build folder:
$ mkdir buildthen go to the build folder and run the following commands:
$ cmake ..
$ makeAs you have already noticed, the project is being built in the project/build folder. The result of this build is the c2eo file in project/bin. Now you have a transpiler and you can convert programs from C to EO. Just run:
$ ./c2eo <path-to-c-file-name> <eo-file-name>.eo
# ./c2eo ../some_dir/example.c example.eoYour PR will pass the following checks, so before creating PR run these locally to make sure everything is ok:
$ clang-format project/src/transpiler/*.(cpp|h) -i $ cpplint --filter=-runtime/references,-runtime/string,-build/c++11 project/src/transpiler/** $ cd project/scripts
$ python3 clang_tidy.py$ cd project/scripts
$ python3 transpile.py <your_path_to_the_folder>/gcc.c-torture -s gcc -n$ cd project/scripts
$ python3 test.py -p <your_path_to_the_folder>/c-testcuite -s testcuite -n- test
$ cd project/scripts
$ python3 test.py -s test- unit-tests
$ cd project/scripts
$ python3 build_c2eo.py
$ project/bin/
$ ./unit_tests --gtest_filter=*From project/scripts/ directory:
$ python3 update-release.py -h
usage: update-release.py [-h] [--branch BRANCH] [--version VERSION]
Release maker
optional arguments:
-h, --help show this help message and exit
--version VERSION specify the new versionExample
$ python3 update-release.py --version=0.1.1To use this script, make sure you have the following packages installed:
$ pip3 install git_config pgpy s3cmd
$ apt install md5deep reprepro gcc cmake dpkg wget tar s3cmd -y
# for the latest version of the cmake package, try:
$ pip3 install cmakeNotes:
- Use
.as a version delimiter. - This script uses the current date, time, and time zone. Make sure they are configured correctly.
- This script extracts your name and email from
git config. Make sure you have them.
This script will write automatically generated merges to the changelog file. You can view an approximate list of changes by running the following command in the terminal:
$ git log $(git describe --tags --abbrev=0)..HEAD --merges --oneline --format=" * %h %s by %an <%aE>"- Build the executable file.
- Create a deb file (basic: HABR)
- Create a repository (basic: UNIXFORUM)
- Upload a repository tree into the bucket's virtual 'directory'.
The following files will be generated
$ tree
.
├── c2eo-X.X.X
│ ├── DEBIAN
│ │ ├── changelog
│ │ ├── control
│ │ ├── copyright
│ │ └── md5sums
│ └── usr
│ ├── bin
│ │ └── c2eo
│ └── lib
│ ├── libclangAnalysis.so
│ ├── libclangAnalysis.so.12
│ ├── ...
│ └── libLLVMTransformUtils.so.12
├── c2eo-X.X.X.deb
├── readme.md
├── repository
│ ├── conf
│ │ └── distributions
│ ├── db
│ │ ├── checksums.db
│ │ ├── contents.cache.db
│ │ ├── packages.db
│ │ ├── references.db
│ │ ├── release.caches.db
│ │ └── version
│ ├── dists
│ │ └── c2eo-rep
│ │ ├── contrib
│ │ │ ├── binary-amd64
│ │ │ │ ├── Packages
│ │ │ │ ├── Packages.gz
│ │ │ │ └── Release
│ │ │ ├── binary-i386
│ │ │ │ ├── Packages
│ │ │ │ ├── Packages.gz
│ │ │ │ └── Release
│ │ │ ├── debian-installer
│ │ │ │ ├── binary-amd64
│ │ │ │ │ ├── Packages
│ │ │ │ │ └── Packages.gz
│ │ │ │ └── binary-i386
│ │ │ │ ├── Packages
│ │ │ │ └── Packages.gz
│ │ │ └── source
│ │ │ ├── Release
│ │ │ └── Sources.gz
│ │ ├── InRelease
│ │ ├── main
│ │ │ ├── binary-amd64
│ │ │ │ ├── Packages
│ │ │ │ ├── Packages.gz
│ │ │ │ └── Release
│ │ │ ├── binary-i386
│ │ │ │ ├── Packages
│ │ │ │ ├── Packages.gz
│ │ │ │ └── Release
│ │ │ ├── debian-installer
│ │ │ │ ├── binary-amd64
│ │ │ │ │ ├── Packages
│ │ │ │ │ └── Packages.gz
│ │ │ │ └── binary-i386
│ │ │ │ ├── Packages
│ │ │ │ └── Packages.gz
│ │ │ └── source
│ │ │ ├── Release
│ │ │ └── Sources.gz
│ │ ├── non-free
│ │ │ ├── binary-amd64
│ │ │ │ ├── Packages
│ │ │ │ ├── Packages.gz
│ │ │ │ └── Release
│ │ │ ├── binary-i386
│ │ │ │ ├── Packages
│ │ │ │ ├── Packages.gz
│ │ │ │ └── Release
│ │ │ ├── debian-installer
│ │ │ │ ├── binary-amd64
│ │ │ │ │ ├── Packages
│ │ │ │ │ └── Packages.gz
│ │ │ │ └── binary-i386
│ │ │ │ ├── Packages
│ │ │ │ └── Packages.gz
│ │ │ └── source
│ │ │ ├── Release
│ │ │ └── Sources.gz
│ │ ├── Release
│ │ └── Release.gpg
│ └── pool
│ └── main
│ └── c
│ └── c2eo
│ └── c2eo_X.X.X_all.deb
├── todo.sh
└── update-release.py
35 directories, 120 files
Then you have to upload ./repository/dists and ./repository/pool to c2eo.polystat.org/debian/.
C is a system-level procedural programming language with direct access to the underlying hardware architecture elements, such as memory and registers. EO, on the other hand is a high-level object-oriented language. There are a number of non-trivial mechanisms of translating constructs from the former to the latter, which are explained below:
✔️ Implemented:
- basic data types: double, int, bool, char, string
- const
- arrays
- structures
- unions
- functions
- function call operators
- multiple return
- pointers
- external links
- if-else
- ternary operator
- while
- do-while
- for
- break
- continue
- switch case default
- operators
Let's take the following C code as an example:
double z = 3.14;In EO, we represent the global memory space as a copy of ram object, which we call global. Thus, the variable z would be accessed as a block of 8 bytes inside ram at the very beginning, since it's the first variable seen. For example, to change the value of z we write 8 bytes to the 0th position of global:
ram > global
global.write 0 (3.14.as-bytes)We transform const like ordinary variable.
const int a = 3;
if (a == 10) {
...
}a.write-as-int32 3 // only once
if
a.read-as-int32.eq 10
seq
...
TrueWe can work with enumerated types as well as with constants and substitute numeric values instead of names.
enum State {Working = 1, Failed = 0};
if (10 == Working) {
...
}if
10.eq 1
seq
...
True
seq
TrueIf we have fixed-size arrays we can work like with one-dimension array and calculate bias from start for any element and dimensions. In this example, we use a special object address, which makes it more convenient to read and write information from memory from a certain position.
int a[2] = { 5, 6 };
╭─────┬─────╮
| 5 │ 6 │
├─────┼─────┤
| 0th │ 4th │
╰─────┴─────╯address global-ram 0 > a
a.write (4.mul 0) (5.as-bytes)
a.write (4.mul 1) (6.as-bytes)We know the size of structures so we generate additional objects that store the bias of the fields of the structure and allow access to them. For nested structures and other types, we can also calculate bias and generate corresponding objects.
struct Rectangle {int x; int y;} rect;
rect.x = 5;
╭───────┬───────╮
| int x │ int y │
├───────┼───────┤
| 0th │ 4th │
╰───────┴───────╯address global-ram 0 > rect
0 > x
4 > y
(rect.add x).write 5The size of the union is determined by the nested object with the maximum size. The main feature is that internal objects are located at the beginning of the same address. We do the same with nested structures.
union { int a; int b; } u;
u.a = 5;
╭───────┬───────╮
| int a │ int b │
├───────┼───────┤
| 0th │ 0th │
╰───────┴───────╯address global-ram 0 > u
0 > a
0 > b
(u.add a).write 5In a similar way we deal with function call, we calculate the necessary space for arguments (param-start and param-size) and local variables in global for each function call. The variable r will be "pushed" to global and accessible by the code inside the function foo by the 0th position with local offset. The local variable x will also be pushed to the global and will be accessible by the 4th with local offset, because the length of int is four.
Also we use separate copy of ram named return for storing function return result. Here, we are trying to simulate the bevaviour of a typical C compiler. The declaration of foo and its execution may look like this:
double pi = 3.14;
void circle(int r) {
double x = 2 * pi * r;
return x;
}
circle(10);
╭──────────┬───────┬──────────╮
| double z │ int r │ double x │ // variables in global
├──────────┼───────┼──────────┤
| 0th │ 8th │ 12th │ // start position in global
╰──────────┴───────┴──────────╯address global-ram 0 > pi
[param-start param-size] > circle
global.read param-start > r
global.read (add param-start 4) > x
seq > @
x.write (2.mul (pi.mul r))
return.write x
seq
pi.write 0 3.14
global.write 8 10 // write 10 to circle arguments stack
circle 8 4 // arguments stack start from 8th byte and have 4 bytes for rThe function has input variables and local variables. To determine the amount of memory for input variables, we use two parameters in the function description. For the convenience of accessing local variables, we use the bias local-start of the local position. To indicate a free position, we use empty-local-position. We divide the nested function call into several consecutive calls, the result of which is passed to subsequent calls.
long long func1(long long x) {
return x - 111;
}
long long func2(long long x) {
return x - 10;
}
void main() {
long long a;
a = func1(func2(5));
printf("%lld\n", a);
} [param-start param-size] > func1
add param-start param-size > local-start
add local-start 0 > empty-local-position
address global-ram (param-start.add 0) > x
seq > @
return.write (x.sub 111)
TRUE
[param-start param-size] > func2
add param-start param-size > local-start
add local-start 0 > empty-local-position
address global-ram (param-start.add 0) > x
seq > @
return.write (x.sub 10)
TRUE
[] > main
seq > @
a.write // write func1 return in a
seq
write // write func2 return in temp place
address global-ram (add empty-local-position 0)
seq
write // write 5 to func2 arguments stack
address global-ram (add empty-local-position 0)
5
^.func2 empty-local-position 8
return
^.func1 empty-local-position 8
return
printf "%d\n" aWe generate a record of the result in a separate ram memory object. Further, other functions can read the result from there. To solve the multiple return problem, we can use the goto object in eo. By wrapping the entire function in a similar object, we can interrupt its execution at any time. To do this, you just need to generate a g.forward call for each return.
function {
...
return <result_1>;
...
return <result_2>;
...
return <result_3>;
}[] > function
goto > @
[g]
seq > @
...
return.write <result_1>
g.forward TRUE
...
return.write <result_2>
g.forward TRUE
...
return.write <result_3>
g.forward TRUEC code may get an address of a variable, which is either in stack or in global memory:
int f = 7;
void bar() {
int t = 42;
int* p = &t; // local scope
*p = 500; // write from local scope to local
p = &f; // global scope
*p = 500; // write from local scope to global
}
╭───────┬───────┬────────╮
| int f │ int t │ int* p │ // variables in global
├───────┼───────┼────────┤
| 0th │ 4th │ 8th │ // start position in global
╰───────┴───────┴────────╯However, as in C, our variables are located in global and have absolute address.
The object param-start provided as an argument to EO object bar is a calculated offset in global addressing the beginning of the frame for function call. Thus, &t would return param-start + 0, while &f would be just 0:
[param-start] > bar
global.write
8 // int* p
param-start // &t -> function offset position in global space
global.write
8
0 // &f -> address of f in global
seq > @
bar 4To compile files with any external links, we use the following solution:
In the file where the external call is used, we generate the following alias
#include <string>
strncpy(str2, str1, 8);+alias c2eo.external.strcpy
strncpy str2 st1 8Сreating a file of the same name by the specified alias with an empty implementation
+package c2eo.external
[args...] > strncpy
TRUE > @In EO, we have analogues of if-else and if objects, so we just convert without any significant changes.
if (condition) {
...
}
else {
...
}
// -----------------
if (condition) {
...
}if-else
condition
seq
...
TRUE
seq // else
...
TRUE
// -----------------
if
condition
seq
...
TRUEWe can turn the ternary operator into the same if-else, only seq must be without True at the end, because its return value will be used.
condition ? a : bif-else
condition
seq
a
seq // else
bWe can generate of C while on the EO by using goto, conditional operator and analogs for break and continue.
while (condition) {
...
}goto
[while-loop-label]
while-loop-label.backward > continue
while-loop-label.forward TRUE > break
if > @
condition
seq
body
continue
TRUEWe can generate an analog of C do-while on EO by using nested goto for further checking by a conditional operator and analogs for break and continue.
do {
body
} while (condition)goto
[do-while-loop-label-1]
do-while-loop-label-1.forward TRUE > break
seq > @
goto
[do-while-loop-label-2]
do-while-loop-label-2.forward TRUE > continue
body > @
if
condition
do-while-loop-label-1.backward
TRUEWe can generate an analog of C for on EO using the nested goto to execute loop-expression after executing the body of the loop, conditional operator and analogs for break and continue.
for(init;condition;loop-expression) {
body
}init
goto
[for-loop-label-1]
for-loop-label-1.forward TRUE > break
if > @
condition
seq
goto
[for-loop-label-2]
for-loop-label-2.forward TRUE > continue
body > @
loop-expression
for-loop-label-1.backward
TRUEWith goto object we can transofrm any number of breaks in cycle to g.forward TRUE call.
while (condition) {
...
break;
...
}goto
[while-loop-label]
while-loop-label.backward > continue
while-loop-label.forward TRUE > break
if > @
condition
seq
...
break
...
TRUEWith goto object we can transofrm any number of continue in cycle to g.backward call.
while (condition) {
...
continue;
...
}goto
[while-loop-label]
while-loop-label.backward > continue
while-loop-label.forward TRUE > break
if > @
condition
seq
...
continue
...
TRUEWe can convert such simple switch statement to goto object.
switch (x): {
case 1:
op1;
break;
case 2:
case 3:
op2;
break;
case 4:
op3;
case 5:
op4;
break;
case 6:
default:
op6:
break;
} memory > flag
goto > @
[end]
seq > @
write flag 0
if
or (eq x 1) flag
seq
write flag 1
op1
end.forward TRUE
TRUE
if
or (eq x 2) flag
seq
write flag 1
TRUE
if
or (eq x 3) flag
seq
write flag 1
op2
end.forward TRUE
TRUE
if
or (eq x 4) flag
seq
write flag 1
op3
TRUE
if
or (eq x 5) flag
seq
write flag 1
op4
end.forward TRUE
TRUE
if
or (eq x 6) flag
seq
write flag 1
TRUE
op6
end.forward TRUE
TRUEThe table of all C operators and similar objects in the EO.
| С | EO |
|---|---|
| + | plus |
| - | minus |
| * | times |
| * | write|read-as-<type> |
| / | div |
| = | write-as-<type> |
| % | mod |
| +x | pos |
| -x | neg |
| ++x | pre-inc-<type> |
| x++ | post-inc-<type> |
| --x | pre-dec-<type> |
| x-- | post-dec-<type> |
| == | eq |
| != | neq |
| < | lt |
| <= | lte |
| > | gt |
| >= | gte |
| && | and |
| || | or |
| ! | not |
| & | bit-and |
| & | addr-of |
| | | bit-or |
| ^ | bit-xor |
| ~ | bit-not |
| << | shift-right |
| >> | shift-left |
| (type casting) | as-<type> |
x += 10;For assignment operations, we generate the following constructs
x.write (x.add 10)In EO, an implementation of at least 8 bytes is used to store floating-point numbers. At the moment, full support for numbers with fewer bytes is not possible. So far, to work with such numbers, we also use 8 bytes for storage.
float b = 5.0; // 4 byteswrite-as-float32 b 5.0 // 8 bytesAt the moment, the largest type in EO is int64, there is no support for uint64 numbers and it crashes with an error at the compilation stage. The current our implementation supports numbers in the range of type uint63 (the first bit is always 0 for correct translation to int).
unsigned long long int c = 18446744073709551615; // max uint64 value
unsigned long long int c = 9223372036854775807; // max int64 value// [COMPILATION EXCEPTION] the number is too high
write-as-uint64 c 10223372036854775807
// correct
write-as-uint64 d 9223372036854775807Source: https://stackoverflow.com/questions/840501/how-do-function-pointers-in-c-work
Let's start with a basic function which we will be pointing to:
int addInt(int n, int m) {
return n + m;
}First thing, let's define a pointer to a function which receives 2 ints and returns an int:
int (*functionPtr)(int, int);Now we can safely point to our function:
functionPtr = &addInt;In EO we generate special object call with array for storing all function call:
[index param-start param-size] > call
at. > @
*
<function_name_1> param-start param-size
addInt param-start param-size // our function has an index of 1
...
<function_name_n_n> param-start param-size
indexNow, if we want to assign the function to a pointer, we replace this expression with a specific index value of this function in our array
write-as-ptr functionPtr 1Now that we have a pointer to the function, let's use it:
int sum = (*functionPtr)(2, 3); // sum == 5... // before calling the function, we place its arguments in memory
write-as-int32
sum
call
param-start
param-size
read-as-ptr functionPtr // return 1Current development at this stage
Passing the pointer to another function is basically the same:
int add2to3(int (*functionPtr)(int, int)) {
return (*functionPtr)(2, 3);
}We can use function pointers in return values as well (try to keep up, it gets messy):
// this is a function called functionFactory which receives parameter n
// and returns a pointer to another function which receives two ints
// and it returns another int
int (*functionFactory(int n))(int, int) {
printf("Got parameter %d", n);
int (*functionPtr)(int, int) = &addInt;
return functionPtr;
}But it's much nicer to use a typedef:
typedef int (*myFuncDef)(int, int);
// note that the typedef name is indeed myFuncDef
myFuncDef functionFactory(int n) {
printf("Got parameter %d", n);
myFuncDef functionPtr = &addInt;
return functionPtr;
}The EO language has a goto object that supports transitions similar to the continue and break statements. We also use it to implement multiple return statements to terminate a function.
The goto statements in the C language provides a variety of jumps to the appropriate labels, violating the principles of structured programming. It cannot be implemented for many situations using the goto object that EO has. Replacing goto in C with other statements requires additional effort. It is very difficult to implement based on AST analysis alone. For example:
if (a) {
A();
goto L3;
}
B();
L1:
if (b) {
L2:
C();
L3:
D();
goto L1;
}
else if (c) {
E();
goto L2;
}
F();stateDiagram-v2
state "if (a)" as if_1
state "if (b)" as if_2
state "else if (c)" as if_3
state "L1:" as L1
state "L2:" as L2
state "L3:" as L3
state "A();" as A
state "B();" as B
state "C();" as C
state "D();" as D
state "E();" as E
state "F();" as F
[*] --> if_1
if_1 --> A: True
A --> L3
if_1 --> B: False
B --> L1
L1 --> if_2
if_2 --> L2: True
L2 --> C
C --> L3
L3 --> D
D --> L1
if_2 --> if_3: False
if_3 --> E: True
E --> L2
if_3 --> F: False
F --> [*]
To solve this problem in the future, we can propose to implement goto-statement and goto-label objects in EO, which directly implement the goto semantics of the C language and are used when transforming from C to EO. The use of these objects in direct EO programming can be disabled. In addition, these constructs can be replaced when performing static analysis of EO programs.
Modern C compilers do not have direct support functions with a variable number of arguments. This is due to the use of new standards for passing arguments through registers. Therefore, an additional library is used for implementation, the interface of which is connected via the stdarg.h header file. The library describes such constructs as va_list\verb, va_start, va_arg, va_end and others that are added to a C program at link time. Therefore, the implementation of these functions during transformation is impossible. For example:
#include <stdarg.h>
#include <stdio.h>
double average(int num,...) {
va_list valist;
double sum = 0.0;
int i;
/* initialize valist for num number of arguments */
va_start(valist, num);
/* access all the arguments assigned to valist */
for (i = 0; i < num; i++) {
sum += va_arg(valist, int);
}
/* clean memory reserved for valist */
va_end(valist);
return sum / num;
}
int main() {
printf("Average of 1, 2, 3, 4 = %f\n", average(4, 1, 2, 3, 4));
printf("Average of 1, 2, 3 = %f\n", average(3, 1, 2, 3));
}In the further development of the project, it may be to try implementing a linker that combines EO library packages with compilation units obtained during the transformation of C programs into a single program. Another possible, but labor-intensive option could be additional parsing of functions that support a variable number of parameters. This option introduces additional features, but leads to deviations from the standard of C compiler.
In the C language, bitwise fields can be formed as structures. They provide access to individual bits of signed and unsigned numbers. Since at the moment EO does not support work at the bit level, the implementation of work with bit fields requires a lot of time and manipulations with logical operations at the byte level. This task was not a priority during the current development.
// memory-optimized date storage structure
struct date {
unsigned int day: 5; // the maximum value of days is 31, so we need 5 bits for this
unsigned int month: 4; // the maximum value of months us 12, so weed 4 bits for this
unsigned int year;
};
struct date d = {15, 7, 2022};
printf("Date size is: %lu bytes\n", sizeof(d)); // 8 bytes instead 12
printf("Date is %d.%d.%d", d.day, d.month, d.year); // 15.7.2022