diff --git a/assets/static/img/codelogician-process.png b/assets/static/img/codelogician-process.png
new file mode 100644
index 00000000..e81a7430
Binary files /dev/null and b/assets/static/img/codelogician-process.png differ
diff --git a/code-logician-tut.md b/code-logician-tut.md
new file mode 100644
index 00000000..93c115fe
--- /dev/null
+++ b/code-logician-tut.md
@@ -0,0 +1,368 @@
# CodeLogician Tutorial

*CodeLogician* helps you write better code by formalizing it into mathematical models, reasoning about it logically, generating test cases, and converting the results back into your code. It is an agent that combines "statistical AI" (e.g. LLMs) with "symbolic AI" (e.g. automated reasoning, formal methods), a combination referred to as "neurosymbolic AI." The entire process is powered by ImandraX, the latest version of our reasoning engine, which we've used in large industrial projects.

## Your first neurosymbolic AI experience
Before we dive deep into the details, let's get you warmed up with a quick example: a toy Python program with a verification statement expressed as a comment. The steps below depend on how you're using *CodeLogician*.

```python
# A simple Python program with a verification statement:
def abs_diff(x: int, y: int) -> int:
    return x - y if x > y else y - x
# Verify: abs_diff(x, y) >= 0 for all integers x and y
```

### Using VS Code Extension
Step 1:
```
Describe the steps of using VS Code Extension
```
Step 2:

Step 3:

### Using Cursor MCP Server
Step 1:
```
Describe the steps of using cursor
```
Step 2:

Step 3:

### Using Python library/API
Step 1:
```python
Describe the steps of using library/API
```
Step 2:

Step 3:

And there you have it - your first neurosymbolic AI experience!

To recap, you have just:
- Written some code in Python (going forward, we'll call this the *source program*, written in the *source language*)
- Used *CodeLogician* to turn it into a formal (mathematically precise) *model* in ImandraX, together with a *verification goal* (this process is called *autoformalization*)
- Used *ImandraX* to verify the goal and confirm it's correct (this process is referred to as *formal verification*)
- Asked *CodeLogician* to extract test cases from the *ImandraX* analysis
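To make this concrete, here's roughly what the formal side of that warm-up looks like. This is a hand-written sketch of an *IML* model for the toy program above, together with its verification goal (the model the agent actually produces may differ in detail):

```ocaml
(* Sketch of an IML model of the toy Python program above *)
let abs_diff x y = if x > y then x - y else y - x

(* Verification goal: the result is never negative *)
verify (fun x y -> abs_diff x y >= 0)
```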
## Overview of the process

*CodeLogician* is a LangGraph agent that uses LLMs, *ImandraX* and various tools (e.g. static analyzers) to convert your source program into a formal model, help you verify its properties, correct the original source code and extract test cases.

When you're using *CodeLogician*, you're incrementally creating a model of your source code. The two stages a model can reach in *ImandraX* are as follows:
- Model admitted to *ImandraX* - there are no missing definitions and type-checking passes
- Model admitted to *ImandraX* and executable - there are no opaque functions (either they're fully implemented or substituted with approximations)

We'll go into approximations in more detail later, but for now you can think of them as *mocking* functionality you don't have access to (e.g. third-party services or libraries) or computation that is difficult to reason about (e.g. the *exp* function).

One way to describe this process is *model-based software engineering*, or alternatively working with a *digital twin*. Either way, the idea is the same - you're working with some non-trivial source code, and you want to make sure you have all the edge cases covered, you can explain the behavior, and you can generate intelligent test cases. Doing this directly in Python or other *source languages* (e.g. Java) is very difficult (for various reasons, beginning with their type systems), so a useful approach is to create a model of the code you're implementing that you can reason about mathematically. Think of using Matlab to simulate an airplane control algorithm before implementing it as part of the actual physical airplane. With the help of LLMs and formal tools (e.g. static analyzers), we transform your code into a model that our reasoning engine, *ImandraX*, can reason about.

![CodeLogician Process](assets/static/img/codelogician-process.png)

It's useful to think of the process that the agent implements through three elements:
1. The agent uses your *source program* and your feedback (*human in the loop*) in the process of creating the *model*
2. The multistep process of *autoformalization* and *refinement*, during which the agent uses *ImandraX*, various tools, its database of formalisms (recommended translation samples) and user input to create the *model*
3. The ultimate result contains recommended changes, a verification report and test cases

The interface to the agent is very flexible - you can ask it to perform the full process of *model* creation, verification and test case generation, or ask it to perform individual steps. The latter is especially useful when you're using it via the MCP server through a coding assistant like *Cursor*.

Can you use these *models* directly? Absolutely! The models are written in *IML* (more on this later), which is (mostly, with a few directives) a subset of OCaml (www.ocaml.org). And because of their precision (type information, etc.), we can translate them back to the *source language* pretty easily.
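As a toy illustration (our own hypothetical example, not from any of our case studies), this is what that dual nature looks like - an *IML* definition is simultaneously a mathematical object that ImandraX can reason about and an ordinary OCaml program:

```ocaml
(* A pure, recursive IML definition: runnable as ordinary OCaml,
   and analyzable by ImandraX as a mathematical function. *)
let rec sum_to (n : int) : int =
  if n <= 0 then 0 else n + sum_to (n - 1)

(* A mechanical back-translation to Python would be:
   def sum_to(n): return 0 if n <= 0 else n + sum_to(n - 1) *)
```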
## Why should you be using *CodeLogician*?
There's a handful of reasons - in summary, it adds automated logical reasoning to your favorite coding assistant. By creating a formal logical *model*, you can answer questions with certainty, perform deep analysis and rigorously generate test cases. It automates a lot of the thinking about code you'd like to do anyway, but (unlike "pure" LLM-based techniques) it does so with rigor.

At a high level, there are three primary benefits of using *CodeLogician*:
- Formalizing the requirements - it forces you to be very precise about the code you'd like to write, to be explicit about your assumptions, and to formulate the properties the code should have (aka the *specification*)
- Verifying deep properties of the code using mathematical logic, covering infinitely many possible inputs
- Rigorously automating test case generation - considerably speeding up a process that neither a human nor an LLM (nor other statistical methods) can reproduce

### Make sure you haven't missed something
Is this approach relevant for every bit of code that you write? Probably not, but it's indispensable for critical logic, UI state transitions, etc. It's also unmatched at test case generation - we'll get to this later in detail. We've successfully applied the process that *CodeLogician* mimics in financial services and US government projects, where it has saved millions in regulatory fines and reputational damages.

For example - in a public case study we did for the investment bank UBS, we demonstrated that if you encode the logic for how their dark pool matched orders (the order priority calculation) exactly as it was described in their regulatory findings, it would ultimately lead to violations of best execution properties in a very non-trivial way. This is the type of logical error that would not be detected by a human (an LLM would not be able to identify it either) and could cost many millions of dollars in fines and reputational damage. The full notebook is here: [UBS Case Study](https://docs.imandra.ai/imandra-docs/notebooks/ubs-case-study/).

The main bit of logic there - `order_higher_ranked (side, o1, o2, mkt)` - encodes the logic described in the section on order ranking (the actual English prose is provided in the notebook), and it contains a fundamental flaw. Here's how we check for it:
```ocaml
let rank_transitivity side o1 o2 o3 mkt =
  order_higher_ranked(side,o1,o2,mkt) &&
  order_higher_ranked(side,o2,o3,mkt)
  ==>
  order_higher_ranked(side,o1,o3,mkt)
```

`rank_transitivity` is an example of a *Verification Goal*, a property we wish to *verify* of the model written in *IML*. In this case, it takes the `side` of the order, three order values `o1, o2, o3` and a variable representing current market conditions (best bid/offer on the primary exchange), `mkt`. When *ImandraX* is presented with this *VG*, it tries to prove it but fails, because the property does not, in fact, hold true. So ImandraX produces a counterexample:
```ocaml
- : order_side -> order -> order -> order -> mkt_data -> bool =
module CX :
  sig
    val side : order_side
    val o1 : order
    val o2 : order
    val o3 : order
    val mkt : mkt_data
  end
Counterexample (after 0 steps, 0.028s):
let side : order_side = BUY
let o1 : order =
  {id = 9; peg = NO_PEG; client_id = 13; order_type = MARKET; qty = 7720;
   min_qty = 14; leaves_qty = 1; price = 1; time = 1; src = 10;
   order_attr = RESIDENT; capacity = Principal; category = C_ONE;
   cross_restrict =
    {cr_self_cross = false; cr_ubs_principal = false;
     cr_round_lot_only = false; cr_no_locked_nbbo = false;
     cr_pegged_mid_point_mode = 12; cr_enable_conditionals = false;
     cr_min_qty = false;
     cr_cat_elig =
      {c_one_elig = false; c_two_elig = false; c_three_elig = false;
       c_four_elig = false}};
   locate_found = false; expiry_time = 11}
let o2 : order =
  {id = 18; peg = FAR; client_id = 19; order_type = PEGGED_CI; qty = 1237;
   min_qty = 20; leaves_qty = 1; price = 8859; time = 2; src = 21;
   order_attr = RESIDENT; capacity = Principal; category = C_ONE;
   cross_restrict =
    {cr_self_cross = false; cr_ubs_principal = false;
     cr_round_lot_only = false; cr_no_locked_nbbo = false;
     cr_pegged_mid_point_mode = 17; cr_enable_conditionals = false;
     cr_min_qty = false;
     cr_cat_elig =
      {c_one_elig = false; c_two_elig = false; c_three_elig = false;
       c_four_elig = false}};
   locate_found = false; expiry_time = 22}
let o3 : order =
  {id = 3; peg = NEAR; client_id = 5; order_type = LIMIT_CI; qty = 2438;
   min_qty = 6; leaves_qty = 0; price = 8858; time = 0; src = 7;
   order_attr = RESIDENT; capacity = Principal; category = C_ONE;
   cross_restrict =
    {cr_self_cross = false; cr_ubs_principal = false;
     cr_round_lot_only = false; cr_no_locked_nbbo = false;
     cr_pegged_mid_point_mode = 4; cr_enable_conditionals = false;
     cr_min_qty = false;
     cr_cat_elig =
      {c_one_elig = false; c_two_elig = false; c_three_elig = false;
       c_four_elig = false}};
   locate_found = false; expiry_time = 8}
let mkt : mkt_data = {nbb = 8857; nbo = 8858; l_up = 10655; l_down = 8856}
```
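A nice property of these counterexamples is that they're concrete values you can compute with. In a REPL session (hypothetical transcript), replaying the counterexample through the VG evaluates to `false`:

```ocaml
(* Replaying the counterexample values installed in the CX module *)
# rank_transitivity CX.side CX.o1 CX.o2 CX.o3 CX.mkt;;
- : bool = false
```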
*ImandraX* computed this counterexample by actually analyzing the logic, not by using an LLM trained on similar examples, or some other statistical technique (e.g. QuickCheck). If the property had actually been true, ImandraX would have produced a proof object that can be independently checked (our IP is in producing these counterexamples and proofs, but once produced, they're easily checked by third-party software). Our documentation page has many other examples!

Similar techniques have become bedrock in industries like hardware manufacturing, where a bug in the design of a microprocessor is ungodly costly. With the help of LLMs and agents, we can finally bring this level of rigor to everyday software development.

### Functional testing done right
If you write source code professionally, you must have written test cases. There are many kinds of tests - unit, functional, integration, and so on. With *CodeLogician*, we focus on the test cases that push your code into the different "edge cases" described by its logic - the closest standard term for this is "behavioral testing." We're now going to give this process a mathematical foundation!

One of the core features of *ImandraX* is the ability to decompose the behavior of a model into a finite number of symbolic regions, where each region contains a set of constraints on the inputs and a description of the outcome. Both are described 'symbolically' - each, in fact, may admit infinitely many possible inputs that satisfy it.

Consider this example of analyzing all of the edge cases of the SIX Swiss pricing logic (for a particular segment of the Swiss market): [SIX Swiss Exchange Pricing Notebook](https://docs.imandra.ai/imandra-docs/notebooks/six-swiss-exchange-pricing/).
That code must process effectively infinitely many possible inputs (we come from the world of math, so we always qualify the word "infinite" :), but even from a brief look, you can tell that there's a finite number of edge cases. How do we analyze this efficiently?

This is where ImandraX's region decomposition comes in - a symbolic analysis that decomposes the behavior of the model into a finite number of regions, where each region may contain infinitely many possible inputs. These "regions" are the edge cases of the model. The technique is inspired by [Cylindrical Algebraic Decomposition](https://en.wikipedia.org/wiki/Cylindrical_algebraic_decomposition), which we've lifted to algorithms at large.

For the SIX Swiss example above, ImandraX identified 44 different regions. Here's one example:
```ocaml
Constraints:
- (List.hd ob.buys).order_qty = (List.hd ob.sells).order_qty
- (List.hd ob.buys).order_type = Market
- (List.hd ob.sells).order_type = Market
- (List.hd (List.tl ob.buys)).order_type <> Market
- ref_price >= (List.hd (List.tl ob.buys)).order_price
- (List.hd (List.tl ob.sells)).order_type <> Market
- ref_price > (List.hd (List.tl ob.sells)).order_price

Invariant:
- F = Known (List.hd (List.tl ob.sells)).order_price
```

The `Constraints` and `Invariant` are both in *IML* - here are the relevant functions that appear:
- `List.hd` returns the first element of a list (`ob.buys` is a list containing orders, and `ob` is the type representing the order book)
- `List.tl` returns the list without its first element, so `List.hd (List.tl ob.buys)` is the second element of the list.

The constraints must be satisfied for the code to be in that specific region, and the `Invariant` describes the outcome of calling the function on inputs that satisfy those constraints - together they define the region. All of this is generated by ImandraX without any statistics involved, purely from the logical information encoded in the model. In a sense, the generation of these regions is also generative AI, but the results are derived solely from the logic of the model! ImandraX also generates a proof that if you combine all of the regions together, you recover the original behavior of the model.

If you're using *CodeLogician* through a tool like Cursor, it's quite easy to convert these regions into English prose (or other formats and languages) or into the testing framework and language of your choice.

*CodeLogician* works with ImandraX to extract regions (and corresponding sample points) to generate high-coverage test suites.

We'll cover region decomposition in more detail later, but you should already appreciate the value proposition of this approach - just that one example would take many hours for humans to work through themselves (another question is whether they'd be able to enumerate the regions at all), and an LLM would not be able to generate this by itself!

### Make your LLM/coding agent smart at reasoning
*CodeLogician* should be one of the tools at your disposal. With the rapidly rising popularity of coding assistants like Cursor and the proliferation of MCP (Model Context Protocol), *CodeLogician* becomes an indispensable part of the developer's process. Moreover, since we use LangGraph, you can insert *CodeLogician* as a subagent using the *RemoteGraph* class.

You want to use AI to "vibe code" your project? Great! But first, you should ask it to formalize and verify what it's trying to do with *CodeLogician*!

## How should you be using *CodeLogician*?

There are three ways to use *CodeLogician*:
- [Imandra CodeLogician VS Code Extension](https://marketplace.visualstudio.com/items?itemName=imandra.imandra-code-logician)
- [API, with a nice Python library wrapper](https://docs.imandra.ai/universe/)
- MCP Server: coming soon!

## Background on ImandraX
ImandraX is the latest version of the Imandra automated reasoning engine. Its input language is the *Imandra Modeling Language* (*IML*), a pure subset of OCaml (www.ocaml.org) with some added directives for interacting with the engine. *IML* models can be used as ordinary OCaml programs. At Imandra, we've created mechanized formal semantics for this subset of OCaml, which ImandraX uses under the hood to automatically translate *IML* models into mathematics, allowing it to use a myriad of techniques to analyze them.

Our documentation page contains many examples, but for our purposes, let's cover the two major features that *CodeLogician* relies upon: (dis)proving verification goals and region decomposition.

### Encoding and (dis)proving Verification Goals (VGs)
A `Verification Goal (VG)` is a boolean-valued (returns `true` or `false`) function written in *IML* that encodes a property you wish to *verify* of the model. This is much more powerful than simply writing tests, where you would need to specify concrete inputs and outputs - here you can express properties that must hold for effectively infinitely many possible inputs.

Let's revisit the UBS example ([UBS Case Study](https://docs.imandra.ai/imandra-docs/notebooks/ubs-case-study/)). Fundamentally, the model created there has three parts:
- Background type definitions and helper functions (e.g. the `order` type)
- The `order_higher_ranked` function, which takes the order side, two orders and the market data, and returns `true` or `false` depending on the case - encoding the logic of the prose ranking rules
- The transitivity VG (`rank_transitivity`), checking that the ranking is, in fact, transitive

VGs are great for codifying statements you make about your code.

Once you have admitted the model into ImandraX and asked it to verify a VG, you would typically get either:
- A counterexample demonstrating where the VG fails
- Or, a proof that the VG is true for all possible inputs
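As a toy illustration of the shape VGs take (our own example, not from the UBS model):

```ocaml
(* A model function and a VG stating a property of it for all inputs *)
let max2 x y = if x > y then x else y

(* VG: the maximum is an upper bound on both arguments *)
verify (fun x y -> max2 x y >= x && max2 x y >= y)
```

VGs with preconditions typically use the `==>` implication operator, as in the `rank_transitivity` example above.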
### Region decomposition
`Region decomposition` refers to the automated analysis of a model's state space by dividing it into a finite number of symbolic "regions." Each region contains:
- a set of `constraints` (expressed in *IML*) on the inputs, such that the code takes this particular path
- an `invariant` describing the result of running the model on inputs that satisfy the constraints
- `sample points` - a sample set of inputs that satisfy the constraints

There are other bits of information there, but those are the most important ones.
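Let's look at a small, self-contained example (our own toy model; the exact decomposition directive is documented in the ImandraX docs, so treat the invocation below as illustrative):

```ocaml
(* Toy model: a fee schedule with a few distinct behaviors *)
let fee (qty : int) (notional : int) : int =
  if qty <= 0 then 0                   (* rejected order *)
  else if notional > 1_000_000 then 50 (* large-trade fee *)
  else 10                              (* standard fee *)

(* Ask ImandraX to decompose `fee` into its symbolic regions; each
   region comes back with constraints, an invariant and sample
   points. Hypothetical invocation: *)
Decompose.top "fee"
```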
## Background on formalizing source code in *IML*

Thinking mathematically about your code opens up access to many powerful tools, but it also requires a slightly different way of thinking about programming. In this section we'll discuss some of the details of how you'd go about it.

### Thinking formally
Perhaps the first important distinction to discuss is that *IML* is based on the *pure* subset of OCaml, which means it does not allow code with side effects. In plain English, this means that if you write a function, nothing in it can modify values declared outside of it; all of the changes a function makes to the global state of the program must be included in its return value. Note that this requirement is not a limitation - you can convert programs with side effects into formal state machine models where you "carry the state" in your code. We've encoded incredibly complex systems this way, and the technique goes back to the seminal work of Boyer and Moore.

Here are some other key aspects of writing models in *IML* that are important to cover:
- *Strong types* - *IML* is strongly typed, so things like assigning a variable to `None`, then a `string`, and finally an `int` - which you might see in Python (dynamically typed by default) - are a no-go in *IML*. In fact, if types are mismatched in the model, ImandraX will return an error. *CodeLogician* uses various tools to extract type information from the source program when it synthesizes the *IML* model.
- *Recursion* is used to model loops (in addition to functions that are already recursive), so both `while` and `for` loops are rewritten as recursive function calls. The key to reasoning about recursion is _induction_, and by rewriting (for example) a `for` loop as a call to a recursive function, ImandraX can easily identify the inductive structure.
- *State machines* - these are typically *infinite* state machines, because the *state* type may contain complex datatypes (structures, lists) in addition to the regular arithmetic types. *State machines* are a useful mechanism for modeling code with side effects (a variable outside the function's scope becomes part of the state) and, more generally, complex systems like stock exchanges. For example, you would model the stream of incoming order messages as a list that a top-level `step` function iterates through - see the sketch right after this list.
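Here's a minimal sketch of that style (our own toy example, not from the docs) - the mutable state of an imperative program becomes an explicit value threaded through a pure, recursive `run` function:

```ocaml
(* Messages our toy system can receive *)
type msg = Incr | Decr | Reset

(* The entire mutable state, made explicit *)
type state = { count : int }

(* One pure state transition per message *)
let step (s : state) (m : msg) : state =
  match m with
  | Incr -> { count = s.count + 1 }
  | Decr -> { count = s.count - 1 }
  | Reset -> { count = 0 }

(* What would be a `while`/`for` loop over an input stream
   becomes recursion over a list of messages *)
let rec run (s : state) (msgs : msg list) : state =
  match msgs with
  | [] -> s
  | m :: rest -> run (step s m) rest
```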
### What should we model (let's talk about abstraction)?

*CodeLogician* is an agentic UI (one of several) to *ImandraX*, so the question "What type of source code should I apply *CodeLogician* to?" is better reformulated as "What type of code could/should I model in *ImandraX*, and how?" While any type of executable program (even one that encodes a hardware design) can be modelled in *ImandraX*, how you go about it depends on your objectives.

A useful method for thinking about how to formalize source code is to organize it into the three categories of *abstraction* below. By *abstraction*, we mean the type of reasoning you're interested in. For example, languages like C++ are universal - unlike (for example) VHDL, you can use C++ to program both low-level hardware designs and business logic. So instead of anchoring on the languages themselves, we focus on "what type of program are you writing?" and "what type of reasoning would you like to perform?"

#### 1. High-level
This category typically involves *DSLs* used to represent complex processes, systems, etc. A program in one of these languages is translated into *IML* with a high level of compression. For example, our formal DSL, the *Imandra Protocol Language* (*IPL*), which we use to model symbolic state transition systems (e.g. a complex trading API), has a 1:10 compression ratio relative to *IML* - one line of *IPL*, on average, results in 10 lines of *IML*.
For this category, we don't recommend using CodeLogician, but rather developing a DSL that ultimately translates into *ImandraX*. For *IPL*, we use the Langium framework by TypeFox (and previously its predecessor, Xtext/Xtend).

We're working on another agent, *SpecLogician*, which encodes (à la Cucumber) user stories expressed as Given/When/Then sequences in English into a formal representation in *IPL*, for subsequent analysis of state reachability and much more.

The reasoning at this level deals with non-trivial state transitions, validation of data inputs/outputs represented by rich structures (conditioned on the current state), etc.

#### 2. Mid-level
"Application-level" software is another way to think of this category - it's most of the software you interact with on a day-to-day basis. This is the "sweet spot" application type for *CodeLogician* - for these programs, you can typically encode *source programs* into *models* directly, without having to go via a DSL.

#### 3. Low-level
This is also a great application of *ImandraX*, but it is outside the scope of *CodeLogician*. Analyzing such code in *ImandraX* typically requires creating a virtual machine representing the memory manager, the CPU (to give meaning to the instructions), etc. For example, at *Imandra* we created the first formal model of the Ethereum Virtual Machine - this required not only defining the instructions, but implementing an actual executable model of the virtual machine, so that we could reason about the execution of those instructions. If you were to simply ask *CodeLogician* to analyze a set of instructions at such a low level, it would not have access to the virtual machine, which is what's required to understand the semantics (aka the meaning) of those instructions.

## CodeLogician agent

### Agent architecture

[TODO add architecture information]

### Opaque functions

`opaque` functions in ImandraX represent functions whose signatures you can define well, but whose implementations you don't know. For example, they are a good way to model external libraries or external API calls. Gradually, you can add more details to them.

Another way to think of `opaque` functions is that they force you to be explicit about the assumptions you make about external code. Whereas an LLM asked about an external function will simply guess a plausible behavior, an `opaque` declaration records exactly what is known - and nothing more.
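As a sketch (our own toy example - the exact declaration syntax for opaque functions and axioms is in the ImandraX docs, so treat what follows as illustrative):

```ocaml
(* A third-party scoring service we cannot inspect: signature known,
   implementation unknown (hypothetical syntax for marking it opaque,
   with a placeholder body) *)
let get_score (user_id : int) : int = 0 [@@opaque]

(* An explicit assumption about the external service, recorded as an
   axiom usable in verification: scores lie between 0 and 100 *)
axiom score_bounds user_id =
  0 <= get_score user_id && get_score user_id <= 100
```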
### Adding information to models through refinement

*Refinement* refers to the process of adding more detail to a model that starts out with opaqueness present. We think of a model as being in one of four stages:
1. `opaque` functions present
2. `opaque` functions with assertions (written as `axiom`s) - adding information that can be used in verification
3. `opaque` functions substituted with approximations - akin to "service mocking," where you replace the original function with an approximation (e.g. a polynomial approximation of `exp`) so that the model becomes fully executable
4. No `opaque` functions - a fully transparent model

For stages 3 and 4, ImandraX can perform region decomposition, generate test cases and attempt to verify VGs. For stages 1 and 2, it can only perform formal verification (VGs).

If you start out with a model containing opaque functions, the more information you add to the model, the more precise the reasoning you'll be able to do. Once the model is fully executable (no opaqueness present), you can also generate test cases - doing so requires actual execution of the model code.

To reiterate - it might seem onerous to supply this extra information by hand, but you only need as much of it as your reasoning goals demand (for example, approximations only matter once you want an executable model for test generation).

### Common methods requiring approximation
Without going deep into the theoretical foundations of computer science, there are certain functions that are very difficult to reason about automatically, even with the latest advances in automated reasoning. Here are several examples of the types of functions that require approximations:
- `exp` - the exponential function
- [TODO add more]

These functions require approximations to make the model executable.
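For instance, here's a sketch of what such an approximation might look like (our own example, with a hypothetical function name; real arithmetic in *IML* uses the OCaml float operators):

```ocaml
(* Truncated Taylor series standing in for the opaque `exp`.
   Only a reasonable approximation on a bounded domain around 0 -
   the point is executability, not numerical accuracy. *)
let exp_approx (x : real) : real =
  1.0 +. x +. (x *. x) /. 2.0 +. (x *. x *. x) /. 6.0
```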
### Available commands

- `full_run` - a full run-through of formalization, verification and test case generation (if possible)
- `formalize_model` - the initial attempt at formalizing the source program into an *IML* model
- `update_model` - updates the *IML* model from changes to the *source program*
- `extract_verification_goals` - extracts verification goals from *source program* comments (depending on the *source language*)
- `update_source` - updates the source program from changes to the *IML* model
- `run_verification` - runs *ImandraX* on the *model* against the verification goals
- `extract_tests` - runs region decomposition for the selected functions (retrieved from comments in the *source program*)

## Frequently Asked Questions

- Q: What languages are supported?
- A: We've started with Python, but we're working on others, including Java, Apex (Salesforce), COBOL and C++.
- Q: Is the project open source?
- A: Not yet, but we're working on open-sourcing the agent (CodeLogician, not ImandraX).
- Q: How do you know the model that's created is a correct representation of the source program?
- A: There's always going to be some mismatch, unless the original program is also written in *IML*. The agent attempts to build a model that is *functionally* equivalent (modulo assumptions and approximations).

## Example walkthroughs

### Example 1:


### Example 2:

Source program:
```python
# Source: https://machinelearningmastery.com/tensorflow-tutorial-deep-learning-with-tf-keras/
# mlp for binary classification
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# ensure all data are floating point values
X = X.astype('float32')
# encode strings to integer
y = LabelEncoder().fit_transform(y)
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# determine the number of input features
n_features = X_train.shape[1]
# define model

# Verify that the model layers are connected correctly so there's no size mis-match...
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# fit the model
model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)
# evaluate the model
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print('Test Accuracy: %.3f' % acc)
# make a prediction
row = [1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1,0.03760,0.85243,-0.17755,0.59755,-0.44945,0.60536,-0.38223,0.84356,-0.38542,0.58212,-0.32192,0.56971,-0.29674,0.36946,-0.47357,0.56811,-0.51171,0.41078,-0.46168,0.21266,-0.34090,0.42267,-0.54487,0.18641,-0.45300]
yhat = model.predict([row])
print('Predicted: %.3f' % yhat)
```
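For a taste of what the agent might do with that verification comment, here's a hand-written sketch (hypothetical - not actual CodeLogician output) of an *IML* model of the layer-compatibility property:

```ocaml
(* Each Dense layer as (input_size, output_size); adjacent layers
   must agree for the network to be well-formed. *)
let rec layers_compatible (layers : (int * int) list) : bool =
  match layers with
  | [] | [_] -> true
  | (_, out1) :: (((in2, _) :: _) as rest) ->
    out1 = in2 && layers_compatible rest

(* The network from the source program: 34 features -> 10 -> 8 -> 1 *)
let model_layers = [(34, 10); (10, 8); (8, 1)]

(* Executable check; a VG could instead quantify over architectures *)
let model_ok = layers_compatible model_layers
```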