AbstractPhila PRO
Omega Tokens: Finding The Self Solving Frame
First note: there is no degeneracy in this cell now. Per hundreds of bulk tests with many readouts, the degeneracy is swept up in the SVD kernel, the fl_gram eigh SVD, the FLEigh structure, or one of the subsequent catches that PyTorch handles.
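For readers unfamiliar with the degeneracy issue: when a spectrum has repeated values, the individual eigenvectors are not unique; only the subspace they span is well defined. Here is a minimal generic PyTorch illustration of that (this is stock `torch.linalg.eigh`, not the fl_gram/FLEigh kernel itself):

```python
import torch

# A matrix with a repeated (degenerate) eigenvalue: any orthonormal basis of
# the degenerate subspace is a valid eigenbasis, so eigenvectors are not unique.
A = torch.eye(3, dtype=torch.float64)
A[2, 2] = 2.0  # eigenvalues: 1, 1, 2 -- the eigenvalue 1 is doubly degenerate

# Perturb A infinitesimally; the recovered eigenvectors of the degenerate
# pair may rotate arbitrarily within their shared subspace.
eps = 1e-12
P = torch.randn(3, 3, dtype=torch.float64)
A_pert = A + eps * (P + P.T)

w1, V1 = torch.linalg.eigh(A)
w2, V2 = torch.linalg.eigh(A_pert)

# Eigenvalues are stable under the perturbation...
print(torch.allclose(w1, w2, atol=1e-9))        # True
# ...but only the projector onto the degenerate subspace is stable, not the
# individual eigenvectors inside it.
proj1 = V1[:, :2] @ V1[:, :2].T
proj2 = V2[:, :2] @ V2[:, :2].T
print(torch.allclose(proj1, proj2, atol=1e-6))  # True
```

This is why a degenerate spectrum has to be "swept up" somewhere: any downstream consumer of raw eigenvectors sees an arbitrary rotation inside the degenerate block.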
The degeneracy problem is solved, and solving it introduced a massive number of new problems. I have built prototypes to address them; each core problem has been narrowed down to three core components as solutions for information movement.
S^N sequential
Scattered S^N * D for orthogonal clustering
S * D + D * D for structural cohesive memory annealing
This comes down to three important utilities that many core structures (sequence, distance, cosine similarity, QKV support, rotary support, and more) depend on:
- Sequential structural cohesion; LLMs, tokens, next-token prediction, Spearman, and so on.
- Behavioral attenuated implicit; ViT, ResNets, diffusers, etc.
- Geometric alignment structure; distillation, transfer learning, teacher/student, genetic inheritance, generational learning, SVAE, geolip prototypes, and constellations.
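As a concrete anchor for the distance and cosine-similarity utilities named above, here is a minimal sketch over [b, S, D] token batches. The function names are mine for illustration, not from the cell repo:

```python
import torch
import torch.nn.functional as F

def pairwise_cosine(x: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between every pair of tokens in a [b, S, D] batch."""
    x = F.normalize(x, dim=-1)        # unit vectors along D
    return x @ x.transpose(-2, -1)    # [b, S, S]

def pairwise_distance(x: torch.Tensor) -> torch.Tensor:
    """Euclidean distance between every pair of tokens, [b, S, S]."""
    return torch.cdist(x, x)

b, S, D = 2, 5, 16
tokens = torch.randn(b, S, D)
cos = pairwise_cosine(tokens)
dist = pairwise_distance(tokens)
# Sanity: self-similarity is 1 on the diagonal; self-distance is 0.
print(torch.allclose(cos.diagonal(dim1=-2, dim2=-1), torch.ones(b, S), atol=1e-4))   # True
print(torch.allclose(dist.diagonal(dim1=-2, dim2=-1), torch.zeros(b, S), atol=1e-4)) # True
```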
The third is the least useful and largely out of scope; the first two are very useful, so they are my predominant focus here.
I have 14 potential prototypes and will be forming a notebook for each, testing robustness, the positives, the negatives, storage and recall capacity, magnitude standardization vs. normalization accuracy, flow-matched directional EMA vs. non-EMA, the structurally supported ensemble approach vs. the residual approach, and a few other elemental substructures.
The biggest tradeoffs will be between normalization clipping and standardization's unit-structured tokens. These encode inherently different expectations and produce entirely different opinions.
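To make that tradeoff concrete, here is a generic contrast of the two regimes in plain PyTorch (a sketch of the general techniques, not the cell's actual normalization path):

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 16) * 3.0 + 1.0   # tokens with arbitrary scale and offset

# Normalization: clip each token onto the unit sphere. Direction survives,
# magnitude is discarded entirely.
x_norm = F.normalize(x, dim=-1)
print(x_norm.norm(dim=-1))           # all ones -- magnitudes gone

# Standardization: zero-mean / unit-std per token. The relative feature
# layout survives, re-expressed in units of the token's own spread.
x_std = (x - x.mean(dim=-1, keepdim=True)) / x.std(dim=-1, keepdim=True)
print(x_std.mean(dim=-1))            # ~0 per token
print(x_std.std(dim=-1))             # ~1 per token
```

Normalization kills magnitude outright; standardization keeps a magnitude-like shape but re-centers it, so the two downstream "opinions" really are incompatible.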
Each of these experiments will be fully documented, the subsequent models included in the notebook sections, and the notebooks represented in the cell repo.
The Cell is a fickle beast, but I believe I have tamed the monster. The battery will be substantially stronger with the new cell upgrades, as it includes multiple constellation elements: FILM solidification, normalization at curative points rather than destructive ones, and a few other elements to assist with producing tokenizations, such as direct Conv support and Hugging Face transformer capacity for the MOE substructures.
As it stands, the transformer tokens here are represented simply as [b, S, D, V], also [b, S, U, Vt], and they have direct embedding tokenization potential on many structures, but not all. There are multiple deviant structures that suffer from certain rules and require additional solutions before they work.
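One plausible reading of the [b, S, U, Vt] form, assuming U/Vt name per-token SVD factors (my assumption; the repo's actual shapes may differ), is a batched per-token factorization like this:

```python
import torch

b, S, n = 2, 8, 16
# Treat each token as a small n x n matrix and factor it per token.
tok = torch.randn(b, S, n, n)
U, Svals, Vt = torch.linalg.svd(tok)   # batched over the [b, S] leading dims
# U: [b, S, n, n], Svals: [b, S, n], Vt: [b, S, n, n]
recon = U @ torch.diag_embed(Svals) @ Vt
print(torch.allclose(recon, tok, atol=1e-4))  # True: lossless factorization
```

Under this reading, the U/Vt axes carry rotation while Svals carries magnitude, which matches the magnitude-vs-rotation discussion later in the post.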
The prototypes may not exactly reflect this shape, and the shape may change for packaging and reuse purposes, so bear with it for now. I'm only one person, and I'm heavily relying on Claude to handle many of the logistics. I can code all of this; it just takes a lot longer for me to do manually, so I'm basically on NO GELU HERE - NO NORMS HERE - NO PROJECTION HERE duty. I'm babysitting Claude so the code is correct and making sure the tests come out as they are supposed to.
Original Question:
- Can the cells utilize positional encoding patches from the triton d=2 decompositions? I'm thinking maybe, and it's worth a shot.
Update:
- They can, and without degenerates when curated correctly. Their usefulness is limited unless applied to a regression cascade that projects upward to the largest structure and compares rotation. The rotations for the projections are a perfect Procrustes fit if handled correctly, as per the experimental documentation on the SVD Triton kernel.
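The "perfect Procrustes" here refers to the classic SVD solution of the orthogonal Procrustes problem. A generic sketch of that solution (stock PyTorch, not the Triton kernel):

```python
import torch

def procrustes_rotation(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Orthogonal matrix R minimizing ||A @ R - B||_F (orthogonal Procrustes)."""
    U, _, Vt = torch.linalg.svd(A.T @ B)
    return U @ Vt

# Rotate a point cloud by a known orthogonal matrix, then recover it exactly.
torch.manual_seed(0)
A = torch.randn(100, 8, dtype=torch.float64)
Q, _ = torch.linalg.qr(torch.randn(8, 8, dtype=torch.float64))  # random orthogonal
B = A @ Q
R = procrustes_rotation(A, B)
print(torch.allclose(R, Q, atol=1e-8))  # True: exact recovery up to fp error
```

When B really is a rotation of A (as in a clean upward projection), the recovered R is exact; noise in B degrades it gracefully rather than catastrophically.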
Original Question Assessment:
- If the degenerates can converge in-model implicitly, this changes the game entirely. The SVD cell can ensure the preservation, and the reconstruction of the cells has no rival. Now... let's see if this holds up over huge dimensional spaces, or if the models simply... shatter with the thin Triton 500x speed.
Getting access to that 500x speed is invaluable; there is no comparison.
Update:
- The model can benefit from the 5000x speed (yes, 5000, not 500), and it can be useful; however, there are stipulations that require multiple uses, so the gains are not as large as I'd hoped. You need to run many more of them to get a useful informational cluster; otherwise you just end up norming everything to death down the line without magnitudes, as shown by the cascade tests.
- There NEEDS to be an fp64 Triton kernel, and I'll work that out today, if it's even possible to run fp64 through Triton in this fashion. The fp32 version is showing serious rounding faults, and it needs to be addressed with fp64.
- The D=2 models operate around 8x faster, at a much lower accuracy overall. Without high-fidelity access to the FLEigh or gram eigh SVD structure, the model simply does not have the necessary matmul accuracy to represent the outcome.
- The D=2 magnitudes are useful only for D=2; if you project the rotation upward to the higher D, you lose the magnitude directional accuracy, as per the tests and documentation. This means D=2 run 32 times is still less accurate in magnitude terms than a single D=8, even when cross-correlation is used to determine the most likely magnitude.
- This behavior is ideal when distilling one model's signals into another, but not as useful when forming a proper embedding encoder utility chain. The encoder needs to be fairly stable, so you need to make sure the model is capably learning the encoding spectrum, and that each subsequent encoder down the chain sees the same structural system and the residual opinions of the last; otherwise the encodings are simply lost. I've mentioned it before: residuals are lossless in this regime, and that lossless behavior essentially manifests as rigidity and difficult-to-differentiate strictness. Correctly aligned, this is a powerful implicit structural controller; explicitly, it is a nightmare to tune into something that isn't just ON-OR-OFF gating.
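On the fp32 rounding faults mentioned above: an ill-conditioned Gram matrix makes the fp32/fp64 gap easy to see even with stock torch.linalg.eigh (a generic illustration of the precision issue, not the Triton kernel):

```python
import torch

torch.manual_seed(0)
# An ill-conditioned Gram matrix: near-collinear columns push some
# eigenvalues toward zero, which is where fp32 rounding bites hardest.
X = torch.randn(512, 64, dtype=torch.float64)
X[:, 1] = X[:, 0] + 1e-5 * torch.randn(512, dtype=torch.float64)
G = X.T @ X

w64, V64 = torch.linalg.eigh(G)
w32, V32 = torch.linalg.eigh(G.float())

# Max reconstruction error of G from its eigendecomposition, per precision.
err64 = (V64 @ torch.diag(w64) @ V64.T - G).abs().max().item()
err32 = (V32 @ torch.diag(w32) @ V32.T - G.float()).abs().max().item()
print(err64, err32)  # fp64 error sits orders of magnitude below fp32
```

If the cascade multiplies many such factorizations together, the fp32 error compounds, which is consistent with the rounding faults showing up only down the line.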
There are no more paved roads here... It's time to chart some jungle.
I've baked CM config controllers into the head of the spectral cell. This will allow the CM to be crutched heavily, letting the model legitimately diverge and drift into impossible terrain while still maintaining order, catching everything invalid as it cuts through.
======================================================================
COMPLETE
======================================================================
Best val acc: 93.8%
Time: 979s (8.2s/epoch)
Conv: 4,251,200 Cells: 366,176 Head: 167,946 Total: 4,785,322
Comparison:
SpectralCell standalone (D=16 V=16 h=256 +conv +aug): 79.1% 926K 1.2s/ep
ConduitBattery backbone (GPT trainer, ep55/120): 88.7% ~2M ?s/ep
Conv + SpectralCell inline: 93.8% 4,785,322 8.2s/ep

