Posts

Multiple Experts, Multiple Objectives

At long last, I present my Scholars project, where I engineered a (somewhat primitive) framework that disentangles data containing many behaviors from different experts to learn to steer a model towards one mode of behavior or another.

Last updated on Jul 10, 2021 15 min read

The Final Stretch

As I round the corner into the final days of my project, I am striving to provably demonstrate its practical utility, simplicity and contributions. The neural architecture I have designed so far, however, is anything but simple.

Last updated on Mar 16, 2021 1 min read

Exploring New Depths

Over the last two weeks I have been delving into new depths, turning a problem over in my mind for days at a time, without certainty of success. Continuing to probe at an idea in the face of possible failure can be daunting, and I wanted to share some of the tips that helped me overcome the challenges of designing novel solutions.

Last updated on Mar 2, 2021 3 min read

The Makings of an Option

Reinforcement learning literature involves learning to pursue actions that provide sufficient rewards (or minimize the agent’s cost). As we have previously seen, encouraging continuous actions defined as an “environment step” can be tricky because of the credit assignment problem, wherein the learning function must attribute credit for rewards or costs to some actions taken along a trajectory.

Last updated on Feb 16, 2021 5 min read

Making and Benchmarking a Clone

The Expert: We trained multiple experts at different thresholds and constraints, but in this report we will discuss a configuration set (alias Marigold), for which we ran multiple smaller cloning experiments.

Last updated on Feb 1, 2021 3 min read

Learning with Constraints

Reinforcement Learning as a field has advanced in lockstep with advances in compute power. Iterative generation of tricks improved performance by learning the values of actions. The value function is therefore necessary for choosing actions.

Last updated on Jan 18, 2021 4 min read

Experiments and Logging

My high-level goal over the last week or so involved finishing the write-up for trajectory collection for constrained and unconstrained agents. Immediately upon starting training, I butted up against the question of how and when to log experiments and agent performance.

Last updated on Jan 4, 2021 2 min read

Learning from OpenAI Experts

An emergent trend of my posts so far has been my attempt to link my progress through curriculum study to concepts in artificial intelligence. The end of my learning is no different.

Last updated on Dec 5, 2020 5 min read

Learning with Rewards

Reinforcement learning is the subset of machine learning in which an agent exists within an environment and looks to maximize some kind of reward. The agent takes an action, which alters the environment in some way, and observes the reward associated with the environmental change.

Last updated on Nov 23, 2020 3 min read

80:20 Learning

The past 2 weeks have been a blur, wherein I read papers that significantly advanced my understanding of the evolution of deep learning research and, more broadly, how to learn.

Last updated on Mar 16, 2021 16 min read