1. What are the target applications and what is the expected sophistication of the target users (naive, knowledgeable, expert)?
   - Target applications are both regular and irregular, focusing on mini-apps and patterns of interest to DOE.
   - The target users are parallel-application experts when writing directly to the OCR API. We also target expert developers of parallel languages, libraries, and runtimes who map their work onto OCR.

2. List the set of features/concepts of your programming model and divide them into two sets: What is the minimum set of things a new user needs to learn to become productive? What are the more advanced features a user can potentially take advantage of?
   - Users need to know the basic OCR constructs: EDTs (event-driven tasks), events, datablocks, and GUIDs. More advanced features include labeled GUIDs for distributed DAG building and for referencing OCR objects, as well as a few specialized event types.

3. How do you decide whether a new application is a good fit? What metrics do you use to evaluate whether an application is implemented well in your model?
   - OCR being a low-level API, it needs to accommodate all sorts of programming paradigms and models. In that sense, every use case is a 'good' fit. We also see value in both static and dynamic workloads. For static workloads, introspection and adaptation may play a minor role, but the runtime can still offer resiliency services. For dynamic workloads, the dynamic nature of OCR is more appropriate and the runtime can offer load balancing.
   - A key metric is exposed parallelism: how much of the application's parallelism is expressed with OCR. We have been evaluating an application by studying its task-execution timeline and identifying the "gaps" between task boundaries, which indicate slack in the parallelism. However, one could devise a theoretical method to evaluate this metric in a platform-independent manner, without having to execute the application.
   - From that, we look at the achieved throughput relative to a state-of-the-art implementation of the same application.
   - Another consideration is how much non-application-specific code, such as custom load balancing or backups, shifts from the application code into the runtime.

4. What is the plan for interoperability with MPI, OpenMP, Kokkos, etc.? If you could add requirements to MPI, what would those be?
   - We have a tool-chain that allows running MPI programs on top of OCR. Users should not expect high performance, but the code will run provided it uses the supported MPI constructs. OpenMP and Kokkos can be invoked directly from within an OCR task, which may lead to resource contention between the runtime implementations.

5. What is the plan for performance portability?
   - Applications should be written generically enough that the granularity of tasks and data can easily be adapted, either dynamically or statically, to the characteristics of the underlying hardware. We also envision that OCR hints could be tuned so that the application better maps to the hardware.

6. What is the plan for fault tolerance?
   - We have demonstrated recovery from data corruption on a single x86 node and on the TG architecture. We have also demonstrated recovery from node failure in a cluster environment through selective checkpointing. Future work will focus on the TG architecture, assessing overheads and handling core failure.

7. What static analyses and transformations could you do? What do you do today?
   - We have not yet explored static analysis and transformation options for OCR programs. We are exploring a source-to-source transformation tool that generates OCR code from annotated C code and may be able to statically guarantee some properties of the graph.

8. Questions about task graphs:
   * When is the task graph generated (compile-time, load-time, run-time)?
     - The graph is generated at runtime.
   * How do you manage task graph generation vs. task graph execution?
     - If the application allows it, the whole graph can be generated at start-up, or it can be built dynamically.
   * What is the value of non-ready tasks in the DAG?
     - Information about non-ready tasks is currently not leveraged by the OCR implementation.
   * Do you exploit the repetitiveness of iterative applications that repeatedly execute the same task graph?
     - Partially. We have the concept of channel events, which can be reused across the iterations of an application. However, we have not yet explored reusing more complex patterns.

9. Questions about tasks:
   * How is task granularity managed?
     - The burden is on the programmer or on the high-level language targeting OCR.
   * What is the life-cycle of a task?
     - The OCR specification fully defines the life-cycle of a task. Important states are when all the dependences of a task are set up (the runtime gains knowledge of the task's position in the larger DAG) and when all of a task's dependences have been satisfied (the runtime then knows which data will be accessed, which can be used for scheduling and placement purposes).

10. What is the relationship between task and data parallelism --- can one be invoked from the other arbitrarily or are there restrictions?
    - Task parallelism is necessary to achieve data parallelism, i.e., only a task can operate on data. A datablock can also be accessed simultaneously by multiple EDTs.

11. Where exactly is concurrency (meaning the ability to have races and deadlocks) exposed to the programmer, if at all?
    - Data races happen when multiple tasks read/write a datablock concurrently. The memory model defined in the OCR specification states what is observable.
    - There are no deadlocks per se, but an EDT may never execute if it does not have all its dependences set up or satisfied.