Advertisement
AquaBlitz11

Minispec FIFOs

Oct 8th, 2024 (edited)
273
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
  1. // None of the code in this file has been tested. Just be aware.
  2.  
  3. // AXI-style ready/valid handshake
  4. // Key rule: data_downs must not depend on stall_downs. That is, if we are ready to send data,
  5. // we should make it available regardless of whether downstream says it is ready to take.
  6. // Of course, this means we can assume upstream module tries it best to make data available
  7. // even if we say we aren't taking the data yet. This rule avoids deadlocks and livelocks.
  8. // Heuristic: Data transfer happens on the cycle when data is valid and stall is false.
  9. module AXIModule#(type In, type Out);
  10.   input Maybe#(In) data_ups; // Data from upstream. Guaranteed to remain valid as long as we tell upstream to keep stalling. (Not stalling means consuming.)
  11.   input Bool stall_downs; // The stall signal from downstream. if !stall_downs and we're providing data, we must consider the data consumed.
  12.   method Maybe#(Out) data_downs; // Data to provide to downstream. Once valid, must remain valid until consumed.
  13.   method Bool stall_ups; // Stall signal to return to upstream.
  14. endmodule
  15.  
  16. // A register between pipeline stages, with stall signal.
  17. // If downstream isn't taking data and we're full, then we tell upstream to stall.
  18. // If downstream is taking data and we have data to provide from the buffer, we do.
  19. // In the same cycle, we can also tell upstream to send data (since the buffer is being consumed),
  20. // which will get put into the buffer (as long as downstream is taking data).
  21. // That is, there is a combinational path from stall_downs to stall_ups.
  22. // Not the other way around: Data from upstream will always take at least one cycle to arrive to downstream,
  23. // because it needs to go through the buffer first. This is exactly the same behavior as pipeline registers we've seen in 6.191.
  24. module PipelineFIFO1#(type T);
  25.   Reg#(Maybe#(T)) buffer(Invalid);
  26.  
  27.   input Maybe#(T) data_ups;
  28.   input Bool stall_downs;
  29.   method Maybe#(T) data_downs = buffer; // always make data available without waiting for !stall_downs
  30.   method Bool stall_ups(Bool stall_downs) = stall_downs && isValid(buffer); // tell upstream to stall if we can't send out this buffer
  31.  
  32.   rule update;
  33.     // downstream tells us it's not stalling
  34.     if (!stall_downs) begin
  35.       // if our buffer is valid, we are indeed sending out that data (as we always do when buffer is valid),
  36.       // and we must consider it consumed because downstream is not stalling.
  37.       // the buffer needs to be cleaned out, taking in the new input if possible (could be empty).
  38.       // otherwise, if our buffer is empty, well, we still try to take new input anyway (could be empty).
  39.       buffer <= data_ups;
  40.     end else
  41.       // downstream tells us it IS stalling.
  42.       // if the buffer is empty, we can afford to still take the data.
  43.       // indeed, we have to, because we told upstream not to stall in this case.
  44.       if (!isValid(buffer)) buffer <= data_ups;
  45.       // otherwise, buffer is full and we can't clean it out. that's fine.
  46.       // stall_ups returns true in this exact case.
  47.     end
  48.   endrule
  49. endmodule
  50.  
  51. // An example where we chain a bunch of pipeline registers. This essentially creates a pipelined FIFO
  52. // with an unfortunate latency of n. The FIFO is "pipelined" in the sense that
  53. // even when the FIFO is full, enqueue and dequeue can happen in the same cycle.
  54. module PipelinedFIFO#(Integer 1, type T);
  55.   PipelineFIFO1#(T) buffer;
  56.   input Maybe#(T) data_ups;
  57.   input Bool stall_downs;
  58.   method Maybe#(T) data_downs = buffer.data_downs;
  59.   method Bool stall_ups(Bool stall_downs) = buffer.stall_ups(stall_downs);
  60.   rule update;
  61.     buffer.data_ups = data_ups;
  62.     buffer.stall_downs = stall_downs;
  63.   endrule
  64. endmodule
  65. module PipelinedFIFO#(Integer n, type T);
  66.   PipelineFIFO1#(T) first;
  67.   PileineBufferChain#(n-1, T) rest;
  68.   input Maybe#(T) data_ups;
  69.   input Bool stall_downs;
  70.   method Maybe#(T) data_downs = rest.data_downs;
  71.   method Bool stall_ups(Bool stall_downs) = first.stall_ups(stall_downs);
  72.   rule update;
  73.     first.data_ups = data_ups;
  74.     rest.data_ups = first.data_downs;
  75.     rest.stall_downs = stall_downs;
  76.     first.stall_downs = rest.stall_ups(stall_downs);
  77.   endrule
  78. endmodule
  79.  
  80. // A fast path between stages. Data from upstream goes to downstream in the same cycle
  81. // if downstream is willing to take it. Otherwise, it gets put into the buffer.
  82. // That is, there is a combinational path from upstream to downstream.
  83. // To avoid having a combinational stall signal from downstream to upstream,
  84. // we only tell upstream whether to stall or not based on the state of the buffer,
  85. // i.e. essentially registering the stall signal.
  86. // If the buffer is full, we need to clear out the buffer first. Only once cleared,
  87. // we can tell upstream that we're consuming.
  88. // This obviously does not provide pipelining in a traditional sense. (The main path is wholly combinational.)
  89. // It's usually only useful for a path that goes back up against the pipeline (e.g. annulment, bypassing in pipelined processors),
  90. // where the buffer's "downstream" is actually up the pipeline and "upstream" is later in the pipeline,
  91. // and that up-the-pipeline may not be ready to act on the bypassed information.
  92. // (This has never really been useful in most 6.191 processors. Writeback always completes in one cycle.
  93. // So, whenever execute completes and has data to bypass to decode, execute can move on to writeback.
  94. // If execute moves on, then our implementation is such that decode also moves on, i.e. it can use the bypassed information
  95. // immediately in this cycle. That's why we never needed a FIFO to store the bypass information.
  96. // When writeback bypasses to execute and execute isn't ready to move on, meaning decode isn't ready to move on,
  97. // it doesn't matter anyway because writeback ends up writing to the register file, also eliminating the need for the FIFO.
  98. // It is fine if we did have the FIFO, though, since Bypass FIFO does not add a combinational path down the processor pipeline.)
  99. module BypassFIFO1#(type T);
  100.   Reg#(Maybe#(T)) buffer(Invalid);
  101.  
  102.   input Maybe#(T) data_ups;
  103.   input Bool stall_downs;
  104.   method Maybe#(T) data_downs(Maybe#(T) data_ups) = isValid(buffer) ? buffer : data_ups;
  105.   method Bool stall_ups = isValid(buffer);
  106.   rule update;
  107.     // downstream tells us not to stall
  108.     if (!stall_downs) begin
  109.       // if buffer is filled, we are sending from buffer so we should clear it.
  110.       // if not, we bypassed data to downstream, so we shouldn't store anything.
  111.       buffer <= Invalid;
  112.     end else begin
  113.       // downstream is stalling. if buffer is free, we've told upstream we're consuming its data,
  114.       // so we do need to take and buffer the data.
  115.       if (!isValid(buffer)) buffer <= data_ups;
  116.       // otherwise, buffer is full, and it remains full because we can't send it to downstream yet
  117.     end
  118.   endrule
  119. endmodule
  120.  
  121. // Pipeline buffer has a combinational path from downstream to upstream due to stall signal.
  122. // Bypass buffer has a combinational path from upstream to downstream due to data, defeating the point of pipelining.
  123. // Can we get the best of both worlds? Let's try a FIFO of size 1.
  124. // When FIFO is empty:
  125. //   Input gets registered for outputting. Let's call this the "output register."
  126. //   This may or may not be actually used in the next cycle depending on whether downstream stalls.
  127. //   The input _has_ to get registered regardless of whether downstream stalls. If we could decide not to register
  128. //   the output based on whether downstream stalls, then that means we had a combinational path for the stall,
  129. //   same problem as the pipeline buffer!
  130. // When FIFO is full:
  131. //   This corresponds to the case where we did register something and it wasn't consumed by downstream.
  132. //   We must be continually trying to put out the content of this FIFO.
  133. //   Furthermore, since we avoid the combinational path from downstream to upstream,
  134. //   that means we have no choice but to tell the upstream to stall. (Otherwise, where would we put the data?)
  135. // This design means we do not have any combinational path between the upstream and downstream,
  136. // but we halved the throughput! Why? Because when the buffer is full, upstream could never put anything
  137. // in the buffer, so it has to wait a cycle until the buffer is emptied again.
  138. // So, this is actually kinda the worst of both worlds.
  139. module FIFO1#(type T);
  140.   Reg#(Maybe#(T)) buffer(Invalid);
  141.   input Maybe#(T) data_ups;
  142.   input Bool stall_downs;
  143.   method Maybe#(T) data_downs = buffer;
  144.   method Bool stall_ups = isValid(buffer);
  145.   rule update;
  146.     if (!stall_downs) begin
  147.       // If downstream is consuming and we are providing data (buffer is valid), we need to clear the buffer,
  148.       // since we told upstream we are stalling, upstream won't change its output.
  149.       // If we were to take the output now, we would end up getting duplicate data in the next cycle.
  150.       // If we are not providing any data, it is fine to take data from upstream; in fact, we must.
  151.       buffer <= isValid(buffer) ? Invalid : data_ups;
  152.     end else begin
  153.       // Downstream isn't consuming any data. If we already have data in buffer, don't change.
  154.       // (It's fine, since we're telling upstream to stall.) Otherwise, must take data.
  155.       buffer <= isValid(buffer) ? buffer : stall_ups;
  156.     end
  157.   endrule
  158. endmodule
  159.  
  160. // Can we fix this issue by putting a BypassFIFO1 in front of FIFO1?
  161. // It seems when FIFO1 is empty, BypassFIFO1 helps us quickly get upstream data to the front.
  162. // On the next cycle, BypassFIFO1 takes an input in the same cycle that FIFO1 is being dequeued.
  163. // Since FIFO1 cannot immediately tell BypassFIFO1, that means BypassFIFO1 will have to put data in its buffer, and FIFO1 becomes empty.
  164. // Next cycle, BypassFIFO1 needs to put data in FIFO1. It can't tell upstream in the same cycle that it is putting data into FIFO1.
  165. // Putting a PipelinedFIFO1 in front of FIFO1 does not work for a similar reason.
  166. // First two cycles are fine, but then since FIFO1 stalls when dequeuing, PipelineFIFO1 needs to stall, causing upstream to stall.
  167. // Ultimately, the throughput is halved whenever FIFO1 is involved.
  168.  
  169. // We need something different altogether. We need a "skid buffer," which is equivalent to a conflict-free FIFO of size 2.
  170. // Here, conflict-free means, if the FIFO is neither empty nor full, it should be able to both dequeue and enqueue data
  171. // in the same cycle, where that data should be immediately available in the next cycle.
  172. // A skid buffer, intuitively, is like a pipeline register where the stall signal is also registered.
  173. // But, since the stall signal is registered, it means we can't immediately tell the upstream module we're not consuming its data.
  174. // The upstream module believes we are consuming its data, so we have no choice but to put that data in a new, "internal register".
  175. // We can summarize the operation of the skid buffer as follows.
  176. // When FIFO is empty:
  177. //   We tell upstream we're not stalling.
  178. //   If there's upstream data, it gets put into the "output register."
  179. // When FIFO has size 1:
  180. //   We tell upstream we're not stalling (i.e. we're consuming). The output register is shown to downstream.
  181. //   If downstream is consuming, then input data gets put into the output register, replacing the current data.
  182. //   If downstream is not consuming, input data gets put into an "internal register."
  183. // When FIFO has size 2:
  184. //   We tell upstream we are stalling (i.e. not consuming).
  185. //   If downstream is consuming, we move data from internal register to output register.
  186. //   Unfortunately, input can't go into the internal register yet, to avoid accidentally duplicating the data.
  187. // Now we fixed the throughput issue!
  188. // If the upstream steadily provides input data, it goes right into the output register,
  189. // which means the buffer remains empty, allowing upstream to keep inputting data.
  190. // But now, if upstream puts data in again, it goes right into the output register rather than the internal buffer.
  191. // So, the throughput is maintained!
  192. // Note, however, skid buffer does not smooth out irregularities in data rates. If downstream stalls, upstream will see a stall.
  193. // Use a longer FIFO instead to help smooth out these irregularities.
  194. // (Obviously, it's a lost cause if upstream data rate is consistently higher than downstream. We need unbounded FIFOs in that case.)
  195. module SkidBuffer#(type T);
  196.   Reg#(Maybe#(T)) output(Invalid);
  197.   Reg#(Maybe#(T)) internal(Invalid);
  198.  
  199.   input Maybe#(T) data_ups;
  200.   input Bool stall_downs;
  201.  
  202.   method Maybe#(T) data_downs = output;
  203.   method Bool stall_ups = isValid(internal);
  204.  
  205.   rule update;
  206.     if (!isValid(output)) begin // FIFO is empty
  207.       $assert(!isValid(internal));
  208.       output <= data_ups;
  209.     end else if (!isValid(internal)) begin // FIFO has size 1
  210.       $assert(isValid(output));
  211.       if (!stall_downs) begin
  212.         output <= data_ups;
  213.       end else begin
  214.         internal <= data_ups;
  215.       end
  216.     end else begin // FIFO is full
  217.       if (!stall_downs) begin
  218.         output <= internal;
  219.         internal <= Invalid;
  220.       end
  221.     end
  222.   endrule
  223. endmodule
  224.  
  225. // Actually, now that I think about it, can we simulate a skid buffer by putting a
  226. // bypass FIFO (internal) followed by a pipelined FIFO (output)?
  227. // When overall FIFO is empty, upstream input automatically goes to the output FIFO.
  228. // When the overall FIFO has an element, it is always at the output FIFO,
  229. // and bypass FIFO is still willing to take an element, so upstream is not stalled.
  230. // When there are two elements, then upstream is indeed stalled.
  231. // When an element is taken out, the pipeline FIFO tells bypass FIFO to also dequeue,
  232. // so it's essentially the same behavior as skid buffer.
  233. // I'm convinced to say they are equivalent, but I'll have to prove this later.
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement